The ability to enable virtual Fault Tolerance (vFT) for nested virtual machines running on vESX(i) is not a new feature in vSphere 5. vFT has been possible, though unsupported, since vSphere 4 and was initially identified by Simon Gallagher. The process is exactly the same in vSphere 5: three configuration options need to be added to the virtual machine that will be protected by FT, not to the vESXi VM.
replay.supported = "true"
replay.allowFT = "true"
replay.allowBTOnly = "true"
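As a rough sketch, the three options above could be written into the nested VM's .vmx file from a shell while the VM is powered off. The function names `set_vmx_option` and `enable_nested_ft` are hypothetical helpers, not VMware tools, and the sketch assumes a POSIX shell with awk is available wherever the .vmx file is edited.

```shell
# Hypothetical helper: set key = "value" in a .vmx file, replacing any existing entry.
# Assumes the VM is powered off so the file is not rewritten while we edit it.
set_vmx_option() {
    vmx_file=$1
    key=$2
    value=$3
    [ -f "$vmx_file" ] || return 1
    tmp_file="${vmx_file}.tmp"
    # Keep every line whose key differs from the one being set.
    awk -v k="$key" -F' *= *' '$1 != k' "$vmx_file" > "$tmp_file"
    # Append the new setting in the usual key = "value" form.
    printf '%s = "%s"\n' "$key" "$value" >> "$tmp_file"
    mv "$tmp_file" "$vmx_file"
}

# Hypothetical wrapper: apply the three replay options listed above to one .vmx file.
enable_nested_ft() {
    for key in replay.supported replay.allowFT replay.allowBTOnly; do
        set_vmx_option "$1" "$key" true || return 1
    done
}
```

Calling `enable_nested_ft` with the path to the nested VM's .vmx file would apply all three settings; running it twice leaves a single entry per option.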
During the beta of vSphere 5, I did enable vFT, but only on an offline (powered-off) virtual machine to conserve compute resources. Today there was a question on the beta community about configuring vFT for vSphere 5, and I wanted to quickly validate that the configurations still hold true. I ran into an interesting error when trying to enable vFT: the power-on process for the secondary virtual machine failed with the following error:
This was not an error I had seen before in vSphere 4, and looking at the vmkernel and vmware.log files, I noticed the following:
2011-07-31T17:31:39.314Z| vcpu-0| [vob.vmotion.stream.keepalive.read.fail] vMotion migration [ac1e0050:1312133702562144] failed to read stream keepalive: Connection closed by remote host, possibly due to timeout
2011-07-31T17:31:39.314Z| vcpu-0| [msg.checkpoint.precopyfailure] Migration to host <> failed with error Connection closed by remote host, possibly due to timeout (0xbad003f).
2011-07-31T17:31:39.324Z| vcpu-0| Migrate: secondary failure during migration: error Connection closed by remote host, possibly due to timeout.
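To check whether a failed secondary power-on matches this signature, the log can be searched for the two message IDs in the excerpt above. This is a minimal sketch: the function name `find_ft_keepalive_errors` is hypothetical, and the log path passed to it is whatever vmware.log you are inspecting.

```shell
# Hypothetical helper: print the lines of a log file that carry either message ID
# from the excerpt above. Returns non-zero when neither ID is present.
find_ft_keepalive_errors() {
    grep -E 'vob\.vmotion\.stream\.keepalive\.read\.fail|msg\.checkpoint\.precopyfailure' "$1"
}
```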
I tried changing the advanced option on the vESX(i) host to increase the vMotion timeout but continued to hit the same error. I then looked more closely at the first error message, "failed to read stream keepalive", and found an advanced ESX(i) setting called /Migrate/VMotionStreamDisable. This advanced option has been available since ESX(i) 4.x.
I decided to disable vMotion Stream and, to my surprise, FT was then able to power on the secondary virtual machine without running into that error.
Note: You may or may not run into this error message and the configuration may not be necessary. If you enable vFT on an offline VM, you should not have any issues as long as you meet the minimum Fault Tolerance requirements.
You can configure the advanced ESXi option using either esxcli or legacy esxcfg-advcfg commands:
esxcli system settings advanced set -o /Migrate/VMotionStreamDisable -i 1
esxcfg-advcfg -s 1 /Migrate/VMotionStreamDisable
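Before and after changing the option, it can be useful to read back its current value. The sketch below wraps the esxcli `list` call for this option in a hypothetical function, `show_vmotion_stream_setting`, that fails cleanly when run somewhere other than an ESXi shell. The return code 127 is an arbitrary choice for this example.

```shell
# Hypothetical helper: display the current /Migrate/VMotionStreamDisable setting.
# Intended to be run in the ESXi shell of the vESX(i) host.
show_vmotion_stream_setting() {
    if command -v esxcli >/dev/null 2>&1; then
        esxcli system settings advanced list -o /Migrate/VMotionStreamDisable
    else
        echo "esxcli not found; run this on the ESXi host" >&2
        return 127
    fi
}
```

On hosts where only the legacy tooling is available, `esxcfg-advcfg -g /Migrate/VMotionStreamDisable` reads the same option.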
It is important to understand that even though you can set up vESX(i) hosts and test some of the advanced functionality such as vMotion and FT, the actual behavior is unpredictable because these configurations are unsupported by VMware. This is of course also a great feature for home labs and for studying for VMware certifications such as VCP and VCAP-DCA, but that should be the extent of leveraging these unsupported configurations.
Unknown says
We were trying to set up FT across physically separate sites for disaster recovery. However, since FT only allows 1 secondary VM, we are unable to have a config in which we can have a local failover and, if a site is lost, a failover to the secondary site.
We thought if we can run an FT VM within an FT VM, we may get close to what we need. Parent VM failure can be handled with a local failover and if both primary FT VM and its nested FT VM fail, the nested FT VM's secondary that is running on the secondary site will continue processing.
What do you think?
William says
If you're looking for site failover, you may want to look into something like SRM. I'm not quite sure I follow how a nested FT VM would even work for you, nor would I rely on or recommend it. This article was meant to show how you could enable it for testing and seeing it in action, specifically for studying for certifications.
Mike says
Having difficulty getting this to work in my setup. I'm starting to think my outermost host's CPU doesn't support FT, but I'm hopeful that there's just a setting I can update in a vmx file or something. I'm running two ESXi 5.5 hosts in Workstation 11. In the summary tab for these hosts it reads "Hardware virtualization not supported by the host CPU" in the fault tolerance requirements field. The CPU is an Intel i7-4700MQ which admittedly isn't in the HCL. The i7-4700EQ is however, and the only difference I can spot between these is VT-d capability, which as I understand is not a requirement for FT. vMotion and HA both work fine, only FT is not working. Am I totally out of luck, or can I persuade this to work somehow?
Vaibhav says
When I try to power on the machine, it returns the error "Binary translation with Record/Replay is not supported for this guest".