WilliamLam.com

How Fast is the New vSphere 5 HA/DRS on 64 Node Cluster? FAST!

08.05.2011 by William Lam // 2 Comments

**** Disclaimer: 32 nodes is still the maximum supported configuration for vSphere 5 from VMware; this has not changed. This is purely a demonstration, use at your own risk ****

Recently while catching up on several episodes of the weekly VMTN Community Podcast, an interesting comment was made by Tom Stephens (Sr. Technical Marketing for vSphere HA) in episode #150 regarding the size of a vSphere cluster. Tom mentioned that there was no "technical" reason a vSphere cluster could not scale beyond 32 nodes. I decided to find out for myself, as this was something I had tried with vSphere 4.x: though the configuration of the cluster completed, only 32 hosts were properly configured.

Here is a quick video on enabling the new HA (FDM) and DRS on a vSphere 5 cluster with 64 vESXi hosts. You should watch the entire video, as it only took an astonishing 2 minutes and 37 seconds to complete! Hats off to the VMware HA/DRS engineering teams; you can really see the difference in the speed and performance of the new vSphere HA/DRS architecture in vSphere 5.

vSphere 5 - 64 Node Cluster from lamw on Vimeo.
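For anyone who wants to reproduce a similar test programmatically, below is a rough pyVmomi sketch that enables both vSphere HA and DRS on an existing cluster in a single reconfiguration task and reports how long it takes. The vCenter hostname, credentials and cluster name are placeholders, and this is just one way to time the operation, not how the demo in the video was driven.

#!/usr/bin/env python
# Rough sketch: enable vSphere HA (FDM) and DRS on an existing cluster and time it.
# The hostname, credentials and cluster name below are placeholders.
import ssl, time
from pyVim.connect import SmartConnect, Disconnect
from pyVim.task import WaitForTask
from pyVmomi import vim

si = SmartConnect(host='vcenter50', user='administrator', pwd='passwd',
                  sslContext=ssl._create_unverified_context())
try:
    # Find the cluster by name (placeholder "Cluster64")
    view = si.content.viewManager.CreateContainerView(
        si.content.rootFolder, [vim.ClusterComputeResource], True)
    cluster = next(c for c in view.view if c.name == 'Cluster64')

    # Enable HA and DRS in a single cluster reconfiguration
    spec = vim.cluster.ConfigSpecEx(
        dasConfig=vim.cluster.DasConfigInfo(enabled=True),
        drsConfig=vim.cluster.DrsConfigInfo(enabled=True))

    start = time.time()
    WaitForTask(cluster.ReconfigureComputeResource_Task(spec, modify=True))
    print('HA/DRS enabled in %.1f seconds' % (time.time() - start))
finally:
    Disconnect(si)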

BTW - If someone from VMware is watching this, what does CSI stand for? I believe this was the codename for what is now known as FDM.

Categories // Uncategorized Tags // cluster, drs, ESXi 5.0, fdm, ha, vSphere 5.0

New vSphere 5 HA, DRS and SDRS Advanced/Hidden Options

07.21.2011 by William Lam // 7 Comments

While testing the new HA (FDM) in vSphere 5 during the beta, I noticed a new warning message on one of the ESXi 5.0 hosts: "The number of heartbeat datastores for host is 1, which is less than required: 2".

I wondered if this was something that could be disabled as long as the user was aware of it. Looking at the new availability guide, I found that two new advanced HA options have been introduced relating to datastore heartbeating, which is a secondary means of determining whether a host has been partitioned, isolated or has failed.

das.ignoreinsufficienthbdatastore - Disables configuration issues created if the host does not have sufficient heartbeat datastores for vSphere HA. Default value is false.
das.heartbeatdsperhost - Changes the number of heartbeat datastores required. Valid values can range from 2-5 and the default is 2.

To disable the message, you will need to add the das.ignoreinsufficienthbdatastore advanced setting under the "vSphere HA" Advanced Options section and set the value to true.

You then need to perform a reconfiguration of vSphere HA for this to take effect. One method is to just disable and re-enable vSphere HA, after which the message is gone. If you know you will have fewer than the minimum of 2 datastores for heartbeating, you can configure this option when you first enable vSphere HA.
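For those who prefer to manage this setting programmatically, here is a minimal pyVmomi sketch that adds the advanced option to a cluster's HA configuration and triggers the reconfiguration; it assumes "cluster" is a vim.ClusterComputeResource object you have already looked up (for example, as in the earlier sketch).

from pyVmomi import vim
from pyVim.task import WaitForTask

# Sketch: add das.ignoreinsufficienthbdatastore to the cluster's HA advanced
# options; "cluster" is assumed to be a vim.ClusterComputeResource that was
# already looked up.
das_config = vim.cluster.DasConfigInfo(
    enabled=True,
    option=[vim.option.OptionValue(key='das.ignoreinsufficienthbdatastore',
                                   value='true')])
spec = vim.cluster.ConfigSpecEx(dasConfig=das_config)

# Reconfiguring the cluster re-runs the vSphere HA configuration on the hosts,
# which is what clears the heartbeat datastore warning.
WaitForTask(cluster.ReconfigureComputeResource_Task(spec, modify=True))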

I was curious (obviously) to see if there were other advanced options, and searching through the vpxd binary, I located some old and new advanced options that may be applicable to vSphere DRS, DPM and SDRS.

Disclaimer: Based on my research/digging, these options may or may not be properly documented, and they are most likely not supported by VMware. Please take caution if you decide to play with these advanced settings.

Setting - Description
AvgStatPeriod - Statistical sampling period in minutes
CapRpReservationAtDemand - Caps the RP entitled reservation at demand during reservation divvying
CompressDrmdumpFiles - Set to 1 to compress drmdump files & to 0 to not compress them
CostBenefit - Enable/disable the use of cost benefit metric for filtering moves
CpuActivePctThresh - Active percentage threshold above which the VM's CPU entitlement cap is increased to cluster maximum Mhz. Set it to 125 to disable this feature
DefaultDownTime - Down time (millisecs) to use for VMs w/o history (-1 -> unspecified)
DefaultMigrationTime - Migration time (secs) to use for VMs w/o history (-1 -> unspecified)
DefaultSioCapacityInIOPS - Default peak IOPS to be used for datastore with zero slope
DefaultSioDeviceIntercept - Default intercept parameter in device model for SDRS in x1000
DemandCapacityRatioTarget - unknown
DemandCapacityRatioToleranceHost - DPM/DRS: Consider recent demand history over this period for DPM power performance & DRS cost performance decisions
DumpSpace - Disk space limit in megabytes for dumping module and domain state, set to 0 to disable dumping, set to -1 for unlimited space
EnableMinimalDumping - Enable or Disable minimal dumping in release builds
EnableVmActiveAdjust - Enable Adjustment of VM Cpu Active
EwmaWeight - Weight for newer samples in exponential weighted moving average in 1/100's
FairnessCacheInvalSec - Maximum age of the fairness cache
GoodnessMetric - Goodness metric for evaluating migration decisions
GoodnessPerStar - Maximum goodness in 1/1000 required for a 1-star recommendation
IdleTax - Idle tax percentage
IgnoreAffinityRulesForMaintenance - Ignore affinity rules for datastore maintenance mode
IgnoreDownTimeLessThan - Ignore down time less than this value in seconds
IoLoadBalancingAlwaysUseCurrent - Always use current stats for IO load balancing
IoLoadBalancingMaxMovesPerHost - Maximum number of moves from or to a datastore per round
IoLoadBalancingMinHistSecs - Minimum number of seconds that should have passed before using current stats
IoLoadBalancingPercentile - IO Load balancing default percentile to use
LogVerbose - Turn on more verbose logging
MinGoodness - Minimum goodness in 1/1000 required for any balance recommendation; if <=0, min set to abs value; if >0, min set to lesser of option & value set proportionate to running VMs, hosts, & rebal resources
MinImbalance - Minimum cluster imbalance in 1/1000 required for any recommendations
MinStarsForMandMoves - Minimum star rating for mandatory recommendations
NumUnreservedSlots - Number of unreserved capacity slots to maintain
PowerOnFakeActiveCpuPct - Fake active CPU percentage to use for initial share allocation
PowerOnFakeActiveMemPct - Fake active memory percentage to use for initial share allocation
PowerPerformanceHistorySecs - unknown
PowerPerformancePercentileMultiplier - DPM: Set percentile for stable time for power performance
PowerPerformanceRatio - DPM: Set Power Performance ratio
PowerPerformanceVmDemandHistoryNumStdDev - DPM: Compute demand for history period as mean plus this many standard deviations, capped at maximum demand observed
RawCapDiffPercent - Percent by which RawCapacity values need to differ to be significant
RelocateThresh - Threshold in stars for relocation
RequireMinCapOnStrictHaAdmit - Make Vm power on depend on minimum capacity becoming powered on and on any recommendations triggered by spare Vms
ResourceChangeThresh - Minimum percent of resource setting change for a recommendation
SecondaryMetricWeight - Weight for secondary metric in overall metric
SecondaryMetricWeightMult - Weight multiplier for secondary metric in overall metric
SetBaseGoodnessForSpaceViolation - -1*Goodness value added for a move exceeding space threshold on destination
SetSpaceLoadToDatastoreUsedMB - If 0, set space load to sum of vmdk entitlements [default]; if 1, set space load to datastore used MB if higher
SpaceGrowthSecs - The length of time to consider in the space growth risk analysis. Should be an order of magnitude longer than the typical storage vmotion time.
UseDownTime - Enable/disable the use of downtime in cost benefit metric
UseIoSharesForEntitlement - Use vmdk IO shares for entitlement computation
UsePeakIOPSCapacity - Use peak IOPS as the capacity of a datastore
VmDemandHistorySecsHostOn - unknown
VmDemandHistorySecsSoftRules - Consider recent demand history over this period in making decisions to drop soft rules
VmMaxDownTime - Reject the moves if the predicted downTime will exceed the max (in secs) for non-FT VM
VmMaxDownTimeFT - Reject the moves if the predicted downTime will exceed the max (in secs) for FT VM
VmRelocationSecs - Amount of time it takes to relocate a VM

As you can see, the advanced/hidden options in the above table can potentially apply to DRS, DPM and SDRS; I have not personally tested all of the settings. There might be some interesting and possibly useful settings; one such setting is the SDRS option IgnoreAffinityRulesForMaintenance, which ignores the affinity rules for datastore maintenance mode. To configure SDRS Advanced Options, you will need to navigate over to the "Datastore" view, edit a Storage Pod under "SDRS Automation" and select "Advanced Options".
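If you prefer to set the SDRS option programmatically rather than through the client, here is a hedged pyVmomi sketch. The storage pod name "Pod01" is a placeholder, and the value of 1 to enable IgnoreAffinityRulesForMaintenance is an assumption on my part, so verify it in your own environment before relying on it.

from pyVmomi import vim
from pyVim.task import WaitForTask

# Sketch: set the SDRS advanced option IgnoreAffinityRulesForMaintenance on a
# datastore cluster (storage pod). "si" is an existing ServiceInstance
# connection and "Pod01" is a placeholder name; the value of '1' to enable the
# option is an assumption.
view = si.content.viewManager.CreateContainerView(
    si.content.rootFolder, [vim.StoragePod], True)
pod = next(p for p in view.view if p.name == 'Pod01')

pod_spec = vim.storageDrs.PodConfigSpec(
    option=[vim.option.OptionValue(key='IgnoreAffinityRulesForMaintenance',
                                   value='1')])
spec = vim.storageDrs.ConfigSpec(podConfigSpec=pod_spec)

WaitForTask(si.content.storageResourceManager.ConfigureStorageDrsForPod_Task(
    pod=pod, spec=spec, modify=True))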

Categories // Uncategorized Tags // ESXi 5.0, fdm, ha, SDRS, vSphere 5.0

There's a new mob in town, FDM MOB for ESXi 5

07.15.2011 by William Lam // 1 Comment

That's right, vSphere is not the only one with a MOB; the new FDM (Fault Domain Manager) feature also includes a MOB view on an ESXi 5.0 host that is part of an FDM/HA enabled cluster. I originally noticed this new URL while parsing through the system logs of an ESXi host to get a better understanding of the startup process and found this little nugget. This page contains private APIs that are currently not exposed for public consumption with respect to the FDM service, so please use at your own risk.

To access the FDM MOB, you will need to point your browser to the following URL:

https://[esxi5_hostname]/mobfdm

Here is a screenshot of the main summary page:

On the summary page, you have some basic information about the particular host in question. One interesting property is "clusterState", which will show whether the host is a master or slave node; this can be useful in troubleshooting if you do not have access to vCenter.

There are two interesting methods that can provide some useful information: RetrieveClusterInfo and RetrieveHostList, which should be pretty self-explanatory in what they do.

To view the RetrieveClusterInfo method, you will need to point your browser to the following URL:

https://[esxi5_hostname]/mobfdm/?moid=fdmService&method=retrieveClusterInfo

As you can see from the screenshot, it provides a summary for the particular ESXi host within the FDM cluster, including the masterID. This ID will be useful when we call the other method to identify the master node in the FDM cluster.

To view the RetrieveHostList method, you will need to point your browser to the following URL:

https://[esxi5_hostname]/mobfdm/?moid=fdmService&method=retrieveHostList

This method extracts all hosts from the FDM cluster and provides quite a bit of information about each host, including the hostname and the hostID. You can now translate the masterID found in the previous method to identify the master node of the FDM cluster.
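If you want to pull these MOB pages outside of a browser, here is a small Python sketch using the requests module. It assumes the FDM MOB accepts HTTP basic authentication with the host's root credentials (which may not hold in every environment) and simply retrieves the two method pages shown above; the hostname and password are placeholders.

import requests
from requests.auth import HTTPBasicAuth

# Sketch: fetch the FDM MOB method pages shown above. The hostname and
# password are placeholders, and basic authentication against the MOB is an
# assumption that may not hold in every environment.
host = 'esxi5_hostname'
auth = HTTPBasicAuth('root', 'passwd')

for method in ('retrieveClusterInfo', 'retrieveHostList'):
    url = 'https://%s/mobfdm/?moid=fdmService&method=%s' % (host, method)
    resp = requests.get(url, auth=auth, verify=False)
    print(method, resp.status_code, len(resp.text), 'bytes of HTML')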

When you login to the FDM MOB for an ESXi host that is a master node in the cluster, the page will look slightly different with even more details including all slave nodes and protected VMs within the cluster.

As you can see this can be a useful tool for quickly identifying the master and slave nodes within an FDM cluster without going to your vCenter Server.

You can also get this information within the ESXi Shell: there is a hostlist file in XML format, located at /etc/opt/vmware/fdm/hostlist, in which you can view the same information found in the RetrieveClusterInfo method.

~ # cat /etc/opt/vmware/fdm/hostlist
[XML hostlist output: master host ID host-70, the vCenter UUID, and a per-host entry (host ID, hostname, SSL thumbprint, management IP, NIC MAC addresses and heartbeat datastore paths) for esxi50-2.primp-industries.com and esxi50-1.primp-industries.com]

You can also get the details of RetrieveHostList, with cleaner output from the FDM host, using the following script: /opt/vmware/fdm/fdm/prettyPrint.sh. The script can accept three different arguments: hostlist, clusterconfig and compatlist.

Here is a screenshot of the hostlist:

Here is a screenshot of the clusterconfig:

Here is a screenshot of the compatlist:

Categories // Uncategorized Tags // ESXi 5.0, fdm, fdmmob, mob, vSphere 5.0
