WilliamLam.com

How Fast is the New vSphere 5 HA/DRS on 64 Node Cluster? FAST!

08.05.2011 by William Lam // 2 Comments

**** Disclaimer: 32 nodes is still the maximum supported configuration for vSphere 5 from VMware; this has not changed. This is purely a demonstration, use at your own risk ****

Recently while catching up on several episodes of the weekly VMTN Community Podcast, an interesting comment was made by Tom Stephens (Sr. Technical Marketing for vSphere HA) in episode #150 regarding the size of a vSphere cluster. Tom mentioned that there was no "technical" reason a vSphere cluster could not scale beyond 32 nodes. I decided to find out for myself, as this was something I had tried with vSphere 4.x: though the configuration of the cluster completed, only 32 hosts were properly configured.

Here is a quick video on enabling the new HA (FDM) and DRS on a vSphere 5 cluster with 64 vESXi hosts. You should watch the entire video, as it only took an astonishing 2 minutes and 37 seconds to complete! Hats off to the VMware HA/DRS engineering teams; you can really see the difference in the speed and performance of the new vSphere HA/DRS architecture in vSphere 5.

vSphere 5 - 64 Node Cluster from lamw on Vimeo.
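For anyone who wants to reproduce a similar test programmatically, below is a rough pyVmomi sketch that enables both vSphere HA and DRS on an existing cluster in a single reconfiguration task and reports how long it takes. The vCenter hostname, credentials and cluster name are placeholders, and this is just one way to time the operation, not how the demo in the video was driven.

#!/usr/bin/env python
# Rough sketch: enable vSphere HA (FDM) and DRS on an existing cluster and time it.
# The hostname, credentials and cluster name below are placeholders.
import ssl, time
from pyVim.connect import SmartConnect, Disconnect
from pyVim.task import WaitForTask
from pyVmomi import vim

si = SmartConnect(host='vcenter50', user='administrator', pwd='passwd',
                  sslContext=ssl._create_unverified_context())
try:
    # Find the cluster by name (placeholder "Cluster64")
    view = si.content.viewManager.CreateContainerView(
        si.content.rootFolder, [vim.ClusterComputeResource], True)
    cluster = next(c for c in view.view if c.name == 'Cluster64')

    # Enable HA and DRS in a single cluster reconfiguration
    spec = vim.cluster.ConfigSpecEx(
        dasConfig=vim.cluster.DasConfigInfo(enabled=True),
        drsConfig=vim.cluster.DrsConfigInfo(enabled=True))

    start = time.time()
    WaitForTask(cluster.ReconfigureComputeResource_Task(spec, modify=True))
    print('HA/DRS enabled in %.1f seconds' % (time.time() - start))
finally:
    Disconnect(si)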

BTW - If someone from VMware is watching this, what does CSI stand for? I believe this was the codename for what is now known as FDM.

Categories // Uncategorized Tags // cluster, drs, ESXi 5.0, fdm, ha, vSphere 5.0

New vSphere 5 HA, DRS and SDRS Advanced/Hidden Options

07.21.2011 by William Lam // 7 Comments

While testing the new HA (FDM) in vSphere 5 during the beta, I noticed a new warning message on one of the ESXi 5.0 hosts: "The number of heartbeat datastores for host is 1, which is less than required: 2".

I wondered if this was something that could be disabled as long as the user was aware of it. Looking at the new availability guide, I found that two new advanced HA options have been introduced relating to datastore heartbeating, which is a secondary means of determining whether a host has been partitioned, isolated or has failed.

das.ignoreinsufficienthbdatastore - Disables configuration issues created if the host does not have sufficient heartbeat datastores for vSphere HA. Default value is false.
das.heartbeatdsperhost - Changes the number of heartbeat datastores required. Valid values can range from 2-5 and the default is 2.

To disable the message, you will need to add the das.ignoreinsufficienthbdatastore advanced setting under the "vSphere HA" Advanced Options section and set the value to true.

You then need to perform a reconfiguration of vSphere HA for this to take effect. One method is to just disable and re-enable vSphere HA, after which the message is gone. If you know you will have fewer than the minimum of 2 datastores for heartbeating, you can configure this option when you first enable vSphere HA.
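For those who prefer to manage this setting programmatically, here is a minimal pyVmomi sketch that adds the advanced option to a cluster's HA configuration and triggers the reconfiguration; it assumes "cluster" is a vim.ClusterComputeResource object you have already looked up (for example, as in the earlier sketch).

from pyVmomi import vim
from pyVim.task import WaitForTask

# Sketch: add das.ignoreinsufficienthbdatastore to the cluster's HA advanced
# options; "cluster" is assumed to be a vim.ClusterComputeResource that was
# already looked up.
das_config = vim.cluster.DasConfigInfo(
    enabled=True,
    option=[vim.option.OptionValue(key='das.ignoreinsufficienthbdatastore',
                                   value='true')])
spec = vim.cluster.ConfigSpecEx(dasConfig=das_config)

# Reconfiguring the cluster re-runs the vSphere HA configuration on the hosts,
# which is what clears the heartbeat datastore warning.
WaitForTask(cluster.ReconfigureComputeResource_Task(spec, modify=True))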

I was curious (obviously) to see if there were other advanced options, and searching through the vpxd binary, I located some old and new advanced options that may be applicable to vSphere DRS, DPM and SDRS.

Disclaimer: Based on my research/digging, these options may or may not be properly documented, and they are most likely not supported by VMware. Please take caution if you decide to play with these advanced settings.

Setting - Description
AvgStatPeriod - Statistical sampling period in minutes
CapRpReservationAtDemand - Caps the RP entitled reservation at demand during reservation divvying
CompressDrmdumpFiles - Set to 1 to compress drmdump files & to 0 to not compress them
CostBenefit - Enable/disable the use of cost benefit metric for filtering moves
CpuActivePctThresh - Active percentage threshold above which the VM's CPU entitlement cap is increased to cluster maximum Mhz. Set it to 125 to disable this feature
DefaultDownTime - Down time (millisecs) to use for VMs w/o history (-1 -> unspecified)
DefaultMigrationTime - Migration time (secs) to use for VMs w/o history (-1 -> unspecified)
DefaultSioCapacityInIOPS - Default peak IOPS to be used for datastore with zero slope
DefaultSioDeviceIntercept - Default intercept parameter in device model for SDRS in x1000
DemandCapacityRatioTarget - unknown
DemandCapacityRatioToleranceHost - DPM/DRS: Consider recent demand history over this period for DPM power performance & DRS cost performance decisions
DumpSpace - Disk space limit in megabytes for dumping module and domain state, set to 0 to disable dumping, set to -1 for unlimited space
EnableMinimalDumping - Enable or Disable minimal dumping in release builds
EnableVmActiveAdjust - Enable Adjustment of VM Cpu Active
EwmaWeight - Weight for newer samples in exponential weighted moving average in 1/100's
FairnessCacheInvalSec - Maximum age of the fairness cache
GoodnessMetric - Goodness metric for evaluating migration decisions
GoodnessPerStar - Maximum goodness in 1/1000 required for a 1-star recommendation
IdleTax - Idle tax percentage
IgnoreAffinityRulesForMaintenance - Ignore affinity rules for datastore maintenance mode
IgnoreDownTimeLessThan - Ignore down time less than this value in seconds
IoLoadBalancingAlwaysUseCurrent - Always use current stats for IO load balancing
IoLoadBalancingMaxMovesPerHost - Maximum number of moves from or to a datastore per round
IoLoadBalancingMinHistSecs - Minimum number of seconds that should have passed before using current stats
IoLoadBalancingPercentile - IO Load balancing default percentile to use
LogVerbose - Turn on more verbose logging
MinGoodness - Minimum goodness in 1/1000 required for any balance recommendation; if <=0, min set to abs value; if >0, min set to lesser of option & value set proportionate to running VMs, hosts, & rebal resources
MinImbalance - Minimum cluster imbalance in 1/1000 required for any recommendations
MinStarsForMandMoves - Minimum star rating for mandatory recommendations
NumUnreservedSlots - Number of unreserved capacity slots to maintain
PowerOnFakeActiveCpuPct - Fake active CPU percentage to use for initial share allocation
PowerOnFakeActiveMemPct - Fake active memory percentage to use for initial share allocation
PowerPerformanceHistorySecs - unknown
PowerPerformancePercentileMultiplier - DPM: Set percentile for stable time for power performance
PowerPerformanceRatio - DPM: Set Power Performance ratio
PowerPerformanceVmDemandHistoryNumStdDev - DPM: Compute demand for history period as mean plus this many standard deviations, capped at maximum demand observed
RawCapDiffPercent - Percent by which RawCapacity values need to differ to be significant
RelocateThresh - Threshold in stars for relocation
RequireMinCapOnStrictHaAdmit - Make Vm power on depend on minimum capacity becoming powered on and on any recommendations triggered by spare Vms
ResourceChangeThresh - Minimum percent of resource setting change for a recommendation
SecondaryMetricWeight - Weight for secondary metric in overall metric
SecondaryMetricWeightMult - Weight multiplier for secondary metric in overall metric
SetBaseGoodnessForSpaceViolation - -1*Goodness value added for a move exceeding space threshold on destination
SetSpaceLoadToDatastoreUsedMB - If 0, set space load to sum of vmdk entitlements [default]; if 1, set space load to datastore used MB if higher
SpaceGrowthSecs - The length of time to consider in the space growth risk analysis. Should be an order of magnitude longer than the typical storage vmotion time.
UseDownTime - Enable/disable the use of downtime in cost benefit metric
UseIoSharesForEntitlement - Use vmdk IO shares for entitlement computation
UsePeakIOPSCapacity - Use peak IOPS as the capacity of a datastore
VmDemandHistorySecsHostOn - unknown
VmDemandHistorySecsSoftRules - Consider recent demand history over this period in making decisions to drop soft rules
VmMaxDownTime - Reject the moves if the predicted downTime will exceed the max (in secs) for non-FT VM
VmMaxDownTimeFT - Reject the moves if the predicted downTime will exceed the max (in secs) for FT VM
VmRelocationSecs - Amount of time it takes to relocate a VM

As you can see, the advanced/hidden options in the above table can potentially apply to DRS, DPM and SDRS; I have not personally tested all of the settings. There might be some interesting and possibly useful settings; one such setting is the SDRS option IgnoreAffinityRulesForMaintenance, which ignores the affinity rules for datastore maintenance mode. To configure SDRS Advanced Options, you will need to navigate over to the "Datastore" view, edit a Storage Pod under "SDRS Automation" and select "Advanced Options".
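If you prefer to set the SDRS option programmatically rather than through the client, here is a hedged pyVmomi sketch. The storage pod name "Pod01" is a placeholder, and the value of 1 to enable IgnoreAffinityRulesForMaintenance is an assumption on my part, so verify it in your own environment before relying on it.

from pyVmomi import vim
from pyVim.task import WaitForTask

# Sketch: set the SDRS advanced option IgnoreAffinityRulesForMaintenance on a
# datastore cluster (storage pod). "si" is an existing ServiceInstance
# connection and "Pod01" is a placeholder name; the value of '1' to enable the
# option is an assumption.
view = si.content.viewManager.CreateContainerView(
    si.content.rootFolder, [vim.StoragePod], True)
pod = next(p for p in view.view if p.name == 'Pod01')

pod_spec = vim.storageDrs.PodConfigSpec(
    option=[vim.option.OptionValue(key='IgnoreAffinityRulesForMaintenance',
                                   value='1')])
spec = vim.storageDrs.ConfigSpec(podConfigSpec=pod_spec)

WaitForTask(si.content.storageResourceManager.ConfigureStorageDrsForPod_Task(
    pod=pod, spec=spec, modify=True))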

Categories // Uncategorized Tags // ESXi 5.0, fdm, ha, SDRS, vSphere 5.0

There's a new mob in town, FDM MOB for ESXi 5

07.15.2011 by William Lam // 1 Comment

That's right, vSphere is not the only one with a MOB; the new FDM (Fault Domain Manager) feature also includes a MOB view on an ESXi 5.0 host that is part of an FDM/HA enabled cluster. I originally noticed this new URL while parsing through the system logs of an ESXi host to get a better understanding of the startup process and found this little nugget. This page contains private APIs that are currently not exposed for public consumption with respect to the FDM service, so please use at your own risk.

To access the FDM MOB, you will need to point your browser to the following URL:

https://[esxi5_hostname]/mobfdm

Here is a screenshot of the main summary page:

On the summary page, you have some basic information about the particular host in question. One interesting property is "clusterState", which will show whether the host is a master or slave node; this can be useful in troubleshooting if you do not have access to vCenter.

There are two interesting methods that can provide some useful information: RetrieveClusterInfo and RetrieveHostList, which should be pretty self-explanatory in what they do.

To view the RetrieveClusterInfo method, you will need to point your browser to the following URL:

https://[esxi5_hostname]/mobfdm/?moid=fdmService&method=retrieveClusterInfo

As you can see from the screenshot, it provides a summary for the particular ESXi host within the FDM cluster, including the masterID. This ID will be useful when we call the other method to identify the master node in the FDM cluster.

To view the RetrieveHostList method, you will need to point your browser to the following URL:

https://[esxi5_hostname]/mobfdm/?moid=fdmService&method=retrieveHostList

This method extracts all hosts from the FDM cluster and provides quite a bit of information about each host, including the hostname and the hostID. You can now translate the masterID found in the previous method to identify the master node of the FDM cluster.
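If you want to pull these MOB pages outside of a browser, here is a small Python sketch using the requests module. It assumes the FDM MOB accepts HTTP basic authentication with the host's root credentials (which may not hold in every environment) and simply retrieves the two method pages shown above; the hostname and password are placeholders.

import requests
from requests.auth import HTTPBasicAuth

# Sketch: fetch the FDM MOB method pages shown above. The hostname and
# password are placeholders, and basic authentication against the MOB is an
# assumption that may not hold in every environment.
host = 'esxi5_hostname'
auth = HTTPBasicAuth('root', 'passwd')

for method in ('retrieveClusterInfo', 'retrieveHostList'):
    url = 'https://%s/mobfdm/?moid=fdmService&method=%s' % (host, method)
    resp = requests.get(url, auth=auth, verify=False)
    print(method, resp.status_code, len(resp.text), 'bytes of HTML')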

When you login to the FDM MOB for an ESXi host that is a master node in the cluster, the page will look slightly different with even more details including all slave nodes and protected VMs within the cluster.

As you can see this can be a useful tool for quickly identifying the master and slave nodes within an FDM cluster without going to your vCenter Server.

You can also get this information within the ESXi Shell: there is a hostlist file in XML format, located at /etc/opt/vmware/fdm/hostlist, in which you can view the same information found in the RetrieveClusterInfo method.

~ # cat /etc/opt/vmware/fdm/hostlist
[XML hostlist output: master host ID host-70, the vCenter UUID, and a per-host entry (host ID, hostname, SSL thumbprint, management IP, NIC MAC addresses and heartbeat datastore paths) for esxi50-2.primp-industries.com and esxi50-1.primp-industries.com]

You can also get the details of RetrieveHostList, with cleaner output from the FDM host, using the following script: /opt/vmware/fdm/fdm/prettyPrint.sh. The script can accept three different arguments: hostlist, clusterconfig and compatlist.

Here is a screenshot of the hostlist:

Here is a screenshot of the clusterconfig:

Here is a screenshot of the compatlist:

Categories // Uncategorized Tags // ESXi 5.0, fdm, fdmmob, mob, vSphere 5.0
