UPDATE 07/13/2012: vSphere 5.0 Update 1a has just been released and resolves this issue, so this script is no longer required. Please take a look here for more details on the patch.
Duncan Epping recently wrote an article about Clarifying the SvMotion / VDS problem in which he describes the scenario that would impact your VMs as well as a way to remediate those impacted VMs. I would recommend you go through Duncan's article before moving any further.
The challenge now is how to easily identify all VMs in your environment that are currently impacted by this problem. The answer is, of course, automation and leveraging the vSphere API! I created the following vSphere SDK for Perl script called querySvMotionVDSIssue.pl, which searches for all VMs that are connected to a VDS and checks whether each VM's expected dvPortgroup port file exists in the appropriate datastore. To use the script, you just need a system with the vCLI installed, or you can use the vMA appliance.
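To illustrate the kind of check described above, here is a minimal Python sketch. It is not the Perl script itself and makes no vSphere API calls. The `DvsNic` type, the function names, and the `.dvsData/<switch UUID>/<port key>` path layout are assumptions made for illustration. The adapter list and the datastore file listing are supplied by the caller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DvsNic:
    """One VM network adapter backed by a distributed switch port (hypothetical type)."""
    vm_name: str
    label: str        # e.g. "Network adapter 1"
    datastore: str    # datastore holding the VM's configuration files
    switch_uuid: str  # UUID of the distributed switch
    port_key: str     # dvport the adapter is connected to


def expected_port_file(nic):
    # Assumed layout: [datastore] .dvsData/<switch UUID>/<port key>
    return f"[{nic.datastore}] .dvsData/{nic.switch_uuid}/{nic.port_key}"


def find_impacted(nics, existing_files):
    """Return (vm_name, label, missing_path) for every adapter whose port file is absent."""
    existing = set(existing_files)
    impacted = []
    for nic in nics:
        path = expected_port_file(nic)
        if path not in existing:
            impacted.append((nic.vm_name, nic.label, path))
    return impacted
```

Only adapters whose expected file is missing are returned, which mirrors the script's behavior of listing impacted VMs only.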
UPDATE: The script has been updated to support remediation for VMs connected to either a VMware VDS or a Cisco N1KV. The solution, thanks to one of our internal engineers, is to "move" the VM's dvport to another dvport within the same dvPortgroup, which forces the creation of the .dvsdb port file. Once the move has completed successfully, the script moves the VM back to the dvport it originally resided on. We no longer have to rely on creating a temporary dvPortgroup and, best of all, we can now remediate both VDS and N1KV. The script now combines "query" and "remediation" into a single script. Please take a look at the usage examples below.
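The move-and-move-back idea can be sketched as a small Python simulation. This is only a model of the bookkeeping, not the script's Perl code and not a vSphere API call. The `port_owner` mapping and the `remediate_adapter` name are hypothetical.

```python
def remediate_adapter(port_owner, adapter, current_port):
    """Move `adapter` to a free dvport in the same dvPortgroup, then move it back.

    port_owner maps port key -> adapter id, or None when the port is free.
    Returns the list of (from_port, to_port) moves performed.
    """
    if port_owner.get(current_port) != adapter:
        raise ValueError(f"{adapter} is not connected to port {current_port}")

    free_ports = [port for port, owner in port_owner.items() if owner is None]
    if not free_ports:
        raise RuntimeError("no free dvport available in this dvPortgroup")
    temp_port = free_ports[0]

    moves = []
    # First hop goes to the temporary port, second hop restores the original port.
    for src, dst in ((current_port, temp_port), (temp_port, current_port)):
        port_owner[src] = None
        port_owner[dst] = adapter
        moves.append((src, dst))
    return moves
```

After the call, the adapter is back on the port it started on, so the dvPortgroup ends up in the same state as before the remediation.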
Disclaimer: This script is not officially supported by VMware. Please test it in a development environment before using it on production systems.
Here is a sample output of the script running in "query" mode:
Only impacted VMs are listed in the output. Remediation is now built into the query script: if you wish to remediate ALL VMs that were listed as impacted, specify the --fix flag with the value "true".
Here is a sample output of the script running in "remediation" mode:
In the screenshot above, you may notice a few interesting details with VM3 and VM4. If a dvPortgroup runs out of dvports, the script will automatically increase the number of ports to satisfy the swap (up to a maximum of 10, due to the number of ethernet interfaces a VM can have). Once the VM has been remediated, the dvPortgroup is reconfigured back to its originally configured number of ports, as shown with VM3.
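A rough Python sketch of this grow-then-restore behavior follows. The exact arithmetic the script uses is not shown in this post, so the shortfall calculation, the dictionary-based portgroup, and all names here are assumptions for illustration.

```python
from contextlib import contextmanager

MAX_EXTRA_PORTS = 10  # cap from the post: a VM can have at most 10 ethernet interfaces


def extra_ports_needed(free_ports, adapters_to_swap):
    """Number of dvports to add so each adapter has a free swap target (assumed logic)."""
    shortfall = max(0, adapters_to_swap - free_ports)
    return min(shortfall, MAX_EXTRA_PORTS)


@contextmanager
def temporarily_grown(portgroup, free_ports, adapters_to_swap):
    """Grow portgroup['num_ports'] for the duration of the swap, then restore it."""
    original = portgroup["num_ports"]
    portgroup["num_ports"] = original + extra_ports_needed(free_ports, adapters_to_swap)
    try:
        yield portgroup
    finally:
        # Always put the configured port count back, even if the swap fails.
        portgroup["num_ports"] = original
```

Wrapping the swap in a context manager keeps the restore step from being skipped when remediation of an adapter raises an error.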
If you have an impacted VM that is connected to an ephemeral dvPortgroup, the script will not be able to remediate it due to the nature of how ephemeral binding works. You will get a message for the specific interface, and you will need to remediate manually using the steps outlined by Duncan or using the "old" remediation script, which creates a temporary dvPortgroup (again, this only works for a VMware VDS).
If you run into any issues or have questions, feel free to leave a comment.