UPDATE 07/13/2012 - vSphere 5.0 Update 1a has just been released and resolves this issue. Please take a look here for more details on the patch, as this script is no longer required.
In my previous article Identifying & Fixing Virtual Machines Affected By SvMotion / VDS Issue, I provided a script for users to easily identify the impacted VMs as well as a way to remediate them. However, the issue was only temporarily fixed, as any of the remediated VMs can be re-impacted if they are Storage vMotioned again (manually or automatically by Storage DRS). This meant that users would have to re-run these scripts every so often to ensure their environment is not affected by this problem.
I decided to look into a more automated and hands-off approach in which a Storage vMotion of a VM automatically triggers the execution of the remediation script. I was able to accomplish this by leveraging vCenter Alarms and running a script on the vCenter Server (here's a cool thing I did with alarms a while back).
Disclaimer: This script is not officially supported by VMware. Please test it in a development environment before using it on production systems.
You can create the alarm at any level of the inventory hierarchy, but I would recommend placing it at least at the datacenter or cluster level. The alarm type will be for a VirtualMachine and we will use "monitor for specific events". For the trigger, we will need to use "VM migrated" and set the status to "Unset", which will not create an alarm icon when it is triggered.
You might wonder why we selected "VM migrated" rather than "VM relocated". This is because a Storage vMotion starts out just like a vMotion, and whether you manually perform a vMotion or a Storage vMotion, only this event type is triggered. Because this single event is triggered by two completely different operations, it has an interesting impact, which we will discuss in a bit.
Next we need to create an action for this alarm, which will run a command. You will need to specify the full path to perl.exe (assuming you're using my script, which is based on the vSphere SDK for Perl and requires vCLI to be installed on the vCenter Server) as well as the path to the alarm script, which in this example is called alarm.pl. Also ensure you set the green->yellow action to execute once.
You will need to create the alarm.pl script on your vCenter Server and here is what it looks like:
#!/usr/bin/perl -w
# William Lam
# http://www.virtuallyghetto.com/

use strict;
use warnings;

my $scriptlocation = "C:\\querySvMotionVDSIssue.pl";
my $server = "localhost";
my $username = "VC-USERNAME";
my $password = "VC-PASSWORD";
my $debug = 0;

###########################
# DO NOT MODIFY PAST HERE #
###########################

my $start1 = "from";
my $start2 = "to";
my $end = ",";

# extract VMware env variables from alarm
my $eventstring = $ENV{'VMWARE_ALARM_EVENTDESCRIPTION'};
my $vmname = $ENV{'VMWARE_ALARM_EVENT_VM'};
my @sourcehost = $eventstring =~ /$start1 (.*?)$end/;
my @destinationhost = $eventstring =~ /$start2 (.*?)$end/;

# Output environmental variables to see what's up
if($debug) {
    open(FILE,">C:\\output.txt");
    foreach my $key (keys %ENV) {
        print FILE $key . "=" . $ENV{$key} . "\n";
    }
    close(FILE);
}

# if the source/destination host is the same, means we had a Storage vMotion instead of vMotion
# and we execute the remediation script on the VM
if($sourcehost[0] eq $destinationhost[0]) {
    `"$scriptlocation --server $server --username $username --password $password --vmname $vmname --fix true"`;
}
You will need to fill in the script location (in this example I have all scripts stored in C:\) and you will also need to populate the credentials that will be used to execute the script.
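The remediation call in alarm.pl is assembled as one quoted string, which can be fragile if a VM name contains spaces. As a rough sketch of one possible alternative, the command could be built as a list of arguments instead. The option names below are the ones alarm.pl already passes; the helper name build_fix_command, the explicit interpreter path and the sample VM name are assumptions made for illustration, not part of the original script.

```perl
use strict;
use warnings;

# Build the remediation command as a list of arguments.
# The option names come from the call in alarm.pl; the helper name
# and the explicit use of the perl interpreter path are assumptions.
sub build_fix_command {
    my (%opt) = @_;
    return (
        $opt{perl}, $opt{script},
        '--server',   $opt{server},
        '--username', $opt{username},
        '--password', $opt{password},
        '--vmname',   $opt{vmname},
        '--fix',      'true',
    );
}

my @cmd = build_fix_command(
    perl     => $^X,                              # perl running this script
    script   => 'C:\\querySvMotionVDSIssue.pl',
    server   => 'localhost',
    username => 'VC-USERNAME',
    password => 'VC-PASSWORD',
    vmname   => 'My Test VM',                     # hypothetical VM name with spaces
);

# system(@cmd) would run the command from this list; it is only
# printed here so the sketch has no side effects.
print join(' | ', @cmd), "\n";
```

Passing a list to system() lets Perl handle each argument separately rather than relying on hand-written quoting, though I have not tested this variant with the vCenter alarm action, so treat it as a starting point.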
Earlier we mentioned that both a Storage vMotion and a vMotion trigger the same event, and because of that, we need to be able to identify when a Storage vMotion actually happens before running the script. The alarm.pl script above will be executed when the alarm is triggered. Using the VMware-specific environment variables that are populated by the vCenter alarm, we can parse the event description to figure out whether it was a vMotion or a Storage vMotion. Once we confirm it is a Storage vMotion, we execute the remediation script from my previous article.
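To make the host comparison easier to follow, here is a small standalone sketch of the same idea that can be run outside of vCenter. It is not the exact code from alarm.pl: the function names, the added word-boundary in the regex, the undef handling and the sample event description are all assumptions for illustration, and the real text in VMWARE_ALARM_EVENTDESCRIPTION may be worded differently in your environment.

```perl
use strict;
use warnings;

# Return the text between "<keyword> " and the next comma,
# or undef when the keyword is not found in the description.
sub extract_host {
    my ($description, $keyword) = @_;
    return undef unless defined $description;
    my ($host) = $description =~ /\b\Q$keyword\E (.*?),/;
    return $host;
}

# Return 1 when the source and destination host are the same
# (treated as a Storage vMotion), 0 otherwise or when parsing fails.
sub is_storage_vmotion {
    my ($description) = @_;
    my $source      = extract_host($description, 'from');
    my $destination = extract_host($description, 'to');
    return 0 unless defined $source && defined $destination;
    return $source eq $destination ? 1 : 0;
}

# Hypothetical event description, for illustration only.
my $sample = 'Migration of virtual machine vm01 from esx01.example.com, '
           . 'datastore1 to esx01.example.com, datastore2 completed';

print is_storage_vmotion($sample) ? "Storage vMotion\n" : "vMotion\n";
```

Returning 0 when either host cannot be parsed means the remediation script is simply not run for events whose description does not match the expected pattern.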
Note: Ensure you download the latest version of querySvMotionVDSIssue.pl from the previous article, as it has been updated to handle single-VM remediation, which is what this use case needs.
Now to verify that our alarm is functioning as expected, we can perform a manual Storage vMotion of a VM. We should see alarm.pl execute, and after the Storage vMotion has completed, we should see some VM reconfiguration tasks, which come from our remediation script.
So there you have it: you no longer have to worry about running the script every so often to ensure your VMs are not being impacted by the SvMotion / VDS problem. Again, I would like to stress that although we are able to automate this remediation, this is not a real solution, and VMware is actively working on a fix for this problem.
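If the reconfiguration tasks do not show up, it helps to capture what the remediation script printed when the alarm ran it. Below is a minimal sketch of one way to do that; the helper name run_and_log and the log file path in the usage comment are hypothetical, and the command itself is deliberately not written to the log because it contains the vCenter password.

```perl
use strict;
use warnings;

# Run a command, capture stdout and stderr together, and append the
# result to a log file along with PATH and the exit status.
# Returns the exit status (-1 if the command could not be started).
sub run_and_log {
    my ($cmd, $logfile) = @_;
    my $output = `$cmd 2>&1`;
    my $status = $? == -1 ? -1 : $? >> 8;
    $output = '' unless defined $output;

    open(my $fh, '>>', $logfile) or die "Cannot open $logfile: $!";
    # PATH is logged because the alarm action may run with a different
    # environment than an interactive command prompt.
    print $fh "PATH: ", (defined $ENV{PATH} ? $ENV{PATH} : ''), "\n";
    print $fh "exit status: $status\n";
    print $fh $output, "\n";
    close($fh);

    return $status;
}

# Example usage (hypothetical log location):
# run_and_log($command_string, "C:\\alarm-output.txt");
```

Comparing the logged PATH with the one from an interactive command prompt is a quick way to spot a missing vCLI entry.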
If you have any questions, feel free to leave a comment.
Ralf says
Thanks for your script and alarm settings. The querySvMotionVDSIssue.pl script runs fine when started from the CLI. I also see that the alarm was triggered and the script executed (Alarm 'svMotion - VDS Problem (RFG)' ran script "C:\Program Files (x86)\VMware\VMware vSphere CLI\Perl\bin\perl.exe" D:\Skripte\vmware_svmotion_alarm.pl), but querySvMotionVDSIssue.pl still shows the VM that was moved with svmotion as 'impacted'. I don't see much information in the output.txt file. Is there a way to get more information about what happens when the script runs?
William says
@Ralf,
You can redirect the output of the script to a file so you can see if there are any errors. Regarding the VM that's impacted by the issue, how many DvPorts are in its DvPortgroup and what type of binding does the DvPortgroup use?
Thanks
Ralf says
I've followed http://blogs.vmware.com/networking/2011/11/vds-best-practices-rack-server-deployment-with-eight-1-gigabit-adapters.html -> Design Option 2 – Dynamic configuration with NIOC and LBT. So I've one dvs with 8 uplink ports and 3 portgroups (Mgmt., vMotion, Guests) all with LBT load balancing and all uplinks are active.
Ralf says
I guess it must be a path problem. This is the output I captured.
Can't load 'C:/Program Files (x86)/VMware/VMware vSphere CLI/Perl/site/lib/auto/XML/LibXML/Common/Common.dll' for module XML::LibXML::Common: load_file:The specified module could not be found at C:/Program Files (x86)/VMware/VMware vSphere CLI/Perl/lib/DynaLoader.pm line 230.
at C:/Program Files (x86)/VMware/VMware vSphere CLI/Perl/site/lib/XML/LibXML.pm line 12
Compilation failed in require at C:/Program Files (x86)/VMware/VMware vSphere CLI/Perl/site/lib/XML/LibXML.pm line 12.
BEGIN failed--compilation aborted at C:/Program Files (x86)/VMware/VMware vSphere CLI/Perl/site/lib/XML/LibXML.pm line 12.
Compilation failed in require at C:/Program Files (x86)/VMware/VMware vSphere CLI/Perl/lib/VMware/VICommon.pm line 11.
BEGIN failed--compilation aborted at C:/Program Files (x86)/VMware/VMware vSphere CLI/Perl/lib/VMware/VICommon.pm line 11.
Compilation failed in require at C:/Program Files (x86)/VMware/VMware vSphere CLI/Perl/lib/VMware/VIRuntime.pm line 15.
Compilation failed in require at D:\Skripte\querySvMotionVDSIssue.pl line 34.
BEGIN failed--compilation aborted at D:\Skripte\querySvMotionVDSIssue.pl line 34.
Ralf says
I don't get it. perl is in my system path (C:\Program Files (x86)\VMware\VMware vSphere CLI\Perl\site\bin;C:\Program Files (x86)\VMware\VMware vSphere CLI\Perl\bin;) and querySvMotionVDSIssue.pl is working fine in a DOS window.
William says
When the system executes the script, it probably does not have vCLI in its default path. You can enable the debug param in the script to check the PATH variable, but I suspect it's not in your system path.
Ralf says
You were right. I didn't reboot after installing vCLI. Now after rebooting, the debug output shows the correct path and the alarm action works. I didn't think about the path because it was working OK in a DOS window. Thanks for the pointer.
William says
@Ralf,
NP. I'm glad it's working now.