By now, you have probably heard about or have been directly impacted by the recent CrowdStrike software update to Microsoft Windows systems, which caused an unprecedented global outage. I know IT administrators are working around the clock to remediate thousands, if not tens of thousands, of Windows systems. The current recommended remediation process from CrowdStrike is definitely painful, since it requires users to go into Windows Safe Mode to remove the offending file. To further complicate things, most organizations enable Microsoft BitLocker, which adds an additional step to the already painful manual remediation process, as you now have to locate your recovery keys before you can log in to apply the fix.
Within hours of the CrowdStrike news, I already saw a number of inquiries from our customers and field asking if there were any automated solutions or scripts that could aid in their remediation, as asking any organization to remediate manually is a non-starter given the scale of deployments for most enterprises. While getting up to speed on the remediation steps and thinking about how our vSphere platform can help users automate what is typically a manual task, I had a few ideas that folks might find useful.
Disclaimer: The scripts provided in this article are meant as examples. Please test and adapt them to your own environment, as they have not been tested in any official capacity and their behavior may vary from environment to environment. Use at your own risk.
The vSphere platform has a very powerful, though still not very well known, capability that allows users to automate sending keystrokes to a VM, which does not require VMware Tools or even a running guest operating system.
To demonstrate what an automated solution for remediating the CrowdStrike issue could look like, I have created a prototype PowerCLI script called crowdstrike-example-prototype-remediation-script.ps1, which depends on the Set-VMKeystrokes function. My setup has a Windows Server 2019 VM running with BitLocker enabled, and I have "simulated" the CrowdStrike directory and configuration file that should be removed as part of the remediation process. Instead of booting into Safe Mode, which is a bit more complex, I decided to boot into the Windows Server 2019 installer and go into repair mode, which allows me to apply the same remediation workflow.
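For those curious what the keystroke automation looks like under the hood, below is a minimal sketch (not the full prototype script) of the general pattern of driving the VM console with the Set-VMKeystrokes function. The directory and file pattern are the publicly documented CrowdStrike remediation targets, but the drive letter inside the recovery environment, the exact prompts, and the sleep values are assumptions you will need to validate for your own environment.

# Minimal illustration only: assumes the Set-VMKeystrokes function has already been
# loaded (e.g. dot-sourced) and the VM console is sitting at a command prompt inside
# the repair environment
$vmName = "CrowdStrike-VM"

# Commands typed into the VM console; the Windows volume may not be C: inside
# the recovery environment, so verify the drive letter first
$commands = @(
    'cd C:\Windows\System32\drivers\CrowdStrike',
    'del C-00000291*.sys',
    'exit'
)

foreach ($command in $commands) {
    # Type the command and press Enter, then give the guest time to process it
    Set-VMKeystrokes -VMName $vmName -StringInput $command -ReturnCarriage $true
    Start-Sleep -Seconds 5
}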
Below is a video that demonstrates the automation and the steps happening between the PowerCLI script and the VM console; there were no manual interactions:
Note: Depending on your environment and scale, you may need to adjust the various sleep values, and this should be done in a test or development environment before rolling out in a staged fashion 🙂
Thanks to Pedro who confirmed the script works as expected 😄
Thakyou so much, i allready tested and it works like a charm
— Pedro Plata (@pedroplatal) July 20, 2024
Alternatively, I also had a customer call yesterday with a customer who was automating the remediation for their organization and took a slightly different approach. They ended up creating a custom WinPE ISO containing a script that removes the offending CrowdStrike file. All they had to do was mount the ISO and change the boot order from hard disk to CD-ROM, and the ISO would then automatically handle the remediation rather than relying on a Safe Mode type of environment, which I thought was quite clever!
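For reference, the cleanup logic baked into such a WinPE image could be as simple as the sketch below. This is an illustration only: it assumes PowerShell is available in the WinPE build (a startnet.cmd batch equivalent would work the same way), it uses the publicly documented channel file pattern, and it does not handle BitLocker-protected volumes, which would first need to be unlocked (e.g. with manage-bde and the recovery key) before the file can be removed.

# Hypothetical cleanup script embedded in the custom WinPE ISO
# Inside WinPE the Windows volume is often not C:, so scan all available drives
foreach ($drive in (Get-PSDrive -PSProvider FileSystem)) {
    $csPath = Join-Path $drive.Root 'Windows\System32\drivers\CrowdStrike'
    if (Test-Path $csPath) {
        # Remove the offending channel file(s) per the CrowdStrike guidance
        Remove-Item -Path (Join-Path $csPath 'C-00000291*.sys') -Force -ErrorAction SilentlyContinue
    }
}

# Reboot back into the installed OS once the file has been removed
wpeutil reboot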
In any case, here is an example PowerCLI snippet that reconfigures the VM (it supports the VM being powered off) to mount the desired ISO from a vSphere datastore and updates the boot order so that it will automatically boot from the ISO rather than the hard disk.
$vmName = "CrowdStrike-VM" $isoPath = "[nfs-datastore-synology] ISO/en_windows_server_version_1903_x64_dvd_58ddff4b.iso" $primaryDisk = "Hard disk 1" $vm = Get-VM $vmName $vmDevices = $vm.ExtensionData.Config.Hardware.Device $cdromDevice = $vmDevices | where {$_.getType().name -eq "VirtualCdrom"} $bootDevice = $vmDevices | where {$_.getType().name -eq "VirtualDisk" -and $_.DeviceInfo.Label -eq $primaryDisk} # Configure Boot Order to boot from ISO and then Hard Disk $cdromBootDevice = New-Object VMware.Vim.VirtualMachineBootOptionsBootableCdromDevice $diskBootDevice = New-Object VMware.Vim.VirtualMachineBootOptionsBootableDiskDevice $diskBootDevice.DeviceKey = $bootDevice.key $bootOptions = New-Object VMware.Vim.VirtualMachineBootOptions $bootOptions.bootOrder = @($cdromBootDevice,$diskBootDevice) # Mount ISO from vSphere Datastore $cdromBacking = New-Object VMware.Vim.VirtualCdromIsoBackingInfo $cdromBacking.FileName = $isoPath $deviceChange = New-Object VMware.Vim.VirtualDeviceConfigSpec $deviceChange.operation = "edit" $deviceChange.device = $cdromDevice $deviceChange.device.Backing = $cdromBacking $deviceChange.device.Connectable.StartConnected = $true $deviceChange.device.Connectable.Connected = $true $spec = New-Object VMware.Vim.VirtualMachineConfigSpec $spec.deviceChange = @($deviceChange) $spec.bootOptions = $bootOptions $task = $vm.ExtensionData.ReconfigVM_Task($spec) $task1 = Get-Task -Id ("Task-$($task.value)") $task1 | Wait-Task | Out-Null
To confirm the changes were applied successfully, you can use the vSphere UI or use the following PowerCLI snippet:
$vm = Get-VM $vmName
$vm.ExtensionData.Config.BootOptions | Select BootOrder
$vm | Get-CDDrive | Select IsoPath
The only thing left is to simply power on the VM, and once the remediation has completed, you can reverse the operation by un-mounting the ISO and removing the boot order configuration, which reverts the VM to its original boot behavior.
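Here is a sketch of that reversal, reusing the $vmName variable from the earlier snippet. It disconnects the ISO and clears the explicit boot order, which should return the VM to its default boot behavior; as with everything else in this article, verify it in a test environment first.

$vm = Get-VM $vmName

# Un-mount the ISO by switching the CD-ROM device back to no media
$vm | Get-CDDrive | Set-CDDrive -NoMedia -Confirm:$false | Out-Null

# Clear the explicit boot order so the VM reverts to its default boot behavior
$spec = New-Object VMware.Vim.VirtualMachineConfigSpec
$spec.bootOptions = New-Object VMware.Vim.VirtualMachineBootOptions
$spec.bootOptions.bootOrder = @()
$vm.ExtensionData.ReconfigVM($spec)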
If folks have other suggestions or workarounds that they have found useful or have implemented to help remediate the CrowdStrike issue at scale, feel free to share them, as these are purely potential techniques that can help.
Imy says
This is great, but we get a minimum of three different scenarios with our servers on boot-up!
William Lam says
Yea, Windows certainly doesn’t make it easy. A more advanced workflow would be to use the vSphere screenshot API and run OCR on the output to determine which of the screens you’re on and handle these scenarios, along with dealing with the random sleeps … lots of possibilities, but I’ve not personally seen what’s been observed, so again … the platform can do a lot but does require some additional customization
SemoTech says
Another eye opener to stop using Windows for anything mission critical!
James says
No, an eye opener to stop deploying things without proper testing, and to have contingency plans for mission-critical machines. Falcon "kernel panicked" Debian instances some months ago; it just happened that this time it was on Windows, which is used in way more machines.
It boggles my mind the number of companies that don't have recovery scenarios in the event of critical machines not booting up (for whatever reason), I hope things change.
viletasteofdeath says
More like an eye opener to stop abdicating change control to cloud applications/providers. If CrowdStrike had provided a content update for on-prem IT to test in Dev, then Prod would have been unaffected.
Disasm says
What about a script to crawl the Datastores looking for the file to delete that doesn't rely on the VMs?
viletasteofdeath says
It is difficult because the CrowdStrike files are within VMDKs; they are not discrete files in the datastore
William Lam says
While this is accurate, if you’re not using BitLocker, you can certainly use a "bastion" Windows or even Linux VM to mount the affected Windows VM’s VMDK and remove the offending file without having to boot up the original impacted VM. That’s another method some customers have shared that has worked for them, and you can certainly scale it with automation using the vSphere API
Dillon says
William,
We did not come across this link in time to use it; however, we did use your "Automating VM keystrokes using the vSphere API & PowerCLI" post to develop our own automation. This enabled us to get over 20,000 devices back up and running with minimal impact to customers. We cannot thank you enough for the work you do and the blogs you post. Thank you!!!
-Dillon