vSphere Snapshots are an amazing technology that has enabled our customers to do so many things, from application lifecycle management and testing to operating system updates and many other use cases. Like any technology, if it is misused, the benefits can quickly turn into a challenge or even a nightmare.
As Peter Parker once said, "With great power comes great responsibility," which I think is one way to summarize VM snapshot usage 😆
I am pretty sure that every VI Admin out there has at least one story about vSphere snapshots gone wrong. Because snapshots are so convenient and easy to use, and because how vSphere snapshots work is sometimes misunderstood, they can lead to a number of issues, including filling up your storage and impacting other running workloads.
Now, imagine if you could implement a snapshot retention policy for your VM(s) based on the size of a given snapshot or the number of days it has existed. Wouldn't that be cool!?
This was actually an idea I had been thinking about, and after a discussion with my colleague Michael Gasch, I realized we could easily do this with a bit of event-driven automation, using our VMware Event Broker Appliance (VEBA) solution to run a scheduled job that manages snapshot policies for a set of VM(s). Knative, the underlying backend technology that powers VEBA, has an event source called PingSource that can help with this type of use case.
Just as VEBA can subscribe to an event from vCenter Server or VMware Horizon and then run a function, a PingSource produces an event with a fixed payload on a cron schedule, which can then trigger a function or other code. Using this capability, I was able to build a basic PowerCLI function that manages VM snapshot retention based on the age and/or the size of a given snapshot. The function can be deployed to monitor a set of VM(s) and can be configured with different policies and/or schedules, which is a really cool capability of Knative when you need to run scheduled functions.
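To make the age and/or size idea a bit more concrete, here is a minimal Python sketch of just the policy evaluation step. It is not the PowerCLI function from the VEBA example. It assumes a hypothetical fixed JSON payload (the field names `vmNames`, `maxDays` and `maxSizeGB` are made up for illustration) like the one a PingSource could deliver on each cron tick, plus a hypothetical `Snapshot` record standing in for whatever the vSphere API actually returns. Either limit being exceeded marks a snapshot for removal; actually retrieving and deleting snapshots is left out.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional

# Hypothetical fixed payload a PingSource could deliver on every cron tick.
# Field names are illustrative assumptions, not the VEBA example's schema.
PAYLOAD = '{"vmNames": ["web-01", "db-01"], "maxDays": 3, "maxSizeGB": 10}'


@dataclass
class Snapshot:
    """Hypothetical stand-in for snapshot data retrieved from vSphere."""
    vm: str
    name: str
    created: datetime  # timezone-aware creation time
    size_gb: float


@dataclass
class Policy:
    vm_names: List[str]
    max_days: Optional[int] = None       # None disables the age check
    max_size_gb: Optional[float] = None  # None disables the size check


def parse_policy(payload: str) -> Policy:
    """Turn the fixed JSON payload into a Policy object."""
    data = json.loads(payload)
    return Policy(
        vm_names=data["vmNames"],
        max_days=data.get("maxDays"),
        max_size_gb=data.get("maxSizeGB"),
    )


def snapshots_to_remove(snapshots: List[Snapshot], policy: Policy,
                        now: datetime) -> List[Snapshot]:
    """Return snapshots on monitored VMs that exceed the age or size limit."""
    expired = []
    for snap in snapshots:
        if snap.vm not in policy.vm_names:
            continue  # VM is not covered by this policy
        too_old = (policy.max_days is not None
                   and now - snap.created > timedelta(days=policy.max_days))
        too_big = (policy.max_size_gb is not None
                   and snap.size_gb > policy.max_size_gb)
        if too_old or too_big:
            expired.append(snap)
    return expired


# Example data: a fixed "now" keeps the evaluation deterministic.
now = datetime(2021, 6, 10, tzinfo=timezone.utc)
snaps = [
    Snapshot("web-01", "pre-patch", datetime(2021, 6, 1, tzinfo=timezone.utc), 2.0),  # 9 days old
    Snapshot("db-01", "test", datetime(2021, 6, 9, tzinfo=timezone.utc), 25.0),       # 25 GB
    Snapshot("db-01", "fresh", datetime(2021, 6, 9, tzinfo=timezone.utc), 1.0),       # within policy
    Snapshot("app-01", "old", datetime(2021, 1, 1, tzinfo=timezone.utc), 50.0),       # VM not monitored
]
policy = parse_policy(PAYLOAD)
for s in snapshots_to_remove(snaps, policy, now):
    print(s.vm, s.name)
```

Leaving `maxDays` or `maxSizeGB` out of the payload (or setting it to null) disables that check, which covers the "and/or" case; a policy that should only fire when both limits are exceeded would swap `or` for `and`.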
For more details on how to deploy and set this up in your own environment, check out the example function in the VEBA GitHub repository. This is provided as an example; you can certainly create a more advanced function that evaluates other criteria before deleting a given snapshot, or that sends a Slack or Microsoft Teams notification prior to snapshot removal.
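For the notification idea, the following Python sketch shows one way a function could summarise the snapshots it is about to remove and build a POST request for a Slack incoming webhook, which accepts a JSON body with a `text` field. The webhook URL is a placeholder, the `(vm, snapshot)` pairs are hypothetical, and the request is only built, not sent. Other services such as Microsoft Teams may expect a different payload format.

```python
import json
import urllib.request
from typing import List, Tuple

# Placeholder URL (hypothetical); replace with a real Slack incoming-webhook URL.
WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"


def build_message(pending: List[Tuple[str, str]]) -> str:
    """Summarise (vm, snapshot) pairs that are about to be removed."""
    lines = [f"- {vm}: {snap}" for vm, snap in pending]
    return "Snapshots scheduled for removal:\n" + "\n".join(lines)


def build_request(url: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST with a Slack-style {"text": ...} body."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


pending = [("web-01", "pre-patch"), ("db-01", "test")]
req = build_request(WEBHOOK_URL, build_message(pending))
print(req.get_method(), req.full_url)
# Sending would be urllib.request.urlopen(req, timeout=10); only after a
# successful notification would the function go on to delete the snapshots.
```

Building the request separately from sending it keeps the notification logic easy to test without a live webhook.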