As some of you may have heard, there is currently a known issue with NFS based datastores (includes VSA NFS datastores) after upgrading to vSphere 5.5 Update 1. The issue causes NFS datastores to disconnect and go into an APD (All Paths Down) state. VMware is currently aware of the problem and you can follow KB 2076392 for the latest updates.
While going through my Twitter stream this morning, I noticed an interesting question from fellow Blogger and friend Jase McCarty who asked the following:
I was quite surprised to hear that there were no vCenter Alarms being triggered for this issue. I decided to take a look at the KB to better understand the symptoms and see if there was anything I could do to help. From what I can tell, the only way to identify this particular problem is by looking at the logs which the KB has an example of what you would see.
Once I took a look at the logs, I knew there was at least two methods in which one could get alerts. One option would be to leverage vCenter Log Insight and create a query based on the particular string but no every customer is using Log Insight and it does require a bit of setup. The second more obvious option for me would be to key off of the VMkernel VOBs that are being generated which I have written about in the past for detecting duplicate IP Addresses for ESXi and VSAN component threshold count.
Here are the steps to create vCenter Alarm:
Step 1 - Create a new vCenter Alarm and give it a name. Select "Hosts" for Monitor and "Specific event occurring ..." for Monitor for
Step 2 - For the Trigger, you will add the following VOB entries (just copy/paste them in)
- esx.problem.storage.apd.start
- esx.problem.vmfs.nfs.server.disconnect
- esx.problem.storage.apd.timeout
Note: The alarm will activate if ANY of the VOBs are seen since it is an OR statement. It would have been nice to be able to group these together to generate the alarm
Once the alarm has been created, you will at least have a way to get notified if you are potentially affected by this problem. I would still highly recommend you subscribe to KB 2076392 for all the latest updates.