Okay, maybe I am being a bit dramatic 😉 However, this is what crosses my mind every time I hear someone trying to query for something in the VCDB (vCenter Server Database) instead of using the vSphere API. Every so often I see a customer request asking for a specific SQL Query to find something in the VCDB. It just baffles me on why someone would want to go through such pain, not to mention this is not supported nor a recommended best practice.
Disclaimer: No kittens were harmed during the writing of this article
There are many disadvantages by going directly to the VCDB versus going through the vSphere API:
- There are no guarantees on backwards compatibility of the database schema or database views from release to release
- The data is in a very raw form, there is no abstraction which is usually provided by an API to make consumption easier
- Risk of crafting queries that could negatively impact the performance of vCenter Server
- Not officially supported by VMware, unless directed by VMware GSS
- Pulling hair out while trying to understand the internal relationships of various objects, especially with a complex system like vCenter Server
Although the vSphere data is eventually persisted in the VCDB, the main purpose of the vSphere API, like many other software APIs is to provide a well defined interface for interacting with the underlying platform. I think there are still many vSphere/System Administrators who fear the word "API". Instead of being afraid of APIs, we should all embrace it and start to better understand how they work as it is just a way of interacting with the underlying system.
I can speak from my own personal experience, the first task for me out of college and into the real world was to look into building an application impact assessment report when an vSphere HA event occurred. We knew which Virtual Machines were restarted, but we did not have a good idea of what applications were affected. I had to correlate the impacted Virtual Machines with a list of applications and application owner from our homegrown CMDB which ran on MS SQL Server. At the time, I did not know any better and I thought best way of getting this information was to go directly to the VCDB (at the time this was Virtual Center 2.5). I found myself creating super complex SQL queries and trying to reverse engineer the relationships between all the different tables to get basic information from vCenter Server. If I knew what I know now back then, I would have gone straight to learning the VMware APIs, as I could have gotten everything I needed in just a couple of lines of code. After completing the project among few others which I leverage the VCDB directly, I realized that I needed to learn this VMware API. This early lesson has greatly helped me in my career to further automate IT/VMware infrastructure and help us move towards this new world of a Software-Defined Datacenter.
The diagram below provides an overview of some of the ways the vSphere API can be consumed by customers. An example is the vSphere Web Client and the legacy vSphere C# Client, this is a UI interface that customers can use to interact with the vSphere platform which in turns uses the vSphere API. There are also a variety of CLIs and scripting/programming languages that are built for easily extracting information and performing operations against vSphere that could be as simple as one-liner. The vSphere API can also be consumed by 3rd party companies to further abstract and provide new and interesting ways of interacting with vSphere, a great example of this is the CloudPhysics Card Builder platform that makes creating custom vSphere reports a breeze.
Note: The diagram above is just an example of some of the SDK/CLI/UIs that VMware provides to our customers. This is not meant as an exhaustive list and you can find all SDKs/CLIs for vSphere here.
Hopefully with this information, customers will better understand the benefits of using the vSphere API versus going directly to the VCDB.
Andreas Diemer says
Thx a lot - I pray this since years - many kittens had to die meanwhile 😉
Johan says
William, I feel bad as we're an org that make use of our SQL DBA's to gather monthly reporting data from VCDB. I would like to change but have little knowledge when it comes to using the API. I can get thing going with PowerCli and get great assistance from the PowerCli Community.
Is there such a support community when it comes to say "pyvmomi" or is that purely up to an individual with some python skills?
Thanks for the great work.
William Lam says
Hi Johan,
There's still time to save those kittens! 🙂 Since pyvmomi is released as a Fling, there is not a dedicated VMTN Forum, but if you can always leave questions on Github project page https://github.com/vmware/pyvmomi or general API questions here https://developercenter.vmware.com/forums?id=sdk_vsphere_management
The only difference between this SDK and the rest is language specific, once you understand how the API works then it's all the same
Jimmy Gauvin says
PROS of direct sql query of VCDB:
-Getting information from Vcenter is a LOT slower than getting directly from VCDB
-Getting it with PowerCLI can take hours even with optimized query
William Lam says
When you say "slower" is this through the use of PowerCLI? The reason I ask is often times customers use PowerCLI to perform reporting and don't realize their queries are not optimized and even in the case that they are, it may not always be the most efficient as PowerCLI provides an abstraction against the vSphere API to provide ease of use. As with anything, you will always be giving something up for ease of use. You can get quite efficient when leveraging the API directly and I've seen customers including internal solutions that can query several thousand objects and get real time updates without issues. It's all about how it's done, so though PowerCLI may "seem slower" compared to the VCDB, it's not the only way. The point of this article is that for any data queries, you should be leveraging the API as that is the "contract" VMware provides for a consistent/documented way of getting the data.
Chris says
Suppose a person doesn't know how to access the vCenter API (where is this information?) or just wants to return a small amount of data from CLI. For example I want to automate system monitoring and don't want to have to manually open a GUI to get the information I need.
Thanks
William Lam says
Take a look http://www.virtuallyghetto.com/2015/01/list-of-vmware-clis-sdks-devops-tools.html and http://developercenter.vmware.com/home as a starting point. Lots of resources online, it all just depends on what you're trying to do. If you're looking monitoring solutions, you may also want to take a look at vRealize Operations Manager
Chris says
Thank you. At first glance, it doesn't make much sense to me, but I will see if I can get the information I need from one of those links on the page.
Chris Nakagaki says
Apptio kills kittens every day... and I helped it do it... I'm ashamed.
Lewis B says
Should I be worried that the .Net API is left out of your diagram? I assume it has a future!
William Lam says
Hi Lewis,
That's a typo on my end, I think I was trying to make the boxes "even" and I sort of assumed vSphere SDK for .NET under "PowerCLI" which though it does use, are two separate things. I'll append a note so others are aware but .NET SDK is healthy and has a bright future. Sorry about that
V. J. says
No kitten dies accessing VCDB without the VM SDKs - they died, because the data model is not fully documented.
William Lam says
It's apparent you didn't actually read the article or else you would not be making that statement. If you prefer to do things the painful way, more power to you 🙂
Sonny Singh says
Thanks, this article could not have come at a better time. I was just having conversation with my colleague about this because he is working with our DBAs to build a reporting application by quering VCDB directly. He wants monthly report going to managers showing how many resources their VMs are consuming (based on owner identified in custom annotation field of VM and looking up their manager in CMDB). Can this type of report be built using Cloud Physics or another tool?