Since the early days of Virtual Center and ESX, the only method for creating and sharing arbitrary metadata between the vSphere Management layer and the guest operating system was to use either guest variables (guestinfo) or the OVF runtime environment.
While both of these capabilities have enabled a ton of interesting use cases and have even inspired a number of creative solutions, they certainly have their challenges and nuances from an end user experience perspective.
For example, whether a guest variable persisted depended solely on when it was applied to a Virtual Machine and the power state the VM was in at the time. This inconsistent behavior can be very frustrating for end users to discover for the first time. The lack of security and access control in both guest variables and the OVF runtime environment also means the metadata could easily be overwritten or removed by users in either the vSphere Management layer or the guest operating system, making these mechanisms challenging to scale for larger organizations.
This is why I am excited for vSphere 8 and the new vSphere Dataset feature!
Use Cases
Here are some of the use cases that can benefit from vSphere Datasets:
- Arbitrary metadata for a Virtual Machine (e.g. System Owner, Application Owner, Location, etc.)
- Coordinating Application Workflow/Installation signals to or from the vSphere Management layer
- Application Development and build system (e.g. dynamic application configuration and build artifacts)
- Configuration management system and tools to publish installed applications and OS details
With the new capabilities of vSphere Datasets, I am pretty sure our customers will find plenty more use cases that this solution can now address.
Requirements
- vCenter Server and ESXi must be running vSphere 8.0
- Virtual Machine must be configured with VM Compatibility Version 20
- VMware Tools 11.3 or greater
vSphere Datasets
So, what are vSphere Datasets? This is currently a vSphere API-only capability that provides a facility to share data, as a collection of key/value pairs, between the vSphere Management layer and the guest operating system. The data should be relatively small and change infrequently.
What are the benefits of vSphere Datasets over the previous solutions?
- Better Security Model
  - Improved vSphere API and access management
  - Privileged access from guest operating system
- Support for Large Scale
  - Large number of vSphere Datasets per VM
  - Up to 100MB in capacity
- Easy User Experience
  - Dataset-entry hierarchy
  - vSphere REST API for management
  - Guest operating system commands
- Data Persistence
  - Persist data across power cycles
  - Optionally omit or include data for snapshot / clone operations
How do vSphere Datasets work? Using the vSphere REST API, you first create a dataset, which acts as a container for the actual data, stored as dataset entries (key/value pairs). A dataset includes basic information such as the name and description, but it also includes access control policies for both the vSphere Management layer and the guest operating system, each of which can be NONE, READ_ONLY or READ_WRITE. Lastly, you can also specify whether a given dataset will be included as part of a VM clone or snapshot operation.
Once a dataset has been created, dataset entries can be added from the vSphere Management layer, the guest operating system, or both, as determined by the access control policies configured for that dataset. One huge improvement over the previous solutions is that a given VM can have multiple datasets with different access control policies for different use cases, which makes this an extremely flexible and powerful capability.
Let's take a look at a few concrete examples:
In the example below, entries in this dataset can be read/created/updated/deleted by the vSphere Management layer, but the guest will have no access:
Property | Value |
---|---|
Name | admin-ds |
Host Access | READ_WRITE |
Guest Access | NONE |
In the example below, entries in this dataset can be read/created/updated/deleted by the vSphere Management layer, and the guest will have read-only access:
Property | Value |
---|---|
Name | shared-admin-ds |
Host Access | READ_WRITE |
Guest Access | READ_ONLY |
In the example below, entries in this dataset can be read/created/updated/deleted by the guest, and the vSphere Management layer will have read-only access:
Property | Value |
---|---|
Name | shared-user-ds |
Host Access | READ_ONLY |
Guest Access | READ_WRITE |
In the example below, entries in this dataset can be read/created/updated/deleted by the guest, but the vSphere Management layer will have no access:
Property | Value |
---|---|
Name | user-ds |
Host Access | NONE |
Guest Access | READ_WRITE |
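The four combinations above can be summarised as a small access matrix. The following is only an illustrative Python sketch and not part of any VMware SDK: the `Access` enum, the `Dataset` class and the `writers` helper are hypothetical names, and only the dataset names and the NONE / READ_ONLY / READ_WRITE values come from the examples above.

```python
from dataclasses import dataclass
from enum import Enum


class Access(Enum):
    """Access levels named in the article: NONE, READ_ONLY, READ_WRITE."""

    NONE = "NONE"
    READ_ONLY = "READ_ONLY"
    READ_WRITE = "READ_WRITE"

    def can_read(self) -> bool:
        # Both READ_ONLY and READ_WRITE allow reading entries.
        return self is not Access.NONE

    def can_write(self) -> bool:
        # Only READ_WRITE allows creating, updating or deleting entries.
        return self is Access.READ_WRITE


@dataclass(frozen=True)
class Dataset:
    """Hypothetical in-memory model of a dataset's access policy."""

    name: str
    host_access: Access
    guest_access: Access


# The four example datasets from the tables above.
EXAMPLES = [
    Dataset("admin-ds", Access.READ_WRITE, Access.NONE),
    Dataset("shared-admin-ds", Access.READ_WRITE, Access.READ_ONLY),
    Dataset("shared-user-ds", Access.READ_ONLY, Access.READ_WRITE),
    Dataset("user-ds", Access.NONE, Access.READ_WRITE),
]


def writers(ds: Dataset) -> list:
    """Return which sides ("host", "guest") may modify entries."""
    sides = []
    if ds.host_access.can_write():
        sides.append("host")
    if ds.guest_access.can_write():
        sides.append("guest")
    return sides


if __name__ == "__main__":
    for ds in EXAMPLES:
        print(ds.name, "writable by", writers(ds))
```

Modelling the policy as two independent fields mirrors the tables: each side gets its own access level, so a dataset can be private to one side or shared in either direction.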
Here is a quick overview of the different vSphere Management and Guest APIs to manage both vSphere Datasets and vSphere Dataset Entries. The vSphere Management API is available through the existing vCenter Server REST API and the Guest APIs are available through the guest operating system via the vmtoolsd command-line utility.
vSphere REST API for vSphere Datasets & Entries
Once vSphere 8 is generally available (GA), you will be able to find the complete REST API documentation, which will be broken into two sections: datasets and dataset entries. The vSphere Dataset REST API is very straightforward and can be consumed using any REST-based client. Since a large majority of VMware customers already leverage PowerCLI for automation purposes, I have created a PowerCLI Community Module called VMware.Community.Dataset which uses the CIS Server cmdlets to interact with the vSphere Dataset REST APIs. The module includes the following functions:
- New-VMDataset
- Get-VMDataset
- Remove-VMDataset
- New-VMDatasetEntry
- Get-VMDatasetEntry
- Remove-VMDatasetEntry
Step 1 - Install the VMware.Community.Dataset module using the following command:
Install-Module VMware.Community.Datasets
Step 2 - Next, connect to the CIS Server endpoint which will be the IP Address/FQDN of your vCenter Server using:
Connect-CisServer -Server 192.168.30.213 -User *protected email* -Password VMware1!
Step 3 - Import the VMware.Community.Dataset module and you are now ready to start automating vSphere Datasets
Import-Module VMware.Community.Dataset
Here is an example creating several datasets using the New-VMDataset function with different access control policies, based on the concrete examples described earlier.
$vm_moref = "vm-26"

$adminDataSetParam = @{
    Name = "admin-ds";
    Description = "Dataset for Admins";
    VMMoref = $vm_moref;
    GuestAccess = "NONE";
    HostAccess = "READ_WRITE";
    OmitFromSnapshotClone = $false;
}
New-VMDataset @adminDataSetParam

$sharedDataSet1Param = @{
    Name = "shared-admin-ds";
    Description = "Dataset for Admins and RO for Users";
    VMMoref = $vm_moref;
    GuestAccess = "READ_ONLY";
    HostAccess = "READ_WRITE";
    OmitFromSnapshotClone = $false;
}
New-VMDataset @sharedDataSet1Param

$sharedDataSet2Param = @{
    Name = "shared-user-ds";
    Description = "Dataset for Users and RO for Admins";
    VMMoref = $vm_moref;
    GuestAccess = "READ_WRITE";
    HostAccess = "READ_ONLY";
    OmitFromSnapshotClone = $false;
}
New-VMDataset @sharedDataSet2Param

$userDataSetParam = @{
    Name = "user-ds";
    Description = "Dataset for Users";
    VMMoref = $vm_moref;
    GuestAccess = "READ_WRITE";
    HostAccess = "NONE";
    OmitFromSnapshotClone = $false;
}
New-VMDataset @userDataSetParam
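For readers who would rather call the REST API directly than go through the PowerCLI module, the same creation request can be sketched as a path plus a JSON body. This is a hedged sketch only: the `/api/vcenter/vm/{vm}/data-sets` path and the body field names (`host`, `guest`, `omit_from_snapshot_and_clone`) are assumptions about the vSphere 8 REST API that should be checked against the official API reference, and `build_create_request` is a hypothetical helper that builds the request without sending it.

```python
import json

# ASSUMPTION: the path and field names below are assumed for illustration;
# verify them against the official vSphere REST API reference before use.
DATASETS_PATH = "/api/vcenter/vm/{vm}/data-sets"
VALID_ACCESS = {"NONE", "READ_ONLY", "READ_WRITE"}


def build_create_request(vm_moref, name, description, host_access,
                         guest_access, omit_from_snapshot_clone=False):
    """Return (path, body) for a hypothetical dataset-creation POST."""
    # Reject anything other than the three access values from the article.
    for label, value in (("host", host_access), ("guest", guest_access)):
        if value not in VALID_ACCESS:
            raise ValueError(
                f"{label} access must be one of {sorted(VALID_ACCESS)}"
            )
    body = {
        "name": name,
        "description": description,
        "host": host_access,
        "guest": guest_access,
        "omit_from_snapshot_and_clone": omit_from_snapshot_clone,
    }
    return DATASETS_PATH.format(vm=vm_moref), body


if __name__ == "__main__":
    path, body = build_create_request(
        "vm-26", "admin-ds", "Dataset for Admins", "READ_WRITE", "NONE"
    )
    print("POST", path)
    print(json.dumps(body, indent=2))
```

Keeping request construction separate from the HTTP call makes it easy to inspect the payload first and then hand it to whichever REST client you already use.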
To list all datasets, use the Get-VMDataset function. To retrieve the configuration for a specific dataset, use the same Get-VMDataset function and specify the name of the dataset.
Here is an example creating several dataset entries for different datasets using the New-VMDatasetEntry function.
$adminDataSetEntry1Param = @{
    Name = "Location";
    VMMoref = "vm-26";
    Dataset = "admin-ds";
    Value = "Palo Alto";
}
New-VMDatasetEntry @adminDataSetEntry1Param

$adminDataSetEntry2Param = @{
    Name = "Building";
    VMMoref = "vm-26";
    Dataset = "admin-ds";
    Value = "Promontory E";
}
New-VMDatasetEntry @adminDataSetEntry2Param

$sharedDataSetEntry1Param = @{
    Name = "AppID";
    VMMoref = "vm-26";
    Dataset = "shared-admin-ds";
    Value = "app-1234";
}
New-VMDatasetEntry @sharedDataSetEntry1Param

$sharedDataSetEntry2Param = @{
    Name = "SystemOwner";
    VMMoref = "vm-26";
    Dataset = "shared-admin-ds";
    Value = "William Lam";
}
New-VMDatasetEntry @sharedDataSetEntry2Param
To list all dataset entries for a specific dataset, use the Get-VMDatasetEntry function. The same Get-VMDatasetEntry function can also retrieve the value for a specific dataset entry.
To delete a specific dataset entry, you can use the Remove-VMDatasetEntry function. To delete a dataset, all of its dataset entries must first be removed, and then you can use the Remove-VMDataset function.
If you attempt to access a dataset that you do not have access to, you will get an unauthorized error.
Guest API for vSphere Datasets & Entries
vSphere Datasets and their entries can also be accessed from within the guest operating system using the vmtoolsd utility. Depending on the access control policies, you may or may not have the permissions to list and/or manipulate individual datasets.
To list all configured datasets, you can run the following command:
vmtoolsd --cmd 'datasets-list' | python -m json.tool
Note: The python command is optional and is only used to nicely format the JSON output for readability
To view the configuration for a given dataset, you can run the following command:
vmtoolsd --cmd 'datasets-query {"dataset":"shared-admin-ds"}' | python -m json.tool
With more complex dataset commands, using --cmd may not be ideal. vmtoolsd provides another parameter called --cmdfile, which accepts a file containing the command (the text between the single quotes) and processes that instead. Below is the previous command, this time read from a file:
# cat ds-command
datasets-query {"dataset":"shared-admin-ds"}
# vmtoolsd --cmdfile=ds-command | python -m json.tool
{
    "result": true,
    "info": {
        "name": "shared-admin-ds",
        "description": "Dataset for Admins and RO for Users",
        "used": 35,
        "hostAccess": "READ_WRITE",
        "guestAccess": "READ_ONLY",
        "omitFromSnapshotAndClone": false
    }
}
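If a script inside the guest needs to act on this output rather than just display it, the JSON can be parsed directly. The sketch below is a minimal example that works on the response shown above; `parse_dataset_info` and `guest_can_write` are hypothetical helper names, and treating `"result": false` as the failure signal is an assumption, since the article only shows a successful response.

```python
import json

# The datasets-query response shown above, pasted here as sample input.
SAMPLE_RESPONSE = """
{
    "result": true,
    "info": {
        "name": "shared-admin-ds",
        "description": "Dataset for Admins and RO for Users",
        "used": 35,
        "hostAccess": "READ_WRITE",
        "guestAccess": "READ_ONLY",
        "omitFromSnapshotAndClone": false
    }
}
"""


def parse_dataset_info(raw: str) -> dict:
    """Return the "info" object from a datasets-query response."""
    doc = json.loads(raw)
    # ASSUMPTION: a failed call is reported as "result": false.
    if not doc.get("result"):
        raise RuntimeError("datasets-query did not succeed: " + raw.strip())
    return doc["info"]


def guest_can_write(info: dict) -> bool:
    """True when the guest is allowed to create or change entries."""
    return info["guestAccess"] == "READ_WRITE"


if __name__ == "__main__":
    info = parse_dataset_info(SAMPLE_RESPONSE)
    print(info["name"], "guest writable:", guest_can_write(info))
```

In practice the raw string would come from the output of vmtoolsd, for example captured with `subprocess.run`, rather than from a constant.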
Note: The size limit for JSON requests is 64KB and for responses it is 1MB
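When building requests programmatically, it can be handy to check the payload size before sending it. The following is a small sketch, with two assumptions: that 64KB means 64 * 1024 bytes, and that the limit is measured on the UTF-8 encoded JSON payload (the article does not say whether the command name counts towards it). `request_size` and `request_fits` are hypothetical helper names.

```python
import json

# Limits taken from the note above; 1KB is assumed to be 1024 bytes.
MAX_REQUEST_BYTES = 64 * 1024
MAX_RESPONSE_BYTES = 1024 * 1024


def request_size(payload: dict) -> int:
    """Size in bytes of the payload once serialised as UTF-8 JSON."""
    return len(json.dumps(payload).encode("utf-8"))


def request_fits(payload: dict) -> bool:
    """True when the serialised payload is within the request limit."""
    return request_size(payload) <= MAX_REQUEST_BYTES


if __name__ == "__main__":
    small = {"dataset": "user-ds",
             "entries": [{"key": "AppRetry", "value": "88"}]}
    print(request_size(small), "bytes, fits:", request_fits(small))
```

A payload that does not fit could be split into several smaller set-entry requests, which also lines up with the guidance that dataset data should be relatively small.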
To list entries for a given dataset, you can run the following command:
vmtoolsd --cmd 'datasets-list-keys {"dataset":"shared-admin-ds"}'
To view a specific entry from a dataset, you can run the following command:
vmtoolsd --cmd 'datasets-get-entry {"keys": ["SystemOwner"], "dataset":"shared-admin-ds"}' | python -m json.tool
If you recall from earlier, when creating our datasets, we had two datasets (shared-user-ds and user-ds) where the vSphere Management layer does not have permissions to create entries. Let's now take a look at dataset entry management using the vmtoolsd utility.
To create/update one or more entries for a given dataset, you can run the following command:
vmtoolsd --cmd 'datasets-set-entry {"dataset":"user-ds", "entries": [{"key": "AppConfigPath", "value": "/opt/vmware/mycustomapp/config.json"}, {"key": "AppRetry", "value": "88"}]}' | python -m json.tool
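Hand-writing the JSON for many entries gets error-prone quickly, so here is a hedged sketch of generating the same command from a Python dictionary. `build_set_entry_argv` is a hypothetical helper; it only assembles the argument list and does not run vmtoolsd. Values are converted to strings because the example above passes "88" as a string; whether non-string values are accepted is not something the article states.

```python
import json
import shlex
from typing import Dict, List


def build_set_entry_argv(dataset: str, entries: Dict[str, object]) -> List[str]:
    """Build the vmtoolsd argument list for a datasets-set-entry call."""
    payload = {
        "dataset": dataset,
        # One {"key": ..., "value": ...} object per entry, values as strings.
        "entries": [{"key": k, "value": str(v)} for k, v in entries.items()],
    }
    return ["vmtoolsd", "--cmd", "datasets-set-entry " + json.dumps(payload)]


if __name__ == "__main__":
    argv = build_set_entry_argv(
        "user-ds",
        {"AppConfigPath": "/opt/vmware/mycustomapp/config.json",
         "AppRetry": 88},
    )
    # shlex.join shows a copy-pasteable, correctly quoted shell command.
    print(shlex.join(argv))
```

Passing the list to `subprocess.run(argv)` avoids shell quoting issues entirely, since the JSON travels as a single argument.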
To view one or more specific entries from a dataset, we can run the following commands:
# vmtoolsd --cmd 'datasets-get-entry {"keys": ["AppConfigPath"], "dataset":"user-ds" }' | python -m json.tool
{
    "result": true,
    "entries": [
        {
            "AppConfigPath": "/opt/vmware/mycustomapp/config.json"
        }
    ]
}
# vmtoolsd --cmd 'datasets-get-entry {"keys": ["AppConfigPath", "AppRetry"], "dataset":"user-ds" }' | python -m json.tool
{
    "result": true,
    "entries": [
        {
            "AppConfigPath": "/opt/vmware/mycustomapp/config.json"
        },
        {
            "AppRetry": "88"
        }
    ]
}
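The response returns each entry as its own single-key object, which is slightly awkward to consume. A minimal sketch of flattening it into one dictionary is shown below; `entries_to_dict` is a hypothetical helper, and the failure check again assumes that errors are reported as `"result": false`.

```python
import json

# The two-key datasets-get-entry response shown above, as sample input.
SAMPLE_RESPONSE = """
{
    "result": true,
    "entries": [
        {"AppConfigPath": "/opt/vmware/mycustomapp/config.json"},
        {"AppRetry": "88"}
    ]
}
"""


def entries_to_dict(raw: str) -> dict:
    """Merge the list of single-key entry objects into one dict."""
    doc = json.loads(raw)
    # ASSUMPTION: a failed call is reported as "result": false.
    if not doc.get("result"):
        raise RuntimeError("datasets-get-entry did not succeed")
    merged = {}
    for entry in doc.get("entries", []):
        merged.update(entry)
    return merged


if __name__ == "__main__":
    config = entries_to_dict(SAMPLE_RESPONSE)
    # Values come back as strings, so convert where a number is expected.
    print(config["AppConfigPath"], int(config["AppRetry"]))
```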
To delete one or more specific entries from a dataset, we can run the following command:
vmtoolsd --cmd 'datasets-delete-entry {"keys":["AppConfigPath","AppRetry"], "dataset": "user-ds"}'
I think vSphere Datasets will open up a ton of new possibilities, whether that is for our customers, partners or even second party solutions from VMware. I cannot wait to hear how you and your organization will leverage the powerful new vSphere Dataset feature!