As the adoption of vSphere Content Library continues to grow, I am seeing more questions from our field and customers around content distribution. In case you did not know, vSphere Content Library (CL, as I will be referring to it going forward) has its own built-in replication mechanism which allows customers to easily publish and subscribe to libraries, either within a single vCenter Server instance or between two completely different vCenter Servers (regardless of deployment topology and/or SSO Domain configuration).
Content distribution, or replication, is handled by CL, which is a service within the vCenter Server. If content is being replicated within a single vCenter Server and the ESXi hosts can communicate with each other, then a direct host-to-host transfer, also referred to as Network File Copy (NFC), is used rather than going through vCenter Server. When content is transferred between two vCenter Servers, the data travels through vCenter Server using standard HTTPS (443) by default. In the latter scenario, if you have configured Enhanced Linked Mode for your vCenter Servers, then NFC will be used unless the ESXi hosts cannot communicate with each other, in which case it will automatically fall back to the default HTTPS, which is pretty cool.
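To make the rules above easier to reason about, here is a minimal Python sketch of the decision as I have described it. This is only a model of the prose, not how CL is implemented: the TransferContext class, its field names and the two return values are all hypothetical names I made up for illustration, and the case of a single vCenter Server whose hosts cannot reach each other is my reading of "rather than going through vCenter Server".

```python
from dataclasses import dataclass

# Hypothetical labels for the two transfer paths described in the text.
NFC = "host-to-host NFC"
THROUGH_VCENTER = "through vCenter Server (HTTPS 443 by default)"


@dataclass(frozen=True)
class TransferContext:
    """Hypothetical inputs describing one publisher/subscriber pair."""
    same_vcenter: bool           # both libraries live in one vCenter Server
    enhanced_linked_mode: bool   # the two vCenter Servers are in Enhanced Linked Mode
    hosts_can_communicate: bool  # ESXi hosts on both sides can reach each other


def transfer_path(ctx: TransferContext) -> str:
    """Return the transfer path implied by the rules in the paragraph above."""
    nfc_eligible = ctx.same_vcenter or ctx.enhanced_linked_mode
    if nfc_eligible and ctx.hosts_can_communicate:
        return NFC
    # Assumption: every other case goes through vCenter Server, including
    # the automatic fallback when the hosts cannot reach each other.
    return THROUGH_VCENTER


if __name__ == "__main__":
    print(transfer_path(TransferContext(False, True, False)))
```

Running it with Enhanced Linked Mode configured but no host-to-host connectivity shows the fallback path being chosen.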
One thing that may not be very well known is that customers actually have a choice in how their CL content is replicated. In addition to native replication, which currently does not support incremental/delta updates (meaning all file transfers are full copies), CL can also support external replication. In fact, many customers today already have existing methods for efficiently replicating large amounts of data across multiple datacenters, whether that is replication built into their storage arrays, network appliances or some other means. These customers can still benefit from CL while continuing to take advantage of their existing methods of replication.
So how does it work? When you create a published library, there is an associated set of metadata that describes the content itself and its location within the underlying storage system. This metadata is stored internally within CL and is used to tell subscriber CLs what content to synchronize and make available in their respective vCenter Servers. This all happens transparently between a publisher and subscriber CL without any user involvement, as one would expect. If you just copied the underlying CL files without this additional metadata, then when you subscribe to the published CL, it will have no idea about these existing files and will simply download the content again.
To prevent this "double" copy for externally replicated CLs, there is an advanced library setting called persist_json_enabled, configurable only through the Content Library REST API, which persists and stores the metadata that we talked about earlier. With both the content and the metadata files available when the subscriber CL is created, we are effectively performing a zero copy of the data, since we already have the content and can make it available for use immediately. To demonstrate this and some other useful CL APIs, I have updated my Content Library PowerCLI Module to include some new functions to aid in setting up an externally replicated CL.
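For those who prefer to see the setting as data rather than as a cmdlet switch, here is a small Python sketch that builds the fragment of a library update spec which toggles persist_json_enabled. The only name taken from above is persist_json_enabled itself. Placing it under a publish_info key is my assumption about how the spec is shaped, and the helper name is hypothetical. How the fragment is wrapped, authenticated and sent to the API is deliberately left out, so check the API reference for your version before relying on it.

```python
import json


def persist_json_update_spec(enabled: bool) -> dict:
    """Build an update-spec fragment that toggles JSON persistence.

    Assumption: persist_json_enabled sits under publish_info in the
    library spec. Verify this against the Content Library API reference.
    """
    return {"publish_info": {"persist_json_enabled": bool(enabled)}}


if __name__ == "__main__":
    # Print the fragment as it would appear in a JSON request body.
    print(json.dumps(persist_json_update_spec(True), indent=2))
```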
Let's now make this more concrete by walking through an example. Below is a screenshot of VC1 (vcenter65-1), which has a published CL (VC1-ContentLibrary) stored on a datastore (iSCSI-01). I will configure it to support external replication so that I can have the exact same content residing on VC2 (vcenter65-3) with a subscribed CL (VC2-ContentLibrary), without having the CLs transfer any data between the two. You will need to have the latest PowerCLI release installed if you wish to make use of my CL PowerCLI module (the Content Library API can also be accessed through a variety of vSphere Automation SDKs).
As mentioned earlier, the persist_json_enabled property is only available when using the CL REST API, so I have enhanced my Get-ContentLibrary function to include a bunch more useful information including this property (JSONPersistence) as shown in the screenshot below.
If we now log in to an ESXi host which has access to the underlying storage of the CL, we can see the CL layout, which includes a unique ID for each item that is uploaded to the CL. If we go inside one of the directories, we can see the actual item files, as you would expect.
To enable the persistence of the JSON metadata file, you can either pass the -JSONPersistence $true option to the New-LocalContentLibrary function when creating a new CL, or update an existing CL by using the Set-ContentLibrary function. To do so, first log in to the CIS API endpoint by using the Connect-CisServer cmdlet and then run one of the following commands:
Here is an example of enabling the setting:
Set-ContentLibrary -LibraryName VC1-ContentLibrary -JSONPersistenceEnabled
Here is an example of disabling the setting:
Set-ContentLibrary -LibraryName VC1-ContentLibrary -JSONPersistenceDisabled
Note: Enabling/disabling JSON persistence merely stores or deletes the metadata files. It has no impact on CL usage and can be done while the CL is in use. It should also be noted that a CL configured with JSON persistence can continue to work with standard subscribed CLs; there is no impact to making the CL available through the traditional method, which is also really nice.
If we now take a look at our storage system again, we should see several JSON files that have now been created, which reflect the current CL metadata. These files will automatically be updated based on changes made within CL itself.
At this point, you are now ready to "replicate" your CL to your remote location. As mentioned earlier, this can be done through a variety of tools such as native array replication, or even something as simple as rsync or SCP, which is what I used for demonstration purposes. When duplicating the CL content directory, you can rename the top level directory from contentlib-[UUID] to anything you want, but make sure to leave all other directory and file names alone. Once you have completed replicating the CL, you can disconnect from the CIS API endpoint of your source vCenter Server and connect to your destination CIS API endpoint using the Connect-CisServer cmdlet again. You will also need to connect to the vCenter Server using the Connect-VIServer cmdlet; this is needed to perform an ID lookup of the datastore you wish to create the new CL on.
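Before creating the subscriber CL, it can be worth confirming that your copy actually carried the JSON metadata along with the content, since a replica without it leads straight back to the "double" copy problem. Below is a minimal Python sketch of such a check. It makes no assumption about what the JSON files are called and simply looks for any .json file under the replicated top level directory. The function names are hypothetical and this is not part of my PowerCLI module.

```python
from pathlib import Path
from typing import List


def find_json_metadata(library_root: str) -> List[str]:
    """Return the relative paths of all .json files under library_root."""
    root = Path(library_root)
    if not root.is_dir():
        raise FileNotFoundError(f"{library_root} is not a directory")
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*.json")
        if p.is_file()
    )


def has_json_metadata(library_root: str) -> bool:
    """True if at least one JSON metadata file was replicated."""
    return len(find_json_metadata(library_root)) > 0
```

You would point has_json_metadata at wherever the replicated directory is reachable from the machine running the check. If it returns False, re-enable JSON persistence on the source CL and replicate again before going further.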
In my environment, I have copied the content to another datastore (iSCSI-02), which you can see in the screenshot below. This datastore is managed by a different vCenter Server (vcenter65-3) than the publisher CL, and I have also renamed the top level directory of the replicated CL to myExtReplicatedContentLibrary. If you decide to rename the top level directory, please make a note of the new name, as you will need it when creating your new subscriber CL with the New-ExtReplicatedContentLibrary function:
New-ExtReplicatedContentLibrary -LibraryName VC2-ContentLibrary -DatastoreName iSCSI-02 -SubscribeLibraryName myExtReplicatedContentLibrary
The function is pretty straightforward: you simply provide the name of the new CL, the datastore to which the CL has been replicated and the directory name of the subscribed library that you replicated earlier. If everything was successful, you should now have a new CL that is subscribing to the replicated content that you copied over earlier. You can kind of think of it as a loopback mount, and no data is actually being sent across the wire between the two vCenter Servers. Pretty cool, huh!?
If we now run the Get-ContentLibrary function, we should see our new subscribed CL, and you will notice the subscription URL is actually a datastore reference rather than a regular URL, which is what we expect when consuming an externally replicated CL.
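Since a datastore-backed subscription URL starts with the "ds://" prefix, you can use that prefix to pick out externally replicated CLs from a list of libraries. Here is a minimal Python sketch of that check. The function name and the sample URLs are hypothetical placeholders, not real values from my environment.

```python
def is_externally_replicated(subscription_url: str) -> bool:
    """True if the subscription URL is a datastore reference (ds:// prefix)."""
    return subscription_url.strip().lower().startswith("ds://")


if __name__ == "__main__":
    # Hypothetical (name, subscription URL) pairs for illustration only.
    libraries = [
        ("VC2-ContentLibrary", "ds:///hypothetical/datastore/path"),
        ("Regular-Subscribed-CL", "https://vcenter.example.com/hypothetical/path"),
    ]
    for name, url in libraries:
        kind = "externally replicated" if is_externally_replicated(url) else "regular"
        print(f"{name}: {kind}")
```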
Note1: Currently there is no way to distinguish a regular subscribed CL from an externally replicated CL other than the subscription URL, which has a URI prefix of "ds://". This is only visible using either the CL REST API or the vSphere Web (Flex) Client, as the H5 Client does not currently display the subscription URL. Hopefully a future H5 update will include this useful bit of information about the source of the subscription.
Note2: When deleting an externally replicated CL using either the UI or API, the CL will be removed but the actual content on the filesystem will still persist. To delete the files, you will need to go to the datastore view and then delete the top level directory of the CL.
Note3: In case it was not apparent, when consuming an externally replicated CL, it is the customer's responsibility to ensure that both the content and the JSON metadata files are synchronized on some periodic schedule, so that the subscribed CLs will pick up any changes made to the source published CL.
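On the point in Note3, one simple way to spot a replica that has fallen behind is to compare the file listings of the source and replicated directories. The Python sketch below does this by relative path and file size only. It is a cheap heuristic rather than a real integrity check (a file that changed without changing size will be missed), and it assumes both directory trees are reachable from the machine running it, which will often not be true across datacenters. The function names are hypothetical.

```python
import os
from typing import Dict, List


def snapshot(root: str) -> Dict[str, int]:
    """Map each file's path relative to root to its size in bytes."""
    files = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            files[rel] = os.path.getsize(full)
    return files


def drift(source_root: str, replica_root: str) -> List[str]:
    """Return relative paths that are missing, extra or a different size on the replica."""
    src = snapshot(source_root)
    dst = snapshot(replica_root)
    return sorted(p for p in src.keys() | dst.keys() if src.get(p) != dst.get(p))
```

Because the comparison uses paths relative to each root, it still works if you renamed the top level directory on the replica side. An empty list from drift means the two trees match by name and size.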
- Content Library Technical Deep Dive @ VMworld
- The Content Library PowerCLI module also includes three other useful functions: Remove-SubscribedContentLibrary, Remove-LocalContentLibrary & Copy-ContentLibrary, which may be worth checking out for automation purposes
- Content Library Developer Blog Series