How to split vCenter Servers configured in an Enhanced Linked Mode (ELM)?

03.16.2017 by William Lam // 22 Comments

An interesting question that came up on the VMTN forum the other day (thanks to Andreas Peetz for sharing via Twitter) was how to split two vCenter Servers configured in Enhanced Linked Mode (ELM). Due to organizational changes in the customer's environment, they needed to separate their two vCenter Servers and run them independently of each other. Although this may sound like a rare event, I have actually seen this use case come up several times now. It may stem from a business unit restructuring, or from spinning out or selling off company assets, which then requires the customer to split their existing vCenter Servers that are configured with ELM.

Below is a diagram depicting an example. The original source environment (left) is composed of two vCenter Servers and two external Platform Services Controllers (PSCs) configured in ELM, and the desired destination environment (right) consists of two separate vCenter Server instances that are no longer configured in ELM.

The solution to this problem is actually pretty straightforward and leverages the existing vCenter Server and/or Platform Services Controller (PSC) "decommission" workflow. Rather than decommissioning the nodes, we simply keep them around. Below are the instructions on how to achieve this outcome.

UPDATE (05/31/22) - I was recently made aware of VMware KB article 2106736, which provides official guidance for splitting/unregistering your vCenter Server from ELM. This should be followed as the officially supported method.

UPDATE (01/28/19) - As of vSphere 6.7 Update 1, splitting an Enhanced Linked Mode (ELM) configuration is now supported by using the repointing workflow provided by the enhanced cmsso-util tool.

Disclaimer: Although this solution uses an existing supported workflow, this particular use case has not been tested by VMware. As such, this would not be officially supported by VMware until the appropriate testing has been done by our Engineering teams. One potential option in the short term if you are looking for support from VMware is to file an RPQ request through your VMware account team.

Prerequisite:

Environment running vSphere 6.0 or greater
Enhanced Linked Mode configured with external PSCs (i.e. not ELM between vCenter Servers with an embedded PSC)
SSH/RDP access and VM Console access to PSCs

Here is a screenshot of my vSphere 6.5 environment configured like the diagram above. By the end of this article, we will have walked through the process of splitting up our ELM configuration and will have two independent vCenter Server instances running while preserving their existing configurations.

Step 1 - Verify the existing environment to ensure there are no unknown PSCs or vCenter Servers attached to it that you may not be aware of. This could include decommissioned PSCs that were not properly removed. To do so, you will need to SSH into each of the PSCs and run the following commands.

You can use the updated dir-cli command to list all nodes (VC and PSCs) within an SSO Domain which is what an ELM is comprised of. Specify the SSO Administrator username as well as the password as shown in the example below:

/usr/lib/vmware-vmafd/bin/dir-cli nodes list --login '*protected email*' --password 'VMware1!' --server-name localhost

As you can see from the output, we are able to list all nodes (VC and PSC) along with their respective PSC replication partners. What we are looking for is to verify the environment and that PSCs are replicating with the expected systems before we proceed to the next step.

Note: In case you are not seeing the "nodes list" option (new in 6.5 if I recall correctly), you will need to use the vdcrepadmin utility instead.

/usr/lib/vmware-vmdir/bin/vdcrepadmin -f showpartners -h localhost -u administrator -w VMware1!

vdcrepadmin only outputs the replication partners of the PSC; it will not show the VC nodes. However, you can get the list of all VC nodes connected to the SSO Domain by simply logging in to the vSphere Web Client using any one of the VCs to see the rest (this is basically what ELM provides).
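To make the verification in Step 1 less error-prone, the partner list can be compared against the partners you expect. The following is only a minimal sketch: unexpected_partners is a hypothetical helper name, and it assumes that showpartners prints one partner per line in the form ldap://<fqdn>, so check the output in your own environment before relying on it.

```shell
# Sketch: report replication partners that are NOT in your expected list.
# Assumption: "vdcrepadmin -f showpartners" prints one "ldap://<fqdn>" per line.
# unexpected_partners is a hypothetical helper, not part of any VMware tool.
unexpected_partners() (
  expected="$1"   # space-separated list of partner FQDNs you expect to see
  # Read partner lines from stdin, strip the ldap:// prefix, skip blank lines.
  sed -e 's|^ldap://||' -e '/^[[:space:]]*$/d' | while IFS= read -r partner; do
    case " $expected " in
      *" $partner "*) ;;                 # known partner, nothing to report
      *) printf '%s\n' "$partner" ;;     # partner you did not expect
    esac
  done
)

# Example (run on psc-01; prints nothing when only psc-02 is a partner):
#   /usr/lib/vmware-vmdir/bin/vdcrepadmin -f showpartners -h localhost \
#     -u administrator -w 'VMware1!' | unexpected_partners "psc-02.primp-industries.com"
```

Any hostname this prints is a partner you did not account for and should be investigated before moving on to Step 2.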

Step 2 - Next, we need to prevent psc-01 and psc-02 from talking to each other before we break the ELM configuration. This can be done in a variety of ways, but the quickest method is to simply disconnect the vNIC temporarily on psc-02 (this only needs to be done on one node, so I chose the second). When a given PSC node is unreachable and we perform the "decommission" operation, it will automatically comply and allow us to remove the replication partner. As a verification step before moving on, ensure you can no longer ping psc-02 from psc-01.
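The ping verification at the end of Step 2 can be wrapped in a small guard so that you do not continue by accident while psc-02 is still reachable. This is a sketch only: confirm_isolated is a hypothetical helper name, and the -c and -W options assume the Linux iputils version of ping.

```shell
# Sketch: confirm a PSC is unreachable before unregistering it.
# confirm_isolated is a hypothetical helper; run it on psc-01.
confirm_isolated() {
  # -c 3: send three probes; -W 2: wait up to two seconds for each reply.
  if ping -c 3 -W 2 "$1" >/dev/null 2>&1; then
    echo "STOP: $1 still answers ping; disconnect its vNIC first"
    return 1
  fi
  echo "OK: $1 is unreachable; safe to continue"
}

# Example: confirm_isolated psc-02.primp-industries.com
```

A failed ping only shows that ICMP is not getting through, so it is a quick sanity check rather than proof that every port is blocked.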

Step 3 - Log in via SSH to the first PSC (psc-01) and use cmsso-util to decommission the second PSC (psc-02) by running the following command (replace with your SSO admin credentials):

cmsso-util unregister --node-pnid psc-02.primp-industries.com --username '*protected email*' --passwd 'VMware1!'

This operation must complete successfully before you proceed to the next step. At this point you will have broken replication between psc-01 and psc-02.

Step 4 - After decommissioning the PSC, we also need to decommission the other VC (vcenter65-2). We will use the exact same command, this time specifying the second VC as the node (replace with your SSO admin credentials):

cmsso-util unregister --node-pnid vcenter65-2.primp-industries.com --username '*protected email*' --passwd 'VMware1!'

At this point you have successfully completed the split of the first VC. You can log in to its vSphere Web Client to confirm everything was successful, as shown in the screenshot below.

Step 5 - Now, we need to perform a similar operation for the second PSC (psc-02). Ensure you do NOT re-enable the vNIC on the second PSC yet. Log in to the VM Console of psc-02, decommission the first PSC (psc-01), and then do the same for the first VC (vcenter65-1) by running the following two commands (replace with your SSO admin credentials):

cmsso-util unregister --node-pnid psc-01.primp-industries.com --username '*protected email*' --passwd 'VMware1!'
cmsso-util unregister --node-pnid vcenter65-1.primp-industries.com --username '*protected email*' --passwd 'VMware1!'

Step 6 - Once you have successfully completed Step 5, you can now re-enable the vNIC on psc-02.

At this point, you have now successfully split the second VC and you can confirm by logging into the second VC's vSphere Web Client as shown in the screenshot below.

The instructions above assume that you are using a vCenter Server Appliance (VCSA) but this can also be applied to a Windows-based vCenter Server and PSC. The paths for the respective utilities are as follows:

"%VMWARE_CIS_HOME%"\vmdird\vdcrepadmin
"%VMWARE_CIS_HOME%"\vmafdd\dir-cli
C:\Program Files\VMware\vCenter Server\bin\cmsso-util

Note: The techniques described above for breaking up two vCenter Servers configured using ELM should also work for n vCenter Servers. You just need to ensure that the PSCs cannot talk to each other while you decommission the other nodes.
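For more than two nodes it can help to write out, per PSC, which nodes have to be unregistered before touching anything. The sketch below only prints the commands and does not run them. print_unregister_plan and the SSO_USER variable are hypothetical names introduced for illustration, and the cmsso-util flags are the ones used in the steps above.

```shell
# Sketch: print (not run) the unregister commands for one side of the split.
# print_unregister_plan and SSO_USER are hypothetical names used for illustration.
print_unregister_plan() (
  local_psc="$1"; shift   # PSC you are logged in to; remaining args are nodes to remove
  echo "# Run on ${local_psc} while the PSCs cannot reach each other:"
  for node in "$@"; do
    # The password is left as a placeholder on purpose; fill it in by hand.
    printf "cmsso-util unregister --node-pnid %s --username '%s' --passwd '<password>'\n" \
      "$node" "${SSO_USER:-<sso-admin>}"
  done
)

# Example for the first side of the two-node environment in this article:
#   print_unregister_plan psc-01.primp-industries.com \
#     psc-02.primp-industries.com vcenter65-2.primp-industries.com
```

Reviewing the printed plan for each PSC before running anything makes it easier to spot a node that would otherwise be left registered on the wrong side.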

Comments

Mike O'Donnell says

03/16/2017 at 3:02 pm

Great article, and perfect timing for me. A couple of days ago I opened a support ticket with VMware asking about this exact issue: we have a single SSO domain, two sites, and a vCenter in each site. I wanted to basically break the SSO domain into two separate ones, with each vCenter on its own PSC/domain (exactly what you're showing in the first picture).

I was told by support today that it's not supported and that you need to install a new PSC/Domain.

I've seen references to "submitting this for an RFQ", what exactly is that? Is it asking VMware to support our specific configuration?

Kam says

04/17/2017 at 5:30 pm

I'm guessing this could also be used to divide a PSC installation that is close to, or will exceed, the maximum number of PSCs in a domain (8 for 6.0 and 10 for 6.5)?

It's something I'm facing as a combination of our scale and our design.

Richard Hughes says

06/26/2017 at 4:23 pm

Can the newly split PSC's use the same Active Directory?

- Kam says
  
  06/28/2017 at 4:01 pm
  
  I'm sure they can as it's just an authentication source. What's different between the SSO domains is the actual domain name, eg vsphere.local, vsphere2.local, vsphere3.local etc. At least, that's what I'm planning on doing.
  
_nd345 says

08/03/2017 at 12:19 pm

How could I unlink two VCSAs using embedded PSCs and continue to use only embedded PSCs? VMware informed me this is a deprecated and unsupported topology. Due to this, I wish to make my primary vCenter and my DR-site vCenter completely separate.

JR says

08/11/2017 at 12:18 pm

Can they be reconnected at some point? Case in point, in a vblock environment, where vce requires us to break apart the vcenters for RCM upgrade, a yearly task. After the upgrades, i'd like to reconnect them into linked mode.

- Steve says
  
  08/23/2017 at 7:39 pm
  
  I too would like to be able to rejoin a split-off PSC/VCS back into ELM.
  
Telmo says

10/05/2017 at 3:31 am

Could we do this split if we are talking about VCs with embedded PSC?

johannstander says

10/18/2017 at 11:01 am

Hi William, thanks for the great article, as always. Since you wrote this back in March, do you know if VMware now supports this use case or do we still have to file an RPQ request?

Rajesh says

10/27/2017 at 6:00 am

Hello,

I have a different question, we have 2 sites and 3 PSCs. PSC1 and PSC2 on Site1 and PSC3 on Site2. Site1 has 4 vCenters and Site2 has 2. But when we login to the web client the vCenters are not showing in a proper order, Is there any way to change the order of appearance ? We are looking forward to see site1 vCenters first in alphabetical order and then Site2 vCenter.

Gaurav Khanna says

02/01/2018 at 1:22 am

Hi William, thanks for this wonderful article. I have the same environment and need to upgrade to the latest release. Can you please let me know how I can rejoin after splitting linked mode?

_nd345 says

02/01/2018 at 5:35 am

don't expect an answer unless someone else obliges. my question still sits there as we are still running linked in a prod-env.

Nawal Singh says

03/22/2019 at 6:50 am

I deployed two vCenter 6.5 U2c instances. The first vCenter created the SSO domain and I joined the other vCenter to the same domain. However, I am unable to see the second VC if I log in through the first VC. If I log in from the second VC, I am able to see both vCenters. May I know what the issue is?

Note: Both vCenter deployed and configured with Embedded PSC.

Manu says

04/05/2019 at 12:40 am

Hello,

i have 2x external PSC Appliance 6.0 & 2x Windows vCenter Server 6.0 configured in ELM but no load balancer. The goal is to migrate both vCenter Servers to vCSA 6.7U1 with embedded PSC & embedded linked mode.

Running migration assistant on 1st vCenter Server throws this error:

"Cannot upgrade vCenter Server (Appliance) with an embedded Platform Services Controller, because Platform Services Controller is installed remotly on x.x.x.x."

Do i have to split vCenters / break ELM prior to migration ??

On 2nd vCenter Server Migration Assistant is running fine.

Regards
Manuel

João Pedro Figueiró Pavan says

05/06/2019 at 11:17 am

Hello,

I have two vCenters and one PSC, but now I need to remove the PSC and keep the vCenters separated (without SSO). How should I proceed? Could you help me?

Thank you!

_n345 says

07/11/2019 at 1:09 pm

anyone still having issues, this article helped me: https://techbrainblog.com/2015/10/02/issues-and-errors-when-decommissioning-the-vcenter-server-or-a-platform-services-controller-vcsa-6-0/

also note, use `administrator` rather than `*protected email*` helped

Cantique says

10/21/2019 at 2:17 pm

IHAC with 15 vCenters (6.5) with embedded PSC mode in one SSO domain. For some reason we might need to split them, either into 2 or 15 domains. Is there a reason that the procedure won't be applicable to embedded PSC? Thanks.

Steffen says

10/24/2019 at 1:55 am

Hello.
How to re-join 2 vCenters VCSA 6.5 (each with embedded PSC) after breaking up the linked-mode?
I have trouble here with one vCenter that won't replicate and vmdird always enters readonly-mode...

supportperson says

12/05/2019 at 7:53 am

Please do NOT perform this action in production. As a PSC specialist, I can say that this process has 'bricked' customer environments before. VMware support should never provide you with an external link representing it as an official VMware recommendation.

The problem with this process is that it will leave many stale PSC database entries, which can lead to a plethora of issues, including a permanent read-only state of the PSC database (which, in some cases, is impossible to reverse).

If you decide to perform this activity anyways, PLEASE take offline snapshots of all VCs and PSCs at the same time (shut down all PSCs and VCs at the same time, and take snapshots). Then, if you revert one, you must revert all of them. this is the only way to safely snapshot a PSC in ELM.

- Cantique says
  
  12/05/2019 at 8:10 am
  
  Hi,
  
  Thanks for the advice. I have a question about taking offline snapshots "at the same time". I bet there's no guaranteed way we can shut down all the vCenters/PSCs at the same time, right? I'm looking at a different approach and wondering if it can deliver a better result:
  
  Assume we have PSC[123456] and the replication ring is 1->2->3->4->5->6->1.
  
  1. Shutdown PSC1.
  2. Make sure other replications still working, then shutdown PSC2.
  3. Repeat step 2 for PSC[3456].
  4. Create snapshots.
  5. Do whatever works needed.
  6. If the conditions render rollback, then return back to snapshots for all PSCs.
  7. Start PSC6 first and then PSC5.
  8. Make sure the replication is in good shape, then start PSC4.
  9. Repeat step 8 for PSC[321], in that order.
  
Bret says

01/31/2020 at 9:04 am

This is a great article, but is missing the fact that the PSC DB needs to be cleaned up after the split.
Only way I have found to do this is through VMware support.
2020-01-30T23:36:38.801Z | INFO | state-manager1 | InvProviderClientFactory | Closing IS client as it could not be initialized https://hostname.company.com:443/invsvc
2020-01-30T23:36:38.801Z | INFO | state-manager1 | HealthStatusCollectorImpl | HEALTH ORANGE vAPI Router failed to load Inventory Service.
2020-01-30T23:36:39.885Z | WARN | state-manager1 | InvProviderClientFactory | Error communicating to IS https://hostname.company.com:443/invsvc
com.vmware.vim.query.client.exception.ClientException: java.util.concurrent.ExecutionException: com.vmware.vim.vmomi.client.exception.ConnectionException: java.net.NoRouteToHostException: No route to host (Host unreachable)
at com.vmware.vim.query.client.impl.QueryAuthenticationManagerImpl.loginBySamlToken(QueryAuthenticationManagerImpl.java:232)
at com.vmware.vapi.endpoint.cis.router.InvProviderClientFactory.createProviderClient(InvProviderClientFactory.java:105)
at com.vmware.vapi.endpoint.cis.router.InvSvcBuilder.createInvServiceClientList(InvSvcBuilder.java:345)
at com.vmware.vapi.endpoint.cis.router.InvSvcBuilder.buildInt(InvSvcBuilder.java:296)
at com.vmware.vapi.endpoint.cis.router.InvSvcBuilder.rebuild(InvSvcBuilder.java:254)
at com.vmware.vapi.state.impl.DefaultStateManager.rebuild(DefaultStateManager.java:406)
at com.vmware.vapi.state.impl.DefaultStateManager$2.doReconfig(DefaultStateManager.java:444)
at com.vmware.vapi.state.impl.DefaultStateManager$2.run(DefaultStateManager.java:433)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.ExecutionException: com.vmware.vim.vmomi.client.exception.ConnectionException: java.net.NoRouteToHostException: No route to host (Host unreachable)
at com.vmware.vim.vmomi.core.impl.BlockingFuture.get(BlockingFuture.java:81)
at com.vmware.vim.query.client.impl.QueryAuthenticationManagerImpl.loginBySamlToken(QueryAuthenticationManagerImpl.java:230)
... 14 more
Caused by: com.vmware.vim.vmomi.client.exception.ConnectionException: java.net.NoRouteToHostException: No route to host (Host unreachable)
at com.vmware.vim.vmomi.client.common.impl.ResponseImpl.setError(ResponseImpl.java:256)
at com.vmware.vim.vmomi.client.http.impl.HttpExchange.run(HttpExchange.java:51)
at com.vmware.vim.vmomi.client.http.impl.HttpProtocolBindingBase.executeRunnable(HttpProtocolBindingBase.java:226)
at com.vmware.vim.vmomi.client.http.impl.HttpProtocolBindingImpl.send(HttpProtocolBindingImpl.java:110)
at com.vmware.vim.vmomi.client.common.impl.MethodInvocationHandlerImpl$CallExecutor.sendCall(MethodInvocationHandlerImpl.java:613)
at com.vmware.vim.vmomi.client.common.impl.MethodInvocationHandlerImpl$CallExecutor.executeCall(MethodInvocationHandlerImpl.java:594)
at com.vmware.vim.vmomi.client.common.impl.MethodInvocationHandlerImpl.completeCall(MethodInvocationHandlerImpl.java:345)
at com.vmware.vim.vmomi.client.common.impl.MethodInvocationHandlerImpl.invokeOperation(MethodInvocationHandlerImpl.java:305)
at com.vmware.vim.vmomi.client.common.impl.MethodInvocationHandlerImpl.invoke(MethodInvocationHandlerImpl.java:179)
at com.sun.proxy.$Proxy91.loginBySamlToken(Unknown Source)
at com.vmware.vim.query.client.impl.QueryAuthenticationManagerImpl.loginBySamlToken(QueryAuthenticationManagerImpl.java:228)
... 14 more

Douglas Ferguson says

01/06/2022 at 1:16 pm

How does this change when your environment uses embedded PSCs? I'm contemplating separating our linked mode instances because they add tremendous complexity to the VCSA upgrade/patching process - particularly when you have to take the whole ELM group down for offline snapshot, fix xyz problems in SSO directory, snap again, upgrade, etc. Single VCSAs are no biggie because of the limited blast radius and steps involved to CYA.
