sso replication

For those of you who read my previous article (if you have not read it, please do so before proceeding forward), at the very end I showed off a screenshot of a script that I had created for the vCenter Server Appliance (VCSA) which automatically monitors the health of the primary Platform Services Controller (PSC) it is connected to and in the event of a failure, it would automatically repoint and failover to another healthy PSC. The way it accomplishes this is by first deploying two externally replicated PSC's and then associating the VCSA with just the first PSC which we will call our primary/preferred PSC node. Both PSC's are in an Active/Active configuration using a multi-master replication and any changes made in SSO on psc-01 (as shown in the diagram) will automatically be replicated to psc-02.

From a vCenter Server's point of view, it is only get requests serviced by a single PSC, which is psc-01 as shown in the diagram above. Within the VCSA, there is a script which runs a cronjob that will periodically check psc-01's connectivity by performing a simple GET operation on the /websso endpoint. If it is unable to connect, the script will retry for a certain number of times before declaring that the primary/preferred PSC node is no longer available. At this point, the script will automatically re-point the VCSA to the secondary PSC and in a couple of minutes, any users who might have tried to login to the vSphere Web Client will be able login and this happens transparently behind the scenes without any manual interaction. For users that have already logged in to vCenter Server, those sessions will continue to work unless they have timed out, in which case you would need to log back in.

The script is configurable in terms of the number of times to check the PSC for connectivity as well as the amount of time to wait in between each check. In addition, if you have an SMTP server configured on the vCenter Server, you can also specify an email address which the script can send a notification after the failover and alert administrators to the failed PSC node. Although this example is specific to the VCSA, a similar script could be developed on a Windows platform using the same core foundation.

Disclaimer: This script is not officially supported by VMware, it is intended as an example of what can be done with the cmsso-util utility. Use at your own risk.

To setup a similar configuration, you will need to perform the following:

Step 1 - Deploy two External PSCs that are replicated with each other. Ensure you select the "Join an SSO domain in an existing vCenter 6.0 platform services controller" option to setup replication and ensure you are joining the same SSO Site.

Step 2 - Deploy your VCSA and when asked to specify the PSC to connect to, specify the primary/preferred PSC node you had deployed earlier. In my example, this would be psc-01 as seen in the screenshot below.

Step 3 - Download the checkPSCHealth.sh script which can be found here.

Step 4 - SCP the script to your VCSA and store it under /root and then set it to be executable by running the following command:

chmod +x /root/checkPSCHealth.sh

Step 5 - Next, you will need to edit the script and adjust the following variables listed below. They should all be self explanatory and if you do not have an SMTP server setup, you can leave the EMAIL_ADDRESS variable blank.

PRIMARY_PSC - IP/Hostname of Primary PSC
SECONDARY_PSC- IP/Hostname of Secondary PSC (must already be replicating with Primary PSC)
NUMBER_CHECKS- Number of times to check PSC connectivity before failing over (default: 3)
SLEEP_TIME - Number of seconds to wait in between checks (default: 30)
EMAIL_ADDRESS- Email when failover occurs

Step 6 - Lastly, we need to setup a scheduled job using cron. To do so you can run the following command:

crontab -e

Copy the following snippet as shown below into the crontab of the root user account. The first half just covers all the default paths and the expected libraries to perform the operation. I found that without having these paths, you will run into issues calling into the cmsso-util and I figured it was easier to take all the VMware paths from running the env command and just making it available. The very last last line is actually what setups the scheduling and in the example below, it will automatically run the script every 5 minutes. You can setup even more complex rules on how to run the script, for more info, take a look here.

PATH=/sbin:/usr/sbin:/usr/local/sbin:/usr/local/bin:/usr/bin:/bin:/usr/X11R6/bin:/usr/games:/usr/lib/mit/bin:/usr/lib/mit/sbin:/usr/java/jre-vmware/bin:/opt/vmware/bin

SHELL=/bin/bash
VMWARE_VAPI_HOME=/usr/lib/vmware-vapi
VMWARE_RUN_FIRSTBOOTS=/bin/run-firstboot-scripts
VMWARE_DATA_DIR=/storage
VMWARE_INSTALL_PARAMETER=/bin/install-parameter
VMWARE_LOG_DIR=/var/log
VMWARE_OPENSSL_BIN=/usr/bin/openssl
VMWARE_TOMCAT=/opt/vmware/vfabric-tc-server-standard/tomcat-7.0.55.A.RELEASE
VMWARE_RUNTIME_DATA_DIR=/var
VMWARE_PYTHON_PATH=/usr/lib/vmware/site-packages
VMWARE_TMP_DIR=/var/tmp/vmware
VMWARE_PERFCHARTS_COMPONENT=perfcharts
VMWARE_PYTHON_MODULES_HOME=/usr/lib/vmware/site-packages/cis
VMWARE_JAVA_WRAPPER=/bin/heapsize_wrapper.sh
VMWARE_COMMON_JARS=/usr/lib/vmware/common-jars
VMWARE_TCROOT=/opt/vmware/vfabric-tc-server-standard
VMWARE_PYTHON_BIN=/opt/vmware/bin/python
VMWARE_CLOUDVM_RAM_SIZE=/usr/sbin/cloudvm-ram-size
VMWARE_VAPI_CFG_DIR=/etc/vmware/vmware-vapi
VMWARE_CFG_DIR=/etc/vmware
VMWARE_JAVA_HOME=/usr/java/jre-vmware

*/5 * * * * /root/checkPSCHealth.sh

Step 7 - Finally, you will probably want to test the script to ensure it is doing what you expect. The easiest way to do this is by disconnecting the vNIC on psc-01 and depending on how you have configured the script, in a short amount of time it should automatically start the failover. All operations are automatically logged to the system logs which you can find under /var/log/messages.log and I have also tagged the log entries with a prefix of vGhetto-PSC-HEALTH-CHECK, so you can easily filter out those message in Syslog as seen in the screenshot below.

If a failover occurs, the script will also log additional output to /root/psc-failover.log which can be used to troubleshoot in the case a failover was attempted but failed. To ensure that the script does not try to failover again, it creates an empty file under /root/ran-psc-failover which the script checks at the beginning before proceeding. Once you have verified the script is doing what you expect, you will probably want to manually fail back the VCSA to the original PSC node and then remove the /root/ran-psc-failover file else the script will not run when it is schedule to.

As mentioned earlier, though this is specifically for the the VCSA, you can build a similar solution on a Windows system using Windows Task Scheduler and the scripting language of your choice. I, of course highly recommend customers to take a look at the VCSA for its simplicity in management and deployment, but perhaps thats just my bias 🙂

In this article, I will share alternative methods of deploying replicated Platform Services Controller Node (PSCs) using the VCSA 6.0 appliance. Take a look at the various deployment methods below and their respective instructions for more details. If you are deploying using one of the scripts below, you will need to extract the contents of the VCSA ISO. If you are deploying to Workstation/Fusion, you will need to extract the VCSA ISO and add the .ova extension to the following file VMware-VCSA-all-6.0.0-2562643->vcsa->vmware-vcsa before deploying.

Disclaimer: Though these alternative deployment options work, they are however not officially supported by VMware. Please use at your own risk.

Deploying to an existing vCenter Server using ovftool (shell script)

I have created a shell script called deploy_vcsa6_replicated_psc_to_vc.sh which requires using ovftool 4.1 (included in the VCSA ISO) to specify the appropriate OVF "guestinfo" properties for a replicated PSC deployment. You will need to edit the script and modify several variables based on your environment.

Here is an example of executing the script:

Deploying to an ESXi host using ovftool (shell script)

I have created a shell script called deploy_vcsa6_replicated_psc_to_esxi.sh which requires using ovftool 4.0 or greater to specify the appropriate OVF "guestinfo" properties for a replicated PSC deployment. You will need to edit the script and modify several variables based on your environment. The behavior of this script is similar to the one above, except you are deploying directly to an ESXi host.

Deploying to an existing vCenter Server using ovftool (PowerCLI)

I have created a PowerCLI script called Deployment-PSC-Replication.ps1 which uses ovftool and specifies the appropriate OVF "guestinfo" properties for a replicated PSC deployment. You will need to edit the script and modify several variables based on your environment.

Deploying to VMware Fusion & Workstation

To properly deploy the new VCSA 6.0, the proper OVF properties MUST be set prior to the booting of the VM. Since VMware Fusion and Workstation do not support OVF properties, you will need to manually deploy the VCSA, but not power it on. Once the deployment has finished, you will need to add the following entries to the VCSA's VMX file and replace it with your environment settings. Once you have saved your changes, you can then power on the VM and the configurations will then be read into the VM for initial setup.

guestinfo.cis.deployment.node.type = "infrastructure"
guestinfo.cis.vmdir.domain-name = "vghetto.local"
guestinfo.cis.vmdir.site-name = "vghetto"
guestinfo.cis.vmdir.password = "VMware1!"
guestinfo.cis.vmdir.first-instance = "false"
guestinfo.cis.vmdir.replication-partner-hostname = "192.168.1.50"
guestinfo.cis.appliance.net.addr.family = "ipv4"
guestinfo.cis.appliance.net.addr = "192.168.1.63"
guestinfo.cis.appliance.net.pnid = "192.168.1.63"
guestinfo.cis.appliance.net.prefix = "24"
guestinfo.cis.appliance.net.mode = "static"
guestinfo.cis.appliance.net.dns.servers = "192.1681.1"
guestinfo.cis.appliance.net.gateway = "192.168.1.1"
guestinfo.cis.appliance.root.passwd = "VMware1!"
guestinfo.cis.appliance.ssh.enabled = "true"

For more information, you can take a look at this article here.

Deploying using new supported scripted install (bonus)

As mentioned earlier, there is also a new scripted installer included inside of the VMware-VCSA ISO under /vcsa-cli-installer which supports Windows, Mac OS X and Linux, but must be connected directly to an ESXi host. There are several templates that are also included within the /vcsa-cli-installer/templates. I thought as a bonus I would also share the template I have been using to deploy replicated PSC instances using a static IP Address which some of you may find useful.

{
    "__comments":
    [
        "William Lam - www.virtuallyghetto.com",
        "Example VCSA 6.0 Replicated Platform Service Controller Node Deployment w/Static IP Address"
    ],

    "deployment":
    {
        "esx.hostname":"192.168.1.200",
        "esx.datastore":"mini-local-datastore-1",
        "esx.username":"root",
        "esx.password":"vmware123",
        "deployment.network":"VM Network",
        "deployment.option":"infrastructure",
        "appliance.name":"psc-02",
        "appliance.thin.disk.mode":true
    },

    "vcsa":
    {
        "system":
        {
            "root.password":"VMware1!",
            "ssh.enable":true,
            "ntp.servers":"0.pool.ntp.org"
        },

        "sso":
        {
            "password":"VMware1!",
            "domain-name":"vghetto.local",
            "site-name":"virtuallyGhetto",
            "first-instance":false,
            "replication-partner-hostname":"192.168.1.50"
        },

        "networking":
        {
            "ip.family":"ipv4",
            "mode":"static",
            "ip":"192.168.1.51",
            "prefix":"24",
            "gateway":"192.168.1.1",
            "dns.servers":"192.168.1.1",
            "system.name":"192.168.1.51"
        }
    }
}

The use the scripted installer, you just need to change into the appropriate OS platform directory (win32,mac or lin64) and there should be a binary called vcsa-deploy. To use this template, you just need to save the JSON to a file and then specify that as the first argument to vcsa-deploy utility.

Here is an example of deploying a PSC using the vcsa-deploy scripted installer.

How to automatically repoint & failover VCSA to another replicated Platform Services Controller (PSC)?

Ultimate automation guide to deploying VCSA 6.0 Part 3: Replicated Platform Service Controller Node

Deploying to an existing vCenter Server using ovftool (shell script)

Deploying to an ESXi host using ovftool (shell script)

Deploying to an existing vCenter Server using ovftool (PowerCLI)

Deploying to VMware Fusion & Workstation

Deploying using new supported scripted install (bonus)