In vSphere 7.0 Update 1, a new capability was introduced called vSphere Cluster Services (vCLS), which provides a new framework for decoupling and managing distributed control plane services for vSphere. To learn more, I highly recommend the detailed blog post linked above by Niels. In addition, Duncan also has a great blog post covering common questions, answers, and considerations for vCLS, which is definitely worth a read as well.
vSphere DRS is one of the vSphere features that relies on this new vCLS service, and this is made possible by the vCLS VMs, which are deployed automatically whenever ESXi hosts are detected within a vSphere Cluster (regardless of whether vSphere DRS is enabled or not). For customers who may be using the ESXi-Arm Fling with a vSphere 7.0 Update 1 environment, you may have noticed continuous "Delete File" tasks within vCenter that seem to loop forever.
This occurs because the vCLS service first tests whether it can upload a file to the datastore and, once it can, it deletes the file. The issue is that the vCLS VMs are x86 and can not be deployed to an ESXi-Arm Cluster, as the CPU architecture is not supported. There is a workaround to disable vCLS for the ESXi-Arm Cluster, which I will go into shortly. However, because vCLS can not properly deploy, vSphere DRS capabilities will not be possible when using vSphere 7.0 Update 1 with ESXi-Arm hosts. If you wish to use vSphere DRS, it is recommended that you use either vSphere 7.0c or vSphere 7.0d.
Note: vSAN does not rely on vCLS to function, but to be able to use it you must place your ESXi-Arm hosts into a vSphere Cluster, and hence applying this workaround would be desirable for that use case.
VMware recently published KB 80472, which outlines several methods for disabling vCLS; this can only be done on a per-cluster basis. In general, there is not a good reason to do this, as more and more vSphere clustering services will depend on vCLS. In the case of an ESXi-Arm cluster, however, it may not be desirable to move the hosts outside of the vSphere Cluster just to stop the file deletion behavior.
The easiest method is to use the vSphere UI to add a vCenter Server Advanced Setting that disables vCLS for your ESXi-Arm Cluster.
Step 1 - We need the Managed Object Reference (MoRef) ID for the ESXi-Arm Cluster. You can find this by clicking on your cluster and then looking in the browser URL for the domain-cXXXX value; make a note of it. In my example, the MoRef is domain-c1035.
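If you prefer to retrieve the MoRef programmatically rather than fishing it out of the browser URL, here is a minimal pyVmomi sketch that lists each cluster and its MoRef ID. The vCenter hostname and credentials below are placeholders, and it assumes the pyvmomi package is installed.

```python
# Minimal sketch (assumes pyvmomi is installed): list vSphere Clusters
# and their MoRef IDs so you can pick out the ESXi-Arm Cluster.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()  # lab use only; skips certificate checks
si = SmartConnect(host="vcenter.example.com",          # placeholder vCenter
                  user="administrator@vsphere.local",  # placeholder credentials
                  pwd="VMware1!",
                  sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.ClusterComputeResource], True)
    for cluster in view.view:
        # _moId is the MoRef ID, e.g. domain-c1035
        print(f"{cluster.name} -> {cluster._moId}")
finally:
    Disconnect(si)
```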
Step 2 - Click on the vCenter Server inventory object and navigate to Configure->Advanced Settings, then click on Edit Settings to add a new advanced vCenter Server setting. At the bottom, add the following name and value, replacing "domain-c1035" with the MoRef ID from Step 1 (an API-based alternative is sketched right after the list):
- Name = config.vcls.clusters.domain-c1035.enabled
- Value = False
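For those who would rather script Step 2, the same setting can be added through the vSphere API. Below is a minimal pyVmomi sketch, assuming the "si" connection from the earlier snippet and that domain-c1035 is your cluster's MoRef ID; content.setting is the vCenter Server OptionManager that backs the Advanced Settings UI.

```python
# Minimal sketch: add the per-cluster vCLS advanced setting via the API.
# Assumes "si" is the pyVmomi connection from the previous snippet and
# that domain-c1035 is the MoRef ID noted in Step 1.
from pyVmomi import vim

moref = "domain-c1035"  # replace with your cluster's MoRef ID
option = vim.option.OptionValue(
    key=f"config.vcls.clusters.{moref}.enabled",
    value="False")

# content.setting is the vCenter Server OptionManager, i.e. the same
# settings shown under Configure -> Advanced Settings in the vSphere UI
si.RetrieveContent().setting.UpdateOptions(changedValue=[option])
```

Per KB 80472, setting the same key back to "True" re-enables vCLS for that cluster.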
vCLS checks every 30 seconds, and after that you should no longer see the file deletion tasks occurring. One thing to note is that although vCLS has been disabled on the ESXi-Arm Cluster, the following warning banner will still be displayed, and this is true whether vSphere DRS is actually turned on or off in the cluster.
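If you want to double-check from the API rather than watching the UI, a quick pyVmomi sketch like the one below (again assuming the "si" connection from the snippets above) dumps the recent task list, where the recurring "Delete file" tasks should no longer appear.

```python
# Minimal sketch: print recent vCenter tasks to confirm the recurring
# "Delete file" tasks have stopped. Assumes the "si" connection from above.
content = si.RetrieveContent()
for task in content.taskManager.recentTask:
    info = task.info
    print(info.queueTime, info.descriptionId, info.entityName, info.state)
```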
Any support for NSX in the ESXi-Arm release?
Not currently. Was there something in particular you were looking to do with NSX-T?
low cost homelab!
I'm running vCenter 7.0.1 and this workaround doesn't reflect the new interface. I don't see any way to add new settings; I can edit existing ones, but I see no way to add any.
Very useful. This activity was not harmless.
We deployed the Fling on an 80-core / 256 GB server; I expect performance data soon (our developers are deploying a k8s cluster there). One weird bug - it works with the embedded SSD (and with iSCSI), but if we insert any SATA disk into a SATA slot, both the system and the installer fail into a magenta screen.
Thank you for sharing! What hardware platform are you using? If you can provide more information, it can help our engineering team understand why you're seeing a PSOD (purple screen of death), which is usually caused by HW. If you can also share the make/model of the SATA drive, that would also help, but I suspect it's possible that ESXi-Arm may not have drivers for it, which is usually why it wouldn't recognize the device until there's an issue.
I posted the exact information in the bug report already, but I can post it here when I'm back at a computer. If you want some logs etc. I can post them too.
It is a 1 CPU / 80 cores x 3 GHz / 256 GB memory server. We started to test our build systems, k8s clusters, etc.
Server - Ampere Altra R272-P31.
(It is actually a Gigabyte ARM server)
We installed onto the embedded SSD and added a 2-port 10Gb card. It works well, especially after I disabled the cluster service. But once we tried to add SATA disks, the system did not see them after hot insert, and it failed with a PSOD on reboot. The installation media fails the same way. The disks were 960 GB and 800 GB SSDs; I did not have any 2.5" SATA HDDs to test.
The vendor promised to test, and promised to test an LSI RAID controller too.
I can post more details now.
1) System as seen in VMware
Hypervisor: VMware ESXi, 7.0.0, 20133114
Model: R272-P31-00
Processor Type: ARM Limited Neoverse N1 r3p1
Logical Processors: 80
NICs: 4
Virtual Machines: 12
State: Connected
2) Storage adapters as seen by the system:
vmhba0 is a PCIe controller. The system is installed on a PCIe SSD. No issues here.
vmhba1 - SATA controller. It says just this:
General
Name vmhba1
Model SATA controller
vmhba2 - same. We tried inserting the SATA SSD into both, with the same effect.
The other adapter is software iSCSI, which we configured to use the QLogic 57810 without issues.
The disks we tried - I have them installed in other systems, so I can easily see the exact type - are HPE SSDs, but I am not sure if the PSOD happens for SSDs only or not.
XA960ME10063
3) Let's look through the management port. OK, I see the SATA controller type -
ASMedia Technology Inc. Vendor ID 0x1B21, device ID 0x1164
The system is in the lab, so I can borrow a 2.5" SATA disk (fortunately one of our guys just ordered one and I have it available) and test, but we want our R&D team to finish their performance testing and k8s configuration first. The first performance data was great - a simple 6-core classical application (our previous version, which uses Docker but not Kubernetes and not microservices) shows approximately a 20% improvement compared with an HPE Gen10 with an Intel Xeon Gold 6246 CPU, so the cores themselves look comparable. We are waiting for the wider test, but most critical is testing microservices/Kubernetes and comparing against AWS-provided ARM servers.
4) One more bug we found - the system repeats this in the logs all the time (I filtered it out of the log analyzer as it was being flooded):
Aug 15 12:38:37 eqx-lab-arm01 Hostd: error hostd[2101447] [Originator@6876 sub=Default] Unable to convert Vigor value 'other5xlinux-64' of type 'char const*' to VIM type 'Vim::Vm::GuestOsDescriptor::GuestOsIdentifier'
Aug 15 12:39:07 eqx-lab-arm01 Hostd: error hostd[2101435] [Originator@6876 sub=Default] Unable to convert Vigor value 'other5xlinux-64' of type 'char const*' to VIM type 'Vim::Vm::GuestOsDescriptor::GuestOsIdentifier'
(It repeats every 30 seconds)
And just to make things clear - we are VMware partners (mostly for product certification with vSphere, Tanzu and so on).