A couple of days back I had to re-install ESXi on a physical host for troubleshooting purposes, and while looking at the partitions on the disks using ESXCLI, I noticed the fresh ESXi installation had created two coredump partitions.
I was quite surprised to see two, since normally you would just have one configured. I even asked a colleague if he had ever seen this before, and he had not. So I wanted to double-check that there were in fact two coredump partitions being created, which I verified by using partedUtil.
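For reference, this is the sort of partedUtil invocation used to inspect the partition table; the device name below is just an example and will differ on your system:

```shell
# List the GPT partition table of the boot disk. The device name is an
# example -- substitute your own (see "ls /vmfs/devices/disks/").
# Coredump partitions show up with the type label "vmkDiagnostic".
partedUtil getptbl /vmfs/devices/disks/mpx.vmhba0:C0:T0:L0
```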
As you can see from the screenshot above, there are definitely two coredump partitions. I took a look at our vSphere documentation but did not find any mention of this. I decided to look internally and found that this is actually a new behavior that was introduced in ESXi 5.5. From what I can tell, the second coredump partition, which is 2.5GB, was created to ensure there is sufficient space to handle ESXi hosts configured with a huge amount of memory (up to 4TB) if a coredump were to occur. This new coredump partition is only created on a fresh ESXi install; for upgrade scenarios, the original partition structure is preserved. I suspect even on a fresh install, the original coredump partition was kept for potential backwards compatibility.
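To see which of the two diagnostic partitions is actually active as the dump target, esxcli can be queried directly (commands shown here as a sketch, to be run in an ESXi shell):

```shell
# Show the currently configured and active coredump partition
esxcli system coredump partition get

# List all partitions on the host that are usable as coredump targets
esxcli system coredump partition list
```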
The reasoning definitely made sense. I guess this actually raises another interesting point from an operational point of view: though upgrades may be preferred, there are also good reasons to perform a fresh install instead. In this case, to ensure past requirements/assumptions are not broken, the installer could not just automatically expand or create a larger coredump partition to adhere to the new requirements. This is actually not the first instance of this; here are two additional examples in which a fresh installation would have potentially yielded a more optimal environment:
Donagh Mc Sweeney says
The coredump file feature was introduced in ESXi 5.5 for upgrade scenarios where the partition is still configured as 110MB. The ESXi host can then determine whether it might try to generate a coredump that is too big for the coredump partition. In that case, ESXi will automatically create the file /vmfs/volumes//vmkdump/.dumpfile. If a crash happens, the ESXi host will extract the coredump out of the configured dump file and create a vmkernel-zdump* file in /var/core during the first reboot following the crash.
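To sketch the coredump file feature Donagh describes, the dump file can also be created and activated manually with esxcli; the datastore and file names below are placeholders, not values from the original post:

```shell
# Create a coredump file on a datastore ("datastore1" and the file name
# are placeholders -- pick your own, or add --auto to let ESXi choose)
esxcli system coredump file add --datastore datastore1 --file vmkdump-example

# List available coredump files, then activate one; --smart selects a
# usable file automatically when enabling
esxcli system coredump file list
esxcli system coredump file set --smart --enable true
```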
William Lam says
Thanks for sharing this! Good to see that in an upgrade scenario, a proper coredump can still happen even when it exceeds the original partition size.
Mike Foley says
I wonder why people do upgrades sometimes. It's almost as easy to backup the config data, remove the node from the cluster, install fresh and restore the config data and then re-add to the cluster. (I haven't tried that lately, but...)
William Lam says
Yea, someone had suggested upgrade + restore. I don't know if that would work or what the implications might be. To me, if you have your installs automated, then this is trivial: it's almost always install + configure based on CMDB. I've always been lucky to be in environments like that, so it's never been an issue. Another reason to look into automation 🙂
André Pett says
There's indeed not much in the documentation about the new partition size. It's only briefly mentioned in the description on how to "Move the Coredump to a File"
Hello William, There is one more catch with regard to the coredump partitions. Say you deployed ESXi using the 'dd' command; in that case the partition number becomes 2. This creates a problem during upgrade, as the upgrade installer thinks the 2nd partition is a scratch partition, tries to format it as VFAT, and fails. Did you come across something like this?
This behavior happens only when you boot ESXi from a USB key; it should not happen on a SAS LUN/HDD, as the scratch partition would be set on that disk anyway.
This happened to me, and we do not boot ESXi from a USB device (we use a local HDD). It reminded us not to upgrade our hosts and to always do a fresh install.