Detecting ESXi Remote Syslog Connection Error Using a vCenter Alarm

I was cleaning up one of my development labs and found that one of my VCSAs (vCenter Server Appliance), which I had configured with the vSphere Syslog Collector, was no longer capturing logs for several of my ESXi hosts. After looking at some of the ESXi logs, I realized that I had rebooted the VCSA at one point, which interrupted syslog forwarding. I knew immediately that I just needed to reload the syslog configuration via ESXCLI, as noted in this VMware KB, to restore log forwarding.

After restoring my syslog configurations, I remembered a neat little trick I learned from one of the VMware TAMs: creating a vCenter Alarm to alert you when an ESXi host is no longer able to reach a remote syslog server. I thought this might be a very handy alarm to have in your vCenter Server in case you hit a similar issue or have connectivity issues with your syslog servers. By default, there is no alarm for syslog connectivity, but you can create a vCenter Alarm based on an eventId, which shows up as "esx.problem.vmsyslogd.remote.failure" in both /var/log/hostd.log and /var/log/vobd.log.
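To check whether a host has already logged this event, you can count occurrences of the eventId in the two log files. The following is a minimal sketch, assuming it is run from the ESXi Shell; the function name count_syslog_failures is hypothetical and not part of ESXi.

```shell
#!/bin/sh
# Count how many lines in a log file mention the syslog failure eventId.
# count_syslog_failures is a hypothetical helper name, not an ESXi command.
EVENT_ID='esx.problem.vmsyslogd.remote.failure'

count_syslog_failures() {
    # -F matches the eventId as a fixed string, so the dots are not wildcards
    # -c prints the number of matching lines (0 when nothing matches)
    grep -F -c "$EVENT_ID" "$1"
}

# Report on the two logs named in the article, skipping any that are unreadable
for logfile in /var/log/hostd.log /var/log/vobd.log; do
    if [ -r "$logfile" ]; then
        echo "${logfile}: $(count_syslog_failures "$logfile") matching line(s)"
    fi
done
```

A non-zero count only tells you the event was logged at some point; it does not tell you whether forwarding is currently working.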

Now that we know the eventId, we just need to create a vCenter Alarm that will notify us when an ESXi host has a connectivity issue with its configured syslog server.

Step 1 - Create a new alarm. In this example I am calling it "Syslog Connection Error". You will need to specify the Alarm Type as "Host" and monitor for a specific event.

Step 2 - Next, click on Triggers and paste in our eventId, which is "esx.problem.vmsyslogd.remote.failure".

Step 3 - Lastly, you can configure an Action if you wish to send an SNMP trap, run a command or send an email notification. In this example, we are just going to generate a regular vCenter Alarm event, so go ahead and click OK to save the alarm.

To test the alarm, I disabled the syslog collector on the VCSA using "service syslog-collector stop". You should then see an alarm generated for any ESXi host forwarding its logs to that particular syslog server.

So now when your ESXi hosts cannot reach their syslog server, you will automatically be notified and can look into the problem immediately. Having an alarm is great, but you might be wondering about the need to reload the syslog configuration on all your ESXi hosts to restore syslog forwarding. This can definitely be a challenge and an annoyance, especially if the syslog server's connectivity returns after some amount of time and you have hundreds of hosts.

Luckily, you no longer have to worry about this. With the latest ESXi 5.0 patch03 that was just released, this problem has been addressed, and the ESXi syslog daemon will automatically start forwarding logs once connectivity to the syslog server has been restored. It is still definitely recommended that you have more than one syslog server in your environment and that they are properly monitored. Also, do not forget that with ESXi 5.0 you can now configure more than one remote syslog server; for more details, take a look at this article here.
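As a rough illustration of configuring more than one remote syslog server, the --loghost option takes a comma-separated list of targets. The sketch below only builds that value; build_loghost is a hypothetical helper name, and the addresses in the usage note are placeholders.

```shell
#!/bin/sh
# Join several syslog targets into one comma-separated --loghost value.
# build_loghost is a hypothetical helper name, not part of ESXCLI.
build_loghost() {
    loghost=""
    for target in "$@"; do
        # ${loghost:+...} adds the comma separator only after the first target
        loghost="${loghost:+${loghost},}${target}"
    done
    printf '%s\n' "$loghost"
}
```

On an ESXi host, the result could then be passed to ESXCLI, for example: esxcli system syslog config set --loghost="$(build_loghost udp://192.168.1.10:514 tcp://192.168.1.11:514)" followed by esxcli system syslog reload.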

Note: After applying the patch, you will no longer be able to generate an alarm based on the eventId for syslog when using UDP. You will see something like "Hostd [290D5B90 verbose 'SoapAdapter'] Responded to service state request" in the hostd.log. The alarm will only be valid if you're using the TCP or SSL protocol for syslog, which have not been patched in the latest p03.

If you are looking for a quick way to reload your syslog configurations, you can easily write a simple for loop to reload the syslog configuration on your ESXi hosts using the remote ESXCLI:
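A minimal sketch of such a loop follows. It assumes the vCLI version of esxcli is installed on the machine running the script and that all hosts share the same credentials. The variable names ESXCLI, ESXI_USER and ESXI_PASS and the function name reload_syslog are hypothetical.

```shell
#!/bin/sh
# Reload the syslog configuration on each ESXi host through remote ESXCLI.
# ASSUMPTIONS: ESXCLI, ESXI_USER and ESXI_PASS are hypothetical variable names;
# every host is assumed to accept the same username and password.
ESXCLI="${ESXCLI:-esxcli}"
ESXI_USER="${ESXI_USER:-root}"

reload_syslog() {
    # Each argument is the hostname or IP address of one ESXi host
    for host in "$@"; do
        echo "Reloading syslog configuration on ${host}"
        "$ESXCLI" --server "$host" --username "$ESXI_USER" --password "$ESXI_PASS" \
            system syslog reload || echo "Reload failed on ${host}" >&2
    done
}
```

For example, after setting ESXI_PASS in the environment, you could run reload_syslog esxi-01.example.com esxi-02.example.com (placeholder hostnames). Keep in mind that a password passed on the command line can be visible to other users on the same system.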

Here is another example using PowerCLI in conjunction with ESXCLI:

Comments

Viktor says

07/29/2012 at 5:54 pm

Hi William,

Your article popped up as a referrer on my website :) In the article you're talking about patch03 for ESXi 5, which resolves the issues with the syslog daemon. I was reading the changelog of patch03, and the only thing I found on the syslog daemon is:
"PR 838922: An ESXi host might not restart UDP logging after a temporary interruption that might be caused by target server reboot or network UDP package being lost."

Is the issue also solved if TCP or SSL is used? This is an option after all.

Thx, Viktor

- William says
  
  07/30/2012 at 2:01 am
  
  @Viktor,
  
  afaik, only UDP has been resolved as mentioned in the release notes.
  
Michael MacFaden says

04/20/2015 at 8:55 pm

Also note that one can monitor any or all of the TCP connections emanating from a given ESXi host by polling the SNMP agent using the TCP-MIB (RFC 4022). This technique works on any vendor's system that supports that MIB.

Fetch tcp connections, filter by connection state == established:
$ snmpwalk -v2c -c public hpgen8 tcpConnectionState | grep established

TCP-MIB::tcpConnectionState.ipv4."10.24.235.109".22.ipv4."10.20.93.148".42306 = INTEGER: established(5)
TCP-MIB::tcpConnectionState.ipv4."10.24.235.109".20929.ipv4."10.20.93.153".514 = INTEGER: established(5)

Any ongoing SSH connections would be on port 22:
TCP-MIB::tcpConnectionState.ipv4."10.24.235.109".22.ipv4."10.20.93.148".42306 = INTEGER: established(5)

And assuming syslog was configured for TCP to port 514:
~ # esxcli system syslog config set --loghost='tcp://10.20.93.153:514'
~ # esxcli system syslog reload

Then one would expect to see an entry to that host in established state:
TCP-MIB::tcpConnectionState.ipv4."10.24.235.109".20929.ipv4."10.20.93.153".514 = INTEGER: established(5)

Michael MacFaden says

04/20/2015 at 8:57 pm

One can also do the same from esxcli:

$ esxcli network ip connection list | grep 514
tcp 0 0 10.24.235.109:20929 10.20.93.153:514 ESTABLISHED 33376 newreno vmsyslogd

Sudhakar S says

07/03/2017 at 5:01 am

Hi,

How do I set an alarm for the scenario below in vCenter?

If a user logs in to ESXi using his ADMIN\ID, we need to get a mail notification saying that ADMIN\ID has logged in to the ESXi host.

Please let us know how to do it.
