4 ways to measure VM Uptime
I always tell my customers that one of the hardest metrics to measure is Virtual Machine uptime. It is hard because there are many factors that most people don't think of. Many people think it is as simple as a VM being powered on or powered off. But what about VMs that are supposed to be powered off? And if a VM is powered on, what if it has the blue screen of death or there is no network activity? In my opinion, if a VM is powered on but the operating system is not functional, it should be considered down because the virtual machine is in a totally unusable state. In this guide I will explain the possible scenarios and some of the methods I use to measure them using Aria Operations.
VM Down Scenarios
- Virtual Machine is Powered Off in vCenter but is supposed to be On
- Virtual Machine is Powered On but Operating System has the Blue Screen of Death
- Virtual Machine is Powered On but Operating System is in a hung state
- Virtual Machine is Powered On but has no network activity because of a bad IP configuration or, even worse, no vNIC/network attached.
- Virtual Machine is Powered On but nobody can access the URL for that application.
In each of the above scenarios I consider the Virtual Machine to be down or unavailable. Now you can understand why this is one of the hardest metrics to measure.
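To make the scenarios above concrete, here is a minimal Python sketch of how they could be combined into a single up/down decision. It is not Aria Operations code. The `VmStatus` class and all of its field names are hypothetical, and you would fill them from whatever your monitoring actually reports.

```python
from dataclasses import dataclass


@dataclass
class VmStatus:
    # All field names are hypothetical placeholders for real monitoring data.
    expected_on: bool     # is the VM supposed to be running?
    powered_on: bool      # power state as reported by vCenter
    os_responding: bool   # False for a blue screen of death or a hung OS
    network_active: bool  # False for a bad IP configuration or no vNIC attached
    url_reachable: bool   # False if nobody can access the application URL


def is_down(vm: VmStatus) -> bool:
    """Return True if the VM counts as down under the scenarios listed above."""
    if not vm.expected_on:
        # A VM that is supposed to be powered off is not an outage.
        return False
    if not vm.powered_on:
        return True
    # Powered on, but any failed health signal still makes the VM unusable.
    return not (vm.os_responding and vm.network_active and vm.url_reachable)


hung_vm = VmStatus(expected_on=True, powered_on=True, os_responding=False,
                   network_active=True, url_reachable=True)
print(is_down(hung_vm))
```

The point of the `expected_on` check is that a VM that is meant to be off should never be counted against uptime, which a plain power-state check cannot express.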
Methods to Measure VM Uptime
Measuring Blue Screen of Death and Hung VMs
One of my colleagues tested many scenarios already so you can read more about it below.
Measuring VMs down by using a Super Metric
This super metric measures more accurately whether a VM is down, based on whether the VM's OS is responding. Using this method, I can also find VMs that were up more than a certain percentage of the time but are currently down. As you can see in the screenshot below, vCenter reports that the VM has been powered on 100% of the time for the given period (Power On %). However, my super metric (VM Uptime %) shows that the OS was unavailable at some point.
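The difference between the two percentages can be illustrated with a small Python sketch. This is not the super metric formula itself, only an assumption of how such a percentage could be computed from evenly spaced samples. The sample data, the function names, and the 70% threshold are all made up for illustration.

```python
def percent_true(samples):
    """Percentage of samples that are True; 0.0 for an empty series."""
    if not samples:
        return 0.0
    return 100.0 * sum(1 for s in samples if s) / len(samples)


def mostly_up_but_down_now(os_samples, threshold_pct):
    """True if the OS was up more than threshold_pct of the time but the latest sample is down."""
    return bool(os_samples) and percent_true(os_samples) > threshold_pct and not os_samples[-1]


# Hypothetical data: 12 collection cycles for one VM.
powered_on = [True] * 12                   # vCenter power state stayed on
os_responding = [True] * 9 + [False] * 3   # the OS stopped responding for the last 3 cycles

power_on_pct = percent_true(powered_on)    # what a power-state metric reports
uptime_pct = percent_true(os_responding)   # what an OS-based metric reports

print(f"Power On %: {power_on_pct}, VM Uptime %: {uptime_pct}")
print(mostly_up_but_down_now(os_responding, threshold_pct=70))
```

With this data the power-state view shows 100% while the OS-based view shows 75%, which is the same kind of gap described above.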
Measuring VMs by Ping pack
This method simply allows you to ping any URL or IP address you would like to check for a response.
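As a rough illustration of the idea, the Python sketch below checks a list of URLs or IPs for a response. Note the assumptions: it swaps in a TCP connection attempt instead of an ICMP ping (ICMP usually needs elevated privileges), the function names are hypothetical, and the targets are placeholders. It is not how the ping feature in Aria Operations is implemented.

```python
import socket
from urllib.parse import urlparse


def tcp_probe(host, port, timeout=2.0):
    """Return True if a TCP connection succeeds (a TCP connect, not an ICMP ping)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and DNS failures.
        return False


def target_to_endpoint(target):
    """Turn a URL or a bare host/IP into a (host, port) pair."""
    parsed = urlparse(target if "://" in target else "//" + target)
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    return parsed.hostname, port


def check_targets(targets, probe=tcp_probe):
    """Map each target to True (responded) or False (no response)."""
    return {t: probe(*target_to_endpoint(t)) for t in targets}
```

A call such as `check_targets(["https://app.example.com/login", "10.0.0.5"])` returns a dictionary of target to up/down. The `probe` parameter is there so a different check (for example a real ICMP ping) can be plugged in without changing the rest.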
Measuring VMs down by using vSphere Tags
This method allows you to control which VMs you want to measure by creating a single alert and then tagging any VM in vCenter that you want checked for being down or up. In the post below I kept it simple and based the alert only on a VM being in a powered off state. If you would like, you can take all the above scenarios, combine them into one super alert, and then control which VMs to monitor by using vSphere tags.
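The tag-based selection logic can be sketched in a few lines of Python. This is only an illustration of the idea, not an Aria Operations alert definition. The tag name `monitor-uptime`, the VM names, and the dictionary layout are hypothetical. The `poweredOn` and `poweredOff` strings follow the vSphere power state values.

```python
MONITOR_TAG = "monitor-uptime"  # hypothetical tag name; use whatever tag you create in vCenter


def tagged_vms_down(vms, tag=MONITOR_TAG):
    """Return the names of VMs that carry the tag and are powered off."""
    return [
        vm["name"]
        for vm in vms
        if tag in vm["tags"] and vm["power_state"] == "poweredOff"
    ]


# Hypothetical inventory snapshot.
inventory = [
    {"name": "web-01", "tags": ["monitor-uptime"], "power_state": "poweredOn"},
    {"name": "db-01", "tags": ["monitor-uptime"], "power_state": "poweredOff"},
    {"name": "test-01", "tags": [], "power_state": "poweredOff"},
]

print(tagged_vms_down(inventory))
```

Here only `db-01` would raise an alert. `test-01` is also powered off, but it is not tagged, so it is ignored, which is exactly the control the tag gives you.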