check_esx4_storage

On this page you will find the current version of the ESX4 storage check plugin.

Idea

We run VMware ESXi 4 on DL380G5 hardware. ESXi is installed on a USB stick, and the internal P400 controller serves the storage for the virtual machines. This setup has worked very well for us since version 3. Shortly after the release of version 3, several Nagios plugins were published for checking the health of the ESX host and the virtual machines.

It was not possible for us to fetch the storage information from ESXi 3, so we had to wait for the release of ESXi 4. And indeed, it is now possible to fetch the storage information from ESXi.

Solution

To bring the status information into Nagios, I wrote a small Nagios plugin that uses the VMware Infrastructure Perl Toolkit to gather this information from the ESXi servers.

[Image: check_esx4_storage-summary]
[Image: check_esx4_storage-detail]
 

Prerequisites

You need the VMware Infrastructure Perl Toolkit (VIPerl) installed on your Nagios server for the plugin to work. I installed VIPerl following the howto included in check_esx3 from op5:

Download the latest version of Perl Toolkit from VMware support page.
In this example we use VMware-VIPerl-1.6.0-104313.i386.tar.gz,
but the instructions should apply to newer versions as well.

Upload the file to your Nagios server’s /root dir and execute:

cd /root
tar xvzf VMware-VIPerl-1.6.0-104313.i386.tar.gz
cd vmware-viperl-distrib/
./vmware-install.pl

Follow the on screen instructions, described below:

“Creating a new VMware VIPerl Toolkit installer database using the tar4 format.
Installing VMware VIPerl Toolkit.
You must read and accept the VMware VIPerl Toolkit End User License Agreement to continue.
Press enter to display it.”

“Read through the License Agreement”
“Do you accept? (yes/no)”

yes

“In which directory do you want to install the executable files? [/usr/bin]”

The following Perl modules were found on the system but may be too old to work
with VIPerl:

Crypt::SSLeay
Compress::Zlib

The installation of VMware VIPerl Toolkit 1.6.0 build-104313 for Linux
completed successfully. You can decide to remove this software from your system
at any time by invoking the following command: /usr/bin/vmware-uninstall-viperl.pl.

Enjoy,

–the VMware team

Note: “Crypt::SSLeay” and “Compress::Zlib” are not needed for check_esx3 to work.
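
After the installation it can be useful to confirm that Perl actually finds the toolkit modules the plugin loads (VMware::VILib and WSMan::StubOps, as named in the script below). The following is only a minimal sketch; the helper name module_available is made up for this example and is not part of VIPerl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of VIPerl): returns 1 if the named
# module can be loaded from @INC, 0 otherwise.
sub module_available {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # Foo::Bar -> Foo/Bar.pm
    return eval { require $file; 1 } ? 1 : 0;
}

# The two modules the plugin pulls in via "use"
for my $module ('VMware::VILib', 'WSMan::StubOps') {
    printf "%-16s %s\n", $module,
           module_available($module) ? 'found' : 'MISSING';
}
```

If one of the modules is reported as MISSING, the plugin will fail at compile time, so it is worth checking this as the user Nagios runs the plugin as.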

check_esx4_storage

Copy the file to your nagios/libexec directory, fix the owner, and make it executable.

#!/usr/bin/perl
# ##############################################################################
# 2009-10-20 Lars Michelsen <lars@vertical-visions.de>
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.
#
# GNU General Public License: http://www.gnu.org/licenses/gpl-2.0.txt
#
# ##############################################################################
# SCRIPT:          check_esx4_storage.pl
# VERSION:         1.0
# AUTHOR:          Lars Michelsen
# DESCRIPTION:     Checks the storage health status in VMWare ESX4 servers using
#                  the VMware VIPerl toolkit. The script has been written for
#                  checking HP DL380g5 server with built-in P400 controller
#                  Inspired by the Hardware.pl found on
#                  <http://communities.vmware.com/docs/DOC-10665>
# BUGS:            Please report bugs on <http://www.nagios-portal.org>
# CHANGES:
# 2009-10-20 v1.0  Initial code
# ##############################################################################
 
use strict;
use warnings;
use VMware::VILib;
use WSMan::StubOps;
 
$Util::script_version = "1.0";
 
#
# Nagios specific definitions
#
 
my %ERRORS = ('OK' => 0,
              'WARNING' => 1,
              'CRITICAL' => 2,
              'UNKNOWN' => 3);
 
my %ERRORCODES = (0 => 'OK',
                  1 => 'WARNING',
                  2 => 'CRITICAL',
                  3 => 'UNKNOWN');
 
my %HEALTHSTATUS2NAGIOSCODE = ('Unknown' => 3,
                               'OK' => 0,
                               'Degraded/Warning' => 1,
                               'Minor failure' => 1,
                               'Major failure' => 2,
                               'Critical failure' => 2,
                               'Non-recoverable error' => 2);
 
my $output = '';
my $perfdata = '';
my $exitCode = 0;
 
#
# VMWare API definitions
#
 
my @classes = ("VMware_Controller","VMware_StorageExtent","VMware_StorageVolume","VMware_SASSATAPort");
 
my %healthstatus=(0 => "Unknown", 5 => "OK",
                  10 => "Degraded/Warning",
                  15 => "Minor failure",
                  20 => "Major failure",
                  25 => "Critical failure",
                  30 => "Non-recoverable error");
 
my %hardwaregroup=("VMware_Controller" => "Storage",
                   "VMware_StorageExtent" => "",
                   "VMware_StorageVolume" => "",
                   "VMware_SASSATAPort" => "");
 
my @operationalstatus = ("Unknown", "Other", "OK", "Degraded", "Stressed",
                         "Predictive Failure", "Error", "Non-Recoverable Error",
                         "Starting", "Stopping", "Stopped", "In Service",
                         "No Contact", "Lost Communication", "Aborted", "Dormant",
                         "Supporting Entity in Error", "Completed", "Power Mode",
                         "DMTF Reserved", "Vendor Reserved");
 
# General variable Declaration
 
my $client;
 
my %opts = (
   namespace  => {
      type     => "=s",
      help     => "Namespace for all queries. Default is: root/cimv2",
      required => 0,
      default => "root/cimv2",
   },
   timeout  => {
      type  => "=s",
      help  => "Default http timeout for all the queries. Default is 120",
      required => 0,
      default => "120"
   }
);
 
Opts::set_option('protocol', 'http');
Opts::set_option('servicepath','/wsman');
Opts::set_option('portnumber', '80');
Opts::add_options(%opts);
Opts::parse();
 
# validate() would use STDIN for input of username and password
# This should not be done. Instead print the usage and terminate
if(!Opts::get_option('username') || !Opts::get_option('password')) {
	print "ERROR: The options username or password are not set\n";
	Opts::usage();
	exit($ERRORS{UNKNOWN})
}
 
Opts::validate();
 
 
################################################################################
# Main
################################################################################
 
# Connect to ESX host
createConnection();
 
# Get hardware information
my @hw = @{getStorageHardware()};
 
# Catch no hardware information error ($#hw is the last index, -1 when the list is empty)
if($#hw < 0) {
	$output = 'No storage Hardware information found';
	$exitCode = $ERRORS{UNKNOWN};
}
 
# Loop all hardware devices and build the output string
my $elemCode = 0;
foreach my $hw (@hw) {
	# DEBUG:
	#print $hw->{instanceName}."\n";
	#print $hw->{elementName}."\n";
	#print $hw->{healthStatus}."\n";
	#print $hw->{operationalStatus}."\n";
 
	# Translate VMware health status to Nagios status code
	$elemCode = $HEALTHSTATUS2NAGIOSCODE{$hw->{healthStatus}};
 
	# Build summary output
	$output .=  $ERRORCODES{$elemCode} . ': '. $hw->{elementName}."\n";
 
	# Build summary status
	if($elemCode > $exitCode) {
		$exitCode = $elemCode;
	}
}
 
# Print the Nagios output
if($perfdata ne '') {
	$output .= ' | '.$perfdata
}
print $ERRORCODES{$exitCode}. ': Summary status is ' . $ERRORCODES{$exitCode} . ". " .
      "For details take a look at the long output.\n" . $output . "\n";
exit($exitCode);
 
################################################################################
# Subs
################################################################################
 
sub getStorageHardware {
	my @ret = ();
 
	my $healthStatus = "";
	my $operationalStatus = "";
	my $instanceName = "";
	my $elementName = "";
 
	# Loop all classes which should be queried
	foreach my $class (@classes) {
		# Read all instances of the class
		my @details = $client->EnumerateInstances(class_name => $class);
 
		# Loop all elements in the instance
		foreach my $elem (@details) {
			# Don't handle empty elements
			if($elem && $elem ne "") {
				# Instance name is the type of the object
				$instanceName = (keys(%{$elem}))[0];
 
				# Display Name of the element
				#
				# e.g.
				# HP Smart Array P400 Controller : HPSA1
				# Disk 1 on HPSA1 : Port 1I Box 1 Bay 8 : 136GB : Spare Disk
				$elementName = $elem->{$instanceName}->{ElementName};
 
				# Shorten the display name for nice output
				#if(length($elementName) > 43) {
				#	$elementName = substr($elementName, 0, 40);
				#	$elementName  = $elementName . "...";
				#}
 
				# Health information available?
				# When it is: Gather the status code and translate to VMware status description
				if($elem->{$instanceName}->{HealthState} && exists $healthstatus{$elem->{$instanceName}->{HealthState}}) {
					$healthStatus = $healthstatus{$elem->{$instanceName}->{HealthState}};
				} else {
					$healthStatus = "Unknown";
				}
 
				# Operational status available?
				# When it is: Gather the status code and translate to VMware status description
				if($elem->{$instanceName}->{OperationalStatus} && $elem->{$instanceName}->{OperationalStatus} <= (scalar(@operationalstatus)-1)) {
					$operationalStatus = $operationalstatus[$elem->{$instanceName}->{OperationalStatus}];
				} else {
					$operationalStatus = "Unknown";
				}
 
				push(@ret, {'instanceName' => $instanceName, 'elementName' => $elementName, 'healthStatus' => $healthStatus, 'operationalStatus' => $operationalStatus});
			}
		}
	}
 
	return \@ret;
}
 
sub createConnection {
	# Set the connection parameters from the environment
	my %args = (
	  path => Opts::get_option ('servicepath'),
	  username => Opts::get_option ('username'),
	  password => Opts::get_option ('password'),
	  port => Opts::get_option ('portnumber'),
	  address => Opts::get_option ('server'),
	  namespace => Opts::get_option('namespace'),
	  timeout  => Opts::get_option('timeout')
	);
 
	# Create the connection object in the client.
	$client = WSMan::GenericOps->new(%args);
 
	# Register extra CIM namespaces that the WS-Management server might require.
	$client->register_class_ns(OMC => 'http://schema.omc-project.org/wbem/wscim/1/cim-schema/2',
	                           VMware => 'http://schemas.vmware.com/wbem/wscim/1/cim-schema/2',
	                           ELXHBA => 'http://schemas.emulex.org/wbem/wscim/1/cim-schema/2');
}
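
The status translation in the script can be tried without an ESXi host. The sketch below copies the two lookup tables from the plugin and wraps them in two helper functions. The names health_to_nagios and summary_code are made up for this example, and the input values are invented test data rather than output from a real server.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# CIM HealthState value => description (same values as %healthstatus)
my %health_text = (0 => 'Unknown', 5 => 'OK', 10 => 'Degraded/Warning',
                   15 => 'Minor failure', 20 => 'Major failure',
                   25 => 'Critical failure', 30 => 'Non-recoverable error');

# Description => Nagios exit code (same values as %HEALTHSTATUS2NAGIOSCODE)
my %nagios_code = ('Unknown' => 3, 'OK' => 0, 'Degraded/Warning' => 1,
                   'Minor failure' => 1, 'Major failure' => 2,
                   'Critical failure' => 2, 'Non-recoverable error' => 2);

# Hypothetical helper: translate one raw HealthState value to a Nagios code.
# Missing or unrecognized values are treated as 'Unknown'.
sub health_to_nagios {
    my ($state) = @_;
    my $text = (defined $state && exists $health_text{$state})
             ? $health_text{$state} : 'Unknown';
    return $nagios_code{$text};
}

# Hypothetical helper: the numerically highest code wins,
# like the comparison in the plugin's main loop.
sub summary_code {
    my @states = @_;
    my $worst = 0;
    for my $state (@states) {
        my $code = health_to_nagios($state);
        $worst = $code if $code > $worst;
    }
    return $worst;
}

print summary_code(5, 5, 10), "\n";   # two OK elements, one degraded
print summary_code(5, 25, 0), "\n";   # OK, critical failure, unknown
```

Because the comparison is numeric, an element with an unknown state (code 3) outranks a critical one (code 2) in the summary, which is also how the plugin's main loop behaves.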

Sample output

The simplest way to use the script is to call it like this:

# ./check_esx4_storage.pl --server esx4i-test.mydomain.com --username monitoring --password <PASSWORD>

The output on my test system looks like this:

OK: Summary status is OK. For details take a look at the long output.
OK: HP Smart Array P400 Controller : HPSA1
OK: Disk 1 on HPSA1 : Port 1I Box 1 Bay 8 : 136GB : Spare Disk
OK: Disk 2 on HPSA1 : Port 1I Box 1 Bay 7 : 136GB : Data Disk
OK: Disk 3 on HPSA1 : Port 1I Box 1 Bay 6 : 136GB : Data Disk
OK: Disk 4 on HPSA1 : Port 1I Box 1 Bay 5 : 136GB : Data Disk
OK: Disk 5 on HPSA1 : Port 2I Box 1 Bay 4 : 136GB : Data Disk
OK: Disk 6 on HPSA1 : Port 2I Box 1 Bay 3 : 136GB : Data Disk
OK: Disk 7 on HPSA1 : Port 2I Box 1 Bay 2 : 136GB : Data Disk
OK: Disk 8 on HPSA1 : Port 2I Box 1 Bay 1 : 136GB : Data Disk
OK: Logical Volume 1 on HPSA1 : RAID 5 : 820GB : Disk 2,3,4,5,6,7,8,1

Since this plugin uses multiline output, only the first line, “OK: Summary status is OK. For details take a look at the long output.”, is shown on the status overview page. The long output with all lines is shown only on the service detail page.
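
To make the split between the summary line and the long output more concrete, here is a small sketch that assembles plugin output the same way: one summary line first, then one line per element. The helper name format_plugin_output is made up for this example, and the detail lines are taken from the sample output above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %status_name = (0 => 'OK', 1 => 'WARNING', 2 => 'CRITICAL', 3 => 'UNKNOWN');

# Hypothetical helper: the first line is the summary, all following
# lines form the long output.
sub format_plugin_output {
    my ($code, @details) = @_;
    my $summary = "$status_name{$code}: Summary status is $status_name{$code}. "
                . "For details take a look at the long output.";
    return join("\n", $summary, @details) . "\n";
}

my $text = format_plugin_output(0,
    'OK: HP Smart Array P400 Controller : HPSA1',
    'OK: Disk 2 on HPSA1 : Port 1I Box 1 Bay 7 : 136GB : Data Disk');
print $text;

# Only the first line ends up on the status overview page
my ($first_line) = split /\n/, $text;
print "Overview shows: $first_line\n";
```

Keeping the summary line short and generic means the overview page stays readable no matter how many disks the controller reports.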

Comments (1)
  1. crush
    11:51 on October 26th, 2009

    Great script! Thank you!

    For Debian/Ubuntu, you need to install libcrypt-ssleay-perl, libsoap-lite-perl, libuuid-perl, libdata-dump-perl

    Tested on a ml370G6 + P410i
