check_esx4_storage

On this page you will find the current version of the ESX4 storage check plugin.

Idea

We run VMware ESXi 4 on DL380G5 hardware. ESXi is hosted on a USB stick, and the internal P400 controller serves the storage for the virtual machines. This solution has worked very well for us since version 3. Shortly after the release of version 3, several Nagios plugins were released for checking the health of the ESX host and the virtual machines.

It was not possible for us to fetch the storage information from ESXi 3, so we had to wait for the release of ESXi 4. And indeed, it is now possible to fetch the storage information from ESXi.

Solution

To bring the status information to Nagios I wrote a small Nagios plugin which uses the VMware Infrastructure Perl Toolkit to gather this information from the ESXi servers.

Screenshots: check_esx4_storage-summary, check_esx4_storage-detail

Prerequisites

You need the VMware Infrastructure Perl Toolkit installed on your Nagios server to get the plugin working. I installed VIPerl following the howto included in check_esx3 from op5:

Download the latest version of the Perl Toolkit from the VMware support page. In this example we use VMware-VIPerl-1.6.0-104313.i386.tar.gz, but the instructions should apply to newer versions as well. Upload the file to your Nagios server’s /root dir and execute:

    cd /root
    tar xvzf VMware-VIPerl-1.6.0-104313.i386.tar.gz
    cd vmware-viperl-distrib/
    ./vmware-install.pl

Follow the on-screen instructions, described below:

    “Creating a new VMware VIPerl Toolkit installer database using the tar4 format. Installing VMware VIPerl Toolkit. You must read and accept the VMware VIPerl Toolkit End User License Agreement to continue. Press enter to display it.”

    “Read through the License Agreement”

    “Do you accept? (yes/no) yes”

    “In which directory do you want to install the executable files? [/usr/bin]”

    The following Perl modules were found on the system but may be too old to work with VIPerl: Crypt::SSLeay Compress::Zlib

    The installation of VMware VIPerl Toolkit 1.6.0 build-104313 for Linux completed successfully. You can decide to remove this software from your system at any time by invoking the following command: /usr/bin/vmware-uninstall-viperl.pl.

    Enjoy, --the VMware team

Note: “Crypt::SSLeay” and “Compress::Zlib” are not needed for check_esx3 to work.
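Once the installer has finished, it can be worth confirming that Perl actually finds the toolkit before the plugin is wired into Nagios. The following is a minimal sketch and not part of the toolkit: the helper name missing_modules is made up for illustration, and the two module names are simply the ones the plugin below loads with `use`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of VIPerl): returns the names of the
# given modules that this Perl installation cannot load.
sub missing_modules {
    my @missing;
    foreach my $module (@_) {
        # Turn Some::Module into Some/Module.pm so that require()
        # accepts a module name held in a variable.
        (my $file = "$module.pm") =~ s{::}{/}g;
        push @missing, $module unless eval { require $file; 1 };
    }
    return @missing;
}

# The two modules check_esx4_storage.pl loads at startup.
my @missing = missing_modules('VMware::VILib', 'WSMan::StubOps');

if (@missing) {
    print "Not loadable: @missing\n";
} else {
    print "All toolkit modules could be loaded\n";
}
```

If a module is reported as not loadable, the installation step above most likely did not complete, or a dependency is missing on the system.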

check_esx4_storage

Copy the file to your nagios/libexec directory, fix owner and make it executable.

#!/usr/bin/perl
# ##############################################################################
# 2009-10-20 Lars Michelsen <lars@vertical-visions.de>
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.
#
# GNU General Public License: http://www.gnu.org/licenses/gpl-2.0.txt
#
# ##############################################################################
# SCRIPT:          check_esx4_storage.pl
# VERSION:         1.0
# AUTHOR:          Lars Michelsen
# DESCRIPTION:     Checks the storage health status in VMware ESX4 servers using
#                  the VMware VIPerl toolkit. The script has been written for
#                  checking HP DL380g5 servers with a built-in P400 controller.
#                  Inspired by the Hardware.pl found on
#                  <http://communities.vmware.com/docs/DOC-10665>
# BUGS:            Please report bugs on <http://www.nagios-portal.org>
# CHANGES:
# 2009-10-20 v1.0  Initial code
# ##############################################################################
 
use strict;
use warnings;
use VMware::VILib;
use WSMan::StubOps;
 
$Util::script_version = "1.0";
 
#
# Nagios specific definitions
#
 
my %ERRORS = ('OK' => 0,
              'WARNING' => 1,
              'CRITICAL' => 2,
              'UNKNOWN' => 3);
 
my %ERRORCODES = (0 => 'OK',
                  1 => 'WARNING',
                  2 => 'CRITICAL',
                  3 => 'UNKNOWN');
 
my %HEALTHSTATUS2NAGIOSCODE = ('Unknown' => 3,
                               'OK' => 0,
                               'Degraded/Warning' => 1,
                               'Minor failure' => 1,
                               'Major failure' => 2,
                               'Critical failure' => 2,
                               'Non-recoverable error' => 2);
 
my $output = '';
my $perfdata = '';
my $exitCode = 0;
 
#
# VMware API definitions
#
 
my @classes = ("VMware_Controller","VMware_StorageExtent","VMware_StorageVolume","VMware_SASSATAPort");
 
my %healthstatus=(0 => "Unknown", 5 => "OK",
                  10 => "Degraded/Warning",
                  15 => "Minor failure",
                  20 => "Major failure",
                  25 => "Critical failure",
                  30 => "Non-recoverable error");
 
my %hardwaregroup=("VMware_Controller" => "Storage",
                   "VMware_StorageExtent" => "",
                   "VMware_StorageVolume" => "",
                   "VMware_SASSATAPort" => "");
 
my @operationalstatus = ("Unknown", "Other", "OK", "Degraded", "Stressed",
                         "Predictive Failure", "Error", "Non-Recoverable Error",
                         "Starting", "Stopping", "Stopped", "In Service",
                         "No Contact", "Lost Communication", "Aborted", "Dormant",
                         "Supporting Entity in Error", "Completed", "Power Mode",
                         "DMTF Reserved", "Vendor Reserved");
 
# General variable Declaration
 
my $client;
 
my %opts = (
   namespace  => {
      type     => "=s",
      help     => "Namespace for all queries. Default is root/cimv2",
      required => 0,
      default => "root/cimv2",
   },
   timeout  => {
      type  => "=s",
      help  => "Default http timeout for all the queries. Default is 120",
      required => 0,
      default => "120"
   }
);
 
Opts::set_option('protocol', 'http');
Opts::set_option('servicepath','/wsman');
Opts::set_option('portnumber', '80');
Opts::add_options(%opts);
Opts::parse();
 
# Opts::validate() would prompt on STDIN for a missing username or password.
# A Nagios plugin must not wait for input, so print the usage and terminate.
if(!Opts::get_option('username') || !Opts::get_option('password')) {
	print "ERROR: The options username or password are not set\n";
	Opts::usage();
	exit($ERRORS{UNKNOWN});
}
 
Opts::validate();
 
 
################################################################################
# Main
################################################################################
 
# Connect to ESX host
createConnection();
 
# Get hardware information
my @hw = @{getStorageHardware()};
 
# Catch no hardware information error
# ($#hw is the last index, so "$#hw <= 0" would also match a single element)
if(scalar(@hw) == 0) {
	$output = 'No storage Hardware information found';
	$exitCode = $ERRORS{UNKNOWN};
}
 
# Loop all hardware devices and build the output string
my $elemCode = 0;
foreach my $hw (@hw) {
	# DEBUG:
	#print $hw->{instanceName}."\n";
	#print $hw->{elementName}."\n";
	#print $hw->{healthStatus}."\n";
	#print $hw->{operationalStatus}."\n";
 
	# Translate VMware health status to Nagios status code
	$elemCode = $HEALTHSTATUS2NAGIOSCODE{$hw->{healthStatus}};
 
	# Build summary output
	$output .=  $ERRORCODES{$elemCode} . ': '. $hw->{elementName}."\n";
 
	# Build summary status
	if($elemCode > $exitCode) {
		$exitCode = $elemCode;
	}
}
 
# Print the Nagios output
if($perfdata ne '') {
	$output .= ' | '.$perfdata
}
print $ERRORCODES{$exitCode}. ': Summary status is ' . $ERRORCODES{$exitCode} . ". " .
      "For details take a look at the long output.\n" . $output . "\n";
exit($exitCode);
 
################################################################################
# Subs
################################################################################
 
sub getStorageHardware {
	my @ret = ();
 
	my $healthStatus = "";
	my $operationalStatus = "";
	my $instanceName = "";
	my $elementName = "";
 
	# Loop all classes which should be queried
	foreach my $class (@classes) {
		# Read all instances of the class
		my @details = $client->EnumerateInstances(class_name => $class);
 
		# Loop all elements in the instance
		foreach my $elem (@details) {
			# Don't handle empty elements
			if($elem && $elem ne "") {
				# Instance name is the type of the object
				$instanceName = (keys(%{$elem}))[0];
 
				# Display Name of the element
				#
				# e.g.
				# HP Smart Array P400 Controller : HPSA1
				# Disk 1 on HPSA1 : Port 1I Box 1 Bay 8 : 136GB : Spare Disk
				$elementName = $elem->{$instanceName}->{ElementName};
 
				# Shorten the display name for nice output
				#if(length($elementName) > 43) {
				#	$elementName = substr($elementName, 0, 40);
				#	$elementName  = $elementName . "...";
				#}
 
				# Health information available?
				# When it is: Gather the status code and translate to VMware status description
				if($elem->{$instanceName}->{HealthState} && exists $healthstatus{$elem->{$instanceName}->{HealthState}}) {
					$healthStatus = $healthstatus{$elem->{$instanceName}->{HealthState}};
				} else {
					$healthStatus = "Unknown";
				}
 
				# Operational status available?
				# When it is: Gather the status code and translate to VMware status description
				if($elem->{$instanceName}->{OperationalStatus} && $elem->{$instanceName}->{OperationalStatus} <= (scalar(@operationalstatus)-1)) {
					$operationalStatus = $operationalstatus[$elem->{$instanceName}->{OperationalStatus}];
				} else {
					$operationalStatus = "Unknown";
				}
 
				push(@ret, {'instanceName' => $instanceName, 'elementName' => $elementName, 'healthStatus' => $healthStatus, 'operationalStatus' => $operationalStatus});
			}
		}
	}
 
	return \@ret;
}
 
sub createConnection {
	# Set the connection parameters from the environment
	my %args = (
	  path => Opts::get_option ('servicepath'),
	  username => Opts::get_option ('username'),
	  password => Opts::get_option ('password'),
	  port => Opts::get_option ('portnumber'),
	  address => Opts::get_option ('server'),
	  namespace => Opts::get_option('namespace'),
	  timeout  => Opts::get_option('timeout')
	);
 
	# Create the connection object in the client.
	$client = WSMan::GenericOps->new(%args);
 
	# Register extra CIM namespaces that the WS-Management server might require.
	$client->register_class_ns(OMC => 'http://schema.omc-project.org/wbem/wscim/1/cim-schema/2',
	                           VMware => 'http://schemas.vmware.com/wbem/wscim/1/cim-schema/2',
	                           ELXHBA => 'http://schemas.emulex.org/wbem/wscim/1/cim-schema/2');
}
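
A side note on the summary status computed in the main loop: the highest numeric code wins, so a single element reporting UNKNOWN (3) outranks another one reporting CRITICAL (2). Whether that is desirable depends on the setup. The following is a minimal sketch of one possible alternative, assuming CRITICAL should take precedence; the %SEVERITY table and the worst_code helper are hypothetical names and not part of the plugin.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Nagios return codes, as defined in the plugin above.
my %ERRORS = (OK => 0, WARNING => 1, CRITICAL => 2, UNKNOWN => 3);

# Assumed ranking for this sketch: CRITICAL counts as worse than UNKNOWN,
# which differs from a plain numeric comparison of the return codes.
my %SEVERITY = (0 => 0, 1 => 1, 3 => 2, 2 => 3);

# Returns the most severe Nagios code from a list of element codes.
# An empty list yields UNKNOWN, because nothing could be checked.
sub worst_code {
    my @codes = @_;
    return $ERRORS{UNKNOWN} unless @codes;

    my $worst = $ERRORS{OK};
    foreach my $code (@codes) {
        $worst = $code if $SEVERITY{$code} > $SEVERITY{$worst};
    }
    return $worst;
}

# Example: one failed disk plus one element without health data.
print worst_code($ERRORS{OK}, $ERRORS{UNKNOWN}, $ERRORS{CRITICAL}), "\n";   # prints 2
```

With the plain numeric comparison the same three element codes would result in 3 (UNKNOWN).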

Sample output

The simplest way to use the script is to call it like this:

# ./check_esx4_storage.pl --server esx4i-test.mydomain.com --username monitoring --password <PASSWORD>

The output on my test system looks like this:

OK: Summary status is OK. For details take a look at the long output.
OK: HP Smart Array P400 Controller : HPSA1
OK: Disk 1 on HPSA1 : Port 1I Box 1 Bay 8 : 136GB : Spare Disk
OK: Disk 2 on HPSA1 : Port 1I Box 1 Bay 7 : 136GB : Data Disk
OK: Disk 3 on HPSA1 : Port 1I Box 1 Bay 6 : 136GB : Data Disk
OK: Disk 4 on HPSA1 : Port 1I Box 1 Bay 5 : 136GB : Data Disk
OK: Disk 5 on HPSA1 : Port 2I Box 1 Bay 4 : 136GB : Data Disk
OK: Disk 6 on HPSA1 : Port 2I Box 1 Bay 3 : 136GB : Data Disk
OK: Disk 7 on HPSA1 : Port 2I Box 1 Bay 2 : 136GB : Data Disk
OK: Disk 8 on HPSA1 : Port 2I Box 1 Bay 1 : 136GB : Data Disk
OK: Logical Volume 1 on HPSA1 : RAID 5 : 820GB : Disk 2,3,4,5,6,7,8,1

Since this plugin uses multiline output, only the first line, “OK: Summary status is OK. For details take a look at the long output.”, is shown on the status overview page. The long output, including all the lines, is shown only on the service detail page.
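
The split between the first line and the long output can be illustrated with a small sketch. It assumes the multi-line plugin output layout introduced with Nagios 3, in which everything up to the first newline is the status text and the remaining lines are the long output. The helper name format_plugin_output is hypothetical, and placing the optional performance data on the first line is one choice among several; the plugin above appends it to the end of the long output instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: assembles plugin output in the multi-line layout.
#   $summary  - first line, the only one shown on the status overview
#   $long     - array ref with the detail lines (long output)
#   $perfdata - optional performance data, appended to the first line
sub format_plugin_output {
    my ($summary, $long, $perfdata) = @_;

    my $text = $summary;
    $text .= ' | ' . $perfdata if defined $perfdata && $perfdata ne '';
    $text .= "\n";

    # Every further line ends up in the long output.
    $text .= "$_\n" foreach @{$long};
    return $text;
}

print format_plugin_output(
    'OK: Summary status is OK. For details take a look at the long output.',
    [ 'OK: HP Smart Array P400 Controller : HPSA1',
      'OK: Logical Volume 1 on HPSA1 : RAID 5 : 820GB : Disk 2,3,4,5,6,7,8,1' ],
);
```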

Comments (12)
  1. crush
    11:51 on October 26th, 2009

    Great script! Thank you!

    For Debian/Ubuntu, you need to install libcrypt-ssleay-perl, libsoap-lite-perl, libuuid-perl, libdata-dump-perl

    Tested on a ml370G6 + P410i

  2. Benny
    14:18 on January 17th, 2011

    After the vSphere update to 4.1, we get the following error: “401 Unauthorized at /usr/share/perl/5.10/WSMan/WSBasic.pm line 199”

    Has anyone tried the plugin after the update to vSphere 4.1?

  3. Scott
    08:18 on March 3rd, 2011

    Is there a way to run as a non-administrative user? I keep getting the error “401 Unauthorized at /usr/share/perl/5.10/WSMan/WSBasic.pm line 199”

  4. tobias
    10:36 on July 15th, 2011

    Hi,

    Nice plugin, but sometimes I get the following error:

    Additional Info:

    (Return code of 104 is out of bounds)

    Any idea?

  5. tobi
    09:45 on August 9th, 2011

    Hi, I am getting the same error with vSphere 4.1 when connecting to a single ESX server: 500 Can’t connect to esx-****:443 (certificate verify failed) at /usr/lib/perl5/5.10.0/WSMan/WSBasic.pm line 199

    Is there a way to get around this?

    Best Regards, T.

  6. Ralf
    14:34 on August 24th, 2011

    set this in your Perl Script

    $ENV{PERL_LWP_SSL_VERIFY_HOSTNAME} = 0;

  7. Ralf
    14:36 on August 24th, 2011

    write this in your Perl Script

    $ENV{PERL_LWP_SSL_VERIFY_HOSTNAME} = 0;

  8. van der Kamp
    09:12 on November 29th, 2011

    Hi!

    I’m trying to monitor a DL 380 G4 running ESXi 4.1.0 (348481), but I only get an “UNKNOWN: Summary status is UNKNOWN. No storage Hardware information found” from the plugin.

    On another Server (DL380 G7) everything works fine.

    Do you have any suggestions? Do I have to configure anything to be able to read the storage status information?

    Best Regards, M. van der Kamp

  9. LaMi
    19:14 on December 1st, 2011

    Mhm. Have no real idea. Maybe there is no support for the G4 in the VI perl toolkit.

  10. fireskyer
    16:07 on December 6th, 2011

    Hello LaMi,

    I tried to monitor our hp proliant d460 g1 machines with ESXi 5.0:

    ./check_esx4_storage.pl --server es.li.gov --username xxx --password xxxx

    but I get the following error:

    UNKNOWN: Summary status is UNKNOWN. For details take a look at the long output. No storage Hardware information found

    Is the script still working with ESXi 5.0?

    best regards

  11. van der Kamp
    08:33 on December 8th, 2011

    Hi, thank you for the quick feedback. I just checked the sensor overview in the hardware tabs inside the vSphere client, and there is no information about the storage. So I guess that when I see no information there, I just can’t read it with the VI Perl toolkit either.

    Since the server is not on the compatibility list, this is no surprise…

    Thank you anyway! Michael

  12. Miles
    21:35 on December 29th, 2011

    Has anyone tried this script with ESXi 5? We’ve migrated a couple of boxes here which worked at version 4, but not now. (I know it says “esx4” right in the name.) I’m just curious if anyone’s made it work with 5.
