check_esx4_storage
On this page you will find the current version of the ESX4 storage check plugin for Nagios.
Idea
We run VMware ESXi 4 on HP DL380 G5 hardware. ESXi itself is hosted on a USB stick, and the internal P400 controller serves the storage for the virtual machines. This setup has worked very well for us since version 3. Shortly after the release of version 3, several Nagios plugins were released for checking the health of the ESX host and the virtual machines.
We could not fetch this information from ESXi 3, so we had to wait for the release of ESXi 4. And indeed, it is now possible to fetch the storage information from ESXi.
Solution
To bring the status information into Nagios, I wrote a small plugin that uses the VMware Infrastructure Perl Toolkit (VIPerl) to gather this information from the ESXi servers.
Prerequisites
You need to have the VMware Infrastructure Perl Toolkit installed on your Nagios server for the plugin to work. I installed VIPerl following the howto included in check_esx3 from op5:
Download the latest version of the Perl Toolkit from the VMware support page.
In this example we use VMware-VIPerl-1.6.0-104313.i386.tar.gz,
but the instructions should apply to newer versions as well.
Upload the file to your Nagios server’s /root dir and execute:
cd /root
tar xvzf VMware-VIPerl-1.6.0-104313.i386.tar.gz
cd vmware-viperl-distrib/
./vmware-install.pl
Follow the on-screen instructions, described below:
“Creating a new VMware VIPerl Toolkit installer database using the tar4 format.
Installing VMware VIPerl Toolkit.
You must read and accept the VMware VIPerl Toolkit End User License Agreement to continue.
Press enter to display it.”
“Read through the License Agreement”
“Do you accept? (yes/no)” yes
“In which directory do you want to install the executable files? [/usr/bin]”
The following Perl modules were found on the system but may be too old to work
with VIPerl:
Crypt::SSLeay
Compress::Zlib
The installation of VMware VIPerl Toolkit 1.6.0 build-104313 for Linux
completed successfully. You can decide to remove this software from your system
at any time by invoking the following command:
/usr/bin/vmware-uninstall-viperl.pl.
Enjoy,
--the VMware team
Note: “Crypt::SSLeay” and “Compress::Zlib” are not needed for check_esx3 to work.
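Before deploying the plugin, it can be useful to confirm that the Perl modules it loads are visible to the Perl interpreter on the Nagios server. The following is only a minimal sketch and not part of the toolkit: the helper name missing_modules is made up for illustration, and the two module names are taken from the "use" lines of the plugin script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of VIPerl): returns the names of those
# modules from the given list that cannot be loaded on this system.
sub missing_modules {
    my @modules = @_;
    my @missing;
    foreach my $module (@modules) {
        # Turn Foo::Bar into Foo/Bar.pm so that require accepts it as a path
        (my $file = "$module.pm") =~ s{::}{/}g;
        # require dies if the module is absent or broken; eval catches that
        push @missing, $module unless eval { require $file; 1 };
    }
    return @missing;
}

# The two modules the plugin loads with "use"
my @missing = missing_modules('VMware::VILib', 'WSMan::StubOps');
if (@missing) {
    print "Missing Perl modules: " . join(', ', @missing) . "\n";
} else {
    print "All required Perl modules can be loaded\n";
}
```

If a module is reported as missing, the toolkit installation (or one of its dependencies) is probably incomplete for the Perl that Nagios uses.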
check_esx4_storage
Copy the file to your nagios/libexec directory, fix the owner, and make it executable.
#!/usr/bin/perl
# ##############################################################################
# 2009-10-20 Lars Michelsen <lars@vertical-visions.de>
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
#
# GNU General Public License: http://www.gnu.org/licenses/gpl-2.0.txt
#
# ##############################################################################
# SCRIPT:      check_esx4_storage.pl
# VERSION:     1.0
# AUTHOR:      Lars Michelsen
# DESCRIPTION: Checks the storage health status in VMWare ESX4 servers using
#              the VMware VIPerl toolkit. The script has been written for
#              checking HP DL380g5 server with built-in P400 controller
#              Inspired by the Hardware.pl found on
#              <http://communities.vmware.com/docs/DOC-10665>
# BUGS:        Please report bugs on <http://www.nagios-portal.org>
# CHANGES:
# 2009-10-20 v1.0 Initial code
# ##############################################################################

use strict;
use warnings;
use VMware::VILib;
use WSMan::StubOps;

$Util::script_version = "1.0";

#
# Nagios specific definitions
#
my %ERRORS = ('OK' => 0, 'WARNING' => 1, 'CRITICAL' => 2, 'UNKNOWN' => 3);
my %ERRORCODES = (0 => 'OK', 1 => 'WARNING', 2 => 'CRITICAL', 3 => 'UNKNOWN');
my %HEALTHSTATUS2NAGIOSCODE = ('Unknown'               => 3,
                               'OK'                    => 0,
                               'Degraded/Warning'      => 1,
                               'Minor failure'         => 1,
                               'Major failure'         => 2,
                               'Critical failure'      => 2,
                               'Non-recoverable error' => 2);

my $output = '';
my $perfdata = '';
my $exitCode = 0;

#
# VMWare API definitions
#
my @classes = ("VMware_Controller", "VMware_StorageExtent",
               "VMware_StorageVolume", "VMware_SASSATAPort");
my %healthstatus = (0  => "Unknown",
                    5  => "OK",
                    10 => "Degraded/Warning",
                    15 => "Minor failure",
                    20 => "Major failure",
                    25 => "Critical failure",
                    30 => "Non-recoverable error");
my %hardwaregroup = ("VMware_Controller"    => "Storage",
                     "VMware_StorageExtent" => "",
                     "VMware_StorageVolume" => "",
                     "VMware_SASSATAPort"   => "");
my @operationalstatus = ("Unknown", "Other", "OK", "Degraded", "Stressed",
                         "Predictive Failure", "Error", "Non-Recoverable Error",
                         "Starting", "Stopping", "Stopped", "In Service",
                         "No Contact", "Lost Communication", "Aborted", "Dormant",
                         "Supporting Entity in Error", "Completed", "Power Mode",
                         "DMTF Reserved", "Vendor Reserved");

# General variable Declaration
my $client;

my %opts = (
    namespace => {
        type     => "=s",
        help     => "Namespace for all queries. Default is :root/cimv2",
        required => 0,
        default  => "root/cimv2",
    },
    timeout => {
        type     => "=s",
        help     => "Default http timeout for all the queries. Default is 120",
        required => 0,
        default  => "120"
    }
);

Opts::set_option('protocol', 'http');
Opts::set_option('servicepath', '/wsman');
Opts::set_option('portnumber', '80');
Opts::add_options(%opts);
Opts::parse();

# validate() would use STDIN for input of username and password
# This should not be done. Instead print the usage and terminate
if(!Opts::get_option('username') || !Opts::get_option('password')) {
    print "ERROR: The options username or password are not set\n";
    Opts::usage();
    exit($ERRORS{UNKNOWN});
}

Opts::validate();

################################################################################
# Main
################################################################################

# Connect to ESX host
createConnection();

# Get hardware information
my @hw = @{getStorageHardware()};

# Catch no hardware information error (the list is empty)
if(!@hw) {
    $output = 'No storage Hardware information found';
    $exitCode = $ERRORS{UNKNOWN};
}

# Loop all hardware devices and build the output string
my $elemCode = 0;
foreach my $hw (@hw) {
    # DEBUG:
    #print $hw->{instanceName}."\n";
    #print $hw->{elementName}."\n";
    #print $hw->{healthStatus}."\n";
    #print $hw->{operationalStatus}."\n";

    # Translate VMware health status to Nagios status code
    $elemCode = $HEALTHSTATUS2NAGIOSCODE{$hw->{healthStatus}};

    # Build summary output
    $output .= $ERRORCODES{$elemCode} . ': '. $hw->{elementName}."\n";

    # Build summary status
    if($elemCode > $exitCode) {
        $exitCode = $elemCode;
    }
}

# Print the Nagios output
if($perfdata ne '') {
    $output .= ' | '.$perfdata;
}

print $ERRORCODES{$exitCode}. ': Summary status is ' . $ERRORCODES{$exitCode} . ". "
    . "For details take a look at the long output.\n"
    . $output . "\n";
exit($exitCode);

################################################################################
# Subs
################################################################################

sub getStorageHardware {
    my @ret = ();
    my $healthStatus = "";
    my $operationalStatus = "";
    my $instanceName = "";
    my $elementName = "";

    # Loop all classes which should be queried
    foreach my $class (@classes) {
        # Read all instances of the class
        my @details = $client->EnumerateInstances(class_name => $class);

        # Loop all elements in the instance
        foreach my $elem (@details) {
            # Don't handle empty elements
            if($elem && $elem ne "") {
                # Instance name is the type of the object
                $instanceName = (keys(%{$elem}))[0];

                # Display Name of the element
                #
                # e.g.
                # HP Smart Array P400 Controller : HPSA1
                # Disk 1 on HPSA1 : Port 1I Box 1 Bay 8 : 136GB : Spare Disk
                $elementName = $elem->{$instanceName}->{ElementName};

                # Shorten the display name for nice output
                #if(length($elementName) > 43) {
                #    $elementName = substr($elementName, 0, 40);
                #    $elementName = $elementName . "...";
                #}

                # Health information available?
                # When it is: Gather the status code and translate to VMware status description
                if($elem->{$instanceName}->{HealthState}
                   && exists $healthstatus{$elem->{$instanceName}->{HealthState}}) {
                    $healthStatus = $healthstatus{$elem->{$instanceName}->{HealthState}};
                } else {
                    $healthStatus = "Unknown";
                }

                # Operational status available?
                # When it is: Gather the status code and translate to VMware status description
                if($elem->{$instanceName}->{OperationalStatus}
                   && $elem->{$instanceName}->{OperationalStatus} <= (scalar(@operationalstatus)-1)) {
                    $operationalStatus = $operationalstatus[$elem->{$instanceName}->{OperationalStatus}];
                } else {
                    $operationalStatus = "Unknown";
                }

                push(@ret, {'instanceName'      => $instanceName,
                            'elementName'       => $elementName,
                            'healthStatus'      => $healthStatus,
                            'operationalStatus' => $operationalStatus});
            }
        }
    }

    return \@ret;
}

sub createConnection {
    # Set the connection parameters from the environment
    my %args = (
        path      => Opts::get_option('servicepath'),
        username  => Opts::get_option('username'),
        password  => Opts::get_option('password'),
        port      => Opts::get_option('portnumber'),
        address   => Opts::get_option('server'),
        namespace => Opts::get_option('namespace'),
        timeout   => Opts::get_option('timeout')
    );

    # Create the connection object in the client.
    $client = WSMan::GenericOps->new(%args);

    # Register extra CIM namespaces that the WS-Management server might require.
    $client->register_class_ns(OMC    => 'http://schema.omc-project.org/wbem/wscim/1/cim-schema/2',
                               VMware => 'http://schemas.vmware.com/wbem/wscim/1/cim-schema/2',
                               ELXHBA => 'http://schemas.emulex.org/wbem/wscim/1/cim-schema/2');
}
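The heart of the script is the translation of a CIM HealthState number into a Nagios exit code, followed by a worst-case summary over all elements. The sketch below isolates that logic so it can be tried without an ESXi host. The three lookup tables are copied from the script; the function name summarize_health and the sample values are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Lookup tables copied from check_esx4_storage.pl
my %healthstatus = (0 => 'Unknown', 5 => 'OK', 10 => 'Degraded/Warning',
                    15 => 'Minor failure', 20 => 'Major failure',
                    25 => 'Critical failure', 30 => 'Non-recoverable error');
my %HEALTHSTATUS2NAGIOSCODE = ('Unknown' => 3, 'OK' => 0,
                               'Degraded/Warning' => 1, 'Minor failure' => 1,
                               'Major failure' => 2, 'Critical failure' => 2,
                               'Non-recoverable error' => 2);
my %ERRORCODES = (0 => 'OK', 1 => 'WARNING', 2 => 'CRITICAL', 3 => 'UNKNOWN');

# Hypothetical helper: takes a list of HealthState numbers and returns the
# highest Nagios code among them (0 when the list is empty).
sub summarize_health {
    my @states = @_;
    my $exit = 0;
    foreach my $state (@states) {
        # Values missing from the table are treated as "Unknown"
        my $text = (defined $state && exists $healthstatus{$state})
                 ? $healthstatus{$state}
                 : 'Unknown';
        my $code = $HEALTHSTATUS2NAGIOSCODE{$text};
        $exit = $code if $code > $exit;
    }
    return $exit;
}

# Sample values (made up): two healthy elements and one degraded element
my $code = summarize_health(5, 5, 10);
print "$ERRORCODES{$code}\n";   # prints WARNING
```

Because the comparison is purely numeric, an element with an unknown state (code 3) outranks a critical one (code 2) in the summary. The script behaves the same way, which is worth keeping in mind when reading the summary line.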
Sample output
The simplest way to use the script is to call it like this:
# ./check_esx4_storage.pl --server esx4i-test.mydomain.com --username monitoring --password <PASSWORD>
The output on my test system looks like this:
OK: Summary status is OK. For details take a look at the long output.
OK: HP Smart Array P400 Controller : HPSA1
OK: Disk 1 on HPSA1 : Port 1I Box 1 Bay 8 : 136GB : Spare Disk
OK: Disk 2 on HPSA1 : Port 1I Box 1 Bay 7 : 136GB : Data Disk
OK: Disk 3 on HPSA1 : Port 1I Box 1 Bay 6 : 136GB : Data Disk
OK: Disk 4 on HPSA1 : Port 1I Box 1 Bay 5 : 136GB : Data Disk
OK: Disk 5 on HPSA1 : Port 2I Box 1 Bay 4 : 136GB : Data Disk
OK: Disk 6 on HPSA1 : Port 2I Box 1 Bay 3 : 136GB : Data Disk
OK: Disk 7 on HPSA1 : Port 2I Box 1 Bay 2 : 136GB : Data Disk
OK: Disk 8 on HPSA1 : Port 2I Box 1 Bay 1 : 136GB : Data Disk
OK: Logical Volume 1 on HPSA1 : RAID 5 : 820GB : Disk 2,3,4,5,6,7,8,1
Since this plugin uses multiline output, only the first line (“OK: Summary status is OK. For details take a look at the long output.”) is shown on the status overview page. The long output with all the remaining lines is shown only on the service detail page.
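To illustrate that split: Nagios 3 treats the first line of a plugin's output as the short status text and the following lines as long output. The sketch below mimics this in a few lines of Perl. It is not Nagios code, the helper name split_plugin_output is made up, and it ignores performance data after a "|" character.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: splits raw plugin output into the short status line
# (first line) and the long output (all remaining lines).
sub split_plugin_output {
    my ($raw) = @_;
    my ($first, @rest) = split /\n/, $raw;
    return (defined $first ? $first : '', join("\n", @rest));
}

# Shortened sample taken from the output shown above
my $raw = "OK: Summary status is OK. For details take a look at the long output.\n"
        . "OK: HP Smart Array P400 Controller : HPSA1\n"
        . "OK: Disk 1 on HPSA1 : Port 1I Box 1 Bay 8 : 136GB : Spare Disk\n";

my ($short, $long) = split_plugin_output($raw);
print "Overview page: $short\n";
print "Detail page:\n$long\n";
```

This is also why the script prints the summary first: the worst status among all elements has to be on the first line to be visible in the overview.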








11:51 on October 26th, 2009
Great script! Thank you!
For Debian/Ubuntu, you need to install libcrypt-ssleay-perl, libsoap-lite-perl, libuuid-perl, and libdata-dump-perl.
Tested on an ML370 G6 + P410i.