MegaRAID Nagios Script

Last year we bought some Dell PowerEdge 2850 servers with a PERC 4e/DC RAID controller. It's based on the LSI MegaRAID chipset, which we really liked. It's fast, which is great, although so far it hasn't been entirely stable, which is greatly annoying. To that end, I was tasked with getting a Nagios script in place to monitor the array and alert us if it fails (again!).

With the server came a disk with some utilities. One of those is MegaServ and its corresponding MegaCtrl. It seems like a good idea, but the blasted thing doesn't work in any sane manner. It generates alerts for things like the percentage a rebuild has reached and when the battery starts charging, so the noise can get bad. Worse still, it eventually stopped sending any alerts at all.

But today I found another utility from Dell. It's an extension to snmpd, named percsnmp, that polls a daemon for the current status of the controllers. It's great and full of good info. For now I'm just looking at the online state of each drive, but given all the other fields I may have further uses for it. In particular, I'm interested in the settable rebuildRateInPercent field, since the rebuild rate can't be set through megamgr (a copy of the BIOS-level tool).

#!/usr/bin/perl

use strict;

my $community="public";
my @error;
my @warning;
my $count;

open(SNMP, "snmpwalk -m ALL -v1 -c $community localhost .1.3.6.1.4.1.3582.1.1.3.1.4|")
        or do { print "UNKNOWN: cannot run snmpwalk: $!\n"; exit 3; };
while(my $snmp = <SNMP>){
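        chomp $snmp;
        # Skip lines that are not drive state entries, so stale captures
        # from a previous match are never reused.
        next unless $snmp =~ m/RAID-Adapter-MIB::state\.(\d+)\.(\d+)\.(\d+) = INTEGER: (.*)\(\d+\)/;
        my ($adapter, $channel, $lun, $state) = ($1, $2, $3, $4);
        $count++;    # count only the drives that were actually parsed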

        if ($state eq 'failed'){
                push @error, "drive $adapter,$channel,$lun is $state";
        }
        elsif ($state eq 'rebuild'){
                push @warning, "drive $adapter,$channel,$lun is $state";
        }
}
close(SNMP);

if (@error){
        print "CRITICAL: " . join(", ", @error, @warning) . "\n";
}
elsif (@warning){
        print "WARNING: " . join(", ", @warning) . "\n";
}
elsif ($count 
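
Since the listing above breaks off before the final status is reported, here is a minimal sketch of how the parsing and the status decision could fit together. It is not the original script. The names parse_state_line, evaluate and %EXIT are hypothetical, the numeric enum values in the sample lines are made up for illustration, and treating "no drives parsed" as UNKNOWN is my assumption about what the missing branch was for. The exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) are the standard Nagios plugin convention.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Standard Nagios plugin exit codes.
our %EXIT = (OK => 0, WARNING => 1, CRITICAL => 2, UNKNOWN => 3);

# Parse one line of snmpwalk output. Returns (adapter, channel, lun, state),
# or an empty list if the line is not a drive state entry.
sub parse_state_line {
    my ($line) = @_;
    return $line =~ /state\.(\d+)\.(\d+)\.(\d+) = INTEGER: (\w+)\(\d+\)/;
}

# Turn a list of snmpwalk lines into a (status word, message) pair.
sub evaluate {
    my (@lines) = @_;
    my (@error, @warning);
    my $count = 0;
    for my $line (@lines) {
        my ($adapter, $channel, $lun, $state) = parse_state_line($line)
            or next;
        $count++;
        my $msg = "drive $adapter,$channel,$lun is $state";
        if    ($state eq 'failed')  { push @error,   $msg }
        elsif ($state eq 'rebuild') { push @warning, $msg }
    }
    return ('CRITICAL', join(', ', @error, @warning)) if @error;
    return ('WARNING',  join(', ', @warning))         if @warning;
    # Assumption: zero parsed drives means SNMP gave us nothing useful.
    return ('UNKNOWN',  'no drives reported by SNMP')  if $count == 0;
    return ('OK', "$count drives checked");
}

# Illustrative input only; the enum numbers in parentheses are invented.
my @sample = (
    'RAID-Adapter-MIB::state.0.0.0 = INTEGER: online(3)',
    'RAID-Adapter-MIB::state.0.0.1 = INTEGER: rebuild(5)',
);
my ($status, $message) = evaluate(@sample);
print "$status: $message\n";
# A real plugin would end with: exit $EXIT{$status};
```

Keeping the decision in a function that takes plain lines makes it easy to try against saved snmpwalk output without touching a live controller. Nagios reads the state from the exit code, not from the printed word, so the final exit matters as much as the message.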

You'll also have to install the MIB to /usr/share/snmp/mibs/PERC-MIB.txt. I had to make a slight change to it. I didn't really take the time to figure out why snmpwalk was complaining about the original syntax, so I simply pruned out the offending entries. Hopefully you have better luck, but if not, here's the diff.

--- perc.mib    2006-10-09 21:07:46.000000000 -0600
+++ /usr/share/snmp/mibs/PERC-MIB.txt   2006-10-09 14:35:14.000000000 -0600
@@ -780,35 +780,6 @@
                "Other Errors Occurred on this Drive Since it was Configured."
 ::= { physicaldriveEntry 15 }

--- Added in version 1.12
-writeCache  OBJECT-TYPE
-       SYNTAX  INTEGER {
-               not_available(0),
-               enabled(1),
-               disabled(2)
-               }
-       ACCESS  read-only
-       STATUS  optional
-       DESCRIPTION
-               "Write cache for SCSI device."
-::= { physicaldriveEntry 16 }
-transferSpeed OBJECT-TYPE
-       SYNTAX  INTEGER {
-               fastest_possible(0),
-               asynchronous(1),
-               5(2),
-               10(3),
-               20(4),
-               40(5),
-               80(6),
-               160(7),
-               320(8)
-               }
-       ACCESS  read-only
-       STATUS  optional
-       DESCRIPTION
-               "Physical Drive transfer rate."
-::= { physicaldriveEntry 17 }

 datatransferWidth OBJECT-TYPE
        SYNTAX  INTEGER {

Download check_megaraid and perc.mib.diff
