Jul 25, 2013
 

I’ve been using Nagios for a while. I use it to monitor many things, ranging from disk space to disk temperature. One of the packages I use for this is net-mgmt/nagios-check_smartmon. This code seems to be getting out of date: according to the timestamp at the top of the file, it was last updated on 2006-03-24 10:30:20.

So it’s not surprising that it fails to work properly in a few cases. I have encountered one such case.

I have a system with several hard drives, all of which happen to be TOSHIBA (the brand is not important). What is relevant is how those drives are connected. Several are attached to a SATA card:

mps0: <LSI SAS2008> port 0xc000-0xc0ff mem 0xfe83c000-0xfe83ffff,0xfe840000-0xfe87ffff irq 44 at device 0.0 on pci1

The others are attached to the motherboard. The difference can be seen in /var/run/dmesg.boot:

# grep TOSH /var/run/dmesg.boot
ada0:  ATA-8 SATA 3.x device
ada1:  ATA-8 SATA 3.x device
ada2:  ATA-8 SATA 3.x device
ada3:  ATA-8 SATA 3.x device
ada4:  ATA-8 SATA 3.x device
da1:  Fixed Direct Access SCSI-6 device
da4:  Fixed Direct Access SCSI-6 device
da0:  Fixed Direct Access SCSI-6 device
da3:  Fixed Direct Access SCSI-6 device
da2:  Fixed Direct Access SCSI-6 device

Some HDDs are presented to the system as ATA devices (see ada(4)), while others are presented as SCSI devices (see da(4)).

It is the devices attached to the SATA card which are presented as SCSI devices:

da0 at mps0 bus 0 scbus0 target 2 lun 0
da0: <ATA TOSHIBA DT01ACA3 ABB0> Fixed Direct Access SCSI-6 device
da0: 600.000MB/s transfers
da0: Command Queueing enabled
da0: 2861588MB (5860533168 512 byte sectors: 255H 63S/T 364801C)

The problem arises with the da devices, not the ada devices. Here is a working example:

# /usr/local/libexec/nagios/check_smartmon -d /dev/ada0
OK: device is functional and stable (temperature: 33)|TEMP=33;55;60;

And a failure:

# /usr/local/libexec/nagios/check_smartmon -d /dev/da0
Traceback (most recent call last):
  File "/usr/local/libexec/nagios/check_smartmon", line 307, in <module>
    (healthStatus, temperature) = parseOutput(healthStatusOutput, temperatureOutput, devtype)
  File "/usr/local/libexec/nagios/check_smartmon", line 216, in parseOutput
    vprint(3, "Health status: %s" % healthStatus)
UnboundLocalError: local variable 'healthStatus' referenced before assignment

The problem seems to be that the script is unable to correctly determine the device type (i.e. ATA versus SCSI). It contains a special case for FreeBSD SCSI devices, and it attempts to use that here. This is where it fails: these are ATA devices, not SCSI. The script therefore looks for SCSI-format output within ATA output, finds nothing, and healthStatus is never assigned.
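The failure mode can be reproduced in a few lines. This is only a minimal sketch, not the code from check_smartmon: the function name parse_health, the regular expressions, and the sample strings are all assumptions made for illustration. It shows how a parser that assigns a variable only when a line matches the expected format ends up with an unbound local when it is given the other format.

```python
import re

# Illustrative one-line samples; real smartctl output is much longer.
ATA_HEALTH = "SMART overall-health self-assessment test result: PASSED"
SCSI_HEALTH = "SMART Health Status: OK"


def parse_health(output, devtype):
    """Hypothetical parser: binds health_status only when a line matches."""
    for line in output.splitlines():
        if devtype == "ata":
            m = re.search(r"self-assessment test result: (\w+)", line)
            if m:
                health_status = m.group(1)
        elif devtype == "scsi":
            m = re.search(r"SMART Health Status: (\w+)", line)
            if m:
                health_status = m.group(1)
    # If no line matched, health_status was never assigned, and this
    # line raises UnboundLocalError.
    return health_status


print(parse_health(ATA_HEALTH, "ata"))

try:
    # ATA-format output parsed with the SCSI rules: nothing matches.
    parse_health(ATA_HEALTH, "scsi")
except UnboundLocalError as exc:
    print("UnboundLocalError:", exc)
```

Initialising the variable before the loop and reporting an explicit "could not parse" state would turn the traceback into a readable error, but the underlying cause here is still the wrong device type.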

Fortunately, the code allows you to specify the device type on the command line:

# /usr/local/libexec/nagios/check_smartmon -h
Usage: check_smartmon [options] device

Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  -d DEVICE, --device=DEVICE
                        device to check
  -v LEVEL, --verbosity=LEVEL
                        set verbosity level to LEVEL; defaults to 0 (quiet),
                        possible values go up to 3
  -t DEVTYPE, --type=DEVTYPE
                        type of device (ATA|SCSI)
  -w TEMP, --warning-threshold=TEMP
                        set temperature warning threshold to given temperature
                        (defaults to 55)
  -c TEMP, --critical-threshold=TEMP
                        set temperature critical threshold to given
                        temperature (defaults to 60)

Unfortunately, the script does not appear to use this value correctly. Look at this code:

        # check device type, ATA is default
        vprint(2, "Get device type")
        devtype = options.devtype
        if not devtype:
                devtype = "ATA"

        if device_re.search( device ):
                devtype = "scsi"

options.devtype is optionally assigned from the command line argument -t (or --type). But lines 296 and 297 of the script (the final if block shown above) will overwrite any value set on the command line for a FreeBSD SCSI device. This effectively ignores the -t argument.

My solution is to not do this assignment if devtype is already specified. Here is my code:

        # check device type, ATA is default
        vprint(2, "Get device type")
        devtype = options.devtype
        if not devtype:
                if device_re.search( device ):
                        devtype = "scsi"
                else:
                        devtype = "ATA"

Now, when we specify the device type (using my code in root’s home directory), it works:

# /root/check_smartmon -d /dev/da0 -t ata
OK: device is functional and stable (temperature: 33)|TEMP=33;55;60;
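The difference between the two versions can be sketched as a pair of small functions. This is a simplified model, not the script itself: DEVICE_RE is a stand-in for the script's device_re (the real pattern is not shown in this post), and the function names and the cli_devtype parameter are hypothetical.

```python
import re

# Assumption: a stand-in for the script's device_re, matching FreeBSD
# da(4) device nodes such as /dev/da0 but not /dev/ada0.
DEVICE_RE = re.compile(r"^/dev/da[0-9]+$")


def devtype_original(cli_devtype, device):
    """Original logic: the regex check runs unconditionally."""
    devtype = cli_devtype
    if not devtype:
        devtype = "ATA"
    if DEVICE_RE.search(device):
        devtype = "scsi"  # overwrites whatever -t supplied
    return devtype


def devtype_patched(cli_devtype, device):
    """Patched logic: guess only when no type was supplied."""
    devtype = cli_devtype
    if not devtype:
        if DEVICE_RE.search(device):
            devtype = "scsi"
        else:
            devtype = "ata"
    return devtype


cases = [("ata", "/dev/da0"), (None, "/dev/da0"), (None, "/dev/ada0")]
for cli, dev in cases:
    print(cli, dev, devtype_original(cli, dev), devtype_patched(cli, dev))
```

With -t ata and /dev/da0, the original logic still returns scsi, while the patched logic keeps ata. The None cases assume the option is empty when -t is omitted. The patch context shows the -t option declared with default="ata"; if that default is in effect, options.devtype is never empty and the fallback branch would not run.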

Note that I have specified ata, not ATA. This differs from what the help says: type of device (ATA|SCSI)

My patch, which fixes both the argument overwrite issue and the help documentation, is:

# diff -u /usr/local/libexec/nagios/check_smartmon /root/check_smartmon
--- /usr/local/libexec/nagios/check_smartmon    2013-07-25 11:40:50.491011205 +0000
+++ /root/check_smartmon   2013-07-25 18:21:39.149894864 +0000
@@ -59,7 +59,7 @@
                         metavar="LEVEL", help="set verbosity level to LEVEL; defaults to 0 (quiet), \
                                         possible values go up to 3")
         parser.add_option("-t", "--type", action="store", dest="devtype", default="ata", metavar="DEVTYPE",
-                        help="type of device (ATA|SCSI)")
+                        help="type of device (ata|scsi)")
         parser.add_option("-w", "--warning-threshold", metavar="TEMP", action="store",
                         type="int", dest="warningThreshold", default=55,
                         help="set temperature warning threshold to given temperature (defaults to 55)")
@@ -290,11 +290,12 @@
         # check device type, ATA is default
         vprint(2, "Get device type")
         devtype = options.devtype
+        vprint(2, "command line supplied device type is: %s" % devtype)
         if not devtype:
-                devtype = "ATA"
-
-        if device_re.search( device ):
-                devtype = "scsi"
+                if device_re.search( device ):
+                        devtype = "scsi"
+                else:
+                        devtype = "ata"

         vprint(1, "Device type: %s" % devtype)

I have also added a new debug statement: the vprint line added right after devtype = options.devtype in the patch.

Feel free to use this patch any way you want. No restrictions.
