Monitoring backups via Nagios and a shell script

Backups are useless without restores. I’ve written a few posts about Nagios, my current monitoring tool of choice. Nagios ships with a number of plugins, and you can also write your own.

In this post, I’ll show you a shell script I wrote to make sure my backup files turn up where they should, when they should. In my case, these files are database backups, but the idea behind the script is applicable to any list of files you can collect.

But wait, there’s more!

I’ll show you how I keep the N latest backups on disk and let the system keep track of which one to delete. By now, you should be thinking newsyslog.conf.

The background on my backups

I have a jail, dbclone, which has a single purpose: test every dump of every database by loading it. This ensures I have a backup and that it is functional.

The scripts that accomplish these tasks are triggered by crontab entries. The internals of those scripts aren’t important to the subject of this post.

Backups of your backups

I have a lot of disk space on this server. So much that I keep multiple copies on disk. I’ve just copied the database dump here. Why delete it immediately?

For each directory which contains a database dump, I create a subdirectory, old-backups. Each day, the recently rsync’d dump file is copied to this directory. Multiple copies of this file are retained by making use of newsyslog. More on that later.

Here is the script I use for copying the files to the old-backups directory:

#!/bin/sh
#
# copy each .dump and .sql file in $1 to $2
#

BACKUPDIR=$1
COPYDIR=$2
GLOBALS="globals"

cd "${BACKUPDIR}" || exit 1
# redirect to /dev/null to suppress error messages when there is no .dump or .sql file
FILES=`ls *.dump *.sql 2>/dev/null`

echo copying $FILES to ${COPYDIR}
for i in $FILES
do
  echo cp -p $i ${COPYDIR}
  cp -p $i ${COPYDIR}
done

Here is a crontab entry used to invoke the above script:

0    23   *   *   *   ${HOME}/bin/copy-for-rotation  ${HOME}/backups/bacula  ${HOME}/backups/bacula/old-backups/  > /dev/null

Using newsyslog

newsyslog is typically used to keep system logs at a manageable size. However, it can be used for any file you want. I use it for keeping copies of my incoming mail.

The configuration file (/etc/newsyslog.conf) allows for globbing of the file name, and the following entry will rotate all *.dump files in the specified directory.

/usr/home/dan/backups/bacula/old-backups/*.dump  dan:dan  640  10  *  $D23  GB
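Reading that entry left to right: the file name (a glob here), the owner and group, the mode (640), the number of archives to keep (10), the size trigger (* means size is ignored), the time trigger ($D23 is daily at 23:00), and the flags (G treats the name as a glob pattern, B marks the file as binary so newsyslog does not write a rotation message into it). With no compression flag, the archives are simply numbered from .0 upwards. The sketch below only illustrates that naming. The function name expected_archives is made up for this example and is not part of newsyslog.

```shell
#!/bin/sh
# Hypothetical helper: print the archive names newsyslog would keep for one
# file, assuming a count of N and no compression flag (so no .gz/.bz2 suffix).
expected_archives() {
  file=$1
  count=$2
  n=0
  while [ "$n" -lt "$count" ]
  do
    echo "${file}.${n}"
    n=$((n + 1))
  done
}

# With a count of 10, this lists bacula.dump.0 through bacula.dump.9
expected_archives bacula.dump 10
```

The newest archive is always the one ending in .0, which is why a check for the most recent rotated copy only needs to look at that name.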

The script

Here is the script I created (it is also on GitHub).

#!/bin/sh
 
LIST_OF_FILES="/usr/home/dan/bin/list.files.check"
FILES=`/bin/cat ${LIST_OF_FILES}`
 
BACKUPDIR="/usr/home/dan/backups"
 
# 26 and 48 hours
WARN_PRIMARY=93600
CRIT_PRIMARY=172800
 
# 72 and 96 hours
WARN_SECONDARY=259200
CRIT_SECONDARY=345600
 
if [ "$1" = 'primary' ]
then
  WARN=$WARN_PRIMARY
  CRIT=$CRIT_PRIMARY
else
  WARN=$WARN_SECONDARY
  CRIT=$CRIT_SECONDARY
fi
 
#default to all OK
answer=0
reply=""
 
for file in ${FILES}
do
  if [ "$1" = 'rotated' ]
  then
    dir=`/usr/bin/dirname ${file}`
    file=`/usr/bin/basename ${file}`
    file="${dir}/old-backups/${file}.0"
  fi
#  echo $file
  result=`/usr/local/libexec/nagios/check_file_age -w ${WARN} -c ${CRIT} -f ${BACKUPDIR}/${file}`
  success=$?
  if [ $success -ne 0 ]
  then
    # warning (1), critical (2), or unknown (3): remember the worst status seen
    if [ $answer -lt $success ]
    then
      answer=$success
    fi
    reply="$reply $result"
  fi
#  echo $success
done
 
if [ $answer -eq 0 ]
then
  echo 'All OK'
else
  echo $reply
fi
 
exit $answer

I won’t go into the details of a Nagios script, but in summary, the return codes are:

  0. OK
  1. Warning
  2. Critical
  3. Unknown
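Because the exit codes get larger as things get worse, combining many per-file results into one status comes down to keeping the highest code seen. Here is a minimal sketch of that idea on its own. The function name worst_status is invented for this example, and it assumes every argument is a small non-negative integer exit code.

```shell
#!/bin/sh
# Fold several plugin exit codes into one: the highest (worst) code wins.
worst_status() {
  worst=0
  for status in "$@"
  do
    if [ "$status" -gt "$worst" ]
    then
      worst=$status
    fi
  done
  echo "$worst"
}

worst_status 0 1 0    # prints 1 (one warning among OKs)
worst_status 0 2 1    # prints 2 (critical outranks warning)
```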

This script makes use of a Nagios plugin, check_file_age, which is installed by net-mgmt/nagios-plugins.
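If you are wondering what an age check boils down to, the following is a rough sketch of the idea, not the plugin’s own code. It compares the file’s age in seconds against a warning and a critical threshold. The function name file_age_status is made up, treating a missing file as critical is an assumption of this sketch, and the stat fallback is there because GNU and BSD stat take different options.

```shell
#!/bin/sh
# Sketch of an age check: returns 0 = OK, 1 = warning, 2 = critical, 3 = unknown.
# Usage: file_age_status WARN_SECONDS CRIT_SECONDS FILE
file_age_status() {
  warn=$1
  crit=$2
  file=$3

  # Assumption of this sketch: a missing file counts as critical.
  [ -e "$file" ] || return 2

  # Modification time in epoch seconds: try GNU stat first, then BSD stat.
  mtime=$(stat -c %Y "$file" 2>/dev/null) || mtime=$(stat -f %m "$file") || return 3

  age=$(( $(date +%s) - mtime ))
  if [ "$age" -gt "$crit" ]
  then
    return 2
  elif [ "$age" -gt "$warn" ]
  then
    return 1
  fi
  return 0
}
```

Called with the primary thresholds from the script, that would be file_age_status 93600 172800 followed by the path to a dump file, with the result available in $?.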

Screenshots

The following is a list of all the individual file checks I had before I wrote the above script:

[Nagios screenshot: a list of the individual file checks for my backup files]

Pros and Cons

The advantage of this single script is that I can alter the files checked with a simple edit to the file referenced by LIST_OF_FILES.
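The list file itself is not shown in this post. Judging from how the script uses it, it holds one path per line, relative to BACKUPDIR. The entry used below, bacula/bacula.dump, is a hypothetical example and not taken from the real list. For the rotated check, each entry is mapped to the newest archive in its old-backups subdirectory. This sketch shows that mapping in isolation, with an invented function name, rotated_path.

```shell
#!/bin/sh
# Map a list entry to its newest rotated copy:
#   DIR/NAME  ->  DIR/old-backups/NAME.0
rotated_path() {
  dir=$(dirname "$1")
  base=$(basename "$1")
  echo "${dir}/old-backups/${base}.0"
}

# Hypothetical list entry, relative to BACKUPDIR
rotated_path bacula/bacula.dump    # prints bacula/old-backups/bacula.dump.0
```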

The main disadvantage is the granularity of the monitoring. If five files go out of sync, I get only one alert, and I won’t see a recovery until all five files are back in sync. If I had individual alerts, I would get one RECOVERY notice for each file.
