I found a problem with the check_pgsql Nagios plugin last week. It can’t handle names such as freshports.org. It’s a valid database name, as witnessed here:
$ psql -l | grep freshports
freshports.old | dan | SQL_ASCII | C | C |
freshports.org | dan | SQL_ASCII | C | C |
But it doesn’t work:
$ /usr/local/libexec/nagios/check_pgsql -H slocum -l www -d freshports.org
check_pgsql: Database name is not valid - freshports.org
Usage: check_pgsql [-H] [-P ] [-c ] [-w ] [-t ] [-d ] [-l ] [-p ]
Hmm, what’s up with that? Looking at the source code, I found this comment:
Valid PostgreSQL database names are less than &NAMEDATALEN;
characters long and consist of letters, numbers, and underscores. The
first character cannot be a number, however.
This seems to be true for most identifiers (a database name is an identifier). But you can also use quoted identifiers, such as “freshports.org”. To quote the PostgreSQL documentation: “Quoted identifiers can contain any character, except the character with code zero.”
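The quoted-identifier rule suggests a much simpler validity check than the one described in the plugin's comment. The following is a minimal sketch, not the plugin's code: the function name is_quotable_dbname is hypothetical, and NAMEDATALEN is assumed to be 64, which is PostgreSQL's default but can differ in a custom build.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumption: 64 is PostgreSQL's default NAMEDATALEN; custom builds may differ. */
#define NAMEDATALEN 64

/*
 * Hypothetical check: accept any non-empty name that fits in
 * NAMEDATALEN - 1 bytes. A C string cannot contain the zero byte,
 * so the one restriction on quoted identifiers is met automatically.
 */
bool
is_quotable_dbname(const char *dbname)
{
    size_t len;

    if (dbname == NULL)
        return false;
    len = strlen(dbname);
    return len > 0 && len <= (size_t)(NAMEDATALEN - 1);
}
```

With a check like this, names containing dots, such as freshports.org, are accepted, while empty names and names of NAMEDATALEN bytes or more are still rejected.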
In the meantime, I’ve created another database, and I’m testing that Nagios can connect to that one instead.
A fix 2021-02-14
Seven years later, I hit the same problem again today:
[dan@devgit-ingress01:/usr/local/etc/nrpe.d] $ echo /usr/local/libexec/nagios/check_pgsql -H pg02.int.unixathome.org -d freshports.devgit -l nagios | sudo su -fm nagios
check_pgsql: Database name is not valid - freshports.devgit
Usage: check_pgsql [-H] [-P ] [-c ] [-w ] [-t ] [-d ] [-l ] [-p ] [-q ] [-C ] [-W ] [-r]
Thomas Hurst suggested patching and I followed up on that. My patch looks like this:
--- plugins/check_pgsql.c.orig	2019-12-04 21:53:08 UTC
+++ plugins/check_pgsql.c
@@ -439,6 +440,8 @@ is_pg_dbname (char *dbname)
 	char tmp[NAMEDATALEN];
 	if (strlen (dbname) > NAMEDATALEN - 1)
 		return (FALSE);
+	return (TRUE);
+/*
 	strncpy (txt, dbname, NAMEDATALEN - 1);
 	txt[NAMEDATALEN - 1] = 0;
 	if (sscanf (txt, "%[_a-zA-Z]%[^_a-zA-Z0-9-]", tmp, tmp) == 1)
@@ -446,6 +449,7 @@ is_pg_dbname (char *dbname)
 	if (sscanf (txt, "%[_a-zA-Z]%[_a-zA-Z0-9-]%[^_a-zA-Z0-9-]", tmp, tmp, tmp) == 2)
 		return (TRUE);
 	return (FALSE);
+*/
 }
 
 /**
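To see why the unpatched check rejects a dotted name, here is a standalone sketch of the logic that the patch comments out. It is reconstructed from the diff context only: the function name legacy_is_pg_dbname is hypothetical, NAMEDATALEN is assumed to be 64, the separate head/rest/junk buffers replace the original's reuse of tmp, and the return after the first sscanf is an assumption because that line falls between the two hunks.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Assumption: PostgreSQL's default value; the plugin takes it from the PostgreSQL headers. */
#define NAMEDATALEN 64

bool
legacy_is_pg_dbname(const char *dbname)
{
    char txt[NAMEDATALEN];
    char head[NAMEDATALEN];
    char rest[NAMEDATALEN];
    char junk[NAMEDATALEN];

    if (strlen(dbname) > NAMEDATALEN - 1)
        return false;
    strncpy(txt, dbname, NAMEDATALEN - 1);
    txt[NAMEDATALEN - 1] = '\0';

    /*
     * Accept when the leading run of letters/underscores converts and
     * the second scanset (characters outside the allowed set) matches
     * nothing right after it.
     */
    if (sscanf(txt, "%[_a-zA-Z]%[^_a-zA-Z0-9-]", head, junk) == 1)
        return true;

    /*
     * Accept when a run of allowed characters follows the leading run
     * and no disallowed character comes next.
     */
    if (sscanf(txt, "%[_a-zA-Z]%[_a-zA-Z0-9-]%[^_a-zA-Z0-9-]",
               head, rest, junk) == 2)
        return true;

    return false;
}
```

For freshports.org, the first sscanf converts two fields (the letters, then the dot), and the second converts only one because the dot is not in the allowed set, so neither test passes and the name is rejected. A name made only of letters, such as freshports, passes the first test.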
It works:
[dan@devgit-ingress01:/usr/local/etc/nrpe.d] $ echo /usr/local/libexec/nagios/check_pgsql -H pg02.int.unixathome.org -d freshports.devgit -l nagios | sudo su -fm nagios
OK - database freshports.devgit (0.020700 sec.)|time=0.020700s;2.000000;8.000000;0.000000
But I’m moving to Bucardo
In Otis’ tweet, Bucardo was also suggested. There is a FreeBSD port and it was already built and in my package repo. I do recall playing with it before, but I’m not sure where that went or why I didn’t deploy it.
Oh wait, let’s see if I have deployed it:
samdrucker=# select * from hostswithpackage('nagios-check_postgres');
 hostswithpackage
------------------
(0 rows)

samdrucker=#
Nope, it’s not installed anywhere, according to SamDrucker.
I’m just adjusting Ansible scripts now.
Problem within nrpe
I tried using this within net-mgmt/nrpe3 and hit this roadblock:
$ /usr/local/libexec/nagios/check_nrpe3 -H devgit-ingress01 -c check_pgsql
NRPE: Unable to read output
Where this was the definition in the nrpe.cfg file:
command[check_pgsql]=/usr/local/libexec/nagios/check_postgres_connection -H pg02.int.unixathome.org -db freshports.devgit --dbuser nagios
I had no idea what was wrong. I tried adding multiple ‘-v’ flags to the command, but that did not help.
I found a known issue which prompted me to add ‘2>&1’ to the end of the above command. Running the command again from my Nagios server, I got this in the debug logs:
[1613332935] CONN_CHECK_PEER: checking if host is allowed: 10.0.0.3 port 9598
[1613332935] Connection from 10.0.0.3 port 9598
[1613332935] is_an_allowed_host (AF_INET): is host >10.55.0.3< an allowed host >10.55.0.3<
[1613332935] is_an_allowed_host (AF_INET): host is in allowed host list!
[1613332935] Host address is in allowed_hosts
[1613332935] Host 10.55.0.3 is asking for command 'check_pgsql' to be run...
[1613332935] Running command: /usr/local/libexec/nagios/check_postgres_connection -H pg02.int.unixathome.org -db freshports.devgit --dbuser nagios 2>&1
[1613332935] Command completed with return code 3 and output: env: perl: No such file or directory
[1613332935] Return Code: 3, Output: env: perl: No such file or directory
[1613332935] Connection from 10.55.0.3 closed.
That was my clue.
[dan@devgit-ingress01:/usr/local/etc/nrpe.d] $ head /usr/local/bin/check_postgres.pl
#!/usr/bin/env perl
# -*-mode:cperl; indent-tabs-mode: nil; cperl-indent-level: 4 -*-

## Perform many different checks against Postgres databases.
## Designed primarily as a Nagios script.
## Run with --help for a summary.
##
## Greg Sabino Mullane
##
## End Point Corporation http://www.endpoint.com/
I changed that first line to #!/usr/local/bin/perl and it started working.
As Otis pointed out, this might indicate that nrpe does not add ${LOCALBASE}/bin to PATH.
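The failure makes sense once you recall how a lookup by bare command name works: each directory in PATH is tried in order, and if none contains the program, the result is “No such file or directory”. The sketch below is a simplified illustration of that search. The helper find_in_path is hypothetical and is not code from nrpe or env; it skips empty PATH entries, which a real lookup treats as the current directory, and it does not exclude directories.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Illustrative helper (not part of nrpe or env): look for an executable
 * called `name` in each directory of the colon-separated `path` list
 * and copy the first hit into `out`. On failure, `out` is set to "".
 */
bool
find_in_path(const char *path, const char *name, char *out, size_t outlen)
{
    const char *dir = path;

    while (dir != NULL && *dir != '\0') {
        const char *end = strchr(dir, ':');
        size_t len = end ? (size_t)(end - dir) : strlen(dir);

        if (len > 0) {
            /* Build "<dir>/<name>" and test whether it is executable. */
            int n = snprintf(out, outlen, "%.*s/%s", (int)len, dir, name);

            if (n > 0 && (size_t)n < outlen && access(out, X_OK) == 0)
                return true;
        }
        dir = end ? end + 1 : NULL;
    }
    if (outlen > 0)
        out[0] = '\0';
    return false;
}
```

If the search list does not include /usr/local/bin, a lookup for perl on this host finds nothing, which would match the “env: perl: No such file or directory” message in the debug log.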
Based on “Commands executed by NRPE do not run in a shell”, Otis suggested using this:
command[check_pgsql]=/usr/bin/env PATH="/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin" /usr/local/libexec/nagios/check_postgres_connection -H pg02.int.unixathome.org -db freshports.devgit --dbuser nagios
That worked:
$ /usr/local/libexec/nagios/check_nrpe3 -H devgit-ingress01 -c check_pgsql
POSTGRES_CONNECTION OK: DB "freshports.devgit" (host:pg02.int.unixathome.org) version 12.5 | time=0.02s
I have reverted my mangling of /usr/local/bin/check_postgres.pl.