It seems that when I decided to send a filesystem from one server to another, I neglected to establish that sufficient space existed. This morning, before I headed to BSDCan, I found that my server was very sluggish and slow to respond. Nagios was flagging all kinds of errors, some of which I'd never seen before.
Looking at the server in question, I found its pool, named system, 95% full:
$ zpool list
NAME     SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
system  16.2T  15.6T   692G    95%  1.00x  ONLINE  -
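A pool-capacity check would flag a figure like 95% directly. Below is a minimal Python sketch of such a check, assuming the human-readable `zpool list` output shown above. The function names and the 80%/90% thresholds are hypothetical choices for illustration, not anything defined by Nagios or ZFS; the exit codes follow the usual Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL).

```python
import subprocess
import sys

# Hypothetical thresholds (percent of pool capacity); tune to taste.
WARN_PCT = 80
CRIT_PCT = 90


def pool_capacities(zpool_list_output):
    """Map pool name -> CAP percentage, parsed from `zpool list` text output."""
    lines = zpool_list_output.strip().splitlines()
    header = lines[0].split()
    # Locate columns by name, since the column set varies between ZFS versions.
    name_idx, cap_idx = header.index("NAME"), header.index("CAP")
    caps = {}
    for line in lines[1:]:
        fields = line.split()
        caps[fields[name_idx]] = int(fields[cap_idx].rstrip("%"))
    return caps


def check(caps, warn=WARN_PCT, crit=CRIT_PCT):
    """Return (exit_code, message) using Nagios codes: 0 OK, 1 WARNING, 2 CRITICAL."""
    worst = 0
    parts = []
    for name, cap in sorted(caps.items()):
        if cap >= crit:
            worst = max(worst, 2)
        elif cap >= warn:
            worst = max(worst, 1)
        parts.append(f"{name} {cap}%")
    label = ("OK", "WARNING", "CRITICAL")[worst]
    return worst, f"{label}: " + ", ".join(parts)


def main():
    """Run as a plugin; assumes `zpool` is on PATH (not called below)."""
    out = subprocess.run(["zpool", "list"], capture_output=True, text=True, check=True).stdout
    code, message = check(pool_capacities(out))
    print(message)
    sys.exit(code)


# Demonstration against the output captured in this post.
sample = """NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
system 16.2T 15.6T 692G 95% 1.00x ONLINE -"""
print(check(pool_capacities(sample)))  # (2, 'CRITICAL: system 95%')
```

Parsing by header name rather than fixed position keeps the sketch working if a newer `zpool list` adds columns; `main()` is only there to show how it might be wired up as a plugin.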
I control-C'd the send, and space immediately began to free up:
$ zpool list
NAME     SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
system  16.2T  14.2T  2.08T    87%  1.00x  ONLINE  -
I saw a number of odd error messages, all related, no doubt, to the lack of disk space. ZFS, like (I'm sure) any copy-on-write filesystem, needs free disk space to operate: every change is written to new blocks before the old ones are released.
May 5 07:18:21 jester postfix/smtpd[390]: fatal: config variable inet_interfaces: host not found: jester.unixathome.org
May 5 07:18:26 jester postfix/anvil[393]: fatal: config variable inet_interfaces: host not found: jester.unixathome.org
May 5 07:19:51 jester postfix/anvil[733]: fatal: config variable inet_interfaces: host not found: jester.unixathome.org
May 5 07:19:51 jester postfix/anvil[733]: fatal: config variable inet_interfaces: host not found: jester.unixathome.org
May 5 07:19:52 jester postfix/master[6651]: warning: process /usr/local/libexec/postfix/anvil pid 733 exit status 1
May 5 07:19:52 jester postfix/master[6651]: warning: /usr/local/libexec/postfix/anvil: bad command startup -- throttling
May 5 07:20:29 jester postfix/smtpd[767]: fatal: config variable inet_interfaces: host not found: jester.unixathome.org
May 5 08:42:03 slocum postgres[14794]: [2-1] WARNING: pgstat wait timeout
May 5 08:42:03 slocum postgres[2191]: [328-1] WARNING: pgstat wait timeout
May 5 08:42:15 slocum postgres[2191]: [329-1] WARNING: pgstat wait timeout
May 5 08:42:16 slocum postgres[14801]: [2-1] WARNING: pgstat wait timeout
May 5 08:42:39 slocum postgres[2191]: [330-1] WARNING: pgstat wait timeout
May 5 08:52:46 webserver postfix/master[3284]: warning: unix_trigger_event: read timeout for service public/pickup
May 5 08:58:39 webserver postfix/master[3284]: warning: unix_trigger_event: read timeout for service public/qmgr
psql: could not translate host name "slocum" to address: hostname nor servname provided, or not known
[: -ne: unexpected operator
May 5 10:51:52 tallboy stunnel: LOG5[7990:34384937984]: Service [ircproxy] accepted connection from 72.94.160.252:51724
May 5 10:59:47 tallboy kernel: sonewconn: pcb 0xfffffe003d3d6dc8: Listen queue overflow: 8 already in queue awaiting acceptance
May 5 10:59:47 tallboy nrpe[21622]: Network server accept failure (53: Software caused connection abort)
May 5 10:59:47 tallboy nrpe[21622]: Daemon shutdown
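The root cause was starting the send without confirming there was room for it. A minimal pre-flight check could compare an estimated stream size (for instance, the estimate a dry-run `zfs send -nv` reports) with the destination pool's SIZE and FREE values from `zpool list`, and refuse if the result would push the pool past a chosen ceiling. The helper names and the 80% ceiling below are hypothetical; the 1024-based suffixes match how `zpool list` prints sizes.

```python
# zpool list prints sizes with 1024-based suffixes (K, M, G, T, P).
UNITS = {"B": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4, "P": 1024**5}


def parse_size(text):
    """Convert a zpool-style size such as '692G' or '2.08T' to bytes."""
    text = text.strip()
    suffix = text[-1].upper()
    if suffix in UNITS:
        return int(float(text[:-1]) * UNITS[suffix])
    return int(text)  # plain byte count, no suffix


def send_fits(estimate, free, size, max_cap_pct=80):
    """Hypothetical pre-flight check: True if receiving `estimate` leaves the
    pool at or below `max_cap_pct` percent allocated."""
    est, free_b, size_b = parse_size(estimate), parse_size(free), parse_size(size)
    alloc_after = (size_b - free_b) + est  # approximate ALLOC once the stream lands
    return alloc_after * 100 <= max_cap_pct * size_b


# Pool figures taken from the zpool list output in this post.
print(send_fits("1T", "692G", "16.2T"))   # False: 1T will not fit in 692G
print(send_fits("1T", "6.01T", "16.2T"))  # True: about 69% allocated afterwards
```

Leaving a margin below 100% is the point of the ceiling: a copy-on-write pool that is merely "not quite full" can still become very slow.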
As time wore on, more and more space freed up:
[dan@slocum:~] $ zpool list
NAME     SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
system  16.2T  14.2T  2.08T    87%  1.00x  ONLINE  -
[dan@slocum:~] $ zpool list
NAME     SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
system  16.2T  14.0T  2.28T    85%  1.00x  ONLINE  -
[dan@slocum:~] $ zpool list
NAME     SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
system  16.2T  10.2T  6.01T    63%  1.00x  ONLINE  -
[dan@slocum:~] $
I'll look into this further after BSDCan.