zfs receive stalled, filesystem 95% full

It seems that when I decided to send a filesystem from one server to another, I neglected to confirm that the destination had enough free space. This morning, before I headed to BSDCan, I found my server sluggish and slow to respond. Nagios was flagging all kinds of errors, some of which I’d never seen before.

Looking at the server in question, I found the pool (named system) 95% full:

$ zpool list
NAME     SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
system  16.2T  15.6T   692G    95%  1.00x  ONLINE  -

I interrupted the send with Control-C, and space immediately started becoming available, presumably as the partially received data was discarded:

$ zpool list
NAME     SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
system  16.2T  14.2T  2.08T    87%  1.00x  ONLINE  -

I saw a number of odd error messages across several hosts, all related, no doubt, to the lack of disk space. ZFS, like (I’m sure) all copy-on-write filesystems, needs free disk space to operate.

May  5 07:18:21 jester postfix/smtpd[390]: fatal: config variable inet_interfaces: host not found: jester.unixathome.org
May  5 07:18:26 jester postfix/anvil[393]: fatal: config variable inet_interfaces: host not found: jester.unixathome.org
May  5 07:19:51 jester postfix/anvil[733]: fatal: config variable inet_interfaces: host not found: jester.unixathome.org
May  5 07:19:51 jester postfix/anvil[733]: fatal: config variable inet_interfaces: host not found: jester.unixathome.org
May  5 07:19:52 jester postfix/master[6651]: warning: process /usr/local/libexec/postfix/anvil pid 733 exit status 1
May  5 07:19:52 jester postfix/master[6651]: warning: /usr/local/libexec/postfix/anvil: bad command startup -- throttling
May  5 07:20:29 jester postfix/smtpd[767]: fatal: config variable inet_interfaces: host not found: jester.unixathome.org

May  5 08:42:03 slocum postgres[14794]: [2-1] WARNING:  pgstat wait timeout
May  5 08:42:03 slocum postgres[2191]: [328-1] WARNING:  pgstat wait timeout
May  5 08:42:15 slocum postgres[2191]: [329-1] WARNING:  pgstat wait timeout
May  5 08:42:16 slocum postgres[14801]: [2-1] WARNING:  pgstat wait timeout
May  5 08:42:39 slocum postgres[2191]: [330-1] WARNING:  pgstat wait timeout

May  5 08:52:46 webserver postfix/master[3284]: warning: unix_trigger_event: read timeout for service public/pickup
May  5 08:58:39 webserver postfix/master[3284]: warning: unix_trigger_event: read timeout for service public/qmgr

psql: could not translate host name "slocum" to address: hostname nor servname provided, or not known
[: -ne: unexpected operator

May  5 10:51:52 tallboy stunnel: LOG5[7990:34384937984]: Service [ircproxy] accepted connection from 72.94.160.252:51724
May  5 10:59:47 tallboy kernel: sonewconn: pcb 0xfffffe003d3d6dc8: Listen queue overflow: 8 already in queue awaiting acceptance
May  5 10:59:47 tallboy nrpe[21622]: Network server accept failure (53: Software caused connection abort)
May  5 10:59:47 tallboy nrpe[21622]: Daemon shutdown

As time wore on, more and more space freed up:

[dan@slocum:~] $ zpool list
NAME     SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
system  16.2T  14.2T  2.08T    87%  1.00x  ONLINE  -
[dan@slocum:~] $ zpool list
NAME     SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
system  16.2T  14.0T  2.28T    85%  1.00x  ONLINE  -
[dan@slocum:~] $ zpool list
NAME     SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
system  16.2T  10.2T  6.01T    63%  1.00x  ONLINE  -
[dan@slocum:~] $
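
The check I skipped before starting the send could look something like the sketch below. It is only an illustration, not what I actually ran: the function name fits_under_cap and the 80% ceiling are my own assumptions, and the commented-out commands for gathering the numbers rely on flags (zfs send -nP, zpool list -Hp) that may not exist in every ZFS version.

```shell
#!/bin/sh
# Pre-flight check (hypothetical helper): would receiving NEEDED bytes
# push the destination pool past MAX_CAP percent full?

fits_under_cap() {
    needed=$1          # estimated stream size, in bytes
    alloc=$2           # bytes already allocated on the destination pool
    size=$3            # total size of the destination pool, in bytes
    max_cap=${4:-80}   # assumed ceiling; pick whatever suits your pool

    # Projected capacity after the receive, as an integer percentage,
    # rounded up so a borderline case is not under-reported.
    projected=$(( ((alloc + needed) * 100 + size - 1) / size ))
    echo "projected capacity: ${projected}%"
    [ "$projected" -le "$max_cap" ]
}

# One way to gather the inputs (assumptions; verify against your ZFS version):
#   needed=$(zfs send -nP pool/fs@snap | awk '$1 == "size" {print $2}')
#   alloc=$(ssh desthost zpool list -Hp -o allocated system)
#   size=$(ssh desthost zpool list -Hp -o size system)
#   fits_under_cap "$needed" "$alloc" "$size" 80 || echo "not enough room"
```

The function returns success only when the projected capacity stays at or below the ceiling, so it can gate the send in a script. The pool name pool/fs@snap and the host desthost are placeholders.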

I’ll look at this again, after BSDCan.

About The Author

Dan Langille
