Aug 14, 2018

This problem was difficult to figure out. The cause was simple, but not obvious.

Messages such as these were appearing in emails:

From July:

newsyslog: chmod(/var/log/ in change_attrs: No such file or directory

In August:

newsyslog: chmod(/var/log/auth.log.6.bz2) in change_attrs: No such file or directory

I had no idea why. My initial suspicion was the /etc/newsyslog.conf configuration for that file:

[dan@knew:~] $ grep auth /etc/newsyslog.conf
/var/log/auth.log	root:logcheck	640  7     *  @T00 JC
[dan@knew:~] $ 
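
For reference, per newsyslog.conf(5), the fields in that entry break down as:

/var/log/auth.log   root:logcheck   640   7     *    @T00   JC
# path              owner:group     mode  count size when   flags
# @T00 = rotate at midnight; size * = size not considered
# flags: J = bzip2 the rotated file, C = create the log file if it does not exist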

The count is 7, meaning newsyslog keeps auth.log.0 through auth.log.6, so a .6 file is expected. Why was it complaining about it? The files looked OK:

[dan@knew:~] $ ls -l /var/log/auth.log*
-rw-r-----  1 root  logcheck  943409 Aug 14 19:09 /var/log/auth.log
-rw-r-----  1 root  logcheck   20036 Aug 14 00:00 /var/log/auth.log.0.bz2
-rw-r-----  1 root  logcheck   21375 Aug 13 00:00 /var/log/auth.log.1.bz2
-rw-r-----  1 root  logcheck   20002 Aug 12 00:00 /var/log/auth.log.2.bz2
-rw-r-----  1 root  logcheck   21031 Aug 11 00:00 /var/log/auth.log.3.bz2
-rw-r-----  1 root  logcheck   20636 Aug  4 00:00 /var/log/auth.log.4.bz2
-rw-r-----  1 root  logcheck   20446 Aug  3 00:00 /var/log/auth.log.5.bz2
-rw-r-----  1 root  logcheck   20370 Aug  2 00:00 /var/log/auth.log.6.bz2
[dan@knew:~] $ 

NOTE: the above ls was run several days after I solved the problem. You can see evidence of the outage in the dates: auth.log.3 is from Aug 11, but auth.log.4 jumps back to Aug 4. Rotations were failing in between.

Each time, I would look, find nothing, and go on with my day.

After some time, I had a clue: quota.

$ zfs get quota zroot/var/log
NAME           PROPERTY  VALUE  SOURCE
zroot/var/log  quota     3.50G  local
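
A quick way to see how close the dataset is to that limit (I did not capture the output at the time):

$ zfs list -o name,used,available,quota zroot/var/log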

Ahh, that’s got to be it. With the dataset at its quota, newsyslog presumably could not write the rotated file, so the subsequent chmod() found nothing to act on, hence "No such file or directory". I bumped the quota up and the problem went away.
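
The bump itself is a one-liner; I don't recall the exact value I used, so the 5G here is illustrative:

# 5G is an example value, not necessarily what I set
$ zfs set quota=5G zroot/var/log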

What was using all the space?

Now that I knew this was a space issue, I looked around to see what was taking it all up. I found the culprit quickly:

[dan@knew:/var/log] $ ls -lt | head
total 3512974
-rw-r--r--  1 root  wheel     22638798824 Aug 10 17:15 netatalk.log
-rw-r--r--  1 root  wheel           80481 Aug 10 17:15 snmpd.log
-rw-r-----  1 root  logcheck        42713 Aug 10 17:15 maillog
-rw-------  1 root  wheel            3635 Aug 10 17:15 cron
-rw-r-----  1 root  logcheck       451826 Aug 10 17:12 auth.log
-rw-r--r--  1 root  wheel           98945 Aug 10 16:20 utx.log
-rw-r--r--  1 root  wheel             848 Aug 10 16:20 messages
-rw-r--r--  1 root  wheel             591 Aug 10 16:20 utx.lastlogin
-rw-------  1 root  wheel          369078 Aug 10 16:09 debug.log

[dan@knew:/var/log] $ ls -lh netatalk.log
-rw-r--r--  1 root  wheel    21G Aug 10 17:16 netatalk.log

Yes, that is 21G of logging, on a dataset with a 3.50G quota; only ZFS compression let it fit at all. Compression for the win.
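
I did not record the numbers, but this is how you would check how well the dataset compresses:

$ zfs get compression,compressratio zroot/var/log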

Checking the configuration file:

[dan@knew:~] $ head -4 /usr/local/etc/afp.conf
vol preset = default_for_all_vol
log file = /var/log/netatalk.log
log level = default:maxdebug

I remembered enabling debug when I was trying to get my ZFS-based Time Capsule working again.

I waited for any running backups to complete, then changed the file to look like this:

vol preset = default_for_all_vol
log file = /var/log/netatalk.log
log level = default:warn

Then I restarted afp.
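
On FreeBSD that is typically done like this (assuming the rc script name installed by the net/netatalk3 port):

$ sudo service netatalk restart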

I should also rotate that file.
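
A newsyslog.conf entry along the lines of the auth.log one above would do it. The mode, count, and size trigger here are illustrative, and depending on how netatalk holds its log file open, a pidfile/signal field may also be needed:

/var/log/netatalk.log   root:wheel   644  7     102400 *     JC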

Monitor, monitor, monitor

But wait, there’s more.

Why was I not monitoring quotas?

That led me to a script, check_zfs_quota from Claudiu Vasadi.

Now I am monitoring those quotas. Deploying that script had some unintended side effects, unrelated to the script itself, which took FreshPorts offline for a few hours.

This is what that looks like in Nagios, for one host:

[screenshot: nagios check_zfs_quota in action]

I think I would prefer a script which:

* detects which ZFS filesystems have quotas set
* runs check_zfs_quota against each of those filesystems

Something like the sketch below would do it.
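
A minimal sketch, assuming check_zfs_quota takes the filesystem plus warning/critical percentages as arguments (check its usage for the real interface) and lives in the usual Nagios plugin directory:

#!/bin/sh
# Enumerate every ZFS dataset with a quota set and check each one.
# zfs get -H prints tab-separated name/value pairs, one per dataset.
zfs get -H -o name,value quota | while read -r fs quota; do
        case "$quota" in
        none|-) ;;  # no quota on this dataset, skip it
        # 80/90 are hypothetical warn/crit thresholds; the plugin's
        # real arguments may differ.
        *) /usr/local/libexec/nagios/check_zfs_quota "$fs" 80 90 ;;
        esac
done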

I guess I’ll add that to my list of things to do. EDIT: Done, and contributed back to check_zpool_scrub, but I am not using it in my Nagios configuration yet.
