see https://lists.freebsd.org/archives/freebsd-questions/2024-October/005877.html for the full archive.
A while back, I changed my FreeBSD jails from using postfix to using dma. I like postfix, but I don’t want to maintain the app in so many locations. dma is less work. I wrote a short blog post about configuring dma and also blogged about replacing postfix with dma.
All was well.
Except this one log entry which I started to see, on some hosts, not every day.
Oct 18 03:01:00 x8dtu-pg01 dma[dma][92883]: could not pick up queue file: `/var/spool/dma/Q5993d.326f7fa48050'/`/var/spool/dma/M5993d.326f7fa48050': No such file or directory
After a while, I posted to the FreeBSD Questions mailing list and the Dragonfly users mailing list.
With no replies, 11 days later, I posted: “I’ve decided to start ignoring it.”
That got the ball rolling with John Levine saying the code indicates this is a race condition.
However, I could not figure out what it was racing with. Clearly, it must be itself. But how?
The story from the logs
When it next happened, I provided these logs:
This event occurred today. These logs are slightly amended to obscure host names and email and IP addresses.
Nov 1 00:00:10 wikis newsyslog[46335]: logfile turned over Nov 1 03:05:42 wikis dma[46d60][90243]: new mail from user=root uid=26 envelope_from=<root@wikis.[redacted]> Nov 1 03:05:42 wikis dma[dma][90258]: could not pick up queue file: `/var/spool/dma/Q46d60.432a50848050'/`/var/spool/dma/M46d60.432a50848050': No such file or directory Nov 1 03:05:42 wikis dma[46d60][90243]: mail to=<dan@example.org> queued as 46d60.432a50848050 Nov 1 03:05:42 wikis dma[46d60.432a50848050][90243]: <dan@example.org> trying delivery Nov 1 03:05:42 wikis dma[46d60.432a50848050][90269]: using smarthost (tallboy.[redacted]:25) Nov 1 03:05:42 wikis dma[46d60.432a50848050][90269]: trying remote delivery to tallboy.[redacted] [10.0.0.1] pref 0 Nov 1 03:05:42 wikis dma[46d60.432a50848050][90269]: <dan@example.org> delivery successful Nov 1 03:46:21 wikis dma[46d47][96425]: new mail from user=root uid=26 envelope_from=<root@wikis.[redacted]>
The trouble line is highlighted and has PID 90258. Was this PID just sitting around waiting? I don’t think so. My test of dma showed it start up and stopped. It does not linger.
I am also running this crontab to clear the queues:
[12:36 wikis dvl ~] % cat /usr/local/etc/cron.d/dma */30 * * * * root /usr/libexec/dma -q
That process isn’t waiting around 5 minutes and 42 seconds to conflict with the above.
A great clue came from Daniel Tameling who suggested periodic scripts.
Checking periodic
I checked cron, and I’m sure it’s not cron:
[13:25 wikis dvl ~] % sudo grep -r dma /var/cron/tabs [13:25 wikis dvl ~] % sudo grep -r dma /etc/crontab [13:27 wikis dvl ~] % sudo grep dma /var/log/cron | egrep -ve ':00:|:30:' [13:27 wikis dvl ~] %
Let’s check periodic. His idea turns out to be a likely suspect.
[13:27 wikis dvl ~] % grep -r dma /usr/local/etc/periodic [13:29 wikis dvl ~] % grep -r dma /etc/periodic /etc/periodic/daily/500.queuerun: if [ ! -x /usr/sbin/sendmail ] /etc/periodic/daily/500.queuerun: echo '$daily_queuerun_enable is set but /usr/sbin/sendmail' \ /etc/periodic/daily/500.queuerun: /usr/sbin/sendmail -q >/dev/null 2>&1 & /etc/periodic/daily/500.queuerun: /usr/sbin/sendmail -q -Ac >/dev/null 2>&1 & /etc/periodic/daily/150.clean-hoststat: echo "Removing stale entries from sendmail host status cache:" /etc/periodic/monthly/200.accounting:oldmask=$(umask) /etc/periodic/monthly/200.accounting:umask $oldmask
Looking in those scripts to find the _enable variable, I checked the default values:
[13:33 wikis dvl ~] % grep daily_queuerun_enable /etc/periodic.conf /etc/defaults/periodic.conf /etc/defaults/periodic.conf:daily_queuerun_enable="YES" # Run mail queue
That one runs by default.
[13:34 wikis dvl ~] % grep daily_clean_hoststat_enable /etc/periodic.conf /etc/defaults/periodic.conf /etc/periodic.conf:daily_clean_hoststat_enable="NO" /etc/defaults/periodic.conf:daily_clean_hoststat_enable="YES" # Purge sendmail host
That one is enabled by default, but disabled on this host.
Looking to see when daily periodic runs:
[13:35 wikis dvl ~] % grep daily /etc/crontab # Perform daily/weekly/monthly maintenance. 1 3 * * * root periodic daily
It’s entirely possible that process 90258 above relates to /etc/periodic/daily/500.queuerun
Not entirely related, but for the record, this is my mail configuration:
[13:54 wikis dvl ~] % cat /etc/mail/mailer.conf sendmail /usr/libexec/dma mailq /usr/libexec/dma newaliases /usr/libexec/dma rmail /usr/libexec/dma
Let’s try a fix
Based on the information from the previous, I’ll add this to /etc/periodic.conf. This will disable the periodic script which runs by default.
# disable queuerun_enable to avoid conflict with dma daily_queuerun_enable="NO"
Now it’s just a matter of time. If there are no more incidents, I say Daniel has solved it. Let’s give it a few weeks. As I write this, it’s been just over 24 hours.