dma: could not pick up queue file

see https://lists.freebsd.org/archives/freebsd-questions/2024-October/005877.html for the full archive.

A while back, I changed my FreeBSD jails from using postfix to using dma. I like postfix, but I don’t want to maintain the app in so many locations. dma is less work. I wrote a short blog post about configuring dma and also blogged about replacing postfix with dma.

All was well.

Except this one log entry which I started to see, on some hosts, not every day.

Oct 18 03:01:00 x8dtu-pg01 dma[dma][92883]: could not pick up queue file:
`/var/spool/dma/Q5993d.326f7fa48050'/`/var/spool/dma/M5993d.326f7fa48050': No such file or directory

After a while, I posted to the FreeBSD Questions mailing list and the Dragonfly users mailing list.

With no replies, 11 days later, I posted: “I’ve decided to start ignoring it.”

That got the ball rolling with John Levine saying the code indicates this is a race condition.

However, I could not figure out what it was racing with. Clearly, it must be itself. But how?

The story from the logs

When it next happened, I provided these logs:

This event occurred today. These logs are slightly amended to obscure host names and email and IP addresses.

Nov  1 00:00:10 wikis newsyslog[46335]: logfile turned over
Nov  1 03:05:42 wikis dma[46d60][90243]: new mail from user=root uid=26 envelope_from=<root@wikis.[redacted]>
Nov  1 03:05:42 wikis dma[dma][90258]: could not pick up queue file: `/var/spool/dma/Q46d60.432a50848050'/`/var/spool/dma/M46d60.432a50848050': No such file or directory
Nov  1 03:05:42 wikis dma[46d60][90243]: mail to=<dan@example.org> queued as 46d60.432a50848050
Nov  1 03:05:42 wikis dma[46d60.432a50848050][90243]: <dan@example.org> trying delivery
Nov  1 03:05:42 wikis dma[46d60.432a50848050][90269]: using smarthost (tallboy.[redacted]:25)
Nov  1 03:05:42 wikis dma[46d60.432a50848050][90269]: trying remote delivery to tallboy.[redacted] [10.0.0.1] pref 0
Nov  1 03:05:42 wikis dma[46d60.432a50848050][90269]: <dan@example.org> delivery successful
Nov  1 03:46:21 wikis dma[46d47][96425]: new mail from user=root uid=26 envelope_from=<root@wikis.[redacted]>

The trouble line is highlighted and has PID 90258. Was this PID just sitting around waiting? I don’t think so. My test of dma showed it start up and stopped. It does not linger.

I am also running this crontab to clear the queues:

[12:36 wikis dvl ~] % cat /usr/local/etc/cron.d/dma
*/30    *       *       *       *       root    /usr/libexec/dma -q

That process isn’t waiting around 5 minutes and 42 seconds to conflict with the above.

A great clue came from Daniel Tameling who suggested periodic scripts.

Checking periodic

I checked cron, and I’m sure it’s not cron:

[13:25 wikis dvl ~] % sudo grep -r dma /var/cron/tabs
[13:25 wikis dvl ~] % sudo grep -r dma /etc/crontab
[13:27 wikis dvl ~] % sudo grep dma /var/log/cron  | egrep -ve ':00:|:30:'  
[13:27 wikis dvl ~] %

Let’s check periodic. His idea turns out to be a likely suspect.

[13:27 wikis dvl ~] % grep -r dma /usr/local/etc/periodic 
[13:29 wikis dvl ~] % grep -r dma /etc/periodic          
/etc/periodic/daily/500.queuerun:	if [ ! -x /usr/sbin/sendmail ]
/etc/periodic/daily/500.queuerun:	    echo '$daily_queuerun_enable is set but /usr/sbin/sendmail' \
/etc/periodic/daily/500.queuerun:	    /usr/sbin/sendmail -q >/dev/null 2>&1 &
/etc/periodic/daily/500.queuerun:		    /usr/sbin/sendmail -q -Ac >/dev/null 2>&1 &
/etc/periodic/daily/150.clean-hoststat:	    echo "Removing stale entries from sendmail host status cache:"
/etc/periodic/monthly/200.accounting:oldmask=$(umask)
/etc/periodic/monthly/200.accounting:umask $oldmask

Looking in those scripts to find the _enable variable, I checked the default values:

[13:33 wikis dvl ~] % grep daily_queuerun_enable /etc/periodic.conf /etc/defaults/periodic.conf 
/etc/defaults/periodic.conf:daily_queuerun_enable="YES"				# Run mail queue

That one runs by default.

[13:34 wikis dvl ~] % grep daily_clean_hoststat_enable  /etc/periodic.conf /etc/defaults/periodic.conf
/etc/periodic.conf:daily_clean_hoststat_enable="NO"
/etc/defaults/periodic.conf:daily_clean_hoststat_enable="YES"			# Purge sendmail host

That one is enabled by default, but disabled on this host.

Looking to see when daily periodic runs:

[13:35 wikis dvl ~] % grep daily /etc/crontab 
# Perform daily/weekly/monthly maintenance.
1	3	*	*	*	root	periodic daily

It’s entirely possible that process 90258 above relates to /etc/periodic/daily/500.queuerun

Not entirely related, but for the record, this is my mail configuration:

[13:54 wikis dvl ~] % cat /etc/mail/mailer.conf
sendmail      /usr/libexec/dma
mailq         /usr/libexec/dma
newaliases    /usr/libexec/dma
rmail         /usr/libexec/dma

Let’s try a fix

Based on the information from the previous, I’ll add this to /etc/periodic.conf. This will disable the periodic script which runs by default.

# disable queuerun_enable to avoid conflict with dma
daily_queuerun_enable="NO"

Now it’s just a matter of time. If there are no more incidents, I say Daniel has solved it. Let’s give it a few weeks. As I write this, it’s been just over 24 hours.

A week later

Here we are, on the morning of 2024-11-10, and another dma queue message has arrived. Fortunately, it is on a host which had not been configured with the above mentioned fix.

Nov 10 03:01:37 dev-pgeu dma[55d33.377910848050][79105]: could not acquire queue file: No such file or directory
Nov 10 03:01:37 dev-pgeu dma[55d33.377910848050][79104]: could not acquire queue file: No such file or directory

A recent post to the mailing list, by Daniel Tameling, mentions that dma is still in the ports tree. Looking at mail/dma/files/pkg-message.in we find this information:

And you can disable some sendmail specific daily maintenance routines in your
/etc/periodic.conf file:

daily_clean_hoststat_enable="NO"
daily_status_mail_rejects_enable="NO"
daily_status_include_submit_mailq="NO"
daily_submit_queuerun="NO"

Very useful to know. I’ll be adding those to my hosts soon.