Jul 072021
 

This started earlier today and I solved it only just now. It took me a while to find out it was duplicate cron jobs, and even longer to find out why.

It started with lockf notifications (if you’ve never heard of lockf before, please read this Twitter thread for examples).

lockf emails

lockf emails

In short, one of those emails said:

lockf: /var/run/periodic.hourly.lock: already locked

I looked to see if there were any long-running jobs which might cause this. Found nothing.

Later, I got two reports of incoming commits already found in the database.

The idea of duplicate cronjobs still hadn’t hit me.

Looking at the cron log, I could see duplicates. Here is the start of the duplication, down to the top of the first hour:

Jul  7 16:21:10 dev-ingress01 /usr/sbin/cron[54957]: (root) CMD (periodic everythreeminutes)
Jul  7 16:21:14 dev-ingress01 /usr/sbin/cron[54961]: (root) CMD (periodic everythreeminutes)
Jul  7 16:22:00 dev-ingress01 /usr/sbin/cron[60508]: (operator) CMD (/usr/libexec/save-entropy)
Jul  7 16:22:00 dev-ingress01 /usr/sbin/cron[60514]: (operator) CMD (/usr/libexec/save-entropy)
Jul  7 16:24:02 dev-ingress01 /usr/sbin/cron[89516]: (root) CMD (periodic everythreeminutes)
Jul  7 16:24:10 dev-ingress01 /usr/sbin/cron[89511]: (root) CMD (periodic everythreeminutes)
Jul  7 16:25:05 dev-ingress01 /usr/sbin/cron[93738]: (root) CMD (/usr/libexec/atrun)
Jul  7 16:25:13 dev-ingress01 /usr/sbin/cron[93736]: (root) CMD (/usr/libexec/atrun)
Jul  7 16:27:04 dev-ingress01 /usr/sbin/cron[2177]: (root) CMD (periodic everythreeminutes)
Jul  7 16:27:13 dev-ingress01 /usr/sbin/cron[2176]: (root) CMD (periodic everythreeminutes)
Jul  7 16:27:37 dev-ingress01 /usr/sbin/cron[3518]: (CRON) WARNING (madvise() failed)
Jul  7 16:27:37 dev-ingress01 /usr/sbin/cron[3523]: (logcheck) CMD (if [ -x /usr/local/sbin/logcheck ]; then nice -n10 /usr/local/sbin/logcheck -R; fi)
Jul  7 16:30:00 dev-ingress01 /usr/sbin/cron[6300]: (root) CMD (periodic everythreeminutes)
Jul  7 16:30:01 dev-ingress01 /usr/sbin/cron[6297]: (root) CMD (/usr/libexec/atrun)
Jul  7 16:30:07 dev-ingress01 /usr/sbin/cron[6292]: (root) CMD (/usr/libexec/atrun)
Jul  7 16:30:13 dev-ingress01 /usr/sbin/cron[6317]: (root) CMD (periodic everythreeminutes)
Jul  7 16:33:00 dev-ingress01 /usr/sbin/cron[16245]: (operator) CMD (/usr/libexec/save-entropy)
Jul  7 16:33:00 dev-ingress01 /usr/sbin/cron[16255]: (root) CMD (periodic everythreeminutes)
Jul  7 16:33:00 dev-ingress01 /usr/sbin/cron[16293]: (operator) CMD (/usr/libexec/save-entropy)
Jul  7 16:33:11 dev-ingress01 /usr/sbin/cron[16295]: (root) CMD (periodic everythreeminutes)
Jul  7 16:35:11 dev-ingress01 /usr/sbin/cron[17606]: (root) CMD (/usr/libexec/atrun)
Jul  7 16:35:14 dev-ingress01 /usr/sbin/cron[17604]: (root) CMD (/usr/libexec/atrun)
Jul  7 16:36:10 dev-ingress01 /usr/sbin/cron[21714]: (root) CMD (periodic everythreeminutes)
Jul  7 16:36:12 dev-ingress01 /usr/sbin/cron[21710]: (root) CMD (periodic everythreeminutes)
Jul  7 16:39:03 dev-ingress01 /usr/sbin/cron[28921]: (root) CMD (periodic everythreeminutes)
Jul  7 16:39:10 dev-ingress01 /usr/sbin/cron[28898]: (root) CMD (periodic everythreeminutes)
Jul  7 16:40:00 dev-ingress01 /usr/sbin/cron[29933]: (root) CMD (/usr/libexec/atrun)
Jul  7 16:40:12 dev-ingress01 /usr/sbin/cron[29968]: (root) CMD (/usr/libexec/atrun)
Jul  7 16:42:05 dev-ingress01 /usr/sbin/cron[38188]: (root) CMD (periodic everythreeminutes)
Jul  7 16:42:07 dev-ingress01 /usr/sbin/cron[38186]: (root) CMD (periodic everythreeminutes)
Jul  7 16:44:00 dev-ingress01 /usr/sbin/cron[39622]: (operator) CMD (/usr/libexec/save-entropy)
Jul  7 16:44:00 dev-ingress01 /usr/sbin/cron[39626]: (operator) CMD (/usr/libexec/save-entropy)
Jul  7 16:45:03 dev-ingress01 /usr/sbin/cron[39952]: (root) CMD (/usr/libexec/atrun)
Jul  7 16:45:03 dev-ingress01 /usr/sbin/cron[39959]: (root) CMD (periodic everythreeminutes)
Jul  7 16:45:10 dev-ingress01 /usr/sbin/cron[39968]: (root) CMD (periodic everythreeminutes)
Jul  7 16:45:14 dev-ingress01 /usr/sbin/cron[39963]: (root) CMD (/usr/libexec/atrun)
Jul  7 16:48:03 dev-ingress01 /usr/sbin/cron[49668]: (root) CMD (periodic everythreeminutes)
Jul  7 16:48:12 dev-ingress01 /usr/sbin/cron[49601]: (root) CMD (periodic everythreeminutes)
Jul  7 16:50:02 dev-ingress01 /usr/sbin/cron[50938]: (root) CMD (/usr/libexec/atrun)
Jul  7 16:50:05 dev-ingress01 /usr/sbin/cron[50924]: (root) CMD (/usr/libexec/atrun)
Jul  7 16:51:04 dev-ingress01 /usr/sbin/cron[54981]: (root) CMD (periodic everythreeminutes)
Jul  7 16:51:11 dev-ingress01 /usr/sbin/cron[54984]: (root) CMD (periodic everythreeminutes)
Jul  7 16:54:00 dev-ingress01 /usr/sbin/cron[61859]: (root) CMD (periodic everythreeminutes)
Jul  7 16:54:11 dev-ingress01 /usr/sbin/cron[61862]: (root) CMD (periodic everythreeminutes)
Jul  7 16:55:00 dev-ingress01 /usr/sbin/cron[63398]: (operator) CMD (/usr/libexec/save-entropy)
Jul  7 16:55:00 dev-ingress01 /usr/sbin/cron[63410]: (operator) CMD (/usr/libexec/save-entropy)
Jul  7 16:55:01 dev-ingress01 /usr/sbin/cron[63411]: (root) CMD (/usr/libexec/atrun)
Jul  7 16:55:14 dev-ingress01 /usr/sbin/cron[63400]: (root) CMD (/usr/libexec/atrun)
Jul  7 16:57:06 dev-ingress01 /usr/sbin/cron[71106]: (root) CMD (periodic everythreeminutes)
Jul  7 16:57:14 dev-ingress01 /usr/sbin/cron[71113]: (root) CMD (periodic everythreeminutes)
Jul  7 17:00:00 dev-ingress01 /usr/sbin/cron[77148]: (operator) CMD (/usr/libexec/save-entropy)
Jul  7 17:00:00 dev-ingress01 /usr/sbin/cron[77448]: (operator) CMD (/usr/libexec/save-entropy)
Jul  7 17:00:04 dev-ingress01 /usr/sbin/cron[77444]: (root) CMD (newsyslog)
Jul  7 17:00:04 dev-ingress01 /usr/sbin/cron[77150]: (root) CMD (newsyslog)
Jul  7 17:00:04 dev-ingress01 /usr/sbin/cron[77449]: (root) CMD (/usr/libexec/atrun)
Jul  7 17:00:05 dev-ingress01 /usr/sbin/cron[77162]: (root) CMD (periodic everythreeminutes)
Jul  7 17:00:06 dev-ingress01 /usr/sbin/cron[77154]: (root) CMD (/usr/libexec/atrun)
Jul  7 17:00:06 dev-ingress01 /usr/sbin/cron[77454]: (root) CMD (periodic everythreeminutes)
Jul  7 17:00:11 dev-ingress01 /usr/sbin/cron[77452]: (root) CMD (periodic hourly)
Jul  7 17:00:13 dev-ingress01 /usr/sbin/cron[77152]: (root) CMD (periodic hourly)

At first, I only noticed the duplication of the periodic everythreeminutes entries.

FreshPorts is to blame

This is the development ingress node for FreshPorts. I suspected the crontab file for the periodic scripts were duplicated.

[dan@dev-ingress01:~] $ cd /usr/local/etc/cron.d
[dan@dev-ingress01:/usr/local/etc/cron.d] $ ls -l
total 9
-rw-r--r--  1 root  wheel  203 Jul  7 19:54 freshports-periodics
[dan@dev-ingress01:/usr/local/etc/cron.d] $ cat freshports-periodics 
#
# NOTE - this file is maintained by a package and local changes will be lost.
# 
*/3	*	*	*	*	root	periodic everythreeminutes
0	*	*	*	*	root	periodic hourly
20      3	9,23	*	*	root	periodic fortnightly
[dan@dev-ingress01:/usr/local/etc/cron.d] $ 

But it was not. I suspected to find a second file in there, or something else.

If I commented out the */3 line, none of the periodic everythreeminutes jobs would run at all.

My conclusion: it wasn’t duplicate jobs (i.e. duplicate entries within a crontab). The crontab entries were being run twice.

Prove it

I had already checked, and there was just one instance of cron running.

To prove my idea, I added this to /etc/crontab:

*/1     *       *       *       *       dan     /usr/bin/lockf -t 0 /tmp/.dan.lockf sleep 20

The idea: the lockf of the first instance would prevent the second instance from running, an email would be generated, and I’d know it was duplication by cron.

Sure enough, the emails came in, one per minute:

testing emails

testing emails

Like the emails from before, they contained very little:

lockf: /tmp/.dan.lockf: already locked

I had no idea.

I consulted IRC.

The IRC theories

Is cron running twice? No, only one instance:

[dan@dev-ingress01:~] $ sudo ps wwx | grep cron
root       26734  0.0  0.0 11328  2596  -  SsJ  19:35   0:00.02 /usr/sbin/cron -J 15 -s
root       30294  0.0  0.0 17376  8164  1  I+J  19:35   0:00.01 sudo tail -F /var/log/cron
root       30295  0.0  0.0 10872  2368  1  S+J  19:35   0:00.11 tail -F /var/log/cron
dan        56843  0.0  0.0 11432  2768  3  S+J  20:14   0:00.00 grep cron
[dan@dev-ingress01:~] $

Tried running cron without -J 15? Try that, no change.

Out of curiosity: “crontab -l dan” – no crontab for dan.

Tried * instead of */1? Tried it, no change.

It looks like you’ve already tried restarting crond? Yes, and restarted the jail too.

OH.

Let’s see:

[dan@slocum:~] $ jls | grep dev-ingress
     6  10.55.0.37      dev-ingress01.int.unixathome. /jails/dev-ingress01
    31  10.55.0.37      dev-ingress01.int.unixathome. /jails/dev-ingress01

I have two jail instances running out of the same jail root.

Ummm, yes, I do have two cron running.

Stopping the jails

Here is how I stopped the jails

[dan@slocum:~] $ sudo service jail stop dev-ingress01
Stopping jails: dev-ingress01.

[dan@slocum:~] $ jls | grep dev-ingress
     6  10.55.0.37      dev-ingress01.int.unixathome. /jails/dev-ingress01

[dan@slocum:~] $ sudo service jail stop dev-ingress01
Stopping jails:.
[dan@slocum:~] $ sudo service jail stop 6
Stopping jails: 6.
[dan@slocum:~] $ jls | grep dev-ingress
[dan@slocum:~] $ 

Notice that my second sudo service jail stop dev-ingress01 achieved nothing. I had to stop the jail by jail id.

But is is fixed?

Restarting the jail, and all is well again:

$ sudo tail -30 /var/log/cron
Jul  7 20:30:00 dev-ingress01 /usr/sbin/cron[90688]: (root) CMD (periodic everythreeminutes)
Jul  7 20:33:00 dev-ingress01 /usr/sbin/cron[99442]: (operator) CMD (/usr/libexec/save-entropy)
Jul  7 20:33:00 dev-ingress01 /usr/sbin/cron[99454]: (root) CMD (periodic everythreeminutes)
Jul  7 20:35:00 dev-ingress01 /usr/sbin/cron[1185]: (root) CMD (/usr/libexec/atrun)
Jul  7 20:36:00 dev-ingress01 /usr/sbin/cron[5392]: (root) CMD (periodic everythreeminutes)
Jul  7 20:39:00 dev-ingress01 /usr/sbin/cron[10131]: (root) CMD (periodic everythreeminutes)
Jul  7 20:40:00 dev-ingress01 /usr/sbin/cron[11222]: (root) CMD (/usr/libexec/atrun)
Jul  7 20:42:00 dev-ingress01 /usr/sbin/cron[19003]: (root) CMD (periodic everythreeminutes)
Jul  7 20:44:00 dev-ingress01 /usr/sbin/cron[20154]: (operator) CMD (/usr/libexec/save-entropy)
Jul  7 20:44:00 dev-ingress01 /usr/sbin/cron[20155]: (anvil) CMD (/usr/local/bin/cert-puller)
Jul  7 20:45:00 dev-ingress01 /usr/sbin/cron[20940]: (root) CMD (/usr/libexec/atrun)
Jul  7 20:45:00 dev-ingress01 /usr/sbin/cron[20946]: (root) CMD (periodic everythreeminutes)
Jul  7 20:48:00 dev-ingress01 /usr/sbin/cron[29943]: (root) CMD (periodic everythreeminutes)
Jul  7 20:50:00 dev-ingress01 /usr/sbin/cron[31434]: (root) CMD (/usr/libexec/atrun)
Jul  7 20:51:00 dev-ingress01 /usr/sbin/cron[35629]: (root) CMD (periodic everythreeminutes)
Jul  7 20:54:00 dev-ingress01 /usr/sbin/cron[41810]: (root) CMD (periodic everythreeminutes)
Jul  7 20:55:00 dev-ingress01 /usr/sbin/cron[43135]: (operator) CMD (/usr/libexec/save-entropy)
Jul  7 20:55:00 dev-ingress01 /usr/sbin/cron[43137]: (root) CMD (/usr/libexec/atrun)
Jul  7 20:57:00 dev-ingress01 /usr/sbin/cron[50961]: (root) CMD (periodic everythreeminutes)
Jul  7 21:00:00 dev-ingress01 /usr/sbin/cron[56767]: (root) CMD (newsyslog)
Jul  7 21:00:00 dev-ingress01 /usr/sbin/cron[56769]: (operator) CMD (/usr/libexec/save-entropy)
Jul  7 21:00:00 dev-ingress01 /usr/sbin/cron[56770]: (root) CMD (/usr/libexec/atrun)
Jul  7 21:00:00 dev-ingress01 /usr/sbin/cron[56772]: (root) CMD (periodic hourly)
Jul  7 21:00:00 dev-ingress01 /usr/sbin/cron[56773]: (root) CMD (periodic everythreeminutes)
Jul  7 21:02:00 dev-ingress01 /usr/sbin/cron[66437]: (logcheck) CMD (if [ -x /usr/local/sbin/logcheck ]; then nice -n10 /usr/local/sbin/logcheck; fi)
Jul  7 21:03:00 dev-ingress01 /usr/sbin/cron[93255]: (root) CMD (periodic everythreeminutes)
Jul  7 21:05:00 dev-ingress01 /usr/sbin/cron[95964]: (root) CMD (/usr/libexec/atrun)
Jul  7 21:06:00 dev-ingress01 /usr/sbin/cron[315]: (root) CMD (periodic everythreeminutes)
Jul  7 21:09:00 dev-ingress01 /usr/sbin/cron[5181]: (root) CMD (periodic everythreeminutes)
Jul  7 21:10:00 dev-ingress01 /usr/sbin/cron[6100]: (root) CMD (/usr/libexec/atrun)

How did this happen?

Three days ago, I managed to create a jail which I could not terminate. Today, I rebooted that host.

After the reboot, I went into to fix up my hacks to /etc/jail.conf which allowed me to get another instance of the non-stoppable jail running without rebooting. That involved a different name (dev-ingress01-new for the temp jail.

Today, I changed that jail name back to dev-ingress01, and other things I cannot recall now.

I suspect what I did was not stop the jail first. Perhaps I got the jail name wrong and didn’t notice that it did not stop. Then I started it. jail did not recognize it as the same ‘jail’ and therefore didn’t say it was already running and not start it.

I find edge cases.

Website Pin Facebook Twitter Myspace Friendfeed Technorati del.icio.us Digg Google StumbleUpon Premium Responsive