I noticed a problem with a newly-created freshports daemon script: starting it would sometimes freeze the terminal session.
The rc.d script
The rc.d script was fairly straightforward:
#!/bin/sh

# $FreeBSD$
#
# PROVIDE: freshports
# REQUIRE: LOGIN cleanvar
# KEYWORD: shutdown
#
# Add the following lines to /etc/rc.conf to enable freshports:
# freshports_enable (bool): Set to "NO" by default.
#                           Set it to "YES" to enable freshports
#

. /etc/rc.subr

name="freshports"
rcvar=${name}_enable

pidfile="/var/run/${name}/${name}.pid"
freshports_user="freshports"
freshports_command="/usr/local/libexec/freshports-service/freshports.sh"
command="/usr/sbin/daemon"

load_rc_config $name

: ${freshports_enable:=NO}
: ${freshports_user:=freshports}
: ${freshports_group:=freshports}
: ${freshports_syslog_facility:=local3}

required_files="/usr/local/etc/freshports/freshports.sh"

start_precmd=freshports_prestart

freshports_prestart() {
    # create the file pid, and directory, with correct permissions
    if [ ! -e ${pidfile} ]; then
        install -o ${freshports_user} -g ${freshports_group} /dev/null ${pidfile};
    else
        chown ${freshports_user}:${freshports_group} ${pidfile};
    fi
}

command_args="-P ${pidfile} -t ${name} -T ${name} -l ${freshports_syslog_facility} ${freshports_command}"

run_rc_command "$1"
Nothing odd there, right?
This first happened a week or so ago. I moved on and decided to look into it later.
But wait, there’s more
This issue came up again today. The system would be processing incoming commits and it would hang. It would get stuck on deleting a file. I’d see this in ps auwwx output:
freshports 8452 0.0 0.0 10980 2336 - IJ 18:54 0:00.00 /bin/rm /var/db/ingress/message-queues/incoming/2020.08.03.11.07.37.000002.ea0b4a2a7ed172b4618e09b74c3182e035cd6de2.xml
I would check for the file, and it would no longer be on disk. So why was rm hanging?
I checked the code in question, and it looked like this:
${RM} ${file}
Hmmm, that’s the only use of ${RM} in the whole script.
Let’s try rm instead.
Nope. That did not help. And my ssh session is frozen. What’s up with that?
Wait, what about rm -f?
That worked!
Ahh, it’s a permissions issue.
The file is in a directory owned by ingress:freshports, as shown here:
$ ls -ld /var/db/ingress/message-queues/incoming
drwxrwxr-x 2 ingress freshports 51 Aug 3 19:48 /var/db/ingress/message-queues/incoming
But the file is:
-rw-rw-r-- 1 ingress ingress 720 Aug 3 18:54 /var/db/ingress/message-queues/incoming/2020.08.02.16.45.14.000000.fa3e16b820913309f1078dcefb69084a3ee5564b.xml
The facts:
- the script runs as the freshports user
- that user does not have write access on the file
- that user has write access on the directory
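Those three facts can be checked from the shell. Here is a minimal sketch, assuming it is run as the same user as the script; the function name check_rm_access is made up for this post and is not part of the FreshPorts code:

```sh
#!/bin/sh
# Hypothetical helper (not part of the FreshPorts scripts): report whether the
# current user can write to a file and to the directory that contains it.
check_rm_access() {
    file=$1
    dir=$(dirname "$file")

    # Removing a directory entry needs write and search permission
    # on the directory, not on the file.
    if [ -w "$dir" ] && [ -x "$dir" ]; then
        echo "directory writable: $dir"
    else
        echo "directory NOT writable: $dir"
    fi

    # Write permission on the file is not needed to unlink it, but rm asks
    # for confirmation when it is missing and stdin is a terminal.
    if [ -w "$file" ]; then
        echo "file writable: $file"
    else
        echo "file NOT writable: $file"
    fi
}
```

Run as the freshports user against one of the stuck files, this should report the directory as writable and the file as not writable, which is exactly the combination that makes rm ask before deleting.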
I know how this came about, and it is because of an unusual workflow.
But first, a test, to demonstrate
Create a directory where I have read/write permissions.
[dan@empty:~] $ mkdir testing
[dan@empty:~] $ sudo chown root:dan testing
[dan@empty:~] $ ls -ld testing
drwxr-xr-x 2 root dan 2 Aug 3 20:48 testing
[dan@empty:~] $ sudo chmod g+w testing
[dan@empty:~] $ ls -ld testing
drwxrwxr-x 2 root dan 2 Aug 3 20:48 testing
Create a file where I have no write permissions:
[dan@empty:~] $ sudo touch testing/file
[dan@empty:~] $ ls -l testing/file
-rw-r--r-- 1 root dan 0 Aug 3 20:49 testing/file
Deleting it gets me a prompt:
[dan@empty:~] $ rm testing/file
override rw-r--r-- root/dan uarch for testing/file? n
The same override prompt appears in my logs (see below).
Using -f suppresses the prompt:
[dan@empty:~] $ rm -f testing/file
[dan@empty:~] $
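rm(1) only issues that prompt when -f is not given and standard input is a terminal. The following is a small sketch of the same test that needs no sudo; the scratch directory and file names are invented for the demonstration:

```sh
#!/bin/sh
# Demonstration only: the scratch directory and file names are made up.
scratch=$(mktemp -d)

# Two files I own, in a directory I can write to, with write permission removed.
: > "$scratch/with-f"
: > "$scratch/no-tty"
chmod a-w "$scratch/with-f" "$scratch/no-tty"

# -f never prompts, whatever stdin is.
rm -f "$scratch/with-f"

# Without -f, rm prompts only if stdin is a terminal;
# redirecting stdin from /dev/null avoids the prompt as well.
rm "$scratch/no-tty" < /dev/null

# Both files should be gone now.
if [ -z "$(ls -A "$scratch")" ]; then
    echo "both files removed"
fi

rmdir "$scratch"
```

Dropping the `< /dev/null` from the second rm and running it from an interactive shell should bring back the override prompt shown above.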
The logs
In the logs, I found the following. I have removed the prefix Aug 3 19:39:22 devgit-ingress01 freshports[51876]: from the start of each line:
'-rw-rw-r-- 1 ingress ingress 720 Aug 3 18:54 /var/db/ingress/message-queues/incoming/2020.08.02.16.45.14.000000.fa3e16b820913309f1078dcefb69084a3ee5564b.xml'
'drwxr-xr-x 2 freshports freshports 2 Jul 17 17:46 /var/db/freshports/message-queues/incoming/'
'drwxr-xr-x 2 freshports freshports 566 Aug 3 19:39 /var/db/freshports/message-queues/recent/'
removing /var/db/ingress/message-queues/incoming/2020.08.02.16.45.14.000000.fa3e16b820913309f1078dcefb69084a3ee5564b.xml
override rw-rw-r-- ingress/ingress uarch for /var/db/ingress/message-queues/incoming/2020.08.02.16.45.14.000000.fa3e16b820913309f1078dcefb69084a3ee5564b.xml?
removal completed
The unusual workflow
Usually, these messages originate in this directory:
$ ls -ld /var/db/ingress/message-queues/spooling/
drwxr-xr-x 2 ingress freshports 2 Aug 3 18:55 /var/db/ingress/message-queues/spooling/
And are then mv'd to this directory:
$ ls -ld /var/db/ingress/message-queues/incoming/
drwxrwxr-x 2 ingress freshports 2 Aug 3 20:03 /var/db/ingress/message-queues/incoming/
Over the past week or so I’ve been running tests which were dumping messages into the testing and testing-new directories for comparison purposes. I would process the same git commit into XML using two different versions of the same script.
Those directories were chown ingress:ingress.
The files created in those directories were also chown ingress:ingress, since on FreeBSD a new file inherits the group of the directory it is created in.
The directories are now chown ingress:freshports.
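Files created before the change keep their old group, so it can be worth looking for leftovers. This is only a sketch: list_wrong_group is a name I made up, and the directory and group in the usage comment are the ones from this post.

```sh
#!/bin/sh
# Hypothetical helper: list regular files under a directory whose group
# is not the expected one (a group name or a numeric gid).
# Usage: list_wrong_group /var/db/ingress/message-queues/incoming freshports
list_wrong_group() {
    dir=$1
    group=$2
    find "$dir" -type f ! -group "$group" -print
}
```

No output means every file under the directory already has the expected group.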
And that’s it
I should have paid closer attention to the files and that would have clued me in early to the cause of the problem.
For now, the script will do an rm -f and the directories will be chown ingress:freshports.
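The rm -f change could look something like the sketch below. It is not the actual FreshPorts code: remove_message is a hypothetical function name, the /bin/rm default for RM is an assumption, and the echo lines simply mirror the log messages shown earlier.

```sh
#!/bin/sh
# Sketch only: remove_message and the RM default are illustrative;
# the real script's names and logging may differ.
RM=${RM:-/bin/rm}

remove_message() {
    file=$1
    echo "removing ${file}"

    # -f: never prompt, and do not complain if the file is already gone.
    # --: stop option parsing, in case a file name ever starts with '-'.
    "${RM}" -f -- "${file}"

    # rm -f can still fail (for example, no write access to the directory),
    # so confirm that the file really is gone.
    if [ -e "${file}" ]; then
        echo "removal FAILED: ${file}" >&2
        return 1
    fi
    echo "removal completed"
}
```

Quoting "${file}" also protects against file names containing spaces, which the unquoted ${RM} ${file} would split into several arguments.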
Thanks for coming to my TED talk.