Using FreeBSD’s daemon(8)? Consider -r

FreshPorts makes use of a simple Python daemon (fp_listen). It has been in use since at least 2006. It was (I think) vermaden who mentioned the -r option of daemon(8) (I can’t find the reference), and that triggered an idea.

The role of fp_listen is to listen for backend notifications and respond accordingly. One of its primary goals is to clear the front end cache as required. Part of that strategy involves a persistent connection to the PostgreSQL database, which runs on AWS RDS.
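For readers unfamiliar with the pattern, a listener of this kind often looks something like the sketch below. This is not the actual fp_listen code: the channel names, the handlers mapping, and both function names are hypothetical. The psycopg2 calls (connect, poll, notifies) are the ones that appear in the traceback or in psycopg2's documented LISTEN/NOTIFY support.

```python
import select


def drain_notifications(conn, handlers):
    """Poll the connection and dispatch every pending notification.

    handlers maps a channel name to a callable (hypothetical names).
    Returns the number of notifications that had a handler.
    """
    conn.poll()  # with psycopg2, this raises OperationalError if the link is gone
    handled = 0
    while conn.notifies:
        note = conn.notifies.pop(0)
        handler = handlers.get(note.channel)
        if handler is not None:
            handler(note)
            handled += 1
    return handled


def listen_forever(dsn, handlers, timeout=5.0):
    """Hold one persistent connection open and react to NOTIFY events."""
    # Imported here so drain_notifications() can be tried without psycopg2.
    import psycopg2
    import psycopg2.extensions

    conn = psycopg2.connect(dsn)
    conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
    with conn.cursor() as cur:
        for channel in handlers:
            # Channel names are assumed to be trusted constants, not user input.
            cur.execute("LISTEN " + channel + ";")
    while True:
        # Sleep until the socket is readable or the timeout expires.
        if select.select([conn], [], [], timeout) != ([], [], []):
            drain_notifications(conn, handlers)
```

Note that nothing here catches a dropped connection. If poll() raises, the process exits, which is the behaviour the log below shows.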

From time to time, the persistent connection is lost and fp_listen dies, like it did last night:

System Events
=-=-=-=-=-=-=
Jul 16 08:32:49 aws-1-nginx01 fp_listen[3805]: Traceback (most recent call last):
Jul 16 08:32:49 aws-1-nginx01 fp_listen[3805]:   File "/usr/local/lib/python3.9/site-packages/fp-listen/fp-listen.py", line 286, in <module>
Jul 16 08:32:49 aws-1-nginx01 fp_listen[3805]:     conn.poll()
Jul 16 08:32:49 aws-1-nginx01 fp_listen[3805]: psycopg2.OperationalError: SSL connection has been closed unexpectedly
Jul 16 08:32:49 aws-1-nginx01 fp_listen[3805]: 
Jul 16 08:32:51 aws-1-nginx01 fp_listen[3805]: Traceback (most recent call last):
Jul 16 08:32:51 aws-1-nginx01 fp_listen[3805]:   File "/usr/local/lib/python3.9/site-packages/fp-listen/fp-listen.py", line 267, in <module>
Jul 16 08:32:51 aws-1-nginx01 fp_listen[3805]:     conn = psycopg2.connect(DSN)
Jul 16 08:32:51 aws-1-nginx01 fp_listen[3805]:   File "/usr/local/lib/python3.9/site-packages/psycopg2/__init__.py", line 122, in connect
Jul 16 08:32:51 aws-1-nginx01 fp_listen[3805]:     conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
Jul 16 08:32:51 aws-1-nginx01 fp_listen[3805]: psycopg2.OperationalError: FATAL:  the database system is shutting down
Jul 16 08:32:51 aws-1-nginx01 fp_listen[3805]: 
Jul 16 08:32:52 aws-1-nginx01 fp_listen[3805]: Traceback (most recent call last):
Jul 16 08:32:52 aws-1-nginx01 fp_listen[3805]:   File "/usr/local/lib/python3.9/site-packages/fp-listen/fp-listen.py", line 267, in <module>
Jul 16 08:32:52 aws-1-nginx01 fp_listen[3805]:     conn = psycopg2.connect(DSN)
Jul 16 08:32:52 aws-1-nginx01 fp_listen[3805]:   File "/usr/local/lib/python3.9/site-packages/psycopg2/__init__.py", line 122, in connect
Jul 16 08:32:52 aws-1-nginx01 fp_listen[3805]:     conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
Jul 16 08:32:52 aws-1-nginx01 fp_listen[3805]: psycopg2.OperationalError: FATAL:  the database system is shutting down
Jul 16 08:32:52 aws-1-nginx01 fp_listen[3805]: 

# the above repeats

Jul 16 08:34:35 aws-1-nginx01 fp_listen[3805]: 
Jul 16 08:34:36 aws-1-nginx01 fp_listen[3805]: Traceback (most recent call last):
Jul 16 08:34:36 aws-1-nginx01 fp_listen[3805]:   File "/usr/local/lib/python3.9/site-packages/fp-listen/fp-listen.py", line 267, in <module>
Jul 16 08:34:36 aws-1-nginx01 fp_listen[3805]:     conn = psycopg2.connect(DSN)
Jul 16 08:34:36 aws-1-nginx01 fp_listen[3805]:   File "/usr/local/lib/python3.9/site-packages/psycopg2/__init__.py", line 122, in connect
Jul 16 08:34:36 aws-1-nginx01 fp_listen[3805]:     conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
Jul 16 08:34:36 aws-1-nginx01 fp_listen[3805]: psycopg2.OperationalError: could not connect to server: Connection refused
Jul 16 08:34:36 aws-1-nginx01 fp_listen[3805]: Is the server running on host "pg01.cqor9jd5vvww.us-east-1.rds.amazonaws.com" (52.202.233.11/32) and accepting
Jul 16 08:34:36 aws-1-nginx01 fp_listen[3805]: TCP/IP connections on port 5432?
Jul 16 08:34:36 aws-1-nginx01 fp_listen[3805]: 
Jul 16 08:34:37 aws-1-nginx01 fp_listen[3805]: Traceback (most recent call last):
Jul 16 08:34:37 aws-1-nginx01 fp_listen[3805]:   File "/usr/local/lib/python3.9/site-packages/fp-listen/fp-listen.py", line 267, in <module>
Jul 16 08:34:37 aws-1-nginx01 fp_listen[3805]:     conn = psycopg2.connect(DSN)
Jul 16 08:34:37 aws-1-nginx01 fp_listen[3805]:   File "/usr/local/lib/python3.9/site-packages/psycopg2/__init__.py", line 122, in connect
Jul 16 08:34:37 aws-1-nginx01 fp_listen[3805]:     conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
Jul 16 08:34:37 aws-1-nginx01 fp_listen[3805]: psycopg2.OperationalError: FATAL:  the database system is starting up
Jul 16 08:34:37 aws-1-nginx01 fp_listen[3805]:

Usually this is the result of backend maintenance. This particular outage lasted about two minutes.

This has happened perhaps four times since I started using PostgreSQL on RDS.

The quick solution: add -r to the rc.d script for fp_listen.

command_args="-P ${pidfile} -r -t ${name} -T ${name} -l ${fp_listen_syslog_facility} ${fp_listen_command}"

From the man page, paraphrased: -r supervises the program and restarts it after a one-second delay if it has been terminated.
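To make the idea concrete, here is a conceptual sketch of a supervise-and-restart loop in Python. It only illustrates the behaviour and is not how daemon(8) is implemented. The function name supervise and the max_restarts parameter are hypothetical; max_restarts exists only so the example can terminate.

```python
import subprocess
import sys
import time


def supervise(argv, restart_delay=1.0, max_restarts=None):
    """Run argv and start it again, after a delay, each time it exits.

    Returns the list of exit codes seen. With max_restarts=None the loop
    never returns, which is the point of a supervisor.
    """
    exit_codes = []
    while True:
        child = subprocess.Popen(argv)
        exit_codes.append(child.wait())  # block until the child dies
        if max_restarts is not None and len(exit_codes) > max_restarts:
            return exit_codes
        time.sleep(restart_delay)  # brief pause before restarting


if __name__ == "__main__":
    # A child that always fails, restarted twice after the first run.
    failing_child = [sys.executable, "-c", "import sys; sys.exit(3)"]
    print(supervise(failing_child, restart_delay=0.1, max_restarts=2))
```

The supervised program does not need to know anything about this loop. It can simply exit when something goes wrong.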

Looking at the jail in question:

[16:22 aws-1-nginx01 dan ~] % ps auwwx | grep listen
root        3714  0.0  0.1  21068   3380  -  SsJ  24Jun23   0:04.45 sshd: /usr/sbin/sshd [listener] 0 of 10-100 startups (sshd)
freshports  3805  0.0  0.0  12816   1296  -  IsJ  24Jun23   0:00.02 daemon: fp_listen[10662] (daemon)
freshports 10662  0.0  0.5  40156  19172  -  SJ   08:34     0:02.53 /usr/local/bin/python /usr/local/lib/python3.9/site-packages/fp-listen/fp-listen.py (python3.9)
dan        95353  0.0  0.0   8716   1860  0  R+J  16:22     0:00.00 grep listen

We can see that the daemon process started on 24 Jun and the fp_listen script started at 8:34 today, which matches up with the last log entry above.

Why not change the code?

Some of you may ask, why not change the Python code to detect the lost connection and reconnect?

That would require more coding, more potential for bugs, and more complex code. This is not critical code. This solution works well enough and requires no additional coding.
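For a sense of what that extra code might involve, here is a minimal sketch of just the reconnect half, assuming a retry loop with exponential backoff. The helper connect_with_retry and all of its parameters are hypothetical and not part of fp_listen. The listener would also need to catch the error from poll(), reissue its LISTEN commands, and decide what to do about notifications missed while disconnected.

```python
import time


def connect_with_retry(connect, retry_on=(Exception,), delay=1.0,
                       max_delay=30.0, max_attempts=None, sleep=time.sleep):
    """Call connect() until it succeeds, doubling the wait after each failure.

    connect is any zero-argument callable, for example
    lambda: psycopg2.connect(DSN) with retry_on=(psycopg2.OperationalError,).
    """
    attempt = 0
    while True:
        attempt += 1
        try:
            return connect()
        except retry_on:
            if max_attempts is not None and attempt >= max_attempts:
                raise  # give up and let the caller (or a supervisor) decide
            sleep(delay)
            delay = min(delay * 2, max_delay)
```

Even this small helper raises questions about how long to wait and when to give up, none of which come up when the process simply exits and is restarted.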

Others will point at RDS and blame Amazon. That’s not right. They have a window for maintenance. I accept that. No commit data will be lost.

However, I’m happy to read patches or provide the code to be patched.