FreshPorts makes use of a simple Python daemon (fp_listen). It has been in use since at least 2006. It was (I think) vermaden who mentioned it (I can’t find the reference) and it triggered an idea.
The role of fp_listen is to listen for backend notifications and respond accordingly. One of its primary goals is to clear the front end cache as required. Part of that strategy involves a persistent connection to the PostgreSQL database, which runs on AWS RDS.
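For readers unfamiliar with the pattern, here is a minimal sketch of a listener built on a persistent psycopg2 connection. It is not the fp_listen source: the function names, the channel name, and the handler are hypothetical, and only the psycopg2.connect(DSN) and conn.poll() calls are known from the traceback further down.

```python
import select


def drain_notifications(conn, handle):
    """Poll the connection and pass each pending notification to handle()."""
    conn.poll()  # with psycopg2 this raises OperationalError if the link is gone
    handled = 0
    while conn.notifies:
        notify = conn.notifies.pop(0)
        handle(notify.channel, notify.payload)
        handled += 1
    return handled


def listen_forever(dsn, channel, handle, timeout=5.0):
    """Hold one persistent connection open and react to NOTIFY events."""
    # Imported here so drain_notifications() can be tried without psycopg2.
    import psycopg2
    import psycopg2.extensions

    conn = psycopg2.connect(dsn)
    conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
    with conn.cursor() as cur:
        # channel is a hypothetical name and must be a trusted identifier.
        cur.execute(f"LISTEN {channel};")
    while True:
        # Sleep until the socket is readable or the timeout expires.
        readable, _, _ = select.select([conn], [], [], timeout)
        if readable:
            drain_notifications(conn, handle)
```

In a sketch like this, nothing catches the exception from conn.poll(), so a dropped connection ends the process. That is the failure mode described next.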
From time to time, the persistent connection is lost and fp_listen dies, like it did last night:
System Events
=-=-=-=-=-=-=
Jul 16 08:32:49 aws-1-nginx01 fp_listen[3805]: Traceback (most recent call last):
Jul 16 08:32:49 aws-1-nginx01 fp_listen[3805]: File "/usr/local/lib/python3.9/site-packages/fp-listen/fp-listen.py", line 286, in <module>
Jul 16 08:32:49 aws-1-nginx01 fp_listen[3805]: conn.poll()
Jul 16 08:32:49 aws-1-nginx01 fp_listen[3805]: psycopg2.OperationalError: SSL connection has been closed unexpectedly
Jul 16 08:32:49 aws-1-nginx01 fp_listen[3805]:
Jul 16 08:32:51 aws-1-nginx01 fp_listen[3805]: Traceback (most recent call last):
Jul 16 08:32:51 aws-1-nginx01 fp_listen[3805]: File "/usr/local/lib/python3.9/site-packages/fp-listen/fp-listen.py", line 267, in <module>
Jul 16 08:32:51 aws-1-nginx01 fp_listen[3805]: conn = psycopg2.connect(DSN)
Jul 16 08:32:51 aws-1-nginx01 fp_listen[3805]: File "/usr/local/lib/python3.9/site-packages/psycopg2/__init__.py", line 122, in connect
Jul 16 08:32:51 aws-1-nginx01 fp_listen[3805]: conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
Jul 16 08:32:51 aws-1-nginx01 fp_listen[3805]: psycopg2.OperationalError: FATAL: the database system is shutting down
Jul 16 08:32:51 aws-1-nginx01 fp_listen[3805]:
Jul 16 08:32:52 aws-1-nginx01 fp_listen[3805]: Traceback (most recent call last):
Jul 16 08:32:52 aws-1-nginx01 fp_listen[3805]: File "/usr/local/lib/python3.9/site-packages/fp-listen/fp-listen.py", line 267, in <module>
Jul 16 08:32:52 aws-1-nginx01 fp_listen[3805]: conn = psycopg2.connect(DSN)
Jul 16 08:32:52 aws-1-nginx01 fp_listen[3805]: File "/usr/local/lib/python3.9/site-packages/psycopg2/__init__.py", line 122, in connect
Jul 16 08:32:52 aws-1-nginx01 fp_listen[3805]: conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
Jul 16 08:32:52 aws-1-nginx01 fp_listen[3805]: psycopg2.OperationalError: FATAL: the database system is shutting down
Jul 16 08:32:52 aws-1-nginx01 fp_listen[3805]:
# the above repeats
Jul 16 08:34:35 aws-1-nginx01 fp_listen[3805]:
Jul 16 08:34:36 aws-1-nginx01 fp_listen[3805]: Traceback (most recent call last):
Jul 16 08:34:36 aws-1-nginx01 fp_listen[3805]: File "/usr/local/lib/python3.9/site-packages/fp-listen/fp-listen.py", line 267, in <module>
Jul 16 08:34:36 aws-1-nginx01 fp_listen[3805]: conn = psycopg2.connect(DSN)
Jul 16 08:34:36 aws-1-nginx01 fp_listen[3805]: File "/usr/local/lib/python3.9/site-packages/psycopg2/__init__.py", line 122, in connect
Jul 16 08:34:36 aws-1-nginx01 fp_listen[3805]: conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
Jul 16 08:34:36 aws-1-nginx01 fp_listen[3805]: psycopg2.OperationalError: could not connect to server: Connection refused
Jul 16 08:34:36 aws-1-nginx01 fp_listen[3805]: Is the server running on host "pg01.cqor9jd5vvww.us-east-1.rds.amazonaws.com" (52.202.233.11/32) and accepting
Jul 16 08:34:36 aws-1-nginx01 fp_listen[3805]: TCP/IP connections on port 5432?
Jul 16 08:34:36 aws-1-nginx01 fp_listen[3805]:
Jul 16 08:34:37 aws-1-nginx01 fp_listen[3805]: Traceback (most recent call last):
Jul 16 08:34:37 aws-1-nginx01 fp_listen[3805]: File "/usr/local/lib/python3.9/site-packages/fp-listen/fp-listen.py", line 267, in <module>
Jul 16 08:34:37 aws-1-nginx01 fp_listen[3805]: conn = psycopg2.connect(DSN)
Jul 16 08:34:37 aws-1-nginx01 fp_listen[3805]: File "/usr/local/lib/python3.9/site-packages/psycopg2/__init__.py", line 122, in connect
Jul 16 08:34:37 aws-1-nginx01 fp_listen[3805]: conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
Jul 16 08:34:37 aws-1-nginx01 fp_listen[3805]: psycopg2.OperationalError: FATAL: the database system is starting up
Jul 16 08:34:37 aws-1-nginx01 fp_listen[3805]:
Usually this is the result of backend maintenance. This particular outage lasted about two minutes (08:32:49 to 08:34:37 in the log above).
This has happened perhaps four times since I started using PostgreSQL on RDS.
The quick solution: add -r to the daemon(8) arguments in the rc.d script for fp_listen.
command_args="-P ${pidfile} -r -t ${name} -T ${name} -l ${fp_listen_syslog_facility} ${fp_listen_command}"
From the daemon(8) man page: the -r option supervises the program and restarts it if it has been terminated.
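The restart behaviour that -r provides can be pictured with a few lines of Python. This is only a rough sketch of the idea, not how daemon(8) is implemented: the real tool is written in C and also handles pidfiles, signals, and syslog redirection. The supervise name and the max_runs parameter are hypothetical, and the one-second default delay is an assumption based on the roughly one-second spacing of the tracebacks in the log above.

```python
import subprocess
import sys
import time


def supervise(argv, restart_delay=1.0, max_runs=None):
    """Run argv and start it again each time it exits.

    max_runs exists only so the sketch can be exercised in a test;
    a real supervisor keeps going until it is told to stop.
    """
    exit_codes = []
    while True:
        # subprocess.call() blocks until the child terminates.
        exit_codes.append(subprocess.call(argv))
        if max_runs is not None and len(exit_codes) >= max_runs:
            return exit_codes
        time.sleep(restart_delay)  # brief pause before the next attempt
```

For example, supervise([sys.executable, "fp-listen.py"]) would keep relaunching the script for as long as the supervisor itself is alive. That is why the failed connection attempts in the log repeat until the database is back.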
Looking at the jail in question:
[16:22 aws-1-nginx01 dan ~] % ps auwwx | grep listen
root        3714 0.0 0.1 21068  3380 - SsJ 24Jun23 0:04.45 sshd: /usr/sbin/sshd [listener] 0 of 10-100 startups (sshd)
freshports  3805 0.0 0.0 12816  1296 - IsJ 24Jun23 0:00.02 daemon: fp_listen[10662] (daemon)
freshports 10662 0.0 0.5 40156 19172 - SJ  08:34   0:02.53 /usr/local/bin/python /usr/local/lib/python3.9/site-packages/fp-listen/fp-listen.py (python3.9)
dan        95353 0.0 0.0  8716  1860 0 R+J 16:22   0:00.00 grep listen
We can see that daemon (PID 3805) started on 24 Jun and the fp_listen script (PID 10662) started at 08:34 today, which matches the last log entry above.
Why not change the code?
Some of you may ask: why not change the Python code to detect the lost connection and reconnect?
That would require more coding, more potential for bugs, and more complex code. This is not critical code. This solution works well enough and requires no additional coding.
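To give a sense of what "more coding" means here, the following is a hypothetical sketch of the retry helper such a change might start with. Nothing in it comes from fp_listen: connect_with_retry, is_transient, and every parameter are invented for illustration.

```python
import time


def connect_with_retry(connect, is_transient, delay=1.0,
                       max_attempts=None, sleep=time.sleep):
    """Call connect() until it succeeds, pausing between transient failures."""
    attempt = 0
    while True:
        attempt += 1
        try:
            return connect()
        except Exception as exc:
            if not is_transient(exc):
                raise  # a real bug or bad credentials: do not loop on it
            if max_attempts is not None and attempt >= max_attempts:
                raise  # give up after the configured number of tries
            sleep(delay)
```

With psycopg2 it might be called as connect_with_retry(lambda: psycopg2.connect(DSN), lambda e: isinstance(e, psycopg2.OperationalError)). That covers only the connect step. The conn.poll() call would need the same treatment, and LISTEN would have to be issued again on every new connection, which is where the extra complexity and the room for bugs come in.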
Others will point at RDS and blame Amazon. That’s not right. They have a window for maintenance. I accept that. No commit data will be lost.
However, I’m happy to read patches or provide the code to be patched.