This post covers my debugging of a self-signed certificate on one of my Bacula instances.
The error message is:
09-Sep 10:08 bacula-dir JobId 358320: Error: tls.c:96 Error with certificate at depth: 0, issuer = /C=US/ST=PA/L=Media/O=BSD Cabal Headquarters/CN=BSD Cabal Headquarters/emailAddress=dan@langille.org, subject = /C=US/ST=PA/O=BSD Cabal Headquarters/CN=r730-03.int.unixathome.org/emailAddress=dan@langille.org, ERR=26:unsupported certificate purpose 09-Sep 10:08 bacula-dir JobId 358320: Error: openssl.c:68 Connect failure: ERR=error:1416F086:SSL routines:tls_process_server_certificate:certificate verify failed 09-Sep 10:08 bacula-dir JobId 358320: Fatal error: TLS negotiation failed with FD at "r730-03.int.unixathome.org:9102". 09-Sep 10:08 bacula-dir JobId 358320: Fatal error: bsock.c:520 Packet size=386073346 too big from "Client: r730-03-fd:r730-03.int.unixathome.org:9102". Maximum permitted 1000000. Terminating connection. 09-Sep 10:08 bacula-dir JobId 358320: Fatal error: No Job status returned from FD. 09-Sep 10:08 bacula-dir JobId 358320: Error: Bacula bacula-dir 9.6.7 (10Dec20):
I’ve encountered that unsupported certificate purpose message before:
I always thought it was a server versus client issue. Now I’m not so sure. There was also an ssl-admin issue
In this post:
- FreeBSD 13.2
- ssl-admin-1.3.0
- Bacula 9.6.7 – yes, that is rather outdated; it’s on my list
Trying again
Looking at the cert causing the above problem, I found these attributes under X509v3 extensions. It was these extensions which raised the ssl-admin issue referenced above.
[10:06 r730-03 dvl /usr/local/etc/ssl] % openssl x509 -text -in ssl-admin-1.3.0_1/r730-03.int.unixathome.org.crt| less ... X509v3 extensions: X509v3 Basic Constraints: CA:FALSE X509v3 Key Usage: Digital Signature, Non Repudiation, Key Encipherment, Key Agreement X509v3 Extended Key Usage: TLS Web Client Authentication X509v3 CRL Distribution Points: Full Name: URI:http://CRL_URI ...
NOTE: the above cert was created with ssl-admin-1.3.0
Let’s try a Server cert. I created a new cert using ssl-admin-1.3.0. That certficate contained this:
X509v3 extensions: X509v3 Basic Constraints: CA:FALSE Netscape Cert Type: SSL Server Netscape Comment: ssl-admin (OpenSSL) Generated Server Certificate X509v3 Subject Key Identifier: F8:25:24:10:19:15:09:B7:A9:8C:11:15:85:8D:79:4B:19:3B:33:B7 X509v3 Authority Key Identifier: keyid:A6:78:30:1E:74:D0:24:D7:A8:1A:EF:64:B0:90:8A:5B:7F:12:07:14 DirName:/C=US/ST=PA/L=Media/O=BSD Cabal Headquarters/CN=BSD Cabal Headquarters/emailAddress=dan@langille.org serial:D1:2A:2F:79:92:DF:7D:A2 X509v3 Extended Key Usage: TLS Web Server Authentication X509v3 Key Usage: Digital Signature, Key Encipherment
Interesting. I looked at a few certificates used by other Bacula clients. None of them had X509v3 extensions.
That certificate gives this error:
09-Sep 10:19 bacula-sd-04 JobId 358322: Error: openssl.c:68 Connect failure: ERR=error:1417C086:SSL routines:tls_process_client_certificate:certificate verify failed 09-Sep 10:19 bacula-sd-04 JobId 358322: Fatal error: bnet.c:75 TLS Negotiation failed. 09-Sep 10:19 bacula-sd-04 JobId 358322: Fatal error: TLS negotiation failed with FD at "10.55.0.7:27230" 09-Sep 10:19 bacula-sd-04 JobId 358322: Fatal error: Incorrect authorization key from File daemon at client rejected. For help, please see: http://www.bacula.org/rel-manual/en/problems/Bacula_Frequently_Asked_Que.html 09-Sep 10:19 bacula-sd-04 JobId 358322: Security Alert: Unable to authenticate File daemon
Now the problem occurs when talking to 10.55.0.7 (bacula-sd-04.int.unixathome.org) – that is the destination for this backup.
Is it the authorization key?
Let’s go through this slowly. It’s 6:30 AM on a Saturday morning and I’ve been awake since 5:00 AM with a cough. My brain is not the best today.
This is the only client having trouble with backups. It is also the newest client. It is the one using the new certs.
I have confirmed that the passwords for r730-03-fd in both the bacula-dir configuration and the bacula-fd configuration match. This is confirmed by the status client test shown below.
So is it the authorization key? Or is it the cert?
status
I have confirmed that status client succeeds:
root@bacula:/usr/local/etc/bacula # bconsole Connecting to Director bacula.int.unixathome.org:9101 1000 OK: 103 bacula-dir Version: 9.6.7 (10 December 2020) Enter a period to cancel a command. *status client=r730-03-fd Connecting to Client r730-03-fd at r730-03.int.unixathome.org:9102 r730-03-fd Version: 9.6.7 (10 December 2020) amd64-portbld-freebsd13.2 freebsd 13.2-RELEASE Daemon started 09-Sep-23 10:19. Jobs: run=0 running=0. Heap: heap=0 smbytes=34,698 max_bytes=36,792 bufs=97 max_bufs=110 Sizes: boffset_t=8 size_t=8 debug=0 trace=0 mode=0,0 bwlimit=0kB/s Running Jobs: Director connected using TLS at: 09-Sep-23 10:47 No Jobs running. ==== Terminated Jobs: ==== *
Note the empty job history (under Terminated Jobs) – this client has never run a backup.
A solution
My solution? I went back to my patch mentioned in the ssl-admin issue. Then I created a client cert. It works.
Let’s work this out
Next step, let’s talk about this on the Bacula user mailing lists. Clearly, there is something up which needs sorting out.
Some problems with bacula-sd certs
This section was added on 2023-09-10.
Today I solved another issue with certs.bacula-sd can use a certificate with X509v3 extensions but only if it is not the destination for a copy job. I suspect migrate jobs would also have the same problem, but I have not tested it.
This is the error I see when using a X509v3 extensions with the receiving bacula-sd:
10-Sep 19:33 bacula-dir JobId 358356: Warning: FileSet MD5 digest not found. 10-Sep 19:33 bacula-dir JobId 358356: The following 1 JobId was chosen to be copied: 344190 10-Sep 19:33 bacula-dir JobId 358356: Copying using JobId=344190 Job=ansible.2022-09-04_03.05.00_07 10-Sep 19:33 bacula-dir JobId 358356: Start Copying JobId 358356, Job=CopyToSD04.2023-09-10_19.33.13_46 10-Sep 19:33 bacula-dir JobId 358356: Using Device "vDrive-FullFile-0" to read. 10-Sep 19:33 bacula-sd-01-sd JobId 358356: Error: openssl.c:68 Connect failure: ERR=error:1417C086:SSL routines:tls_process_client_certificate:certificate verify failed 10-Sep 19:33 bacula-sd-01-sd JobId 358356: Fatal error: bnet.c:75 TLS Negotiation failed. 10-Sep 19:33 bacula-sd-01-sd JobId 358356: Fatal error: TLS negotiation failed with FD at "10.55.0.7:46199" 10-Sep 19:33 bacula-sd-01-sd JobId 358356: Fatal error: Incorrect authorization key from File daemon at client rejected. For help, please see: http://www.bacula.org/rel-manual/en/problems/Bacula_Frequently_Asked_Que.html 10-Sep 19:33 bacula-sd-01-sd JobId 358356: Security Alert: Unable to authenticate File daemon 10-Sep 19:33 bacula-dir JobId 358356: Error: Bacula bacula-dir 9.6.7 (10Dec20):
In this job:
- bacula-sd-01-sd is the origin of the job to be copied.
- 10.55.0.7 is bacula-sd-04, the destination of the copy.
I already had copy jobs working from bacula-sd-01-sd to bacula-sd-03-sd, so I checked the certificate on the latter. It did not contain X509v3 extensions.
Here is the successful job:
10-Sep 20:08 bacula-dir JobId 358363: Warning: FileSet MD5 digest not found. 10-Sep 20:08 bacula-dir JobId 358363: The following 1 JobId was chosen to be copied: 344190 10-Sep 20:08 bacula-dir JobId 358363: Copying using JobId=344190 Job=ansible.2022-09-04_03.05.00_07 10-Sep 20:08 bacula-dir JobId 358363: Start Copying JobId 358363, Job=CopyToSD04.2023-09-10_20.08.28_01 10-Sep 20:08 bacula-dir JobId 358363: Using Device "vDrive-FullFile-0" to read. 10-Sep 20:08 bacula-sd-01-sd JobId 358363: Ready to read from volume "FullAuto-8043" on File device "vDrive-FullFile-0" (/usr/local/bacula/volumes/FullFile). 10-Sep 20:08 bacula-sd-01-sd JobId 358363: Forward spacing Volume "FullAuto-8043" to addr=252 10-Sep 20:08 bacula-sd-01-sd JobId 358363: End of Volume "FullAuto-8043" at addr=16334511 on device "vDrive-FullFile-0" (/usr/local/bacula/volumes/FullFile). 10-Sep 20:08 bacula-sd-01-sd JobId 358363: Elapsed time=00:00:01, Transfer rate=16.20 M Bytes/second 10-Sep 20:10 bacula-dir JobId 358363: Bacula bacula-dir 9.6.7 (10Dec20): Build OS: amd64-portbld-freebsd13.2 freebsd 13.2-RELEASE Prev Backup JobId: 344190 Prev Backup Job: ansible.2022-09-04_03.05.00_07 New Backup JobId: 358364 Current JobId: 358363 Current Job: CopyToSD04.2023-09-10_20.08.28_01 Backup Level: Full Client: crey-fd FileSet: "EmptyCopyToTape" 2011-02-20 20:53:31 Read Pool: "FullFile" (From Job resource) Read Storage: "bacula-sd-01-FullFile" (From Pool resource) Write Pool: "FullFile-04" (From Job resource) Write Storage: "bacula-sd-04-FullFile" (From Job resource) Catalog: "MyCatalog" (From Client resource) Start time: 10-Sep-2023 20:08:31 End time: 10-Sep-2023 20:10:32 Elapsed time: 2 mins 1 sec Priority: 10 SD Files Written: 3,996 SD Bytes Written: 16,204,845 (16.20 MB) Rate: 133.9 KB/s Volume name(s): FullAuto-04-14772 Volume Session Id: 3 Volume Session Time: 1694089198 Last Volume Bytes: 16,334,522 (16.33 MB) SD Errors: 0 SD termination status: OK Termination: Copying OK 10-Sep 20:10 bacula-dir JobId 358363: Begin pruning Jobs older than 3 years . 10-Sep 20:10 bacula-dir JobId 358363: No Jobs found to prune. 10-Sep 20:10 bacula-dir JobId 358363: Begin pruning Files. 10-Sep 20:10 bacula-dir JobId 358363: No Files found to prune. 10-Sep 20:10 bacula-dir JobId 358363: End auto prune.
And this output comes in another email:
10-Sep 20:08 bacula-dir JobId 358364: Using Device "vDrive-FullFile-0" to write. 10-Sep 20:08 bacula-sd-04 JobId 358364: Labeled new Volume "FullAuto-04-14772" on File device "vDrive-FullFile-0" (/usr/local/bacula/volumes/FullFile). 10-Sep 20:08 bacula-sd-04 JobId 358364: Wrote label to prelabeled Volume "FullAuto-04-14772" on File device "vDrive-FullFile-0" (/usr/local/bacula/volumes/FullFile) 10-Sep 20:08 bacula-dir JobId 358364: Max Volume jobs=1 exceeded. Marking Volume "FullAuto-04-14772" as Used. 10-Sep 20:10 bacula-sd-04 JobId 358364: Elapsed time=00:02:00, Transfer rate=135.0 K Bytes/second 10-Sep 20:10 bacula-sd-04 JobId 358364: Sending spooled attrs to the Director. Despooling 1,139,833 bytes ...
Current theory
The receiving SD is acting as an FD with respect to the sending SD:
TLS negotiation failed with FD at "10.55.0.7:46199"
10.55.0.7 is the receiving SD. The port number, not being 9103, tells me the connection was initiated by the receiving SD. I suspect the connection fails because of the X509v3 extensions, much like they do if a bacula-fd certificate contains them. In both cases, the node is acting like an FD and the extensions on the cert do not permit it to act like like a client. I suspect the underlying error is unsupported certificate purpose, much like the one at the top of this post.
Hopefully, we can get this sorted out within the Bacula project.
Which ports are used?
Let’s look at the port connections during multiple concurrent copy jobs.
This is on the receiving host, and as you can see, it has both incoming and outgoing connections on port 9103.
- 10.55.0.4 – bacula-dir
- 10.55.0.36 – bacula-sd-01 – the source of the copy jobs
- 10.55.0.7 – bacula-sd-04 – the destination of the copy jobs
[20:59 bacula-sd-04 dvl /usr/local/etc/ssl/ssl-admin-my-patches-option-4] % sockstat -4 -p 9103 USER COMMAND PID FD PROTO LOCAL ADDRESS FOREIGN ADDRESS bacula bacula-sd 17310 5 tcp4 10.55.0.7:9103 *:* bacula bacula-sd 17310 7 tcp4 10.55.0.7:9103 10.55.0.4:53343 bacula bacula-sd 17310 8 tcp4 10.55.0.7:9103 10.55.0.4:53344 bacula bacula-sd 17310 9 tcp4 10.55.0.7:9103 10.55.0.4:53350 bacula bacula-sd 17310 10 tcp4 10.55.0.7:9103 10.55.0.4:53351 bacula bacula-sd 17310 12 tcp4 10.55.0.7:9103 10.55.0.4:53363 bacula bacula-sd 17310 13 tcp4 10.55.0.7:9103 10.55.0.4:53364 bacula bacula-sd 17310 15 tcp4 10.55.0.7:9103 10.55.0.4:53367 bacula bacula-sd 17310 16 tcp4 10.55.0.7:9103 10.55.0.4:53368 bacula bacula-sd 17310 17 tcp4 10.55.0.7:9103 10.55.0.4:53374 bacula bacula-sd 17310 18 tcp4 10.55.0.7:10363 10.55.0.36:9103 bacula bacula-sd 17310 19 tcp4 10.55.0.7:10361 10.55.0.36:9103 bacula bacula-sd 17310 20 tcp4 10.55.0.7:10362 10.55.0.36:9103 bacula bacula-sd 17310 21 tcp4 10.55.0.7:10364 10.55.0.36:9103 bacula bacula-sd 17310 24 tcp4 10.55.0.7:60596 10.55.0.36:9103 bacula bacula-sd 17310 25 tcp4 10.55.0.7:12168 10.55.0.36:9103 bacula bacula-sd 17310 26 tcp4 10.55.0.7:49459 10.55.0.36:9103 bacula bacula-sd 17310 27 tcp4 10.55.0.7:64775 10.55.0.36:9103 bacula bacula-sd 17310 30 tcp4 10.55.0.7:62149 10.55.0.36:9103
As you can see,