Earlier today, I saw a bunch of emails like this:
for <root@localhos>; Wed, 21 Jan 2015 20:32:36 +0000 (UTC) From: root@localhost.example.org (Bacula) Subject: Bacula: Copy Fatal Error of crey-fd Full Sender: bacula@bacula.example.org To: root@localhost.example.org Date: Wed, 21 Jan 2015 20:32:36 +0000 (UTC) Message-Id: <20150121203236.E993D2B28@bacula.example.org> X-Gm-Original-To: dan@example.org 21-Jan 20:32 bacula-dir JobId 197313: Warning: FileSet MD5 digest not found. 21-Jan 20:32 bacula-dir JobId 197313: The following 1 JobId was chosen to be copied: 197238 21-Jan 20:32 bacula-dir JobId 197313: Copying using JobId=197238 Job=fileserver_basic.2015-01-21_05.55.00_15 21-Jan 20:32 bacula-dir JobId 197313: Bootstrap records written to /usr/local/bacula/working/bacula-dir.restore.152.bsr 21-Jan 20:32 bacula-dir JobId 197313: Start Copying JobId 197313, Job=CopyToTape-Inc-Just-One.2015-01-21_20.32.33_31 21-Jan 20:32 bacula-dir JobId 197313: Using Device "CreyFile" to read. 21-Jan 20:32 crey-sd JobId 197313: Ready to read from volume "IncrAuto-4354" on file device "CreyFile" (/usr/local/bacula/volumes). 21-Jan 20:32 crey-sd JobId 197313: Forward spacing Volume "IncrAuto-4354" to file:block 0:3419975122. 21-Jan 20:32 crey-sd JobId 197313: Error: bsock.c:432 Wrote 65540 bytes to client:Hostname not found:9103, but only 16384 accepted. 21-Jan 20:32 crey-sd JobId 197313: Fatal error: read.c:267 Error sending to File daemon. ERR=Broken pipe 21-Jan 20:32 crey-sd JobId 197313: Elapsed time=00:00:01, Transfer rate=1.362 M Bytes/second 21-Jan 20:32 crey-sd JobId 197313: Error: bsock.c:370 Socket has errors=1 on call to client:Hostname not found:9103 21-Jan 20:32 crey-sd JobId 197313: Fatal error: fd_cmds.c:138 Read data not accepted 21-Jan 20:32 crey-sd JobId 197313: Error: bsock.c:370 Socket has errors=1 on call to client:Hostname not found:9103 21-Jan 20:32 bacula-dir JobId 197313: Error: Bacula bacula-dir 7.0.5 (28Jul14): Build OS: amd64-portbld-freebsd9.3 freebsd 9.3-RELEASE-p7 Prev Backup JobId: 197238 Prev Backup Job: fileserver_basic.2015-01-21_05.55.00_15 New Backup JobId: 197314 Current JobId: 197313 Current Job: CopyToTape-Inc-Just-One.2015-01-21_20.32.33_31 Backup Level: Full Client: crey-fd FileSet: "EmptyCopyToTape" 2011-02-20 20:53:31 Read Pool: "IncrFile" (From Job resource) Read Storage: "CreyFile" (From Pool resource) Write Pool: "Incrementals" (From Pool's NextPool resource) Write Storage: "CompaqStorageWorksTapeLibrary" (From Pool's NextPool resource) Catalog: "MyCatalog" (From Client resource) Start time: 21-Jan-2015 20:32:36 End time: 21-Jan-2015 20:32:36 Elapsed time: 0 secs Priority: 410 SD Files Written: 21 SD Bytes Written: 1,362,218 (1.362 MB) Rate: 0.0 KB/s Volume name(s): Volume Session Id: 2 Volume Session Time: 1421869857 Last Volume Bytes: 0 (0 B) SD Errors: 3 SD termination status: Error Termination: *** Copying Error ***
This meant all my copy jobs were failing.
I started looking at permissions because I had recently run mergemaster on the jail in which bacula-sd ran.
No luck. That all looked good.
Eventually, I look at groups.
$ id bacula uid=910(bacula) gid=910(bacula) groups=910(bacula)
That’s it; bacula is not part of the operator group.
To confirm this, I restored /etc/group from a few days ago. Then I did a diff:
[root@knew:/tmp/bacula-restores/usr/jails/crey/.zfs/snapshot/snapshot-for-backup/etc] # diff group /usr/jails/crey/etc/group 1c1 < # $FreeBSD: release/9.1.0/etc/group 218046 2011-01-28 22:28:12Z pjd $ --- > # $FreeBSD: releng/9.3/etc/group 256102 2013-10-07 08:19:41Z des $ 8c8 < operator:*:5:root,bacula --- > operator:*:5:root
BINGO!
That is confirmation. I amended the file, restarted the jail, and everything was back to normal.