I recently migrated a bunch of jails from one server to another. Today I attached the Dell TL4000 tape library. A jail on this server copied Bacula backups from disk to tape.
In this post:
- FreeBSD 12.0
- Bacula 9.4.3
- Dell R720
Investigation
As anticipated, I needed to update the server configuration to cope with changed device numbers.
The symptoms: when running the update slots command from within bconsole, I was getting this response.
Connecting to Storage daemon bacula-sd-02 at bacula-sd-02.int.unixathome.org:9103 ... 3306 Issuing autochanger "slots" command. Device "bacula-sd-02" has 0 slots. No slots in changer to scan. You have messages. *m 06-Nov 13:40 bacula-dir JobId 0: Error: openssl.c:68 TLS shutdown failure.: ERR=error:14094123:SSL routines:ssl3_read_bytes:application data after close notify
Looking at the mtx.log on bacula-sd-02, I found:
[dan@bacula-sd-02:~] $ tail -F /var/db/bacula/mtx.log 2019-11-06-13:40 0 Parms: /dev/pass8 slots 0 /dev/nsa0 0 2019-11-06-13:40 0 Doing mtx -f /dev/pass8 -- to get count of slots 2019-11-06-13:40 0 Parms: /dev/pass8 loaded 0 /dev/nsa0 0 2019-11-06-13:40 0 Doing mtx -f /dev/pass8 status -- to find what is loaded 2019-11-06-13:40 0 FAIL: mtx -f /dev/pass8 loaded drv=0 2019-11-06-13:40 0 Parms: /dev/pass8 loaded 0 /dev/nsa0 0 2019-11-06-13:40 0 Doing mtx -f /dev/pass8 status -- to find what is loaded 2019-11-06-13:40 0 FAIL: mtx -f /dev/pass8 loaded drv=0 2019-11-06-13:40 0 Parms: /dev/pass8 slots 0 /dev/nsa0 0 2019-11-06-13:40 0 Doing mtx -f /dev/pass8 -- to get count of slots
When I tried that command manually:
[dan@bacula-sd-02:~] $ sudo mtx -f /dev/pass8 status mtx: Request Sense: Long Report=yes mtx: Request Sense: Valid Residual=no mtx: Request Sense: Error Code=70 (Current) mtx: Request Sense: Sense Key=Illegal Request mtx: Request Sense: FileMark=no mtx: Request Sense: EOM=no mtx: Request Sense: ILI=no mtx: Request Sense: Additional Sense Code = 24 mtx: Request Sense: Additional Sense Qualifier = 00 mtx: Request Sense: Field in Error = 05 mtx: Request Sense: BPV=yes mtx: Request Sense: Error in CDB=yes mtx: Request Sense: SKSV=yes mtx: Request Sense: Field Pointer = 00 02 Mode sense (0x1A) for Page 0x1D failed mtx: Request Sense: Long Report=yes mtx: Request Sense: Valid Residual=no mtx: Request Sense: Error Code=70 (Current) mtx: Request Sense: Sense Key=Illegal Request mtx: Request Sense: FileMark=no mtx: Request Sense: EOM=no mtx: Request Sense: ILI=no mtx: Request Sense: Additional Sense Code = 20 mtx: Request Sense: Additional Sense Qualifier = 00 mtx: Request Sense: BPV=no mtx: Request Sense: Error in CDB=no mtx: Request Sense: SKSV=no READ ELEMENT STATUS Command Failed
Yeah, I’m pretty sure at this point that devices had changed names.
Current devices
These are the current devices:
[dan@r720-01:~] $ sudo camcontrol devlist <INTEL SSDSA2CW080G3 4PC10362> at scbus4 target 0 lun 0 (pass0,ada0) <INTEL SSDSA2CW080G3 4PC10362> at scbus5 target 0 lun 0 (pass1,ada1) <AHCI SGPIO Enclosure 1.00 0001> at scbus6 target 0 lun 0 (pass2,ses0) <IBM ULT3580-HH4 C7Q1> at scbus7 target 4 lun 0 (pass20,sa1) <IBM ULT3580-HH4 C7Q1> at scbus7 target 5 lun 0 (pass18,sa0) <IBM 3573-TL B.60> at scbus7 target 5 lun 1 (ch0,pass19) <ATA MZ-5EA1000-0D3 7D3Q> at scbus8 target 8 lun 0 (pass3,da0) <ATA MZ-5EA1000-0D3 7D3Q> at scbus8 target 9 lun 0 (pass4,da1) <Pliant LB406M D323> at scbus8 target 10 lun 0 (pass5,da2) <Pliant LB406M D323> at scbus8 target 11 lun 0 (pass6,da3) <Pliant LB406M D323> at scbus8 target 12 lun 0 (pass7,da4) <Pliant LB406M D323> at scbus8 target 13 lun 0 (pass8,da5) <Pliant LB406M D323> at scbus8 target 14 lun 0 (pass9,da6) <Pliant LB406M D323> at scbus8 target 15 lun 0 (pass10,da7) <Pliant LB406M D323> at scbus8 target 16 lun 0 (pass11,da8) <Pliant LB406M D323> at scbus8 target 17 lun 0 (pass12,da9) <ATA Samsung SSD 840 3B0Q> at scbus8 target 19 lun 0 (pass13,da10) <ATA Samsung SSD 850 3B6Q> at scbus8 target 20 lun 0 (pass14,da11) <ATA Samsung SSD 850 3B6Q> at scbus8 target 21 lun 0 (pass15,da12) <ATA Samsung SSD 850 3B6Q> at scbus8 target 22 lun 0 (pass16,da13) <ATA Samsung SSD 850 3B6Q> at scbus8 target 23 lun 0 (pass17,da14)
Device can be tied to device names, but I found that to be unreliable.
This tape library is usually powered off most of the month. The device numbers you see above are those mostly likely to be present when the unit is powered up. Let’s work with what we have.
I tried using the new device names from within the jail:
[dan@bacula-sd-02:~] $ sudo mtx -f /dev/pass19 cannot open SCSI device '/dev/pass19' - No such file or directory
That’s when I remembered the devices are permitted into the jail on a case-by-case basis.
Passing the devices through to the jail
/etc/devfs.rules needs to be updated to allow the jail to see them. At present, I have this:
[devfsrules_jail_unhide_tapes=5] add path sa0 unhide add path pass8 unhide mode 0660 add path pass9 unhide mode 0660 add path ch0 unhide add path nsa0 unhide add path sa1 unhide add path pass6 unhide add path pass7 unhide mode 0660 add path ch1 unhide add path nsa1 unhide [devfsrules_jail_bacula=6] add include $devfsrules_hide_all add include $devfsrules_unhide_basic add include $devfsrules_unhide_login add path zfs unhide add include $devfsrules_jail_unhide_tapes
From there, I see that lines 3-4 and 9-10 need to be updated with the values displayed by camcontrol.
After my updates, the diff was:
[dan@r720-01:/etc] $ diff -ruN devfs.rules.before.moving.to.R720-01 devfs.rules --- devfs.rules.before.moving.to.R720-01 2019-11-06 14:15:05.325255000 +0000 +++ devfs.rules 2019-11-06 14:17:00.156148000 +0000 @@ -1,14 +1,12 @@ [devfsrules_jail_unhide_tapes=5] add path sa0 unhide -add path pass8 unhide mode 0660 -add path pass9 unhide mode 0660 +add path pass18 unhide mode 0660 +add path pass19 unhide mode 0660 add path ch0 unhide add path nsa0 unhide add path sa1 unhide -add path pass6 unhide -add path pass7 unhide mode 0660 -add path ch1 unhide +add path pass20 unhide add path nsa1 unhide [devfsrules_jail_bacula=6] [dan@r720-01:/etc] $
I removed pass7 and ch1 because they related to ch1 which was not showing up at all. That device is not used by my Bacula configuration. pass6 because pass20.
I restarted both devfs on the host and the jail:
[dan@r720-01:/etc] $ sudo service devfs restart [dan@r720-01:/etc] $ sudo iocage restart bacula-sd-02 * Stopping bacula-sd-02_int_unixathome_org + Executing prestop OK + Stopping services OK + Removing devfs_ruleset: 6 OK + Removing jail process OK + Executing poststop OK * Ruleset 6 does not exist, using defaults * Starting bacula-sd-02_int_unixathome_org + Started OK + Using devfs_ruleset: 6 + Using IP options: ip4.addr=ix0|10.55.0.33 ip4.saddrsel=1 ip4=new ip6.saddrsel=1 ip6=new + Starting services OK + Executing poststart OK [dan@r720-01:/etc] $
Does not exist? This sounds like an old problem I encountered. I found a workaround though, and updated the iocage issue.
I amended /etc/devfs.rules changing 5 to 41 and 6 to 42. Then it works:
$ sudo iocage stop bacula-sd-02.int.unixathome.org * Stopping bacula-sd-02_int_unixathome_org + Executing prestop OK + Stopping services OK + Removing devfs_ruleset: 6 OK + Removing jail process OK + Executing poststop OK $ sudo iocage set devfs_ruleset=42 bacula-sd-02.int.unixathome.org devfs_ruleset: 6 -> 42 $ sudo iocage start bacula-sd-02.int.unixathome.org * Starting bacula-sd-02_int_unixathome_org + Started OK + Using devfs_ruleset: 42 + Using IP options: ip4.addr=ix0|10.55.0.33 ip4.saddrsel=1 ip4=new ip6.saddrsel=1 ip6=new + Starting services OK + Executing poststart OK
Within the jail I see the devices we need.:
[dan@bacula-sd-02:~] $ ls -l /dev total 1 crw------- 1 root operator 0x10b Nov 6 13:09 ch0 dr-xr-xr-x 2 root wheel 512 Oct 28 20:16 fd lrwxr-xr-x 1 root wheel 14 Nov 6 14:40 log -> ../var/run/log crw-rw---- 1 root operator 0x107 Nov 6 13:09 nsa0 crw-rw---- 1 root operator 0x10e Nov 6 13:09 nsa1 crw-rw-rw- 1 root wheel 0x12 Nov 6 14:40 null crw-rw---- 1 root operator 0x109 Nov 6 13:09 pass18 crw-rw---- 1 root operator 0x10a Nov 6 13:09 pass19 crw------- 1 root operator 0x110 Nov 6 13:09 pass20 dr-xr-xr-x 2 root wheel 512 Nov 6 14:40 pts crw-r--r-- 1 root wheel 0x8 Oct 28 20:17 random crw-rw---- 1 root operator 0x106 Nov 6 13:09 sa0 crw-rw---- 1 root operator 0x10d Nov 6 13:09 sa1 lrwxr-xr-x 1 root wheel 4 Nov 6 14:40 stderr -> fd/2 lrwxr-xr-x 1 root wheel 4 Nov 6 14:40 stdin -> fd/0 lrwxr-xr-x 1 root wheel 4 Nov 6 14:40 stdout -> fd/1 lrwxr-xr-x 1 root wheel 6 Nov 6 14:40 urandom -> random crw-rw-rw- 1 root wheel 0x13 Oct 28 20:16 zero crw-rw-rw- 1 root operator 0x4f Oct 28 20:16 zfs [dan@bacula-sd-02:~] $
bacula-sd.conf updates
The Bacula Storage Daemon configuration file needed an update to refer to the new ch0 device:
Autochanger { Name = "bacula-sd-02" Device = LTO_0 # take off, LTO_1 Description = "Dell PowerVault TL4000 Tape Library" Changer Device = /dev/pass19 Changer Command = "/usr/local/sbin/mtx-changer %c %o %S %a %d" }
That Changer Device needed to be pass19. Such changes require a restart:
[dan@bacula-sd-02:~] $ sudo service bacula-sd restart Stopping bacula_sd. Waiting for PIDS: 64089. Starting bacula_sd. [dan@bacula-sd-02:~] $
The command works now
Now this works:
[dan@bacula-sd-02:~] $ sudo mtx -f /dev/pass19 status Storage Changer /dev/pass19:2 Drives, 47 Slots ( 0 Import/Export ) Data Transfer Element 0:Empty Data Transfer Element 1:Empty Storage Element 1:Empty Storage Element 2:Empty Storage Element 3:Empty Storage Element 4:Empty Storage Element 5:Full :VolumeTag=HQ0018L4 Storage Element 6:Full :VolumeTag=HQ0015L4 Storage Element 7:Full :VolumeTag=HQ0013L4 Storage Element 8:Empty Storage Element 9:Full :VolumeTag=HQ0019L4 Storage Element 10:Full :VolumeTag=HQ0016L4 Storage Element 11:Full :VolumeTag=HQ0012L4 Storage Element 12:Empty Storage Element 13:Empty Storage Element 14:Empty Storage Element 15:Empty Storage Element 16:Empty Storage Element 17:Empty Storage Element 18:Empty Storage Element 19:Empty Storage Element 20:Empty Storage Element 21:Empty Storage Element 22:Empty Storage Element 23:Empty Storage Element 24:Empty Storage Element 25:Empty Storage Element 26:Empty Storage Element 27:Empty Storage Element 28:Empty Storage Element 29:Empty Storage Element 30:Empty Storage Element 31:Empty Storage Element 32:Empty Storage Element 33:Empty Storage Element 34:Empty Storage Element 35:Empty Storage Element 36:Empty Storage Element 37:Empty Storage Element 38:Empty Storage Element 39:Empty Storage Element 40:Empty Storage Element 41:Empty Storage Element 42:Empty Storage Element 43:Empty Storage Element 44:Empty Storage Element 45:Empty Storage Element 46:Empty Storage Element 47:Empty [dan@bacula-sd-02:~] $
Back to bconsole
Back in bconsole, this works:
*update slots storage=bacula-sd-02 Connecting to Storage daemon bacula-sd-02 at bacula-sd-02.int.unixathome.org:9103 ... 3306 Issuing autochanger "slots" command. Device "bacula-sd-02" has 47 slots. Connecting to Storage daemon bacula-sd-02 at bacula-sd-02.int.unixathome.org:9103 ... 3306 Issuing autochanger "list" command. Catalog record for Volume "HQ0018L4" is up to date. Catalog record for Volume "HQ0015L4" is up to date. Catalog record for Volume "HQ0013L4" is up to date. Catalog record for Volume "HQ0019L4" is up to date. Catalog record for Volume "HQ0016L4" is up to date. Catalog record for Volume "HQ0012L4" is up to date. *
We got jobs!
After starting the job copy, I see this status:
*st storage=bacula-sd-02 Connecting to Storage daemon bacula-sd-02 at bacula-sd-02.int.unixathome.org:9103 bacula-sd-02 Version: 9.4.3 (02 May 2019) amd64-portbld-freebsd12.0 freebsd 12.0-RELEASE-p5 Daemon started 06-Nov-19 14:56. Jobs: run=0, running=9. Heap: heap=0 smbytes=5,459,408 max_bytes=5,701,257 bufs=1,545 max_bufs=1,547 Sizes: boffset_t=8 size_t=8 int32_t=4 int64_t=8 mode=0,0 newbsr=0 Res: ndevices=2 nautochgr=1 Running Jobs: Writing: Full Backup job slocum_jail_snapshots JobId=298557 Volume="HQ0018L4" pool="FullsLTO4" device="LTO_0" (/dev/nsa0) spooling=0 despooling=0 despool_wait=0 Files=33,189 Bytes=9,680,299,044 AveBytes/sec=8,832,389 LastBytes/sec=8,832,389 FDReadSeqNo=318,848 in_msg=254259 out_msg=5 fd=15 Writing: Full Backup job BackupCatalog JobId=298559 Volume="HQ0018L4" pool="FullsLTO4" device="LTO_0" (/dev/nsa0) spooling=0 despooling=0 despool_wait=0 Files=1 Bytes=9,682,092,180 AveBytes/sec=8,834,025 LastBytes/sec=8,834,025 FDReadSeqNo=147,747 in_msg=147746 out_msg=5 fd=17 Writing: Full Backup job x8dtu_jail_snapshots JobId=298561 Volume="HQ0018L4" pool="FullsLTO4" device="LTO_0" (/dev/nsa0) spooling=0 despooling=0 despool_wait=0 Files=339,631 Bytes=9,707,344,338 AveBytes/sec=8,857,066 LastBytes/sec=8,857,066 FDReadSeqNo=2,102,642 in_msg=1434426 out_msg=5 fd=16 Writing: Full Backup job zuul_jail_snapshots JobId=298565 Volume="HQ0018L4" pool="FullsLTO4" device="LTO_0" (/dev/nsa0) spooling=0 despooling=0 despool_wait=0 Files=10,287 Bytes=9,682,276,985 AveBytes/sec=8,834,194 LastBytes/sec=8,834,194 FDReadSeqNo=195,390 in_msg=179324 out_msg=5 fd=18 Writing: Full Backup job pkg01_poudriere_snapshots JobId=298563 Volume="HQ0018L4" pool="FullsLTO4" device="LTO_0" (/dev/nsa0) spooling=0 despooling=0 despool_wait=0 Files=491,365 Bytes=9,632,965,800 AveBytes/sec=8,789,202 LastBytes/sec=8,789,202 FDReadSeqNo=4,165,929 in_msg=2809931 out_msg=5 fd=19 Writing: Full Backup job slocum_home JobId=298569 Volume="HQ0018L4" pool="FullsLTO4" device="LTO_0" (/dev/nsa0) spooling=0 despooling=0 despool_wait=0 Files=1,882 Bytes=9,683,880,274 AveBytes/sec=8,835,657 LastBytes/sec=8,835,657 FDReadSeqNo=162,736 in_msg=157687 out_msg=5 fd=22 Writing: Full Backup job pkg01_poudriere_snapshots JobId=298567 Volume="HQ0018L4" pool="FullsLTO4" device="LTO_0" (/dev/nsa0) spooling=0 despooling=0 despool_wait=0 Files=459,358 Bytes=9,648,422,478 AveBytes/sec=8,803,305 LastBytes/sec=8,803,305 FDReadSeqNo=3,904,947 in_msg=2636735 out_msg=5 fd=21 Writing: Full Backup job supernews_FP_msgs JobId=298573 Volume="HQ0018L4" pool="FullsLTO4" device="LTO_0" (/dev/nsa0) spooling=0 despooling=0 despool_wait=0 Files=1,349,671 Bytes=9,186,286,145 AveBytes/sec=8,381,647 LastBytes/sec=8,381,647 FDReadSeqNo=11,322,763 in_msg=7584033 out_msg=5 fd=24 Writing: Full Backup job x8dtu_message_archive_snapshots JobId=298571 Volume="HQ0018L4" pool="FullsLTO4" device="LTO_0" (/dev/nsa0) spooling=0 despooling=0 despool_wait=0 Files=164,139 Bytes=9,705,307,485 AveBytes/sec=8,855,207 LastBytes/sec=8,855,207 FDReadSeqNo=1,595,311 in_msg=1103760 out_msg=5 fd=23 Writing: Full Backup job r720-01_jail_snapshots JobId=298575 Volume="HQ0018L4" pool="FullsLTO4" device="LTO_0" (/dev/nsa0) spooling=0 despooling=0 despool_wait=0 Files=20,768 Bytes=9,680,886,378 AveBytes/sec=8,832,925 LastBytes/sec=8,832,925 FDReadSeqNo=255,909 in_msg=219264 out_msg=5 fd=25 ==== Jobs waiting to reserve a drive: ==== Terminated Jobs: JobId Level Files Bytes Status Finished Name =================================================================== 297265 Full 3,798 232.0 M OK 07-Oct-19 15:31 r710-01_basic 297272 Full 2 24.20 K OK 07-Oct-19 15:31 tallboy_Papers_Jail_PostgreSQL_Configuration 297245 Full 74,374 2.761 G OK 07-Oct-19 15:32 supernews 297243 Full 183,623 3.922 G OK 07-Oct-19 15:33 mydev_home_dir 297221 Full 6,116 34.74 G OK 07-Oct-19 15:44 slocum_home 297219 Full 238,932 43.22 G OK 07-Oct-19 15:51 zuul_jail_snapshots 297217 Full 346,442 53.56 G OK 07-Oct-19 15:58 x8dtu_jail_snapshots 297215 Full 1 98.12 G OK 07-Oct-19 16:15 BackupCatalog 297213 Full 898,054 451.3 G OK 07-Oct-19 17:49 slocum_jail_snapshots 297610 Rest 0 0 Cancel 15-Oct-19 13:29 RestoreFiles ==== Device status: Autochanger "bacula-sd-02" with devices: "LTO_0" (/dev/nsa0) Device Tape is "LTO_0" (/dev/nsa0) mounted with: Volume: HQ0018L4 Pool: FullsLTO4 Media type: LTO4 Total Bytes=96,450,020,352 Blocks=1,495,070 Bytes/block=64,512 Positioned at File=96 Block=7,071 Slot 5 is loaded in drive 0. == Device Tape: "LTO_1" (/dev/nsa1) is not open. == ==== Used Volume status: Reserved volume: HQ0018L4 on Tape device "LTO_0" (/dev/nsa0) Reader=0 writers=10 reserves=0 volinuse=1 worm=0 ==== Attr spooling: 10 active jobs, 0 bytes; 0 total jobs, 0 max bytes. ==== *
This looks good to me!
The traffic
This is the traffic graph of the backup jobs flowing into r720-01: