Today is the day.
Today I move a zpool from an R710 into an R720. The goal: all services on that zpool start running on the new host.
Fortunately, that zpool is dedicated to jails, more or less. I have done some planning about this, including moving a poudriere on the R710 into a jail.
Now it is almost noon on Saturday, I am sitting in the basement (just outside the server room), and I’m typing this up.
In this post:
- FreeBSD 12.0
- Dell R710 (r710-01)
- Dell R720 (r720-01)
- drive caddies from eBay and now I know the difference between SATA and SATAu
PLEASE READ THIS first: Migrating ZFS Storage Pools
Preparing the source system for export
First step: stop the services which depend upon the source system, the server which now holds the drives.
For me, that’s this script on my laptop which stops various processes on various jails:
```
[dan@pro02:~/src/scripts] $ ./pg03-postgresql-stop-apps.sh
root       33158  0.0  0.0 10660 2156  -  SJ   19:18  0:00.01 supervise freshports
root       27497  0.0  0.0 10660 2156  -  SJ   19:17  0:00.01 supervise freshports
freshports 91163  0.0  0.0 10648 2148  -  SCJ  15:43  0:00.00 sleep 3
root       24047  0.0  0.0 10660 2156  -  SJ   19:17  0:00.01 supervise freshports
freshports 91184  0.0  0.0 10648 2148  -  SCJ  15:43  0:00.00 sleep 3
Stopping bacula_dir.
Waiting for PIDS: 98645.
[dan@pro02:~/src/scripts] $
```
What’s in that script? A bunch of ssh statements:
```
[dan@pro02:~/src/scripts] $ cat ./pg03-postgresql-stop-apps.sh
#!/bin/sh
ssh dev-ingress01   sudo svc -d /var/service/freshports
ssh dev-nginx01     sudo svc -d /var/service/fp-listen
ssh dev-ingress01   ps auwwx | grep fresh
ssh test-ingress01  sudo svc -d /var/service/freshports
ssh test-nginx01    sudo svc -d /var/service/fp-listen
ssh test-ingress01  ps auwwx | grep fresh
ssh stage-ingress01 sudo svc -d /var/service/freshports
ssh stage-nginx01   sudo svc -d /var/service/fp-listen
ssh stage-ingress01 ps auwwx | grep fresh
ssh bacula          sudo service bacula-dir stop
[dan@pro02:~/src/scripts] $
```
Monitoring
I then went to Nagios and set downtime for various services. I could have done that first, and I should have, but this is home and I can cope with the reversal of order.
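For the record, downtime can also be scheduled from the command line by writing to the Nagios external command file. This is only a sketch, with a made-up host name (pg03) and a default command-file path; the web UI accomplishes the same thing:

```sh
#!/bin/sh
# Hypothetical sketch: schedule 4 hours of downtime for all services on
# host "pg03" by writing a SCHEDULE_HOST_SVC_DOWNTIME external command.
# The command-file path varies by install; adjust to match nagios.cfg.
now=$(date +%s)
end=$((now + 4 * 3600))
printf '[%s] SCHEDULE_HOST_SVC_DOWNTIME;pg03;%s;%s;1;0;14400;dan;zpool migration\n' \
  "$now" "$now" "$end" >> /var/spool/nagios/rw/nagios.cmd
```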
Stopping services on the source
I stopped the jails running on the R710:
```
[dan@r710-01:~] $ sudo service iocage stop
* [I|O|C] stopping jails...
[dan@r710-01:~] $
```
Exporting the zpool
“Storage pools should be explicitly exported to indicate that they are ready to be migrated. This operation flushes any unwritten data to disk, writes data to the disk indicating that the export was done, and removes all information about the pool from the system.” – Migrating ZFS Storage Pools
Here we go:
```
[dan@r710-01:~] $ sudo zpool export tank_fast
cannot unmount '/iocage/jails/mqtt01/root': Device busy
[dan@r710-01:~] $
```
Yes, when that happens, force it:
```
[dan@r710-01:~] $ sudo zpool export -f tank_fast
[dan@r710-01:~] $
```
Now it is safe to pull the drives from the system.
Pulling the drives
This is the zpool I will be pulling from this system:
```
$ zpool status tank_fast
  pool: tank_fast
 state: ONLINE
  scan: scrub repaired 0 in 0 days 00:11:06 with 0 errors on Mon Oct 21 03:15:10 2019
config:

	NAME        STATE     READ WRITE CKSUM
	tank_fast   ONLINE       0     0     0
	  mirror-0  ONLINE       0     0     0
	    da0p1   ONLINE       0     0     0
	    da1p1   ONLINE       0     0     0
	  mirror-1  ONLINE       0     0     0
	    da2p1   ONLINE       0     0     0
	    da3p1   ONLINE       0     0     0

errors: No known data errors
```
Examination of /var/run/dmesg.boot should show you drive models and serial numbers. This information will help you know that you have the right drives. Also monitor /var/log/messages to see what drive you just pulled.
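Something along these lines does the trick; the grep pattern is just a sketch and depends on what your drives report:

```sh
# List device name, model, and serial number lines from the boot messages.
grep -E '^da[0-9]+.*(Serial Number|<)' /var/run/dmesg.boot

# In another terminal, watch the kernel log while pulling drives.
tail -F /var/log/messages
```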
I know that the system boots off a zroot mirror, so if I accidentally pull one of the boot drives, the system will keep running. I must, however, reinsert that boot drive and ensure the zpool is intact before proceeding. If I were to pull the second boot drive as well, the system would freeze.
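Verifying that is just a zpool status away (a generic sanity check, not output from this move):

```sh
# After reinserting a mistakenly pulled boot drive, confirm the mirror
# is back to ONLINE (and resilvered) before touching anything else.
zpool status zroot
```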
As I typed the above, I remembered that some drives can be identified by telling their light to flash. In my system, I'm using the mps driver, and I failed to find anything which would do this blinking for me.
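For what it is worth, sesutil(8) can blink a drive's locate LED when the drives sit behind a SES-capable enclosure; I did not find a working combination on this box, so treat this as a pointer rather than a recipe:

```sh
# Blink the locate LED for da3, then turn it off again
# (requires a ses(4) enclosure device the drive is attached to).
sesutil locate da3 on
sesutil locate da3 off
```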
So I just started pulling drives.
Here are the logs for pulling the drives:
```
Oct 26 16:20:40 r710-01 kernel: mps0: mpssas_prepare_remove: Sending reset for target ID 11
Oct 26 16:20:40 r710-01 kernel: da3 at mps0 bus 0 scbus0 target 11 lun 0
Oct 26 16:20:40 r710-01 kernel: da3: s/n S3PTNF0JA70159T detached
Oct 26 16:20:40 r710-01 kernel: (da3:mps0:0:11:0): Periph destroyed
Oct 26 16:20:40 r710-01 kernel: mps0: Unfreezing devq for target ID 11

Oct 26 16:20:57 r710-01 kernel: mps0: mpssas_prepare_remove: Sending reset for target ID 9
Oct 26 16:20:57 r710-01 kernel: da1 at mps0 bus 0 scbus0 target 9 lun 0
Oct 26 16:20:57 r710-01 kernel: da1: s/n S3PTNF0JA11513Y detached
Oct 26 16:20:57 r710-01 kernel: (da1:mps0:0:9:0): Periph destroyed
Oct 26 16:20:57 r710-01 kernel: mps0: Unfreezing devq for target ID 9

Oct 26 16:21:47 r710-01 kernel: mps0: mpssas_prepare_remove: Sending reset for target ID 8
Oct 26 16:21:47 r710-01 kernel: da0 at mps0 bus 0 scbus0 target 8 lun 0
Oct 26 16:21:47 r710-01 kernel: da0: s/n S3PTNF0JA70588A detached
Oct 26 16:21:47 r710-01 kernel: (da0:mps0:0:8:0): Periph destroyed
Oct 26 16:21:47 r710-01 kernel: mps0: Unfreezing devq for target ID 8

Oct 26 16:22:12 r710-01 kernel: mps0: mpssas_prepare_remove: Sending reset for target ID 10
Oct 26 16:22:12 r710-01 kernel: da2 at mps0 bus 0 scbus0 target 10 lun 0
Oct 26 16:22:12 r710-01 kernel: da2: s/n S3PTNF0JA70742A detached
Oct 26 16:22:12 r710-01 kernel: (da2:mps0:0:10:0): Periph destroyed
Oct 26 16:22:12 r710-01 kernel: mps0: Unfreezing devq for target ID 10
```
I have separated them into groups to make the log easier to read. None of those drives are part of the zroot zpool.
Drive caddies
For this swap, I had to move from 3.5″ drive caddies to 2.5″ drive caddies. The ICY DOCK adaptors have served their purpose well; they were great adaptors and I recommend them. However, I now prefer hot-swap versions.
This drive is already mounted in its new caddy. I got it right the first time, unlike the last time I tried this.
This was the second drive I moved over. This drive-swap process took most of the time.
With the drives in their new drive caddies, and placed into the new system, I powered up the R720.
Power up
The iocage error during boot just means that iocage has not been configured on this host yet.
zpool import
```
[dan@r720-01:~] $ sudo zpool export tank_fast
cannot open 'tank_fast': no such pool
[dan@r720-01:~] $
```
Oh wait, that’s an export, not an import. Let’s try again.
```
[dan@r720-01:~] $ sudo zpool import tank_fast
[dan@r720-01:~] $
```
Well, that was painless.
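Had the import not found the pool by name, running zpool import with no arguments lists the pools that are available for import; that is a quick way to confirm all the drives were detected:

```sh
# With no pool name, zpool import only lists importable pools and their
# device layout; nothing is imported until a name (or -a) is given.
zpool import
```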
```
[dan@r720-01:~] $ zpool status tank_fast
  pool: tank_fast
 state: ONLINE
  scan: scrub repaired 0 in 0 days 00:11:06 with 0 errors on Mon Oct 21 03:15:10 2019
config:

	NAME        STATE     READ WRITE CKSUM
	tank_fast   ONLINE       0     0     0
	  mirror-0  ONLINE       0     0     0
	    da14p1  ONLINE       0     0     0
	    da13p1  ONLINE       0     0     0
	  mirror-1  ONLINE       0     0     0
	    da11p1  ONLINE       0     0     0
	    da12p1  ONLINE       0     0     0

errors: No known data errors
[dan@r720-01:~] $
```
From /var/log/messages I found:
```
Oct 27 01:31:56 r720-01 ZFS[95481]: vdev state changed, pool_guid=$16710903227826824647 vdev_guid=$11076130169143938357
Oct 27 01:31:56 r720-01 ZFS[95482]: vdev state changed, pool_guid=$16710903227826824647 vdev_guid=$11432593833572235920

Oct 27 01:32:01 r720-01 ZFS[95486]: vdev state changed, pool_guid=$1975810868733347630 vdev_guid=$15592969848595908370
Oct 27 01:32:01 r720-01 ZFS[95487]: vdev state changed, pool_guid=$1975810868733347630 vdev_guid=$11376585178559251170
Oct 27 01:32:01 r720-01 ZFS[95488]: vdev state changed, pool_guid=$1975810868733347630 vdev_guid=$9573731602086691459
Oct 27 01:32:01 r720-01 ZFS[95489]: vdev state changed, pool_guid=$1975810868733347630 vdev_guid=$8716406602783665762

Oct 27 01:32:06 r720-01 ZFS[95494]: vdev state changed, pool_guid=$11681240405058725014 vdev_guid=$14488099808806263927
Oct 27 01:32:06 r720-01 ZFS[95495]: vdev state changed, pool_guid=$11681240405058725014 vdev_guid=$2130360426031670789
Oct 27 01:32:06 r720-01 ZFS[95496]: vdev state changed, pool_guid=$11681240405058725014 vdev_guid=$8085362650912887666
Oct 27 01:32:06 r720-01 ZFS[95497]: vdev state changed, pool_guid=$11681240405058725014 vdev_guid=$12391766448035188132
Oct 27 01:32:06 r720-01 ZFS[95498]: vdev state changed, pool_guid=$11681240405058725014 vdev_guid=$11017634317446294119
Oct 27 01:32:06 r720-01 ZFS[95499]: vdev state changed, pool_guid=$11681240405058725014 vdev_guid=$9645762143460001864
Oct 27 01:32:06 r720-01 ZFS[95500]: vdev state changed, pool_guid=$11681240405058725014 vdev_guid=$17367374049032558079
Oct 27 01:32:06 r720-01 ZFS[95501]: vdev state changed, pool_guid=$11681240405058725014 vdev_guid=$12183170449216945977
```
I believe the messages above mark the start of the periodic zpool scrubs (note the scan times in the zpool status output below), not the import.
I will make some guesses here:
- Lines 1-2 are the two drives in the zroot boot mirror
- Lines 3-6 are the newly imported zpool
- Lines 7-14 are the data01 zpool
My guesses are based upon the different values of pool_guid. I could run zdb on those pools to confirm, but I have other stuff to do.
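If I did want to confirm, it would look something like this (a sketch only; I did not run these during the move):

```sh
# Each pool's GUID should match one of the pool_guid values in the log
# (the log prefixes them with a stray '$').
zpool get guid zroot tank_fast data01

# zdb can dump the cached pool configuration, which also shows pool_guid.
zdb -C tank_fast | grep pool_guid
```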
Here is the zpool status for comparison:
```
$ zpool status
  pool: data01
 state: ONLINE
  scan: scrub repaired 0 in 0 days 00:00:01 with 0 errors on Sun Oct 27 01:32:07 2019
config:

	NAME           STATE     READ WRITE CKSUM
	data01         ONLINE       0     0     0
	  mirror-0     ONLINE       0     0     0
	    gpt/data1  ONLINE       0     0     0
	    gpt/data2  ONLINE       0     0     0
	  mirror-1     ONLINE       0     0     0
	    gpt/data3  ONLINE       0     0     0
	    gpt/data4  ONLINE       0     0     0
	  mirror-2     ONLINE       0     0     0
	    gpt/data5  ONLINE       0     0     0
	    gpt/data6  ONLINE       0     0     0
	  mirror-3     ONLINE       0     0     0
	    gpt/data7  ONLINE       0     0     0
	    gpt/data8  ONLINE       0     0     0

errors: No known data errors

  pool: tank_fast
 state: ONLINE
  scan: scrub repaired 0 in 0 days 00:10:01 with 0 errors on Sun Oct 27 01:42:02 2019
config:

	NAME        STATE     READ WRITE CKSUM
	tank_fast   ONLINE       0     0     0
	  mirror-0  ONLINE       0     0     0
	    da14p1  ONLINE       0     0     0
	    da13p1  ONLINE       0     0     0
	  mirror-1  ONLINE       0     0     0
	    da11p1  ONLINE       0     0     0
	    da12p1  ONLINE       0     0     0

errors: No known data errors

  pool: zroot
 state: ONLINE
  scan: scrub repaired 0 in 0 days 00:00:06 with 0 errors on Sun Oct 27 01:32:02 2019
config:

	NAME          STATE     READ WRITE CKSUM
	zroot         ONLINE       0     0     0
	  mirror-0    ONLINE       0     0     0
	    gpt/zfs0  ONLINE       0     0     0
	    gpt/zfs1  ONLINE       0     0     0

errors: No known data errors
$
```
OK, done
With that, the migration is finished. There is still much to do, mainly configuring iocage and getting it running, but I will leave that for another post because it is important enough to deserve one of its own.