Today is the day.
Today I move a zpool from an R710 into an R720. The goal: all services on that zpool start running on the new host.
Fortunately, that zpool is dedicated to jails, more or less. I have done some planning about this, including moving a poudriere on the R710 into a jail.
Now it is almost noon on Saturday, I am sitting in the basement (just outside the server room), and I’m typing this up.
In this post:
- FreeBSD 12.0
- Dell R710 (r710-01)
- Dell R720 (r720-01)
- drive caddies from eBay and now I know the difference between SATA and SATAu
PLEASE READ THIS first: Migrating ZFS Storage Pools
Preparing the source system for export
First step: stop the services which depend upon the source system, the server which now holds the drives.
For me, that’s this script on my laptop which stops various processes on various jails:
```
[dan@pro02:~/src/scripts] $ ./pg03-postgresql-stop-apps.sh
root       33158  0.0  0.0 10660 2156  -  SJ   19:18  0:00.01 supervise freshports
root       27497  0.0  0.0 10660 2156  -  SJ   19:17  0:00.01 supervise freshports
freshports 91163  0.0  0.0 10648 2148  -  SCJ  15:43  0:00.00 sleep 3
root       24047  0.0  0.0 10660 2156  -  SJ   19:17  0:00.01 supervise freshports
freshports 91184  0.0  0.0 10648 2148  -  SCJ  15:43  0:00.00 sleep 3
Stopping bacula_dir.
Waiting for PIDS: 98645.
[dan@pro02:~/src/scripts] $
```
What’s in that script? A bunch of ssh statements:
```
[dan@pro02:~/src/scripts] $ cat ./pg03-postgresql-stop-apps.sh
#!/bin/sh
ssh dev-ingress01   sudo svc -d /var/service/freshports
ssh dev-nginx01     sudo svc -d /var/service/fp-listen
ssh dev-ingress01   ps auwwx | grep fresh
ssh test-ingress01  sudo svc -d /var/service/freshports
ssh test-nginx01    sudo svc -d /var/service/fp-listen
ssh test-ingress01  ps auwwx | grep fresh
ssh stage-ingress01 sudo svc -d /var/service/freshports
ssh stage-nginx01   sudo svc -d /var/service/fp-listen
ssh stage-ingress01 ps auwwx | grep fresh
ssh bacula          sudo service bacula-dir stop
[dan@pro02:~/src/scripts] $
```
Monitoring
I then went to Nagios and set downtime for various services. I could have done that first, and I should have, but this is home and I can cope with the reversal of order.
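For the record, downtime can also be scheduled from the command line by writing to the Nagios external command file. This is only a sketch, with a made-up host name (pg03) and a default command-file path; the web UI accomplishes the same thing:

```sh
#!/bin/sh
# Hypothetical sketch: schedule 4 hours of downtime for all services on
# host "pg03" by writing a SCHEDULE_HOST_SVC_DOWNTIME external command.
# The command-file path varies by install; adjust to match nagios.cfg.
now=$(date +%s)
end=$((now + 4 * 3600))
printf '[%s] SCHEDULE_HOST_SVC_DOWNTIME;pg03;%s;%s;1;0;14400;dan;zpool migration\n' \
  "$now" "$now" "$end" >> /var/spool/nagios/rw/nagios.cmd
```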
Stopping services on the source
I stopped the jails running on the R710:
```
[dan@r710-01:~] $ sudo service iocage stop
* [I|O|C] stopping jails...
[dan@r710-01:~] $
```
Exporting the zpool
“Storage pools should be explicitly exported to indicate that they are ready to be migrated. This operation flushes any unwritten data to disk, writes data to the disk indicating that the export was done, and removes all information about the pool from the system.” – Migrating ZFS Storage Pools
Here we go:
```
[dan@r710-01:~] $ sudo zpool export tank_fast
cannot unmount '/iocage/jails/mqtt01/root': Device busy
[dan@r710-01:~] $
```
Yes, when that happens, force it:
```
[dan@r710-01:~] $ sudo zpool export -f tank_fast
[dan@r710-01:~] $
```
Now it is safe to pull the drives from the system.
Pulling the drives
This is the zpool I will be pulling from this system:
```
$ zpool status tank_fast
  pool: tank_fast
 state: ONLINE
  scan: scrub repaired 0 in 0 days 00:11:06 with 0 errors on Mon Oct 21 03:15:10 2019
config:

	NAME        STATE     READ WRITE CKSUM
	tank_fast   ONLINE       0     0     0
	  mirror-0  ONLINE       0     0     0
	    da0p1   ONLINE       0     0     0
	    da1p1   ONLINE       0     0     0
	  mirror-1  ONLINE       0     0     0
	    da2p1   ONLINE       0     0     0
	    da3p1   ONLINE       0     0     0

errors: No known data errors
```
Examination of /var/run/dmesg.boot should show you drive models and serial numbers. This information will help you know that you have the right drives. Also monitor /var/log/messages to see what drive you just pulled.
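Something along these lines does the trick; the grep pattern is just a sketch and depends on what your drives report:

```sh
# List device name, model, and serial number lines from the boot messages.
grep -E '^da[0-9]+.*(Serial Number|<)' /var/run/dmesg.boot

# In another terminal, watch the kernel log while pulling drives.
tail -F /var/log/messages
```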
I know that the system boots off a zroot mirror, so if I accidentally pull one of the boot drives, the system will keep running. I must, however, reinsert that boot drive and ensure the zpool is intact before proceeding. If I were to pull the second boot drive as well, the system would freeze.
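Verifying that is just a zpool status away (a generic sanity check, not output from this move):

```sh
# After reinserting a mistakenly pulled boot drive, confirm the mirror
# is back to ONLINE (and resilvered) before touching anything else.
zpool status zroot
```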
As I typed the above, I remembered that some drives can be identified by telling their light to flash. In my system, I'm using the mps driver, and I failed to find anything which would do this blinking for me.
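For what it is worth, sesutil(8) can blink a drive's locate LED when the drives sit behind a SES-capable enclosure; I did not find a working combination on this box, so treat this as a pointer rather than a recipe:

```sh
# Blink the locate LED for da3, then turn it off again
# (requires a ses(4) enclosure device the drive is attached to).
sesutil locate da3 on
sesutil locate da3 off
```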
So I just started pulling drives.
Here are the logs for pulling the drives:
```
Oct 26 16:20:40 r710-01 kernel: mps0: mpssas_prepare_remove: Sending reset for target ID 11
Oct 26 16:20:40 r710-01 kernel: da3 at mps0 bus 0 scbus0 target 11 lun 0
Oct 26 16:20:40 r710-01 kernel: da3: s/n S3PTNF0JA70159T detached
Oct 26 16:20:40 r710-01 kernel: (da3:mps0:0:11:0): Periph destroyed
Oct 26 16:20:40 r710-01 kernel: mps0: Unfreezing devq for target ID 11

Oct 26 16:20:57 r710-01 kernel: mps0: mpssas_prepare_remove: Sending reset for target ID 9
Oct 26 16:20:57 r710-01 kernel: da1 at mps0 bus 0 scbus0 target 9 lun 0
Oct 26 16:20:57 r710-01 kernel: da1: s/n S3PTNF0JA11513Y detached
Oct 26 16:20:57 r710-01 kernel: (da1:mps0:0:9:0): Periph destroyed
Oct 26 16:20:57 r710-01 kernel: mps0: Unfreezing devq for target ID 9

Oct 26 16:21:47 r710-01 kernel: mps0: mpssas_prepare_remove: Sending reset for target ID 8
Oct 26 16:21:47 r710-01 kernel: da0 at mps0 bus 0 scbus0 target 8 lun 0
Oct 26 16:21:47 r710-01 kernel: da0: s/n S3PTNF0JA70588A detached
Oct 26 16:21:47 r710-01 kernel: (da0:mps0:0:8:0): Periph destroyed
Oct 26 16:21:47 r710-01 kernel: mps0: Unfreezing devq for target ID 8

Oct 26 16:22:12 r710-01 kernel: mps0: mpssas_prepare_remove: Sending reset for target ID 10
Oct 26 16:22:12 r710-01 kernel: da2 at mps0 bus 0 scbus0 target 10 lun 0
Oct 26 16:22:12 r710-01 kernel: da2: s/n S3PTNF0JA70742A detached
Oct 26 16:22:12 r710-01 kernel: (da2:mps0:0:10:0): Periph destroyed
Oct 26 16:22:12 r710-01 kernel: mps0: Unfreezing devq for target ID 10
```
I have separated them into groups to make the log easier to read. None of those drives are part of the zroot zpool.
Drive caddies
For this swap, I had to move from 3.5″ drive caddies to 2.5″ drive caddies. The ICY DOCK adaptors have served their purpose well; they were great adaptors and I recommend them. However, I now prefer hot-swap versions.
This drive is already mounted in its new caddy. I got it right the first time, unlike the last time I tried this.
This was the second drive I moved over. This drive-swap process took most of the time.
With the drives in their new drive caddies, and placed into the new system, I powered up the R720.
Power up
The iocage error during boot just means that iocage has not been configured on this host yet.
zpool import
```
[dan@r720-01:~] $ sudo zpool export tank_fast
cannot open 'tank_fast': no such pool
[dan@r720-01:~] $
```
Oh wait, that’s an export, not an import. Let’s try again.
```
[dan@r720-01:~] $ sudo zpool import tank_fast
[dan@r720-01:~] $
```
Well, that was painless.
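Had the import not found the pool by name, running zpool import with no arguments lists the pools that are available for import; that is a quick way to confirm all the drives were detected:

```sh
# With no pool name, zpool import only lists importable pools and their
# device layout; nothing is imported until a name (or -a) is given.
zpool import
```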
```
[dan@r720-01:~] $ zpool status tank_fast
  pool: tank_fast
 state: ONLINE
  scan: scrub repaired 0 in 0 days 00:11:06 with 0 errors on Mon Oct 21 03:15:10 2019
config:

	NAME        STATE     READ WRITE CKSUM
	tank_fast   ONLINE       0     0     0
	  mirror-0  ONLINE       0     0     0
	    da14p1  ONLINE       0     0     0
	    da13p1  ONLINE       0     0     0
	  mirror-1  ONLINE       0     0     0
	    da11p1  ONLINE       0     0     0
	    da12p1  ONLINE       0     0     0

errors: No known data errors
[dan@r720-01:~] $
```
From /var/log/messages I found:
```
Oct 27 01:31:56 r720-01 ZFS[95481]: vdev state changed, pool_guid=$16710903227826824647 vdev_guid=$11076130169143938357
Oct 27 01:31:56 r720-01 ZFS[95482]: vdev state changed, pool_guid=$16710903227826824647 vdev_guid=$11432593833572235920

Oct 27 01:32:01 r720-01 ZFS[95486]: vdev state changed, pool_guid=$1975810868733347630 vdev_guid=$15592969848595908370
Oct 27 01:32:01 r720-01 ZFS[95487]: vdev state changed, pool_guid=$1975810868733347630 vdev_guid=$11376585178559251170
Oct 27 01:32:01 r720-01 ZFS[95488]: vdev state changed, pool_guid=$1975810868733347630 vdev_guid=$9573731602086691459
Oct 27 01:32:01 r720-01 ZFS[95489]: vdev state changed, pool_guid=$1975810868733347630 vdev_guid=$8716406602783665762

Oct 27 01:32:06 r720-01 ZFS[95494]: vdev state changed, pool_guid=$11681240405058725014 vdev_guid=$14488099808806263927
Oct 27 01:32:06 r720-01 ZFS[95495]: vdev state changed, pool_guid=$11681240405058725014 vdev_guid=$2130360426031670789
Oct 27 01:32:06 r720-01 ZFS[95496]: vdev state changed, pool_guid=$11681240405058725014 vdev_guid=$8085362650912887666
Oct 27 01:32:06 r720-01 ZFS[95497]: vdev state changed, pool_guid=$11681240405058725014 vdev_guid=$12391766448035188132
Oct 27 01:32:06 r720-01 ZFS[95498]: vdev state changed, pool_guid=$11681240405058725014 vdev_guid=$11017634317446294119
Oct 27 01:32:06 r720-01 ZFS[95499]: vdev state changed, pool_guid=$11681240405058725014 vdev_guid=$9645762143460001864
Oct 27 01:32:06 r720-01 ZFS[95500]: vdev state changed, pool_guid=$11681240405058725014 vdev_guid=$17367374049032558079
Oct 27 01:32:06 r720-01 ZFS[95501]: vdev state changed, pool_guid=$11681240405058725014 vdev_guid=$12183170449216945977
```
I believe the messages above mark the start of the periodic zpool scrubs (note the scan times in the zpool status output below), not the import.
I will make some guesses here:
- Lines 1-2 are the two drives in the zroot boot mirror
- Lines 3-6 are the newly imported zpool
- Lines 7-14 are the data01 zpool
My guesses are based upon the different values of pool_guid. I could run zdb on those pools to confirm, but I have other stuff to do.
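If I did want to confirm, it would look something like this (a sketch only; I did not run these during the move):

```sh
# Each pool's GUID should match one of the pool_guid values in the log
# (the log prefixes them with a stray '$').
zpool get guid zroot tank_fast data01

# zdb can dump the cached pool configuration, which also shows pool_guid.
zdb -C tank_fast | grep pool_guid
```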
Here is the zpool status for comparison:
```
$ zpool status
  pool: data01
 state: ONLINE
  scan: scrub repaired 0 in 0 days 00:00:01 with 0 errors on Sun Oct 27 01:32:07 2019
config:

	NAME           STATE     READ WRITE CKSUM
	data01         ONLINE       0     0     0
	  mirror-0     ONLINE       0     0     0
	    gpt/data1  ONLINE       0     0     0
	    gpt/data2  ONLINE       0     0     0
	  mirror-1     ONLINE       0     0     0
	    gpt/data3  ONLINE       0     0     0
	    gpt/data4  ONLINE       0     0     0
	  mirror-2     ONLINE       0     0     0
	    gpt/data5  ONLINE       0     0     0
	    gpt/data6  ONLINE       0     0     0
	  mirror-3     ONLINE       0     0     0
	    gpt/data7  ONLINE       0     0     0
	    gpt/data8  ONLINE       0     0     0

errors: No known data errors

  pool: tank_fast
 state: ONLINE
  scan: scrub repaired 0 in 0 days 00:10:01 with 0 errors on Sun Oct 27 01:42:02 2019
config:

	NAME        STATE     READ WRITE CKSUM
	tank_fast   ONLINE       0     0     0
	  mirror-0  ONLINE       0     0     0
	    da14p1  ONLINE       0     0     0
	    da13p1  ONLINE       0     0     0
	  mirror-1  ONLINE       0     0     0
	    da11p1  ONLINE       0     0     0
	    da12p1  ONLINE       0     0     0

errors: No known data errors

  pool: zroot
 state: ONLINE
  scan: scrub repaired 0 in 0 days 00:00:06 with 0 errors on Sun Oct 27 01:32:02 2019
config:

	NAME          STATE     READ WRITE CKSUM
	zroot         ONLINE       0     0     0
	  mirror-0    ONLINE       0     0     0
	    gpt/zfs0  ONLINE       0     0     0
	    gpt/zfs1  ONLINE       0     0     0

errors: No known data errors
$
```
OK, done
With that, the migration is finished. There is still much to do, mainly configuring iocage and getting it running, but I will leave that for another post because it is important enough to deserve one of its own.