Aug 16, 2015

I’m using FreeBSD 10.2-RC2 here, with a bunch of HDDs. Here are the current pools:

$ sudo zpool list
NAME            SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
benchmarking   21.8T  1.71T  20.0T         -     3%     7%  1.00x  DEGRADED  -
music            30T  1.86T  28.1T         -     3%     6%  1.00x  ONLINE  -
random_mirror  2.72T   308K  2.72T         -     0%     0%  1.00x  ONLINE  -
zroot           220G  1.29G   219G         -     0%     0%  1.00x  ONLINE  -

The degraded benchmarking pool is missing a drive I will be replacing later in this post. But first, I want to replace a mismatched drive in the music pool. I noticed the mismatch when I saw this output:

  pool: music
 state: ONLINE
  scan: none requested
config:

	NAME                       STATE     READ WRITE CKSUM
	music                      ONLINE       0     0     0
	  raidz3-0                 ONLINE       0     0     0
	    gpt/disk_653AK2MXFS9A  ONLINE       0     0     0
	    gpt/disk_653EK93XFS9A  ONLINE       0     0     0
	    gpt/disk_653DK7WCFS9A  ONLINE       0     0     0
	    gpt/disk_6525K2DGFS9A  ONLINE       0     0     0
	    gpt/disk_652FK58FFS9A  ONLINE       0     0     0
	    gpt/disk_653BK12IFS9A  ONLINE       0     0     0
	    gpt/disk_653EK93QFS9A  ONLINE       0     0     0
	    gpt/disk_653IK1IBFS9A  ONLINE       0     0     0
	    gpt/disk_6539K3OJFS9A  ONLINE       0     0     0
	    gpt/disk_653BK12FFS9A  ONLINE       0     0     0
	    gpt/disk_256BYDPGS     ONLINE       0     0     0

The serial number of the last device, disk_256BYDPGS, is in a different format from all the others. I suspected I was using different size drives, because I knew the box had both 3 TB and 5 TB drives. Let’s compare that device with the one above it, disk_653BK12FFS9A. I looked through the output of gpart list and found the two entries:

Geom name: da15
modified: false
state: OK
fwheads: 255
fwsectors: 63
last: 9767541134
first: 34
entries: 128
scheme: GPT
Providers:
1. Name: da15p1
   Mediasize: 3000592941056 (2.7T)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r1w1e2
   rawuuid: e27010ea-3a34-11e5-952f-0cc47a4cb140
   rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b
   label: disk_653BK12FFS9A
   length: 3000592941056
   offset: 20480
   type: freebsd-zfs
   index: 1
   end: 5860533127
   start: 40
Consumers:
1. Name: da15
   Mediasize: 5000981078016 (4.5T)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r1w1e3

Geom name: da19
modified: false
state: OK
fwheads: 255
fwsectors: 63
last: 5860533134
first: 34
entries: 128
scheme: GPT
Providers:
1. Name: da19p1
   Mediasize: 3000592941056 (2.7T)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r0w0e0
   rawuuid: e3ae48ab-3a34-11e5-952f-0cc47a4cb140
   rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b
   label: disk_256BYDPGS
   length: 3000592941056
   offset: 20480
   type: freebsd-zfs
   index: 1
   end: 5860533127
   start: 40
Consumers:
1. Name: da19
   Mediasize: 3000592982016 (2.7T)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r0w0e0

The Consumers entries show that the first device (da15) has 4.5T of raw capacity (which is what a 5 TB drive gives you), while da19 is a 2.7T drive (yes, that’s a 3 TB drive).

This can be confirmed by looking in /var/run/dmesg.boot:

$ grep da15 /var/run/dmesg.boot 
da15 at mps2 bus 0 scbus12 target 15 lun 0
da15: <ATA TOSHIBA MD04ACA5 FP2A> Fixed Direct Access SPC-4 SCSI device
da15: Serial Number         653BK12FFS9A
da15: 600.000MB/s transfers
da15: Command Queueing enabled
da15: 4769307MB (9767541168 512 byte sectors: 255H 63S/T 608001C)

$ grep da19 /var/run/dmesg.boot 
da19 at mps3 bus 0 scbus13 target 8 lun 0
da19: <ATA TOSHIBA DT01ACA3 ABB0> Fixed Direct Access SPC-4 SCSI device
da19: Serial Number            256BYDPGS
da19: 600.000MB/s transfers
da19: Command Queueing enabled
da19: 2861588MB (5860533168 512 byte sectors: 255H 63S/T 364801C)
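As a sanity check (my own arithmetic, not part of the original output), the 512-byte sector counts that dmesg reports can be converted to GiB with plain shell arithmetic:

```shell
# 512-byte sector counts from dmesg, converted to GiB
echo "da15: $(( 9767541168 * 512 / 1024 / 1024 / 1024 )) GiB"
echo "da19: $(( 5860533168 * 512 / 1024 / 1024 / 1024 )) GiB"
# da15 works out to 4657 GiB (~4.5T); da19 to 2794 GiB (~2.7T)
```

Those match the Mediasize figures gpart showed for each drive.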

The goal

My goal: remove da19 (disk_256BYDPGS) and replace it with another drive.

The new drive

I have a new drive, da23, and here is how I prepared it for the pool.

$ sudo gpart create -s gpt da23
da23 created

$ sudo gpart add  -a 4k -s 4600GB -t freebsd-zfs  -l disk_653DK7WBFS9A da23
da23p1 added

$ gpart show da23
=>        34  9767541101  da23  GPT  (4.5T)
          34           6        - free -  (3.0K)
          40  9646899200     1  freebsd-zfs  (4.5T)
  9646899240   120641895        - free -  (58G)

I tried to specify the size as 4.5T but failed: gpart: Invalid size param: Invalid argument
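For the curious, 4600GB maps exactly onto the sector count gpart reported above; gpart apparently wants an integer size, hence the rejection of 4.5T. My own arithmetic, treating the GB suffix as binary (1024-based), which is what the partition table shows:

```shell
# 4600 GB (binary, i.e. GiB) expressed in 512-byte sectors
echo $(( 4600 * 1024 * 1024 * 1024 / 512 ))
# matches the 9646899200-sector freebsd-zfs partition in gpart show
```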

Removing the existing device

$ sudo zpool offline music gpt/disk_256BYDPGS
$ zpool status music
  pool: music
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
	Sufficient replicas exist for the pool to continue functioning in a
	degraded state.
action: Online the device using 'zpool online' or replace the device with
	'zpool replace'.
  scan: none requested
config:

	NAME                       STATE     READ WRITE CKSUM
	music                      DEGRADED     0     0     0
	  raidz3-0                 DEGRADED     0     0     0
	    gpt/disk_653AK2MXFS9A  ONLINE       0     0     0
	    gpt/disk_653EK93XFS9A  ONLINE       0     0     0
	    gpt/disk_653DK7WCFS9A  ONLINE       0     0     0
	    gpt/disk_6525K2DGFS9A  ONLINE       0     0     0
	    gpt/disk_652FK58FFS9A  ONLINE       0     0     0
	    gpt/disk_653BK12IFS9A  ONLINE       0     0     0
	    gpt/disk_653EK93QFS9A  ONLINE       0     0     0
	    gpt/disk_653IK1IBFS9A  ONLINE       0     0     0
	    gpt/disk_6539K3OJFS9A  ONLINE       0     0     0
	    gpt/disk_653BK12FFS9A  ONLINE       0     0     0
	    3040563092296029026    OFFLINE      0     0     0  was /dev/gpt/disk_256BYDPGS

errors: No known data errors

Adding in the new drive

This is the command I used to replace the drive:

$ sudo zpool replace music 3040563092296029026 gpt/disk_653DK7WBFS9A
$ zpool status music
  pool: music
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
	continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sat Aug 15 17:50:55 2015
        3.45G scanned out of 1.86T at 208M/s, 2h36m to go
        320M resilvered, 0.18% done
config:

	NAME                         STATE     READ WRITE CKSUM
	music                        DEGRADED     0     0     0
	  raidz3-0                   DEGRADED     0     0     0
	    gpt/disk_653AK2MXFS9A    ONLINE       0     0     0
	    gpt/disk_653EK93XFS9A    ONLINE       0     0     0
	    gpt/disk_653DK7WCFS9A    ONLINE       0     0     0
	    gpt/disk_6525K2DGFS9A    ONLINE       0     0     0
	    gpt/disk_652FK58FFS9A    ONLINE       0     0     0
	    gpt/disk_653BK12IFS9A    ONLINE       0     0     0
	    gpt/disk_653EK93QFS9A    ONLINE       0     0     0
	    gpt/disk_653IK1IBFS9A    ONLINE       0     0     0
	    gpt/disk_6539K3OJFS9A    ONLINE       0     0     0
	    gpt/disk_653BK12FFS9A    ONLINE       0     0     0
	    replacing-10             OFFLINE      0     0     0
	      3040563092296029026    OFFLINE      0     0     0  was /dev/gpt/disk_256BYDPGS
	      gpt/disk_653DK7WBFS9A  ONLINE       0     0     0  (resilvering)

errors: No known data errors

I’ll just wait…
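Rather than polling by hand, a small loop can do the waiting. This is just a convenience sketch of mine (it assumes zpool status prints "resilver in progress" while the resilver is running, as seen above):

```shell
# block until the named pool finishes resilvering (sketch)
wait_for_resilver() {
  pool="$1"
  while zpool status "$pool" | grep -q 'resilver in progress'; do
    sleep 300   # check every five minutes
  done
  echo "$pool: resilver finished"
}
# usage: wait_for_resilver music
```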

For extra points

Earlier, I had returned one of the Toshiba 3TB HDDs because it showed smartctl errors shortly after starting up. The replacement arrived last week from Amazon. I put the drive into the system tonight and prepared it. Here’s what I did:

$ sudo gpart create -s gpt da7
da7 created

$ sudo gpart add  -a 4k    -s 5860533095 -t freebsd-zfs  -l disk_35AL161GS da7
da7p1 added
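Since this is the same two-step dance I did for da23, it could be wrapped in a tiny helper. This is a hypothetical convenience (prep_zfs_disk is my own name for it), and unlike the commands above it omits -s, so the partition covers the whole disk:

```shell
# partition a fresh disk for ZFS: GPT scheme, 4k-aligned,
# one freebsd-zfs partition spanning the disk, with a GPT label
prep_zfs_disk() {
  disk="$1"   # e.g. da7
  label="$2"  # e.g. disk_35AL161GS
  gpart create -s gpt "$disk" &&
  gpart add -a 4k -t freebsd-zfs -l "$label" "$disk"
}
# usage: prep_zfs_disk da7 disk_35AL161GS
```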

Here is the pool before I replace the missing device:

$ zpool status benchmarking
  pool: benchmarking
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
	the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-2Q
  scan: resilvered 202G in 1h28m with 0 errors on Mon Aug  3 22:29:55 2015
config:

	NAME                     STATE     READ WRITE CKSUM
	benchmarking             DEGRADED     0     0     0
	  raidz3-0               DEGRADED     0     0     0
	    gpt/disk_Z2T2UJJAS   ONLINE       0     0     0
	    gpt/disk_13Q8U6GYS   ONLINE       0     0     0
	    gpt/disk_256BWVLGS   ONLINE       0     0     0
	    gpt/disk_256BY66GS   ONLINE       0     0     0
	    gpt/disk_255BV69GS   ONLINE       0     0     0
	    gpt/disk_255BS4NGS   ONLINE       0     0     0
	    gpt/disk_255BUT1GS   ONLINE       0     0     0
	    4796034587839379020  UNAVAIL      0     0     0  was /dev/gpt/disk_653EK93PFS9A

errors: No known data errors

Here is the command I used to replace the missing device:

$ sudo zpool replace benchmarking 4796034587839379020 gpt/disk_35AL161GS

Just like that, it was busy resilvering:

$ zpool status benchmarking
  pool: benchmarking
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
	continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sat Aug 15 18:21:01 2015
        263G scanned out of 1.71T at 763M/s, 0h33m to go
        30.6G resilvered, 15.00% done
config:

	NAME                       STATE     READ WRITE CKSUM
	benchmarking               DEGRADED     0     0     0
	  raidz3-0                 DEGRADED     0     0     0
	    gpt/disk_Z2T2UJJAS     ONLINE       0     0     0
	    gpt/disk_13Q8U6GYS     ONLINE       0     0     0
	    gpt/disk_256BWVLGS     ONLINE       0     0     0
	    gpt/disk_256BY66GS     ONLINE       0     0     0
	    gpt/disk_255BV69GS     ONLINE       0     0     0
	    gpt/disk_255BS4NGS     ONLINE       0     0     0
	    gpt/disk_255BUT1GS     ONLINE       0     0     0
	    replacing-7            UNAVAIL      0     0     0
	      4796034587839379020  UNAVAIL      0     0     0  was /dev/gpt/disk_653EK93PFS9A
	      gpt/disk_35AL161GS   ONLINE       0     0     0  (resilvering)

errors: No known data errors

Waiting waiting waiting

After 90 minutes, both resilvers were complete:

$ zpool status benchmarking music
  pool: benchmarking
 state: ONLINE
  scan: resilvered 202G in 0h39m with 0 errors on Sat Aug 15 19:00:07 2015
config:

	NAME                    STATE     READ WRITE CKSUM
	benchmarking            ONLINE       0     0     0
	  raidz3-0              ONLINE       0     0     0
	    gpt/disk_Z2T2UJJAS  ONLINE       0     0     0
	    gpt/disk_13Q8U6GYS  ONLINE       0     0     0
	    gpt/disk_256BWVLGS  ONLINE       0     0     0
	    gpt/disk_256BY66GS  ONLINE       0     0     0
	    gpt/disk_255BV69GS  ONLINE       0     0     0
	    gpt/disk_255BS4NGS  ONLINE       0     0     0
	    gpt/disk_255BUT1GS  ONLINE       0     0     0
	    gpt/disk_35AL161GS  ONLINE       0     0     0

errors: No known data errors

  pool: music
 state: ONLINE
  scan: resilvered 173G in 1h25m with 0 errors on Sat Aug 15 19:15:59 2015
config:

	NAME                       STATE     READ WRITE CKSUM
	music                      ONLINE       0     0     0
	  raidz3-0                 ONLINE       0     0     0
	    gpt/disk_653AK2MXFS9A  ONLINE       0     0     0
	    gpt/disk_653EK93XFS9A  ONLINE       0     0     0
	    gpt/disk_653DK7WCFS9A  ONLINE       0     0     0
	    gpt/disk_6525K2DGFS9A  ONLINE       0     0     0
	    gpt/disk_652FK58FFS9A  ONLINE       0     0     0
	    gpt/disk_653BK12IFS9A  ONLINE       0     0     0
	    gpt/disk_653EK93QFS9A  ONLINE       0     0     0
	    gpt/disk_653IK1IBFS9A  ONLINE       0     0     0
	    gpt/disk_6539K3OJFS9A  ONLINE       0     0     0
	    gpt/disk_653BK12FFS9A  ONLINE       0     0     0
	    gpt/disk_653DK7WBFS9A  ONLINE       0     0     0

errors: No known data errors

Interesting messages

I noticed these messages in /var/log/messages:

Aug 15 18:21:01 varm ZFS: vdev state changed, pool_guid=17468409599595688358 vdev_guid=17185678106577923296
Aug 15 18:36:18 varm devd: Executing 'logger -p kern.notice -t ZFS 'vdev state changed, pool_guid=13168585638132050393 vdev_guid=8485381449328651362''
Aug 15 18:36:18 varm ZFS: vdev state changed, pool_guid=13168585638132050393 vdev_guid=8485381449328651362
Aug 15 18:36:18 varm devd: Executing 'logger -p kern.notice -t ZFS 'vdev state changed, pool_guid=13168585638132050393 vdev_guid=8421655518573488526''
Aug 15 18:36:18 varm ZFS: vdev state changed, pool_guid=13168585638132050393 vdev_guid=8421655518573488526
Aug 15 18:36:18 varm devd: Executing 'logger -p kern.notice -t ZFS 'vdev state changed, pool_guid=13168585638132050393 vdev_guid=6595461994760359315''
Aug 15 18:36:18 varm ZFS: vdev state changed, pool_guid=13168585638132050393 vdev_guid=6595461994760359315
Aug 15 18:37:14 varm devd: Executing 'logger -p kern.notice -t ZFS 'vdev state changed, pool_guid=15362233628480096924 vdev_guid=6417995594589973460''
Aug 15 18:37:14 varm ZFS: vdev state changed, pool_guid=15362233628480096924 vdev_guid=6417995594589973460
Aug 15 18:37:14 varm devd: Executing 'logger -p kern.notice -t ZFS 'vdev state changed, pool_guid=15362233628480096924 vdev_guid=12132241653334970585''
Aug 15 18:37:14 varm ZFS: vdev state changed, pool_guid=15362233628480096924 vdev_guid=12132241653334970585
Aug 15 18:37:15 varm devd: Executing 'logger -p kern.notice -t ZFS 'vdev state changed, pool_guid=15362233628480096924 vdev_guid=12827680528713607564''
Aug 15 18:37:15 varm ZFS: vdev state changed, pool_guid=15362233628480096924 vdev_guid=12827680528713607564

Anyone care to tell me what they refer to?
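My guess is that these are devd logging vdev state transitions as each resilver started and finished. One way to map a vdev_guid back to an actual device is to dump the cached pool configuration with zdb; a sketch (assuming zdb can read the pool's config):

```shell
# print each vdev's guid alongside its path from the cached pool config
show_vdev_guids() {
  pool="$1"
  zdb -C "$pool" | grep -E 'guid|path'
}
# usage: show_vdev_guids music
```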

Trolling

This completes our device replacements for today. Please tip your servers, mention @mwlauthor on Twitter (without referencing this post), and have a nice day. Seriously, please just mention him. :)
