I’m using FreeBSD 10.2-RC2 here, with a bunch of HDDs. Here are the current pools:
$ sudo zpool list
NAME            SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH    ALTROOT
benchmarking   21.8T  1.71T  20.0T         -     3%     7%  1.00x  DEGRADED  -
music            30T  1.86T  28.1T         -     3%     6%  1.00x  ONLINE    -
random_mirror  2.72T   308K  2.72T         -     0%     0%  1.00x  ONLINE    -
zroot           220G  1.29G   219G         -     0%     0%  1.00x  ONLINE    -
The degraded benchmarking pool is missing a drive that I will be replacing later in this post. But first, I want to replace a mis-matched drive in the music pool. I noticed the mismatch when I saw this output:
  pool: music
 state: ONLINE
  scan: none requested
config:

        NAME                       STATE     READ WRITE CKSUM
        music                      ONLINE       0     0     0
          raidz3-0                 ONLINE       0     0     0
            gpt/disk_653AK2MXFS9A  ONLINE       0     0     0
            gpt/disk_653EK93XFS9A  ONLINE       0     0     0
            gpt/disk_653DK7WCFS9A  ONLINE       0     0     0
            gpt/disk_6525K2DGFS9A  ONLINE       0     0     0
            gpt/disk_652FK58FFS9A  ONLINE       0     0     0
            gpt/disk_653BK12IFS9A  ONLINE       0     0     0
            gpt/disk_653EK93QFS9A  ONLINE       0     0     0
            gpt/disk_653IK1IBFS9A  ONLINE       0     0     0
            gpt/disk_6539K3OJFS9A  ONLINE       0     0     0
            gpt/disk_653BK12FFS9A  ONLINE       0     0     0
            gpt/disk_256BYDPGS     ONLINE       0     0     0
The serial number on the last drive in that list, gpt/disk_256BYDPGS, is of a different format. I suspected I was using different size drives because I knew the box had both 3 and 5 TB drives. Let’s compare that device with the one just above it, gpt/disk_653BK12FFS9A. I looked through the output of gpart list and found the two entries:
Geom name: da15
modified: false
state: OK
fwheads: 255
fwsectors: 63
last: 9767541134
first: 34
entries: 128
scheme: GPT
Providers:
1. Name: da15p1
   Mediasize: 3000592941056 (2.7T)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r1w1e2
   rawuuid: e27010ea-3a34-11e5-952f-0cc47a4cb140
   rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b
   label: disk_653BK12FFS9A
   length: 3000592941056
   offset: 20480
   type: freebsd-zfs
   index: 1
   end: 5860533127
   start: 40
Consumers:
1. Name: da15
   Mediasize: 5000981078016 (4.5T)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r1w1e3

Geom name: da19
modified: false
state: OK
fwheads: 255
fwsectors: 63
last: 5860533134
first: 34
entries: 128
scheme: GPT
Providers:
1. Name: da19p1
   Mediasize: 3000592941056 (2.7T)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r0w0e0
   rawuuid: e3ae48ab-3a34-11e5-952f-0cc47a4cb140
   rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b
   label: disk_256BYDPGS
   length: 3000592941056
   offset: 20480
   type: freebsd-zfs
   index: 1
   end: 5860533127
   start: 40
Consumers:
1. Name: da19
   Mediasize: 3000592982016 (2.7T)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r0w0e0
The Consumers section for the first device (da15) shows a Mediasize of 4.5T (which is what a 5TB drive gives you), while the Consumers section for da19 shows 2.7T (yes, that’s a 3TB drive).
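If you only want the raw device sizes, without the whole partition rundown, diskinfo will report them directly. A minimal sketch, using the same device names as above (output not shown here):

$ diskinfo -v da15 da19

Among other fields, it prints the sector size plus the mediasize in bytes and in sectors for each device you name.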
The drive models and sizes can also be confirmed by looking in /var/run/dmesg.boot:
$ grep da15 /var/run/dmesg.boot
da15 at mps2 bus 0 scbus12 target 15 lun 0
da15: <ATA TOSHIBA MD04ACA5 FP2A> Fixed Direct Access SPC-4 SCSI device
da15: Serial Number 653BK12FFS9A
da15: 600.000MB/s transfers
da15: Command Queueing enabled
da15: 4769307MB (9767541168 512 byte sectors: 255H 63S/T 608001C)

$ grep da19 /var/run/dmesg.boot
da19 at mps3 bus 0 scbus13 target 8 lun 0
da19: <ATA TOSHIBA DT01ACA3 ABB0> Fixed Direct Access SPC-4 SCSI device
da19: Serial Number 256BYDPGS
da19: 600.000MB/s transfers
da19: Command Queueing enabled
da19: 2861588MB (5860533168 512 byte sectors: 255H 63S/T 364801C)
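Another way to get the model-to-daN mapping, without grepping dmesg.boot, is camcontrol. This is just a convenience sketch; the grep pattern is only there to narrow the list down to the Toshiba drives:

$ camcontrol devlist | grep -i toshiba

Unlike /var/run/dmesg.boot, which only reflects what was seen at boot, camcontrol devlist shows the devices the system sees right now.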
The goal
My goal: remove da19 (disk_256BYDPGS) and replace it with another drive.
The new drive
I have a new drive, da23, and here is how I prepared it for the pool.
$ sudo gpart create -s gpt da23
da23 created
$ sudo gpart add -a 4k -s 4600GB -t freebsd-zfs -l disk_653DK7WBFS9A da23
da23p1 added
$ gpart show da23
=>          34  9767541101  da23  GPT  (4.5T)
            34           6        - free -  (3.0K)
            40  9646899200     1  freebsd-zfs  (4.5T)
    9646899240   120641895        - free -  (58G)
I first tried to specify the size as 4.5T, but that failed with: gpart: Invalid size param: Invalid argument
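For what it’s worth, I think the size argument can be skipped entirely: if you omit -s, gpart add should use all of the remaining free space after the aligned starting offset. A sketch of what that would look like (I used -s above, so treat this as untested here):

$ sudo gpart add -a 4k -t freebsd-zfs -l disk_653DK7WBFS9A da23

That would avoid the leftover 58G of free space shown in the gpart show output above.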
Removing the existing device
$ sudo zpool offline music gpt/disk_256BYDPGS
$ zpool status music
  pool: music
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: none requested
config:

        NAME                       STATE     READ WRITE CKSUM
        music                      DEGRADED     0     0     0
          raidz3-0                 DEGRADED     0     0     0
            gpt/disk_653AK2MXFS9A  ONLINE       0     0     0
            gpt/disk_653EK93XFS9A  ONLINE       0     0     0
            gpt/disk_653DK7WCFS9A  ONLINE       0     0     0
            gpt/disk_6525K2DGFS9A  ONLINE       0     0     0
            gpt/disk_652FK58FFS9A  ONLINE       0     0     0
            gpt/disk_653BK12IFS9A  ONLINE       0     0     0
            gpt/disk_653EK93QFS9A  ONLINE       0     0     0
            gpt/disk_653IK1IBFS9A  ONLINE       0     0     0
            gpt/disk_6539K3OJFS9A  ONLINE       0     0     0
            gpt/disk_653BK12FFS9A  ONLINE       0     0     0
            3040563092296029026   OFFLINE      0     0     0  was /dev/gpt/disk_256BYDPGS

errors: No known data errors
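Incidentally, if you ever want to double-check which daN device a given gpt/ label actually lives on before (or after) offlining it, glabel will show the mapping. A quick sketch, grepping for the serial in the label:

$ glabel status | grep 256BYDPGS

The Components column in that output lists the daNpN provider behind each label.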
Adding in the new drive
This is the command I used to replace the drive:
$ sudo zpool replace music 3040563092296029026 gpt/disk_653DK7WBFS9A
$ zpool status music
  pool: music
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sat Aug 15 17:50:55 2015
        3.45G scanned out of 1.86T at 208M/s, 2h36m to go
        320M resilvered, 0.18% done
config:

        NAME                         STATE     READ WRITE CKSUM
        music                        DEGRADED     0     0     0
          raidz3-0                   DEGRADED     0     0     0
            gpt/disk_653AK2MXFS9A    ONLINE       0     0     0
            gpt/disk_653EK93XFS9A    ONLINE       0     0     0
            gpt/disk_653DK7WCFS9A    ONLINE       0     0     0
            gpt/disk_6525K2DGFS9A    ONLINE       0     0     0
            gpt/disk_652FK58FFS9A    ONLINE       0     0     0
            gpt/disk_653BK12IFS9A    ONLINE       0     0     0
            gpt/disk_653EK93QFS9A    ONLINE       0     0     0
            gpt/disk_653IK1IBFS9A    ONLINE       0     0     0
            gpt/disk_6539K3OJFS9A    ONLINE       0     0     0
            gpt/disk_653BK12FFS9A    ONLINE       0     0     0
            replacing-10             OFFLINE      0     0     0
              3040563092296029026    OFFLINE      0     0     0  was /dev/gpt/disk_256BYDPGS
              gpt/disk_653DK7WBFS9A  ONLINE       0     0     0  (resilvering)

errors: No known data errors
I’ll just wait…
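If you don’t feel like re-running zpool status by hand, a throwaway loop will do the watching for you. This is just a convenience sketch, nothing ZFS-specific:

$ while true; do zpool status music | grep -E 'scanned|resilvered'; sleep 300; done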
For extra points
I returned one of the Toshiba 3TB HDDs, which had developed smartctl errors shortly after starting up. The replacement arrived last week from Amazon. I put the drive into the system tonight and prepared it. Here’s what I did:
$ sudo gpart create -s gpt da7
da7 created
$ sudo gpart add -a 4k -s 5860533095 -t freebsd-zfs -l disk_35AL161GS da7
da7p1 added
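In case that -s 5860533095 looks like magic: it appears to be the full usable span of these 3TB drives, that is, the last usable GPT sector (5860533134, per the gpart list output earlier) minus the 4k-aligned starting sector of 40, plus one. The arithmetic checks out:

$ echo $((5860533134 - 40 + 1))
5860533095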
Here is the pool before I replace the missing device:
$ zpool status benchmarking
  pool: benchmarking
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-2Q
  scan: resilvered 202G in 1h28m with 0 errors on Mon Aug 3 22:29:55 2015
config:

        NAME                     STATE     READ WRITE CKSUM
        benchmarking             DEGRADED     0     0     0
          raidz3-0               DEGRADED     0     0     0
            gpt/disk_Z2T2UJJAS   ONLINE       0     0     0
            gpt/disk_13Q8U6GYS   ONLINE       0     0     0
            gpt/disk_256BWVLGS   ONLINE       0     0     0
            gpt/disk_256BY66GS   ONLINE       0     0     0
            gpt/disk_255BV69GS   ONLINE       0     0     0
            gpt/disk_255BS4NGS   ONLINE       0     0     0
            gpt/disk_255BUT1GS   ONLINE       0     0     0
            4796034587839379020  UNAVAIL      0     0     0  was /dev/gpt/disk_653EK93PFS9A

errors: No known data errors
Here is the command I used to replace the missing device:
$ sudo zpool replace benchmarking 4796034587839379020 gpt/disk_35AL161GS
Just like that, it was busy resilvering:
$ zpool status benchmarking
  pool: benchmarking
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sat Aug 15 18:21:01 2015
        263G scanned out of 1.71T at 763M/s, 0h33m to go
        30.6G resilvered, 15.00% done
config:

        NAME                      STATE     READ WRITE CKSUM
        benchmarking              DEGRADED     0     0     0
          raidz3-0                DEGRADED     0     0     0
            gpt/disk_Z2T2UJJAS    ONLINE       0     0     0
            gpt/disk_13Q8U6GYS    ONLINE       0     0     0
            gpt/disk_256BWVLGS    ONLINE       0     0     0
            gpt/disk_256BY66GS    ONLINE       0     0     0
            gpt/disk_255BV69GS    ONLINE       0     0     0
            gpt/disk_255BS4NGS    ONLINE       0     0     0
            gpt/disk_255BUT1GS    ONLINE       0     0     0
            replacing-7           UNAVAIL      0     0     0
              4796034587839379020 UNAVAIL      0     0     0  was /dev/gpt/disk_653EK93PFS9A
              gpt/disk_35AL161GS  ONLINE       0     0     0  (resilvering)

errors: No known data errors
Waiting waiting waiting
After 90 minutes, both resilverings were complete:
$ zpool status benchmarking music
  pool: benchmarking
 state: ONLINE
  scan: resilvered 202G in 0h39m with 0 errors on Sat Aug 15 19:00:07 2015
config:

        NAME                    STATE     READ WRITE CKSUM
        benchmarking            ONLINE       0     0     0
          raidz3-0              ONLINE       0     0     0
            gpt/disk_Z2T2UJJAS  ONLINE       0     0     0
            gpt/disk_13Q8U6GYS  ONLINE       0     0     0
            gpt/disk_256BWVLGS  ONLINE       0     0     0
            gpt/disk_256BY66GS  ONLINE       0     0     0
            gpt/disk_255BV69GS  ONLINE       0     0     0
            gpt/disk_255BS4NGS  ONLINE       0     0     0
            gpt/disk_255BUT1GS  ONLINE       0     0     0
            gpt/disk_35AL161GS  ONLINE       0     0     0

errors: No known data errors

  pool: music
 state: ONLINE
  scan: resilvered 173G in 1h25m with 0 errors on Sat Aug 15 19:15:59 2015
config:

        NAME                       STATE     READ WRITE CKSUM
        music                      ONLINE       0     0     0
          raidz3-0                 ONLINE       0     0     0
            gpt/disk_653AK2MXFS9A  ONLINE       0     0     0
            gpt/disk_653EK93XFS9A  ONLINE       0     0     0
            gpt/disk_653DK7WCFS9A  ONLINE       0     0     0
            gpt/disk_6525K2DGFS9A  ONLINE       0     0     0
            gpt/disk_652FK58FFS9A  ONLINE       0     0     0
            gpt/disk_653BK12IFS9A  ONLINE       0     0     0
            gpt/disk_653EK93QFS9A  ONLINE       0     0     0
            gpt/disk_653IK1IBFS9A  ONLINE       0     0     0
            gpt/disk_6539K3OJFS9A  ONLINE       0     0     0
            gpt/disk_653BK12FFS9A  ONLINE       0     0     0
            gpt/disk_653DK7WBFS9A  ONLINE       0     0     0

errors: No known data errors
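With both pools back to ONLINE, a scrub is a reasonable follow-up sanity check. This is optional and just my habit, not something the replacement procedure requires:

$ sudo zpool scrub benchmarking
$ sudo zpool scrub music
$ zpool status benchmarking music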
Interesting messages
I noticed these messages in /var/log/messages:
Aug 15 18:21:01 varm ZFS: vdev state changed, pool_guid=17468409599595688358 vdev_guid=17185678106577923296
Aug 15 18:36:18 varm devd: Executing 'logger -p kern.notice -t ZFS 'vdev state changed, pool_guid=13168585638132050393 vdev_guid=8485381449328651362''
Aug 15 18:36:18 varm ZFS: vdev state changed, pool_guid=13168585638132050393 vdev_guid=8485381449328651362
Aug 15 18:36:18 varm devd: Executing 'logger -p kern.notice -t ZFS 'vdev state changed, pool_guid=13168585638132050393 vdev_guid=8421655518573488526''
Aug 15 18:36:18 varm ZFS: vdev state changed, pool_guid=13168585638132050393 vdev_guid=8421655518573488526
Aug 15 18:36:18 varm devd: Executing 'logger -p kern.notice -t ZFS 'vdev state changed, pool_guid=13168585638132050393 vdev_guid=6595461994760359315''
Aug 15 18:36:18 varm ZFS: vdev state changed, pool_guid=13168585638132050393 vdev_guid=6595461994760359315
Aug 15 18:37:14 varm devd: Executing 'logger -p kern.notice -t ZFS 'vdev state changed, pool_guid=15362233628480096924 vdev_guid=6417995594589973460''
Aug 15 18:37:14 varm ZFS: vdev state changed, pool_guid=15362233628480096924 vdev_guid=6417995594589973460
Aug 15 18:37:14 varm devd: Executing 'logger -p kern.notice -t ZFS 'vdev state changed, pool_guid=15362233628480096924 vdev_guid=12132241653334970585''
Aug 15 18:37:14 varm ZFS: vdev state changed, pool_guid=15362233628480096924 vdev_guid=12132241653334970585
Aug 15 18:37:15 varm devd: Executing 'logger -p kern.notice -t ZFS 'vdev state changed, pool_guid=15362233628480096924 vdev_guid=12827680528713607564''
Aug 15 18:37:15 varm ZFS: vdev state changed, pool_guid=15362233628480096924 vdev_guid=12827680528713607564
Anyone care to tell me what they refer to?
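My guess is that they are the devd/ZFS event notifications emitted as vdev states changed during the offline/replace/resilver steps; each line names the affected pool and vdev by GUID. To map a vdev_guid back to an actual device, something like zdb should work (a sketch; I have not verified the exact output format on 10.2):

$ zdb -C music | grep -E 'guid|path'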
Trolling
This completes our device replacements for today. Please tip your servers, mention @mwlauthor on Twitter (without referencing this post), and have a nice day. Seriously, please just mention him. :)