In this post, I am working with FreeBSD 10.2 on this server.
Over the past few days, a 3-year-old drive has been giving errors. The error count has held steady, but I have a spare drive on hand, so I decided to replace it. I have verified that the failing drive is out of warranty.
Rather than pull the failing drive and replace it, I opted to add the new drive alongside it and let ZFS resilver first. This is the safest option because the resilver runs while full redundancy is still in place.
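If you want to keep an eye on a suspect drive yourself, the raw SMART counters are a reasonable place to look. A quick sketch, assuming smartmontools is installed and ada1 is the suspect drive:

$ sudo smartctl -A /dev/ada1 | egrep 'Reallocated|Pending|Uncorrectable|CRC'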
TIP: Note the serial number of the new drive before you add it to the server.
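If the drive is already attached to a machine, you can read the serial number without pulling it back out; the device name here is just an example:

$ sudo smartctl -i /dev/ada0 | grep 'Serial Number'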
Before you proceed: if the new drive has previously been used in a zpool, I suggest trying my preparation steps first. They may help you avoid the gmirror and ZFS label issues I run into below. See the end of this post for details.
Identifying the drives
After a reboot, the new drive is ada0 and the dying drive is ada1. I confirmed this via smartctl. I also knew the serial number (Z2T4KGYASTZ6) of the drive I had just inserted. Yes, that drive is also out of warranty, but relatively unused.
NOTE: I thought I knew the serial number of that drive, but I was wrong. See the smartctl output below; the serial number mentioned above is incorrect.
This is the new drive, and by new, I mean it has about 1 year of power-on time. I have no idea what this drive was used for in the past, but I am confident I am the original owner.
[dan@knew:~] $ sudo smartctl -a /dev/ada0
smartctl 6.4 2015-06-04 r4109 [FreeBSD 10.2-RELEASE-p9 amd64] (local build)
Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Toshiba 3.5" DT01ACA... Desktop HDD
Device Model:     TOSHIBA DT01ACA300
Serial Number:    Z2T3BAXAS
LU WWN Device Id: 5 000039 ff4c187ba
Firmware Version: MX6OABB0
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Fri Feb 19 15:51:27 2016 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
                                        was suspended by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (24373) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 407) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   139   139   054    Pre-fail  Offline      -       71
  3 Spin_Up_Time            0x0007   139   139   024    Pre-fail  Always       -       420 (Average 405)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       38
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   124   124   020    Pre-fail  Offline      -       33
  9 Power_On_Hours          0x0012   099   099   000    Old_age   Always       -       9642
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       38
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       59
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       59
194 Temperature_Celsius     0x0002   181   181   000    Old_age   Always       -       33 (Min/Max 15/44)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

[dan@knew:~] $
The gmirror issue
The first issue: I did not wipe this drive before installing it, so gmirror tried to start up and use it, and failed.
$ gmirror list
Geom name: swap
State: DEGRADED
Components: 3
Balance: load
Slice: 4096
Flags: NONE
GenID: 0
SyncID: 1
ID: 2857317765
Providers:
1. Name: mirror/swap
   Mediasize: 2147483136 (2.0G)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r1w1e1
Consumers:
1. Name: ada0p2
   Mediasize: 2147483648 (2.0G)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r1w1e1
   State: ACTIVE
   Priority: 1
   Flags: NONE
   GenID: 0
   SyncID: 1
   ID: 2717712298
So I stopped and destroyed that gmirror:
$ sudo gmirror stop -f swap
$ sudo gmirror destroy swap
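If stale gmirror metadata were to keep coming back after a reboot, it can also be cleared directly from the provider. A sketch, assuming ada0 is the drive carrying the leftover metadata:

$ sudo gmirror clear ada0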
Destroy and rebuild
Here is how I rebuilt the disk's partitioning to match the others.
This first command is destructive. Issue it on the correct drive.
$ sudo gpart destroy -F ada0
ada0 destroyed
$ sudo gpart create -s gpt ada0
ada0 created
$ sudo gpart add -b 34 -s 94 -t freebsd-boot -l disk_Z2T4KGYASTZ6 ada0
ada0p1 added
$ sudo gpart add -s 8g -t freebsd-swap -l swap6a ada0
ada0p2 added
$ sudo gpart add -t freebsd-zfs -s 2784G -l disk_Z2T4KGYASTZ6data ada0
ada0p3 added
The labels (i.e. the -l parameter) I used are derived from:
- disk_Z2T4KGYASTZ6 – the disk serial number
- swap6a – I know this new drive will replace gpt/disk6. I learned that via the output of glabel status
- disk_Z2T4KGYASTZ6data – again, the disk serial number, with data appended to make it unique.
Labels are completely arbitrary and you can use whatever label you prefer.
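To confirm the labels took, gpart can display them in place of the partition types, and glabel lists them too:

$ gpart show -l ada0
$ glabel status | grep ada0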
Those handy commands, all in one place, are:
sudo gpart create -s gpt ada0
sudo gpart add -b 34 -s 94 -t freebsd-boot -l disk_Z2T4KGYASTZ6 ada0
sudo gpart add -s 8g -t freebsd-swap -l swap6a ada0
sudo gpart add -t freebsd-zfs -s 2784G -l disk_Z2T4KGYASTZ6data ada0
Where did I get those numbers? From the disk I am replacing.
$ gpart show ada0 ada1
=>        34  5860533101  ada0  GPT  (2.7T)
          34           6        - free -  (3.0K)
          40          88     1  freebsd-boot  (44K)
         128    16777216     2  freebsd-swap  (8.0G)
    16777344  5838471168     3  freebsd-zfs  (2.7T)
  5855248512     5284623        - free -  (2.5G)

=>        34  5860533101  ada1  GPT  (2.7T)
          34          94     1  freebsd-boot  (47K)
         128    16777216     2  freebsd-swap  (8.0G)
    16777344  5838471168     3  freebsd-zfs  (2.7T)
  5855248512     5284623        - free -  (2.5G)
If you take 5838471168 / 1024 / 1024 / 2, you get 2784G. That 5838471168 is the partition size in 512-byte sectors, not bytes: dividing by 2 converts sectors to KB, and the two divisions by 1024 take KB to MB and then to GB.
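The same arithmetic, spelled out with bc just to confirm the numbers:

$ echo "5838471168 / 2 / 1024 / 1024" | bc
2784
$ echo "5838471168 * 512" | bc    # the partition size in bytes
2989297238016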
Or to save time and ensure accuracy, you can copy the gpart information from one drive to another.
[dan@knew:~] $ gpart backup ada1 | sudo gpart restore ada0
[dan@knew:~] $ gpart show ada0 ada1
=>        34  5860533101  ada0  GPT  (2.7T)
          34          94     1  freebsd-boot  (47K)
         128    16777216     2  freebsd-swap  (8.0G)
    16777344  5838471168     3  freebsd-zfs  (2.7T)
  5855248512     5284623        - free -  (2.5G)

=>        34  5860533101  ada1  GPT  (2.7T)
          34          94     1  freebsd-boot  (47K)
         128    16777216     2  freebsd-swap  (8.0G)
    16777344  5838471168     3  freebsd-zfs  (2.7T)
  5855248512     5284623        - free -  (2.5G)
[dan@knew:~] $
The trouble with this approach: it does not create GPT labels. I used the previous method.
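If you do go the backup/restore route, the labels can be added afterwards with gpart modify. A sketch, using the partition indices from the output above and the same label names as before:

$ sudo gpart modify -i 1 -l disk_Z2T4KGYASTZ6 ada0
$ sudo gpart modify -i 2 -l swap6a ada0
$ sudo gpart modify -i 3 -l disk_Z2T4KGYASTZ6data ada0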
Replacing the drive – failed attempt
According to zpool(8), zpool replace is a good option here. I have room for a spare drive in the system, so I just have to tell zpool to replace drive A with drive B.
$ sudo zpool replace system gpt/disk6 gpt/disk_Z2T4KGYASTZ6
cannot replace gpt/disk6 with gpt/disk_Z2T4KGYASTZ6: device is too small
$ gpart show ada0 ada1
=>        34  5860533101  ada0  GPT  (2.7T)
          34          94     1  freebsd-boot  (47K)
         128    16777216     2  freebsd-swap  (8.0G)
    16777344  5838471168     3  freebsd-zfs  (2.7T)
  5855248512     5284623        - free -  (2.5G)

=>        34  5860533101  ada1  GPT  (2.7T)
          34          94     1  freebsd-boot  (47K)
         128    16777216     2  freebsd-swap  (8.0G)
    16777344  5838471168     3  freebsd-zfs  (2.7T)
  5855248512     5284623        - free -  (2.5G)
Wait. They are the same size.
Ahh, I’m using the wrong device, as Allan Jude pointed out (on IRC).
OK, this time I got a very different message when I appended data to the device name.
$ sudo zpool replace system gpt/disk_Z2T4KGYASTZ6data
invalid vdev specification
use '-f' to override the following errors:
/dev/gpt/disk_Z2T4KGYASTZ6data is part of potentially active pool 'system'
This is zpool trying to save you from accidentally placing a disk from one pool into another. Remember how I said this disk had been previously used? Well, this is that ghost coming back to haunt me, again. In this case, it is lingering ZFS label information from previous use. ZFS writes four copies of its label on each vdev, two near the start and two near the end of the device, which is why zdb reports on labels 0 through 3 below; here the two at the front are already gone, but the two at the end survived.
Searching for that message, I found I had encountered it 6 years earlier. That old post led me to run this command:
$ sudo zdb -l /dev/gpt/disk_Z2T4KGYASTZ6data
--------------------------------------------
LABEL 0
--------------------------------------------
failed to unpack label 0
--------------------------------------------
LABEL 1
--------------------------------------------
failed to unpack label 1
--------------------------------------------
LABEL 2
--------------------------------------------
    version: 5000
    name: 'system'
    state: 0
    txg: 6915674
    pool_guid: 11353391169725922550
    hostid: 3600270990
    hostname: ''
    top_guid: 5976699353168341310
    guid: 2491056538036708260
    vdev_children: 1
    vdev_tree:
        type: 'mirror'
        id: 0
        guid: 5976699353168341310
        metaslab_array: 30
        metaslab_shift: 34
        ashift: 12
        asize: 2995734970368
        is_log: 0
        create_txg: 4
        children[0]:
            type: 'disk'
            id: 0
            guid: 12249913144658353338
            path: '/dev/gpt/disk0'
            phys_path: '/dev/gpt/disk0'
            whole_disk: 1
            DTL: 146
            create_txg: 4
        children[1]:
            type: 'disk'
            id: 1
            guid: 2491056538036708260
            path: '/dev/gpt/disk1'
            phys_path: '/dev/gpt/disk1'
            whole_disk: 1
            DTL: 145
            create_txg: 4
        children[2]:
            type: 'disk'
            id: 2
            guid: 2449415577503190274
            path: '/dev/gpt/disk2'
            phys_path: '/dev/gpt/disk2'
            whole_disk: 1
            DTL: 144
            create_txg: 4
    features_for_read:
        com.delphix:hole_birth
--------------------------------------------
LABEL 3
--------------------------------------------
    version: 5000
    name: 'system'
    state: 0
    txg: 6915674
    pool_guid: 11353391169725922550
    hostid: 3600270990
    hostname: ''
    top_guid: 5976699353168341310
    guid: 2491056538036708260
    vdev_children: 1
    vdev_tree:
        type: 'mirror'
        id: 0
        guid: 5976699353168341310
        metaslab_array: 30
        metaslab_shift: 34
        ashift: 12
        asize: 2995734970368
        is_log: 0
        create_txg: 4
        children[0]:
            type: 'disk'
            id: 0
            guid: 12249913144658353338
            path: '/dev/gpt/disk0'
            phys_path: '/dev/gpt/disk0'
            whole_disk: 1
            DTL: 146
            create_txg: 4
        children[1]:
            type: 'disk'
            id: 1
            guid: 2491056538036708260
            path: '/dev/gpt/disk1'
            phys_path: '/dev/gpt/disk1'
            whole_disk: 1
            DTL: 145
            create_txg: 4
        children[2]:
            type: 'disk'
            id: 2
            guid: 2449415577503190274
            path: '/dev/gpt/disk2'
            phys_path: '/dev/gpt/disk2'
            whole_disk: 1
            DTL: 144
            create_txg: 4
    features_for_read:
        com.delphix:hole_birth
[dan@knew:~] $
To clear that out, you can use this command:
$ sudo zpool labelclear -f /dev/gpt/disk_Z2T4KGYASTZ6data
This confirms the labels are gone:
$ sudo zdb -l /dev/gpt/disk_Z2T4KGYASTZ6data
--------------------------------------------
LABEL 0
--------------------------------------------
failed to unpack label 0
--------------------------------------------
LABEL 1
--------------------------------------------
failed to unpack label 1
--------------------------------------------
LABEL 2
--------------------------------------------
failed to unpack label 2
--------------------------------------------
LABEL 3
--------------------------------------------
failed to unpack label 3
Replacing the drive – successful attempt
After beating back the ghosts of ZFS-past, I issued the replace command again:
$ sudo zpool replace system gpt/disk6 gpt/disk_Z2T4KGYASTZ6data
Make sure to wait until resilver is done before rebooting.

If you boot from pool 'system', you may need to update
boot code on newly attached disk 'gpt/disk_Z2T4KGYASTZ6data'.

Assuming you use GPT partitioning and 'da0' is your new boot disk
you may use the following command:

        gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da0
That last recommendation is important. I will do that.
$ sudo gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada0
bootcode written to ada0
How are things looking?
$ zpool status
  pool: system
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Fri Feb 19 03:44:25 2016
        369M scanned out of 19.8T at 5.85M/s, (scan is slow, no estimated time)
        32.1M resilvered, 0.00% done
config:

        NAME                             STATE     READ WRITE CKSUM
        system                           ONLINE       0     0     0
          raidz2-0                       ONLINE       0     0     0
            gpt/disk0                    ONLINE       0     0     0
            gpt/disk1                    ONLINE       0     0     0
            gpt/disk2                    ONLINE       0     0     0
            gpt/disk3                    ONLINE       0     0     0
            gpt/disk4                    ONLINE       0     0     0
            gpt/disk5                    ONLINE       0     0     0
            replacing-6                  ONLINE       0     0     0
              gpt/disk6                  ONLINE       0     0     0
              gpt/disk_Z2T4KGYASTZ6data  ONLINE       0     0     0  (resilvering)
            gpt/disk7                    ONLINE       0     0     0
            gpt/disk8                    ONLINE       0     0     0
            gpt/disk9                    ONLINE       0     0     0

errors: No known data errors
This was at 10:45 PM.
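Rather than re-running zpool status by hand, a simple loop can be left in a spare terminal to track progress; a small sketch:

$ while true; do zpool status system | grep -A 2 'scan:'; sleep 600; done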
The morning after
At 6:38 AM the next day:
zpool status
  pool: system
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Fri Feb 19 03:44:25 2016
        2.46T scanned out of 19.8T at 90.9M/s, 55h37m to go
        240G resilvered, 12.42% done
config:

        NAME                             STATE     READ WRITE CKSUM
        system                           ONLINE       0     0     0
          raidz2-0                       ONLINE       0     0     0
            gpt/disk0                    ONLINE       0     0     0
            gpt/disk1                    ONLINE       0     0     0
            gpt/disk2                    ONLINE       0     0     0
            gpt/disk3                    ONLINE       0     0     0
            gpt/disk4                    ONLINE       0     0     0
            gpt/disk5                    ONLINE       0     0     0
            replacing-6                  ONLINE       0     0     0
              gpt/disk6                  ONLINE       0     0     0
              gpt/disk_Z2T4KGYASTZ6data  ONLINE       0     0     0  (resilvering)
            gpt/disk7                    ONLINE       0     0     0
            gpt/disk8                    ONLINE       0     0     0
            gpt/disk9                    ONLINE       0     0     0

errors: No known data errors
By about noon, it was at:
  scan: resilver in progress since Fri Feb 19 03:44:25 2016
        5.28T scanned out of 19.8T at 108M/s, 39h9m to go
        514G resilvered, 26.62% done
At 9 AM the following day:
  scan: resilver in progress since Fri Feb 19 03:44:25 2016
        14.3T scanned out of 19.8T at 122M/s, 13h9m to go
        1.36T resilvered, 72.33% done
Sunday
On Sunday morning, I found:
$ zpool status
  pool: system
 state: ONLINE
  scan: resilvered 1.88T in 51h0m with 0 errors on Sun Feb 21 06:44:49 2016
config:

        NAME                             STATE     READ WRITE CKSUM
        system                           ONLINE       0     0     0
          raidz2-0                       ONLINE       0     0     0
            gpt/disk0                    ONLINE       0     0     0
            gpt/disk1                    ONLINE       0     0     0
            gpt/disk2                    ONLINE       0     0     0
            gpt/disk3                    ONLINE       0     0     0
            gpt/disk4                    ONLINE       0     0     0
            gpt/disk5                    ONLINE       0     0     0
            gpt/disk_Z2T4KGYASTZ6data    ONLINE       0     0     0
            gpt/disk7                    ONLINE       0     0     0
            gpt/disk8                    ONLINE       0     0     0
            gpt/disk9                    ONLINE       0     0     0
Relabelling
I will try relabelling to make sure the label matches the correct serial number. Watch this space for an update.
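For reference, gpart can rename a GPT label in place, so the relabelling might look something like the sketch below. This is untested here; the pool currently refers to the old label path, so it would need to be done carefully (for example with the disk offlined or the pool exported), and Z2T3BAXAS is the serial number reported by smartctl above:

$ sudo gpart modify -i 3 -l disk_Z2T3BAXASdata ada0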
Disk prep
If I were doing this again, I would prepare the disk first. I should do that to all of my disks which have been used and are waiting to be reused. It saves time and avoids confusion at a moment when clear thinking matters most.
Be careful with these commands. Issue them on the correct drive.
I would:
- wipe the labels – sudo zpool labelclear -f /dev/gpt/disk_Z2T4KGYASTZ6data
- clear the partitions – sudo gpart destroy -F ada0
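Put together as a small script, that preparation might look like the sketch below. The device names are examples carried over from this post and must be double-checked before running:

#!/bin/sh
# disk-prep sketch: wipe leftover ZFS labels, then the partition table.
# DISK and ZFS_PART are examples; set them to the drive you intend to wipe.
DISK=ada0
ZFS_PART=/dev/gpt/disk_Z2T4KGYASTZ6data

echo "About to wipe ${ZFS_PART} and destroy the partition table on ${DISK}."
echo "Press Enter to continue, Ctrl-C to abort."
read dummy

zpool labelclear -f ${ZFS_PART}   # remove old ZFS labels while the /dev/gpt entry still exists
gpart destroy -F ${DISK}          # then drop the GPT partition table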