Adding a failed HDD back into a ZFS mirror

I have a FreeBSD 11 box, cuppy:

$ uname -a
FreeBSD cuppy.int.unixathome.org 11.0-CURRENT FreeBSD 11.0-CURRENT #0 r279394: Sat Feb 28 21:01:21 UTC 2015
dan@cuppy.unixathome.org:/usr/obj/usr/src/sys/GENERIC  amd64
$ 

That box is used mostly for testing and/or erasing DLT tapes.

That box is not healthy. It’s running fine, but the ZFS mirror is degraded:

$ zpool status
  pool: system
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
	the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-2Q
  scan: none requested
config:

	NAME                      STATE     READ WRITE CKSUM
	system                    DEGRADED     0     0     0
	  mirror-0                DEGRADED     0     0     0
	    16763837457347917579  UNAVAIL      0     0     0  was /dev/gpt/disk0
	    gpt/disk1             ONLINE       0     0     0

errors: No known data errors
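
If you would rather have a script notice this than spot it by eye, here is a minimal sketch. It reads zpool status output on stdin and succeeds when any pool reports a state other than ONLINE. The helper name pool_needs_attention is made up for this example; it is not a ZFS command.

```sh
#!/bin/sh
# Sketch only: pool_needs_attention is a hypothetical helper name.
# Reads `zpool status` output on stdin; exit status 0 means at least
# one "state:" line reported something other than ONLINE.
pool_needs_attention() {
    awk '
        $1 == "state:" && $2 != "ONLINE" { bad = 1 }
        END { exit bad ? 0 : 1 }
    '
}

# Usage on a live system (assumes zpool is in PATH):
#   zpool status | pool_needs_attention && echo "a pool needs attention"
```

For a quick manual check, zpool status -x limits the output to pools that have problems.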

Let’s fix this.

Background

Why is one drive upset? Because I installed FreeBSD onto it. Why? I don’t know, but I did.

Which drive?

The first goal: preserve gpt/disk1 because that is our functioning drive in this mirror.

Which drive is that?

In the following output is our answer:

$ glabel status
                                      Name  Status  Components
               diskid/DISK-WD-WCAS84971893     N/A  ada0
gptid/c2eb5180-d980-11e5-bfb4-94de80aad24a     N/A  ada0p1
gptid/c2ebe3ba-d980-11e5-bfb4-94de80aad24a     N/A  ada0p2
                    ufsid/56cb3645d9f75e4d     N/A  ada0p2
gptid/c2ecaec6-d980-11e5-bfb4-94de80aad24a     N/A  ada0p3
                             gpt/bootcode1     N/A  ada1p1
gptid/2cf2e65f-5d3d-11e3-a1d8-0004aca3703d     N/A  ada1p1
                              gpt/swapada1     N/A  ada1p2
                                 gpt/disk1     N/A  ada1p3

Thus, we want to keep ada1 untouched because that drive contains gpt/disk1, which appears above in the output of zpool status. ada0 is what we need to add into the mirror.
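
The same lookup can be scripted. This is a rough sketch, assuming the three-column glabel status layout shown above (Name, Status, Components); component_for_label is a name I made up for the example.

```sh
#!/bin/sh
# Sketch only: component_for_label is a hypothetical helper name.
# Reads `glabel status` output on stdin and prints the component
# (for example ada1p3) that carries the given label.
component_for_label() {
    awk -v label="$1" '$1 == label { print $3 }'
}

# Usage on a live system:
#   glabel status | component_for_label gpt/disk1
```

Whatever disk that component lives on is the one to leave alone.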

What do those drives look like?

$ gpart show 
=>       34  976773101  ada0  GPT  (466G)
         34       1024     1  freebsd-boot  (512K)
       1058  968883200     2  freebsd-ufs  (462G)
  968884258    7888876     3  freebsd-swap  (3.8G)
  976773134          1        - free -  (512B)

=>       34  976773101  diskid/DISK-WD-WCAS84971893  GPT  (466G)
         34       1024                            1  freebsd-boot  (512K)
       1058  968883200                            2  freebsd-ufs  (462G)
  968884258    7888876                            3  freebsd-swap  (3.8G)
  976773134          1                               - free -  (512B)

=>       34  976773101  ada1  GPT  (466G)
         34          6        - free -  (3.0K)
         40       1024     1  freebsd-boot  (512K)
       1064        984        - free -  (492K)
       2048   16777216     2  freebsd-swap  (8.0G)
   16779264  954204160     3  freebsd-zfs  (455G)
  970983424    5789711        - free -  (2.8G)

Let’s clean up ada0.

The following command is destructive: it wipes the partition table on ada0.

$ sudo gpart destroy -F ada0
Password:
ada0 destroyed

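This command copies the partition layout of ada1 onto ada0. Labels are not copied (gpart restore only restores them when given -l), which is why the new ada0 partitions appear only under gptid names in the glabel status output further down.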

# gpart backup ada1 | gpart restore ada0
# gpart show
=>       34  976773101  ada1  GPT  (466G)
         34          6        - free -  (3.0K)
         40       1024     1  freebsd-boot  (512K)
       1064        984        - free -  (492K)
       2048   16777216     2  freebsd-swap  (8.0G)
   16779264  954204160     3  freebsd-zfs  (455G)
  970983424    5789711        - free -  (2.8G)

=>       40  976773088  ada0  GPT  (466G)
         40       1024     1  freebsd-boot  (512K)
       1064        984        - free -  (492K)
       2048   16777216     2  freebsd-swap  (8.0G)
   16779264  954204160     3  freebsd-zfs  (455G)
  970983424    5789704        - free -  (2.8G)

=>       40  976773088  diskid/DISK-WD-WCAS84971893  GPT  (466G)
         40       1024                            1  freebsd-boot  (512K)
       1064        984                               - free -  (492K)
       2048   16777216                            2  freebsd-swap  (8.0G)
   16779264  954204160                            3  freebsd-zfs  (455G)
  970983424    5789704                               - free -  (2.8G)
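
Comparing the two layouts by eye works, but a small check is easy to write. This sketch assumes the gpart show format above, where each disk starts with a "=>" line and each partition line reads start, size, index, type. The helper name partitions_of is hypothetical.

```sh
#!/bin/sh
# Sketch only: partitions_of is a hypothetical helper name.
# Reads `gpart show` output on stdin and prints "index type start size"
# for every partition of the named disk. Free-space rows are skipped
# because their third field is "-" rather than a partition index.
partitions_of() {
    awk -v disk="$1" '
        $1 == "=>"                     { cur = $4; next }
        cur == disk && $3 ~ /^[0-9]+$/ { print $3, $4, $1, $2 }
    '
}

# Usage on a live system:
#   gpart show > /tmp/layout.txt
#   [ "$(partitions_of ada0 < /tmp/layout.txt)" = \
#     "$(partitions_of ada1 < /tmp/layout.txt)" ] && echo "layouts match"
```

The free space at the start and end of the two disks differs slightly, which is why the sketch compares partitions only.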

Clear it first

This drive, ada0, was part of a ZFS pool before, which means it probably has old labels on it. Let’s check:

# glabel status
                                      Name  Status  Components
                             gpt/bootcode1     N/A  ada1p1
gptid/2cf2e65f-5d3d-11e3-a1d8-0004aca3703d     N/A  ada1p1
                              gpt/swapada1     N/A  ada1p2
                                 gpt/disk1     N/A  ada1p3
gptid/7c801b04-dd82-11e5-bafe-94de80aad24a     N/A  ada0p1
gptid/7c8124e3-dd82-11e5-bafe-94de80aad24a     N/A  ada0p2
gptid/7c86d33c-dd82-11e5-bafe-94de80aad24a     N/A  ada0p3
               diskid/DISK-WD-WCAS84971893     N/A  ada0

Let’s check each partition of ada0, using the Names above.

# zdb -l /dev/gptid/7c801b04-dd82-11e5-bafe-94de80aad24a
--------------------------------------------
LABEL 0
--------------------------------------------
failed to unpack label 0
--------------------------------------------
LABEL 1
--------------------------------------------
failed to unpack label 1
--------------------------------------------
LABEL 2
--------------------------------------------
failed to unpack label 2
--------------------------------------------
LABEL 3
--------------------------------------------
failed to unpack label 3


# zdb -l /dev/gptid/7c8124e3-dd82-11e5-bafe-94de80aad24a
--------------------------------------------
LABEL 0
--------------------------------------------
failed to unpack label 0
--------------------------------------------
LABEL 1
--------------------------------------------
failed to unpack label 1
--------------------------------------------
LABEL 2
--------------------------------------------
failed to unpack label 2
--------------------------------------------
LABEL 3
--------------------------------------------
failed to unpack label 3


# zdb -l /dev/gptid/7c86d33c-dd82-11e5-bafe-94de80aad24a
--------------------------------------------
LABEL 0
--------------------------------------------
    version: 5000
    name: 'system'
    state: 0
    txg: 1814194
    pool_guid: 5342046208002196764
    hostid: 3355958386
    hostname: 'cuppy.int.unixathome.org'
    top_guid: 9377548653516348661
    guid: 16763837457347917579
    vdev_children: 1
    vdev_tree:
        type: 'mirror'
        id: 0
        guid: 9377548653516348661
        metaslab_array: 33
        metaslab_shift: 32
        ashift: 12
        asize: 488547811328
        is_log: 0
        create_txg: 4
        children[0]:
            type: 'disk'
            id: 0
            guid: 16763837457347917579
            path: '/dev/gpt/disk0'
            phys_path: '/dev/gpt/disk0'
            whole_disk: 1
            create_txg: 4
        children[1]:
            type: 'disk'
            id: 1
            guid: 14981872129086548885
            path: '/dev/gpt/disk1'
            phys_path: '/dev/gpt/disk1'
            whole_disk: 1
            create_txg: 4
    features_for_read:
--------------------------------------------
LABEL 1
--------------------------------------------
    version: 5000
    name: 'system'
    state: 0
    txg: 1814194
    pool_guid: 5342046208002196764
    hostid: 3355958386
    hostname: 'cuppy.int.unixathome.org'
    top_guid: 9377548653516348661
    guid: 16763837457347917579
    vdev_children: 1
    vdev_tree:
        type: 'mirror'
        id: 0
        guid: 9377548653516348661
        metaslab_array: 33
        metaslab_shift: 32
        ashift: 12
        asize: 488547811328
        is_log: 0
        create_txg: 4
        children[0]:
            type: 'disk'
            id: 0
            guid: 16763837457347917579
            path: '/dev/gpt/disk0'
            phys_path: '/dev/gpt/disk0'
            whole_disk: 1
            create_txg: 4
        children[1]:
            type: 'disk'
            id: 1
            guid: 14981872129086548885
            path: '/dev/gpt/disk1'
            phys_path: '/dev/gpt/disk1'
            whole_disk: 1
            create_txg: 4
    features_for_read:
--------------------------------------------
LABEL 2
--------------------------------------------
    version: 5000
    name: 'system'
    state: 0
    txg: 1814194
    pool_guid: 5342046208002196764
    hostid: 3355958386
    hostname: 'cuppy.int.unixathome.org'
    top_guid: 9377548653516348661
    guid: 16763837457347917579
    vdev_children: 1
    vdev_tree:
        type: 'mirror'
        id: 0
        guid: 9377548653516348661
        metaslab_array: 33
        metaslab_shift: 32
        ashift: 12
        asize: 488547811328
        is_log: 0
        create_txg: 4
        children[0]:
            type: 'disk'
            id: 0
            guid: 16763837457347917579
            path: '/dev/gpt/disk0'
            phys_path: '/dev/gpt/disk0'
            whole_disk: 1
            create_txg: 4
        children[1]:
            type: 'disk'
            id: 1
            guid: 14981872129086548885
            path: '/dev/gpt/disk1'
            phys_path: '/dev/gpt/disk1'
            whole_disk: 1
            create_txg: 4
    features_for_read:
--------------------------------------------
LABEL 3
--------------------------------------------
    version: 5000
    name: 'system'
    state: 0
    txg: 1814194
    pool_guid: 5342046208002196764
    hostid: 3355958386
    hostname: 'cuppy.int.unixathome.org'
    top_guid: 9377548653516348661
    guid: 16763837457347917579
    vdev_children: 1
    vdev_tree:
        type: 'mirror'
        id: 0
        guid: 9377548653516348661
        metaslab_array: 33
        metaslab_shift: 32
        ashift: 12
        asize: 488547811328
        is_log: 0
        create_txg: 4
        children[0]:
            type: 'disk'
            id: 0
            guid: 16763837457347917579
            path: '/dev/gpt/disk0'
            phys_path: '/dev/gpt/disk0'
            whole_disk: 1
            create_txg: 4
        children[1]:
            type: 'disk'
            id: 1
            guid: 14981872129086548885
            path: '/dev/gpt/disk1'
            phys_path: '/dev/gpt/disk1'
            whole_disk: 1
            create_txg: 4
    features_for_read:
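
Note that the guid in those labels, 16763837457347917579, is the same number zpool status showed for the UNAVAIL device. Here is a rough sketch for pulling that out of zdb -l output without scrolling. It assumes the four-space indentation seen above, and label_identity is a made-up name; other ZFS versions may format this output differently.

```sh
#!/bin/sh
# Sketch only: label_identity is a hypothetical helper name.
# Reads `zdb -l` output on stdin and prints "poolname pool_guid guid"
# taken from the first label that unpacked. Only the top-level guid
# (indented four spaces) is matched, not the nested vdev_tree guids.
label_identity() {
    awk -v q="'" '
        /^    name: /             { gsub(q, "", $2); name = $2 }
        /^    pool_guid: /        { pool = $2 }
        /^    guid: / && !printed { print name, pool, $2; printed = 1 }
    '
}

# Usage on a live system (as root):
#   zdb -l /dev/gptid/7c86d33c-dd82-11e5-bafe-94de80aad24a | label_identity
```

No output means no label unpacked, as with the first two partitions.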

Yes, we have to clear out the labels on ada0p3:

# zpool labelclear -f /dev/gptid/7c86d33c-dd82-11e5-bafe-94de80aad24a
labelclear operation failed.
	Vdev /dev/gptid/7c86d33c-dd82-11e5-bafe-94de80aad24a is a member (ACTIVE), of pool "system".
	To remove label information from this device, export or destroy
	the pool, or remove /dev/gptid/7c86d33c-dd82-11e5-bafe-94de80aad24a from the configuration of this pool
	and retry the labelclear operation

Ahh, I see. The old labels survived the repartitioning, and the pool still lists this device as a member. Let me remove it from the pool configuration.

# zpool detach system 16763837457347917579
# zpool status
  pool: system
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
	still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
	the pool may no longer be accessible by software that does not support
	the features. See zpool-features(7) for details.
  scan: none requested
config:

	NAME         STATE     READ WRITE CKSUM
	system       ONLINE       0     0     0
	  gpt/disk1  ONLINE       0     0     0

errors: No known data errors

That worked. Now try the labelclear again:

# zpool labelclear -f /dev/gptid/7c86d33c-dd82-11e5-bafe-94de80aad24a
#
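
The order matters: detach first, then labelclear. A small guard can confirm the stale device is gone from the pool configuration before you clear anything. This is a sketch under the assumption that the device name or guid is the first column of a zpool status config line; listed_in_pool is a hypothetical helper.

```sh
#!/bin/sh
# Sketch only: listed_in_pool is a hypothetical helper name.
# Reads `zpool status` output on stdin; exit status 0 means the given
# device name or guid appears as the first field of some line.
# Appending "" forces a string comparison, since a 20-digit guid does
# not fit in awk's floating-point numbers.
listed_in_pool() {
    awk -v dev="$1" '
        ($1 "") == (dev "") { found = 1 }
        END { exit found ? 0 : 1 }
    '
}

# Usage on a live system:
#   if zpool status system | listed_in_pool 16763837457347917579; then
#       echo "still in the pool config: detach before labelclear"
#   fi
```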

Success! Now on to adding the drive in.

Add it in

Here we go!

# zpool attach system gpt/disk1 /dev/gptid/7c86d33c-dd82-11e5-bafe-94de80aad24a
Make sure to wait until resilver is done before rebooting.

If you boot from pool 'system', you may need to update
boot code on newly attached disk '/dev/gptid/7c86d33c-dd82-11e5-bafe-94de80aad24a'.

Assuming you use GPT partitioning and 'da0' is your new boot disk
you may use the following command:

	gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da0

There. One drive has been attached to the other and a new device created. You can see that here:

# zpool status
  pool: system
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
	continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sat Feb 27 19:05:11 2016
        137M scanned out of 90.5G at 9.75M/s, 2h38m to go
        136M resilvered, 0.15% done
config:

	NAME                                            STATE     READ WRITE CKSUM
	system                                          ONLINE       0     0     0
	  mirror-0                                      ONLINE       0     0     0
	    gpt/disk1                                   ONLINE       0     0     0
	    gptid/7c86d33c-dd82-11e5-bafe-94de80aad24a  ONLINE       0     0     0  (resilvering)

errors: No known data errors

The new device is mirror-0.

Make it bootable!

Heed that advice given in the output of zpool attach. Make that new drive bootable. For me, that’s ada0:

# gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada0
bootcode written to ada0
#

What about the swap partition?

I run swap in a gmirror:

# cat /etc/fstab
/dev/mirror/swap none swap sw         0 0

The current status, not surprisingly, is degraded:

# gmirror status
       Name    Status  Components
mirror/swap  DEGRADED  gpt/swapada1 (ACTIVE)

To fix that, I tried inserting the new swap partition:

# gmirror insert swap gptid/7c8124e3-dd82-11e5-bafe-94de80aad24a
gmirror: Not all disks connected.

Oh, let’s make gmirror forget about missing devices:

# gmirror forget swap
#

Now let’s add in a new device:

# gmirror insert swap gptid/7c8124e3-dd82-11e5-bafe-94de80aad24a
# gmirror status
       Name    Status  Components
mirror/swap  DEGRADED  gpt/swapada1 (ACTIVE)
                       ada0p2 (SYNCHRONIZING, 3%)

How did I know what device to use? I know swap is partition 2 by looking at the output of gpart above. Looking in the output of glabel status and finding partition 2 for ada0, I found the value I needed in the Name column.
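
That two-step lookup can be sketched in shell: find the swap partition in gpart show output, then find its name in glabel status output. Both helper names are hypothetical, and the sketch assumes the output formats shown in this post.

```sh
#!/bin/sh
# Sketch only: both helper names are hypothetical.

# provider_of_type DISK TYPE
# Reads `gpart show` output on stdin and prints the provider name
# (for example ada0p2) of the first partition of TYPE on DISK.
provider_of_type() {
    awk -v disk="$1" -v type="$2" '
        $1 == "=>" { cur = $4; next }
        cur == disk && $4 == type && !seen { print disk "p" $3; seen = 1 }
    '
}

# names_for_component COMPONENT
# Reads `glabel status` output on stdin and prints every name
# whose Components column equals COMPONENT.
names_for_component() {
    awk -v comp="$1" '$3 == comp { print $1 }'
}

# Usage on a live system:
#   part=$(gpart show | provider_of_type ada0 freebsd-swap)
#   glabel status | names_for_component "$part"
```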

Hope that helps.

We now pause for resilvering and SYNCHRONIZING

Here, I wait for zpool and gmirror to finish with the new drive. The time is now 1:41 pm.
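
If you don’t want to keep checking by hand, a polling loop is one option. This sketch keys off the strings "resilver in progress" and "SYNCHRONIZING", which is what the status output in this post shows while work is under way. The helper name still_syncing and the 60-second interval are my own choices.

```sh
#!/bin/sh
# Sketch only: still_syncing is a hypothetical helper name.
# Reads status output on stdin; exit status 0 means a resilver or a
# gmirror synchronization still appears to be running.
still_syncing() {
    grep -E 'resilver in progress|SYNCHRONIZING' > /dev/null
}

# Usage on a live system (pool "system" and mirror "swap" as in this post):
#   while { zpool status system; gmirror status swap; } | still_syncing; do
#       sleep 60
#   done
#   echo "both mirrors are done"
```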

By 1:57 pm, gmirror was happy:

# gmirror status
       Name    Status  Components
mirror/swap  COMPLETE  gpt/swapada1 (ACTIVE)
                       ada0p2 (ACTIVE)

By 2:02 pm, zpool was happy:

# zpool status
  pool: system
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
	still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
	the pool may no longer be accessible by software that does not support
	the features. See zpool-features(7) for details.
  scan: resilvered 90.5G in 0h31m with 0 errors on Sat Feb 27 19:36:36 2016
config:

	NAME                                            STATE     READ WRITE CKSUM
	system                                          ONLINE       0     0     0
	  mirror-0                                      ONLINE       0     0     0
	    gpt/disk1                                   ONLINE       0     0     0
	    gptid/7c86d33c-dd82-11e5-bafe-94de80aad24a  ONLINE       0     0     0

errors: No known data errors

See that upgrade?

Let’s upgrade the pool, as mentioned in the previous section.

# zpool upgrade system
This system supports ZFS pool feature flags.

Enabled the following features on 'system':
  multi_vdev_crash_dump
  spacemap_histogram
  enabled_txg
  hole_birth
  extensible_dataset
  embedded_data
  bookmarks
  filesystem_limits
  large_blocks

If you boot from pool 'system', don't forget to update boot code.
Assuming you use GPT partitioning and da0 is your boot disk
the following command will do it:

	gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da0

Always remember to do that gpart bootcode for every drive in the pool. Don’t be like me.

First, let’s check status again:

# zpool status
  pool: system
 state: ONLINE
  scan: resilvered 90.5G in 0h31m with 0 errors on Sat Feb 27 19:36:36 2016
config:

	NAME                                            STATE     READ WRITE CKSUM
	system                                          ONLINE       0     0     0
	  mirror-0                                      ONLINE       0     0     0
	    gpt/disk1                                   ONLINE       0     0     0
	    gptid/7c86d33c-dd82-11e5-bafe-94de80aad24a  ONLINE       0     0     0

errors: No known data errors

All good. Now to the bootcode, for each drive:

# gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada0
bootcode written to ada0
# gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada1
bootcode written to ada1
#
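
With more than a couple of drives, a loop saves typing. This sketch only prints the commands so you can review them first. It assumes the freebsd-boot partition is index 1 on every disk, as it is here; bootcode_commands is a made-up helper name.

```sh
#!/bin/sh
# Sketch only: bootcode_commands is a hypothetical helper name.
# Prints one gpart bootcode command per disk named on the command line.
# Assumption: the freebsd-boot partition is index 1 on each disk.
bootcode_commands() {
    for disk in "$@"; do
        echo "gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 $disk"
    done
}

# Usage: review the output, then run it as root if it looks right.
#   bootcode_commands ada0 ada1
#   bootcode_commands ada0 ada1 | sh
```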

Reboot test

I then rebooted, just to make sure the box comes back up. I have time now to deal with any issues, while recent changes are relatively fresh in my mind. That will not be the case when the next reboot occurs, which could be months away.

… Yep, good thing I rebooted. One of the drives was disabled in BIOS. I just tested by booting from one drive, and then the other. Both good.

We’re done.
