Adding in a hot-spare for zfs on FreeBSD

But first, there’s more

Right after asking “Anyone running zfsd? Did you do anything in particular to configure it? I just added my first hot-spare to a zpool.”, ivy told me “noooooooo don’t use hot spares!! Keep a cold spare or at least an online device not attached to a pool. otherwise your zpool will randomly decide to attach its hot spare due to a temporary cabling issue or something like that. the only reason you need a hot spare is if you’re sending a system to Antarctica and literally can’t monitor it or log in to replace a failed disk with the spare”.

In the meantime, while I think and research, I’ve removed the hot spare.

[20:16 r730-03 dvl ~] % zpool status data01
  pool: data01
 state: ONLINE
  scan: scrub repaired 0B in 20:31:26 with 0 errors on Sun Apr 13 00:16:22 2025
config:

	NAME                   STATE     READ WRITE CKSUM
	data01                 ONLINE       0     0     0
	  mirror-0             ONLINE       0     0     0
	    gpt/HGST_8CJVT8YE  ONLINE       0     0     0
	    gpt/SG_ZHZ16KEX    ONLINE       0     0     0
	  mirror-1             ONLINE       0     0     0
	    gpt/SG_ZHZ03BAT    ONLINE       0     0     0
	    gpt/HGST_8CJW1G4E  ONLINE       0     0     0
	  mirror-2             ONLINE       0     0     0
	    gpt/SG_ZL2NJBT2    ONLINE       0     0     0
	    gpt/HGST_5PGGTH3D  ONLINE       0     0     0
	spares
	  gpt/SEAG_ZJV4HFPE    AVAIL   

errors: No known data errors
[20:17 r730-03 dvl ~] % sudo zpool remove data01 gpt/SEAG_ZJV4HFPE   
[20:17 r730-03 dvl ~] % 

edit 2025-04-14 I did a replace

I did a replace today. I started an RMA on the faulty drive.

[17:07 r730-03 dvl ~] % sudo zpool replace data01 gpt/HGST_8CJVT8YE gpt/SEAG_ZJV4HFPE
[17:08 r730-03 dvl ~] % zpool status data01                                          
  pool: data01
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
	continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Mon Apr 14 17:08:39 2025
	166G / 21.8T scanned at 2.76G/s, 0B / 21.8T issued
	0B resilvered, 0.00% done, no estimated completion time
config:

	NAME                     STATE     READ WRITE CKSUM
	data01                   ONLINE       0     0     0
	  mirror-0               ONLINE       0     0     0
	    replacing-0          ONLINE       0     0     0
	      gpt/HGST_8CJVT8YE  ONLINE       0     0     0
	      gpt/SEAG_ZJV4HFPE  ONLINE       0     0     0
	    gpt/SG_ZHZ16KEX      ONLINE       0     0     0
	  mirror-1               ONLINE       0     0     0
	    gpt/SG_ZHZ03BAT      ONLINE       0     0     0
	    gpt/HGST_8CJW1G4E    ONLINE       0     0     0
	  mirror-2               ONLINE       0     0     0
	    gpt/SG_ZL2NJBT2      ONLINE       0     0     0
	    gpt/HGST_5PGGTH3D    ONLINE       0     0     0

errors: No known data errors
[17:09 r730-03 dvl ~] % 

Now I wait for both the resilver and the RMA response.
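While waiting, the progress figure can be pulled out of the zpool status output with a small filter. This is only a sketch: the function name resilver_percent is made up for illustration, and the sample text is the scan section copied from the status output above.

```shell
# Scan section copied from the zpool status output above.
status_sample='  scan: resilver in progress since Mon Apr 14 17:08:39 2025
        166G / 21.8T scanned at 2.76G/s, 0B / 21.8T issued
        0B resilvered, 0.00% done, no estimated completion time'

# Hypothetical helper: read zpool status text on stdin and print the
# "NN.NN" from the "NN.NN% done" field, or nothing if there is none.
resilver_percent() {
  sed -n 's/.* \([0-9][0-9.]*\)% done.*/\1/p'
}

percent=$(printf '%s\n' "$status_sample" | resilver_percent)
echo "resilver at ${percent}%"
```

On the real host the same filter would be fed from the pool directly, for example: zpool status data01 | resilver_percent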

The problem

For a few weeks, I’ve been ignoring these errors on r730-03:

Apr 13 18:47:21 r730-03 smartd[16597]: Device: /dev/da1 [SAT], 16 Currently unreadable (pending) sectors
Apr 13 18:47:21 r730-03 smartd[16597]: Device: /dev/da1 [SAT], 4 Offline uncorrectable sectors
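Warnings like these are easy to ignore when they scroll past in the logs. A rough sketch of a filter that boils them down to device, count, and description follows. The function name summarize_smartd is hypothetical, and it assumes the line layout shown above (a single bracketed token such as [SAT], between the device and the count).

```shell
# Sample lines copied from the log above. On a live system they would come
# from the syslog file (its location depends on the syslog configuration).
smartd_log='Apr 13 18:47:21 r730-03 smartd[16597]: Device: /dev/da1 [SAT], 16 Currently unreadable (pending) sectors
Apr 13 18:47:21 r730-03 smartd[16597]: Device: /dev/da1 [SAT], 4 Offline uncorrectable sectors'

# Hypothetical helper: print "device count description" per sector warning.
summarize_smartd() {
  awk '/smartd\[[0-9]+\]: Device:/ && /sectors$/ {
    # Locate the "Device:" token; the device path follows it.
    for (i = 1; i <= NF; i++) if ($i == "Device:") break
    dev = $(i + 1)
    # Skip the "[SAT]," token; next is the count, then the description.
    count = $(i + 3)
    desc = ""
    for (j = i + 4; j <= NF; j++) desc = desc (desc == "" ? "" : " ") $j
    print dev, count, desc
  }'
}

summary=$(printf '%s\n' "$smartd_log" | summarize_smartd)
printf '%s\n' "$summary"
```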

I’ve had a replacement drive on hand since I had to replace a drive back in December. Today, I installed that spare drive.

Reorganizing the screwdriver bits

This part doesn’t relate to the drive. It can be skipped.

While attaching the new drive to the drive cage, I noticed almost all of the bits for my screwdriver were not displaying their names.

Before – few labels are visible.

I really like this gift, and I use it often. I especially like it for drive cages. See the end of this post for a link.

After installing the new drive, I went back upstairs and continued watching Coroner on Hulu (S1E1, Black Dog). This was a perfect time to rearrange the labels.

After, front:

After, front. Labels visible.

After, back:

After, back. Labels visible.

With that vital organizational change, I went about taking photos and typing this up.

What’s in this host now?

After inserting the drive into r730-03, this appeared in the logs:

Apr 13 18:31:37 r730-03 kernel: mrsas0: System PD created target ID: 0x0
Apr 13 18:31:38 r730-03 kernel: da7 at mrsas0 bus 1 scbus1 target 0 lun 0
Apr 13 18:31:38 r730-03 kernel: da7: <ATA ST12000VN0007-2G SC60> Fixed Direct Access SPC-4 SCSI device
Apr 13 18:31:38 r730-03 kernel: da7: Serial Number ZJV4HFPE
Apr 13 18:31:38 r730-03 kernel: da7: 150.000MB/s transfers
Apr 13 18:31:38 r730-03 kernel: da7: 11444224MB (23437770752 512 byte sectors)
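As a sanity check, the size the kernel reports can be reproduced from the sector count in the same message. A minimal sketch, assuming the reported "MB" are units of 1048576 bytes (the numbers only agree under that assumption):

```shell
sectors=23437770752     # sector count from the da7 kernel message
sector_bytes=512        # "512 byte sectors" from the same message

bytes=$((sectors * sector_bytes))
mib=$((bytes / 1048576))

echo "da7: $bytes bytes, $mib MB"
```

The byte count comes out just over 12 * 10^12, which fits the ST12000 model name.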

I partition my ZFS drives. So let’s do that first. I’m using a recent post as my template. This new drive has the same number of sectors as the new drive in that post. I’ll use the same commands.

My new drive has no partitions.

[19:11 r730-03 dvl ~] % gpart show da7
gpart: No such geom: da7.

Here are the other drives in the zpool:

[19:17 r730-03 dvl ~] % gpart show da0 da1 da2 da3 da4 da5
=>         40  23437770672  da0  GPT  (11T)
           40  23437770600    1  freebsd-zfs  (11T)
  23437770640           72       - free -  (36K)

=>         40  23437770672  da1  GPT  (11T)
           40  23437770600    1  freebsd-zfs  (11T)
  23437770640           72       - free -  (36K)

=>         40  23437770672  da2  GPT  (11T)
           40  23437770600    1  freebsd-zfs  (11T)
  23437770640           72       - free -  (36K)

=>         40  23437770672  da3  GPT  (11T)
           40  23437770600    1  freebsd-zfs  (11T)
  23437770640           72       - free -  (36K)

=>         40  23437770672  da4  GPT  (11T)
           40  23437770600    1  freebsd-zfs  (11T)
  23437770640           72       - free -  (36K)

=>         34  23437770685  da5  GPT  (11T)
           34            6       - free -  (3.0K)
           40  23437770600    1  freebsd-zfs  (11T)
  23437770640           79       - free -  (40K)

[19:18 r730-03 dvl ~] % 

You will notice that da5 is 23437770685 sectors, and all the others are 23437770672, a difference of 13. One drive being bigger is fine. I’ll set the new drive up to match the smaller size. I’ll pick da4 as the template.
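The numbers in the gpart output can be checked with shell arithmetic. This sketch only uses values copied from the listing above; the variable names are mine.

```shell
da5_sectors=23437770685   # usable sectors on da5
da4_sectors=23437770672   # usable sectors on da4 (same as da0 to da3)
first_sector=40           # first usable sector on da4
part_size=23437770600     # freebsd-zfs partition size, identical on all six

diff_sectors=$((da5_sectors - da4_sectors))

# Where the partition ends, and what is left over at the end of da4.
part_end=$((first_sector + part_size))
free_tail=$((first_sector + da4_sectors - part_end))
free_tail_k=$((free_tail * 512 / 1024))

echo "da5 is larger by $diff_sectors sectors"
echo "free tail on da4: $free_tail sectors (${free_tail_k}K)"
```

The partition itself is the same size on every drive, so matching the smaller layout on the new drive costs nothing.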

The zpools:

[19:18 r730-03 dvl ~] % zpool list
NAME     SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
data01  32.7T  21.8T  11.0T        -         -    27%    66%  1.00x    ONLINE  -
zroot    412G  17.9G   394G        -         -    21%     4%  1.00x    ONLINE  -
[19:18 r730-03 dvl ~] % 

Their statuses:

[19:18 r730-03 dvl ~] % zpool status                       
  pool: data01
 state: ONLINE
  scan: scrub repaired 0B in 20:31:26 with 0 errors on Sun Apr 13 00:16:22 2025
config:

	NAME                   STATE     READ WRITE CKSUM
	data01                 ONLINE       0     0     0
	  mirror-0             ONLINE       0     0     0
	    gpt/HGST_8CJVT8YE  ONLINE       0     0     0
	    gpt/SG_ZHZ16KEX    ONLINE       0     0     0
	  mirror-1             ONLINE       0     0     0
	    gpt/SG_ZHZ03BAT    ONLINE       0     0     0
	    gpt/HGST_8CJW1G4E  ONLINE       0     0     0
	  mirror-2             ONLINE       0     0     0
	    gpt/SG_ZL2NJBT2    ONLINE       0     0     0
	    gpt/HGST_5PGGTH3D  ONLINE       0     0     0

errors: No known data errors

  pool: zroot
 state: ONLINE
  scan: scrub repaired 0B in 00:01:14 with 0 errors on Thu Apr 10 03:50:20 2025
config:

	NAME        STATE     READ WRITE CKSUM
	zroot       ONLINE       0     0     0
	  mirror-0  ONLINE       0     0     0
	    ada1p3  ONLINE       0     0     0
	    ada0p3  ONLINE       0     0     0

errors: No known data errors
[19:19 r730-03 dvl ~] % 

Set it up like an existing drive

I’m choosing da4 as the template. Here, I copy over the partition details:

[19:19 r730-03 dvl ~] % gpart backup da4 | sudo gpart restore da7
[19:23 r730-03 dvl ~] % gpart show da7
=>         34  23437770685  da7  GPT  (11T)
           34            6       - free -  (3.0K)
           40  23437770600    1  freebsd-zfs  (11T)
  23437770640           79       - free -  (40K)

[19:23 r730-03 dvl ~] % 

I’m not happy with that, let me try manually.

[19:30 r730-03 dvl ~] % sudo gpart destroy -F da7
da7 destroyed

[19:30 r730-03 dvl ~] % sudo gpart create -s gpt da7
da7 created

[19:31 r730-03 dvl ~] % gpart show da7
=>         40  23437770672  da7  GPT  (11T)
           40  23437770672       - free -  (11T)

[19:31 r730-03 dvl ~] % sudo gpart add -i 1 -t freebsd-zfs -a 4k -s 23437770600 da7
da7p1 added
[19:31 r730-03 dvl ~] % gpart show da7                                                  
=>         40  23437770672  da7  GPT  (11T)
           40  23437770600    1  freebsd-zfs  (11T)
  23437770640           72       - free -  (36K)

[19:31 r730-03 dvl ~] % 

The usable area now starts at sector 40 rather than 34, which matches da0 through da4. And I know it’s 4K aligned because of the arguments. I’m going with this.
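The alignment can also be confirmed by hand from the gpart show output rather than trusting the -a 4k argument. A small sketch, with is_4k_aligned as a made-up helper name:

```shell
sector_bytes=512
start=40              # first sector of da7p1, from gpart show da7
size=23437770600      # length of da7p1 in sectors

# Hypothetical helper: succeeds when the given sector count, converted to
# bytes, is a whole multiple of 4096.
is_4k_aligned() {
  [ $(( ($1 * sector_bytes) % 4096 )) -eq 0 ]
}

is_4k_aligned "$start" && echo "start is 4K aligned"
is_4k_aligned "$size" && echo "size is a multiple of 4K"
```

With 512-byte sectors this amounts to checking that both numbers are divisible by 8.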

Setting it up as a hot spare

Here is the pool with the spare added as da7p1 (but I remove it right after this, so maybe wait and read the whole thing first):

[19:35 r730-03 dvl ~] % zpool status data01
  pool: data01
 state: ONLINE
  scan: scrub repaired 0B in 20:31:26 with 0 errors on Sun Apr 13 00:16:22 2025
config:

	NAME                   STATE     READ WRITE CKSUM
	data01                 ONLINE       0     0     0
	  mirror-0             ONLINE       0     0     0
	    gpt/HGST_8CJVT8YE  ONLINE       0     0     0
	    gpt/SG_ZHZ16KEX    ONLINE       0     0     0
	  mirror-1             ONLINE       0     0     0
	    gpt/SG_ZHZ03BAT    ONLINE       0     0     0
	    gpt/HGST_8CJW1G4E  ONLINE       0     0     0
	  mirror-2             ONLINE       0     0     0
	    gpt/SG_ZL2NJBT2    ONLINE       0     0     0
	    gpt/HGST_5PGGTH3D  ONLINE       0     0     0
	spares
	  da7p1                AVAIL   

errors: No known data errors

Let’s do the fancy labels, which I could have specified on the gpart add:

[19:35 r730-03 dvl ~] % sudo gpart modify -i 1 -l SEAG_ZJV4HFPE da7
da7p1 modified
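For next time, the label can be set when the partition is created by passing -l to gpart add. The sketch below only prints the commands instead of running them; plan_zfs_partition is a hypothetical helper, and the values are the ones used in this post.

```shell
# Hypothetical dry-run helper: print the gpart commands for a single
# labelled freebsd-zfs partition. Nothing is executed.
# Arguments: disk name, GPT label, partition size in sectors.
plan_zfs_partition() {
  disk=$1
  label=$2
  size=$3
  echo "gpart create -s gpt $disk"
  echo "gpart add -i 1 -t freebsd-zfs -a 4k -s $size -l $label $disk"
}

plan_zfs_partition da7 SEAG_ZJV4HFPE 23437770600
```

With the label in place from the start, the device can be referred to as gpt/SEAG_ZJV4HFPE straight away, and the remove and re-add step is not needed.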

Now I want the fancy label in the zpool. Let’s remove that drive:

[19:37 r730-03 dvl ~] % sudo zpool remove data01 da7p1       

Then re-add it using the fancy label:

[19:39 r730-03 dvl ~] % sudo zpool add data01 spare gpt/SEAG_ZJV4HFPE


[19:40 r730-03 dvl ~] % zpool status data01
  pool: data01
 state: ONLINE
  scan: scrub repaired 0B in 20:31:26 with 0 errors on Sun Apr 13 00:16:22 2025
config:

	NAME                   STATE     READ WRITE CKSUM
	data01                 ONLINE       0     0     0
	  mirror-0             ONLINE       0     0     0
	    gpt/HGST_8CJVT8YE  ONLINE       0     0     0
	    gpt/SG_ZHZ16KEX    ONLINE       0     0     0
	  mirror-1             ONLINE       0     0     0
	    gpt/SG_ZHZ03BAT    ONLINE       0     0     0
	    gpt/HGST_8CJW1G4E  ONLINE       0     0     0
	  mirror-2             ONLINE       0     0     0
	    gpt/SG_ZL2NJBT2    ONLINE       0     0     0
	    gpt/HGST_5PGGTH3D  ONLINE       0     0     0
	spares
	  gpt/SEAG_ZJV4HFPE    AVAIL   

errors: No known data errors
[19:40 r730-03 dvl ~] % 

Done. Thank you for coming to my TED talk.

The parts mentioned here

As an Amazon Associate I earn from qualifying purchases. This is the screwdriver set I received for Christmas (2023?).
