Yesterday, I discovered I had removed the wrong drive from a zpool.
In this post:
- FreeBSD 14.2
Today, the zpool replace command has completed.
Next, I carefully chose the right drive to pull from the drive bays.
Status
This is the zpool status, just before it completed:
[20:27 r730-03 dvl ~] % zpool status data01
  pool: data01
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Mon Apr 21 21:10:09 2025
        22.1T / 22.1T scanned, 7.36T / 7.37T issued at 95.7M/s
        7.40T resilvered, 99.83% done, 00:02:15 to go
config:

        NAME                     STATE     READ WRITE CKSUM
        data01                   ONLINE       0     0     0
          mirror-0               ONLINE       0     0     0
            gpt/SEAG_ZJV4HFPE    ONLINE       0     0     0
            replacing-1          ONLINE       0     0     0
              gpt/SG_ZHZ16KEX    ONLINE       0     0     0
              gpt/SEAG_ZHZ16KEX  ONLINE       0     0     0  (resilvering)
          mirror-1               ONLINE       0     0     0
            gpt/SG_ZHZ03BAT      ONLINE       0     0     0
            gpt/HGST_8CJW1G4E    ONLINE       0     0     0
          mirror-2               ONLINE       0     0     0
            gpt/SG_ZL2NJBT2      ONLINE       0     0     0
            gpt/HGST_5PGGTH3D    ONLINE       0     0     0

errors: No known data errors
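Incidentally, instead of re-running zpool status to watch the progress, the OpenZFS shipped with FreeBSD 14 can block until the resilver finishes. I did not use it here; this is just a sketch:

% zpool wait -t resilver data01     # returns once the resilver completes
% zpool status data01               # then confirm the pool state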
After it completed:
[20:27 r730-03 dvl ~] % zpool status data01
  pool: data01
 state: ONLINE
  scan: resilvered 7.41T in 23:19:38 with 0 errors on Tue Apr 22 20:29:47 2025
config:

        NAME                   STATE     READ WRITE CKSUM
        data01                 ONLINE       0     0     0
          mirror-0             ONLINE       0     0     0
            gpt/SEAG_ZJV4HFPE  ONLINE       0     0     0
            gpt/SEAG_ZHZ16KEX  ONLINE       0     0     0
          mirror-1             ONLINE       0     0     0
            gpt/SG_ZHZ03BAT    ONLINE       0     0     0
            gpt/HGST_8CJW1G4E  ONLINE       0     0     0
          mirror-2             ONLINE       0     0     0
            gpt/SG_ZL2NJBT2    ONLINE       0     0     0
            gpt/HGST_5PGGTH3D  ONLINE       0     0     0

errors: No known data errors
I see gpt/SG_ZHZ16KEX is no longer in the zpool.
Not trusting myself, I checked this way:
[20:33 r730-03 dvl ~] % zpool status data01 | grep gpt/SG_ZHZ16KEX
[20:33 r730-03 dvl ~] %
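For an even quicker sanity check, zpool status -x reports only pools that have problems. This is a sketch, not from my history:

% zpool status -x data01
pool 'data01' is healthy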
Checking again
Now, remember, the drives had the wrong labels, and I fixed that. However, I was sure I still had the device numbers mixed up.
I ran this command and searched the output for the device that had just been removed from the zpool:
[20:49 r730-03 dvl ~] % glabel list
...
Geom name: da1p1
Providers:
1. Name: gpt/SG_ZHZ16KEX
   Mediasize: 12000138547200 (11T)
   Sectorsize: 512
...
OK, that connects da1 with the gpt/SG_ZHZ16KEX label.
That da1 device also matches up with the following smartd entries:
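When looking for just one label, glabel status is less noisy than glabel list; a sketch:

% glabel status | grep ZHZ16KEX      # shows the label, its status, and the backing provider (da1p1 here)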
[20:51 r730-03 dvl ~] % tail /var/log/messages
Apr 22 19:17:22 r730-03 smartd[16597]: Device: /dev/da1 [SAT], 16 Currently unreadable (pending) sectors
Apr 22 19:17:22 r730-03 smartd[16597]: Device: /dev/da1 [SAT], 8 Offline uncorrectable sectors
Apr 22 19:47:21 r730-03 smartd[16597]: Device: /dev/da1 [SAT], 16 Currently unreadable (pending) sectors
Apr 22 19:47:21 r730-03 smartd[16597]: Device: /dev/da1 [SAT], 8 Offline uncorrectable sectors
Apr 22 20:17:21 r730-03 smartd[16597]: Device: /dev/da1 [SAT], 16 Currently unreadable (pending) sectors
Apr 22 20:17:21 r730-03 smartd[16597]: Device: /dev/da1 [SAT], 8 Offline uncorrectable sectors
Apr 22 20:47:22 r730-03 smartd[16597]: Device: /dev/da1 [SAT], 16 Currently unreadable (pending) sectors
Apr 22 20:47:22 r730-03 smartd[16597]: Device: /dev/da1 [SAT], 8 Offline uncorrectable sectors
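Those same counters can be read straight from the drive with smartctl from sysutils/smartmontools. A sketch, assuming the drive is still attached as /dev/da1:

% sudo smartctl -A /dev/da1 | egrep 'Current_Pending_Sector|Offline_Uncorrectable'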
Turning the lights on
Let’s try my old method for identifying the drive:
[12:17 r730-03 dvl ~] % sudo dd if=/dev/zero of=/dev/gpt/SG_ZHZ16KEX bs=4M
^C400+0 records in
399+0 records out
1673527296 bytes transferred in 9.567196 secs (174923489 bytes/sec)
[12:18 r730-03 dvl ~] %
I saw the drive whose activity LED was on constantly… That was the drive I removed.
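Writing zeros to the drive is only acceptable because this one is on its way out for an RMA. A read-only variant keeps the activity LED lit just as well and is safe on a drive you intend to keep; a sketch:

% sudo dd if=/dev/gpt/SG_ZHZ16KEX of=/dev/null bs=4M    # read-only; interrupt with ^C when done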
Looking in /var/log/messages, I found:
Apr 23 12:17:21 r730-03 smartd[16597]: Device: /dev/da1 [SAT], 16 Currently unreadable (pending) sectors
Apr 23 12:17:21 r730-03 smartd[16597]: Device: /dev/da1 [SAT], 8 Offline uncorrectable sectors
Apr 23 12:18:11 r730-03 kernel: mrsas0: System PD deleted target ID: 0x2
Apr 23 12:18:11 r730-03 kernel: da1 at mrsas0 bus 1 scbus1 target 2 lun 0
Apr 23 12:18:11 r730-03 kernel: da1: s/n 8CJVT8YE detached
Apr 23 12:18:11 r730-03 kernel: (da1:mrsas0:1:2:0): Periph destroyed
All consistent. Of note, this was drive bay 2 (or, target 2, as seen above).
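If the backplane exposes a SES enclosure, sesutil can blink the locate LED for a specific bay instead of relying on the activity light. I have not confirmed it works behind this mrsas controller, so treat this as a sketch:

% sudo sesutil locate da1 on     # turn on the locate LED for da1's bay
% sudo sesutil locate all off    # turn all locate LEDs off afterwards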
More importantly, the zpool status is still fine:
[12:20 r730-03 dvl ~] % zpool status
  pool: data01
 state: ONLINE
  scan: resilvered 7.41T in 23:19:38 with 0 errors on Tue Apr 22 20:29:47 2025
config:

        NAME                   STATE     READ WRITE CKSUM
        data01                 ONLINE       0     0     0
          mirror-0             ONLINE       0     0     0
            gpt/SEAG_ZJV4HFPE  ONLINE       0     0     0
            gpt/SEAG_ZHZ16KEX  ONLINE       0     0     0
          mirror-1             ONLINE       0     0     0
            gpt/SG_ZHZ03BAT    ONLINE       0     0     0
            gpt/HGST_8CJW1G4E  ONLINE       0     0     0
          mirror-2             ONLINE       0     0     0
            gpt/SG_ZL2NJBT2    ONLINE       0     0     0
            gpt/HGST_5PGGTH3D  ONLINE       0     0     0

errors: No known data errors

  pool: zroot
 state: ONLINE
  scan: scrub repaired 0B in 00:01:09 with 0 errors on Thu Apr 17 04:49:19 2025
config:

        NAME        STATE     READ WRITE CKSUM
        zroot       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            ada1p3  ONLINE       0     0     0
            ada0p3  ONLINE       0     0     0

errors: No known data errors
Checking one last thing:
[12:21 r730-03 dvl ~] % grep da1 /var/run/dmesg.boot | grep '^da1:' | sort -u
da1: 11444224MB (23437770752 512 byte sectors)
da1: 150.000MB/s transfers
da1: Fixed Direct Access SPC-4 SCSI device
da1: Serial Number 8CJVT8YE
[12:21 r730-03 dvl ~] %
The serial number there matches the drive I just pulled.
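For next time: while a drive is still attached, its serial number can be read directly from the live system rather than from dmesg.boot. A sketch using tools in the base system:

% diskinfo -s /dev/da1              # prints the disk ident (usually the serial number)
% geom disk list da1 | grep ident   # same information, different tool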
Phew.
That will stop the most nagging part of this issue: the smartd reports of the troublesome sectors.
Next tasks: box up the drive (already done), buy some postage, and put it into the mail system for the RMA.