Making sure I remove the correct drive

Yesterday, I discovered I had removed the wrong drive from a zpool.

In this post:

  • FreeBSD 14.2

Today, the zpool replace command has completed.

Next, I carefully chose the right drive to pull from the drive bays.

Status

This is the zpool status, just before it completed:

[20:27 r730-03 dvl ~] % zpool status data01                
  pool: data01
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
	continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Mon Apr 21 21:10:09 2025
	22.1T / 22.1T scanned, 7.36T / 7.37T issued at 95.7M/s
	7.40T resilvered, 99.83% done, 00:02:15 to go
config:

	NAME                     STATE     READ WRITE CKSUM
	data01                   ONLINE       0     0     0
	  mirror-0               ONLINE       0     0     0
	    gpt/SEAG_ZJV4HFPE    ONLINE       0     0     0
	    replacing-1          ONLINE       0     0     0
	      gpt/SG_ZHZ16KEX    ONLINE       0     0     0
	      gpt/SEAG_ZHZ16KEX  ONLINE       0     0     0  (resilvering)
	  mirror-1               ONLINE       0     0     0
	    gpt/SG_ZHZ03BAT      ONLINE       0     0     0
	    gpt/HGST_8CJW1G4E    ONLINE       0     0     0
	  mirror-2               ONLINE       0     0     0
	    gpt/SG_ZL2NJBT2      ONLINE       0     0     0
	    gpt/HGST_5PGGTH3D    ONLINE       0     0     0

errors: No known data errors

After it completed:

[20:27 r730-03 dvl ~] % zpool status data01
  pool: data01
 state: ONLINE
  scan: resilvered 7.41T in 23:19:38 with 0 errors on Tue Apr 22 20:29:47 2025
config:

	NAME                   STATE     READ WRITE CKSUM
	data01                 ONLINE       0     0     0
	  mirror-0             ONLINE       0     0     0
	    gpt/SEAG_ZJV4HFPE  ONLINE       0     0     0
	    gpt/SEAG_ZHZ16KEX  ONLINE       0     0     0
	  mirror-1             ONLINE       0     0     0
	    gpt/SG_ZHZ03BAT    ONLINE       0     0     0
	    gpt/HGST_8CJW1G4E  ONLINE       0     0     0
	  mirror-2             ONLINE       0     0     0
	    gpt/SG_ZL2NJBT2    ONLINE       0     0     0
	    gpt/HGST_5PGGTH3D  ONLINE       0     0     0

errors: No known data errors

I see gpt/SG_ZHZ16KEX is no longer in the zpool.

Not trusting myself, I checked this way:

[20:33 r730-03 dvl ~] % zpool status data01 | grep gpt/SG_ZHZ16KEX
[20:33 r730-03 dvl ~] % 
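
The grep above matches substrings, which is fine here. For labels that share a prefix, an exact match on the first column may be safer. A minimal sketch, assuming the status text is piped in on stdin; label_in_pool is a hypothetical helper name, not part of ZFS:

```shell
# Sketch: succeed only if a vdev name appears as the first column
# of `zpool status` output read from stdin.
label_in_pool() {
    # $1 is the vdev name to look for, e.g. gpt/SG_ZHZ16KEX
    awk -v want="$1" '$1 == want { found = 1 } END { exit !found }'
}

# Example use on the live system (commented out so the sketch has no side effects):
# if zpool status data01 | label_in_pool gpt/SG_ZHZ16KEX; then
#     echo "still in the pool"
# else
#     echo "not in the pool"
# fi
```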

Checking again

Remember, the drives had the wrong labels, and I fixed that. However, I’m sure the device numbers were wrong too, so I wanted to check again.

I ran this command and searched for the removed device:

[20:49 r730-03 dvl ~] % glabel list
...
Geom name: da1p1
Providers:
1. Name: gpt/SG_ZHZ16KEX
   Mediasize: 12000138547200 (11T)
   Sectorsize: 512

...

OK, that connects da1 to the label gpt/SG_ZHZ16KEX.
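
Rather than searching the glabel list output by eye, the lookup can be scripted. A rough sketch, assuming the output keeps the "Geom name:" / "N. Name:" layout shown above; geom_for_label is a hypothetical helper name:

```shell
# Sketch: print the geom (e.g. da1p1) that provides a given label,
# reading `glabel list` output from stdin.
geom_for_label() {
    # $1 is the label to look up, e.g. gpt/SG_ZHZ16KEX
    awk -v want="$1" '
        /^Geom name:/                { geom = $3 }               # remember the current geom
        $2 == "Name:" && $3 == want  { print geom; found = 1 }   # label belongs to that geom
        END                          { exit !found }
    '
}

# Example use (commented out so the sketch has no side effects):
# glabel list | geom_for_label gpt/SG_ZHZ16KEX
```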

It also matches up with the following:

[20:51 r730-03 dvl ~] % tail /var/log/messages
Apr 22 19:17:22 r730-03 smartd[16597]: Device: /dev/da1 [SAT], 16 Currently unreadable (pending) sectors
Apr 22 19:17:22 r730-03 smartd[16597]: Device: /dev/da1 [SAT], 8 Offline uncorrectable sectors
Apr 22 19:47:21 r730-03 smartd[16597]: Device: /dev/da1 [SAT], 16 Currently unreadable (pending) sectors
Apr 22 19:47:21 r730-03 smartd[16597]: Device: /dev/da1 [SAT], 8 Offline uncorrectable sectors
Apr 22 20:17:21 r730-03 smartd[16597]: Device: /dev/da1 [SAT], 16 Currently unreadable (pending) sectors
Apr 22 20:17:21 r730-03 smartd[16597]: Device: /dev/da1 [SAT], 8 Offline uncorrectable sectors
Apr 22 20:47:22 r730-03 smartd[16597]: Device: /dev/da1 [SAT], 16 Currently unreadable (pending) sectors
Apr 22 20:47:22 r730-03 smartd[16597]: Device: /dev/da1 [SAT], 8 Offline uncorrectable sectors
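
To see at a glance which devices smartd is complaining about, the log lines can be reduced to a unique list of device nodes. A small sketch, assuming the smartd message format shown above; smartd_devices is a hypothetical helper name:

```shell
# Sketch: list each device that smartd has reported on,
# reading syslog lines from stdin.
smartd_devices() {
    sed -n 's|.*smartd\[[0-9]*]: Device: \(/dev/[^ ]*\) .*|\1|p' | sort -u
}

# Example use (commented out so the sketch has no side effects):
# smartd_devices < /var/log/messages
```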

Turning the lights on

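Let’s try my old method for identifying the drive. Writing to it keeps its activity LED lit, and overwriting is fine here because the drive has already left the zpool: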

[12:17 r730-03 dvl ~] % sudo dd if=/dev/zero of=/dev/gpt/SG_ZHZ16KEX bs=4M
^C400+0 records in
399+0 records out
1673527296 bytes transferred in 9.567196 secs (174923489 bytes/sec)

[12:18 r730-03 dvl ~] % 

One drive had its LED on full time. That was the drive I removed.
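
The dd above writes zeros, which is only safe for a drive that is no longer in use. For a drive that still holds data, a read-only variation should keep the activity LED busy as well. A minimal sketch, assuming timeout(1) is available; light_drive_led is a hypothetical helper name:

```shell
# Sketch: keep a drive busy with reads only, so its activity LED stays lit
# without overwriting anything.
light_drive_led() {
    dev=$1          # e.g. /dev/gpt/SG_ZHZ16KEX
    secs=${2:-30}   # how long to keep reading, in seconds
    rc=0
    # timeout exits with 124 when it has to stop dd, which is the normal case here
    timeout "$secs" dd if="$dev" of=/dev/null bs=4M 2>/dev/null || rc=$?
    [ "$rc" -eq 0 ] || [ "$rc" -eq 124 ]
}

# Example use (commented out so the sketch has no side effects):
# sudo sh -c '. ./led.sh; light_drive_led /dev/gpt/SG_ZHZ16KEX 60'
```

The led.sh file name in the example is an assumption. A drive with unreadable sectors may return read errors, in which case the helper reports failure.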

Looking in /var/log/messages, I found:

Apr 23 12:17:21 r730-03 smartd[16597]: Device: /dev/da1 [SAT], 16 Currently unreadable (pending) sectors
Apr 23 12:17:21 r730-03 smartd[16597]: Device: /dev/da1 [SAT], 8 Offline uncorrectable sectors
Apr 23 12:18:11 r730-03 kernel: mrsas0: System PD deleted target ID: 0x2
Apr 23 12:18:11 r730-03 kernel: da1 at mrsas0 bus 1 scbus1 target 2 lun 0
Apr 23 12:18:11 r730-03 kernel: da1:   s/n 8CJVT8YE detached
Apr 23 12:18:11 r730-03 kernel: (da1:mrsas0:1:2:0): Periph destroyed

All consistent. Of note, this was drive bay 2 (target 2 in the kernel messages above).
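
The device name and serial number of whatever was just pulled can be extracted from those kernel messages. A rough sketch, assuming the "s/n ... detached" line format shown above; detached_serials is a hypothetical helper name:

```shell
# Sketch: print "<device> <serial>" for every detach message,
# reading syslog lines from stdin.
detached_serials() {
    awk '$NF == "detached" {
        for (i = 2; i < NF; i++)
            if ($i == "s/n") {
                dev = $(i - 1)
                sub(/:$/, "", dev)      # "da1:" becomes "da1"
                print dev, $(i + 1)
            }
    }'
}

# Example use (commented out so the sketch has no side effects):
# detached_serials < /var/log/messages
```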

More important is that the zpool status is still fine:

[12:20 r730-03 dvl ~] % zpool status       
  pool: data01
 state: ONLINE
  scan: resilvered 7.41T in 23:19:38 with 0 errors on Tue Apr 22 20:29:47 2025
config:

	NAME                   STATE     READ WRITE CKSUM
	data01                 ONLINE       0     0     0
	  mirror-0             ONLINE       0     0     0
	    gpt/SEAG_ZJV4HFPE  ONLINE       0     0     0
	    gpt/SEAG_ZHZ16KEX  ONLINE       0     0     0
	  mirror-1             ONLINE       0     0     0
	    gpt/SG_ZHZ03BAT    ONLINE       0     0     0
	    gpt/HGST_8CJW1G4E  ONLINE       0     0     0
	  mirror-2             ONLINE       0     0     0
	    gpt/SG_ZL2NJBT2    ONLINE       0     0     0
	    gpt/HGST_5PGGTH3D  ONLINE       0     0     0

errors: No known data errors

  pool: zroot
 state: ONLINE
  scan: scrub repaired 0B in 00:01:09 with 0 errors on Thu Apr 17 04:49:19 2025
config:

	NAME        STATE     READ WRITE CKSUM
	zroot       ONLINE       0     0     0
	  mirror-0  ONLINE       0     0     0
	    ada1p3  ONLINE       0     0     0
	    ada0p3  ONLINE       0     0     0

errors: No known data errors

Checking one last thing:

[12:21 r730-03 dvl ~] % grep da1 /var/run/dmesg.boot | grep '^da1:' | sort -u
da1: 11444224MB (23437770752 512 byte sectors)
da1: 150.000MB/s transfers
da1:  Fixed Direct Access SPC-4 SCSI device
da1: Serial Number 8CJVT8YE
[12:21 r730-03 dvl ~] % 

The serial number there matches the drive I just pulled.
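
The same check can be scripted so the serial number is ready before the drive is pulled. A minimal sketch, assuming the "Serial Number" line format shown above; boot_serial is a hypothetical helper name:

```shell
# Sketch: print the serial number recorded at boot for one device,
# reading dmesg text from stdin. Fails if no serial is found.
boot_serial() {
    # $1 is the device name, e.g. da1
    awk -v dev="$1:" '
        $1 == dev && $2 == "Serial" && $3 == "Number" && s == "" { s = $4 }
        END { if (s == "") exit 1; print s }
    '
}

# Example use (commented out so the sketch has no side effects):
# boot_serial da1 < /var/run/dmesg.boot
```

The result can then be compared against the sticker on the drive in hand.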

Phew.

That will stop the most nagging part of this issue: the smartd reports of the troublesome sectors.

Next tasks: box up the drive (already done), buy some postage, and put it into the mail system for the RMA.
