What drive did I just remove from the system?

So there I was… ready to remove the drive from the system. This was the drive which was giving errors and which had already been replaced.

In this post:

  • FreeBSD 14.2

Let’s look at the drive I just wiped. I was running this command:

[20:11 r730-03 dvl ~] % sudo dd if=/dev/zero of=/dev/gpt/HGST_8CJVT8YE bs=4M

Let’s run it again and see which drive LED lights up.

Yep, there it is. CTL-C, LED goes off.

Run it again, put my finger on the cage release button, CTL-C it, press the button, got the drive.

And:

Apr 21 20:12:59 r730-03 kernel: da2 at mrsas0 bus 1 scbus1 target 3 lun 0
Apr 21 20:12:59 r730-03 kernel: da2:  Fixed Direct Access SPC-4 SCSI device
Apr 21 20:12:59 r730-03 kernel: da2: Serial Number ZHZ16KEX
Apr 21 20:12:59 r730-03 kernel: da2: 150.000MB/s transfers
Apr 21 20:12:59 r730-03 kernel: da2: 11444224MB (23437770752 512 byte sectors)

Umm, that’s not the drive. The drive giving the errors is:

Apr 21 20:17:22 r730-03 smartd[16597]: Device: /dev/da1 [SAT], 16 Currently unreadable (pending) sectors
Apr 21 20:17:22 r730-03 smartd[16597]: Device: /dev/da1 [SAT], 8 Offline uncorrectable sectors

A quick check: zpool status is fine:

[20:10 r730-03 dvl ~] % zpool status
  pool: data01
 state: ONLINE
  scan: scrub repaired 0B in 19:52:50 with 0 errors on Thu Apr 17 07:09:29 2025
config:

	NAME                   STATE     READ WRITE CKSUM
	data01                 ONLINE       0     0     0
	  mirror-0             ONLINE       0     0     0
	    gpt/SEAG_ZJV4HFPE  ONLINE       0     0     0
	    gpt/SG_ZHZ16KEX    ONLINE       0     0     0
	  mirror-1             ONLINE       0     0     0
	    gpt/SG_ZHZ03BAT    ONLINE       0     0     0
	    gpt/HGST_8CJW1G4E  ONLINE       0     0     0
	  mirror-2             ONLINE       0     0     0
	    gpt/SG_ZL2NJBT2    ONLINE       0     0     0
	    gpt/HGST_5PGGTH3D  ONLINE       0     0     0

errors: No known data errors

  pool: zroot
 state: ONLINE
  scan: scrub repaired 0B in 00:01:09 with 0 errors on Thu Apr 17 04:49:19 2025
config:

	NAME        STATE     READ WRITE CKSUM
	zroot       ONLINE       0     0     0
	  mirror-0  ONLINE       0     0     0
	    ada1p3  ONLINE       0     0     0
	    ada0p3  ONLINE       0     0     0

errors: No known data errors

No problems there.

I put the drive back in because, clearly, I still need it.

I see:

Apr 21 20:12:59 r730-03 kernel: da2 at mrsas0 bus 1 scbus1 target 3 lun 0
Apr 21 20:12:59 r730-03 kernel: da2:  Fixed Direct Access SPC-4 SCSI device
Apr 21 20:12:59 r730-03 kernel: da2: Serial Number ZHZ16KEX
Apr 21 20:12:59 r730-03 kernel: da2: 150.000MB/s transfers
Apr 21 20:12:59 r730-03 kernel: da2: 11444224MB (23437770752 512 byte sectors)

Looking in the logs, that serial still matches. OK, nothing seems wrong yet, except for the labels.

[20:11 r730-03 dvl ~] % grep da2 /var/run/dmesg.boot 
da2 at mrsas0 bus 1 scbus1 target 3 lun 0
da2:  Fixed Direct Access SPC-4 SCSI device
da2: Serial Number ZHZ16KEX
da2: 150.000MB/s transfers
da2: 11444224MB (23437770752 512 byte sectors)
da2 at mrsas0 bus 1 scbus1 target 3 lun 0
da2:  Fixed Direct Access SPC-4 SCSI device
da2: Serial Number ZHZ16KEX
da2: 150.000MB/s transfers
da2: 11444224MB (23437770752 512 byte sectors)

Yeah, OK, everything is fine.

What?

I’m writing this up because now is no time for confusion. I’ve got to get this right.

The drive I pulled has label HGST_8CJVT8YE: I used that during the dd command above.

The drive I pulled has a serial number of ZHZ16KEX, which does not match that label.

The problem drive identity

The problem drive is da1.

Looking for that, I find:

[20:48 r730-03 dvl ~] % grep '^da1: ' /var/run/dmesg.boot | sort | uniq
da1: 11444224MB (23437770752 512 byte sectors)
da1: 150.000MB/s transfers
da1:  Fixed Direct Access SPC-4 SCSI device
da1: Serial Number 8CJVT8YE

Looking at all the labels for gpart:

[20:13 r730-03 dvl ~] % gpart show -l
=>       40  937703008  ada0  GPT  (447G)
         40       1024     1  gptboot1  (512K)
       1064        984        - free -  (492K)
       2048   67108864     2  swap1  (32G)
   67110912  870590464     3  zfs1  (415G)
  937701376       1672        - free -  (836K)

=>       40  937703008  ada1  GPT  (447G)
         40       1024     1  gptboot0  (512K)
       1064        984        - free -  (492K)
       2048   67108864     2  swap0  (32G)
   67110912  870590464     3  zfs0  (415G)
  937701376       1672        - free -  (836K)

=>         40  23437770672  da0  GPT  (11T)
           40  23437770600    1  HGST_5PGGTH3D  (11T)
  23437770640           72       - free -  (36K)

=>         40  23437770672  da4  GPT  (11T)
           40  23437770600    1  HGST_8CJW1G4E  (11T)
  23437770640           72       - free -  (36K)

=>         40  23437770672  da1  GPT  (11T)
           40  23437770600    1  SG_ZHZ16KEX  (11T)
  23437770640           72       - free -  (36K)

=>         34  23437770685  da5  GPT  (11T)
           34            6       - free -  (3.0K)
           40  23437770600    1  SG_ZL2NJBT2  (11T)
  23437770640           79       - free -  (40K)

=>         40  23437770672  da3  GPT  (11T)
           40  23437770600    1  SG_ZHZ03BAT  (11T)
  23437770640           72       - free -  (36K)

=>         40  23437770672  da7  GPT  (11T)
           40  23437770600    1  SEAG_ZJV4HFPE  (11T)
  23437770640           72       - free -  (36K)

=>         40  23437770672  da2  GPT  (11T)
           40  23437770600    1  HGST_8CJVT8YE  (11T)
  23437770640           72       - free -  (36K)

Well, damn. The label for da2 contains the serial number for da1.

That needs to be fixed.

All the serial numbers

Here are all the serial numbers:

[20:41 r730-03 dvl ~] % grep 'Serial Number' /var/run/dmesg.boot | sort | uniq
ada0: Serial Number BTWA602402P7480FGN
ada1: Serial Number BTWA604405H2480FGN
cd0: Serial Number KZDHB6D4311
da0: Serial Number 5PGGTH3D
da1: Serial Number 8CJVT8YE
da2: Serial Number ZHZ16KEX
da3: Serial Number ZHZ03BAT
da4: Serial Number 8CJW1G4E
da5: Serial Number ZL2NJBT2
da6: Serial Number 012345678901

Fixing da1

Let’s fix the wrong label for da1, the problem drive, the one to be removed:

[20:48 r730-03 dvl ~] % sudo gpart modify -i 1 -l HGST_8CJVT8YE da1
da1p1 modified
[20:49 r730-03 dvl ~] % gpart show -l da1
=>         40  23437770672  da1  GPT  (11T)
           40  23437770600    1  HGST_8CJVT8YE  (11T)
  23437770640           72       - free -  (36K)

[20:49 r730-03 dvl ~] % 

There, now it matches:

[20:49 r730-03 dvl ~] % grep 8CJVT8YE /var/run/dmesg.boot | sort | uniq 
da1: Serial Number 8CJVT8YE
[20:50 r730-03 dvl ~] % 

Fix the others

First, I’ll fix da2 upon which we found the wrong serial number. What’s the right one?

[20:50 r730-03 dvl ~] % grep '^da2: ' /var/run/dmesg.boot | sort | uniq
da2: 11444224MB (23437770752 512 byte sectors)
da2: 150.000MB/s transfers
da2:  Fixed Direct Access SPC-4 SCSI device
da2: Serial Number ZHZ16KEX

Here’s the fix:

[20:53 r730-03 dvl ~] % sudo gpart modify -i 1 -l SEAG_ZHZ16KEX da2
da2p1 modified
[20:53 r730-03 dvl ~] % gpart show -l da2
=>         40  23437770672  da2  GPT  (11T)
           40  23437770600    1  SEAG_ZHZ16KEX  (11T)
  23437770640           72       - free -  (36K)

[20:53 r730-03 dvl ~] % 

Checking the others

This is the current status:

[20:54 r730-03 dvl ~] % gpart show -l
=>       40  937703008  ada0  GPT  (447G)
         40       1024     1  gptboot1  (512K)
       1064        984        - free -  (492K)
       2048   67108864     2  swap1  (32G)
   67110912  870590464     3  zfs1  (415G)
  937701376       1672        - free -  (836K)

=>       40  937703008  ada1  GPT  (447G)
         40       1024     1  gptboot0  (512K)
       1064        984        - free -  (492K)
       2048   67108864     2  swap0  (32G)
   67110912  870590464     3  zfs0  (415G)
  937701376       1672        - free -  (836K)

=>         40  23437770672  da0  GPT  (11T)
           40  23437770600    1  HGST_5PGGTH3D  (11T)
  23437770640           72       - free -  (36K)

=>         40  23437770672  da4  GPT  (11T)
           40  23437770600    1  HGST_8CJW1G4E  (11T)
  23437770640           72       - free -  (36K)

=>         40  23437770672  da1  GPT  (11T)
           40  23437770600    1  HGST_8CJVT8YE  (11T)
  23437770640           72       - free -  (36K)

=>         34  23437770685  da5  GPT  (11T)
           34            6       - free -  (3.0K)
           40  23437770600    1  SG_ZL2NJBT2  (11T)
  23437770640           79       - free -  (40K)

=>         40  23437770672  da3  GPT  (11T)
           40  23437770600    1  SG_ZHZ03BAT  (11T)
  23437770640           72       - free -  (36K)

=>         40  23437770672  da7  GPT  (11T)
           40  23437770600    1  SEAG_ZJV4HFPE  (11T)
  23437770640           72       - free -  (36K)

=>         40  23437770672  da2  GPT  (11T)
           40  23437770600    1  SEAG_ZHZ16KEX  (11T)
  23437770640           72       - free -  (36K)

I’ve compared the above list against the serial number. All good.
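
That comparison can also be scripted. What follows is a minimal sketch, not something that was run for this post. It assumes the serial lines are in the /var/run/dmesg.boot format shown earlier (da1: Serial Number 8CJVT8YE), that each data drive label is supposed to contain the drive’s serial number, and that only daN devices matter. The function name check_labels and the file names in the usage comment are made up for illustration.

```sh
#!/bin/sh
# Sketch only: check_labels is a hypothetical helper, not a FreeBSD tool.
#   $1: file holding saved `gpart show -l` output
#   $2: file holding lines like "da1: Serial Number 8CJVT8YE"
# Possible usage (file names are placeholders):
#   gpart show -l > /tmp/gpart.txt
#   grep 'Serial Number' /var/run/dmesg.boot | sort -u > /tmp/serials.txt
#   check_labels /tmp/gpart.txt /tmp/serials.txt
check_labels() {
    awk '
        # First file read (the serial list): remember the serial per device.
        FILENAME == ARGV[1] {
            if ($2 == "Serial" && $3 == "Number") {
                d = $1
                sub(/:$/, "", d)        # "da1:" -> "da1"
                serial[d] = $4
            }
            next
        }
        # gpart header line: "=>  40  23437770672  da1  GPT  (11T)"
        $1 == "=>" { dev = $4; next }
        # Partition line: start, size, index, label, (size).
        # Assumption: only daN data drives carry a serial in the label.
        NF == 5 && $3 ~ /^[0-9]+$/ && dev ~ /^da[0-9]+$/ {
            if (!(dev in serial))
                status = "no-serial"
            else if (index($4, serial[dev]) > 0)
                status = "ok"
            else
                status = "MISMATCH"
            print dev, $4, status
        }
    ' "$2" "$1"
}
```

Each output line is the device, its label, and a status. A drive with no serial line in the file (da7 does not appear in the serial number list above) is reported as no-serial rather than as a mismatch.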

zpool status

Next, let’s figure out if this all makes sense too, just in case. I have manually added the device numbers.

[20:57 r730-03 dvl ~] % zpool status data01
  pool: data01
 state: ONLINE
  scan: scrub repaired 0B in 19:52:50 with 0 errors on Thu Apr 17 07:09:29 2025
config:

	NAME                   STATE     READ WRITE CKSUM
	data01                 ONLINE       0     0     0
	  mirror-0             ONLINE       0     0     0
	    gpt/SEAG_ZJV4HFPE  ONLINE       0     0     0  da7
	    gpt/SG_ZHZ16KEX    ONLINE       0     0     0  da1
	  mirror-1             ONLINE       0     0     0
	    gpt/SG_ZHZ03BAT    ONLINE       0     0     0  da3
	    gpt/HGST_8CJW1G4E  ONLINE       0     0     0  da4
	  mirror-2             ONLINE       0     0     0
	    gpt/SG_ZL2NJBT2    ONLINE       0     0     0  da5
	    gpt/HGST_5PGGTH3D  ONLINE       0     0     0  da0

errors: No known data errors

That list of devices does not include the drive I removed: da2.
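
Adding the device names by hand is easy to get wrong. Here is a minimal sketch of doing it with awk instead, assuming glabel status -s prints three columns (label name, status, component, for example gpt/SG_ZHZ16KEX N/A da1p1). The function name annotate_zpool and the file name in the usage comment are hypothetical, and this was not run for this post.

```sh
#!/bin/sh
# Sketch only: annotate_zpool is a hypothetical helper.
#   $1: file holding `glabel status -s` output (name, status, component)
#   stdin: `zpool status` output
# Possible usage (file name is a placeholder):
#   glabel status -s > /tmp/glabel.txt
#   zpool status data01 | annotate_zpool /tmp/glabel.txt
annotate_zpool() {
    awk '
        # First file: map each label to its disk, e.g. da1p1 -> da1.
        FILENAME == ARGV[1] {
            disk = $3
            sub(/p[0-9]+$/, "", disk)
            dev[$1] = disk
            next
        }
        # zpool status lines whose first field is a known label
        # get the disk name appended; everything else passes through.
        ($1 in dev) { print $0 "  " dev[$1]; next }
        { print }
    ' "$1" -
}
```

Lines that do not start with a known label (pool name, mirror-N, headers) are printed unchanged, so the output keeps the shape of zpool status.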

The plan

Let’s do another replace. Let’s replace da1 (gpt/SG_ZHZ16KEX) with da2 (gpt/SEAG_ZHZ16KEX).

I plan to issue this command: sudo zpool replace data01 gpt/SG_ZHZ16KEX gpt/SEAG_ZHZ16KEX

Checking the man page, I confirm, it is OLDDEV NEWDEV.

OK, let’s get that started, in a tmux session.

[21:09 r730-03 dvl ~] % sudo zpool replace data01 gpt/SG_ZHZ16KEX gpt/SEAG_ZHZ16KEX
[21:10 r730-03 dvl ~] % zpool status data01
  pool: data01
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Mon Apr 21 21:10:09 2025
        56.4G / 22.1T scanned at 1.41G/s, 0B / 22.1T issued
        0B resilvered, 0.00% done, no estimated completion time
config:

        NAME                     STATE     READ WRITE CKSUM
        data01                   ONLINE       0     0     0
          mirror-0               ONLINE       0     0     0
            gpt/SEAG_ZJV4HFPE    ONLINE       0     0     0
            replacing-1          ONLINE       0     0     0
              gpt/SG_ZHZ16KEX    ONLINE       0     0     0
              gpt/SEAG_ZHZ16KEX  ONLINE       0     0     0
          mirror-1               ONLINE       0     0     0
            gpt/SG_ZHZ03BAT      ONLINE       0     0     0
            gpt/HGST_8CJW1G4E    ONLINE       0     0     0
          mirror-2               ONLINE       0     0     0
            gpt/SG_ZL2NJBT2      ONLINE       0     0     0
            gpt/HGST_5PGGTH3D    ONLINE       0     0     0

errors: No known data errors
[21:10 r730-03 dvl ~] % 

See! See right there, the serial number is on both devices under replacing-1. :/

Hindsight

What did I do wrong? Let’s look at my blog post where I did the previous replace (search for “I did a replace”). In there, I did:

sudo zpool replace data01 gpt/HGST_8CJVT8YE gpt/SEAG_ZJV4HFPE

That removed gpt/HGST_8CJVT8YE from the zpool. How did I choose that device? I did this:

[21:17 r730-03 dvl ~] % grep ^da1 /var/run/dmesg.boot | sort | uniq
da1 at mrsas0 bus 1 scbus1 target 2 lun 0
da1: 11444224MB (23437770752 512 byte sectors)
da1: 150.000MB/s transfers
da1:  Fixed Direct Access SPC-4 SCSI device
da1: Serial Number 8CJVT8YE

That told me the serial number: 8CJVT8YE – then I looked at zpool status and picked the device. The device with the wrong serial number in the label.
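
One way to avoid trusting the text of a label is to ask GEOM which disk sits behind it, and then read the serial number from that disk. A minimal sketch, assuming glabel status -s prints the label name, a status, and the component (for example gpt/HGST_8CJVT8YE N/A da2p1). label_to_disk is a hypothetical helper name, and this is not what was run at the time.

```sh
#!/bin/sh
# Sketch only: label_to_disk is a hypothetical helper.
#   $1: label as shown by zpool status, e.g. gpt/HGST_8CJVT8YE
#   stdin: `glabel status -s` output (name, status, component)
label_to_disk() {
    awk -v want="$1" '
        $1 == want {
            part = $3
            sub(/p[0-9]+$/, "", part)   # da2p1 -> da2
            print part
        }
    '
}
```

On the live system that might look like glabel status -s | label_to_disk gpt/HGST_8CJVT8YE, followed by diskinfo -v on the disk it prints, checking that the Disk ident. line matches the serial number from smartd or dmesg before touching the pool.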

Tracking this down, this situation has been wrong since this blog post from August 2023. If you search that blog post for 8CJVT8YE you will find:

[23:39 r730-03 dvl ~] % grep da2 /var/run/dmesg.boot
da2 at mrsas0 bus 1 scbus1 target 2 lun 0
da2:  Fixed Direct Access SPC-4 SCSI device
da2: Serial Number 8CJVT8YE            



[14:07 r730-03 dvl ~] % sudo diskinfo -cit da2 
da2
	512         	# sectorsize
	12000138625024	# mediasize in bytes (11T)
	23437770752 	# mediasize in sectors
	4096        	# stripesize
	0           	# stripeoffset
	1458933     	# Cylinders according to firmware.
	255         	# Heads according to firmware.
	63          	# Sectors according to firmware.
	ATA HGST HUH721212AL	# Disk descr.
	8CJVT8YE            	# Disk ident.
...

[21:34 r730-03 dvl ~] % sudo smartctl -x /dev/da2
smartctl 7.4 2023-08-01 r5530 [FreeBSD 13.2-RELEASE-p2 amd64] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org
 
=== START OF INFORMATION SECTION ===
Model Family:     HGST Ultrastar DC HC520 (He12)
Device Model:     HGST HUH721212ALE600
Serial Number:    8CJVT8YE

...

[14:24 r730-03 dvl ~] % sudo gpart add -t freebsd-zfs -a 4K -s 23437770600 -l HGST_8CJVT8YE da3
da3p1 added

You can see it right there. da2, da2, da2, da3… oh no…

And in the lines under “Creating partitions”, you can see where I mislabeled the other drive with ZHZ16KEX. I did both drives with the wrong labels.

OK, we know who to blame.
