Playing around with a ‘failing’ drive

In zpool replace, you can read about a drive which was giving errors and which I replaced.

At present, that drive is [still] giving these errors, but it is not part of any zpool.

[17:38 r730-03 dvl ~] % tail /var/log/messages
Dec 25 14:52:17 r730-03 smartd[15472]: Device: /dev/da6 [SAT], 18 Currently unreadable (pending) sectors
Dec 25 14:52:17 r730-03 smartd[15472]: Device: /dev/da6 [SAT], 18 Offline uncorrectable sectors
Dec 25 15:22:17 r730-03 smartd[15472]: Device: /dev/da6 [SAT], 18 Currently unreadable (pending) sectors
Dec 25 15:22:17 r730-03 smartd[15472]: Device: /dev/da6 [SAT], 18 Offline uncorrectable sectors
Dec 25 15:52:16 r730-03 smartd[15472]: Device: /dev/da6 [SAT], 18 Currently unreadable (pending) sectors
Dec 25 16:22:17 r730-03 syslogd: last message repeated 1 times
Dec 25 16:52:16 r730-03 syslogd: last message repeated 1 times
Dec 25 16:52:16 r730-03 smartd[15472]: Device: /dev/da6 [SAT], 18 Offline uncorrectable sectors
Dec 25 17:22:16 r730-03 smartd[15472]: Device: /dev/da6 [SAT], 18 Currently unreadable (pending) sectors
Dec 25 17:22:16 r730-03 smartd[15472]: Device: /dev/da6 [SAT], 18 Offline uncorrectable sectors

I’m going to read the whole drive, then write to all of it, just for fun. Can I clear those messages?

I’ve read that you can calculate the problem sectors and write to them. I’m not going to do that.

Furthermore, I may or may not RMA this drive (return it for credit). If I do, the write will effectively erase the disk. At least it will be wiped enough for my purposes.

The read

Here is my read process:

[13:20 r730-03 dvl ~] % sudo dd if=/dev/da6 of=/dev/null bs=1M
load: 0.26  cmd: dd 20010 [physrd] 1062.84r 0.47u 15.61s 1% 3440k
253528+0 records in
253528+0 records out
265843376128 bytes transferred in 1062.841947 secs (250125032 bytes/sec)
load: 0.05  cmd: dd 20010 [physrd] 15386.81r 5.55u 216.49s 0% 3440k
3525678+0 records in
3525678+0 records out
3696941334528 bytes transferred in 15386.813635 secs (240266856 bytes/sec)

The lines starting with load are the output from pressing CTRL-t. Based on the first output, I’m guessing this read will take about 14 hours. You do your own math.
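
For anyone who wants to do that math, here is a rough sketch in Python. It assumes the drive holds 12,000,138,625,024 bytes (the capacity smartctl reports for this drive) and that the rate from the first CTRL-t output stays constant for the whole run, which is optimistic for a spinning disk.

```python
# Estimate total dd run time from one CTRL-t progress report.
# Assumption: capacity is the 12,000,138,625,024 bytes smartctl reports for da6.
TOTAL_BYTES = 12_000_138_625_024
RATE_BYTES_PER_SEC = 250_125_032  # from the first CTRL-t output


def estimate_hours(total_bytes: int, rate: float) -> float:
    """Return the hours needed to move total_bytes at a constant rate."""
    return total_bytes / rate / 3600


eta_hours = estimate_hours(TOTAL_BYTES, RATE_BYTES_PER_SEC)
print(f"estimated total: {eta_hours:.1f} hours")
```

That works out to a little over 13 hours, so 14 hours is a reasonable round-up if the rate drops as the read progresses.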

Now, we wait.

These errors turned up:

Dec 25 18:16:00 r730-03 kernel: (da6:mrsas0:1:7:0): READ(10). CDB: 28 00 3d 7c 8e 45 00 00 45 00 
Dec 25 18:16:00 r730-03 kernel: (da6:mrsas0:1:7:0): CAM status: SCSI Status Error
Dec 25 18:16:00 r730-03 kernel: (da6:mrsas0:1:7:0): SCSI status: OK
Dec 25 18:22:19 r730-03 smartd[15472]: Device: /dev/da6 [SAT], 19 Currently unreadable (pending) sectors (changed +1)
Dec 25 18:22:19 r730-03 smartd[15472]: Device: /dev/da6 [SAT], 18 Offline uncorrectable sectors
Dec 25 18:22:19 r730-03 smartd[15472]: Device: /dev/da6 [SAT], ATA error count increased from 18 to 20
Dec 25 18:52:17 r730-03 smartd[15472]: Device: /dev/da6 [SAT], 19 Currently unreadable (pending) sectors
Dec 25 18:52:17 r730-03 smartd[15472]: Device: /dev/da6 [SAT], 18 Offline uncorrectable sectors
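
The CDB bytes in the kernel message say which sectors the failed read covered. In a READ(10) command, bytes 2 to 5 hold the starting LBA and bytes 7 and 8 hold the transfer length in blocks, both big-endian. This is a minimal sketch of decoding the line above; the helper name decode_read10 is mine, and the 4096-byte sector size is an assumption taken from what smartctl reports for this drive.

```python
# Decode a SCSI READ(10) CDB as printed by the FreeBSD kernel.
SECTOR_SIZE = 4096  # assumption: logical sector size smartctl reports for da6


def decode_read10(cdb_hex: str) -> tuple[int, int]:
    """Return (starting LBA, block count) from a READ(10) CDB hex string."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2..5: starting LBA
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7..8: transfer length
    return lba, blocks


lba, blocks = decode_read10("28 00 3d 7c 8e 45 00 00 45 00")
print(f"LBA {lba}, {blocks} blocks, byte offset {lba * SECTOR_SIZE}")
```

The length of 0x45 is 69 blocks, so the message narrows the failure down to a 69-sector request, not to one specific sector.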

Later, these:

Dec 26 00:22:16 r730-03 smartd[15472]: Device: /dev/da6 [SAT], 19 Currently unreadable (pending) sectors
Dec 26 00:22:16 r730-03 smartd[15472]: Device: /dev/da6 [SAT], 18 Offline uncorrectable sectors
Dec 26 00:44:02 r730-03 kernel: (da6:mrsas0:1:7:0): READ(10). CDB: 28 00 81 52 33 00 00 00 45 00 
Dec 26 00:44:02 r730-03 kernel: (da6:mrsas0:1:7:0): CAM status: SCSI Status Error
Dec 26 00:44:02 r730-03 kernel: (da6:mrsas0:1:7:0): SCSI status: OK
Dec 26 00:52:17 r730-03 smartd[15472]: Device: /dev/da6 [SAT], 20 Currently unreadable (pending) sectors (changed +1)
Dec 26 00:52:17 r730-03 smartd[15472]: Device: /dev/da6 [SAT], 18 Offline uncorrectable sectors
Dec 26 00:52:17 r730-03 smartd[15472]: Device: /dev/da6 [SAT], ATA error count increased from 20 to 22

13:57

The next morning, the operation had completed. Based on the timestamp in the shell prompt, it had finished about 7 hours earlier.

[13:20 r730-03 dvl ~] % sudo dd if=/dev/da6 of=/dev/null bs=1M
load: 0.26  cmd: dd 20010 [physrd] 1062.84r 0.47u 15.61s 1% 3440k
253528+0 records in
253528+0 records out
265843376128 bytes transferred in 1062.841947 secs (250125032 bytes/sec)
load: 0.05  cmd: dd 20010 [physrd] 15386.81r 5.55u 216.49s 0% 3440k
3525678+0 records in
3525678+0 records out
3696941334528 bytes transferred in 15386.813635 secs (240266856 bytes/sec)
load: 0.29  cmd: dd 20010 [physrd] 17046.76r 6.07u 238.53s 0% 3440k
3884214+0 records in
3884214+0 records out
4072893579264 bytes transferred in 17046.768969 secs (238924666 bytes/sec)
load: 0.50  cmd: dd 20010 [physrd] 19866.79r 7.19u 274.91s 0% 3440k
4480673+0 records in
4480673+0 records out
4698326171648 bytes transferred in 19866.791258 secs (236491445 bytes/sec)
load: 0.12  cmd: dd 20010 [physrd] 43421.11r 14.40u 547.45s 0% 3440k
8870780+0 records in
8870780+0 records out
9301687009280 bytes transferred in 43421.114858 secs (214220364 bytes/sec)
11444224+0 records in
11444224+0 records out
12000138625024 bytes transferred in 62014.346335 secs (193505847 bytes/sec)
[6:34 r730-03 dvl ~] %

The messages persist within /var/log/messages:

Dec 26 13:52:17 r730-03 smartd[15472]: Device: /dev/da6 [SAT], 32 Currently unreadable (pending) sectors
Dec 26 13:52:17 r730-03 smartd[15472]: Device: /dev/da6 [SAT], 18 Offline uncorrectable sectors

Oh, and we got many more errors:

Dec 26 06:32:09 r730-03 kernel: (da6:mrsas0:1:7:0): READ(10). CDB: 28 00 ae 9f e2 8a 00 00 45 00 
Dec 26 06:32:09 r730-03 kernel: (da6:mrsas0:1:7:0): CAM status: SCSI Status Error
Dec 26 06:32:09 r730-03 kernel: (da6:mrsas0:1:7:0): SCSI status: OK
Dec 26 06:32:14 r730-03 kernel: (da6:mrsas0:1:7:0): READ(10). CDB: 28 00 ae 9f eb 8a 00 00 45 00 
Dec 26 06:32:14 r730-03 kernel: (da6:mrsas0:1:7:0): CAM status: SCSI Status Error
Dec 26 06:32:14 r730-03 kernel: (da6:mrsas0:1:7:0): SCSI status: OK
Dec 26 06:32:17 r730-03 kernel: (da6:mrsas0:1:7:0): READ(10). CDB: 28 00 ae 9f eb cf 00 00 31 00 
Dec 26 06:32:17 r730-03 kernel: (da6:mrsas0:1:7:0): CAM status: SCSI Status Error
Dec 26 06:32:17 r730-03 kernel: (da6:mrsas0:1:7:0): SCSI status: OK
Dec 26 06:32:23 r730-03 kernel: (da6:mrsas0:1:7:0): READ(10). CDB: 28 00 ae 9f ed 8a 00 00 45 00 
Dec 26 06:32:23 r730-03 kernel: (da6:mrsas0:1:7:0): CAM status: SCSI Status Error
Dec 26 06:32:23 r730-03 kernel: (da6:mrsas0:1:7:0): SCSI status: OK
Dec 26 06:32:28 r730-03 kernel: (da6:mrsas0:1:7:0): READ(10). CDB: 28 00 ae 9f ee cf 00 00 31 00 
Dec 26 06:32:28 r730-03 kernel: (da6:mrsas0:1:7:0): CAM status: SCSI Status Error
Dec 26 06:32:28 r730-03 kernel: (da6:mrsas0:1:7:0): SCSI status: OK
Dec 26 06:32:34 r730-03 kernel: (da6:mrsas0:1:7:0): READ(10). CDB: 28 00 ae 9f ef cf 00 00 31 00 
Dec 26 06:32:34 r730-03 kernel: (da6:mrsas0:1:7:0): CAM status: SCSI Status Error
Dec 26 06:32:34 r730-03 kernel: (da6:mrsas0:1:7:0): SCSI status: OK
Dec 26 06:32:40 r730-03 kernel: (da6:mrsas0:1:7:0): READ(10). CDB: 28 00 ae 9f f0 45 00 00 45 00 
Dec 26 06:32:40 r730-03 kernel: (da6:mrsas0:1:7:0): CAM status: SCSI Status Error
Dec 26 06:32:40 r730-03 kernel: (da6:mrsas0:1:7:0): SCSI status: OK
Dec 26 06:32:45 r730-03 kernel: (da6:mrsas0:1:7:0): READ(10). CDB: 28 00 ae 9f f1 45 00 00 45 00 
Dec 26 06:32:45 r730-03 kernel: (da6:mrsas0:1:7:0): CAM status: SCSI Status Error
Dec 26 06:32:45 r730-03 kernel: (da6:mrsas0:1:7:0): SCSI status: OK
Dec 26 06:32:48 r730-03 kernel: (da6:mrsas0:1:7:0): READ(10). CDB: 28 00 ae 9f f1 8a 00 00 45 00 
Dec 26 06:32:48 r730-03 kernel: (da6:mrsas0:1:7:0): CAM status: SCSI Status Error
Dec 26 06:32:48 r730-03 kernel: (da6:mrsas0:1:7:0): SCSI status: OK
Dec 26 06:32:54 r730-03 kernel: (da6:mrsas0:1:7:0): READ(10). CDB: 28 00 ae 9f f2 8a 00 00 45 00 
Dec 26 06:32:54 r730-03 kernel: (da6:mrsas0:1:7:0): CAM status: SCSI Status Error
Dec 26 06:32:54 r730-03 kernel: (da6:mrsas0:1:7:0): SCSI status: OK
Dec 26 06:32:59 r730-03 kernel: (da6:mrsas0:1:7:0): READ(10). CDB: 28 00 ae 9f f4 45 00 00 45 00 
Dec 26 06:32:59 r730-03 kernel: (da6:mrsas0:1:7:0): CAM status: SCSI Status Error
Dec 26 06:32:59 r730-03 kernel: (da6:mrsas0:1:7:0): SCSI status: OK
Dec 26 06:33:02 r730-03 kernel: (da6:mrsas0:1:7:0): READ(10). CDB: 28 00 ae 9f f4 cf 00 00 31 00 
Dec 26 06:33:02 r730-03 kernel: (da6:mrsas0:1:7:0): CAM status: SCSI Status Error
Dec 26 06:33:02 r730-03 kernel: (da6:mrsas0:1:7:0): SCSI status: OK
Dec 26 06:33:08 r730-03 kernel: (da6:mrsas0:1:7:0): READ(10). CDB: 28 00 ae 9f f5 00 00 00 45 00 
Dec 26 06:33:08 r730-03 kernel: (da6:mrsas0:1:7:0): CAM status: SCSI Status Error
Dec 26 06:33:08 r730-03 kernel: (da6:mrsas0:1:7:0): SCSI status: OK
Dec 26 06:33:13 r730-03 kernel: (da6:mrsas0:1:7:0): READ(10). CDB: 28 00 ae 9f f6 cf 00 00 31 00 
Dec 26 06:33:13 r730-03 kernel: (da6:mrsas0:1:7:0): CAM status: SCSI Status Error
Dec 26 06:33:13 r730-03 kernel: (da6:mrsas0:1:7:0): SCSI status: OK
Dec 26 06:33:19 r730-03 kernel: (da6:mrsas0:1:7:0): READ(10). CDB: 28 00 ae 9f f8 8a 00 00 45 00 
Dec 26 06:33:19 r730-03 kernel: (da6:mrsas0:1:7:0): CAM status: SCSI Status Error
Dec 26 06:33:19 r730-03 kernel: (da6:mrsas0:1:7:0): SCSI status: OK
Dec 26 06:33:24 r730-03 kernel: (da6:mrsas0:1:7:0): READ(10). CDB: 28 00 ae 9f f9 cf 00 00 31 00 
Dec 26 06:33:24 r730-03 kernel: (da6:mrsas0:1:7:0): CAM status: SCSI Status Error
Dec 26 06:33:24 r730-03 kernel: (da6:mrsas0:1:7:0): SCSI status: OK
Dec 26 06:33:27 r730-03 kernel: (da6:mrsas0:1:7:0): READ(10). CDB: 28 00 ae 9f f9 8a 00 00 45 00 
Dec 26 06:33:27 r730-03 kernel: (da6:mrsas0:1:7:0): CAM status: SCSI Status Error
Dec 26 06:33:27 r730-03 kernel: (da6:mrsas0:1:7:0): SCSI status: OK
Dec 26 06:33:33 r730-03 kernel: (da6:mrsas0:1:7:0): READ(10). CDB: 28 00 ae 9f fa cf 00 00 31 00 
Dec 26 06:33:33 r730-03 kernel: (da6:mrsas0:1:7:0): CAM status: SCSI Status Error
Dec 26 06:33:33 r730-03 kernel: (da6:mrsas0:1:7:0): SCSI status: OK
Dec 26 06:33:38 r730-03 kernel: (da6:mrsas0:1:7:0): READ(10). CDB: 28 00 ae 9f fb 8a 00 00 45 00 
Dec 26 06:33:38 r730-03 kernel: (da6:mrsas0:1:7:0): CAM status: SCSI Status Error
Dec 26 06:33:38 r730-03 kernel: (da6:mrsas0:1:7:0): SCSI status: OK
Dec 26 06:33:44 r730-03 kernel: (da6:mrsas0:1:7:0): READ(10). CDB: 28 00 ae 9f fc 45 00 00 45 00 
Dec 26 06:33:44 r730-03 kernel: (da6:mrsas0:1:7:0): CAM status: SCSI Status Error
Dec 26 06:33:44 r730-03 kernel: (da6:mrsas0:1:7:0): SCSI status: OK
Dec 26 06:33:47 r730-03 kernel: (da6:mrsas0:1:7:0): READ(10). CDB: 28 00 ae 9f fc 00 00 00 45 00 
Dec 26 06:33:47 r730-03 kernel: (da6:mrsas0:1:7:0): CAM status: SCSI Status Error
Dec 26 06:33:47 r730-03 kernel: (da6:mrsas0:1:7:0): SCSI status: OK
Dec 26 06:33:49 r730-03 kernel: (da6:mrsas0:1:7:0): READ(10). CDB: 28 00 ae 9f fc 8a 00 00 45 00 
Dec 26 06:33:49 r730-03 kernel: (da6:mrsas0:1:7:0): CAM status: SCSI Status Error
Dec 26 06:33:49 r730-03 kernel: (da6:mrsas0:1:7:0): SCSI status: OK
Dec 26 06:33:55 r730-03 kernel: (da6:mrsas0:1:7:0): READ(10). CDB: 28 00 ae 9f fe 00 00 00 45 00 
Dec 26 06:33:55 r730-03 kernel: (da6:mrsas0:1:7:0): CAM status: SCSI Status Error
Dec 26 06:33:55 r730-03 kernel: (da6:mrsas0:1:7:0): SCSI status: OK
Dec 26 06:34:01 r730-03 kernel: (da6:mrsas0:1:7:0): READ(10). CDB: 28 00 ae 9f ff 00 00 00 45 00 
Dec 26 06:34:01 r730-03 kernel: (da6:mrsas0:1:7:0): CAM status: SCSI Status Error
Dec 26 06:34:01 r730-03 kernel: (da6:mrsas0:1:7:0): SCSI status: OK
Dec 26 06:34:04 r730-03 kernel: (da6:mrsas0:1:7:0): READ(10). CDB: 28 00 ae 9f ff cf 00 00 31 00 
Dec 26 06:34:04 r730-03 kernel: (da6:mrsas0:1:7:0): CAM status: SCSI Status Error
Dec 26 06:34:04 r730-03 kernel: (da6:mrsas0:1:7:0): SCSI status: OK
Dec 26 06:52:17 r730-03 smartd[15472]: Device: /dev/da6 [SAT], 32 Currently unreadable (pending) sectors (changed +12)
Dec 26 06:52:17 r730-03 smartd[15472]: Device: /dev/da6 [SAT], 18 Offline uncorrectable sectors
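
That burst of errors sits in one small region of the drive. Here is a sketch that pulls the LBAs out of three of the kernel lines above (the first two and the last) and measures how far they are from the end of the disk. It assumes the standard READ(10) layout (starting LBA in CDB bytes 2 to 5) and the 12,000,138,625,024-byte capacity and 4096-byte sectors that smartctl reports for this drive; the function name is mine.

```python
import re

SECTOR_SIZE = 4096  # assumption: from smartctl for da6
TOTAL_SECTORS = 12_000_138_625_024 // SECTOR_SIZE

# Three lines copied from /var/log/messages: first two and last of the burst.
LOG = """\
Dec 26 06:32:09 r730-03 kernel: (da6:mrsas0:1:7:0): READ(10). CDB: 28 00 ae 9f e2 8a 00 00 45 00
Dec 26 06:32:14 r730-03 kernel: (da6:mrsas0:1:7:0): READ(10). CDB: 28 00 ae 9f eb 8a 00 00 45 00
Dec 26 06:34:04 r730-03 kernel: (da6:mrsas0:1:7:0): READ(10). CDB: 28 00 ae 9f ff cf 00 00 31 00
"""

CDB_RE = re.compile(r"READ\(10\)\. CDB: ((?:[0-9a-f]{2} ?){10})")


def failed_lbas(log: str) -> list[int]:
    """Return the starting LBA of every READ(10) found in the log text."""
    lbas = []
    for match in CDB_RE.finditer(log):
        cdb = bytes.fromhex(match.group(1))
        lbas.append(int.from_bytes(cdb[2:6], "big"))
    return lbas


lbas = failed_lbas(LOG)
first, last = min(lbas), max(lbas)
print(f"first failing read starts {TOTAL_SECTORS - first} sectors before the end")
print(f"span of failing reads: {(last - first) * SECTOR_SIZE / 2**20:.1f} MiB")
```

For these lines, everything lands within roughly the last 30 MiB of the drive, and the final failing read runs right up to the last sector.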

The graphs

This is an annotated screen shot of the LibreNMS graph of this host. You can see the read/write operations via dd.

Let’s try a write

I just started this:

[13:59 r730-03 dvl ~] % sudo dd of=/dev/da6 if=/dev/zero bs=1M

16:44

I’m back at the laptop:

[13:59 r730-03 dvl ~] % sudo dd of=/dev/da6 if=/dev/zero bs=1M
load: 0.51  cmd: dd 12575 [physwr] 12950.92r 5.40u 471.21s 3% 3432k
2976310+0 records in
2976310+0 records out
3120887234560 bytes transferred in 12950.920663 secs (240978021 bytes/sec)
load: 0.53  cmd: dd 12575 [physwr] 21612.61r 8.48u 762.17s 2% 3432k
4824091+0 records in
4824091+0 records out
5058426044416 bytes transferred in 21612.616707 secs (234049681 bytes/sec)
load: 0.24  cmd: dd 12575 [physwr] 25010.75r 9.66u 872.34s 2% 3432k
5512979+0 records in
5512979+0 records out
5780777467904 bytes transferred in 25010.762233 secs (231131599 bytes/sec)
dd: /dev/da6: end of device
11444225+0 records in
11444224+0 records out
12000138625024 bytes transferred in 62049.664409 secs (193395706 bytes/sec)
[16:44 r730-03 dvl ~] %

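Hmm, going by the prompt timestamps, that is about 26 hours and 45 minutes after I started. However, dd itself reports 62049 seconds, a bit over 17 hours, so the write finished well before I got back to the laptop.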
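
For comparison, the elapsed time dd prints in its final summary line can be converted to hours and minutes. A small sketch, with a helper name of my own choosing:

```python
# Convert the elapsed seconds from dd's final summary into hours and minutes.
def hours_minutes(seconds: float) -> tuple[int, int]:
    """Return (hours, minutes), dropping any leftover seconds."""
    total_minutes = int(seconds // 60)
    return divmod(total_minutes, 60)


write_elapsed = 62049.664409  # seconds, from the dd summary above
h, m = hours_minutes(write_elapsed)
print(f"dd ran for {h} hours {m} minutes")
```

That is the time dd actually spent writing, and it is within a minute of the 62014 seconds the read took.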

Let’s check the logs:

Dec 27 05:52:17 r730-03 smartd[15472]: Device: /dev/da6 [SAT], 30 Currently unreadable (pending) sectors
Dec 27 05:52:17 r730-03 smartd[15472]: Device: /dev/da6 [SAT], 18 Offline uncorrectable sectors
Dec 27 06:22:18 r730-03 smartd[15472]: Device: /dev/da6 [SAT], 30 Currently unreadable (pending) sectors
Dec 27 06:22:18 r730-03 smartd[15472]: Device: /dev/da6 [SAT], 18 Offline uncorrectable sectors
Dec 27 06:52:17 r730-03 smartd[15472]: Device: /dev/da6 [SAT], 30 Currently unreadable (pending) sectors
Dec 27 06:52:17 r730-03 smartd[15472]: Device: /dev/da6 [SAT], 18 Offline uncorrectable sectors
Dec 27 07:22:17 r730-03 syslogd: last message repeated 1 times
Dec 27 07:52:18 r730-03 syslogd: last message repeated 1 times
Dec 27 08:22:17 r730-03 smartd[15472]: Device: /dev/da6 [SAT], 18 Offline uncorrectable sectors
Dec 27 08:52:17 r730-03 syslogd: last message repeated 1 times
Dec 27 09:22:17 r730-03 syslogd: last message repeated 1 times
Dec 27 09:52:19 r730-03 syslogd: last message repeated 1 times
Dec 27 10:22:18 r730-03 syslogd: last message repeated 1 times
Dec 27 10:52:19 r730-03 syslogd: last message repeated 1 times

See how the Currently unreadable (pending) sectors messages have stopped. I think they stopped when the dd wrote to those sectors.

The uncorrectable sectors message remains.

SMART tests

This is the device’s current status:

[16:54 r730-03 dvl ~] % sudo smartctl -a /dev/da6
smartctl 7.4 2023-08-01 r5530 [FreeBSD 14.1-RELEASE-p5 amd64] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     HGST Ultrastar DC HC520 (He12)
Device Model:     HGST HUH721212ALN604
Serial Number:    8CJR6GZE
LU WWN Device Id: 5 000cca 26fe64782
Firmware Version: LEGNW9U0
User Capacity:    12,000,138,625,024 bytes [12.0 TB]
Sector Size:      4096 bytes logical/physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database 7.3/5528
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Fri Dec 27 16:55:11 2024 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82)	Offline data collection activity
					was completed without error.
					Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		(   87) seconds.
Offline data collection
capabilities: 			 (0x5b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					No Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 (1285) minutes.
SCT capabilities: 	       (0x003d)	SCT Status supported.
					SCT Error Recovery Control supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   132   132   054    Pre-fail  Offline      -       96
  3 Spin_Up_Time            0x0007   162   162   024    Pre-fail  Always       -       411 (Average 400)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       14
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       348
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   128   128   020    Pre-fail  Offline      -       18
  9 Power_On_Hours          0x0012   099   099   000    Old_age   Always       -       8089
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       14
 22 Helium_Level            0x0023   100   100   025    Pre-fail  Always       -       6553700
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       430
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       430
194 Temperature_Celsius     0x0002   230   230   000    Old_age   Always       -       26 (Min/Max 19/40)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       348
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       18
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 65 (device log contains only the most recent five errors)
	CR = Command Register [HEX]
	FR = Features Register [HEX]
	SC = Sector Count Register [HEX]
	SN = Sector Number Register [HEX]
	CL = Cylinder Low Register [HEX]
	CH = Cylinder High Register [HEX]
	DH = Device/Head Register [HEX]
	DC = Device Command Register [HEX]
	ER = Error register [HEX]
	ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 65 occurred at disk power-on lifetime: 8055 hours (335 days + 15 hours)
  When the command that caused the error occurred, the device was doing SMART Offline or Self-test.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 41 00 00 00 00 00  Error: UNC at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 31 00 cf ff 9f 40 00   3d+09:16:23.225  READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00   3d+09:16:20.192  READ LOG EXT
  2f 00 01 10 00 00 00 00   3d+09:16:20.192  READ LOG EXT
  60 45 00 00 ff 9f 40 00   3d+09:16:17.456  READ FPDMA QUEUED
  60 45 00 45 ff 9f 40 00   3d+09:16:17.306  READ FPDMA QUEUED

Error 64 occurred at disk power-on lifetime: 8055 hours (335 days + 15 hours)
  When the command that caused the error occurred, the device was doing SMART Offline or Self-test.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 41 00 00 00 00 00  Error: UNC at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 45 00 00 ff 9f 40 00   3d+09:16:20.192  READ FPDMA QUEUED
  60 45 00 45 ff 9f 40 00   3d+09:16:17.306  READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00   3d+09:16:17.300  READ LOG EXT
  2f 00 01 10 00 00 00 00   3d+09:16:17.300  READ LOG EXT
  60 31 18 cf ff 9f 40 00   3d+09:16:14.551  READ FPDMA QUEUED

Error 63 occurred at disk power-on lifetime: 8055 hours (335 days + 15 hours)
  When the command that caused the error occurred, the device was doing SMART Offline or Self-test.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 41 00 00 00 00 00  Error: UNC at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 45 00 00 ff 9f 40 00   3d+09:16:17.300  READ FPDMA QUEUED
  60 31 18 cf ff 9f 40 00   3d+09:16:14.551  READ FPDMA QUEUED
  60 45 10 8a ff 9f 40 00   3d+09:16:14.551  READ FPDMA QUEUED
  60 45 08 45 ff 9f 40 00   3d+09:16:14.551  READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00   3d+09:16:14.550  READ LOG EXT

Error 62 occurred at disk power-on lifetime: 8055 hours (335 days + 15 hours)
  When the command that caused the error occurred, the device was doing SMART Offline or Self-test.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 41 00 00 00 00 00  Error: UNC at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 45 18 00 fe 9f 40 00   3d+09:16:14.550  READ FPDMA QUEUED
  60 31 10 cf fe 9f 40 00   3d+09:16:11.738  READ FPDMA QUEUED
  60 45 08 8a fe 9f 40 00   3d+09:16:11.738  READ FPDMA QUEUED
  60 45 00 45 fe 9f 40 00   3d+09:16:11.738  READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00   3d+09:16:11.733  READ LOG EXT

Error 61 occurred at disk power-on lifetime: 8055 hours (335 days + 15 hours)
  When the command that caused the error occurred, the device was doing SMART Offline or Self-test.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 41 00 00 00 00 00  Error: UNC at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 45 00 00 fe 9f 40 00   3d+09:16:11.733  READ FPDMA QUEUED
  60 31 18 cf fd 9f 40 00   3d+09:16:08.967  READ FPDMA QUEUED
  60 45 10 8a fd 9f 40 00   3d+09:16:08.967  READ FPDMA QUEUED
  60 45 08 45 fd 9f 40 00   3d+09:16:08.967  READ FPDMA QUEUED
  60 45 00 00 fd 9f 40 00   3d+09:16:08.967  READ FPDMA QUEUED

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       90%      7259         2929713821
# 2  Short offline       Completed: read failure       20%      7232         2929713821
# 3  Short offline       Completed: read failure       30%      7229         2929713821
# 4  Short offline       Completed without error       00%       183         -
# 5  Extended offline    Completed without error       00%        20         -
# 6  Short offline       Completed without error       00%         0         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

The above only provides legacy SMART information - try 'smartctl -x' for more
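
The LBA_of_first_error in the self-test log can be checked against the kernel messages from the read. This sketch assumes the standard READ(10) CDB layout (starting LBA in bytes 2 to 5, length in bytes 7 and 8) and tests whether LBA 2929713821 falls inside the first failing read the kernel logged at 06:32:09. The helper name is hypothetical.

```python
SELF_TEST_LBA = 2929713821  # LBA_of_first_error from the self-test log


def read10_range(cdb_hex: str) -> range:
    """Return the range of LBAs covered by a READ(10) CDB hex string."""
    cdb = bytes.fromhex(cdb_hex)
    start = int.from_bytes(cdb[2:6], "big")
    count = int.from_bytes(cdb[7:9], "big")
    return range(start, start + count)


# CDB from the kernel message logged at Dec 26 06:32:09.
covered = read10_range("28 00 ae 9f e2 8a 00 00 45 00")
print(SELF_TEST_LBA in covered)
```

It does fall inside, which suggests the short self-tests and the dd read were tripping over the same area of the disk.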

And more information:

[16:55 r730-03 dvl ~] % sudo smartctl -x /dev/da6
smartctl 7.4 2023-08-01 r5530 [FreeBSD 14.1-RELEASE-p5 amd64] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     HGST Ultrastar DC HC520 (He12)
Device Model:     HGST HUH721212ALN604
Serial Number:    8CJR6GZE
LU WWN Device Id: 5 000cca 26fe64782
Firmware Version: LEGNW9U0
User Capacity:    12,000,138,625,024 bytes [12.0 TB]
Sector Size:      4096 bytes logical/physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database 7.3/5528
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Fri Dec 27 16:56:07 2024 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM feature is:   Disabled
Rd look-ahead is: Enabled
Write cache is:   Disabled
DSN feature is:   Unavailable
ATA Security is:  Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82)	Offline data collection activity
					was completed without error.
					Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		(   87) seconds.
Offline data collection
capabilities: 			 (0x5b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					No Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 (1285) minutes.
SCT capabilities: 	       (0x003d)	SCT Status supported.
					SCT Error Recovery Control supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     PO-R--   100   100   016    -    0
  2 Throughput_Performance  P-S---   132   132   054    -    96
  3 Spin_Up_Time            POS---   162   162   024    -    411 (Average 400)
  4 Start_Stop_Count        -O--C-   100   100   000    -    14
  5 Reallocated_Sector_Ct   PO--CK   100   100   005    -    348
  7 Seek_Error_Rate         PO-R--   100   100   067    -    0
  8 Seek_Time_Performance   P-S---   128   128   020    -    18
  9 Power_On_Hours          -O--C-   099   099   000    -    8089
 10 Spin_Retry_Count        PO--C-   100   100   060    -    0
 12 Power_Cycle_Count       -O--CK   100   100   000    -    14
 22 Helium_Level            PO---K   100   100   025    -    6553700
192 Power-Off_Retract_Count -O--CK   100   100   000    -    430
193 Load_Cycle_Count        -O--C-   100   100   000    -    430
194 Temperature_Celsius     -O----   240   240   000    -    25 (Min/Max 19/40)
196 Reallocated_Event_Count -O--CK   100   100   000    -    348
197 Current_Pending_Sector  -O---K   100   100   000    -    0
198 Offline_Uncorrectable   ---R--   100   100   000    -    18
199 UDMA_CRC_Error_Count    -O-R--   200   200   000    -    0
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O      1  Comprehensive SMART error log
0x03       GPL     R/O      1  Ext. Comprehensive SMART error log
0x04       GPL     R/O    256  Device Statistics log
0x04       SL      R/O    255  Device Statistics log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      1  Extended self-test log
0x08       GPL     R/O      2  Power Conditions log
0x09           SL  R/W      1  Selective self-test log
0x0c       GPL     R/O    688  Pending Defects log
0x10       GPL     R/O      1  NCQ Command Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x12       GPL     R/O      1  SATA NCQ Non-Data log
0x13       GPL     R/O      1  SATA NCQ Send and Receive log
0x15       GPL     R/W      1  Rebuild Assist log
0x21       GPL     R/O      1  Write stream error log
0x22       GPL     R/O      1  Read stream error log
0x24       GPL     R/O    256  Current Device Internal Status Data log
0x25       GPL     R/O    256  Saved Device Internal Status Data log
0x30       GPL,SL  R/O      9  IDENTIFY DEVICE data log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xe0       GPL,SL  R/W      1  SCT Command/Status
0xe1       GPL,SL  R/W      1  SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (1 sectors)
Device Error Count: 65 (device log contains only the most recent 4 errors)
	CR     = Command Register
	FEATR  = Features Register
	COUNT  = Count (was: Sector Count) Register
	LBA_48 = Upper bytes of LBA High/Mid/Low Registers ]  ATA-8
	LH     = LBA High (was: Cylinder High) Register    ]   LBA
	LM     = LBA Mid (was: Cylinder Low) Register      ] Register
	LL     = LBA Low (was: Sector Number) Register     ]
	DV     = Device (was: Device/Head) Register
	DC     = Device Control Register
	ER     = Error register
	ST     = Status register
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 65 [0] occurred at disk power-on lifetime: 8055 hours (335 days + 15 hours)
  When the command that caused the error occurred, the device was doing SMART Offline or Self-test.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 41 00 00 00 00 00 00 00 00 00 00  Error: UNC at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 31 00 00 00 00 ae 9f ff cf 40 00  3d+09:16:23.225  READ FPDMA QUEUED
  2f 00 00 00 01 00 00 00 00 00 10 00 00  3d+09:16:20.192  READ LOG EXT
  2f 00 00 00 01 00 00 00 00 00 10 00 00  3d+09:16:20.192  READ LOG EXT
  60 00 45 00 00 00 00 ae 9f ff 00 40 00  3d+09:16:17.456  READ FPDMA QUEUED
  60 00 45 00 00 00 00 ae 9f ff 45 40 00  3d+09:16:17.306  READ FPDMA QUEUED

Error 64 [3] occurred at disk power-on lifetime: 8055 hours (335 days + 15 hours)
  When the command that caused the error occurred, the device was doing SMART Offline or Self-test.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 41 00 00 00 00 00 00 00 00 00 00  Error: UNC at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 45 00 00 00 00 ae 9f ff 00 40 00  3d+09:16:20.192  READ FPDMA QUEUED
  60 00 45 00 00 00 00 ae 9f ff 45 40 00  3d+09:16:17.306  READ FPDMA QUEUED
  2f 00 00 00 01 00 00 00 00 00 10 00 00  3d+09:16:17.300  READ LOG EXT
  2f 00 00 00 01 00 00 00 00 00 10 00 00  3d+09:16:17.300  READ LOG EXT
  60 00 31 00 18 00 00 ae 9f ff cf 40 00  3d+09:16:14.551  READ FPDMA QUEUED

Error 63 [2] occurred at disk power-on lifetime: 8055 hours (335 days + 15 hours)
  When the command that caused the error occurred, the device was doing SMART Offline or Self-test.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 41 00 00 00 00 00 00 00 00 00 00  Error: UNC at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 45 00 00 00 00 ae 9f ff 00 40 00  3d+09:16:17.300  READ FPDMA QUEUED
  60 00 31 00 18 00 00 ae 9f ff cf 40 00  3d+09:16:14.551  READ FPDMA QUEUED
  60 00 45 00 10 00 00 ae 9f ff 8a 40 00  3d+09:16:14.551  READ FPDMA QUEUED
  60 00 45 00 08 00 00 ae 9f ff 45 40 00  3d+09:16:14.551  READ FPDMA QUEUED
  2f 00 00 00 01 00 00 00 00 00 10 00 00  3d+09:16:14.550  READ LOG EXT

Error 62 [1] occurred at disk power-on lifetime: 8055 hours (335 days + 15 hours)
  When the command that caused the error occurred, the device was doing SMART Offline or Self-test.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 41 00 00 00 00 00 00 00 00 00 00  Error: UNC at LBA = 0x00000000 = 0

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 45 00 18 00 00 ae 9f fe 00 40 00  3d+09:16:14.550  READ FPDMA QUEUED
  60 00 31 00 10 00 00 ae 9f fe cf 40 00  3d+09:16:11.738  READ FPDMA QUEUED
  60 00 45 00 08 00 00 ae 9f fe 8a 40 00  3d+09:16:11.738  READ FPDMA QUEUED
  60 00 45 00 00 00 00 ae 9f fe 45 40 00  3d+09:16:11.738  READ FPDMA QUEUED
  2f 00 00 00 01 00 00 00 00 00 10 00 00  3d+09:16:11.733  READ LOG EXT

SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       90%      7259         2929713821
# 2  Short offline       Completed: read failure       20%      7232         2929713821
# 3  Short offline       Completed: read failure       30%      7229         2929713821
# 4  Short offline       Completed without error       00%       183         -
# 5  Extended offline    Completed without error       00%        20         -
# 6  Short offline       Completed without error       00%         0         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version:                  3
SCT Version (vendor specific):       256 (0x0100)
Device State:                        Active (0)
Current Temperature:                    25 Celsius
Power Cycle Min/Max Temperature:     23/28 Celsius
Lifetime    Min/Max Temperature:     19/40 Celsius
Under/Over Temperature Limit Count:   0/0

SCT Temperature History Version:     2
Temperature Sampling Period:         1 minute
Temperature Logging Interval:        1 minute
Min/Max recommended Temperature:      0/60 Celsius
Min/Max Temperature Limit:           -40/70 Celsius
Temperature History Size (Index):    128 (1)

Index    Estimated Time   Temperature Celsius
   2    2024-12-27 14:49    25  ******
   3    2024-12-27 14:50    26  *******
 ...    ..( 11 skipped).    ..  *******
  15    2024-12-27 15:02    26  *******
  16    2024-12-27 15:03    25  ******
  17    2024-12-27 15:04    25  ******
  18    2024-12-27 15:05    26  *******
 ...    ..( 12 skipped).    ..  *******
  31    2024-12-27 15:18    26  *******
  32    2024-12-27 15:19    25  ******
  33    2024-12-27 15:20    25  ******
  34    2024-12-27 15:21    26  *******
 ...    ..( 44 skipped).    ..  *******
  79    2024-12-27 16:06    26  *******
  80    2024-12-27 16:07    25  ******
  81    2024-12-27 16:08    26  *******
 ...    ..( 46 skipped).    ..  *******
   0    2024-12-27 16:55    26  *******
   1    2024-12-27 16:56    25  ******

SCT Error Recovery Control:
           Read: Disabled
          Write: Disabled

Device Statistics (GP Log 0x04)
Page  Offset Size        Value Flags Description
0x01  =====  =               =  ===  == General Statistics (rev 1) ==
0x01  0x008  4              14  ---  Lifetime Power-On Resets
0x01  0x010  4            8089  ---  Power-on Hours
0x01  0x018  6     57399507914  ---  Logical Sectors Written
0x01  0x020  6      1797230238  ---  Number of Write Commands
0x01  0x028  6     97381993303  ---  Logical Sectors Read
0x01  0x030  6      1990435263  ---  Number of Read Commands
0x01  0x038  6     29122344150  ---  Date and Time TimeStamp
0x03  =====  =               =  ===  == Rotating Media Statistics (rev 1) ==
0x03  0x008  4            8084  ---  Spindle Motor Power-on Hours
0x03  0x010  4            8084  ---  Head Flying Hours
0x03  0x018  4             430  ---  Head Load Events
0x03  0x020  4             348  ---  Number of Reallocated Logical Sectors
0x03  0x028  4              96  ---  Read Recovery Attempts
0x03  0x030  4               0  ---  Number of Mechanical Start Failures
0x04  =====  =               =  ===  == General Errors Statistics (rev 1) ==
0x04  0x008  4              65  ---  Number of Reported Uncorrectable Errors
0x04  0x010  4               1  ---  Resets Between Cmd Acceptance and Completion
0x05  =====  =               =  ===  == Temperature Statistics (rev 1) ==
0x05  0x008  1              25  ---  Current Temperature
0x05  0x010  1              25  N--  Average Short Term Temperature
0x05  0x018  1              26  N--  Average Long Term Temperature
0x05  0x020  1              40  ---  Highest Temperature
0x05  0x028  1              19  ---  Lowest Temperature
0x05  0x030  1              39  N--  Highest Average Short Term Temperature
0x05  0x038  1              21  N--  Lowest Average Short Term Temperature
0x05  0x040  1              36  N--  Highest Average Long Term Temperature
0x05  0x048  1              24  N--  Lowest Average Long Term Temperature
0x05  0x050  4               0  ---  Time in Over-Temperature
0x05  0x058  1              60  ---  Specified Maximum Operating Temperature
0x05  0x060  4               0  ---  Time in Under-Temperature
0x05  0x068  1               0  ---  Specified Minimum Operating Temperature
0x06  =====  =               =  ===  == Transport Statistics (rev 1) ==
0x06  0x008  4              43  ---  Number of Hardware Resets
0x06  0x010  4              14  ---  Number of ASR Events
0x06  0x018  4               0  ---  Number of Interface CRC Errors
0xff  =====  =               =  ===  == Vendor Specific Statistics (rev 1) ==
                                |||_ C monitored condition met
                                ||__ D supports DSN
                                |___ N normalized value

Pending Defects log (GP Log 0x0c)
No Defects Logged

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  2            0  Command failed due to ICRC error
0x0002  2            0  R_ERR response for data FIS
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0005  2            0  R_ERR response for non-data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS
0x0008  2            0  Device-to-host non-data FIS retries
0x0009  2            1  Transition from drive PhyRdy to drive PhyNRdy
0x000a  2            2  Device-to-host register FISes sent due to a COMRESET
0x000b  2            0  CRC errors within host-to-device FIS
0x000d  2            0  Non-CRC errors within host-to-device FIS
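
As an aside, the READ FPDMA QUEUED rows in the error log above can be decoded by hand. What follows is only a rough sketch, not something smartctl provides. It assumes the 13 hex bytes in each row follow the column header (CR, two FEATR bytes, two COUNT bytes, three LBA_48 bytes, then LH, LM, LL, DV, DC). It also assumes that for this NCQ command the FEATR bytes carry the sector count. The name decode_ncq_read is made up for illustration, and the total sector count is the figure that dmesg.boot reports for da6 further down.

```python
def decode_ncq_read(row: str) -> tuple[int, int]:
    """Decode one READ FPDMA QUEUED row into (starting LBA, sector count)."""
    b = [int(x, 16) for x in row.split()[:13]]
    if len(b) != 13 or b[0] != 0x60:
        raise ValueError("expected 13 bytes starting with command 0x60")
    # Assumption: for this NCQ command the two FEATR bytes hold the sector count.
    sectors = (b[1] << 8) | b[2]
    # Three LBA_48 bytes followed by LH, LM, LL, most significant byte first.
    lba = int.from_bytes(bytes(b[5:11]), "big")
    return lba, sectors


# The four read commands listed under Error 63 in the log above.
ROWS = [
    "60 00 45 00 00 00 00 ae 9f ff 00 40 00",
    "60 00 31 00 18 00 00 ae 9f ff cf 40 00",
    "60 00 45 00 10 00 00 ae 9f ff 8a 40 00",
    "60 00 45 00 08 00 00 ae 9f ff 45 40 00",
]

TOTAL_SECTORS = 2929721344  # sector count reported for da6 in dmesg.boot

decoded = sorted(decode_ncq_read(row) for row in ROWS)
for lba, count in decoded:
    end = lba + count
    print(f"LBA {lba} + {count} sectors, {TOTAL_SECTORS - end} sectors before the end")
```

Under those assumptions the four reads are contiguous and the last one finishes at sector 2929721344, which suggests these commands were reading the very end of the disk.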

Let’s run a short test:

[16:56 r730-03 dvl ~] % sudo smartctl -t short /dev/da6
smartctl 7.4 2023-08-01 r5530 [FreeBSD 14.1-RELEASE-p5 amd64] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Short self-test routine immediately in off-line mode".
Drive command "Execute SMART Short self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 2 minutes for test to complete.
Test will complete after Fri Dec 27 16:59:10 2024 UTC
Use smartctl -X to abort test.

Two minutes later, we see this in the output of sudo smartctl -a /dev/da6:

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      8089         -
# 2  Short offline       Completed: read failure       90%      7259         2929713821
# 3  Short offline       Completed: read failure       20%      7232         2929713821
# 4  Short offline       Completed: read failure       30%      7229         2929713821
# 5  Short offline       Completed without error       00%       183         -
# 6  Extended offline    Completed without error       00%        20         -
# 7  Short offline       Completed without error       00%         0         -

Notice the three failed short tests. Those were run shortly after the first error messages appeared.
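
If you would rather not eyeball the table, here is a minimal Python sketch that picks the failed tests out of the self-test log. It assumes the column layout shown above, with fields separated by two or more spaces. The function name parse_selftest_log is hypothetical, and the sample text is pasted from the output rather than read from the drive.

```python
import re

LOG = """\
# 1  Short offline       Completed without error       00%      8089         -
# 2  Short offline       Completed: read failure       90%      7259         2929713821
# 3  Short offline       Completed: read failure       20%      7232         2929713821
# 4  Short offline       Completed: read failure       30%      7229         2929713821
# 5  Short offline       Completed without error       00%       183         -
# 6  Extended offline    Completed without error       00%        20         -
# 7  Short offline       Completed without error       00%         0         -
"""

# Fields are separated by runs of two or more spaces (an assumption about the layout).
ROW = re.compile(
    r"^#\s*(?P<num>\d+)\s+(?P<desc>.+?)\s{2,}(?P<status>.+?)\s{2,}"
    r"(?P<remaining>\d+)%\s+(?P<hours>\d+)\s+(?P<lba>\S+)\s*$"
)


def parse_selftest_log(text: str) -> list[dict]:
    """Turn self-test log rows into dicts; lba is None when the column is '-'."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # skip headers and anything that is not a test row
        rows.append({
            "num": int(m["num"]),
            "desc": m["desc"],
            "status": m["status"],
            "hours": int(m["hours"]),
            "lba": None if m["lba"] == "-" else int(m["lba"]),
        })
    return rows


failed = [r for r in parse_selftest_log(LOG) if "failure" in r["status"]]
for r in failed:
    print(f"test #{r['num']} at {r['hours']} hours failed at LBA {r['lba']}")
```

All three failures report the same LBA_of_first_error, so the short tests kept tripping over one spot rather than finding new ones.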

The long test

Next, a long test.

[16:59 r730-03 dvl ~] % sudo smartctl -t long /dev/da6
smartctl 7.4 2023-08-01 r5530 [FreeBSD 14.1-RELEASE-p5 amd64] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".
Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 1285 minutes for test to complete.
Test will complete after Sat Dec 28 14:25:00 2024 UTC
Use smartctl -X to abort test.
[17:00 r730-03 dvl ~] %
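
For a sense of scale, 1285 minutes is most of a day. The arithmetic is trivial, but here it is as a small Python sketch, using only the two values printed by smartctl above (the wait in minutes and the completion time in UTC):

```python
from datetime import datetime, timedelta

WAIT_MINUTES = 1285                              # "Please wait 1285 minutes"
completes = datetime(2024, 12, 28, 14, 25, 0)    # "Test will complete after", UTC

# Work backwards from the completion time to when the test was started.
started = completes - timedelta(minutes=WAIT_MINUTES)
hours, minutes = divmod(WAIT_MINUTES, 60)

print(f"{hours}h {minutes}m, started around {started:%Y-%m-%d %H:%M} UTC")
```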

See you tomorrow.

2024-12-30

After a lost weekend sick in bed, I found:

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      8109         -
# 2  Short offline       Completed without error       00%      8089         -

I am going to RMA the drive, and probably order a spare 12TB drive to keep on hand for the next time.

But wait! There’s more

I always use caution when removing a drive from a running host. I know it is /dev/da6, but that doesn’t necessarily tell me which drive bay it is in.

[19:37 r730-03 dvl ~] % grep da6 /var/run/dmesg.boot
da6 at mrsas0 bus 1 scbus1 target 7 lun 0
da6:  Fixed Direct Access SPC-4 SCSI device
da6: Serial Number 8CJR6GZE
da6: 150.000MB/s transfers
da6: 11444224MB (2929721344 4096 byte sectors)

This says target 7. That makes me think drive bay 7.
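
The same lines can be pulled apart mechanically. This is a minimal Python sketch, assuming the dmesg wording shown above. The function name summarize_dmesg is made up, and the text is pasted in rather than read from /var/run/dmesg.boot. It also cross-checks the size line, on the assumption that the MB figure is sectors times sector size divided by 2^20.

```python
import re

DMESG = """\
da6 at mrsas0 bus 1 scbus1 target 7 lun 0
da6: Serial Number 8CJR6GZE
da6: 150.000MB/s transfers
da6: 11444224MB (2929721344 4096 byte sectors)
"""


def summarize_dmesg(text: str) -> dict:
    """Extract target, serial number, and size details from the da6 lines."""
    target = re.search(r"target (\d+) lun", text)
    serial = re.search(r"Serial Number (\S+)", text)
    size = re.search(r"(\d+)MB \((\d+) (\d+) byte sectors\)", text)
    if not (target and serial and size):
        raise ValueError("dmesg text does not look like the expected da6 lines")
    reported_mb, sectors, sector_size = (int(g) for g in size.groups())
    size_bytes = sectors * sector_size
    return {
        "target": int(target.group(1)),
        "serial": serial.group(1),
        "reported_mb": reported_mb,
        "computed_mb": size_bytes // 2**20,  # assumes MB here means 2^20 bytes
        "size_bytes": size_bytes,
    }


info = summarize_dmesg(DMESG)
print(info)
```

The computed and reported MB values agree, and the byte count works out to roughly 12 TB, which fits the 12TB drive being replaced. The target number is still only a hint about the bay, so the serial number is the thing to match.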

I went into the server console and looked around. I found this:

The serial number matches. Drive bay 7.

When I got down to the basement, there it was, labelled:

Upon removal, this appeared in /var/log/messages:

Dec 30 19:34:12 r730-03 kernel: mrsas0: System PD deleted target ID: 0x7
Dec 30 19:34:12 r730-03 kernel: da6 at mrsas0 bus 1 scbus1 target 7 lun 0
Dec 30 19:34:12 r730-03 kernel: da6:   s/n 8CJR6GZE detached
Dec 30 19:34:12 r730-03 kernel: (da6:mrsas0:1:7:0): Periph destroyed

Now it’s time to pack up that drive and send it back.