You might recall that suspect drive from the zpool replace on the weekend. Thomas Hurst suggested:
I, being one to take advice from people on the internet, and Michael W Lucas, decided to try his suggestion.
The drive in question:
[dan@knew:~] $ tail /var/log/messages
Dec 14 00:00:00 knew newsyslog[88570]: logfile turned over
Dec 14 00:23:04 knew smartd[2124]: Device: /dev/da17 [SAT], 40 Currently unreadable (pending) sectors
Dec 14 00:53:04 knew syslogd: last message repeated 1 times
Dec 14 01:23:03 knew syslogd: last message repeated 1 times
[dan@knew:~] $ gpart show da17
=>        34  9767541101  da17  GPT  (4.5T)
          34           6        - free -  (3.0K)
          40  9766000000     1  freebsd-zfs  (4.5T)
  9766000040     1541095        - free -  (752M)
I start by making sure geli is loaded.
[dan@knew:~] $ kldload geom_eli
kldload: can't load geom_eli: Operation not permitted
[dan@knew:~] $ sudo kldload geom_eli
kldload: can't load geom_eli: module already loaded or in kernel
Why geli? Because Thomas suggested it.
OK, so let’s follow the documentation, in part. I omitted the -K /root/da2.key parameter.
[dan@knew:~] $ sudo geli init -s 4096 /dev/da17p1
Enter new passphrase:
Reenter new passphrase:
Metadata backup for provider /dev/da17p1 can be found in /var/backups/da17p1.eli
and can be restored with the following command:
# geli restore /var/backups/da17p1.eli /dev/da17p1
[dan@knew:~] $
Then I started the dd:
[dan@knew:~] $ sudo dd if=/dev/zero of=/dev/da17p1 bs=1M
^C1172+0 records in
1171+0 records out
1227882496 bytes transferred in 6.263732 secs (196030500 bytes/sec)
I hit Ctrl-C so I could put time in front of the command and see how long the run took.
Well, there’s the time, right there, in the output, without using time.
OK, let’s try this again:
[dan@knew:~] $ sudo dd if=/dev/zero of=/dev/da17p1 bs=1M
Some time later
Here is what I found the next morning:
[dan@knew:~] $ sudo dd if=/dev/zero of=/dev/da17p1 bs=1M
dd: /dev/da17p1: short write on character device
dd: /dev/da17p1: end of device
4768555+0 records in
4768554+1 records out
5000192000000 bytes transferred in 29778.616710 secs (167912165 bytes/sec)
[dan@knew:~] $
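As a sanity check on that dd summary, here is a minimal sketch that turns the byte count and elapsed seconds into friendlier units. The dd_summary function name is made up for illustration; the two numbers are copied straight from the dd output above.

```shell
# Figures copied from the dd summary line in this post.
bytes=5000192000000
secs=29778.616710

# dd_summary is a hypothetical helper, not part of dd.
# It converts a byte count and elapsed seconds into hours/minutes,
# decimal MB/s, and decimal TB; awk does the floating point math.
dd_summary() {
    awk -v b="$1" -v s="$2" 'BEGIN {
        h = int(s / 3600)
        m = int((s - h * 3600) / 60)
        printf "%dh %02dm, %.1f MB/s, %.2f TB\n", h, m, b / s / 1e6, b / 1e12
    }'
}

dd_summary "$bytes" "$secs"
```

So the full pass took a bit over eight hours at roughly 168 MB/s.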
Why geli? It’s not all zeros, and the theory is that’s faster than /dev/random.
Let’s look at the logs
[dan@knew:~] $ tail /var/log/messages
Dec 14 05:23:04 knew syslogd: last message repeated 1 times
Dec 14 05:53:04 knew syslogd: last message repeated 1 times
Dec 14 06:23:04 knew smartd[2124]: Device: /dev/da17 [SAT], 40 Currently unreadable (pending) sectors
Dec 14 06:53:04 knew syslogd: last message repeated 1 times
Dec 14 07:23:03 knew syslogd: last message repeated 1 times
Dec 14 07:53:04 knew syslogd: last message repeated 1 times
Dec 14 08:23:03 knew smartd[2124]: Device: /dev/da17 [SAT], 40 Currently unreadable (pending) sectors
Dec 14 08:53:04 knew syslogd: last message repeated 1 times
Dec 14 09:23:03 knew syslogd: last message repeated 1 times
Dec 14 09:53:04 knew syslogd: last message repeated 1 times
[dan@knew:~] $
[dan@knew:~] $ date
Mon Dec 14 14:33:08 UTC 2020
There have been no smartd messages in the past 4.5 hours.
EDIT: make that the past 15.5 hours:
[dan@knew:~] $ tail -2 /var/log/messages.0
Dec 14 09:53:04 knew syslogd: last message repeated 1 times
Dec 15 00:00:00 knew newsyslog[67573]: logfile turned over
[dan@knew:~] $
[dan@knew:~] $ tail /var/log/messages
Dec 15 00:00:00 knew newsyslog[67573]: logfile turned over
Let’s do a diff on the before and after smartctl output
[dan@pro02:~/tmp] $ diff -ruN before after
--- before 2020-12-14 09:39:39.000000000 -0500
+++ after 2020-12-14 09:37:33.000000000 -0500
@@ -3,22 +3,22 @@
   2 Throughput_Performance 0x0005 100 100 050 Pre-fail Offline - 0
   3 Spin_Up_Time 0x0027 100 100 001 Pre-fail Always - 9291
   4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 138
-  5 Reallocated_Sector_Ct 0x0033 100 100 050 Pre-fail Always - 12448
+  5 Reallocated_Sector_Ct 0x0033 100 100 050 Pre-fail Always - 12504
   7 Seek_Error_Rate 0x000b 100 100 050 Pre-fail Always - 0
   8 Seek_Time_Performance 0x0005 100 100 050 Pre-fail Offline - 0
-  9 Power_On_Hours 0x0032 001 001 000 Old_age Always - 43747
+  9 Power_On_Hours 0x0032 001 001 000 Old_age Always - 43795
  10 Spin_Retry_Count 0x0033 102 100 030 Pre-fail Always - 0
  12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 138
-191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 5680
+191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 5681
 192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 129
 193 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 741
 194 Temperature_Celsius 0x0022 100 100 000 Old_age Always - 33 (Min/Max 15/51)
-196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 1435
-197 Current_Pending_Sector 0x0032 100 100 000 Old_age Always - 40
+196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 1438
+197 Current_Pending_Sector 0x0032 100 100 000 Old_age Always - 0
 198 Offline_Uncorrectable 0x0030 100 100 000 Old_age Offline - 0
 199 UDMA_CRC_Error_Count 0x0032 200 253 000 Old_age Always - 7
 220 Disk_Shift 0x0002 100 100 000 Old_age Always - 0
-222 Loaded_Hours 0x0032 001 001 000 Old_age Always - 43578
+222 Loaded_Hours 0x0032 001 001 000 Old_age Always - 43626
 223 Load_Retry_Count 0x0032 100 100 000 Old_age Always - 0
 224 Load_Friction 0x0022 100 100 000 Old_age Always - 0
 226 Load-in_Time 0x0026 100 100 000 Old_age Always - 203
[dan@pro02:~/tmp] $
From the above:
- Reallocated_Sector_Ct has increased by 56 from 12448 to 12504
- Current_Pending_Sector has dropped from 40 to 0
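Picking those deltas out of a diff by eye is error-prone, so here is a rough sketch that computes them. It assumes the before and after files hold saved smartctl -A attribute tables, where the raw value is the tenth whitespace-separated field. smart_delta is a name I made up; it is not a smartmontools command.

```shell
# smart_delta BEFORE AFTER
# Hypothetical helper: prints every SMART attribute whose raw value
# (field 10 of the attribute table) differs between two saved reports.
smart_delta() {
    awk '
        # First file: remember the raw value for each attribute row.
        # Attribute rows are the ones whose first field is a numeric ID.
        NR == FNR { if ($1 ~ /^[0-9]+$/) before[$2] = $10; next }
        # Second file: report attributes whose raw value changed.
        $1 ~ /^[0-9]+$/ && ($2 in before) && before[$2] != $10 {
            printf "%s %d -> %d (%+d)\n", $2, before[$2], $10, $10 - before[$2]
        }
    ' "$1" "$2"
}
```

Calling smart_delta before after on the two files diffed above should list the changed attributes in the order they appear in the second file, each with its signed change.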
So… opinions on this drive now?
Well, Thomas said:
When I said “geli onetime”, I literally meant the “geli onetime” command, which keeps keys in RAM for one-time use, such as for encrypted swap partitions.
Also you completely bypassed it there by not using the .eli device :)
Finally – wow, that’s a *lot* of reallocated sectors.
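Going by Thomas’s comments, the procedure he had in mind probably looks something like the sketch below: attach the partition with a throwaway key using geli onetime, write zeros to the resulting .eli device so that encrypted data lands on the disk, then detach. I have not run this. The run and scrub_with_onetime helpers are names I made up, and it only prints the commands unless DRY_RUN=0 is set, because the real thing destroys everything on the provider.

```shell
# Hypothetical wrapper: print each command unless DRY_RUN=0 is set.
# With DRY_RUN=0 this DESTROYS all data on the given provider.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# scrub_with_onetime PROVIDER   (e.g. da17p1, the partition used in this post)
scrub_with_onetime() {
    prov="$1"
    # Attach with a random one-time key held only in RAM; creates /dev/PROVIDER.eli.
    run sudo geli onetime -s 4096 "/dev/${prov}"
    # Write zeros THROUGH geli, so the disk itself receives ciphertext.
    run sudo dd if=/dev/zero of="/dev/${prov}.eli" bs=1M
    # Detach; the one-time key is discarded along with the .eli device.
    run sudo geli detach "${prov}.eli"
}

scrub_with_onetime da17p1
```

The difference from what I actually did is the of= target: I wrote to /dev/da17p1 directly, so geli never saw the data.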
And Dag-Erling Smørgrav said:
dd if=/dev/zero would have sufficed
So there you go.