I woke up to this today. Complicating the matter: this server is destined to be transported tomorrow morning.
This is FreeBSD 8.2.
    # grep smartd /var/log/messages
    Apr 18 22:48:10 kraken smartd[1400]: Device: /dev/ada5, 2 Currently unreadable (pending) sectors
    Apr 18 23:18:10 kraken smartd[1400]: Device: /dev/ada5, 2 Currently unreadable (pending) sectors
    Apr 18 23:48:10 kraken smartd[1400]: Device: /dev/ada5, 2 Currently unreadable (pending) sectors
    Apr 19 00:18:10 kraken smartd[1400]: Device: /dev/ada5, 2 Currently unreadable (pending) sectors
    Apr 19 00:48:10 kraken smartd[1400]: Device: /dev/ada5, 2 Currently unreadable (pending) sectors
    Apr 19 01:18:10 kraken smartd[1400]: Device: /dev/ada5, 2 Currently unreadable (pending) sectors
    Apr 19 01:48:10 kraken smartd[1400]: Device: /dev/ada5, 2 Currently unreadable (pending) sectors
    Apr 19 02:18:10 kraken smartd[1400]: Device: /dev/ada5, 2 Currently unreadable (pending) sectors
    Apr 19 02:48:10 kraken smartd[1400]: Device: /dev/ada5, 2 Currently unreadable (pending) sectors
    Apr 19 03:18:11 kraken smartd[1400]: Device: /dev/ada5, 2 Currently unreadable (pending) sectors
    Apr 19 03:48:13 kraken smartd[1400]: Device: /dev/ada5, 2 Currently unreadable (pending) sectors
    Apr 19 04:18:12 kraken smartd[1400]: Device: /dev/ada5, 2 Currently unreadable (pending) sectors
    Apr 19 04:48:12 kraken smartd[1400]: Device: /dev/ada5, 2 Currently unreadable (pending) sectors
    Apr 19 05:18:10 kraken smartd[1400]: Device: /dev/ada5, 2 Currently unreadable (pending) sectors
    Apr 19 05:48:12 kraken smartd[1400]: Device: /dev/ada5, 2 Currently unreadable (pending) sectors
    Apr 19 06:18:12 kraken smartd[1400]: Device: /dev/ada5, 2 Currently unreadable (pending) sectors
    Apr 19 06:48:12 kraken smartd[1400]: Device: /dev/ada5, 2 Currently unreadable (pending) sectors
    Apr 19 07:18:12 kraken smartd[1400]: Device: /dev/ada5, 2 Currently unreadable (pending) sectors
    Apr 19 07:48:12 kraken smartd[1400]: Device: /dev/ada5, 2 Currently unreadable (pending) sectors
    Apr 19 08:18:13 kraken smartd[1400]: Device: /dev/ada5, 2 Currently unreadable (pending) sectors
    Apr 19 08:48:14 kraken smartd[1400]: Device: /dev/ada5, 2 Currently unreadable (pending) sectors
    Apr 19 09:18:11 kraken smartd[1400]: Device: /dev/ada5, 2 Currently unreadable (pending) sectors
    Apr 19 09:48:12 kraken smartd[1400]: Device: /dev/ada5, 2 Currently unreadable (pending) sectors
    Apr 19 10:18:14 kraken smartd[1400]: Device: /dev/ada5, 2 Currently unreadable (pending) sectors
    Apr 19 10:48:14 kraken smartd[1400]: Device: /dev/ada5, 2 Currently unreadable (pending) sectors
    Apr 19 11:18:13 kraken smartd[1400]: Device: /dev/ada5, ATA error count increased from 0 to 5
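With a log this repetitive, a summary is easier to read than the raw lines. The following is a minimal sketch, not something I ran on the server: it assumes the smartd message formats are exactly the two shown in the grep output above, and the function name `summarize` and the `SAMPLE` lines are made up for illustration.

```python
import re

# Matches smartd syslog lines in the form seen in /var/log/messages above.
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) \S+ smartd\[\d+\]: "
    r"Device: (?P<device>\S+), (?P<message>.*)$"
)
PENDING_RE = re.compile(r"^(\d+) Currently unreadable \(pending\) sectors$")
ATA_RE = re.compile(r"^ATA error count increased from (\d+) to (\d+)$")


def summarize(lines):
    """Collapse repeated pending-sector reports and collect ATA error jumps."""
    pending = {}     # device -> reports / first / last / max pending sectors
    ata_events = []  # (timestamp, device, old_count, new_count)
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not a smartd line in the assumed format
        stamp, device, message = m.group("stamp", "device", "message")
        p = PENDING_RE.match(message)
        if p:
            entry = pending.setdefault(
                device, {"reports": 0, "first": stamp, "last": stamp, "max": 0}
            )
            entry["reports"] += 1
            entry["last"] = stamp
            entry["max"] = max(entry["max"], int(p.group(1)))
            continue
        a = ATA_RE.match(message)
        if a:
            ata_events.append((stamp, device, int(a.group(1)), int(a.group(2))))
    return pending, ata_events


# Three lines copied from the log above, used only as sample input.
SAMPLE = [
    "Apr 18 22:48:10 kraken smartd[1400]: Device: /dev/ada5, 2 Currently unreadable (pending) sectors",
    "Apr 19 10:48:14 kraken smartd[1400]: Device: /dev/ada5, 2 Currently unreadable (pending) sectors",
    "Apr 19 11:18:13 kraken smartd[1400]: Device: /dev/ada5, ATA error count increased from 0 to 5",
]

pending, ata_events = summarize(SAMPLE)
for device, info in pending.items():
    print(device, info)
for event in ata_events:
    print("ATA error count change:", event)
```

Feeding it the whole of /var/log/messages instead of `SAMPLE` should reduce the run of half-hourly reports to one entry per device, plus the single ATA error event.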
I guess the good news is:
    # zpool status
      pool: storage
     state: ONLINE
      scan: scrub in progress since Fri Apr 19 03:11:14 2013
            3.00T scanned out of 9.72T at 95.9M/s, 20h25m to go
            103K repaired, 30.85% done
    config:

            NAME                 STATE     READ WRITE CKSUM
            storage              ONLINE       0     0     0
              raidz2-0           ONLINE       0     0     0
                gpt/disk01-live  ONLINE       0     0     0
                gpt/disk02-live  ONLINE       0     0     0
                gpt/disk03-live  ONLINE       0     0     0
                gpt/disk04-live  ONLINE       0     0     0  (repairing)
                gpt/disk05-live  ONLINE       0     0     0
                gpt/disk06-live  ONLINE       0     0     0
                gpt/disk07-live  ONLINE       0     0     0

    errors: No known data errors
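With the server due to move in the morning, the "20h25m to go" figure matters, so it is worth checking against the other numbers in that output. This is a rough sketch of the arithmetic, assuming zpool's T and M suffixes are binary units (TiB and MiB) and that the scan rate stays constant; `scrub_estimate` is a name I made up.

```python
def scrub_estimate(scanned_tib, total_tib, rate_mib_s):
    """Return (percent done, hours left, minutes left) at a constant scan rate."""
    remaining_mib = (total_tib - scanned_tib) * 1024 * 1024  # TiB -> MiB
    seconds = int(round(remaining_mib / rate_mib_s))
    hours, rem = divmod(seconds, 3600)
    minutes = rem // 60
    percent = 100.0 * scanned_tib / total_tib
    return percent, hours, minutes


# Figures taken from the zpool status output above.
percent, hours, minutes = scrub_estimate(3.00, 9.72, 95.9)
print(f"{percent:.2f}% done, about {hours}h{minutes:02d}m to go")
```

The result should land within a minute or so of what zpool reports. Any small difference comes from zpool rounding the figures it prints.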
But wait, is that gpt/disk04-live really /dev/ada5? Yes. Yes it is:
    $ gpart list ada5
    Geom name: ada5
    modified: false
    state: OK
    fwheads: 16
    fwsectors: 63
    last: 3907029134
    first: 34
    entries: 128
    scheme: GPT
    Providers:
    1. Name: ada5p1
       Mediasize: 2000188135936 (1.8T)
       Sectorsize: 512
       Stripesize: 0
       Stripeoffset: 1048576
       Mode: r1w1e2
       rawuuid: 4ed25145-9dd7-11df-83c1-001b2151ab2d
       rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b
       label: disk04-live
       length: 2000188135936
       offset: 1048576
       type: freebsd-zfs
       index: 1
       end: 3906619500
       start: 2048
    Consumers:
    1. Name: ada5
       Mediasize: 2000398934016 (1.8T)
       Sectorsize: 512
       Mode: r1w1e3
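Checking one disk by eye is fine, but with seven disks in the pool the label-to-device mapping is easier to extract mechanically. Here is a small sketch that does so by parsing `gpart list` text. It assumes the output layout shown above; the function name `labels_to_providers` is hypothetical, and `SAMPLE` is an abbreviated copy of the real output.

```python
import re

PROVIDER_RE = re.compile(r"^\d+\. Name: (\S+)$")


def labels_to_providers(gpart_output):
    """Map each GPT label to its provider name, e.g. disk04-live -> ada5p1."""
    mapping = {}
    provider = None
    in_providers = False
    for raw in gpart_output.splitlines():
        line = raw.strip()
        if line == "Providers:":
            in_providers = True
            continue
        if line == "Consumers:":
            in_providers = False
            provider = None
            continue
        if not in_providers:
            continue
        m = PROVIDER_RE.match(line)
        if m:
            provider = m.group(1)
            continue
        if provider and line.startswith("label:"):
            label = line.split(":", 1)[1].strip()
            if label and label != "(null)":  # skip unlabeled partitions
                mapping[label] = provider
    return mapping


# Abbreviated from the gpart output above.
SAMPLE = """\
Geom name: ada5
scheme: GPT
Providers:
1. Name: ada5p1
   Mediasize: 2000188135936 (1.8T)
   label: disk04-live
   type: freebsd-zfs
Consumers:
1. Name: ada5
   Mediasize: 2000398934016 (1.8T)
"""

for label, provider in labels_to_providers(SAMPLE).items():
    print(f"gpt/{label} -> /dev/{provider}")
```

On FreeBSD, `glabel status` should show the same label-to-provider mapping without any parsing.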
After a question from Peter Wemm on Twitter, I added:
    # smartctl -P show /dev/ada5
    smartctl 5.39.1 2010-01-28 r3054 [FreeBSD 8.2-STABLE amd64] (local build)
    Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

    Drive found in smartmontools Database.  Drive identity strings:
    MODEL:              Hitachi HDS722020ALA330
    FIRMWARE:           JKAOA28A
    match smartmontools Drive Database entry:
    MODEL REGEXP:       Hitachi HDS722020ALA330
    FIRMWARE REGEXP:    .*
    MODEL FAMILY:       Hitachi Deskstar 7K2000
    ATTRIBUTE OPTIONS:  None preset; no -v options are required.
Full smartctl output follows:
    # smartctl -a /dev/ada5
    smartctl 5.39.1 2010-01-28 r3054 [FreeBSD 8.2-STABLE amd64] (local build)
    Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

    === START OF INFORMATION SECTION ===
    Model Family:     Hitachi Deskstar 7K2000
    Device Model:     Hitachi HDS722020ALA330
    Serial Number:    JK1130YAH324ST
    Firmware Version: JKAOA28A
    User Capacity:    2,000,398,934,016 bytes
    Device is:        In smartctl database [for details use: -P show]
    ATA Version is:   8
    ATA Standard is:  ATA-8-ACS revision 4
    Local Time is:    Fri Apr 19 12:14:15 2013 UTC
    SMART support is: Available - device has SMART capability.
    SMART support is: Enabled

    === START OF READ SMART DATA SECTION ===
    SMART overall-health self-assessment test result: PASSED

    General SMART Values:
    Offline data collection status:  (0x84) Offline data collection activity
                                            was suspended by an interrupting command from host.
                                            Auto Offline Data Collection: Enabled.
    Self-test execution status:      (   0) The previous self-test routine completed
                                            without error or no self-test has ever
                                            been run.
    Total time to complete Offline
    data collection:                 (22771) seconds.
    Offline data collection
    capabilities:                    (0x5b) SMART execute Offline immediate.
                                            Auto Offline data collection on/off support.
                                            Suspend Offline collection upon new
                                            command.
                                            Offline surface scan supported.
                                            Self-test supported.
                                            No Conveyance Self-test supported.
                                            Selective Self-test supported.
    SMART capabilities:            (0x0003) Saves SMART data before entering
                                            power-saving mode.
                                            Supports SMART auto save timer.
    Error logging capability:        (0x01) Error logging supported.
                                            General Purpose Logging supported.
    Short self-test routine
    recommended polling time:        (   1) minutes.
    Extended self-test routine
    recommended polling time:        ( 255) minutes.
    SCT capabilities:              (0x003d) SCT Status supported.
                                            SCT Feature Control supported.
                                            SCT Data Table supported.

    SMART Attributes Data Structure revision number: 16
    Vendor Specific SMART Attributes with Thresholds:
    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
      1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
      2 Throughput_Performance  0x0005   133   133   054    Pre-fail  Offline      -       102
      3 Spin_Up_Time            0x0007   124   124   024    Pre-fail  Always       -       609 (Average 552)
      4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       62
      5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       7
      7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
      8 Seek_Time_Performance   0x0005   112   112   020    Pre-fail  Offline      -       39
      9 Power_On_Hours          0x0012   097   097   000    Old_age   Always       -       27078
     10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
     12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       62
    192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       175
    193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       175
    194 Temperature_Celsius     0x0002   171   171   000    Old_age   Always       -       35 (Lifetime Min/Max 19/47)
    196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       8
    197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
    198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
    199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

    SMART Error Log Version: 1
    ATA Error Count: 5
            CR = Command Register [HEX]
            FR = Features Register [HEX]
            SC = Sector Count Register [HEX]
            SN = Sector Number Register [HEX]
            CL = Cylinder Low Register [HEX]
            CH = Cylinder High Register [HEX]
            DH = Device/Head Register [HEX]
            DC = Device Command Register [HEX]
            ER = Error register [HEX]
            ST = Status register [HEX]
    Powered_Up_Time is measured from power on, and printed as
    DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
    SS=sec, and sss=millisec. It "wraps" after 49.710 days.

    Error 5 occurred at disk power-on lifetime: 27077 hours (1128 days + 5 hours)
      When the command that caused the error occurred, the device was active or idle.

      After command completion occurred, registers were:
      ER ST SC SN CL CH DH
      -- -- -- -- -- -- --
      40 51 5e 76 e2 f9 00

      Commands leading to the command that caused the error were:
      CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
      -- -- -- -- -- -- -- --  ----------------  --------------------
      60 ce a8 3f e7 f9 40 00  26d+03:50:46.553  READ FPDMA QUEUED
      60 ce b0 71 e6 f9 40 00  26d+03:50:46.553  READ FPDMA QUEUED
      60 cd b8 a4 e5 f9 40 00  26d+03:50:46.553  READ FPDMA QUEUED
      60 ce c0 d6 e4 f9 40 00  26d+03:50:46.553  READ FPDMA QUEUED
      60 33 c8 a3 e4 f9 40 00  26d+03:50:46.552  READ FPDMA QUEUED

    Error 4 occurred at disk power-on lifetime: 27077 hours (1128 days + 5 hours)
      When the command that caused the error occurred, the device was active or idle.

      After command completion occurred, registers were:
      ER ST SC SN CL CH DH
      -- -- -- -- -- -- --
      40 51 5e 76 e2 f9 00

      Commands leading to the command that caused the error were:
      CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
      -- -- -- -- -- -- -- --  ----------------  --------------------
      60 ce a8 3f e7 f9 40 00  26d+03:50:27.938  READ FPDMA QUEUED
      60 ce b0 71 e6 f9 40 00  26d+03:50:27.938  READ FPDMA QUEUED
      60 cd b8 a4 e5 f9 40 00  26d+03:50:27.937  READ FPDMA QUEUED
      60 ce c0 d6 e4 f9 40 00  26d+03:50:27.937  READ FPDMA QUEUED
      60 33 c8 a3 e4 f9 40 00  26d+03:50:27.937  READ FPDMA QUEUED

    Error 3 occurred at disk power-on lifetime: 27077 hours (1128 days + 5 hours)
      When the command that caused the error occurred, the device was active or idle.

      After command completion occurred, registers were:
      ER ST SC SN CL CH DH
      -- -- -- -- -- -- --
      40 51 5e 76 e2 f9 00

      Commands leading to the command that caused the error were:
      CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
      -- -- -- -- -- -- -- --  ----------------  --------------------
      60 ce a8 3f e7 f9 40 00  26d+03:50:10.086  READ FPDMA QUEUED
      60 ce b0 71 e6 f9 40 00  26d+03:50:10.085  READ FPDMA QUEUED
      60 cd b8 a4 e5 f9 40 00  26d+03:50:10.085  READ FPDMA QUEUED
      60 ce c0 d6 e4 f9 40 00  26d+03:50:10.085  READ FPDMA QUEUED
      60 33 c8 a3 e4 f9 40 00  26d+03:50:10.085  READ FPDMA QUEUED

    Error 2 occurred at disk power-on lifetime: 27077 hours (1128 days + 5 hours)
      When the command that caused the error occurred, the device was active or idle.

      After command completion occurred, registers were:
      ER ST SC SN CL CH DH
      -- -- -- -- -- -- --
      40 51 5e 76 e2 f9 00

      Commands leading to the command that caused the error were:
      CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
      -- -- -- -- -- -- -- --  ----------------  --------------------
      60 ce b0 3f e7 f9 40 00  26d+03:49:52.579  READ FPDMA QUEUED
      60 ce a8 71 e6 f9 40 00  26d+03:49:52.574  READ FPDMA QUEUED
      60 cd f0 a4 e5 f9 40 00  26d+03:49:52.574  READ FPDMA QUEUED
      61 04 b0 37 c5 71 40 00  26d+03:49:52.573  WRITE FPDMA QUEUED
      61 02 f0 0c 51 67 40 00  26d+03:49:52.572  WRITE FPDMA QUEUED

    Error 1 occurred at disk power-on lifetime: 27077 hours (1128 days + 5 hours)
      When the command that caused the error occurred, the device was active or idle.

      After command completion occurred, registers were:
      ER ST SC SN CL CH DH
      -- -- -- -- -- -- --
      40 51 5e 76 e2 f9 00

      Commands leading to the command that caused the error were:
      CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
      -- -- -- -- -- -- -- --  ----------------  --------------------
      60 ce f0 d6 e4 f9 40 00  26d+03:49:34.795  READ FPDMA QUEUED
      60 33 e8 a3 e4 f9 40 00  26d+03:49:34.735  READ FPDMA QUEUED
      60 34 b0 6f e4 f9 40 00  26d+03:49:34.671  READ FPDMA QUEUED
      60 67 d0 08 e4 f9 40 00  26d+03:49:34.671  READ FPDMA QUEUED
      60 67 d8 a1 e3 f9 40 00  26d+03:49:34.668  READ FPDMA QUEUED

    SMART Self-test log structure revision number 1
    Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
    # 1  Extended offline    Completed without error       00%     24382         -

    SMART Selective self-test log data structure revision number 1
     SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
        1        0        0  Not_testing
        2        0        0  Not_testing
        3        0        0  Not_testing
        4        0        0  Not_testing
        5        0        0  Not_testing
    Selective self-test flags (0x0):
      After scanning selected spans, do NOT read-scan remainder of disk.
    If Selective self-test is pending on power-up, resume after 0 minute delay.
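All five errors in that log report the same register values, `40 51 5e 76 e2 f9 00`, which suggests the drive tripped over the same spot each time. The sketch below decodes that row using the classic 28-bit ATA register layout (SN, CL, CH and the low nibble of DH hold the LBA). Treat the result with caution: the failing commands are NCQ reads with 48-bit addressing, and this summary log only shows the 28-bit registers, so the value is probably just the low bits of the real LBA. The function names are mine, and I am assuming bit 6 of the error register is the UNC (uncorrectable data) flag.

```python
def parse_register_row(row):
    """Turn a row of hex bytes such as '40 51 5e 76 e2 f9 00' into integers."""
    return [int(token, 16) for token in row.split()]


def lba28_from_registers(sn, cl, ch, dh):
    """Combine the 28-bit LBA from SN (bits 0-7), CL (8-15), CH (16-23), DH low nibble (24-27)."""
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


# Register row reported after each of the five errors above.
er, st, sc, sn, cl, ch, dh = parse_register_row("40 51 5e 76 e2 f9 00")

lba_low_bits = lba28_from_registers(sn, cl, ch, dh)
uncorrectable = bool(er & 0x40)  # assumption: bit 6 of ER is the UNC flag

print(f"low LBA bits: {lba_low_bits} (0x{lba_low_bits:07x})")
print(f"UNC flag set: {uncorrectable}")
```

The full 48-bit address would have to come from the drive's extended error log, if it supports one.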