A drive on supernews acted up overnight.
The main purpose of this post is for me to record the information. You might not find much useful here.
The host is running FreeBSD 12 and is a FreshPorts development box.
I saw this error in the logs:
Aug 8 03:17:15 supernews smartd[66288]: Device: /dev/twa0 [3ware_disk_00], ATA error count increased from 22 to 23
smartd emailed me because I set that up some time ago.
The email looked like this:
This message was generated by the smartd daemon running on:

   host name:  supernews
   DNS domain: example.org

The following warning/error was logged by the smartd daemon:

Device: /dev/twa0 [3ware_disk_00], ATA error count increased from 22 to 23

Device info:
WDC WD740GD-00FLC0, S/N:WD-WMAKE2379003, FW:33.08F33, 74.3 GB

For details see host's SYSLOG.

You can also use the smartctl utility for further investigation.
No additional messages about this problem will be sent.
I logged in and did some looking. I found the above-mentioned entry in /var/log/messages.
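For anyone who wants to pull the numbers out of that log entry with a script, here is a minimal Python sketch. The regular expression is an assumption based only on the single smartd line quoted above; other smartd messages are worded differently and will not match. The function name is made up for illustration.

```python
import re

# Pattern derived from the one smartd entry quoted in this post (assumption:
# other entries of this kind use the same wording).
SMARTD_ATA_ERRORS = re.compile(
    r"Device: (?P<device>\S+) \[(?P<name>[^\]]+)\], "
    r"ATA error count increased from (?P<old>\d+) to (?P<new>\d+)"
)


def parse_ata_error_line(line):
    """Return (device, name, old_count, new_count), or None if no match."""
    match = SMARTD_ATA_ERRORS.search(line)
    if match is None:
        return None
    return (
        match.group("device"),
        match.group("name"),
        int(match.group("old")),
        int(match.group("new")),
    )


sample = (
    "Aug 8 03:17:15 supernews smartd[66288]: Device: /dev/twa0 "
    "[3ware_disk_00], ATA error count increased from 22 to 23"
)
print(parse_ata_error_line(sample))
```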
For the record, here is some of the stuff I saw:
[dan@supernews:~] $ sudo /usr/local/sbin/tw_cli info c0 u0

Unit     UnitType  Status         %RCmpl  %V/I/M  Port  Stripe  Size(GB)
------------------------------------------------------------------------
u0       RAID-10   DEGRADED*      -       -       -     64K     195.548
u0-0     RAID-1    DEGRADED       -       -       -     -       -
u0-0-0   DISK      DEGRADED       -       -       p0    -       65.1826
u0-0-1   DISK      OK             -       -       p2    -       65.1826
u0-1     RAID-1    OK             -       -       -     -       -
u0-1-0   DISK      OK             -       -       p6    -       65.1826
u0-1-1   DISK      OK             -       -       p5    -       65.1826
u0-2     RAID-1    OK             -       -       -     -       -
u0-2-0   DISK      OK             -       -       p3    -       65.1826
u0-2-1   DISK      OK             -       -       p4    -       65.1826
u0/v0    Volume    -              -       -       -     -       195.548

[dan@supernews:~] $ sudo tw_cli info c0

Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-10   DEGRADED       -       -       64K     195.548   OFF    ON
u1    SPARE     OK             -       -       -       69.2404   -      ON
u2    SPARE     OK             -       -       -       69.2404   -      OFF

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     DEVICE-ERROR     u0     69.25 GB    145226112     WD-WMAKE2379003
p1     OK               u1     69.25 GB    145226112     WD-WMAKE2379069
p2     OK               u0     69.25 GB    145226112     WD-WMAKE2379066
p3     OK               u0     69.25 GB    145226112     WD-WMAKE2379012
p4     OK               u0     69.25 GB    145226112     WD-WMAKE2379286
p5     OK               u0     69.25 GB    145226112     WD-WMAKE2379019
p6     OK               u0     69.25 GB    145226112     WD-WMAKE2394339
p7     OK               u2     69.25 GB    145226112     WD-WMAKE2378696

Name  OnlineState  BBUReady  Status    Volt     Temp     Hours  LastCapTest
---------------------------------------------------------------------------
bbu   On           Yes       OK        OK       OK       255    20-Nov-2017

[dan@supernews:~] $ sudo tw_cli info c0 u0

Unit     UnitType  Status         %RCmpl  %V/I/M  Port  Stripe  Size(GB)
------------------------------------------------------------------------
u0       RAID-10   DEGRADED*      -       -       -     64K     195.548
u0-0     RAID-1    DEGRADED       -       -       -     -       -
u0-0-0   DISK      DEGRADED       -       -       p0    -       65.1826
u0-0-1   DISK      OK             -       -       p2    -       65.1826
u0-1     RAID-1    OK             -       -       -     -       -
u0-1-0   DISK      OK             -       -       p6    -       65.1826
u0-1-1   DISK      OK             -       -       p5    -       65.1826
u0-2     RAID-1    OK             -       -       -     -       -
u0-2-0   DISK      OK             -       -       p3    -       65.1826
u0-2-1   DISK      OK             -       -       p4    -       65.1826
u0/v0    Volume    -              -       -       -     -       195.548

[dan@supernews:~] $ sudo /usr/local/sbin/tw_cli
//supernews> show all
Error: (CLI:041) Invalid shell command.
//supernews> /c0 show all
/c0 Driver Version = 3.80.06.003
/c0 Model = 9550SX-8LP
/c0 Available Memory = 112MB
/c0 Firmware Version = FE9X 3.08.00.029
/c0 Bios Version = BE9X 3.10.00.003
/c0 Boot Loader Version = BL9X 3.01.00.006
/c0 Serial Number = L20805B5500320
/c0 PCB Version = Rev 032
/c0 PCHIP Version = 1.60
/c0 ACHIP Version = 1.70
/c0 Number of Ports = 8
/c0 Number of Drives = 8
/c0 Number of Units = 3
/c0 Total Optimal Units = 2
/c0 Not Optimal Units = 1
/c0 JBOD Export Policy = off
/c0 Disk Spinup Policy = 1
/c0 Spinup Stagger Time Policy (sec) = 1
/c0 Auto-Carving Policy = off
/c0 Auto-Carving Size = 2048 GB
/c0 Auto-Rebuild Policy = on
/c0 Rebuild Rate = 4
/c0 Verify Rate = 1
/c0 Controller Bus Type = PCI
/c0 Controller Bus Width = 64 bits
/c0 Controller Bus Speed = 66 Mhz

Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-10   DEGRADED       -       -       64K     195.548   OFF    ON
u1    SPARE     OK             -       -       -       69.2404   -      ON
u2    SPARE     OK             -       -       -       69.2404   -      OFF

Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     DEVICE-ERROR     u0     69.25 GB    145226112     WD-WMAKE2379003
p1     OK               u1     69.25 GB    145226112     WD-WMAKE2379069
p2     OK               u0     69.25 GB    145226112     WD-WMAKE2379066
p3     OK               u0     69.25 GB    145226112     WD-WMAKE2379012
p4     OK               u0     69.25 GB    145226112     WD-WMAKE2379286
p5     OK               u0     69.25 GB    145226112     WD-WMAKE2379019
p6     OK               u0     69.25 GB    145226112     WD-WMAKE2394339
p7     OK               u2     69.25 GB    145226112     WD-WMAKE2378696

Name  OnlineState  BBUReady  Status    Volt     Temp     Hours  LastCapTest
---------------------------------------------------------------------------
bbu   On           Yes       OK        OK       OK       255    20-Nov-2017
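The interesting row in all of that is the one port that is not OK. As a rough sketch, the port table can be filtered in Python like this. It assumes one table row per line, laid out as port, status, unit, size, the literal GB, blocks and serial; the function name is hypothetical and the sample is a shortened copy of the table from this post.

```python
def failing_ports(info_output):
    """Return (port, status, unit, serial) for every port that is not OK.

    Assumes port rows look like:
    p0 DEVICE-ERROR u0 69.25 GB 145226112 WD-WMAKE2379003
    """
    bad = []
    for line in info_output.splitlines():
        fields = line.split()
        # Port rows have exactly seven fields and start with p<number>;
        # the header row ("Port Status ...") fails both checks.
        if len(fields) == 7 and fields[0].startswith("p") and fields[0][1:].isdigit():
            port, status, unit, _size, _gb, _blocks, serial = fields
            if status != "OK":
                bad.append((port, status, unit, serial))
    return bad


sample = """\
Port   Status           Unit   Size        Blocks        Serial
---------------------------------------------------------------
p0     DEVICE-ERROR     u0     69.25 GB    145226112     WD-WMAKE2379003
p1     OK               u1     69.25 GB    145226112     WD-WMAKE2379069
p2     OK               u0     69.25 GB    145226112     WD-WMAKE2379066
"""
print(failing_ports(sample))
```

The serial number in the result matches the one in the smartd email, which confirms that both tools are complaining about the same physical drive.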
I didn’t try to figure out how to start the rebuild, so I rebooted the server. Yeah, hackish.
If you know what I should have done, please let me know.
After the reboot
When the server came back up, the spare on p1 had taken the place of p0 and the rebuild was already under way:
[dan@supernews:~] $ uptime
10:21PM  up 2 mins, 1 user, load averages: 1.06, 0.36, 0.14

[dan@supernews:~] $ sudo tw_cli info c0 u0

Unit     UnitType  Status         %RCmpl  %V/I/M  Port  Stripe  Size(GB)
------------------------------------------------------------------------
u0       RAID-10   REBUILDING     68%     -       -     64K     195.548
u0-0     RAID-1    REBUILDING     4%      -       -     -       -
u0-0-0   DISK      DEGRADED       -       -       p1    -       65.1826
u0-0-1   DISK      OK             -       -       p2    -       65.1826
u0-1     RAID-1    OK             -       -       -     -       -
u0-1-0   DISK      OK             -       -       p6    -       65.1826
u0-1-1   DISK      OK             -       -       p5    -       65.1826
u0-2     RAID-1    OK             -       -       -     -       -
u0-2-0   DISK      OK             -       -       p3    -       65.1826
u0-2-1   DISK      OK             -       -       p4    -       65.1826
u0/v0    Volume    -              -       -       -     -       195.548
By the time I’d finished typing all of the above:
[dan@supernews:~] $ sudo tw_cli info c0 u0

Unit     UnitType  Status         %RCmpl  %V/I/M  Port  Stripe  Size(GB)
------------------------------------------------------------------------
u0       RAID-10   REBUILDING     78%     -       -     64K     195.548
u0-0     RAID-1    REBUILDING     35%     -       -     -       -
u0-0-0   DISK      DEGRADED       -       -       p1    -       65.1826
u0-0-1   DISK      OK             -       -       p2    -       65.1826
u0-1     RAID-1    OK             -       -       -     -       -
u0-1-0   DISK      OK             -       -       p6    -       65.1826
u0-1-1   DISK      OK             -       -       p5    -       65.1826
u0-2     RAID-1    OK             -       -       -     -       -
u0-2-0   DISK      OK             -       -       p3    -       65.1826
u0-2-1   DISK      OK             -       -       p4    -       65.1826
u0/v0    Volume    -              -       -       -     -       195.548
It took about 30 minutes to complete the rebuild:
[dan@supernews:~] $ sudo tw_cli info c0 u0

Unit     UnitType  Status         %RCmpl  %V/I/M  Port  Stripe  Size(GB)
------------------------------------------------------------------------
u0       RAID-10   REBUILDING     78%     -       -     64K     195.548
u0-0     RAID-1    REBUILDING     35%     -       -     -       -
u0-0-0   DISK      DEGRADED       -       -       p1    -       65.1826
u0-0-1   DISK      OK             -       -       p2    -       65.1826
u0-1     RAID-1    OK             -       -       -     -       -
u0-1-0   DISK      OK             -       -       p6    -       65.1826
u0-1-1   DISK      OK             -       -       p5    -       65.1826
u0-2     RAID-1    OK             -       -       -     -       -
u0-2-0   DISK      OK             -       -       p3    -       65.1826
u0-2-1   DISK      OK             -       -       p4    -       65.1826
u0/v0    Volume    -              -       -       -     -       195.548

[dan@supernews:~] $ uptime
10:47PM  up 29 mins, 2 users, load averages: 0.41, 0.37, 0.35
I notice that /var/log/messages put it at about 27 minutes:
Aug 8 22:20:22 supernews kernel: twa0: INFO: (0x04: 0x000B): Rebuild started: unit=0, subunit=0
Aug 8 22:47:40 supernews kernel: twa0: INFO: (0x04: 0x0005): Rebuild completed: unit=0, subunit=0
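To check the arithmetic, the two timestamps can be subtracted in Python. This is only a sketch: syslog lines carry no year, so the `year` parameter is a placeholder (any year gives the same result for two entries on the same day), and the `%b` month parsing assumes an English/C locale. The function name is made up for illustration.

```python
from datetime import datetime


def syslog_time(line, year=2000):
    """Parse the leading 'Mon D HH:MM:SS' of a syslog line.

    The year is a placeholder because syslog does not record one.
    """
    month, day, clock = line.split()[:3]
    return datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")


started = (
    "Aug 8 22:20:22 supernews kernel: twa0: INFO: (0x04: 0x000B): "
    "Rebuild started: unit=0, subunit=0"
)
completed = (
    "Aug 8 22:47:40 supernews kernel: twa0: INFO: (0x04: 0x0005): "
    "Rebuild completed: unit=0, subunit=0"
)

elapsed = syslog_time(completed) - syslog_time(started)
print(elapsed)  # 0:27:18
```

That works out to 27 minutes and 18 seconds.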
But wait! There’s more!
When the above was completed, I went to Nagios and told it to recheck the faulted items. They came back clean, but a new one appeared: VERIFYING.
Checking the logs again, I found:
Aug 8 22:49:22 supernews kernel: twa0: INFO: (0x04: 0x0029): Verify started: unit=0, subunit=0
Aug 8 22:49:22 supernews kernel: twa0: INFO: (0x04: 0x0029): Verify started: unit=0, subunit=1
Aug 8 22:49:22 supernews kernel: twa0: INFO: (0x04: 0x0029): Verify started: unit=0, subunit=2
The current status is:
[dan@supernews:~] $ sudo tw_cli info c0 u0

Unit     UnitType  Status         %RCmpl  %V/I/M  Port  Stripe  Size(GB)
------------------------------------------------------------------------
u0       RAID-10   VERIFYING      -       15%     -     64K     195.548
u0-0     RAID-1    VERIFYING      15%     -       -     -       -
u0-0-0   DISK      OK             -       -       p1    -       65.1826
u0-0-1   DISK      OK             -       -       p2    -       65.1826
u0-1     RAID-1    VERIFYING      15%     -       -     -       -
u0-1-0   DISK      OK             -       -       p6    -       65.1826
u0-1-1   DISK      OK             -       -       p5    -       65.1826
u0-2     RAID-1    VERIFYING      15%     -       -     -       -
u0-2-0   DISK      OK             -       -       p3    -       65.1826
u0-2-1   DISK      OK             -       -       p4    -       65.1826
u0/v0    Volume    -              -       -       -     -       195.548

[dan@supernews:~] $
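A Nagios-style check over this table could look roughly like the Python sketch below. This is not the plugin running on supernews. The exit codes follow the usual Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL), but the mapping from unit status to severity is an assumption, and so are the function names. It expects one table row per line, and the sample is a shortened copy of the output above.

```python
import re

# Usual Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2

# Hypothetical severity mapping; adjust to taste.
SEVERITY = {
    "OK": OK,
    "VERIFYING": WARNING,
    "REBUILDING": WARNING,
    "DEGRADED": CRITICAL,
}


def unit_states(unit_output):
    """Map each unit or subunit (u0, u0-0, u0-0-1, ...) to its Status column."""
    states = {}
    for line in unit_output.splitlines():
        fields = line.split()
        # The header, separator and u0/v0 volume rows do not match the pattern.
        if len(fields) >= 3 and re.fullmatch(r"u\d+(-\d+)*", fields[0]):
            # tw_cli sometimes marks a status with a trailing '*'; drop it.
            states[fields[0]] = fields[2].rstrip("*")
    return states


def worst_severity(states):
    """Unknown statuses, or no parsed rows at all, count as CRITICAL."""
    return max((SEVERITY.get(s, CRITICAL) for s in states.values()), default=CRITICAL)


sample = """\
Unit     UnitType  Status         %RCmpl  %V/I/M  Port  Stripe  Size(GB)
------------------------------------------------------------------------
u0       RAID-10   VERIFYING      -       15%     -     64K     195.548
u0-0     RAID-1    VERIFYING      15%     -       -     -       -
u0-0-0   DISK      OK             -       -       p1    -       65.1826
u0-0-1   DISK      OK             -       -       p2    -       65.1826
u0/v0    Volume    -              -       -       -     -       195.548
"""
states = unit_states(sample)
print(states, worst_severity(states))
```

Treating VERIFYING as a warning rather than a failure matches how it showed up here: the earlier faults cleared and a new, less alarming item appeared.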
I think this is similar to a zpool scrub.
Good night
Now it is time for beer and pizza. It’s Thursday night.