This post is based on a tweet and was written after a follow-up incident occurred today. It consolidates the information into one blog post so I can easily find it later.
Details about this host (disks, zpool, gpart, etc.) are in this post.
On March 15 2022, I noticed these messages in /var/log/messages:
Mar 15 13:36:02 r720-01 kernel: mps1: mpssas_prepare_remove: Sending reset for target ID 22
Mar 15 13:36:02 r720-01 kernel: da13 at mps1 bus 0 scbus8 target 22 lun 0
Mar 15 13:36:02 r720-01 kernel: da13: <ATA Samsung SSD 850 3B6Q> s/n S3PTNF0JA11513Y detached
Mar 15 13:36:02 r720-01 kernel: mps1: No pending commands: starting remove_device
Mar 15 13:36:02 r720-01 kernel: mps1: Unfreezing devq for target ID 22
Mar 15 13:36:03 r720-01 kernel: (da13:mps1:0:22:0): Periph destroyed
Mar 15 13:36:03 r720-01 ZFS[76470]: vdev state changed, pool_guid=1975810868733347630 vdev_guid=11376585178559251170
Mar 15 13:36:03 r720-01 ZFS[76474]: vdev is removed, pool_guid=1975810868733347630 vdev_guid=11376585178559251170
Mar 15 13:36:04 r720-01 kernel: da13 at mps1 bus 0 scbus8 target 22 lun 0
Mar 15 13:36:04 r720-01 kernel: da13: <ATA Samsung SSD 850 3B6Q> Fixed Direct Access SPC-4 SCSI device
Mar 15 13:36:04 r720-01 kernel: da13: Serial Number S3PTNF0JA11513Y
Mar 15 13:36:04 r720-01 kernel: da13: 600.000MB/s transfers
Mar 15 13:36:04 r720-01 kernel: da13: Command Queueing enabled
Mar 15 13:36:04 r720-01 kernel: da13: 476940MB (976773168 512 byte sectors)
Mar 15 13:36:04 r720-01 kernel: da13: quirks=0x8<4K>
The zpool was degraded at that point, but I don’t have a record of that message.
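When digging through /var/log/messages for events like this, it helps to filter on the mps driver messages and the ZFS event logger together. Here is a sketch of that filter; it runs against an inline sample (lines from the log above) so it works anywhere, and on the real host you would point the grep at /var/log/messages instead.

```shell
# Sketch: filter device-removal and ZFS vdev events out of a log.
# Uses an inline sample so it runs anywhere; on the real host,
# grep /var/log/messages instead.
sample='Mar 15 13:36:02 r720-01 kernel: mps1: mpssas_prepare_remove: Sending reset for target ID 22
Mar 15 13:36:03 r720-01 ZFS[76474]: vdev is removed, pool_guid=1975810868733347630 vdev_guid=11376585178559251170
Mar 15 13:36:04 r720-01 kernel: da13: Command Queueing enabled'

# Keep only mps removal resets and ZFS vdev state messages.
printf '%s\n' "$sample" | grep -E 'mpssas_prepare_remove|ZFS\[[0-9]+\]'
```

The third sample line (plain da13 probe chatter) is dropped; only the reset and the vdev event survive.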
The tweet also mentions:
kernel mps1: mpssas_prepare_remove: Sending reset for target ID 22
The tweet mentions (via this gist) “It is lovely how zfs status tells you what commands to run.” – so presumably it told me to do this (as also found in the gist):
[r720-01 dan ~] % sudo zpool online tank_fast da13p1
[r720-01 dan ~] % zpool status tank_fast
  pool: tank_fast
 state: ONLINE
  scan: resilvered 607M in 00:00:05 with 0 errors on Wed Mar 16 17:37:56 2022
config:

	NAME        STATE     READ WRITE CKSUM
	tank_fast   ONLINE       0     0     0
	  mirror-0  ONLINE       0     0     0
	    da14p1  ONLINE       0     0     0
	    da13p1  ONLINE       0     0     0
	  mirror-1  ONLINE       0     0     0
	    da11p1  ONLINE       0     0     0
	    da12p1  ONLINE       0     0     0

errors: No known data errors
[r720-01 dan ~] %
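The resilver stats live on the “scan:” line of that output, and they can be scraped out with a one-liner if you want to record them. A small awk sketch, assuming the exact wording shown above:

```shell
# Sketch: pull the resilvered size and duration out of the
# "scan:" line of zpool status output. Field positions assume
# the exact wording shown above.
scan='  scan: resilvered 607M in 00:00:05 with 0 errors on Wed Mar 16 17:37:56 2022'
printf '%s\n' "$scan" | awk '{print "resilvered", $3, "in", $5}'
# → resilvered 607M in 00:00:05
```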
Right after the above, this showed up in the logs:
Mar 16 17:37:51 r720-01 ZFS[64034]: vdev state changed, pool_guid=1975810868733347630 vdev_guid=11376585178559251170
Today, August 29 2022, I found this in /var/log/messages:
Aug 29 04:15:44 r720-01 kernel: mps1: IOC Fault 0x40007e23, Resetting
Aug 29 04:15:44 r720-01 kernel: mps1: Reinitializing controller
Aug 29 04:15:44 r720-01 kernel: mps1: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd
Aug 29 04:15:44 r720-01 kernel: mps1: IOCCapabilities: 5a85c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,MSIXIndex,HostDisc>
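The fault code in that first line is the useful bit when searching LSI/Broadcom documentation or forum posts. A sketch for extracting it from the kernel message, using the line above as a canned sample:

```shell
# Sketch: extract the IOC fault code from the kernel message above.
# Handy when searching vendor docs for what the code means.
line='Aug 29 04:15:44 r720-01 kernel: mps1: IOC Fault 0x40007e23, Resetting'
printf '%s\n' "$line" | sed -n 's/.*IOC Fault \(0x[0-9a-f]*\),.*/\1/p'
# → 0x40007e23
```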
zpool status
The zpool status is fine:
[r720-01 dan ~] % zpool status                                        12:32:28
  pool: data01
 state: ONLINE
  scan: scrub repaired 0B in 00:27:01 with 0 errors on Mon Aug 29 04:36:50 2022
config:

	NAME           STATE     READ WRITE CKSUM
	data01         ONLINE       0     0     0
	  mirror-0     ONLINE       0     0     0
	    gpt/data1  ONLINE       0     0     0
	    gpt/data2  ONLINE       0     0     0
	  mirror-1     ONLINE       0     0     0
	    gpt/data3  ONLINE       0     0     0
	    gpt/data4  ONLINE       0     0     0
	  mirror-2     ONLINE       0     0     0
	    gpt/data5  ONLINE       0     0     0
	    gpt/data6  ONLINE       0     0     0
	  mirror-3     ONLINE       0     0     0
	    gpt/data7  ONLINE       0     0     0
	    gpt/data8  ONLINE       0     0     0

errors: No known data errors

  pool: tank_fast
 state: ONLINE
  scan: scrub repaired 0B in 00:10:26 with 0 errors on Mon Aug 29 04:20:44 2022
config:

	NAME        STATE     READ WRITE CKSUM
	tank_fast   ONLINE       0     0     0
	  mirror-0  ONLINE       0     0     0
	    da14p1  ONLINE       0     0     0
	    da13p1  ONLINE       0     0     0
	  mirror-1  ONLINE       0     0     0
	    da11p1  ONLINE       0     0     0
	    da12p1  ONLINE       0     0     0

errors: No known data errors

  pool: zroot
 state: ONLINE
  scan: scrub repaired 0B in 00:03:05 with 0 errors on Mon Aug 29 04:13:26 2022
config:

	NAME          STATE     READ WRITE CKSUM
	zroot         ONLINE       0     0     0
	  mirror-0    ONLINE       0     0     0
	    gpt/zfs0  ONLINE       0     0     0
	    gpt/zfs1  ONLINE       0     0     0

errors: No known data errors
[r720-01 dan ~] %                                                     12:33:44
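Since these controller faults keep recurring, a small periodic check that flags any pool not in the ONLINE state would catch the next one sooner. On the real host the input would come from `zpool list -H -o name,health`; this sketch parses a canned sample (with a hypothetical DEGRADED pool) so the pipeline itself can be exercised anywhere:

```shell
# Sketch of a cron-able pool health check. On the real host:
#   zpool list -H -o name,health
# Here we parse a canned sample (DEGRADED entry is hypothetical)
# so the awk pipeline runs anywhere.
status='data01	ONLINE
tank_fast	DEGRADED
zroot	ONLINE'
printf '%s\n' "$status" | awk '$2 != "ONLINE" {print $1, "needs attention:", $2}'
# → tank_fast needs attention: DEGRADED
```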