This post is based on a tweet and was written after a follow-up incident occurred today. It consolidates the information into one blog post so I can easily find it later.
Details about this host (disks, zpool, gpart, etc.) are in this post.
On March 15 2022, I noticed these messages in /var/log/messages:
Mar 15 13:36:02 r720-01 kernel: mps1: mpssas_prepare_remove: Sending reset for target ID 22
Mar 15 13:36:02 r720-01 kernel: da13 at mps1 bus 0 scbus8 target 22 lun 0
Mar 15 13:36:02 r720-01 kernel: da13: <ATA Samsung SSD 850 3B6Q> s/n S3PTNF0JA11513Y detached
Mar 15 13:36:02 r720-01 kernel: mps1: No pending commands: starting remove_device
Mar 15 13:36:02 r720-01 kernel: mps1: Unfreezing devq for target ID 22
Mar 15 13:36:03 r720-01 kernel: (da13:mps1:0:22:0): Periph destroyed
Mar 15 13:36:03 r720-01 ZFS[76470]: vdev state changed, pool_guid=1975810868733347630 vdev_guid=11376585178559251170
Mar 15 13:36:03 r720-01 ZFS[76474]: vdev is removed, pool_guid=1975810868733347630 vdev_guid=11376585178559251170
Mar 15 13:36:04 r720-01 kernel: da13 at mps1 bus 0 scbus8 target 22 lun 0
Mar 15 13:36:04 r720-01 kernel: da13: <ATA Samsung SSD 850 3B6Q> Fixed Direct Access SPC-4 SCSI device
Mar 15 13:36:04 r720-01 kernel: da13: Serial Number S3PTNF0JA11513Y
Mar 15 13:36:04 r720-01 kernel: da13: 600.000MB/s transfers
Mar 15 13:36:04 r720-01 kernel: da13: Command Queueing enabled
Mar 15 13:36:04 r720-01 kernel: da13: 476940MB (976773168 512 byte sectors)
Mar 15 13:36:04 r720-01 kernel: da13: quirks=0x8<4K>
The zpool was degraded at that point, but I don’t have a record of that message.
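When digging through /var/log/messages for events like this, it helps to filter on the mps driver messages and the ZFS event logger together. Here is a sketch of that filter; it runs against an inline sample (lines from the log above) so it works anywhere, and on the real host you would point the grep at /var/log/messages instead.

```shell
# Sketch: filter device-removal and ZFS vdev events out of a log.
# Uses an inline sample so it runs anywhere; on the real host,
# grep /var/log/messages instead.
sample='Mar 15 13:36:02 r720-01 kernel: mps1: mpssas_prepare_remove: Sending reset for target ID 22
Mar 15 13:36:03 r720-01 ZFS[76474]: vdev is removed, pool_guid=1975810868733347630 vdev_guid=11376585178559251170
Mar 15 13:36:04 r720-01 kernel: da13: Command Queueing enabled'

# Keep only mps removal resets and ZFS vdev state messages.
printf '%s\n' "$sample" | grep -E 'mpssas_prepare_remove|ZFS\[[0-9]+\]'
```

The third sample line (plain da13 probe chatter) is dropped; only the reset and the vdev event survive.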
The tweet also mentions:
kernel mps1: mpssas_prepare_remove: Sending reset for target ID 22
The tweet mentions (via this gist) “It is lovely how zfs status tells you what commands to run.” – so presumably it told me to do this (as also found in the gist):
[r720-01 dan ~] % sudo zpool online tank_fast da13p1
[r720-01 dan ~] % zpool status tank_fast
  pool: tank_fast
 state: ONLINE
  scan: resilvered 607M in 00:00:05 with 0 errors on Wed Mar 16 17:37:56 2022
config:

	NAME        STATE     READ WRITE CKSUM
	tank_fast   ONLINE       0     0     0
	  mirror-0  ONLINE       0     0     0
	    da14p1  ONLINE       0     0     0
	    da13p1  ONLINE       0     0     0
	  mirror-1  ONLINE       0     0     0
	    da11p1  ONLINE       0     0     0
	    da12p1  ONLINE       0     0     0

errors: No known data errors
[r720-01 dan ~] %
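The resilver stats live on the “scan:” line of that output, and they can be scraped out with a one-liner if you want to record them. A small awk sketch, assuming the exact wording shown above:

```shell
# Sketch: pull the resilvered size and duration out of the
# "scan:" line of zpool status output. Field positions assume
# the exact wording shown above.
scan='  scan: resilvered 607M in 00:00:05 with 0 errors on Wed Mar 16 17:37:56 2022'
printf '%s\n' "$scan" | awk '{print "resilvered", $3, "in", $5}'
# → resilvered 607M in 00:00:05
```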
Right after the above, this showed up in the logs:
Mar 16 17:37:51 r720-01 ZFS[64034]: vdev state changed, pool_guid=1975810868733347630 vdev_guid=11376585178559251170
Today, August 29 2022, I found this in /var/log/messages:
Aug 29 04:15:44 r720-01 kernel: mps1: IOC Fault 0x40007e23, Resetting
Aug 29 04:15:44 r720-01 kernel: mps1: Reinitializing controller
Aug 29 04:15:44 r720-01 kernel: mps1: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd
Aug 29 04:15:44 r720-01 kernel: mps1: IOCCapabilities: 5a85c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,MSIXIndex,HostDisc>
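The fault code in that first line is the useful bit when searching LSI/Broadcom documentation or forum posts. A sketch for extracting it from the kernel message, using the line above as a canned sample:

```shell
# Sketch: extract the IOC fault code from the kernel message above.
# Handy when searching vendor docs for what the code means.
line='Aug 29 04:15:44 r720-01 kernel: mps1: IOC Fault 0x40007e23, Resetting'
printf '%s\n' "$line" | sed -n 's/.*IOC Fault \(0x[0-9a-f]*\),.*/\1/p'
# → 0x40007e23
```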
zpool status
The zpool status is fine:
[r720-01 dan ~] % zpool status                                        12:32:28
  pool: data01
 state: ONLINE
  scan: scrub repaired 0B in 00:27:01 with 0 errors on Mon Aug 29 04:36:50 2022
config:

	NAME           STATE     READ WRITE CKSUM
	data01         ONLINE       0     0     0
	  mirror-0     ONLINE       0     0     0
	    gpt/data1  ONLINE       0     0     0
	    gpt/data2  ONLINE       0     0     0
	  mirror-1     ONLINE       0     0     0
	    gpt/data3  ONLINE       0     0     0
	    gpt/data4  ONLINE       0     0     0
	  mirror-2     ONLINE       0     0     0
	    gpt/data5  ONLINE       0     0     0
	    gpt/data6  ONLINE       0     0     0
	  mirror-3     ONLINE       0     0     0
	    gpt/data7  ONLINE       0     0     0
	    gpt/data8  ONLINE       0     0     0

errors: No known data errors

  pool: tank_fast
 state: ONLINE
  scan: scrub repaired 0B in 00:10:26 with 0 errors on Mon Aug 29 04:20:44 2022
config:

	NAME        STATE     READ WRITE CKSUM
	tank_fast   ONLINE       0     0     0
	  mirror-0  ONLINE       0     0     0
	    da14p1  ONLINE       0     0     0
	    da13p1  ONLINE       0     0     0
	  mirror-1  ONLINE       0     0     0
	    da11p1  ONLINE       0     0     0
	    da12p1  ONLINE       0     0     0

errors: No known data errors

  pool: zroot
 state: ONLINE
  scan: scrub repaired 0B in 00:03:05 with 0 errors on Mon Aug 29 04:13:26 2022
config:

	NAME          STATE     READ WRITE CKSUM
	zroot         ONLINE       0     0     0
	  mirror-0    ONLINE       0     0     0
	    gpt/zfs0  ONLINE       0     0     0
	    gpt/zfs1  ONLINE       0     0     0

errors: No known data errors
[r720-01 dan ~] %                                                     12:33:44
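Since these controller faults keep recurring, a small periodic check that flags any pool not in the ONLINE state would catch the next one sooner. On the real host the input would come from `zpool list -H -o name,health`; this sketch parses a canned sample (with a hypothetical DEGRADED pool) so the pipeline itself can be exercised anywhere:

```shell
# Sketch of a cron-able pool health check. On the real host:
#   zpool list -H -o name,health
# Here we parse a canned sample (DEGRADED entry is hypothetical)
# so the awk pipeline runs anywhere.
status='data01	ONLINE
tank_fast	DEGRADED
zroot	ONLINE'
printf '%s\n' "$status" | awk '$2 != "ONLINE" {print $1, "needs attention:", $2}'
# → tank_fast needs attention: DEGRADED
```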