Here I am, sitting on a beach, writing a blog post, and sipping a cool adult beverage. Reading email.
I see this:
```
Aug 14 08:49:25 knew kernel: mps0: IOC Fault 0x40007e23, Resetting
Aug 14 08:49:25 knew kernel: mps0: Reinitializing controller
Aug 14 08:49:25 knew kernel: mps0: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd
Aug 14 08:49:25 knew kernel: mps0: IOCCapabilities: 5a85c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,MSIXIndex,HostDisc>
Aug 14 08:49:25 knew kernel: (da10:mps0:0:20:0): Invalidating pack
Aug 14 08:49:25 knew kernel: da10 at mps0 bus 0 scbus0 target 20 lun 0
Aug 14 08:49:25 knew kernel: da10: <ATA TOSHIBA HDWE150 FP2A> s/n 4728K24SF57D detached
Aug 14 08:49:25 knew kernel: GEOM_MIRROR: Device swap: provider da10p2 disconnected.
Aug 14 08:49:25 knew ZFS[2544]: vdev I/O failure, zpool=system path=/dev/da10p3 offset=270336 size=8192 error=6
Aug 14 08:49:33 knew kernel: (da10:mps0:0:20:0): Periph destroyed
Aug 14 08:49:33 knew ZFS[2576]: vdev state changed, pool_guid=15378250086669402288 vdev_guid=233954150417046622
Aug 14 08:49:33 knew ZFS[2580]: vdev is removed, pool_guid=15378250086669402288 vdev_guid=233954150417046622
Aug 14 08:49:33 knew kernel: da10 at mps0 bus 0 scbus0 target 20 lun 0
Aug 14 08:49:33 knew kernel: da10: <ATA TOSHIBA HDWE150 FP2A> Fixed Direct Access SPC-4 SCSI device
Aug 14 08:49:33 knew kernel: da10: Serial Number 4728K24SF57D
Aug 14 08:49:33 knew kernel: da10: 600.000MB/s transfers
Aug 14 08:49:33 knew kernel: da10: Command Queueing enabled
Aug 14 08:49:33 knew kernel: da10: 4769307MB (9767541168 512 byte sectors)
Aug 14 08:49:34 knew ZFS[2623]: vdev state changed, pool_guid=15378250086669402288 vdev_guid=233954150417046622
```
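Those `ZFS[…]` event lines carry the pool and vdev GUIDs, which is what makes the detective work below possible. They can be pulled out of a log line with sed; a minimal sketch, with one of the log lines above pasted in as sample data rather than read from `/var/log/messages`:

```shell
# One of the ZFS event lines from above, pasted in as sample data
line='Aug 14 08:49:33 knew ZFS[2576]: vdev state changed, pool_guid=15378250086669402288 vdev_guid=233954150417046622'

# Extract the two GUIDs from the key=value fields
pool_guid=$(printf '%s\n' "$line" | sed -n 's/.*pool_guid=\([0-9]*\).*/\1/p')
vdev_guid=$(printf '%s\n' "$line" | sed -n 's/.*vdev_guid=\([0-9]*\).*/\1/p')

echo "pool_guid=$pool_guid"
echo "vdev_guid=$vdev_guid"
```

On the live host, the same sed expressions could be fed from `grep ZFS /var/log/messages` instead of a shell variable.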
I quickly ssh into the host to check zpool status:
```
[knew dan ~] % zpool status
  pool: nvd
 state: ONLINE
  scan: scrub repaired 0B in 00:08:55 with 0 errors on Wed Aug 10 05:09:42 2022
config:

	NAME        STATE     READ WRITE CKSUM
	nvd         ONLINE       0     0     0
	  mirror-0  ONLINE       0     0     0
	    nvd0p1  ONLINE       0     0     0
	    nvd1p1  ONLINE       0     0     0

errors: No known data errors

  pool: system
 state: ONLINE
  scan: resilvered 56K in 00:00:05 with 0 errors on Sun Aug 14 08:49:48 2022
config:

	NAME        STATE     READ WRITE CKSUM
	system      ONLINE       0     0     0
	  raidz2-0  ONLINE       0     0     0
	    da3p3   ONLINE       0     0     0
	    da10p3  ONLINE       0     0     0
	    da9p3   ONLINE       0     0     0
	    da2p3   ONLINE       0     0     0
	    da13p3  ONLINE       0     0     0
	    da15p3  ONLINE       0     0     0
	    da11p3  ONLINE       0     0     0
	    da14p3  ONLINE       0     0     0
	    da8p3   ONLINE       0     0     0
	    da7p3   ONLINE       0     0     0
	  raidz2-1  ONLINE       0     0     0
	    da5p1   ONLINE       0     0     0
	    da6p1   ONLINE       0     0     0
	    da19p1  ONLINE       0     0     0
	    da12p1  ONLINE       0     0     0
	    da4p1   ONLINE       0     0     0
	    da1p1   ONLINE       0     0     0
	    da22p1  ONLINE       0     0     0
	    da16p1  ONLINE       0     0     0
	    da0p1   ONLINE       0     0     0
	    da18p1  ONLINE       0     0     0

errors: No known data errors

  pool: tank_fast01
 state: ONLINE
  scan: scrub repaired 0B in 00:09:00 with 0 errors on Wed Aug 10 05:10:05 2022
config:

	NAME                             STATE     READ WRITE CKSUM
	tank_fast01                      ONLINE       0     0     0
	  mirror-0                       ONLINE       0     0     0
	    gpt/S3Z8NB0KB11776R.Slot.11  ONLINE       0     0     0
	    gpt/S3Z8NB0KB11784L.Slot.05  ONLINE       0     0     0

errors: No known data errors

  pool: zroot
 state: ONLINE
  scan: scrub repaired 0B in 00:00:37 with 0 errors on Wed Aug 10 05:01:43 2022
config:

	NAME        STATE     READ WRITE CKSUM
	zroot       ONLINE       0     0     0
	  mirror-0  ONLINE       0     0     0
	    ada1p3  ONLINE       0     0     0
	    ada0p3  ONLINE       0     0     0

errors: No known data errors
```
Lines 15-17 (the `pool: system` stanza) are the relevant ones. There was a resilver event, which completed at 08:49:48.
The vdev state changed event occurred at 08:49:34
That all seems to tie in, time-wise.
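With four pools in the output, it can be handy to pull out just the scan line for each pool when checking timestamps. A small awk sketch, run here against a pasted-in excerpt of the `zpool status` output above (on the live host you would pipe `zpool status` straight into the same awk program):

```shell
# Excerpt of the zpool status output above, pasted in as sample data
status='  pool: system
 state: ONLINE
  scan: resilvered 56K in 00:00:05 with 0 errors on Sun Aug 14 08:49:48 2022
  pool: zroot
 state: ONLINE
  scan: scrub repaired 0B in 00:00:37 with 0 errors on Wed Aug 10 05:01:43 2022'

# Remember the most recent pool name, then print it alongside each scan line
printf '%s\n' "$status" | awk '/pool:/ {pool=$2} /scan:/ {sub(/^ +/, ""); print pool ": " $0}'
```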
More info than you want
The -g flag displays vdev guids in place of device names:
```
[knew dan ~] % zpool status -g system
  pool: system
 state: ONLINE
  scan: resilvered 56K in 00:00:05 with 0 errors on Sun Aug 14 08:49:48 2022
config:

	NAME                      STATE     READ WRITE CKSUM
	system                    ONLINE       0     0     0
	  17787792673755622491    ONLINE       0     0     0
	    15077920823230281604  ONLINE       0     0     0
	    233954150417046622    ONLINE       0     0     0
	    15344441343903378304  ONLINE       0     0     0
	    15656418522176711912  ONLINE       0     0     0
	    5265466717725104926   ONLINE       0     0     0
	    16216204204940261481  ONLINE       0     0     0
	    15196254092467021631  ONLINE       0     0     0
	    892331977375855894    ONLINE       0     0     0
	    16797368702065798832  ONLINE       0     0     0
	    5993655369518912555   ONLINE       0     0     0
	  9085889268805187753     ONLINE       0     0     0
	    5892227802261634203   ONLINE       0     0     0
	    9332658639709199239   ONLINE       0     0     0
	    250004220145174872    ONLINE       0     0     0
	    6216472763074854678   ONLINE       0     0     0
	    12795310201775582855  ONLINE       0     0     0
	    13315402097660581553  ONLINE       0     0     0
	    18428760864140250121  ONLINE       0     0     0
	    13603535286907309607  ONLINE       0     0     0
	    4677401754715191854   ONLINE       0     0     0
	    1933292688604201684   ONLINE       0     0     0

errors: No known data errors
[knew dan ~] %
```
Line 11, guid 233954150417046622 (the second disk in the first raidz2 group, i.e. da10p3), shows the same vdev_guid as the log entries.
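Eyeballing an 18-digit GUID in a 22-row listing is error-prone, so a grep check is safer. A sketch against a pasted-in excerpt of that output (sample data only; on the host itself, `zpool status -g system | grep 233954150417046622` does the same job):

```shell
# Excerpt of the `zpool status -g system` output, pasted in as sample data
status_g='	  17787792673755622491    ONLINE       0     0     0
	    15077920823230281604  ONLINE       0     0     0
	    233954150417046622    ONLINE       0     0     0'

# The vdev_guid reported in the ZFS log messages
wanted=233954150417046622

# -w avoids matching the digits as a substring of some longer GUID
if printf '%s\n' "$status_g" | grep -qw "$wanted"; then
	echo "vdev $wanted is part of this pool"
fi
```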
Here is the zpool guid:
```
[knew dan ~] % zpool get guid system
NAME    PROPERTY  VALUE                 SOURCE
system  guid      15378250086669402288  -
[knew dan ~] %
```
That matches the pool_guid from the logs.
My concern: why did this happen? Everything recovered just fine. But why?