Today I noticed that all the drives in the R730 host had 150MB/s speeds mentioned in the boot messages. They are all SSDs on an SAS bus. They should be at 600.000MB/s.
The HBA in question is a AVAGO MegaRAID SAS FreeBSD (aka PERC H730P Mini). It can use either the mfi driver (default) or the mrsas driver (by setting hw.mfi.mrsas_enable=”1″ in /boot/loader.conf
This is an SSD in a PCIe slot:
ada1 at ahcich13 bus 0 scbus17 target 0 lun 0 ada1: <INTEL SSDSC2BB480G6K G201CS01> ACS-2 ATA SATA 3.x device ada1: Serial Number [redacted] ada1: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 512bytes) ada1: Command Queueing enabled ada1: 457862MB (937703088 512 byte sectors)
This is a WD Blue SSD recently purchased:
da11 at mrsas0 bus 1 scbus1 target 11 lun 0 da11: <ATA WDC WDS400T2B0A 20WD> Fixed Direct Access SPC-4 SCSI device da11: Serial Number [redacted] da11: 150.000MB/s transfers da11: 3815447MB (7814037168 512 byte sectors)
This came to my attention while I was adding new SSDs into the host. I pulled an existing drive and reinserted it to see what happened. The console also mentioned 150.000MB/s transfers for that drive. So it wasn’t just my new drives. I think it’s important to note this was a running system. Here is that remove/insert:
Feb 24 18:15:37 r730-01 kernel: (da7:mrsas0:1:7:0): Invalidating pack Feb 24 18:15:37 r730-01 kernel: da7 at mrsas0 bus 1 scbus1 target 7 lun 0 Feb 24 18:15:37 r730-01 kernel: da7: <TOSHIBA PX05SMB080Y AS0B> s/n Y7P0A02DTEVE detached Feb 24 18:15:37 r730-01 kernel: (da7:mrsas0:1:7:0): Periph destroyed Feb 24 18:15:37 r730-01 ZFS[47352]: vdev state changed, pool_guid=801222743406015344 vdev_guid=2468393772196895325 Feb 24 18:15:37 r730-01 ZFS[47356]: vdev is removed, pool_guid=801222743406015344 vdev_guid=2468393772196895325 Feb 24 18:15:43 r730-01 kernel: mrsas0: System PD deleted target ID: 0x7 Feb 24 18:15:54 r730-01 kernel: mrsas0: System PD created target ID: 0x7 Feb 24 18:15:54 r730-01 kernel: da7 at mrsas0 bus 1 scbus1 target 7 lun 0 Feb 24 18:15:54 r730-01 kernel: da7: <TOSHIBA PX05SMB080Y AS0B> Fixed Direct Access SPC-4 SCSI device Feb 24 18:15:54 r730-01 kernel: da7: Serial Number Y7P0A02DTEVE Feb 24 18:15:54 r730-01 kernel: da7: 150.000MB/s transfers Feb 24 18:15:54 r730-01 kernel: da7: 763097MB (1562824368 512 byte sectors)
A diskinfo test showed it was getting 460 MB/s, which far exceeds the mrsas value.
I thought: what speeds go I get from the mfi driver? I commented out the hw.mfi.mrsas_enable=”1″ in /boot/loader.conf and booted into single user mode.
Single user mode didn’t help because I could not see/find the messages I wanted. I disabled jails via sysrc jail_enable=”NO” and restarted the host.
After getting connected via ssh, I found there were no such messages created by the mfi driver.
Looking back in the logs now, this is what I found:
Feb 24 19:31:07 r730-01 kernel: mfi0: <Invader> port 0x3000-0x30ff mem 0x92600000-0x9260ffff,0x92500000-0x925fffff irq 26 at device 0.0 numa-domain 0 on pci3 Feb 24 19:31:07 r730-01 kernel: mfi0: Using MSI Feb 24 19:31:07 r730-01 kernel: mfi0: Megaraid SAS driver Ver 4.23 Feb 24 19:31:07 r730-01 kernel: mfi0: FW MaxCmds = 928, limiting to 128 Feb 24 19:31:07 r730-01 kernel: mfi0: MaxCmd = 928, Drv MaxCmd = 128, MaxSgl = 70, state = 0xb73c03a0 Feb 24 19:31:07 r730-01 kernel: mfi0: 18285 (730582070s/0x0020/info) - Shutdown command received from host Feb 24 19:31:07 r730-01 kernel: mfi0: 18286 (boot + 10s/0x0020/info) - Firmware initialization started (PCI ID 005d/1000/1f47/1028) Feb 24 19:31:07 r730-01 kernel: mfi0: 18287 (boot + 10s/0x0020/info) - Firmware version 4.300.00-8368 Feb 24 19:31:07 r730-01 kernel: mfi0: 18288 (boot + 11s/0x0008/info) - Battery Present Feb 24 19:31:07 r730-01 kernel: mfi0: 18289 (boot + 11s/0x0020/info) - Package version 25.5.9.0001 Feb 24 19:31:07 r730-01 kernel: mfi0: 18290 (boot + 11s/0x0020/info) - Board Revision A03 Feb 24 19:31:07 r730-01 kernel: mfi0: 18291 (boot + 15s/0x0008/info) - Battery temperature is normal Feb 24 19:31:07 r730-01 kernel: mfi0: 18292 (boot + 15s/0x0008/info) - Current capacity of the battery is above threshold Feb 24 19:31:07 r730-01 kernel: mfi0: 18293 (boot + 27s/0x0004/info) - Enclosure PD 20(c None/p1) communication restored Feb 24 19:31:07 r730-01 kernel: mfi0: 18294 (boot + 27s/0x0002/info) - Inserted: Encl PD 20 Feb 24 19:31:07 r730-01 kernel: mfisyspd0 numa-domain 0 on mfi0 Feb 24 19:31:07 r730-01 kernel: mfisyspd0: 763097MB (1562824368 sectors) SYSPD volume (deviceid: 0) Feb 24 19:31:07 r730-01 kernel: mfisyspd0: SYSPD volume attached Feb 24 19:31:07 r730-01 kernel: mfisyspd1 numa-domain 0 on mfi0 Feb 24 19:31:07 r730-01 kernel: mfisyspd1: 763097MB (1562824368 sectors) SYSPD volume (deviceid: 1) Feb 24 19:31:07 r730-01 kernel: mfisyspd1: SYSPD volume attached Feb 24 19:31:07 r730-01 kernel: mfisyspd2 numa-domain 0 on mfi0 Feb 24 19:31:07 r730-01 kernel: mfisyspd2: 763097MB (1562824368 sectors) SYSPD volume (deviceid: 2) Feb 24 19:31:07 r730-01 kernel: mfisyspd2: SYSPD volume attached Feb 24 19:31:07 r730-01 kernel: mfisyspd3 numa-domain 0 on mfi0 Feb 24 19:31:07 r730-01 kernel: mfisyspd3: 763097MB (1562824368 sectors) SYSPD volume (deviceid: 3) Feb 24 19:31:07 r730-01 kernel: mfisyspd3: SYSPD volume attached Feb 24 19:31:07 r730-01 kernel: mfisyspd4 numa-domain 0 on mfi0 Feb 24 19:31:07 r730-01 kernel: mfisyspd4: 763097MB (1562824368 sectors) SYSPD volume (deviceid: 4) Feb 24 19:31:07 r730-01 kernel: mfisyspd4: SYSPD volume attached Feb 24 19:31:07 r730-01 kernel: mfisyspd5 numa-domain 0 on mfi0 Feb 24 19:31:07 r730-01 kernel: mfisyspd5: 763097MB (1562824368 sectors) SYSPD volume (deviceid: 5) Feb 24 19:31:07 r730-01 kernel: mfisyspd5: SYSPD volume attached Feb 24 19:31:07 r730-01 kernel: mfisyspd6 numa-domain 0 on mfi0 Feb 24 19:31:07 r730-01 kernel: mfisyspd6: 763097MB (1562824368 sectors) SYSPD volume (deviceid: 6) Feb 24 19:31:07 r730-01 kernel: mfisyspd6: SYSPD volume attached Feb 24 19:31:07 r730-01 kernel: mfisyspd7 numa-domain 0 on mfi0 Feb 24 19:31:07 r730-01 kernel: mfisyspd7: 763097MB (1562824368 sectors) SYSPD volume (deviceid: 7) Feb 24 19:31:07 r730-01 kernel: mfisyspd7: SYSPD volume attached Feb 24 19:31:07 r730-01 kernel: mfisyspd8 numa-domain 0 on mfi0 Feb 24 19:31:07 r730-01 kernel: mfisyspd8: 3815447MB (7814037168 sectors) SYSPD volume (deviceid: 8) Feb 24 19:31:07 r730-01 kernel: mfisyspd8: SYSPD volume attached Feb 24 19:31:07 r730-01 kernel: mfisyspd9 numa-domain 0 on mfi0 Feb 24 19:31:07 r730-01 kernel: mfisyspd9: 3815447MB (7814037168 sectors) SYSPD volume (deviceid: 9) Feb 24 19:31:07 r730-01 kernel: mfisyspd9: SYSPD volume attached Feb 24 19:31:07 r730-01 kernel: mfisyspd10 numa-domain 0 on mfi0 Feb 24 19:31:07 r730-01 kernel: mfisyspd10: 3815447MB (7814037168 sectors) SYSPD volume (deviceid: 10) Feb 24 19:31:07 r730-01 kernel: mfisyspd10: SYSPD volume attached Feb 24 19:31:07 r730-01 kernel: mfisyspd11 numa-domain 0 on mfi0 Feb 24 19:31:07 r730-01 kernel: mfisyspd11: 3815447MB (7814037168 sectors) SYSPD volume (deviceid: 11) Feb 24 19:31:07 r730-01 kernel: mfisyspd11: SYSPD volume attached Feb 24 19:31:07 r730-01 kernel: mfisyspd12 numa-domain 0 on mfi0 Feb 24 19:31:07 r730-01 kernel: mfisyspd12: 1831420MB (3750748848 sectors) SYSPD volume (deviceid: 14) Feb 24 19:31:07 r730-01 kernel: mfisyspd12: SYSPD volume attached Feb 24 19:31:07 r730-01 kernel: mfisyspd13 numa-domain 0 on mfi0 Feb 24 19:31:07 r730-01 kernel: mfisyspd13: 1831420MB (3750748848 sectors) SYSPD volume (deviceid: 15) Feb 24 19:31:07 r730-01 kernel: mfisyspd13: SYSPD volume attached Feb 24 19:31:07 r730-01 kernel: mfi0: 18295 (boot + 27s/0x0002/info) - Inserted: PD 20(c None/p1) Info: enclPd=20, scsiType=d, portMap=00, sasAddr=500056b32a7798fd,0000000000000000 Feb 24 19:31:07 r730-01 kernel: mfi0: 18296 (boot + 27s/0x0002/info) - Inserted: PD 00(e0x20/s0) Feb 24 19:31:07 r730-01 kernel: mfi0: 18297 (boot + 27s/0x0002/info) - Inserted: PD 00(e0x20/s0) Info: enclPd=20, scsiType=0, portMap=00, sasAddr=58ce38ee200f5bfa,0000000000000000 Feb 24 19:31:07 r730-01 kernel: mfi0: 18298 (boot + 27s/0x0002/info) - Inserted: PD 01(e0x20/s1) Feb 24 19:31:07 r730-01 kernel: mfi0: 18299 (boot + 27s/0x0002/info) - Inserted: PD 01(e0x20/s1) Info: enclPd=20, scsiType=0, portMap=00, sasAddr=58ce38ee200f5nvme0: Allocated 200MB host memory buffer Feb 24 19:31:07 r730-01 kernel: mfi0: 18300 (boot + 27s/0x0002/info) - Inserted: PD 02(e0x20/s2) Feb 24 19:31:07 r730-01 kernel: mfi0: 18301 (boot + 27s/0x0002/info) - Inserted: PD 02(e0x20/s2) Info: enclPd=20, scsiType=0, portMap=00, sasAddr=58ce38ee200f5d26,0000000000000000 Feb 24 19:31:07 r730-01 kernel: mfi0: 18302 (boot + 27s/0x0002/info) - Inserted: PD 03(e0x20/s3) Feb 24 19:31:07 r730-01 kernel: mfi0: 18303 (boot + 27s/0x0002/info) - Inserted: PD 03(e0x20/s3) Info: enclPd=20, scsiType=0, portMap=00, sasAddr=58ce38ee200f5nvd0: 238475MB (488397168 512 byte sectors) Feb 24 19:31:07 r730-01 kernel: mfi0: 18304 (boot + 27s/0x0002/info) - Inserted: PD 04(e0x20/s4) Feb 24 19:31:07 r730-01 kernel: mfi0: 18305 (boot + 27s/0x0002/info) - Inserted: PD 04(e0x20/s4) Info: enclPd=20, scsiType=0, portMap=00, sasAddr=58ce38ee200f5c76,0000000000000000 Feb 24 19:31:07 r730-01 kernel: mfi0: 18306 (boot + 27s/0x0002/info) - Inserted: PD 05(e0x20/s5) Feb 24 19:31:07 r730-01 kernel: mfi0: 18307 (boot + 27s/0x0002/info) - Inserted: PD 05(e0x20/s5) Info: enclPd=20, scsiType=0, portMap=00, sasAddr=58ce38ee200f5b1a,0000000000000000 Feb 24 19:31:07 r730-01 kernel: mfi0: 18308 (boot + 27s/0x0002/info) - Inserted: PD 06(e0x20/s6) Feb 24 19:31:07 r730-01 kernel: mfi0: 18309 (boot + 27s/0x0002/info) - Inserted: PD 06(e0x20/s6) Info: enclPd=20, scsiType=0, portMap=00, sasAddr=58ce38ee200f5bbe,0000000000000000 Feb 24 19:31:07 r730-01 kernel: mfi0: 18310 (boot + 27s/0x0002/info) - Inserted: PD 07(e0x20/s7) Feb 24 19:31:07 r730-01 kernel: mfi0: 18311 (boot + 27s/0x0002/info) - Inserted: PD 07(e0x20/s7) Info: enclPd=20, scsiType=0, portMap=00, sasAddr=58ce38ee200f5b86,0000000000000000 Feb 24 19:31:07 r730-01 kernel: mfi0: 18312 (boot + 27s/0x0002/info) - Inserted: PD 08(e0x20/s8) Feb 24 19:31:07 r730-01 kernel: mfi0: 18313 (boot + 27s/0x0002/info) - Inserted: PD 08(e0x20/s8) Info: enclPd=20, scsiType=0, portMap=00, sasAddr=500056b32a7798c8,0000000000000000 Feb 24 19:31:07 r730-01 kernel: mfi0: 18314 (boot + 27s/0x0002/WARN) - PD 08(e0x20/s8) is not a certified drive Feb 24 19:31:07 r730-01 kernel: mfi0: 18315 (boot + 27s/0x0002/info) - Inserted: PD 09(e0x20/s9) Feb 24 19:31:07 r730-01 kernel: mfi0: 18316 (boot + 27s/0x0002/info) - Inserted: PD 09(e0x20/s9) Info: enclPd=20, scsiType=0, portMap=00, sasAddr=500056b32a7798c9,0000000000000000 Feb 24 19:31:07 r730-01 kernel: mfi0: 18317 (boot + 27s/0x0002/WARN) - PD 09(e0x20/s9) is not a certified drive Feb 24 19:31:07 r730-01 kernel: mfi0: 18318 (boot + 27s/0x0002/info) - Inserted: PD 0a(e0x20/s10) Feb 24 19:31:07 r730-01 kernel: mfi0: 18319 (boot + 27s/0x0002/info) - Inserted: PD 0a(e0x20/s10) Info: enclPd=20, scsiType=0, portMap=00, sasAddr=500056b32a7798ca,0000000000000000 Feb 24 19:31:07 r730-01 kernel: mfi0: 18320 (boot + 27s/0x0002/WARN) - PD 0a(e0x20/s10) is not a certified drive Feb 24 19:31:07 r730-01 kernel: mfi0: 18321 (boot + 27s/0x0002/info) - Inserted: PD 0b(e0x20/s11) Feb 24 19:31:07 r730-01 kernel: mfi0: 18322 (boot + 27s/0x0002/info) - Inserted: PD 0b(e0x20/s11) Info: enclPd=20, scsiType=0, portMap=00, sasAddr=500056b32a7798cb,0000000000000000 Feb 24 19:31:07 r730-01 kernel: mfi0: 18323 (boot + 27s/0x0002/WARN) - PD 0b(e0x20/s11) is not a certified drive Feb 24 19:31:07 r730-01 kernel: mfi0: 18324 (boot + 27s/0x0002/info) - Inserted: PD 0e(e0x20/s14) Feb 24 19:31:07 r730-01 kernel: mfi0: 18325 (boot + 27s/0x0002/info) - Inserted: PD 0e(e0x20/s14) Info: enclPd=20, scsiType=0, portMap=00, sasAddr=500056b32a7798ce,0000000000000000 Feb 24 19:31:07 r730-01 kernel: mfi0: 18326 (boot + 27s/0x0002/info) - Inserted: PD 0f(e0x20/s15) Feb 24 19:31:07 r730-01 kernel: mfi0: 18327 (boot + 27s/0x0002/info) - Inserted: PD 0f(e0x20/s15) Info: enclPd=20, scsiType=0, portMap=00, sasAddr=500056b32a7798cf,0000000000000000 Feb 24 19:31:07 r730-01 kernel: mfi0: 18328 (boot + 27s/0x0020/info) - Controller operating temperature within normal range, full operation restored Feb 24 19:31:07 r730-01 kernel: mfi0: 18329 (730582125s/0x0020/info) - Time established as 02/24/23 19:28:45; (49 seconds since power on) Feb 24 19:31:07 r730-01 kernel: mfi0: 18330 (730582233s/0x0020/info) - Host driver is loaded and operational Feb 24 19:31:07 r730-01 kernel: mfi0: I/O error, cmd=0xfffffe02ed061eb0, status=0x3c, scsi_status=0
More ugently, these errors started right away:
Feb 24 19:31:07 r730-01 kernel: mfi0: I/O error, cmd=0xfffffe02ed061eb0, status=0x3c, scsi_status=0 Feb 24 19:31:07 r730-01 kernel: mfi0: sense error 127, sense_key 15, asc 72, ascq 139 Feb 24 19:31:07 r730-01 kernel: mfisyspd5: hard error cmd=read 574923824-574924375 Feb 24 19:31:07 r730-01 kernel: mfi0: I/O error, cmd=0xfffffe02ed061058, status=0x3c, scsi_status=0 Feb 24 19:31:07 r730-01 kernel: mfi0: sense error 21, sense_key 12, asc 198, ascq 5 Feb 24 19:31:07 r730-01 kernel: mfisyspd5: hard error cmd=read 574924376-574924927 Feb 24 19:31:07 r730-01 kernel: mfi0: I/O error, cmd=0xfffffe02ed060db0, status=0x3c, scsi_status=0 Feb 24 19:31:07 r730-01 kernel: mfi0: sense error 67, sense_key 0, asc 0, ascq 0 Feb 24 19:31:07 r730-01 kernel: mfisyspd5: hard error cmd=read 574924928-574925479 Feb 24 19:31:07 r730-01 kernel: mfi0: I/O error, cmd=0xfffffe02ed0610e0, status=0x3c, scsi_status=0 Feb 24 19:31:07 r730-01 kernel: mfi0: sense error 96, sense_key 0, asc 0, ascq 244 Feb 24 19:31:07 r730-01 kernel: mfisyspd6: hard error cmd=read 574923888-574924439 Feb 24 19:31:07 r730-01 kernel: mfi0: I/O error, cmd=0xfffffe02ed060e38, status=0x3c, scsi_status=0 Feb 24 19:31:07 r730-01 kernel: mfi0: sense error 0, sense_key 0, asc 246, ascq 255 Feb 24 19:31:07 r730-01 kernel: mfisyspd6: hard error cmd=read 574924440-574924991 Feb 24 19:31:07 r730-01 kernel: mfi0: I/O error, cmd=0xfffffe02ed05e5d8, status=0x3c, scsi_status=0 Feb 24 19:31:07 r730-01 kernel: mfi0: sense error 64, sense_key 6, asc 96, ascq 0 Feb 24 19:31:07 r730-01 kernel: mfisyspd6: hard error cmd=read 574924992-574925543 Feb 24 19:31:07 r730-01 kernel: mfi0: I/O error, cmd=0xfffffe02ed062048, status=0x3c, scsi_status=0 Feb 24 19:31:07 r730-01 kernel: mfi0: sense error 72, sense_key 4, asc 236, ascq 72 Feb 24 19:31:07 r730-01 kernel: mfisyspd0: hard error cmd=read 574923888-574924439 Feb 24 19:31:07 r730-01 kernel: mfi0: I/O error, cmd=0xfffffe02ed0619e8, status=0x3c, scsi_status=0 Feb 24 19:31:07 r730-01 kernel: mfi0: sense error 112, sense_key 13, asc 192, ascq 72 Feb 24 19:31:07 r730-01 kernel: mfisyspd6: hard error cmd=read 1341126416-1341126967 Feb 24 19:31:07 r730-01 kernel: mfi0: I/O error, cmd=0xfffffe02ed05f298, status=0x3c, scsi_status=0 Feb 24 19:31:07 r730-01 kernel: mfi0: sense error 36, sense_key 8, asc 131, ascq 236 Feb 24 19:31:07 r730-01 kernel: mfisyspd6: hard error cmd=read 1341127520-1341128071 Feb 24 19:31:07 r730-01 kernel: mfi0: I/O error, cmd=0xfffffe02ed05f320, status=0x3c, scsi_status=0 Feb 24 19:31:07 r730-01 kernel: mfi0: sense error 0, sense_key 8, asc 116, ascq 32 Feb 24 19:31:07 r730-01 kernel: mfisyspd5: hard error cmd=read 1341127536-1341128087 .. note the gap in time here... the full logs are shown. Feb 24 19:36:56 r730-01 kernel: mfi0: I/O error, cmd=0xfffffe02ed060ec0, status=0x3c, scsi_status=0 Feb 24 19:36:56 r730-01 kernel: mfi0: sense error 11, sense_key 4, asc 204, ascq 204 Feb 24 19:36:56 r730-01 kernel: mfisyspd2: hard error cmd=read 4423416-4423967 Feb 24 19:36:56 r730-01 kernel: mfi0: I/O error, cmd=0xfffffe02ed060068, status=0x3c, scsi_status=0 Feb 24 19:36:56 r730-01 kernel: mfi0: sense error 72, sense_key 4, asc 36, ascq 48 Feb 24 19:36:56 r730-01 kernel: mfisyspd6: hard error cmd=read 4424304-4424855 Feb 24 19:36:56 r730-01 kernel: mfi0: I/O error, cmd=0xfffffe02ed061410, status=0x3c, scsi_status=0 Feb 24 19:36:56 r730-01 kernel: mfi0: sense error 72, sense_key 4, asc 236, ascq 40 Feb 24 19:36:56 r730-01 kernel: mfisyspd1: hard error cmd=read 4424320-4424871 Feb 24 19:36:56 r730-01 kernel: mfi0: I/O error, cmd=0xfffffe02ed05ff58, status=0x3c, scsi_status=0 Feb 24 19:36:56 r730-01 kernel: mfi0: sense error 64, sense_key 5, asc 0, ascq 0 Feb 24 19:36:56 r730-01 kernel: mfisyspd5: hard error cmd=read 4424344-4424895 Feb 24 19:36:56 r730-01 kernel: mfi0: I/O error, cmd=0xfffffe02ed05f298, status=0x3c, scsi_status=0 Feb 24 19:36:56 r730-01 kernel: mfi0: sense error 36, sense_key 8, asc 131, ascq 236 Feb 24 19:36:56 r730-01 kernel: mfisyspd7: hard error cmd=read 4424336-4424887 Feb 24 19:36:56 r730-01 kernel: mfi0: I/O error, cmd=0xfffffe02ed0621e0, status=0x3c, scsi_status=0 Feb 24 19:36:56 r730-01 kernel: mfi0: sense error 53, sense_key 15, asc 72, ascq 195 Feb 24 19:36:56 r730-01 kernel: mfisyspd3: hard error cmd=read 4424896-4425447 Feb 24 19:36:56 r730-01 kernel: mfi0: I/O error, cmd=0xfffffe02ed060d28, status=0x3c, scsi_status=0 Feb 24 19:36:56 r730-01 kernel: mfi0: sense error 13, sense_key 3, asc 133, ascq 192 Feb 24 19:36:56 r730-01 kernel: mfisyspd3: hard error cmd=read 4426392-4426943 Feb 24 19:36:56 r730-01 kernel: mfi0: I/O error, cmd=0xfffffe02ed05fed0, status=0x3c, scsi_status=0 Feb 24 19:36:56 r730-01 kernel: mfi0: sense error 72, sense_key 12, asc 0, ascq 0 Feb 24 19:36:56 r730-01 kernel: mfisyspd3: hard error cmd=read 4426944-4427495 ...
For many pages…
At 19:32:52, I looked at zpool status:
[r730-01 dvl ~] % zpool status data01 pool: data01 state: ONLINE status: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected. action: Determine if the device needs to be replaced, and clear the errors using 'zpool clear' or replace the device with 'zpool replace'. see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P scan: resilvered 147M in 00:00:02 with 0 errors on Fri Feb 24 19:30:55 2023 config: NAME STATE READ WRITE CKSUM data01 ONLINE 0 0 0 raidz2-0 ONLINE 2 0 0 gpt/Y7P0A022TEVE ONLINE 0 0 2 gpt/Y7P0A02ATEVE ONLINE 351 0 1 gpt/Y7P0A02DTEVE ONLINE 0 0 129 gpt/Y7P0A02GTEVE ONLINE 378 0 0 gpt/Y7P0A02LTEVE ONLINE 163 0 1 gpt/Y7P0A02MTEVE ONLINE 0 0 2 gpt/Y7P0A02QTEVE ONLINE 0 0 2 gpt/Y7P0A033TEVE ONLINE 0 0 2 errors: No known data errors
At the time, I thought was was because I pulled a drive from this zpool. I still think that.
At 19:32:57, 5 seconds later, I start a scrub:
[r730-01 dvl ~] % sudo zpool scrub data01
The errors
I tailed the log and saw man more vdev related messages. The 7000 lines or so are at https://gist.github.com/dlangille/9d783e40b3461de5c93d6379b5d5b767
It’s about this time that I noticed the errors you see above; they were scrolling past in my IPMI session I had used during single-user mode.
zpool status
At 19:38:01:
[r730-01 dvl ~] % zpool status pool: data01 state: ONLINE status: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected. action: Determine if the device needs to be replaced, and clear the errors using 'zpool clear' or replace the device with 'zpool replace'. see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P scan: scrub in progress since Fri Feb 24 19:33:38 2023 4.26T scanned at 16.4G/s, 40.1G issued at 154M/s, 4.26T total 674M repaired, 0.92% done, 07:57:18 to go config: NAME STATE READ WRITE CKSUM data01 ONLINE 0 0 0 raidz2-0 ONLINE 83.2K 0 0 gpt/Y7P0A022TEVE ONLINE 81.1K 0 2 (repairing) gpt/Y7P0A02ATEVE ONLINE 81.1K 0 1 (repairing) gpt/Y7P0A02DTEVE ONLINE 83.7K 0 129 (repairing) gpt/Y7P0A02GTEVE ONLINE 84.3K 0 0 (repairing) gpt/Y7P0A02LTEVE ONLINE 79.2K 0 1 (repairing) gpt/Y7P0A02MTEVE ONLINE 86.0K 0 2 (repairing) gpt/Y7P0A02QTEVE ONLINE 80.9K 0 2 (repairing) gpt/Y7P0A033TEVE ONLINE 80.5K 0 2 (repairing) errors: No known data errors
At 19:39:42 I stopped the scrub to reboot. I know scrubs will resume, but this is what I wanted to do.
[r730-01 dvl ~] % sudo zpool scrub -s data01
Before rebooting, I enabled mrsas again.
At 19:43:53, after the reboot
[r730-01 dvl ~] % sudo zpool scrub data01 [r730-01 dvl ~] % zpool status data01 pool: data01 state: ONLINE scan: scrub in progress since Fri Feb 24 19:44:05 2023 102G scanned at 12.8G/s, 3.59M issued at 459K/s, 4.26T total 0B repaired, 0.00% done, no estimated completion time config: NAME STATE READ WRITE CKSUM data01 ONLINE 0 0 0 raidz2-0 ONLINE 0 0 0 gpt/Y7P0A022TEVE ONLINE 0 0 0 gpt/Y7P0A02ATEVE ONLINE 0 0 0 gpt/Y7P0A02DTEVE ONLINE 0 0 0 gpt/Y7P0A02GTEVE ONLINE 0 0 0 gpt/Y7P0A02LTEVE ONLINE 0 0 0 gpt/Y7P0A02MTEVE ONLINE 0 0 0 gpt/Y7P0A02QTEVE ONLINE 0 0 0 gpt/Y7P0A033TEVE ONLINE 0 0 0 errors: No known data errors
Interesting. Now no errors.
Later, at 19:45:11:
[r730-01 dvl ~] % zpool status data01 pool: data01 state: ONLINE status: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected. action: Determine if the device needs to be replaced, and clear the errors using 'zpool clear' or replace the device with 'zpool replace'. see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P scan: scrub in progress since Fri Feb 24 19:44:05 2023 4.26T scanned at 11.7G/s, 249G issued at 683M/s, 4.26T total 81.3M repaired, 5.71% done, 01:42:36 to go config: NAME STATE READ WRITE CKSUM data01 ONLINE 0 0 0 raidz2-0 ONLINE 0 0 0 gpt/Y7P0A022TEVE ONLINE 0 0 2.30K (repairing) gpt/Y7P0A02ATEVE ONLINE 0 0 2.75K (repairing) gpt/Y7P0A02DTEVE ONLINE 0 0 2.57K (repairing) gpt/Y7P0A02GTEVE ONLINE 0 0 2.81K (repairing) gpt/Y7P0A02LTEVE ONLINE 0 0 2.24K (repairing) gpt/Y7P0A02MTEVE ONLINE 0 0 2.70K (repairing) gpt/Y7P0A02QTEVE ONLINE 0 0 2.08K (repairing) gpt/Y7P0A033TEVE ONLINE 0 0 2.16K (repairing)
By 20:21:55, there’s is less than a minute to go:
[r730-01 dvl ~] % zpool status data01 pool: data01 state: ONLINE status: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected. action: Determine if the device needs to be replaced, and clear the errors using 'zpool clear' or replace the device with 'zpool replace'. see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P scan: scrub in progress since Fri Feb 24 19:44:05 2023 4.26T scanned at 1.78G/s, 4.18T issued at 1.75G/s, 4.26T total 81.3M repaired, 98.21% done, 00:00:44 to go config: NAME STATE READ WRITE CKSUM data01 ONLINE 0 0 0 raidz2-0 ONLINE 0 0 0 gpt/Y7P0A022TEVE ONLINE 0 0 2.30K (repairing) gpt/Y7P0A02ATEVE ONLINE 0 0 2.75K (repairing) gpt/Y7P0A02DTEVE ONLINE 0 0 2.57K (repairing) gpt/Y7P0A02GTEVE ONLINE 0 0 2.81K (repairing) gpt/Y7P0A02LTEVE ONLINE 0 0 2.24K (repairing) gpt/Y7P0A02MTEVE ONLINE 0 0 2.70K (repairing) gpt/Y7P0A02QTEVE ONLINE 0 0 2.08K (repairing) gpt/Y7P0A033TEVE ONLINE 0 0 2.16K (repairing) errors: No known data errors
By 20:24:48, the scrub was done.
[r730-01 dvl ~] % zpool status data01 pool: data01 state: ONLINE status: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected. action: Determine if the device needs to be replaced, and clear the errors using 'zpool clear' or replace the device with 'zpool replace'. see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P scan: scrub repaired 81.3M in 00:41:29 with 0 errors on Fri Feb 24 20:25:34 2023 config: NAME STATE READ WRITE CKSUM data01 ONLINE 0 0 0 raidz2-0 ONLINE 0 0 0 gpt/Y7P0A022TEVE ONLINE 0 0 2.30K gpt/Y7P0A02ATEVE ONLINE 0 0 2.75K gpt/Y7P0A02DTEVE ONLINE 0 0 2.57K gpt/Y7P0A02GTEVE ONLINE 0 0 2.81K gpt/Y7P0A02LTEVE ONLINE 0 0 2.24K gpt/Y7P0A02MTEVE ONLINE 0 0 2.70K gpt/Y7P0A02QTEVE ONLINE 0 0 2.08K gpt/Y7P0A033TEVE ONLINE 0 0 2.16K errors: No known data errors [r730-01 dvl ~] %
As far as I can tell, this wasn’t hardware. The zpool is fine.
Please tell me you insights into this.