This morning, after a round of pkg upgrades, I noticed this in the logs. I was mainly updating openvpn, but in the process fail2ban was removed (because I went from python312 to python314). I noticed it was missing on one host:
Can't exec "/usr/local/bin/fail2ban-client": No such file or directory at /usr/local/etc/snmp/fail2ban line 116.
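The SNMP script is exec'ing a binary that no longer exists. A direct way to check for that on any host is to test for the path named in the error. This is a minimal POSIX sh sketch. The function name fail2ban_present is made up for illustration, and the default path is simply the one from the error message.

```sh
# Minimal sketch: is fail2ban-client still installed?
# The default path is the one from the SNMP error above; pass a different
# path as the first argument to check somewhere else.
fail2ban_present() {
    client="${1:-/usr/local/bin/fail2ban-client}"
    if [ -x "$client" ]; then
        echo "present: $client"
        return 0
    fi
    echo "missing: $client"
    return 1
}

# Example:
#   fail2ban_present || echo "fail2ban needs reinstalling"
```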
I ran this grep to verify fail2ban had been removed from another host:
[12:31 r730-01 dvl ~] % grep fail /var/log/messages
Jun 30 15:38:58 r730-01 upsmon[2715]: Poll UPS [ups04@gw01.int.unixathome.org] failed - Driver not connected
Jun 30 15:39:03 r730-01 upsmon[2715]: UPS [ups04@gw01.int.unixathome.org]: connect failed: Connection failure: Connection refused
Jun 30 15:39:28 r730-01 upsmon[2715]: UPS [ups04@gw01.int.unixathome.org]: connect failed: Connection failure: Operation timed out
Jun 30 15:40:18 r730-01 upsmon[2715]: UPS [ups04@gw01.int.unixathome.org]: connect failed: Connection failure: Operation timed out
Jun 30 15:40:43 r730-01 upsmon[2715]: UPS [ups04@gw01.int.unixathome.org]: connect failed: Connection failure: Operation timed out
Jun 30 15:40:53 r730-01 upsmon[2715]: UPS [ups04@gw01.int.unixathome.org]: connect failed: Connection failure: Connection refused
Jun 30 21:07:06 r730-01 upsmon[2715]: Poll UPS [ups04@gw01.int.unixathome.org] failed - Driver not connected
Jun 30 21:07:06 r730-01 kernel: Jun 30 21:07:06 r730-01 upsmon[2715]: Poll UPS [ups04@gw01.int.unixathome.org] failed - Driver not connected
Jun 30 21:07:23 r730-01 upsmon[2715]: UPS [ups04@gw01.int.unixathome.org]: connect failed: Connection failure: Operation timed out
Jun 30 21:07:23 r730-01 kernel: Jun 30 21:07:23 r730-01 upsmon[2715]: UPS [ups04@gw01.int.unixathome.org]: connect failed: Connection failure: Operation timed out
Jun 30 21:07:50 r730-01 upsmon[2715]: UPS [ups04@gw01.int.unixathome.org]: connect failed: Connection failure: Operation timed out
Jun 30 21:08:44 r730-01 upsmon[2715]: UPS [ups04@gw01.int.unixathome.org]: connect failed: Connection failure: Operation timed out
Jun 30 21:08:51 r730-01 upsmon[2715]: UPS [ups04@gw01.int.unixathome.org]: connect failed: Connection failure: Connection refused
Jun 30 21:08:58 r730-01 upsmon[2715]: UPS [ups04@gw01.int.unixathome.org]: connect failed: Connection failure: Connection refused
Jul 2 04:21:50 r730-01 kernel: nvme3: failing outstanding i/o
Jul 2 04:21:50 r730-01 kernel: nvme3: failing queued i/o
Jul 2 04:21:50 r730-01 kernel: nvme3: failing outstanding i/o
Jul 2 04:21:50 r730-01 kernel: nvme3: failing outstanding i/o
Jul 2 04:21:50 r730-01 kernel: nvme3: failing outstanding i/o
Jul 2 04:21:50 r730-01 kernel: nvme3: failing outstanding i/o
Jul 2 04:21:50 r730-01 kernel: nvme3: failing outstanding i/o
Jul 2 04:21:50 r730-01 kernel: nvme3: failing outstanding i/o
Jul 2 04:21:50 r730-01 kernel: nvme3: failing outstanding i/o
Jul 2 04:21:50 r730-01 kernel: nvme3: failing outstanding i/o
Well, this host doesn’t run fail2ban, but those messages are “interesting”.
How often does nvme3 show up in the logs?
[12:31 r730-01 dvl ~] % grep nvme3 /var/log/messages
Jun 29 18:02:51 r730-01 kernel: nvme3: mem 0x92000000-0x92003fff at device 0.0 numa-domain 0 on pci7
Jun 29 18:02:51 r730-01 kernel: nda3 at nvme3 bus 0 scbus19 target 0 lun 1
Jun 30 21:11:44 r730-01 kernel: nvme3: mem 0x92000000-0x92003fff at device 0.0 numa-domain 0 on pci7
Jun 30 21:11:44 r730-01 kernel: nda3 at nvme3 bus 0 scbus19 target 0 lun 1
Jul 2 04:21:30 r730-01 kernel: nvme3: Resetting controller due to a timeout.
Jul 2 04:21:30 r730-01 kernel: nvme3: event="start"
Jul 2 04:21:30 r730-01 kernel: nvme3: Waiting for reset to complete
Jul 2 04:21:50 r730-01 kernel: nvme3: Waiting for reset to complete
Jul 2 04:21:50 r730-01 kernel: nvme3: controller ready did not become 0 within 20500 ms
Jul 2 04:21:50 r730-01 kernel: nvme3: event="timed_out"
Jul 2 04:21:50 r730-01 kernel: nvme3: failing outstanding i/o
Jul 2 04:21:50 r730-01 kernel: nvme3: READ (02) sqid:1 cid:126 nsid:1 lba:5816680880 len:8
Jul 2 04:21:50 r730-01 kernel: nvme3: ABORTED_BY_REQUEST (00/07) crd:0 m:0 dnr:1 p:0 sqid:1 cid:126 cdw0:0
Jul 2 04:21:50 r730-01 kernel: nvme3: READ (02) sqid:1 cid:125 nsid:1 lba:5819691576 len:8
Jul 2 04:21:50 r730-01 kernel: nvme3: ABORTED_BY_REQUEST (00/07) crd:0 m:0 dnr:1 p:0 sqid:1 cid:125 cdw0:0
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): READ (02). NCB: opc=2 fuse=0 nsid=1 prp1=0 prp2=0 cdw=5ab381b0 1 7 0 0 0
Jul 2 04:21:50 r730-01 kernel: nvme3: READ (02) sqid:1 cid:119 nsid:1 lba:7155944008 len:8
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): CAM status: NVME Status Error
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): NVMe status: ABORTED_BY_REQUEST (00/07) DNR
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): Error 5, Retries exhausted
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): READ (02). NCB: opc=2 fuse=0 nsid=1 prp1=0 prp2=0 cdw=5ae17238 1 7 0 0 0
Jul 2 04:21:50 r730-01 kernel: nvme3: ABORTED_BY_REQUEST (00/07) crd:0 m:0 dnr:1 p:0 sqid:1 cid:119 cdw0:0
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): CAM status: NVME Status Error
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): NVMe status: ABORTED_BY_REQUEST (00/07) DNR
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): Error 5, Retries exhausted
Jul 2 04:21:50 r730-01 kernel: nvme3: failing queued i/o
Jul 2 04:21:50 r730-01 kernel: nvme3: READ (02) sqid:2 cid:0 nsid:1 lba:3948824896 len:8
Jul 2 04:21:50 r730-01 kernel: nvme3: ABORTED_BY_REQUEST (00/07) crd:0 m:0 dnr:1 p:0 sqid:2 cid:0 cdw0:0
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): READ (02). NCB: opc=2 fuse=0 nsid=1 prp1=0 prp2=0 cdw=aa870a48 1 7 0 0 0
Jul 2 04:21:50 r730-01 kernel: nvme3: failing outstanding i/o
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): CAM status: NVME Status Error
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): NVMe status: ABORTED_BY_REQUEST (00/07) DNR
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): Error 5, Retries exhausted
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): READ (02). NCB: opc=2 fuse=0 nsid=1 prp1=0 prp2=0 cdw=eb5e4940 0 7 0 0 0
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): CAM status: NVME Status Error
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): NVMe status: ABORTED_BY_REQUEST (00/07) DNR
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): Error 5, Retries exhausted
Jul 2 04:21:50 r730-01 kernel: nvme3: READ (02) sqid:2 cid:121 nsid:1 lba:5969482320 len:8
Jul 2 04:21:50 r730-01 kernel: nvme3: ABORTED_BY_REQUEST (00/07) crd:0 m:0 dnr:1 p:0 sqid:2 cid:121 cdw0:0
Jul 2 04:21:50 r730-01 kernel: nda3 at nvme3 bus 0 scbus19 target 0 lun 1
Jul 2 04:21:50 r730-01 kernel: nvme3: failing outstanding i/o
Jul 2 04:21:50 r730-01 kernel: nvme3: READ (02) sqid:3 cid:125 nsid:1 lba:2128907408 len:1360
Jul 2 04:21:50 r730-01 kernel: nvme3: ABORTED_BY_REQUEST (00/07) crd:0 m:0 dnr:1 p:0 sqid:3 cid:125 cdw0:0
Jul 2 04:21:50 r730-01 kernel: nvme3: WRITE (01) sqid:3 cid:116 nsid:1 lba:4236260504 len:16
Jul 2 04:21:50 r730-01 kernel: nvme3: ABORTED_BY_REQUEST (00/07) crd:0 m:0 dnr:1 p:0 sqid:3 cid:116 cdw0:0
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): READ (02). NCB: opc=2 fuse=0 nsid=1 prp1=0 prp2=0 cdw=63cf1250 1 7 0 0 0
Jul 2 04:21:50 r730-01 kernel: nvme3: READ (02) sqid:3 cid:121 nsid:1 lba:3846619816 len:8
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): CAM status: NVME Status Error
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): NVMe status: ABORTED_BY_REQUEST (00/07) DNR
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): Error 6, Periph was invalidated
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): READ (02). NCB: opc=2 fuse=0 nsid=1 prp1=0 prp2=0 cdw=7ee48c90 0 54f 0 0 0
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): CAM status: NVME Status Error
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): NVMe status: ABORTED_BY_REQUEST (00/07) DNR
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): Error 6, Periph was invalidated
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): WRITE (01). NCB: opc=1 fuse=0 nsid=1 prp1=0 prp2=0 cdw=fc803498 0 f 0 0 0
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): CAM status: NVME Status Error
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): NVMe status: ABORTED_BY_REQUEST (00/07) DNR
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): Error 6, Periph was invalidated
Jul 2 04:21:50 r730-01 kernel: nvme3: ABORTED_BY_REQUEST (00/07) crd:0 m:0 dnr:1 p:0 sqid:3 cid:121 cdw0:0
Jul 2 04:21:50 r730-01 kernel: nvme3: failing outstanding i/o
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): READ (02). NCB: opc=2 fuse=0 nsid=1 prp1=0 prp2=0 cdw=e546c2a8 0 7 0 0 0
Jul 2 04:21:50 r730-01 kernel: nvme3: READ (02) sqid:4 cid:112 nsid:1 lba:923353176 len:8
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): CAM status: NVME Status Error
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): NVMe status: ABORTED_BY_REQUEST (00/07) DNR
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): Error 6, Periph was invalidated
Jul 2 04:21:50 r730-01 kernel: nvme3: ABORTED_BY_REQUEST (00/07) crd:0 m:0 dnr:1 p:0 sqid:4 cid:112 cdw0:0
Jul 2 04:21:50 r730-01 kernel: nvme3: READ (02) sqid:4 cid:115 nsid:1 lba:7403218720 len:8
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): READ (02). NCB: opc=2 fuse=0 nsid=1 prp1=0 prp2=0 cdw=37094058 0 7 0 0 0
Jul 2 04:21:50 r730-01 kernel: nvme3: ABORTED_BY_REQUEST (00/07) crd:0 m:0 dnr:1 p:0 sqid:4 cid:115 cdw0:0
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): CAM status: NVME Status Error
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): NVMe status: ABORTED_BY_REQUEST (00/07) DNR
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): Error 6, Periph was invalidated
Jul 2 04:21:50 r730-01 kernel: nvme3: failing outstanding i/o
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): READ (02). NCB: opc=2 fuse=0 nsid=1 prp1=0 prp2=0 cdw=b9442720 1 7 0 0 0
Jul 2 04:21:50 r730-01 kernel: nvme3: READ (02) sqid:6 cid:123 nsid:1 lba:4092436328 len:8
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): CAM status: NVME Status Error
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): NVMe status: ABORTED_BY_REQUEST (00/07) DNR
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): Error 6, Periph was invalidated
Jul 2 04:21:50 r730-01 kernel: nvme3: ABORTED_BY_REQUEST (00/07) crd:0 m:0 dnr:1 p:0 sqid:6 cid:123 cdw0:0
Jul 2 04:21:50 r730-01 kernel: nvme3: READ (02) sqid:6 cid:117 nsid:1 lba:4119853392 len:8
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): READ (02). NCB: opc=2 fuse=0 nsid=1 prp1=0 prp2=0 cdw=f3ed9f68 0 7 0 0 0
Jul 2 04:21:50 r730-01 kernel: nvme3: ABORTED_BY_REQUEST (00/07) crd:0 m:0 dnr:1 p:0 sqid:6 cid:117 cdw0:0
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): CAM status: NVME Status Error
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): NVMe status: ABORTED_BY_REQUEST (00/07) DNR
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): Error 6, Periph was invalidated
Jul 2 04:21:50 r730-01 kernel: nvme3: failing outstanding i/o
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): READ (02). NCB: opc=2 fuse=0 nsid=1 prp1=0 prp2=0 cdw=f58ff950 0 7 0 0 0
Jul 2 04:21:50 r730-01 kernel: nvme3: FLUSH (00) sqid:11 cid:123 nsid:1
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): CAM status: NVME Status Error
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): NVMe status: ABORTED_BY_REQUEST (00/07) DNR
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): Error 6, Periph was invalidated
Jul 2 04:21:50 r730-01 kernel: nvme3: ABORTED_BY_REQUEST (00/07) crd:0 m:0 dnr:1 p:0 sqid:11 cid:123 cdw0:0
Jul 2 04:21:50 r730-01 kernel: nvme3: failing outstanding i/o
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): FLUSH (00). NCB: opc=0 fuse=0 nsid=1 prp1=0 prp2=0 cdw=0 0 0 0 0 0
Jul 2 04:21:50 r730-01 kernel: nvme3: READ (02) sqid:14 cid:119 nsid:1 lba:2128908768 len:2048
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): CAM status: NVME Status Error
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): NVMe status: ABORTED_BY_REQUEST (00/07) DNR
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): Error 6, Periph was invalidated
Jul 2 04:21:50 r730-01 kernel: nvme3: ABORTED_BY_REQUEST (00/07) crd:0 m:0 dnr:1 p:0 sqid:14 cid:119 cdw0:0
Jul 2 04:21:50 r730-01 kernel: nvme3: READ (02) sqid:14 cid:121 nsid:1 lba:2128910816 len:2048
Jul 2 04:21:50 r730-01 kernel: nvme3: ABORTED_BY_REQUEST (00/07) crd:0 m:0 dnr:1 p:0 sqid:14 cid:121 cdw0:0
Jul 2 04:21:50 r730-01 kernel: nvme3: READ (02) sqid:14 cid:123 nsid:1 lba:2128912864 len:1368
Jul 2 04:21:50 r730-01 kernel: nvme3: ABORTED_BY_REQUEST (00/07) crd:0 m:0 dnr:1 p:0 sqid:14 cid:123 cdw0:0
Jul 2 04:21:50 r730-01 kernel: nvme3: READ (02) sqid:14 cid:122 nsid:1 lba:2128914232 len:2048
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): READ (02). NCB: opc=2 fuse=0 nsid=1 prp1=0 prp2=0 cdw=7ee491e0 0 7ff 0 0 0
Jul 2 04:21:50 r730-01 kernel: nvme3: ABORTED_BY_REQUEST (00/07) crd:0 m:0 dnr:1 p:0 sqid:14 cid:122 cdw0:0
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): CAM status: NVME Status Error
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): NVMe status: ABORTED_BY_REQUEST (00/07) DNR
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): Error 6, Periph was invalidated
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): READ (02). NCB: opc=2 fuse=0 nsid=1 prp1=0 prp2=0 cdw=7ee499e0 0 7ff 0 0 0
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): CAM status: NVME Status Error
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): NVMe status: ABORTED_BY_REQUEST (00/07) DNR
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): Error 6, Periph was invalidated
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): READ (02). NCB: opc=2 fuse=0 nsid=1 prp1=0 prp2=0 cdw=7ee4a1e0 0 557 0 0 0
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): CAM status: NVME Status Error
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): NVMe status: ABORTED_BY_REQUEST (00/07) DNR
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): Error 6, Periph was invalidated
Jul 2 04:21:50 r730-01 kernel: nvme3: READ (02) sqid:14 cid:126 nsid:1 lba:2128916280 len:2048
Jul 2 04:21:50 r730-01 kernel: nvme3: ABORTED_BY_REQUEST (00/07) crd:0 m:0 dnr:1 p:0 sqid:14 cid:126 cdw0:0
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): READ (02). NCB: opc=2 fuse=0 nsid=1 prp1=0 prp2=0 cdw=7ee4a738 0 7ff 0 0 0
Jul 2 04:21:50 r730-01 kernel: nvme3: READ (02) sqid:14 cid:125 nsid:1 lba:2128918328 len:1368
Jul 2 04:21:50 r730-01 kernel: nvme3: ABORTED_BY_REQUEST (00/07) crd:0 m:0 dnr:1 p:0 sqid:14 cid:125 cdw0:0
Jul 2 04:21:50 r730-01 kernel: nvme3: failing outstanding i/o
Jul 2 04:21:50 r730-01 kernel: nvme3: READ (02) sqid:15 cid:121 nsid:1 lba:7603680584 len:8
Jul 2 04:21:50 r730-01 kernel: nvme3: ABORTED_BY_REQUEST (00/07) crd:0 m:0 dnr:1 p:0 sqid:15 cid:121 cdw0:0
Jul 2 04:21:50 r730-01 kernel: nvme3: failing outstanding i/o
Jul 2 04:21:50 r730-01 kernel: nvme3: READ (02) sqid:16 cid:126 nsid:1 lba:4406184968 len:8
Jul 2 04:21:50 r730-01 kernel: nvme3: ABORTED_BY_REQUEST (00/07) crd:0 m:0 dnr:1 p:0 sqid:16 cid:126 cdw0:0
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): CAM status: NVME Status Error
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): NVMe status: ABORTED_BY_REQUEST (00/07) DNR
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): Error 6, Periph was invalidated
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): READ (02). NCB: opc=2 fuse=0 nsid=1 prp1=0 prp2=0 cdw=7ee4af38 0 7ff 0 0 0
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): CAM status: NVME Status Error
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): NVMe status: ABORTED_BY_REQUEST (00/07) DNR
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): Error 6, Periph was invalidated
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): READ (02). NCB: opc=2 fuse=0 nsid=1 prp1=0 prp2=0 cdw=7ee4b738 0 557 0 0 0
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): CAM status: NVME Status Error
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): NVMe status: ABORTED_BY_REQUEST (00/07) DNR
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): Error 6, Periph was invalidated
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): READ (02). NCB: opc=2 fuse=0 nsid=1 prp1=0 prp2=0 cdw=c536f548 1 7 0 0 0
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): CAM status: NVME Status Error
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): NVMe status: ABORTED_BY_REQUEST (00/07) DNR
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): Error 6, Periph was invalidated
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): READ (02). NCB: opc=2 fuse=0 nsid=1 prp1=0 prp2=0 cdw=6a10c08 1 7 0 0 0
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): CAM status: NVME Status Error
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): NVMe status: ABORTED_BY_REQUEST (00/07) DNR
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): Error 6, Periph was invalidated
Jul 2 04:21:50 r730-01 kernel: nvme3: Failed controller, stopping watchdog timeout.
Jul 2 04:21:50 r730-01 kernel: (nda3:nvme3:0:0:1): Periph destroyed
Jul 2 04:21:50 r730-01 kernel: nvme3: Failed controller, stopping watchdog timeout.
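To put a rough number on output like that, the messages for one controller can be tallied by type. This is a minimal sketch, assuming syslog-format lines on stdin. The function name nvme_tally is hypothetical, and the patterns it counts are simply lifted from the log lines above, not from any documented FreeBSD message format.

```sh
# Rough sketch: count kernel messages for one NVMe controller by type.
# Reads syslog-style lines on stdin; $1 is the controller name, e.g. nvme3.
# The patterns come from the log above; they are not an official format.
nvme_tally() {
    awk -v dev="$1" '
        index($0, dev) == 0 { next }                         # skip other devices
        index($0, dev ": ABORTED_BY_REQUEST") { aborted++ }  # one per aborted command in the log above
        /Error 5, Retries exhausted/      { retries++ }
        /Error 6, Periph was invalidated/ { invalid++ }
        /Resetting controller/            { resets++ }
        /Failed controller/               { failed++ }
        END {
            printf "aborted=%d retries_exhausted=%d periph_invalidated=%d resets=%d failed=%d\n",
                aborted, retries, invalid, resets, failed
        }'
}

# Usage:
#   nvme_tally nvme3 < /var/log/messages
```

The match on the device name is a plain substring test, so nvme3 would also match an nvme30 if a host had that many controllers.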
Well, that’s a lot. How’s the status?
[12:31 r730-01 dvl ~] % zpool status
  pool: data01
 state: ONLINE
  scan: scrub repaired 0B in 00:00:07 with 0 errors on Thu Jul 2 03:48:55 2026
config:

        NAME                  STATE     READ WRITE CKSUM
        data01                ONLINE       0     0     0
          raidz2-0            ONLINE       0     0     0
            gpt/Y7P0A022TEVE  ONLINE       0     0     0
            gpt/Y7P0A02ATEVE  ONLINE       0     0     0
            gpt/Y7P0A02DTEVE  ONLINE       0     0     0
            gpt/Y7P0A02GTEVE  ONLINE       0     0     0
            gpt/Y7P0A02LTEVE  ONLINE       0     0     0
            gpt/Y7P0A02MTEVE  ONLINE       0     0     0
            gpt/Y7P0A02QTEVE  ONLINE       0     0     0
            gpt/Y7P0A033TEVE  ONLINE       0     0     0

errors: No known data errors

  pool: data02
 state: ONLINE
  scan: scrub repaired 0B in 00:03:59 with 0 errors on Thu Jul 2 03:52:59 2026
config:

        NAME                     STATE     READ WRITE CKSUM
        data02                   ONLINE       0     0     0
          mirror-0               ONLINE       0     0     0
            gpt/S6WSNJ0T208743F  ONLINE       0     0     0
            gpt/S6WSNJ0T207774T  ONLINE       0     0     0

errors: No known data errors

  pool: data03
 state: ONLINE
  scan: scrub repaired 0B in 01:16:19 with 0 errors on Thu Jul 2 05:05:31 2026
config:

        NAME                     STATE     READ WRITE CKSUM
        data03                   ONLINE       0     0     0
          mirror-0               ONLINE       0     0     0
            gpt/WD_22492H800867  ONLINE       0     0     0
            gpt/WD_230151801284  ONLINE       0     0     0
          mirror-1               ONLINE       0     0     0
            gpt/WD_230151801478  ONLINE       0     0     0
            gpt/WD_230151800473  ONLINE       0     0     0

errors: No known data errors

  pool: data04
 state: DEGRADED
status: One or more devices have been removed. Sufficient replicas exist for the pool to continue functioning in a degraded state.
action: Online the device using 'zpool online' or replace the device with 'zpool replace'.
  scan: scrub repaired 0B in 01:11:17 with 0 errors on Thu Jul 2 05:00:37 2026
config:

        NAME                     STATE     READ WRITE CKSUM
        data04                   DEGRADED     0     0     0
          raidz2-0               DEGRADED     0     0     0
            gpt/S7KGNU0Y722875X  ONLINE       0     0     0
            gpt/S7KGNU0Y915666E  ONLINE       0     0     0
            gpt/S7KGNU0Y912937J  ONLINE       0     0     0
            gpt/S7KGNU0Y912955D  REMOVED      0     0     0
            gpt/S7U8NJ0Y716854P  ONLINE       0     0     0
            gpt/S7U8NJ0Y716801F  ONLINE       0     0     0
            gpt/S757NS0Y700758M  ONLINE       0     0     0
            gpt/S757NS0Y700760R  ONLINE       0     0     0

errors: No known data errors

  pool: zroot
 state: ONLINE
status: Some supported and requested features are not enabled on the pool. The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done, the pool may no longer be accessible by software that does not support the features. See zpool-features(7) for details.
  scan: scrub repaired 0B in 00:00:53 with 0 errors on Thu Jul 2 03:50:16 2026
config:

        NAME                               STATE     READ WRITE CKSUM
        zroot                              ONLINE       0     0     0
          mirror-0                         ONLINE       0     0     0
            gpt/zfs0_20170718AA0000185556  ONLINE       0     0     0
            gpt/zfs1_20170719AA1178164201  ONLINE       0     0     0

errors: No known data errors
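With several pools on a host, zpool status -x is a shorter way to ask the same question: it reports only pools with problems, or "all pools are healthy". For a cron job or a home-grown check, a sketch like the one below could work from the scripted (-H, tab-separated) output of zpool list -H -o name,health. The function name unhealthy_pools is hypothetical.

```sh
# Sketch: list pools whose health is anything other than ONLINE.
# Expects the tab-separated output of: zpool list -H -o name,health
# Exits 1 if any pool is unhealthy, so it can drive a cron job or alert.
unhealthy_pools() {
    awk -F '\t' '
        $2 != "ONLINE" { print $1 ": " $2; bad = 1 }
        END { exit bad }'
}

# Usage:
#   zpool list -H -o name,health | unhealthy_pools || echo "check zpool status"
```

Printing nothing and exiting 0 when everything is ONLINE keeps cron quiet on good days.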
Then I checked Nagios: it had found the same issue. I hadn’t checked Nagios before today. Uh oh.
Let’s look at that device:
[12:47 r730-01 dvl ~] % sudo nvmecontrol identify nvme3
nvmecontrol: Identify request failed
[12:47 r730-01 dvl ~] %
I went to LibreNMS to look for trending information about that drive. It was not there. I suspect that when the drive dropped out, LibreNMS dropped it too. If that’s the case, that’s not helpful.
Let’s try a reboot.
After a reboot, that drive (S7KGNU0Y912955D) was not found. My next idea: open up the case and reseat that device.
I’m hoping that device is not dead. It went into service 7 months ago, and prices have jumped more than slightly lately.
For comparison, here is another device:
[13:16 r730-01 dvl ~] % sudo smartctl -a /dev/nvme4
smartctl 7.5 2025-04-30 r5714 [FreeBSD 15.0-RELEASE-p11 amd64] (local build)
Copyright (C) 2002-25, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number: Samsung SSD 990 EVO Plus 4TB
...
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 1%
Data Units Read: 76,836,586 [39.3 TB]
Data Units Written: 47,333,046 [24.2 TB]
Host Read Commands: 2,617,490,888
Host Write Commands: 1,200,934,619
Controller Busy Time: 8,225
Power Cycles: 23
Power On Hours: 6,562
...
That usage level is not outrageous. All the devices in this zpool should have seen more or less the same usage.
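The bracketed figures are derived from the raw counters. In the NVMe SMART/health log, one data unit is 1,000 blocks of 512 bytes, i.e. 512,000 bytes. A small sketch to reproduce that conversion (data_units_to_tb is a hypothetical helper name):

```sh
# Sketch: convert an NVMe "Data Units" counter to decimal terabytes.
# One data unit is 1,000 x 512 bytes = 512,000 bytes (NVMe SMART/health log).
# Commas, as smartctl prints them, are stripped first.
data_units_to_tb() {
    units=$(printf '%s' "$1" | tr -d ,)
    awk -v u="$units" 'BEGIN { printf "%.2f TB\n", u * 512000 / 1e12 }'
}

# data_units_to_tb 47,333,046   # 24.23 TB (smartctl rounds this to 24.2 TB)
```

For the counters above, 47,333,046 units written works out to 24,234,519,552,000 bytes, which is where the 24.2 TB comes from.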
Drive is not dead
I powered off the host and pulled out the ASUS Hyper M.2 X16 Gen 4 card. I moved the NVMe drive in question to a portable carrier and hooked that up to my MacBook. It was identified as a “Samsung SSD 990 PRO 4TB”, which tells me it’s not completely dead.
bsdimp suggested I hook that up to a FreeBSD box.
I did just that, while monitoring /var/log/messages. Nothing. :(
I tried another USB port; nothing. I then tried a USB port on the back of the host:
Jul 2 15:02:59 r730-03 kernel: da8 at umass-sim1 bus 1 scbus19 target 0 lun 0
Jul 2 15:02:59 r730-03 kernel: da8: <Samsung SSD 990 PRO 4TB 1.00> Fixed Direct Access SPC-4 SCSI device
Jul 2 15:02:59 r730-03 kernel: da8: Serial Number 01293805127E
Jul 2 15:02:59 r730-03 kernel: da8: 40.000MB/s transfers
Jul 2 15:02:59 r730-03 kernel: da8: 3815447MB (7814037168 512 byte sectors)
Jul 2 15:02:59 r730-03 kernel: da8: quirks=0x2<NO_6_BYTE>
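The size on that da8 line is consistent with the full 4 TB device: 7814037168 sectors of 512 bytes is 4,000,787,030,016 bytes, and the kernel's "MB" figure here matches that total divided by 2^20 (so it is really MiB). A quick shell arithmetic check, with disk_size as a hypothetical name:

```sh
# Sketch: turn a sector count and sector size into bytes and MiB.
# The "MB" figure on the da8 line matches bytes / 2^20.
disk_size() {
    sectors="$1"
    sector_size="$2"
    bytes=$((sectors * sector_size))
    echo "bytes=$bytes MiB=$((bytes / 1048576))"
}

# disk_size 7814037168 512   # bytes=4000787030016 MiB=3815447
```

That byte count is the same value smartctl reports as Total NVM Capacity for this drive.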
That gives me hope. So does this:
[15:03 r730-03 dvl ~] % sudo smartctl -a /dev/da8
smartctl 7.5 2025-04-30 r5714 [FreeBSD 15.0-RELEASE-p11 amd64] (local build)
Copyright (C) 2002-25, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number: Samsung SSD 990 PRO 4TB
Serial Number: S7KGNU0Y912955D
Firmware Version: 4B2QJXD7
PCI Vendor/Subsystem ID: 0x144d
IEEE OUI Identifier: 0x002538
Total NVM Capacity: 4,000,787,030,016 [4.00 TB]
Unallocated NVM Capacity: 0
Controller ID: 1
NVMe Version: 2.0
Number of Namespaces: 1
Namespace 1 Size/Capacity: 4,000,787,030,016 [4.00 TB]
Namespace 1 Utilization: 3,057,326,026,752 [3.05 TB]
Namespace 1 Formatted LBA Size: 512
Namespace 1 IEEE EUI-64: 002538 4951a0eec6
Local Time is: Thu Jul 2 15:03:56 2026 UTC
Firmware Updates (0x16): 3 Slots, no Reset required
Optional Admin Commands (0x0017): Security Format Frmw_DL Self_Test
Optional NVM Commands (0x0055): Comp DS_Mngmt Sav/Sel_Feat Timestmp
Log Page Attributes (0x2f): S/H_per_NS Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg Log0_FISE_MI
Maximum Data Transfer Size: 512 Pages
Warning Comp. Temp. Threshold: 82 Celsius
Critical Comp. Temp. Threshold: 85 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     9.39W       -        -    0  0  0  0        0       0
 1 +     9.39W       -        -    1  1  1  1        0       0
 2 +     9.39W       -        -    2  2  2  2        0       0
 3 -   0.0400W       -        -    3  3  3  3     4200    2700
 4 -   0.0050W       -        -    4  4  4  4      500   21800

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02, NSID 0xffffffff)
Critical Warning: 0x00
Temperature: 34 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 0%
Data Units Read: 56,743,286 [29.0 TB]
Data Units Written: 11,391,387 [5.83 TB]
Host Read Commands: 2,383,774,544
Host Write Commands: 425,045,375
Controller Busy Time: 802
Power Cycles: 21
Power On Hours: 5,614
Unsafe Shutdowns: 10
Media and Data Integrity Errors: 0
Error Information Log Entries: 0
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Temperature Sensor 1: 34 Celsius
Temperature Sensor 2: 36 Celsius

Warning: NVMe Get Log truncated to 0x200 bytes, 0x200 bytes zero filled
Error Information (NVMe Log 0x01, 16 of 64 entries)
No Errors Logged

Warning: NVMe Get Log truncated to 0x200 bytes, 0x034 bytes zero filled
Self-test Log (NVMe Log 0x06, NSID 0xffffffff)
Self-test status: No self-test in progress
No Self-tests Logged
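Most of that output can be skimmed. A handful of fields carry most of the health signal: the overall result, Critical Warning, Percentage Used, Unsafe Shutdowns, and Media and Data Integrity Errors. A sketch that pulls just those out of smartctl -a output, assuming the usual one "Label: value" pair per line; smart_summary is a hypothetical name and the field selection is my own choice:

```sh
# Sketch: extract a few health fields from `smartctl -a` output on stdin.
# Assumes one "Label: value" pair per line, as smartctl prints for NVMe.
smart_summary() {
    awk -F ':' '
        { v = $2; gsub(/^[ \t]+|[ \t]+$/, "", v) }   # value after the first colon, trimmed
        /^SMART overall-health/            { print "health=" v }
        /^Critical Warning/                { print "critical_warning=" v }
        /^Percentage Used/                 { print "percentage_used=" v }
        /^Unsafe Shutdowns/                { print "unsafe_shutdowns=" v }
        /^Media and Data Integrity Errors/ { print "media_errors=" v }'
}

# Usage:
#   sudo smartctl -a /dev/da8 | smart_summary
```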
Back into the box
I disconnected the portable carrier from the FreeBSD host’s USB port and installed the drive back onto the PCIe card, swapping it with another device. It had been in the slot farthest from the fan. Now it’s one slot closer to the fan.
I booted up the host, and this is what I see:
[15:29 r730-01 dvl ~] % zpool status data04
  pool: data04
 state: ONLINE
status: One or more devices is currently being resilvered. The pool will continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Thu Jul 2 15:28:49 2026
        1.73T / 9.30T scanned, 10.9G / 7.58T issued at 1.81G/s
        1.84G resilvered, 0.14% done, 01:11:23 to go
config:

        NAME                     STATE     READ WRITE CKSUM
        data04                   ONLINE       0     0     0
          raidz2-0               ONLINE       0     0     0
            gpt/S7KGNU0Y722875X  ONLINE       0     0     0
            gpt/S7KGNU0Y915666E  ONLINE       0     0     0
            gpt/S7KGNU0Y912937J  ONLINE       0     0     0
            gpt/S7KGNU0Y912955D  ONLINE       0     0     2  (resilvering)
            gpt/S7U8NJ0Y716854P  ONLINE       0     0     0
            gpt/S7U8NJ0Y716801F  ONLINE       0     0     0
            gpt/S757NS0Y700758M  ONLINE       0     0     0
            gpt/S757NS0Y700760R  ONLINE       0     0     0

errors: No known data errors
This is as good as can be expected. :)
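To follow the resilver without re-running zpool status by hand, OpenZFS 2.0 and later provide zpool wait -t resilver data04, which blocks until the resilver finishes. For a quick one-line progress readout, a sketch like this one pulls the "% done" figure out of the status output; resilver_progress is a hypothetical helper, and it relies on the "NN.NN% done" wording shown above.

```sh
# Sketch: print the percentage that precedes "done" in zpool status output.
# Prints nothing if no scrub or resilver is in progress.
resilver_progress() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i ~ /%$/ && $(i + 1) ~ /^done/) { print $i; exit }
    }'
}

# Usage:
#   zpool status data04 | resilver_progress
```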











