Finding out more about NVMe on FreeBSD

Recently I’ve been experimenting with NVMe drives to learn how to monitor them for wear.

First, I tried nvme-cli:

[17:49 r730-01 dvl ~] % nvme list
Failed to scan topology: No such file or directory

This seems to be a known problem.

So I went with FreeBSD's own nvmecontrol instead:

[17:52 r730-01 dvl ~] % sudo nvmecontrol devlist
 nvme0: Samsung SSD 980 PRO with Heatsink 1TB
    nvme0ns1 (953869MB)
 nvme1: Samsung SSD 980 PRO with Heatsink 1TB
    nvme1ns1 (953869MB)
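Both controllers appear in the same format, so any monitoring will probably want to loop over them. Here is a minimal sketch, assuming the devlist layout shown above: on controller lines the first word ends in a colon, and on namespace lines it does not. `list_controllers` is a hypothetical helper name and is not part of nvmecontrol.

```shell
#!/bin/sh
# list_controllers (hypothetical helper): read `nvmecontrol devlist` output on
# stdin and print one controller name per line.
# Controller lines look like " nvme0: Samsung SSD ...", so the first field
# ends in a colon; namespace lines (" nvme0ns1 (953869MB)") do not match.
list_controllers() {
    awk '$1 ~ /^nvme[0-9]+:$/ { name = $1; sub(/:$/, "", name); print name }'
}
```

Usage: `sudo nvmecontrol devlist | list_controllers` should print nvme0 and nvme1 on this host, one per line.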

For more detail on each namespace:

[17:53 r730-01 dvl ~] % sudo nvmecontrol identify nvme0ns1
Size:                        1953525168 blocks
Capacity:                    1953525168 blocks
Utilization:                 1797757632 blocks
Thin Provisioning:           Not Supported
Number of LBA Formats:       1
Current LBA Format:          LBA Format #00
Metadata Capabilities
  Extended:                  Not Supported
  Separate:                  Not Supported
Data Protection Caps:        Not Supported
Data Protection Settings:    Not Enabled
Multi-Path I/O Capabilities: Not Supported
Reservation Capabilities:    Not Supported
Format Progress Indicator:   0% remains
Deallocate Logical Block:    Read 00h
Optimal I/O Boundary:        0 blocks
NVM Capacity:                1000204886016 bytes
Globally Unique Identifier:  00000000000000000000000000000000
IEEE EUI64:                  002538b22140998d
LBA Format #00: Data Size:   512  Metadata Size:     0  Performance: Best

[17:53 r730-01 dvl ~] % sudo nvmecontrol identify nvme1ns1
Size:                        1953525168 blocks
Capacity:                    1953525168 blocks
Utilization:                 1797629704 blocks
Thin Provisioning:           Not Supported
Number of LBA Formats:       1
Current LBA Format:          LBA Format #00
Metadata Capabilities
  Extended:                  Not Supported
  Separate:                  Not Supported
Data Protection Caps:        Not Supported
Data Protection Settings:    Not Enabled
Multi-Path I/O Capabilities: Not Supported
Reservation Capabilities:    Not Supported
Format Progress Indicator:   0% remains
Deallocate Logical Block:    Read 00h
Optimal I/O Boundary:        0 blocks
NVM Capacity:                1000204886016 bytes
Globally Unique Identifier:  00000000000000000000000000000000
IEEE EUI64:                  002538b221409d56
LBA Format #00: Data Size:   512  Metadata Size:     0  Performance: Best
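The identify numbers match the smartctl output further down. 1953525168 blocks × 512 bytes = 1,000,204,886,016 bytes, which is the NVM Capacity shown. On nvme0, 1797757632 × 512 = 920,451,907,584 bytes are in use. The rough sketch below does that arithmetic from the identify output. It assumes the single LBA Format #00 seen here, and `ns_utilization` is a hypothetical name.

```shell
#!/bin/sh
# ns_utilization (hypothetical helper): read `nvmecontrol identify nvmeXns1`
# output on stdin and print namespace size and utilization in bytes, plus
# percent utilized. Assumes the block size comes from "LBA Format #00".
ns_utilization() {
    awk '
        /^Size:/        { size = $2 }   # total blocks
        /^Utilization:/ { used = $2 }   # blocks in use
        # "LBA Format #00: Data Size:   512 ..." -> 6th field is bytes per block
        /^LBA Format #00: Data Size:/ { bs = $6 }
        END {
            if (size == 0 || bs == 0) exit 1
            printf "size=%.0f used=%.0f pct=%.2f\n", size * bs, used * bs, 100 * used / size
        }'
}
```

Usage: `sudo nvmecontrol identify nvme0ns1 | ns_utilization`. For the nvme0ns1 output above, this prints `size=1000204886016 used=920451907584 pct=92.03`.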

Next, I got the SMART/Health Information log page (page 2) from https://bsd.network/web/@normis@g.dodies.lv/115266285819611598:

[11:26 r730-01 dvl ~] % sudo nvmecontrol logpage -p 2 nvme0
SMART/Health Information Log
============================
Critical Warning State:         0x00
 Available spare:               0
 Temperature:                   0
 Device reliability:            0
 Read only:                     0
 Volatile memory backup:        0
Temperature:                    314 K, 40.85 C, 105.53 F
Available spare:                100
Available spare threshold:      10
Percentage used:                0
Data units (512,000 byte) read: 10156699
Data units written:             7954064
Host read commands:             65865260
Host write commands:            154530656
Controller busy time (minutes): 162
Power cycles:                   37
Power on hours:                 19976
Unsafe shutdowns:               14
Media errors:                   0
No. error info log entries:     0
Warning Temp Composite Time:    0
Error Temp Composite Time:      0
Temperature Sensor 1:           314 K, 40.85 C, 105.53 F
Temperature Sensor 2:           318 K, 44.85 C, 112.73 F
Temperature 1 Transition Count: 0
Temperature 2 Transition Count: 0
Total Time For Temperature 1:   0
Total Time For Temperature 2:   0
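For wear, the most relevant lines are Critical Warning State, Media errors and Percentage used. Percentage used is the drive's own estimate of how much of its rated life has been consumed, and it is allowed to go past 100. The minimal sketch below extracts those three fields. It assumes the labels stay exactly as printed above, and `wear_fields` is a hypothetical helper.

```shell
#!/bin/sh
# wear_fields (hypothetical helper): read `nvmecontrol logpage -p 2 nvmeX`
# output on stdin and print wear-related fields as key=value lines.
# Patterns are anchored at column 1, so the indented flag lines under
# "Critical Warning State" (e.g. " Available spare: 0") are not matched.
wear_fields() {
    awk '
        /^Critical Warning State:/ { print "critical_warning=" $NF }
        /^Percentage used:/        { print "percentage_used=" $NF }
        /^Media errors:/           { print "media_errors=" $NF }
    '
}
```

Usage: `sudo nvmecontrol logpage -p 2 nvme0 | wear_fields`.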

And based on https://bsd.network/web/@feld@friedcheese.us/115266300728028173, here is the SMART data from smartctl:

[11:26 r730-01 dvl ~] % sudo smartctl -a /dev/nvme0
smartctl 7.5 2025-04-30 r5714 [FreeBSD 14.3-RELEASE-p3 amd64] (local build)
Copyright (C) 2002-25, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       Samsung SSD 980 PRO with Heatsink 1TB
Serial Number:                      S6DVLJ0T207774T
Firmware Version:                   4B2QGXA7
PCI Vendor/Subsystem ID:            0x144d
IEEE OUI Identifier:                0x002538
Total NVM Capacity:                 1,000,204,886,016 [1.00 TB]
Unallocated NVM Capacity:           0
Controller ID:                      6
NVMe Version:                       1.3
Number of Namespaces:               1
Namespace 1 Size/Capacity:          1,000,204,886,016 [1.00 TB]
Namespace 1 Utilization:            920,451,907,584 [920 GB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            002538 b22140998d
Local Time is:                      Fri Sep 26 11:29:27 2025 UTC
Firmware Updates (0x16):            3 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x0057):     Comp Wr_Unc DS_Mngmt Sav/Sel_Feat Timestmp
Log Page Attributes (0x0f):         S/H_per_NS Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg
Maximum Data Transfer Size:         128 Pages
Warning  Comp. Temp. Threshold:     82 Celsius
Critical Comp. Temp. Threshold:     85 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     8.49W       -        -    0  0  0  0        0       0
 1 +     4.48W       -        -    1  1  1  1        0     200
 2 +     3.18W       -        -    2  2  2  2        0    1000
 3 -   0.0400W       -        -    3  3  3  3     2000    1200
 4 -   0.0050W       -        -    4  4  4  4      500    9500

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02, NSID 0xffffffff)
Critical Warning:                   0x00
Temperature:                        41 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    0%
Data Units Read:                    10,156,699 [5.20 TB]
Data Units Written:                 7,954,064 [4.07 TB]
Host Read Commands:                 65,865,260
Host Write Commands:                154,530,656
Controller Busy Time:               162
Power Cycles:                       37
Power On Hours:                     19,976
Unsafe Shutdowns:                   14
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               41 Celsius
Temperature Sensor 2:               45 Celsius

Error Information (NVMe Log 0x01, 16 of 64 entries)
No Errors Logged

Self-test Log (NVMe Log 0x06, NSID 0xffffffff)
Self-test status: No self-test in progress
No Self-tests Logged

To pull out just the data-unit counters, courtesy of https://bsd.network/web/@TomAoki@bsd.cafe/115266335070733088:

[11:33 r730-01 dvl ~] % sudo nvmecontrol logpage -p 2 nvme0 | fgrep written
Data units written:             7954064
[11:33 r730-01 dvl ~] % sudo nvmecontrol logpage -p 2 nvme0 | fgrep unit
Data units (512,000 byte) read: 10156699
Data units written:             7954064
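These counters are in data units of 512,000 bytes (1000 × 512-byte blocks), as the read line states. So 7,954,064 units written is 7,954,064 × 512,000 = 4,072,480,768,000 bytes, which matches the 4.07 TB that smartctl reports. The sketch below does that conversion. It assumes decimal TB (10^12 bytes) to match smartctl, and `data_units_tb` is a hypothetical name.

```shell
#!/bin/sh
# data_units_tb (hypothetical helper): read `nvmecontrol logpage -p 2 nvmeX`
# output on stdin and convert the "Data units" counters to decimal TB.
# One data unit = 512,000 bytes; 1 TB = 10^12 bytes.
data_units_tb() {
    awk '
        /^Data units .*read:/  { printf "read_tb=%.2f\n",    $NF * 512000 / 1e12 }
        /^Data units written:/ { printf "written_tb=%.2f\n", $NF * 512000 / 1e12 }
    '
}
```

For the nvme0 values above, this should give `read_tb=5.20` and `written_tb=4.07`, the same figures smartctl shows.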

From https://bsd.network/web/@wollman@mastodon.social/115267385196485390 we have:

[11:44 r730-01 dvl ~] % sudo nvmecontrol logpage -p 2 nvme0 | grep 'Available spare'
 Available spare:               0
Available spare:                100
Available spare threshold:      10

Following on from that:

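The second line (Available spare) is the one to monitor. When it drops below the third (Available spare threshold), replace the drive. The first, indented line is the Available spare flag under Critical Warning State, not the percentage.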

re: https://bsd.network/web/@wollman@mastodon.social/115271384378052392

That makes me think it would be an easy Nagios check to write.
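Here is a minimal sketch of such a check. It assumes the nvmecontrol output format shown above and the standard Nagios plugin exit codes (0 OK, 2 CRITICAL, 3 UNKNOWN). The name `check_nvme_spare` and everything in it are hypothetical. The check only implements the spare-versus-threshold rule, and it needs the same privileges as the sudo commands above.

```shell
#!/bin/sh
# check_nvme_spare (hypothetical Nagios plugin sketch, not an existing plugin).
# Rule: CRITICAL when "Available spare" is below "Available spare threshold".

# check_spare: read `nvmecontrol logpage -p 2 nvmeX` output on stdin, print a
# Nagios status line, and return the Nagios exit code via awk's exit status.
check_spare() {
    awk '
        # Anchored at column 1 so the indented " Available spare:" flag under
        # "Critical Warning State" is ignored.
        /^Available spare:/           { spare = $NF }
        /^Available spare threshold:/ { thresh = $NF }
        END {
            if (spare == "" || thresh == "") {
                print "UNKNOWN - could not parse spare values"; exit 3
            }
            if (spare + 0 < thresh + 0) {
                printf "CRITICAL - spare %d%% below threshold %d%%\n", spare, thresh; exit 2
            }
            printf "OK - spare %d%%, threshold %d%%\n", spare, thresh; exit 0
        }'
}

# Main: run against the controller named on the command line, e.g. nvme0.
# If nvmecontrol fails, awk sees no input and the check reports UNKNOWN.
if [ -n "${1:-}" ]; then
    nvmecontrol logpage -p 2 "$1" | check_spare
    exit $?
fi
```

Invoked as `check_nvme_spare nvme0`, the drives above should report `OK - spare 100%, threshold 10%`.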