Way too many snapshots

In this post

In this post, we have:

  • FreeBSD 14.1-RELEASE-p5
  • r730-03
  • Lots of boring repetitive sections, so skip over that to find what you need

This article was written over a couple of days.

The zpool in question is 3 pairs of 12TB HDD:

[13:44 r730-03 dvl ~/tmp] % zpool status data01
  pool: data01
 state: ONLINE
  scan: scrub in progress since Fri Jan 17 05:29:21 2025
	22.1T / 26.1T scanned at 200M/s, 19.2T / 25.9T issued at 173M/s
	0B repaired, 74.06% done, 11:17:54 to go
config:

	NAME                   STATE     READ WRITE CKSUM
	data01                 ONLINE       0     0     0
	  mirror-0             ONLINE       0     0     0
	    gpt/HGST_8CJVT8YE  ONLINE       0     0     0
	    gpt/SG_ZHZ16KEX    ONLINE       0     0     0
	  mirror-1             ONLINE       0     0     0
	    gpt/SG_ZHZ03BAT    ONLINE       0     0     0
	    gpt/HGST_8CJW1G4E  ONLINE       0     0     0
	  mirror-2             ONLINE       0     0     0
	    gpt/SG_ZL2NJBT2    ONLINE       0     0     0
	    gpt/HGST_5PGGTH3D  ONLINE       0     0     0

errors: No known data errors

Yes, that’s a scrub going on….

Background

The other day during the ZFS users meeting, numbers of snapshots was mentioned. I found about 61,000 snapshots on one host.

What I didn’t know was that 58,000 snapshots were one on filesystem:

[20:23 r730-03 dvl ~] % zfs list -r -t snapshot data01/snapshots/homeassistant-r730-01 > ~/tmp/HA.snapshots 
[20:26 r730-03 dvl ~] % wc -l ~/tmp/HA.snapshots                                                 
   57850 /home/dvl/tmp/HA.snapshots

How old are they?

[20:30 r730-03 dvl ~] % head -4 ~/tmp/HA.snapshots
NAME                                                                             USED  AVAIL  REFER  MOUNTPOINT
data01/snapshots/homeassistant-r730-01@autosnap_2023-05-01_00:00:10_monthly     6.34G      -  20.8G  -
data01/snapshots/homeassistant-r730-01@autosnap_2023-06-01_00:00:00_monthly     3.39G      -  20.3G  -
data01/snapshots/homeassistant-r730-01@autosnap_2023-07-01_00:00:08_monthly     2.89G      -  19.2G  -
[20:30 r730-03 dvl ~] % tail -4 ~/tmp/HA.snapshots
data01/snapshots/homeassistant-r730-01@autosnap_2025-01-17_19:30:07_frequently  88.6M      -  18.0G  -
data01/snapshots/homeassistant-r730-01@autosnap_2025-01-17_19:45:01_frequently  84.5M      -  18.0G  -
data01/snapshots/homeassistant-r730-01@autosnap_2025-01-17_20:00:04_hourly         0B      -  18.0G  -
data01/snapshots/homeassistant-r730-01@autosnap_2025-01-17_20:00:04_frequently     0B      -  18.0G  -
[20:30 r730-03 dvl ~] % 
[20:30 r730-03 dvl ~] % 
[20:30 r730-03 dvl ~] % grep -c autosnap ~/tmp/HA.snapshots 
57849

I don’t need snapshots from May 2023… Let’s clean those up.

All of them are snapshots taken on the remote host (r730-01), not this host (r730-03).

Let’s look at what we have at the origin:

[20:37 r730-01 dvl ~] % zfs list -r -t snapshot data02/vm/hass              
NAME                                                                             USED  AVAIL  REFER  MOUNTPOINT
data02/vm/hass@syncoid_r730-01.int.unixathome.org_2023-02-27:21:27:41-GMT00:00  16.0G      -  16.2G  -
data02/vm/hass@autosnap_2024-10-01_00:00:05_monthly                             4.86G      -  11.9G  -
data02/vm/hass@autosnap_2024-11-01_00:00:05_monthly                             7.64G      -  17.4G  -
data02/vm/hass@autosnap_2024-12-01_00:00:10_monthly                             3.13G      -  14.0G  -
data02/vm/hass@autosnap_2024-12-14_00:00:18_daily                                460M      -  17.9G  -
data02/vm/hass@autosnap_2024-12-15_00:00:01_daily                                274M      -  18.0G  -
data02/vm/hass@autosnap_2024-12-16_00:00:08_daily                                460M      -  18.1G  -
data02/vm/hass@autosnap_2024-12-17_00:00:10_daily                                245M      -  16.3G  -
data02/vm/hass@autosnap_2024-12-18_00:00:09_daily                                267M      -  16.6G  -
data02/vm/hass@autosnap_2024-12-19_00:00:06_daily                                342M      -  16.7G  -
data02/vm/hass@autosnap_2024-12-20_00:00:11_daily                                352M      -  16.8G  -
data02/vm/hass@autosnap_2024-12-21_00:00:20_daily                                357M      -  16.9G  -
data02/vm/hass@autosnap_2024-12-22_00:00:06_daily                                329M      -  17.0G  -
data02/vm/hass@autosnap_2024-12-23_00:00:08_daily                                570M      -  17.1G  -
data02/vm/hass@autosnap_2024-12-24_00:00:18_daily                                301M      -  17.8G  -
data02/vm/hass@autosnap_2024-12-25_00:00:08_daily                                295M      -  17.9G  -
data02/vm/hass@autosnap_2024-12-26_00:00:13_daily                                333M      -  18.0G  -
data02/vm/hass@autosnap_2024-12-27_00:00:15_daily                                332M      -  18.1G  -
data02/vm/hass@autosnap_2024-12-28_00:00:12_daily                                327M      -  18.2G  -
data02/vm/hass@autosnap_2024-12-29_00:00:12_daily                                314M      -  18.2G  -
data02/vm/hass@autosnap_2024-12-30_00:00:11_daily                                459M      -  18.3G  -
data02/vm/hass@autosnap_2024-12-31_00:00:23_daily                                328M      -  17.0G  -
data02/vm/hass@autosnap_2025-01-01_00:00:06_monthly                                0B      -  17.2G  -
data02/vm/hass@autosnap_2025-01-01_00:00:06_daily                                  0B      -  17.2G  -
data02/vm/hass@autosnap_2025-01-02_00:00:05_daily                                324M      -  17.3G  -
data02/vm/hass@autosnap_2025-01-03_00:00:09_daily                                317M      -  17.4G  -
data02/vm/hass@autosnap_2025-01-04_00:00:07_daily                                319M      -  17.6G  -
data02/vm/hass@autosnap_2025-01-05_00:00:15_daily                                387M      -  17.7G  -
data02/vm/hass@autosnap_2025-01-06_00:00:11_daily                                469M      -  17.8G  -
data02/vm/hass@autosnap_2025-01-07_02:00:11_daily                                374M      -  17.1G  -
data02/vm/hass@autosnap_2025-01-08_00:00:09_daily                                363M      -  17.2G  -
data02/vm/hass@autosnap_2025-01-09_00:00:11_daily                                378M      -  17.3G  -
data02/vm/hass@autosnap_2025-01-10_00:00:08_daily                                377M      -  17.4G  -
data02/vm/hass@autosnap_2025-01-11_14:45:05_daily                                235M      -  17.4G  -
data02/vm/hass@autosnap_2025-01-12_00:00:17_daily                                336M      -  17.5G  -
data02/vm/hass@autosnap_2025-01-13_00:00:10_daily                               1.29G      -  19.4G  -
data02/vm/hass@autosnap_2025-01-14_00:00:01_daily                                309M      -  17.6G  -
data02/vm/hass@autosnap_2025-01-15_00:00:20_daily                                325M      -  17.7G  -
data02/vm/hass@autosnap_2025-01-15_20:00:01_hourly                               120M      -  17.8G  -
data02/vm/hass@autosnap_2025-01-15_21:00:15_hourly                               106M      -  17.8G  -
data02/vm/hass@autosnap_2025-01-15_22:00:15_hourly                               106M      -  17.8G  -
data02/vm/hass@autosnap_2025-01-15_23:00:21_hourly                               107M      -  17.8G  -
data02/vm/hass@autosnap_2025-01-16_00:00:22_daily                                  0B      -  17.8G  -
data02/vm/hass@autosnap_2025-01-16_00:00:22_hourly                                 0B      -  17.8G  -
data02/vm/hass@autosnap_2025-01-16_01:00:15_hourly                              99.0M      -  17.8G  -
data02/vm/hass@autosnap_2025-01-16_02:00:04_hourly                              99.3M      -  17.8G  -
data02/vm/hass@autosnap_2025-01-16_03:00:05_hourly                               107M      -  17.8G  -
data02/vm/hass@autosnap_2025-01-16_04:00:01_hourly                               111M      -  17.8G  -
data02/vm/hass@autosnap_2025-01-16_05:00:14_hourly                               117M      -  17.8G  -
data02/vm/hass@autosnap_2025-01-16_06:00:03_hourly                               105M      -  17.8G  -
data02/vm/hass@autosnap_2025-01-16_07:00:09_hourly                               104M      -  17.8G  -
data02/vm/hass@autosnap_2025-01-16_08:00:07_hourly                               106M      -  17.8G  -
data02/vm/hass@autosnap_2025-01-16_09:00:19_hourly                              98.0M      -  17.8G  -
data02/vm/hass@autosnap_2025-01-16_10:00:14_hourly                              99.0M      -  17.8G  -
data02/vm/hass@autosnap_2025-01-16_11:00:02_hourly                               109M      -  17.8G  -
data02/vm/hass@autosnap_2025-01-16_12:00:12_hourly                               101M      -  17.8G  -
data02/vm/hass@autosnap_2025-01-16_13:00:06_hourly                               102M      -  17.8G  -
data02/vm/hass@autosnap_2025-01-16_14:00:02_hourly                               109M      -  17.8G  -
data02/vm/hass@autosnap_2025-01-16_15:00:10_hourly                               111M      -  17.8G  -
data02/vm/hass@autosnap_2025-01-16_16:00:07_hourly                               110M      -  17.8G  -
data02/vm/hass@autosnap_2025-01-16_17:00:05_hourly                               111M      -  17.9G  -
data02/vm/hass@autosnap_2025-01-16_18:00:13_hourly                               113M      -  17.9G  -
data02/vm/hass@autosnap_2025-01-16_19:00:12_hourly                               105M      -  17.9G  -
data02/vm/hass@autosnap_2025-01-16_20:00:01_hourly                              97.9M      -  17.9G  -
data02/vm/hass@autosnap_2025-01-16_21:00:15_hourly                              97.8M      -  17.9G  -
data02/vm/hass@autosnap_2025-01-16_22:00:04_hourly                              98.9M      -  17.9G  -
data02/vm/hass@autosnap_2025-01-16_23:00:11_hourly                               109M      -  17.9G  -
data02/vm/hass@autosnap_2025-01-17_00:00:08_daily                                  0B      -  17.9G  -
data02/vm/hass@autosnap_2025-01-17_00:00:08_hourly                                 0B      -  17.9G  -
data02/vm/hass@autosnap_2025-01-17_01:00:04_hourly                               108M      -  17.9G  -
data02/vm/hass@autosnap_2025-01-17_02:00:06_hourly                               109M      -  17.9G  -
data02/vm/hass@autosnap_2025-01-17_03:00:06_hourly                               108M      -  17.9G  -
data02/vm/hass@autosnap_2025-01-17_04:00:15_hourly                               114M      -  17.9G  -
data02/vm/hass@autosnap_2025-01-17_05:00:05_hourly                               112M      -  17.9G  -
data02/vm/hass@autosnap_2025-01-17_06:00:02_hourly                               105M      -  17.9G  -
data02/vm/hass@autosnap_2025-01-17_07:00:13_hourly                               105M      -  17.9G  -
data02/vm/hass@autosnap_2025-01-17_08:00:02_hourly                               106M      -  17.9G  -
data02/vm/hass@autosnap_2025-01-17_09:00:13_hourly                               107M      -  18.0G  -
data02/vm/hass@autosnap_2025-01-17_10:00:12_hourly                              99.4M      -  18.0G  -
data02/vm/hass@autosnap_2025-01-17_11:00:10_hourly                               100M      -  18.0G  -
data02/vm/hass@autosnap_2025-01-17_12:00:01_hourly                               110M      -  18.0G  -
data02/vm/hass@autosnap_2025-01-17_13:00:10_hourly                               101M      -  18.0G  -
data02/vm/hass@autosnap_2025-01-17_14:00:02_hourly                               102M      -  18.0G  -
data02/vm/hass@autosnap_2025-01-17_15:00:13_hourly                               104M      -  18.0G  -
data02/vm/hass@autosnap_2025-01-17_16:00:07_hourly                              92.8M      -  18.0G  -
data02/vm/hass@autosnap_2025-01-17_16:15:06_frequently                          87.5M      -  18.0G  -
data02/vm/hass@autosnap_2025-01-17_16:30:06_frequently                          87.8M      -  18.0G  -
data02/vm/hass@autosnap_2025-01-17_16:45:03_frequently                          88.4M      -  18.0G  -
data02/vm/hass@autosnap_2025-01-17_17:00:07_hourly                                 0B      -  18.0G  -
data02/vm/hass@autosnap_2025-01-17_17:00:07_frequently                             0B      -  18.0G  -
data02/vm/hass@autosnap_2025-01-17_17:15:07_frequently                          88.6M      -  18.0G  -
data02/vm/hass@autosnap_2025-01-17_17:30:07_frequently                          89.0M      -  18.0G  -
data02/vm/hass@autosnap_2025-01-17_17:45:05_frequently                          90.3M      -  18.0G  -
data02/vm/hass@autosnap_2025-01-17_18:00:05_hourly                                 0B      -  18.0G  -
data02/vm/hass@autosnap_2025-01-17_18:00:05_frequently                             0B      -  18.0G  -
data02/vm/hass@autosnap_2025-01-17_18:15:04_frequently                          88.9M      -  18.0G  -
data02/vm/hass@autosnap_2025-01-17_18:30:01_frequently                          89.6M      -  18.0G  -
data02/vm/hass@autosnap_2025-01-17_18:45:02_frequently                          89.3M      -  18.0G  -
data02/vm/hass@autosnap_2025-01-17_19:00:13_hourly                                 0B      -  18.0G  -
data02/vm/hass@autosnap_2025-01-17_19:00:13_frequently                             0B      -  18.0G  -
data02/vm/hass@autosnap_2025-01-17_19:15:04_frequently                          88.6M      -  18.0G  -
data02/vm/hass@autosnap_2025-01-17_19:30:07_frequently                          88.5M      -  18.0G  -
data02/vm/hass@autosnap_2025-01-17_19:45:01_frequently                          84.4M      -  18.0G  -
data02/vm/hass@autosnap_2025-01-17_20:00:04_hourly                                 0B      -  18.0G  -
data02/vm/hass@autosnap_2025-01-17_20:00:04_frequently                             0B      -  18.0G  -
data02/vm/hass@autosnap_2025-01-17_20:15:04_frequently                          82.8M      -  18.0G  -
data02/vm/hass@autosnap_2025-01-17_20:30:07_frequently                          79.6M      -  18.0G  -
[20:38 r730-01 dvl ~] % 

OK, I’m deleting every before 2024-10-01_00. I’ll start with all daily and frequently from not 2025, or 2024 months 10, 11, or 12.

Just the names

[20:43 r730-03 dvl /usr/local/etc/cron.d] % cut -f 1 -w ~/tmp/HA.snapshots | grep -v '^NAME$' > ~/tmp/HA.snapshot.names 
[20:43 r730-03 dvl /usr/local/etc/cron.d] % wc -l  ~/tmp/HA.snapshot.names
   57849 /home/dvl/tmp/HA.snapshot.names

Delete about 56000 snapshots

After some playing I did this:

[20:48 r730-03 dvl /usr/local/etc/cron.d] % egrep -e 'autosnap_202[3-4]' ~/tmp/HA.snapshot.names | grep -v 'monthly' | egrep -ve '2024-1[0-2]' | xargs -n 1 sudo zfs destroy 

Then I read the manpage

From the manpage I read:

An inclusive range of snapshots may be specified by separating the first and last snapshots with a percent sign. The first and/or last snapshots may be left blank, in which case the filesystem’s oldest or newest snapshot will be implied.

I control-c’d my script and got a fresh list of snapshots. The latest one was data01/snapshots/homeassistant-r730-01@autosnap_2023-09-25_01:00:12_hourly and the one before the next monthly was data01/snapshots/homeassistant-r730-01@autosnap_2023-09-30_23:45:03_frequently.

Let’s try this:

[20:55 r730-03 dvl /usr/local/etc/cron.d] % zfs destroy data01/snapshots/homeassistant-r730-01@autosnap_2023-09-25_01:00:12_hourly%autosnap_2023-09-30_23:45:03_frequently
cannot destroy snapshots: permission denied

That took a long time to get back to me. So I tried what I meant:


[20:57 r730-03 dvl /usr/local/etc/cron.d] % zfs destroy -nv data01/snapshots/homeassistant-r730-01@autosnap_2023-09-25_01:00:12_hourly%autosnap_2023-09-30_23:45:03_frequently
would destroy data01/snapshots/homeassistant-r730-01@autosnap_2023-09-25_01:00:12_hourly
would destroy data01/snapshots/homeassistant-r730-01@autosnap_2023-09-25_02:00:08_hourly
would destroy data01/snapshots/homeassistant-r730-01@autosnap_2023-09-25_03:00:01_hourly
would destroy data01/snapshots/homeassistant-r730-01@autosnap_2023-09-25_04:00:00_hourly
would destroy data01/snapshots/homeassistant-r730-01@autosnap_2023-09-25_05:00:10_hourly
...
would destroy data01/snapshots/homeassistant-r730-01@autosnap_2023-09-30_22:00:09_frequently
would destroy data01/snapshots/homeassistant-r730-01@autosnap_2023-09-30_22:15:02_frequently
would destroy data01/snapshots/homeassistant-r730-01@autosnap_2023-09-30_22:30:03_frequently
would destroy data01/snapshots/homeassistant-r730-01@autosnap_2023-09-30_22:45:01_frequently
would destroy data01/snapshots/homeassistant-r730-01@autosnap_2023-09-30_23:00:07_hourly
would destroy data01/snapshots/homeassistant-r730-01@autosnap_2023-09-30_23:00:07_frequently
would destroy data01/snapshots/homeassistant-r730-01@autosnap_2023-09-30_23:15:02_frequently
would destroy data01/snapshots/homeassistant-r730-01@autosnap_2023-09-30_23:30:03_frequently
would destroy data01/snapshots/homeassistant-r730-01@autosnap_2023-09-30_23:45:03_frequently
would reclaim 53.3G

In other ssh sessions, I issued a similar commands and waited:

[21:13 r730-03 dvl ~] % ps auwwx | grep destroy
root    6311    0.0  0.0   20368    9844  1  I+   21:08        0:00.01 sudo zfs destroy data01/snapshots/homeassistant-r730-01@autosnap_2023-10-01_00:00:02_daily%autosnap_2023-10-31_23:45:00_frequently
root    8576    0.0  0.0   20368    9848 10  I+   21:11        0:00.01 sudo zfs destroy data01/snapshots/homeassistant-r730-01@autosnap_2024-02-01_00:00:11_daily%autosnap_2024-02-29_23:45:02_frequently
root    6312    0.0  0.0   20368    9840  2  Is   21:08        0:00.00 sudo zfs destroy data01/snapshots/homeassistant-r730-01@autosnap_2023-10-01_00:00:02_daily%autosnap_2023-10-31_23:45:00_frequently
root    6313    0.0  0.4  528456  481940  2  I+   21:08        0:49.43 zfs destroy data01/snapshots/homeassistant-r730-01@autosnap_2023-10-01_00:00:02_daily%autosnap_2023-10-31_23:45:00_frequently
root    6412    0.0  0.0   20368    9848  5  Is   21:09        0:00.00 sudo zfs destroy data01/snapshots/homeassistant-r730-01@autosnap_2023-11-01_00:00:13_daily%autosnap_2023-11-30_23:45:05_frequently
root    6413    0.0  0.0   19528    8400  5  D+   21:09        0:00.00 zfs destroy data01/snapshots/homeassistant-r730-01@autosnap_2023-11-01_00:00:13_daily%autosnap_2023-11-30_23:45:05_frequently
root    6900    0.0  0.0   20368    9844  6  I+   21:10        0:00.01 sudo zfs destroy data01/snapshots/homeassistant-r730-01@autosnap_2023-12-01_00:00:03_daily%autosnap_2023-12-31_23:45:02_frequently
root    6901    0.0  0.0   20368    9840  7  Is   21:10        0:00.00 sudo zfs destroy data01/snapshots/homeassistant-r730-01@autosnap_2023-12-01_00:00:03_daily%autosnap_2023-12-31_23:45:02_frequently
root    6902    0.0  0.0   19528    8396  7  D+   21:10        0:00.00 zfs destroy data01/snapshots/homeassistant-r730-01@autosnap_2023-12-01_00:00:03_daily%autosnap_2023-12-31_23:45:02_frequently
root    6411    0.0  0.0   20368    9852  3  I+   21:09        0:00.01 sudo zfs destroy data01/snapshots/homeassistant-r730-01@autosnap_2023-11-01_00:00:13_daily%autosnap_2023-11-30_23:45:05_frequently
root    8462    0.0  0.0   20368    9856  8  I+   21:10        0:00.01 sudo zfs destroy data01/snapshots/homeassistant-r730-01@autosnap_2024-01-01_00:00:19_daily%autosnap_2024-01-31_23:45:01_frequently
root    8463    0.0  0.0   20368    9852  9  Is   21:10        0:00.00 sudo zfs destroy data01/snapshots/homeassistant-r730-01@autosnap_2024-01-01_00:00:19_daily%autosnap_2024-01-31_23:45:01_frequently
root    8464    0.0  0.0   19528    8408  9  D+   21:10        0:00.00 zfs destroy data01/snapshots/homeassistant-r730-01@autosnap_2024-01-01_00:00:19_daily%autosnap_2024-01-31_23:45:01_frequently
root    8577    0.0  0.0   20368    9844 11  Is   21:11        0:00.00 sudo zfs destroy data01/snapshots/homeassistant-r730-01@autosnap_2024-02-01_00:00:11_daily%autosnap_2024-02-29_23:45:02_frequently
root    8578    0.0  0.0   19528    8396 11  D+   21:11        0:00.00 zfs destroy data01/snapshots/homeassistant-r730-01@autosnap_2024-02-01_00:00:11_daily%autosnap_2024-02-29_23:45:02_frequently
root    8676    0.0  0.0   20368    9848 12  I+   21:13        0:00.01 sudo zfs destroy data01/snapshots/homeassistant-r730-01@autosnap_2024-03-01_00:00:03_daily%autosnap_2024-08-31_23:45:05_frequently
root    8677    0.0  0.0   20368    9844 13  Is   21:13        0:00.00 sudo zfs destroy data01/snapshots/homeassistant-r730-01@autosnap_2024-03-01_00:00:03_daily%autosnap_2024-08-31_23:45:05_frequently
root    8678    0.0  0.0   19528    8396 13  D+   21:13        0:00.00 zfs destroy data01/snapshots/homeassistant-r730-01@autosnap_2024-03-01_00:00:03_daily%autosnap_2024-08-31_23:45:05_frequently
dvl     8758    0.0  0.0   12808    2380 14  S+   21:13        0:00.00 grep destroy

I wanted to keep some monthly snapshots, then I gave up and went for a much larger range: 2024-03-01 to 2024-08-31.

iostat

Did you know iostat will prompt you for drives? Well, it did in my zsh shell. In the following command, I pressed CTL-D after typing 10:

[21:16 r730-03 dvl ~] % sudo iostat -w 2 -c 10
ada0  ada1  cd0   da0   da1   da2   da3   da4   da5   da6

Which means you get to see this:

[21:16 r730-03 dvl ~] % sudo iostat -w 2 -c 10 da0 da1 da2 da4 da4 da5 da6                       
       tty              da0              da1              da2              da4              da5              da6             cpu
 tin  tout KB/t   tps  MB/s KB/t   tps  MB/s KB/t   tps  MB/s KB/t   tps  MB/s KB/t   tps  MB/s KB/t   tps  MB/s  us ni sy in id
   0    37  134   135 17.72  140   133 18.17  128   151 18.90  140   126 17.21  129   144 18.11 15.4     0  0.00   0  0  1  0 99

   1   392  0.0     0  0.00  4.5    23  0.10  4.6    19  0.09  5.0    31  0.15  0.0     0  0.00  0.0     0  0.00   0  0  1  0 99
   0   123  0.0     0  0.00  4.6    32  0.14  4.4    26  0.11  4.6    27  0.12  0.0     0  0.00  0.0     0  0.00   0  0  1  0 99
   0   136  0.0     0  0.00  5.2    25  0.12  4.5    23  0.10  4.8    39  0.18  0.0     0  0.00  0.0     0  0.00   0  0  1  0 99
   0   130  0.0     0  0.00  4.5    30  0.13  4.6    32  0.14  4.8    28  0.13  0.0     0  0.00  0.0     0  0.00   0  0  1  0 99
   0   129  0.0     0  0.00  4.8    21  0.10  4.8    20  0.09  5.0    29  0.14  0.0     0  0.00  0.0     0  0.00   0  0  1  0 99
   0   131  0.0     0  0.00  4.3    15  0.06  5.0    23  0.11  4.8    34  0.16  0.0     0  0.00  0.0     0  0.00   0  0  1  0 99
   0   128  0.0     0  0.00  5.5    14  0.08  5.3    22  0.12  4.7    34  0.15  0.0     0  0.00  0.0     0  0.00   0  0  1  0 99
   0   133  0.0     0  0.00  4.6    16  0.07  4.6    14  0.06  5.3    22  0.12  0.0     0  0.00  0.0     0  0.00   0  0  1  0 99
   0   130  128     1  0.13  4.1    31  0.12  4.5    39  0.17 14.2    33  0.46  0.0     0  0.00  0.0     0  0.00   0  0  1  0 99

Running so many destroys….

Those 12 concurrent destroys, each acting upon the same filesysteme, ran for several hours. The system became unusable. Monitoring checks failed:

It wasn’t until about 01:42 (about 2.5 hours later) that the checks went green.

I’m also pleased that the pool capacity went from 79% down to 67% – the gap in the graph is because monitoring could get data from the filesystem during the multiple concurrent destroys.

So how many are left?

We still have 16,000 snapshots left.

[12:54 r730-03 dvl ~/tmp] % zfs list -r -t snapshot data01/snapshots/homeassistant-r730-01 > ~/tmp/now
[12:54 r730-03 dvl ~/tmp] % wc -l  ~/tmp/now
   16695 /home/dvl/tmp/now

Let’s try one more range delete

I decided to try one more range delete, that is: delete everything between this and that. I also timed it and captured the list of deleted snapshots.

[13:04 r730-03 dvl ~] % time sudo zfs destroy -v data01/snapshots/homeassistant-r730-01@autosnap_2024-09-01_00:00:19_daily%autosnap_2024-09-30_23:45:03_frequently
will destroy data01/snapshots/homeassistant-r730-01@autosnap_2024-09-01_00:00:19_daily
will destroy data01/snapshots/homeassistant-r730-01@autosnap_2024-09-01_00:00:19_hourly
will destroy data01/snapshots/homeassistant-r730-01@autosnap_2024-09-01_00:00:19_frequently
will destroy data01/snapshots/homeassistant-r730-01@autosnap_2024-09-01_00:15:06_frequently
...
will destroy data01/snapshots/homeassistant-r730-01@autosnap_2024-09-30_23:00:07_frequently
will destroy data01/snapshots/homeassistant-r730-01@autosnap_2024-09-30_23:15:05_frequently
will destroy data01/snapshots/homeassistant-r730-01@autosnap_2024-09-30_23:30:02_frequently
will destroy data01/snapshots/homeassistant-r730-01@autosnap_2024-09-30_23:45:03_frequently
will reclaim 339G
sudo zfs destroy -v   0.03s user 0.04s system 0% cpu 38:00.89 total

I haven’t listed all the output here. However, I did get a count: 3630 snapshots. That’s not trivial.

It took 38 minutes. Again, the system because unusable – there were more gaps in the monitoring.

When running this command, I could not get sysutils/btop to display. However, iostat ran just fine. Filesystem commands such as zfs list did not run. Indeed, once the command was entered, I could not CTL-C out of it. CTL-T did work, as shown here:

[13:07 r730-03 dvl ~] % zfs list
load: 1.12  cmd: zfs 52910 [rrl->rr_cv] 2.63r 0.00u 0.00s 0% 8432k
load: 1.12  cmd: zfs 52910 [rrl->rr_cv] 3.78r 0.00u 0.00s 0% 8432k
load: 1.12  cmd: zfs 52910 [rrl->rr_cv] 3.97r 0.00u 0.00s 0% 8432k
load: 1.12  cmd: zfs 52910 [rrl->rr_cv] 4.43r 0.00u 0.00s 0% 8432k
load: 2.51  cmd: zfs 52910 [rrl->rr_cv] 538.48r 0.00u 0.00s 0% 8432k

Let’s try one at a time

Let’s go back to my long time approach, delete one-at-a-time with xargs.

In this test, I’m going to delete snapshots from Oct 2024.

[13:43 r730-03 dvl ~] % zfs list -r -t snapshot data01/snapshots/homeassistant-r730-01 > ~/tmp/now 
[13:49 r730-03 dvl ~] % grep autosnap_2024-10 ~/tmp/now | cut -f 1 -w | grep -v monthly | wc -l
    3751
[13:50 r730-03 dvl ~] % grep autosnap_2024-10 ~/tmp/now | cut -f 1 -w | grep -v monthly | head 
data01/snapshots/homeassistant-r730-01@autosnap_2024-10-01_00:00:05_daily
data01/snapshots/homeassistant-r730-01@autosnap_2024-10-01_00:00:05_hourly
data01/snapshots/homeassistant-r730-01@autosnap_2024-10-01_00:00:05_frequently
data01/snapshots/homeassistant-r730-01@autosnap_2024-10-01_00:15:04_frequently
data01/snapshots/homeassistant-r730-01@autosnap_2024-10-01_00:30:02_frequently
data01/snapshots/homeassistant-r730-01@autosnap_2024-10-01_00:45:01_frequently
data01/snapshots/homeassistant-r730-01@autosnap_2024-10-01_01:00:02_hourly
data01/snapshots/homeassistant-r730-01@autosnap_2024-10-01_01:00:02_frequently
data01/snapshots/homeassistant-r730-01@autosnap_2024-10-01_01:15:05_frequently
data01/snapshots/homeassistant-r730-01@autosnap_2024-10-01_01:30:04_frequently

A short test to validate:

[13:51 r730-03 dvl ~] % grep autosnap_2024-10 ~/tmp/now | cut -f 1 -w | grep -v monthly | head | xargs -n 1 echo sudo destroy       
sudo destroy data01/snapshots/homeassistant-r730-01@autosnap_2024-10-01_00:00:05_daily
sudo destroy data01/snapshots/homeassistant-r730-01@autosnap_2024-10-01_00:00:05_hourly
sudo destroy data01/snapshots/homeassistant-r730-01@autosnap_2024-10-01_00:00:05_frequently
sudo destroy data01/snapshots/homeassistant-r730-01@autosnap_2024-10-01_00:15:04_frequently
sudo destroy data01/snapshots/homeassistant-r730-01@autosnap_2024-10-01_00:30:02_frequently
sudo destroy data01/snapshots/homeassistant-r730-01@autosnap_2024-10-01_00:45:01_frequently
sudo destroy data01/snapshots/homeassistant-r730-01@autosnap_2024-10-01_01:00:02_hourly
sudo destroy data01/snapshots/homeassistant-r730-01@autosnap_2024-10-01_01:00:02_frequently
sudo destroy data01/snapshots/homeassistant-r730-01@autosnap_2024-10-01_01:15:05_frequently
sudo destroy data01/snapshots/homeassistant-r730-01@autosnap_2024-10-01_01:30:04_frequently

And here we go, adding time, removing head, and adding in the zfs command.

[13:52 r730-03 dvl ~] % time grep autosnap_2024-10 ~/tmp/now | cut -f 1 -w | grep -v monthly | xargs -n 1 sudo zfs destroy 
grep autosnap_2024-10 ~/tmp/now  0.01s user 0.00s system 0% cpu 1:39:34.11 total
cut -f 1 -w  0.01s user 0.00s system 0% cpu 2:22:20.67 total
grep -v monthly  0.00s user 0.00s system 0% cpu 3:05:17.08 total
xargs -n 1 sudo zfs destroy  26.03s user 22.66s system 0% cpu 3:39:04.70 total
[17:31 r730-03 dvl ~] % 

That took 3.5 hours, however, the system was quite usable. And there are many more snapshots to delete.

While that ran, btop was usable:

See also, Why does the same command appear on two different ports with different times??

November, it’s time for November

This next session, I’ll do within tmux.

[17:49 r730-03 dvl ~] % time grep autosnap_2024-11 ~/tmp/now | cut -f 1 -w | grep -v monthly | xargs -n 1 echo sudo zfs destroy | wc -l
    3598
grep autosnap_2024-11 ~/tmp/now  0.01s user 0.00s system 1% cpu 0.926 total
cut -f 1 -w  0.00s user 0.01s system 0% cpu 1.351 total
grep -v monthly  0.00s user 0.00s system 0% cpu 1.793 total
xargs -n 1 echo sudo zfs destroy  0.72s user 1.33s system 100% cpu 2.044 total
wc -l  0.01s user 0.01s system 1% cpu 2.044 total

With 3600 snapshots to delete, I’m going to assume it’ll take another 3.5 hours. Here goes:

[17:51 r730-03 dvl ~] % time grep autosnap_2024-11 ~/tmp/now | cut -f 1 -w | grep -v monthly | xargs -n 1 sudo zfs destroy        

And that died when I rebooting my laptop, despite it being in a tmux session. Why?

sudo: 3 incorrect password attempts
Sorry, try again.
Sorry, try again.
sudo: 3 incorrect password attempts
grep autosnap_2024-11 ~/tmp/now  0.01s user 0.01s system 0% cpu 1:26:09.42 total
cut -f 1 -w  0.00s user 0.02s system 0% cpu 1:51:09.42 total
grep -v monthly  0.00s user 0.00s system 0% cpu 1:51:13.94 total
xargs -n 1 sudo zfs destroy  19.57s user 15.38s system 0% cpu 1:51:16.56 total

My sudo auth is based on ssh keys.

Let’s go again, redoing the list, because we’ve deleted some stuff from there.

[22:13 r730-03 dvl ~/tmp] % zfs list -r -t snapshot data01/snapshots/homeassistant-r730-01 > ~/tmp/now                        
[22:14 r730-03 dvl ~/tmp] % wc -l ~/tmp/now 
    7345 /home/dvl/tmp/now
[22:14 r730-03 dvl ~] % grep autosnap_2024-11 ~/tmp/now | cut -f 1 -w | grep -v monthly | wc -l
    1579
[22:14 r730-03 dvl ~/tmp] % time grep autosnap_2024-11 ~/tmp/now | cut -f 1 -w | grep -v monthly | xargs -n 1 sudo zfs destroy
grep autosnap_2024-11 ~/tmp/now  0.01s user 0.00s system 94% cpu 0.006 total
cut -f 1 -w  0.00s user 0.00s system 79% cpu 0.006 total
grep -v monthly  0.00s user 0.00s system 0% cpu 11:08.96 total
xargs -n 1 sudo zfs destroy  9.37s user 7.62s system 1% cpu 21:47.85 total

That didn’t take so long.

frequently and hourly

Next up:

[22:42 r730-03 dvl ~/tmp] % time grep autosnap_2024-12 ~/tmp/now | cut -f 1 -w | egrep -e 'frequently|hourly' | wc -l        
    3710
grep autosnap_2024-12 ~/tmp/now  0.01s user 0.01s system 35% cpu 0.035 total
cut -f 1 -w  0.01s user 0.01s system 24% cpu 0.042 total
egrep -e 'frequently|hourly'  0.05s user 0.00s system 99% cpu 0.047 total
wc -l  0.00s user 0.00s system 6% cpu 0.046 total

Let’s go again:

[22:44 r730-03 dvl ~/tmp] % time grep autosnap_2024-12 ~/tmp/now | cut -f 1 -w | egrep -e 'frequently|hourly' | xargs -n 1 sudo zfs destroy
load: 0.66  cmd: sudo 12117 [running] 0.00r 0.00u 0.00s 0% 6620k
load: 1.50  cmd: sudo 40626 [select] 0.97r 0.01u 0.00s 0% 9824k
grep autosnap_2024-12 ~/tmp/now  0.01s user 0.00s system 0% cpu 22:24.66 total
cut -f 1 -w  0.00s user 0.01s system 0% cpu 34:18.98 total
egrep -e 'frequently|hourly'  0.04s user 0.01s system 0% cpu 46:33.18 total
xargs -n 1 sudo zfs destroy  22.43s user 18.07s system 1% cpu 55:10.18 total

That took an hour.

Down near 2000 now

The next morning, I found the scrub had finished:

[14:32 r730-03 dvl ~] % zpool status data01                       
  pool: data01
 state: ONLINE
  scan: scrub repaired 0B in 1 days 15:49:11 with 0 errors on Sat Jan 18 21:18:32 2025
config:

	NAME                   STATE     READ WRITE CKSUM
	data01                 ONLINE       0     0     0
	  mirror-0             ONLINE       0     0     0
	    gpt/HGST_8CJVT8YE  ONLINE       0     0     0
	    gpt/SG_ZHZ16KEX    ONLINE       0     0     0
	  mirror-1             ONLINE       0     0     0
	    gpt/SG_ZHZ03BAT    ONLINE       0     0     0
	    gpt/HGST_8CJW1G4E  ONLINE       0     0     0
	  mirror-2             ONLINE       0     0     0
	    gpt/SG_ZL2NJBT2    ONLINE       0     0     0
	    gpt/HGST_5PGGTH3D  ONLINE       0     0     0

errors: No known data errors

And only about 200 snapshots remaining:

[23:39 r730-03 dvl ~/tmp] % zfs list -r -t snapshot data01/snapshots/homeassistant-r730-01 > ~/tmp/now
[13:21 r730-03 dvl ~/tmp] % wc -l ~/tmp/now                                                                          
    2132 /home/dvl/tmp/now

I’m going to leave that to sanoid.

sanoid?

I’m already running sanoid over here. Let’s see if that can help clean up old stuff for the long term. I want to remove snapshots, but not create snapshots.

This is where the snapshots are created, on r730-01. This is an extract from /usr/local/etc/sanoid/sanoid.conf on that host:

[data02/vm]
    use_template = vm
    recursive    = zfs

# based on production
[template_vm]
    frequently = 15
    hourly     = 48
    daily      = 35
    monthly    = 4
    autosnap   = yes
    autoprune  = yes

This is what I have prepared on r730-03, in the same file name:

[data01/snapshots]
    process_children_only = yes
    use_template          = snapshots
    recursive             = yes

[template_snapshots]
# based on template_vm on r730-01
    frequently = 60
    hourly     = 72
    daily      = 120 
    monthly    = 24
    autosnap   = no
    autoprune  = yes

The autosnap directive should prevent sanoid from creating snapshots here. The autoprune will prune old snapshots based on the local directives, not the directives on the original host.

You will notice the difference between “recursive = yes” and “recursive= zfs“. The “zfs” type of recursion does not allow for “the possibility to override settings for child datasets.” – “yes” allows you to specify difference settings for child datasets. “zfs” does not.

Now I want until 51 past the hour to see how my sanoid instance cleans up the remaining snapshots.

[13:26 r730-03 dvl ~/tmp] % cat /usr/local/etc/cron.d/sanoid 
# mail any output to `dan', no matter whose crontab this is
MAILTO=dan@example.org

PATH=/etc:/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin

*/15 * * * * root /usr/bin/lockf -t 0 /tmp/.sanoid-cron-snapshot /usr/local/bin/sanoid --take-snapshots
51   * * * * root /usr/bin/lockf -t 0 /tmp/.sanoid-cron-prune    /usr/local/bin/sanoid --prune-snapshots

51 past the hour

Look what I see now:

[13:26 r730-03 dvl ~/tmp] % ps auwwx | grep destroy
root   31653    0.0  0.0   19528    8916  -  S    13:51        0:00.01 zfs destroy data01/snapshots/homeassistant-r730-01@autosnap_2025-01-01_05:30:07_frequently
dvl    31655    0.0  0.0   12808    2396  2  S+   13:51        0:00.00 grep destroy
[13:51 r730-03 dvl ~/tmp] % ps auwwx | grep sanoid 
root   30411    0.2  0.0   36332   23544  -  S    13:51        0:01.21 /usr/local/bin/perl /usr/local/bin/sanoid --prune-snapshots
root   30410    0.0  0.0   12712    2124  -  Is   13:51        0:00.00 /usr/bin/lockf -t 0 /tmp/.sanoid-cron-prune /usr/local/bin/sanoid --prune-snapshots
dvl    31667    0.0  0.0   12808    2392  2  S+   13:51        0:00.00 grep sanoid
[13:51 r730-03 dvl ~/tmp] % 
[13:51 r730-03 dvl ~/tmp] % ps auwwx | grep destroy
root   31736    0.0  0.0   19528    8920  -  S    13:52        0:00.01 zfs destroy data01/snapshots/homeassistant-r730-01@autosnap_2025-01-01_14:30:02_frequently
dvl    31738    0.0  0.0   12808    2392  2  S+   13:52        0:00.00 grep destroy
[13:52 r730-03 dvl ~/tmp] % 

Now go the numbers:

[13:54 r730-03 dvl ~/tmp] % zfs list -r -t snapshot data01/snapshots/homeassistant-r730-01 > ~/tmp/now
[13:54 r730-03 dvl ~/tmp] % wc -l ~/tmp/now                                                           
    1914 /home/dvl/tmp/now

I suppose I could have let sanoid handle this from the start….

A bit more cleanup

We still have too many hourly snapshots:

[14:17 r730-03 dvl ~/tmp] % zfs list -r -t snapshot data01/snapshots/homeassistant-r730-01 > ~/tmp/now
[14:18 r730-03 dvl ~/tmp] % grep -c hourly now                                                        
419

So let’s destroy all but the latest 72 hourly snapshots:

[14:21 r730-03 dvl ~/tmp] % grep hourly ~/tmp/now | cut -f 1 -w | head -347 | xargs -n 1 sudo zfs destroy                                  
[14:27 r730-03 dvl ~/tmp] % 

This assumes the snapshots are sorted already. I could have run this through sort before head.

There’s a similar problem with frequently:

[14:27 r730-03 dvl ~/tmp] % zfs list -r -t snapshot data01/snapshots/homeassistant-r730-01 > ~/tmp/now
[14:28 r730-03 dvl ~/tmp] % grep -c hourly now                                                           
72
[14:29 r730-03 dvl ~/tmp] % grep -c frequently now
236
[14:29 r730-03 dvl ~/tmp] % grep frequently ~/tmp/now | cut -f 1 -w | sort | head -1761 | xargs -n 1 sudo zfs destroy
[14:33 r730-03 dvl ~/tmp] % 

This time I did use sort.

Are we there yet?

Based on the sanoid configuration:

    # based on template_vm on r730-01
    frequently = 60
    hourly     = 72
    daily      = 120 
    monthly    = 24
    autosnap   = no
    autoprune  = yes

Let’s compare:

[15:32 r730-03 dvl ~/tmp] % zfs list -r -t snapshot data01/snapshots/homeassistant-r730-01 > ~/tmp/now
[15:32 r730-03 dvl ~/tmp] % grep -c freq now
5
[15:32 r730-03 dvl ~/tmp] % grep -c hourly now
73
[15:32 r730-03 dvl ~/tmp] % grep -c daily now 
50
[15:32 r730-03 dvl ~/tmp] % grep -c monthly now
16

Oh, perhaps I overdid it with daily and monthly. Not to worry.

This host now has 1/20 of the snapshots it had before:

[15:32 r730-03 dvl ~/tmp] % zfs list -r -t snapshot | wc -l                                           
    3098

I’ll leave it at that.

How much did this free up?

I wasn’t keep tracking, so let’s refer to LibreNMS:

From the looks of that, data01 went from 7.31TB free to 13.63TB free – the deleted snapshots had consumed 6.3TB – in a 22.45TB pool, that represents 28% of the pool. That is not trivial.

You can also see the two gaps where I was doing multiple destroys.

Website Pin Facebook Twitter Myspace Friendfeed Technorati del.icio.us Digg Google StumbleUpon Premium Responsive

Leave a Comment

Scroll to Top