Is deleting empty snapshots faster?

During the 2025-01-22 OpenZFS Production User Call, ‘atomic operations’ came up in connection with my previous blog post, as might be expected.


Let’s do a test

There was also some speculation during the call about empty snapshots, and whether they are any faster to delete. I did a test with 3000 snapshots.

First, I create a filesystem for this testing:

[21:29 r730-03 dvl ~] % sudo zfs create data01/snapshots/deleting

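Then, use jot to create 3000 snapshots. The first run below only echoes the commands, piped through head, as a check; the xargs message at the end is just signal 13 (SIGPIPE), raised when head exits after ten lines. The second run creates the snapshots: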

[21:32 r730-03 dvl ~] % jot 3000 | xargs -I % -n 1 echo sudo zfs snapshot data01/snapshots/deleting@%  | head        
sudo zfs snapshot data01/snapshots/deleting@1
sudo zfs snapshot data01/snapshots/deleting@2
sudo zfs snapshot data01/snapshots/deleting@3
sudo zfs snapshot data01/snapshots/deleting@4
sudo zfs snapshot data01/snapshots/deleting@5
sudo zfs snapshot data01/snapshots/deleting@6
sudo zfs snapshot data01/snapshots/deleting@7
sudo zfs snapshot data01/snapshots/deleting@8
sudo zfs snapshot data01/snapshots/deleting@9
sudo zfs snapshot data01/snapshots/deleting@10
xargs: echo: terminated with signal 13; aborting

[21:32 r730-03 dvl ~] % jot 3000 | xargs -I % -n 1 sudo zfs snapshot data01/snapshots/deleting@%
[21:38 r730-03 dvl ~] % zfs list -r -t snapshot data01/snapshots/deleting | wc -l
    3001

That creation took 6 minutes.
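The count above is 3001 rather than 3000 because wc -l also counts the header line that zfs list prints. A minimal sketch of a header-free count, using the -H (scripted mode, no header) and -o name options of zfs list; the function name count_snapshots is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: count the snapshots under a dataset, recursively.
#   -H              scripted mode: omit the header line
#   -o name         print only the name column
#   -t snapshot -r  same selection as the zfs list command above
count_snapshots() {
    zfs list -H -o name -t snapshot -r "$1" | wc -l
}

# Usage (should report 3000 here, assuming no child datasets with snapshots):
#   count_snapshots data01/snapshots/deleting
```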

Let’s delete.

[21:41 r730-03 dvl ~] % time sudo zfs destroy data01/snapshots/deleting@1%3000
sudo zfs destroy data01/snapshots/deleting@1%3000  0.01s user 0.01s system 0% cpu 39.270 total
[21:43 r730-03 dvl ~] % 

About 40 seconds to destroy 3000 snapshots. That’s impressive.
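The first%last argument used above is the zfs destroy range syntax: an inclusive range of snapshots, in creation order. Before committing to a large destroy, the -n (dry run) and -v (verbose) flags list what would be removed without removing anything. A minimal sketch; preview_destroy is a hypothetical wrapper name, and I have not timed a dry run of this size:

```shell
#!/bin/sh
# Hypothetical wrapper: preview a range destroy without destroying anything.
#   $1 = dataset, $2 = first snapshot name, $3 = last snapshot name
#   -n  dry run: nothing is destroyed
#   -v  print each snapshot that would be destroyed
preview_destroy() {
    sudo zfs destroy -nv "$1@$2%$3"
}

# Usage:
#   preview_destroy data01/snapshots/deleting 1 3000
```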

Next, more.

Let’s try 60,000 empty snapshots

For my next trick, let’s create 60,000 snapshots. First, confirm the earlier 3000 are gone:

[21:43 r730-03 dvl ~] % zfs list -r -t snapshot data01/snapshots/deleting | wc -l
no datasets available
       0
[21:45 r730-03 dvl ~] % jot 60000 | xargs -I % -n 1 sudo zfs snapshot data01/snapshots/deleting@%           
[4:56 r730-03 dvl ~] % 

So that took about 7 hours to create (21:45 to 4:56). Wow. It ran overnight. It is now the 23rd.

How long does it take to list them?

[12:43 r730-03 dvl ~] % time zfs list -r -t snapshot data01/snapshots/deleting > ~/tmp/deleting
zfs list -r -t snapshot data01/snapshots/deleting > ~/tmp/deleting  2.54s user 48.47s system 99% cpu 51.042 total

About 51 seconds, nearly all of it system time. That’s OK.

60,000 deletes starting on the 23rd

I started the delete. Actually, it’s not 60,000 deletes. It’s one destroy of 60,000 snapshots.

[12:52 r730-03 dvl ~] % time sudo zfs destroy data01/snapshots/deleting@1%60000

After starting the above command, I started btop and ran zfs list several times. Eventually, a zfs list command hung (it did not return to the command line).

I stopped btop and tried running it again; it did not start and did not return to the command line.

4 hours later

It’s been running about 4 hours now.

At present, I cannot ssh to the host:

[11:44 pro02 dan ~] % r730-03
kex_exchange_identification: read: Connection reset by peer
Connection reset by 10.55.0.143 port 22

I have tried to ssh into various jails on that host: same result.

There are plenty of Nagios notifications:

swap issues

Connecting to the console, I see lots of swap related messages.

The console is scrolling, so the system is still alive. I’m going to leave it for a bit longer.

NOTE: the zfs destroy command is not responding to Ctrl-T (SIGINFO).

23:39

The zfs destroy started at about 12:52. It’s now nearly 11 hours later…

The host is responding to pings:

[18:38 pro04 dvl ~] % ping r730-03                                              
PING r730-03.int.unixathome.org (10.55.0.143): 56 data bytes
64 bytes from 10.55.0.143: icmp_seq=0 ttl=63 time=5.167 ms
64 bytes from 10.55.0.143: icmp_seq=1 ttl=63 time=7.554 ms
^C
--- r730-03.int.unixathome.org ping statistics ---
2 packets transmitted, 2 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 5.167/6.361/7.554/1.193 ms
[18:41 pro04 dvl ~] % 

Still no ssh response.

I’m headed out for dinner, so we’ll check back later.

13:30 – the next day

This morning, all the existing ssh sessions have been disconnected. The host is no longer responding to pings. Attempts to ssh time out. Samba mounts have disconnected.

[12:52 r730-03 dvl ~] % time sudo zfs destroy data01/snapshots/deleting@1%60000
client_loop: send disconnect: Broken pipe
[22:59 pro02 dan ~] % ping r730-03  
PING r730-03.int.unixathome.org (10.55.0.143): 56 data bytes
Request timeout for icmp_seq 0
^C
--- r730-03.int.unixathome.org ping statistics ---
2 packets transmitted, 0 packets received, 100.0% packet loss
[8:30 pro02 dan ~] % 



[12:21 pro02 dan ~] % r730-03

ssh: connect to host r730-03.int.unixathome.org port 22: Operation timed out
[8:31 pro02 dan ~] % 

The console is still scrolling swap_pager messages, for example: swap_pager: indefinite wait buffer: bufobj 0: blkno: 7301: size: 12288

None of the overnight backups succeeded (this host is the destination).

21:05 – the 24th

Still completely unresponsive. The console is still scrolling with swap_pager messages.

23:54

I’ve noticed that the screenshots of the console seem to be cycling and repeating the same sequences.

I’ve put the two shots here (the first is repeated from above).

Earlier in the process

Just before rebooting

It is time to terminate the experiment.

After the reboot

After the reboot, the host is back, and everything is green on Nagios.

The bad news: none of the 60,000 snapshots were deleted. The destroy is an atomic operation, and it did not complete.

I’ll go back to the xargs solution.
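A possible middle ground between one zfs destroy per snapshot and a single destroy of all 60,000 is a series of smaller range destroys: each destroy is still atomic, but covers far fewer snapshots. The sketch below assumes the snapshots are named 1 through N in creation order, as created above. The function name destroy_cmds and the chunk size of 1000 are hypothetical choices; the 3000-snapshot destroy earlier did complete, but I have not measured what chunk size is safe.

```shell
#!/bin/sh
# Hypothetical helper: print one "zfs destroy" command per chunk of snapshots
# named 1..TOTAL, CHUNK snapshots per command, using the first%last range syntax.
#   $1 = dataset, $2 = TOTAL, $3 = CHUNK
destroy_cmds() {
    fs=$1 total=$2 chunk=$3
    # Guard against a zero or negative chunk size, which would loop forever.
    [ "$chunk" -ge 1 ] || return 1
    first=1
    while [ "$first" -le "$total" ]; do
        last=$((first + chunk - 1))
        # The final chunk may be shorter than CHUNK.
        if [ "$last" -gt "$total" ]; then
            last=$total
        fi
        echo "zfs destroy ${fs}@${first}%${last}"
        first=$((last + 1))
    done
}

# Review the generated commands first:
#   destroy_cmds data01/snapshots/deleting 60000 1000 | head
# Then run them one after another; sh -e stops at the first failing destroy:
#   destroy_cmds data01/snapshots/deleting 60000 1000 | sudo sh -e
```

Generating the commands as text, rather than running zfs directly inside the loop, keeps the plan reviewable before anything is destroyed.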
