Loading up an old copy of a PostgreSQL database, zfs snapshots, and sanoid snapshot management

Sometimes you want to go back to a known good state. And you want to do it quickly, without waiting around. In my case, that’s going to be a zfs snapshot.

Today I am preparing to test some code changes so that FreshPorts properly creates a new category. See these recent blog posts for details:

In this post:

  1. FreeBSD 14.1-RELEASE-p5
  2. postgresql16-server-16.4

dbclone

The FreeBSD jail dbclone collects database backups from the database servers. It then does a test restore to make sure the restore process works. The commit in question occurred at 2024-11-06 15:17:35 UTC – so I want a back earlier than that.

Looking in the jail, I found:

[10:53 dbclone dvl ~rsyncer/backups/aws-1/postgresql] % ls -l
total 3510909
-rw-r--r--  1 rsyncer rsyncer 3594468260 2024.11.08 02:13 freshports.org.dump
-rw-r--r--  1 rsyncer rsyncer       3963 2024.11.08 02:13 globals.sql

That won’t do. Let’s check snapshots.

[10:53 r730-01 dvl ~] % zfs list | grep dbclone
[10:53 r730-01 dvl ~] % 

Wrong host….

[10:53 r730-03 dvl ~] % zfs list | grep dbclone
data01/dbclone.backups.rsyncer                 421G  3.09T   137G  /jails/dbclone/usr/home/rsyncer/backups
data01/dbclone.postgres                       54.7G  8.77T  54.7G  /jails/dbclone/var/db/postgres
data01/jails/dbclone                          25.1G  8.77T  18.6G  /jails/dbclone

Let’s go look in snapshots here:

[11:08 r730-03 dvl ~] % cd /jails/dbclone/usr/home/rsyncer/backups/.zfs/snapshot
cd: permission denied: /jails/dbclone/usr/home/rsyncer/backups/.zfs/snapshot

[11:09 r730-03 dvl ~] % sudo ls -l /jails/dbclone/usr/home/rsyncer/backups/.zfs/snapshot
total 51
drwxr-x---  15 10839 10839 16 Jun  2 12:04 autosnap_2024-11-06_00:00:03_daily
drwxr-x---  15 10839 10839 16 Jun  2 12:04 autosnap_2024-11-07_00:00:01_daily
drwxr-x---  15 10839 10839 16 Jun  2 12:04 autosnap_2024-11-08_00:00:12_daily

Two things come from that:

  1. I want more snapshots
  2. autosnap_2024-11-06_00:00:03_daily might do
[11:11 r730-03 dvl ~] % sudo ls -l /jails/dbclone/usr/home/rsyncer/backups/.zfs/snapshot/autosnap_2024-11-06_00:00:03_daily/aws-1/postgresql
total 7030162
-rw-r--r--  1 10839 10839 3598726805 Nov  5 02:14 freshports.org.dump
-rw-r--r--  1 10839 10839       3963 Nov  5 02:14 globals.sql
[11:11 r730-03 dvl ~] % 

Well, let’s try the later snapshot.

[11:11 r730-03 dvl ~] % sudo ls -l /jails/dbclone/usr/home/rsyncer/backups/.zfs/snapshot/autosnap_2024-11-07_00:00:01_daily/aws-1/postgresql
total 7028826
-rw-r--r--  1 10839 10839 3598138018 Nov  6 02:13 freshports.org.dump
-rw-r--r--  1 10839 10839       3963 Nov  6 02:13 globals.sql

That’ll do.

Copying

Get it to the server:

[11:13 r730-03 dvl ~] % sudo scp /jails/dbclone/usr/home/rsyncer/backups/.zfs/snapshot/autosnap_2024-11-07_00:00:01_daily/aws-1/postgresql/freshports.org.dump dvl@pg03:freshports.org.dump
freshports.org.dump           100% 3431MB 173.7MB/s   00:19    

Let’s check that file:

[11:15 pg03 dvl ~] % md5 freshports.org.dump
MD5 (freshports.org.dump) = 8609e1ecb8b7da74a09d3c56ac850b24

[11:13 r730-03 dvl ~] % sudo md5 /jails/dbclone/usr/home/rsyncer/backups/.zfs/snapshot/autosnap_2024-11-07_00:00:01_daily/aws-1/postgresql/freshports.org.dump 
MD5 (/jails/dbclone/usr/home/rsyncer/backups/.zfs/snapshot/autosnap_2024-11-07_00:00:01_daily/aws-1/postgresql/freshports.org.dump) = 8609e1ecb8b7da74a09d3c56ac850b24

They match.

Loading into the server

SQL_ASCII is used because I cannot control what committers put into files.

[11:14 pg03 dvl ~] % createdb freshports.dvl-category-testing -E SQL_ASCII -T template0 -O postgres
[11:18 pg03 dvl ~] % time pg_restore -j 60 -d freshports.dvl-category-testing -Fc freshports.org.dump
pg_restore: error: could not execute query: ERROR:  role "rdsadmin" does not exist
Command was: ALTER SCHEMA public OWNER TO rdsadmin;

The rdsadmin role is what Amazon RDS uses. It come with the dump.

Now I wait….

After a walk and a coffee, it’s done:

pg_restore: warning: errors ignored on restore: 1
pg_restore -j 60 -d freshports.dvl-category-testing -Fc freshports.org.dump  163.15s user 34.95s system 7% cpu 45:34.95 total

I think that was 45 minutes.

However, I’m going to do this again, on a different server. I want to make use of ZFS snapshots to rollback the database. That will allow me to test, rollback, test again, with a known good starting point. When doing that rollback, I want to stop PostgreSQL. This other server has no databases on it now. Shutting down my main development server would affect FreshPorts dev, test, and stage. Let’s not do that.

Copy over:

[13:14 pg03 dvl ~] % scp freshports.org.dump pg02:
freshports.org.dump                                                                                   100% 3431MB 124.0MB/s   00:27    
[13:21 pg03 dvl ~] % 

Create and load. On this host, I don’t have rights, so I become the postgres user first:

[13:22 pg02 dvl ~] % sudo su -l postgres
$ createdb freshports.dvl-category-testing -E SQL_ASCII -T template0 -O postgres
$ time pg_restore -j 60 -d freshports.dvl-category-testing -Fc ~dvl/freshports.org.dump
pg_restore: error: could not execute query: ERROR:  role "rdsadmin" does not exist
Command was: ALTER SCHEMA public OWNER TO rdsadmin;

Then I cleaned up the other server while waiting:

[13:21 pg03 dvl ~] % dropdb freshports.dvl-category-testing
[13:24 pg03 dvl ~] % rm freshports.org.dump 
[13:24 pg03 dvl ~] % 

Snapshot changes

While waiting, I looked at snapshots. I’m holding only three days. Let’s see the config

[13:15 r730-03 dvl ~] % sudoedit /usr/local/etc/sanoid/sanoid.conf

Where I found:

[data01/dbclone.backups.rsyncer]
    use_template          = dbclone_backups
    recursive             = yes

[template_dbclone_backups]
    frequently = 0
    hourly     = 0
    daily      = 3
    monthly    = 0
    autosnap   = yes
    autoprune  = yes

How much space is that using?

[13:15 r730-03 dvl ~] % zfs list -r -t snapshot data01/dbclone.backups.rsyncer
NAME                                                                USED  AVAIL  REFER  MOUNTPOINT
data01/dbclone.backups.rsyncer@autosnap_2024-11-06_00:00:03_daily  94.4G      -   137G  -
data01/dbclone.backups.rsyncer@autosnap_2024-11-07_00:00:01_daily  94.4G      -   137G  -
data01/dbclone.backups.rsyncer@autosnap_2024-11-08_00:00:12_daily  94.4G      -   137G  -
[13:15 r730-03 dvl ~] % zfs list data01/dbclone.backups.rsyncer         
NAME                             USED  AVAIL  REFER  MOUNTPOINT
data01/dbclone.backups.rsyncer   421G  3.09T   137G  /jails/dbclone/usr/home/rsyncer/backups
[13:29 r730-03 dvl ~] % zpool list
NAME     SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
data01  32.7T  23.8T  8.94T        -         -    24%    72%  1.00x    ONLINE  -
zroot    412G  8.96G   403G        -         -    16%     2%  1.00x    ONLINE  -

So that zpool has about 9TB free. There’s another 3TB or so until it gets to the lauded 80% full mark.

Each day of backups seems to take about 100GB. So keeping another 4 days will take up an additional 400GB.

Taking two weekly snapshots will give take another 200GB. Let’s go 5 instead. Let’s also do two monthly backups.

So that 400GB + 500GB + 200GB or about 1.1TB additional space. We can try that and see how it goes.

This is what I have now:

# This should take about 100GB / day - so this approach should be about 1.5TB
[template_dbclone_backups]
    frequently = 0
    hourly     = 0
    daily      = 7
    weekly     = 5
    monthly    = 2
    autosnap   = yes
    autoprune  = yes

What about backups?

In Bacula, I have these backups of that file, about 2 weeks.

Select the Client (1-57): 7
freshports.org.dumppath):freshports.org.dump
+---------+------------------------------------------------------------+---------------------+---------+-----------+----------+----------------+
| jobid   | name                                                       | starttime           | jobtype | jobstatus | jobfiles | jobbytes       |
+---------+------------------------------------------------------------+---------------------+---------+-----------+----------+----------------+
| 370458  | /home/rsyncer/backups/aws-1/postgresql/freshports.org.dump | 2024-11-08 03:05:01 | B       | T         | 313      | 70510052692    |
| 370430  | /home/rsyncer/backups/aws-1/postgresql/freshports.org.dump | 2024-11-07 03:05:01 | B       | T         | 313      | 70510670150    |
| 370402  | /home/rsyncer/backups/aws-1/postgresql/freshports.org.dump | 2024-11-06 03:05:01 | B       | T         | 313      | 70506778230    |
| 370374  | /home/rsyncer/backups/aws-1/postgresql/freshports.org.dump | 2024-11-05 03:05:01 | B       | T         | 313      | 70507566372    |
| 370346  | /home/rsyncer/backups/aws-1/postgresql/freshports.org.dump | 2024-11-04 03:05:01 | B       | T         | 313      | 70508440245    |
| 370316  | /home/rsyncer/backups/aws-1/postgresql/freshports.org.dump | 2024-11-03 03:05:02 | B       | T         | 313      | 70512076117    |
| 370288  | /home/rsyncer/backups/aws-1/postgresql/freshports.org.dump | 2024-11-02 03:05:01 | B       | T         | 313      | 70514046375    |
| 370260  | /home/rsyncer/backups/aws-1/postgresql/freshports.org.dump | 2024-11-01 03:05:01 | B       | T         | 313      | 70515975713    |
| 370232  | /home/rsyncer/backups/aws-1/postgresql/freshports.org.dump | 2024-10-31 03:05:01 | B       | T         | 313      | 70517150624    |
| 370204  | /home/rsyncer/backups/aws-1/postgresql/freshports.org.dump | 2024-10-30 03:05:02 | B       | T         | 313      | 70518734908    |
| 370176  | /home/rsyncer/backups/aws-1/postgresql/freshports.org.dump | 2024-10-29 03:05:02 | B       | T         | 313      | 70519297782    |
| 370148  | /home/rsyncer/backups/aws-1/postgresql/freshports.org.dump | 2024-10-28 03:05:01 | B       | T         | 313      | 70519894667    |
| 370120  | /home/rsyncer/backups/aws-1/postgresql/freshports.org.dump | 2024-10-27 03:05:01 | B       | T         | 313      | 70520230768    |
| 370092  | /home/rsyncer/backups/aws-1/postgresql/freshports.org.dump | 2024-10-26 03:05:01 | B       | T         | 313      | 70520094047    |
| 370064  | /home/rsyncer/backups/aws-1/postgresql/freshports.org.dump | 2024-10-25 03:05:01 | B       | T         | 313      | 70521136255    |
| 370036  | /home/rsyncer/backups/aws-1/postgresql/freshports.org.dump | 2024-10-24 03:05:01 | B       | T         | 313      | 70521941315    |
| 370008  | /home/rsyncer/backups/aws-1/postgresql/freshports.org.dump | 2024-10-23 03:05:02 | B       | T         | 313      | 70524146942    |
| 369980  | /home/rsyncer/backups/aws-1/postgresql/freshports.org.dump | 2024-10-22 03:05:02 | B       | T         | 313      | 70521638676    |
| 369952  | /home/rsyncer/backups/aws-1/postgresql/freshports.org.dump | 2024-10-21 03:05:01 | B       | T         | 313      | 70522558679    |
| 369924  | /home/rsyncer/backups/aws-1/postgresql/freshports.org.dump | 2024-10-20 03:05:01 | B       | T         | 313      | 70528981985    |
+---------+------------------------------------------------------------+---------------------+---------+-----------+----------+----------------+

Coming back to the restore

45 minutes later, I came back to many GRANT errors, such as:

pg_restore: error: could not execute query: ERROR:  role "www" does not exist
Command was: GRANT SELECT ON TABLE public.watch_notice TO www;
GRANT SELECT,INSERT,UPDATE ON TABLE public.watch_notice TO commits;
GRANT SELECT ON TABLE public.watch_notice TO rsyncer;
GRANT SELECT ON TABLE public.watch_notice TO freshsource_ro;

I should have loaded the globals.sql file first.

Here we go again.

Copy over the globals.sql file from last night’s backup:

[14:18 pg03 dvl ~rsyncer/backups/database-backup/postgresql] % ls -l globals.sql 
-rw-r--r--  1 rsyncer rsyncer 18145 2024.11.08 03:36 globals.sql
[14:18 pg03 dvl ~rsyncer/backups/database-backup/postgresql] % scp globals.sql pg02:

Load up:

$ psql template1 < ~dvl/globals.sql 
SET
SET
SET
CREATE ROLE
ALTER ROLE
CREATE ROLE
ALTER ROLE
...
CREATE ROLE
ALTER ROLE
CREATE ROLE
ALTER ROLE
ERROR:  role "postgres" already exists
ALTER ROLE
CREATE ROLE
ALTER ROLE
CREATE ROLE
...
ALTER ROLE
CREATE ROLE
ALTER ROLE
CREATE ROLE
ALTER ROLE
CREATE ROLE
ALTER ROLE
ERROR:  role "rsyncer" already exists
ALTER ROLE
ERROR:  role "snmpd" already exists
ALTER ROLE
CREATE ROLE
ALTER ROLE
CREATE ROLE
ALTER ROLE
CREATE ROLE
ALTER ROLE
CREATE ROLE
...
GRANT ROLE
GRANT ROLE

And start again:

$ createdb freshports.dvl-category-testing -E SQL_ASCII -T template0 -O postgres
$ time pg_restore -j 60 -d freshports.dvl-category-testing -Fc ~dvl/freshports.org.dump
pg_restore: error: could not execute query: ERROR:  role "rdsadmin" does not exist
Command was: ALTER SCHEMA public OWNER TO rdsadmin;

Not shown: Alexa, 45 minutes

Later, we get:

pg_restore: warning: errors ignored on restore: 1
     2800.34 real       164.01 user        34.47 sys

OK, we're done loading up.

Snapshot

First, shutdown PostgreSQL on the database server (jail):

[15:16 pg02 dvl ~] % sudo service postgresql stop

Then, on the jail host:

[15:16 r730-01 dvl ~] % zfs list | grep pg02
data02/jails/pg02                                                   17.9G   565G  13.1G  /jails/pg02
data03/pg02                                                         85.9G  5.98T    88K  none
data03/pg02/postgres                                                85.9G  5.98T  26.7G  /jails/pg02/var/db/postgres
data03/pg02/rsyncer                                                  856K  5.98T   112K  /jails/pg02/usr/home/rsyncer/backups

We want data03/pg02/postgres. Here goes the snapshot:

[15:16 r730-01 dvl ~] % sudo zfs snapshot data03/pg02/postgres@before-filesystems-category
[15:17 r730-01 dvl ~] % sudo zfs hold freshports data03/pg02/postgres@before-filesystems-category
[15:18 r730-01 dvl ~] % 

And, because I just learned about zfs hold, I place a hold.

Testing the snapshot

Here, I did a simple table change:

freshports.dvl-category-testing=# select * from portcount;
 count 
-------
    13
(1 row)

freshports.dvl-category-testing=# update portcount set count = 12;
UPDATE 1
freshports.dvl-category-testing=# select * from portcount;
 count 
-------
    12
(1 row)

freshports.dvl-category-testing=# 

That's an unused table. I think it should be deleted.

Next, I shutdown the postgresql service, did the rollback, started postgresql, and did the query again:

freshports.dvl-category-testing=# select * from portcount;
 count 
-------
    13
(1 row)

That's good for me.

Website Pin Facebook Twitter Myspace Friendfeed Technorati del.icio.us Digg Google StumbleUpon Premium Responsive

Leave a Comment

Scroll to Top