Jun 102013
 

I attended a few of the talks at PGCon 2013 last month. One talk, for which I took several notes and made a few choice tweets. The main one I’m following up on is using the -Fc option on pg_dump. It was during Magnus Haganders’ talk on PostgreSQL Backup Strategies that I posted that tweet. In the two weeks since that talk, I managed to do some testing. Let’s have a look at that.

What’s this -Fc option anyway?

The -F option on pg_dump allows you to specify the format of the output. The default output is plain-text. That’s what you get when you do this:

pg_dump bacula > bacula.sql

and is functionally equivalent to doing:

pg_dump -Fp bacula > bacula.sql

There are other formats (directory, tar), but today I’m concerned with the custom (c) option. By default, this output is also compressed (see the -Z option). What is of most value is the ability to use pg_restore with the custom output.

What’s the big deal about custom format?

The more interesting option on pg_restore was -j. This allows you to run multiple concurrent jobs for the loading of data and building of indexes. This can greatly reduce the amount of time needed to restore a large database if your server has multiple processors.

pg_restore can be much more useful than just doing:

psql bacula < bacula.sql

With pg_restore you can be very selective about what you restore. See the -L and -l options.

How much does pg_dump compress?

I was curious. I wanted to compare the various pg_dump times for each level of compression.

I did a series of test such as this:

$ time pg_dump bacula > bacula.sql

real    4m22.654s
user    0m17.080s
sys     0m7.766s

In the following table, I'm using the 'real' value shown above. In addition, I'm also showing the time it took to restore the dump via 'pg_store -d testing'.

command dump time file size restore time
pg_dump 4m22.654s 13G 34m56.354s
pg_dump -Fc 11m12.443s 4.6G 34m27.218s
pg_dump -Z0 -Fc 4m18.742s 14G 36m8.156s
pg_dump -Z1 -Fc 5m0.498s 5.4G 36m26.047s
pg_dump -Z2 -Fc 5m24.376s 5.2G 36m11.556s
pg_dump -Z3 -Fc 6m20.513s 5.1G 36m12.259s
pg_dump -Z4 -Fc 6m46.074s 4.9G 36m31.050s
pg_dump -Z5 -Fc 8m22.397s 4.7G 34m53.817s
pg_dump -Z6 -Fc 11m18.154s 4.6G 35m54.344s
pg_dump -Z7 -Fc 14m21.447s 4.5G 35m14.134s
pg_dump -Z8 -Fc 25m15.000s 4.5G (45MB smaller) 35m22.793s
pg_dump -Z9 -Fc 26m40.550s 4.5G (1.8MB smaller) 35m11.992s

Conclusions

Looking at the above, the default compression (i.e. without specifying -Z) is about the same as -Z6, but takes only 4 minutes versus 11 minutes.

I will be using just plain -Fc, and I will not be using the -Z option.

pg_restore times

As you can see from the above times, compression level has very little performance hit on pg_restore.

What about -j?

Let's try -j with pg_restore:

$ time pg_restore -j 8 -d testing < bacula.dump.11m12.443s
pg_restore: [custom archiver] parallel restore from standard input is not supported

real    0m0.073s
user    0m0.000s
sys     0m0.000s

Oh, OK, let's go with this format:

$ time pg_restore -j 8 -d testing bacula.dump.11m12.443s

The original test for this took about 34.5 minutes. I won't consider -j 8 to be much of a savings unless it's under 32 minutes. Anything less just might be noise.

*insert pause here*

Wow! 17 minutes! That's a huge performance increase! I am impressed. I will be using -j every time now.

Disclosure

I'm not just a PGCon attendee. I founded the conference and I'm the organizer.

Website Pin Facebook Twitter Myspace Friendfeed Technorati del.icio.us Digg Google StumbleUpon Premium Responsive

  4 Responses to “Using compression with PostgreSQL’s pg_dump”

  1. > Looking at the above, the default compression (i.e. without specifying -Z) is about the same as -Z6, but takes only 4 minutes versus 11 minutes.

    I think you misread your own table:

    pg_dump -Fc 11m12.443s 4.6G 34m27.218s
    pg_dump -Z6 -Fc 11m18.154s 4.6G 35m54.344s

    gzip (and many other zlib-using programs) simply default to compression level six. That’s why the numbers were so close, I believe.

    • From the bottom of this page of the gzip manual:

      `–fast’
      `–best’
      `-n’

      Regulate the speed of compression using the specified digit n, where `-1′ or `–fast’ indicates the fastest compression method (less compression) and `–best’ or `-9′ indicates the slowest compression method (optimal compression). The default compression level is `-6′ (that is, biased towards high compression at expense of speed).

    • Yes. I agree. I think I was looking at plain pg_dump (no options).

  2. If PostgreSQL was built on a system with the zlib compression library installed, the custom dump format will compress data as it writes it to the output file. This will produce dump file sizes similar to using gzip, but it has the added advantage that tables can be restored selectively.
    Source: Official PostgreSQL Site