This work from August 2022 was interesting at the time.
My database server gets slow from time to time, and it keeps happening. I’m not sure why, but inserting large numbers of rows takes a very long time. That is exactly what Bacula does when it records what was backed up.
I started by duplicating the batch table that Bacula creates when it stores file attributes for later insertion into the File table.
baculatest=# begin;
BEGIN
baculatest=# drop table my_batch;
DROP TABLE
baculatest=# explain analyse
baculatest-# CREATE TABLE my_batch AS
baculatest-# SELECT F.FileIndex, 1 AS JobId, P.path, FN.name, F.LStat, F.MD5, F.DeltaSeq
baculatest-# FROM file F
baculatest-# JOIN Path P ON (F.pathid = P.pathid)
baculatest-# JOIN Filename FN ON (F.filenameid = FN.filenameid)
baculatest-# WHERE F.jobid = 219920;
                                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=422996.35..631574.90 rows=125958 width=235) (actual time=94826.901..95260.394 rows=4749 loops=1)
   Hash Cond: (f.filenameid = fn.filenameid)
   ->  Hash Join  (cost=149927.11..355514.16 rows=125958 width=200) (actual time=2090.849..2519.090 rows=4749 loops=1)
         Hash Cond: (f.pathid = p.pathid)
         ->  Index Scan using file_jobid_idx on file f  (cost=0.57..202911.02 rows=125958 width=101) (actual time=319.150..744.433 rows=4749 loops=1)
               Index Cond: (jobid = 219920)
         ->  Hash  (cost=102547.35..102547.35 rows=3790335 width=107) (actual time=1770.288..1770.288 rows=3790435 loops=1)
               Buckets: 524288  Batches: 1  Memory Usage: 519843kB
               ->  Seq Scan on path p  (cost=0.00..102547.35 rows=3790335 width=107) (actual time=0.009..548.173 rows=3790435 loops=1)
   ->  Hash  (cost=165419.66..165419.66 rows=8611966 width=43) (actual time=92733.117..92733.117 rows=8614267 loops=1)
         Buckets: 1048576  Batches: 1  Memory Usage: 651816kB
         ->  Seq Scan on filename fn  (cost=0.00..165419.66 rows=8611966 width=43) (actual time=0.007..89487.772 rows=8614267 loops=1)
 Planning time: 0.400 ms
 Execution time: 95735.278 ms
(14 rows)

baculatest=#
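Nearly all of those 95 seconds go into hashing the entire filename table (8.6 million rows, about 92 seconds of it) just to resolve 4,749 file rows. Before going further, it is worth confirming which indexes actually exist on the three tables involved. This is only a query against the standard pg_indexes view, not part of the session above, and the index names will differ between installations:

-- list the indexes on the tables involved in the batch queries
SELECT tablename, indexname, indexdef
FROM pg_indexes
WHERE tablename IN ('file', 'path', 'filename')
ORDER BY tablename, indexname;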
From here, I tried a test insert:
baculatest=# begin;
BEGIN
baculatest=# explain analyse
baculatest-# INSERT INTO File (FileIndex, JobId, PathId, FilenameId, LStat, MD5, DeltaSeq)
baculatest-# SELECT B.FileIndex, B.JobId, P.PathId, FN.FilenameId, B.LStat, B.MD5, B.DeltaSeq
baculatest-# FROM my_batch B
baculatest-# JOIN Path P ON (B.Path = P.Path)
baculatest-# JOIN Filename FN ON (B.Name = FN.Name);
                                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Insert on file  (cost=422995.77..451240.89 rows=360576 width=82) (actual time=2264425.067..2264425.067 rows=0 loops=1)
   ->  Hash Join  (cost=422995.77..451240.89 rows=360576 width=82) (actual time=4736.298..97566.308 rows=264825 loops=1)
         Hash Cond: (b.name = fn.name)
         ->  Hash Join  (cost=149926.54..168706.54 rows=360576 width=110) (actual time=1573.740..93180.433 rows=264825 loops=1)
               Hash Cond: (b.path = p.path)
               ->  Seq Scan on my_batch b  (cost=0.00..11117.76 rows=360576 width=138) (actual time=16.278..91044.256 rows=264825 loops=1)
               ->  Hash  (cost=102547.35..102547.35 rows=3790335 width=107) (actual time=1556.063..1556.063 rows=3790435 loops=1)
                     Buckets: 524288  Batches: 1  Memory Usage: 514378kB
                     ->  Seq Scan on path p  (cost=0.00..102547.35 rows=3790335 width=107) (actual time=0.005..381.064 rows=3790435 loops=1)
         ->  Hash  (cost=165419.66..165419.66 rows=8611966 width=43) (actual time=3144.386..3144.386 rows=8614267 loops=1)
               Buckets: 1048576  Batches: 1  Memory Usage: 635819kB
               ->  Seq Scan on filename fn  (cost=0.00..165419.66 rows=8611966 width=43) (actual time=0.007..786.008 rows=8614267 loops=1)
 Planning time: 0.257 ms
 Execution time: 2264502.071 ms
(14 rows)

baculatest=#
That took a very long time: about 38 minutes. Note that the join produced its 264,825 rows after roughly 98 seconds, so nearly all of the remaining time was spent in the Insert node itself. The explain analyze details are over here.
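When the time is inside the Insert node, the obvious suspects are index maintenance on the very large file table, plus any foreign keys or triggers firing per row. A quick way to see how the table is being maintained is the standard statistics view; this is only a sketch of what I would check, not output from the session above:

-- rough health check on the file table: row counts and vacuum/analyze history
SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_autovacuum,
       last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'file';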
Let’s try some stuff: first, delete the rows that test insert just added (they were stored with JobId 1) and refresh the statistics.
baculatest=# delete from file where jobid = 1;
DELETE 264825
baculatest=# analyze file;
ANALYZE
baculatest=# analyze path;
ANALYZE
baculatest=# select count(*) from path;
  count
---------
 3790435
(1 row)

baculatest=# select count(*) from file;
   count
-----------
 369890493
(1 row)
Yes, that’s 370 million files with 3.7 million paths.
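At that scale, the physical size of file and its indexes has a lot to do with how fast inserts can go. The size functions below are standard PostgreSQL (pg_relation_size, pg_indexes_size, pg_total_relation_size); I'm showing the query as a sketch rather than its output, which is specific to this server:

-- heap, index, and total on-disk size of the file table
SELECT pg_size_pretty(pg_relation_size('file'))       AS table_size,
       pg_size_pretty(pg_indexes_size('file'))        AS index_size,
       pg_size_pretty(pg_total_relation_size('file')) AS total_size;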
Let’s try another insert:
baculatest=# begin;
BEGIN
baculatest=# explain analyse
baculatest-# INSERT INTO File (FileIndex, JobId, PathId, FilenameId, LStat, MD5, DeltaSeq)
baculatest-# SELECT B.FileIndex, B.JobId, P.PathId, FN.FilenameId, B.LStat, B.MD5, B.DeltaSeq
baculatest-# FROM my_batch B
baculatest-# JOIN Path P ON (B.Path = P.Path)
baculatest-# JOIN Filename FN ON (B.Name = FN.Name);
                                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Insert on file  (cost=423005.04..445744.48 rows=264825 width=96) (actual time=1325888.705..1325888.705 rows=0 loops=1)
   ->  Hash Join  (cost=423005.04..445744.48 rows=264825 width=96) (actual time=4858.415..74111.454 rows=264825 loops=1)
         Hash Cond: (b.name = fn.name)
         ->  Hash Join  (cost=149935.81..165723.59 rows=264825 width=118) (actual time=1709.446..69526.403 rows=264825 loops=1)
               Hash Cond: (b.path = p.path)
               ->  Seq Scan on my_batch b  (cost=0.00..10160.25 rows=264825 width=195) (actual time=136.491..67382.356 rows=264825 loops=1)
               ->  Hash  (cost=102551.47..102551.47 rows=3790747 width=107) (actual time=1571.442..1571.442 rows=3790435 loops=1)
                     Buckets: 524288  Batches: 1  Memory Usage: 514378kB
                     ->  Seq Scan on path p  (cost=0.00..102551.47 rows=3790747 width=107) (actual time=0.008..379.669 rows=3790435 loops=1)
         ->  Hash  (cost=165419.66..165419.66 rows=8611966 width=43) (actual time=3145.986..3145.986 rows=8614267 loops=1)
               Buckets: 1048576  Batches: 1  Memory Usage: 635819kB
               ->  Seq Scan on filename fn  (cost=0.00..165419.66 rows=8611966 width=43) (actual time=0.010..782.871 rows=8614267 loops=1)
 Planning time: 475.820 ms
 Execution time: 1325969.915 ms
(14 rows)
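Better, but still about 22 minutes for 264,825 rows, and once again nearly all of it is inside the Insert node rather than the join. If I repeat this experiment, the next step would be to capture buffer statistics as well, so it is clear how much of the time goes into reading and dirtying pages for file and its indexes. EXPLAIN has taken a BUFFERS option since PostgreSQL 9.0; this is a sketch only, not part of the run above, and the rollback keeps the test rows out of the table:

begin;
-- same insert as above, but with buffer usage reported per plan node
EXPLAIN (ANALYZE, BUFFERS)
INSERT INTO File (FileIndex, JobId, PathId, FilenameId, LStat, MD5, DeltaSeq)
SELECT B.FileIndex, B.JobId, P.PathId, FN.FilenameId, B.LStat, B.MD5, B.DeltaSeq
FROM my_batch B
JOIN Path P ON (B.Path = P.Path)
JOIN Filename FN ON (B.Name = FN.Name);
rollback;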