End Point Blog

Ongoing observations by End Point people.

Bucardo and truncate triggers

Version 8.4 of Postgres was recently released. One of the features that hasn't gotten a lot of press, but which I'm excited about, is truncate triggers. This fixes a critical hole in trigger-based PostgreSQL replication systems, and support for these new triggers is now working in the Bucardo replication program.

Truncate triggers were added to Postgres by Simon Riggs (thanks Simon!), and unlike other types of triggers (UPDATE, DELETE, and INSERT), they are statement-level only, as truncate is not a row-level action.
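For illustration, here's roughly what a minimal statement-level truncate trigger looks like in 8.4. This is just a sketch with made-up names, not the trigger Bucardo itself installs:

```sql
-- Hypothetical example: raise a notice whenever the table is truncated.
CREATE OR REPLACE FUNCTION note_truncate() RETURNS trigger AS $$
BEGIN
    RAISE NOTICE 'Table % was truncated', TG_TABLE_NAME;
    RETURN NULL;  -- the return value is ignored for statement-level triggers
END;
$$ LANGUAGE plpgsql;

-- TRUNCATE triggers must be FOR EACH STATEMENT; FOR EACH ROW is not allowed.
CREATE TRIGGER trtest_truncate_trg
    AFTER TRUNCATE ON trtest
    FOR EACH STATEMENT
    EXECUTE PROCEDURE note_truncate();
```

A replication system hooks a trigger like this to record that the truncate happened, since there are no per-row events to capture.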

Here's a quick demo showing off the new triggers. This uses the development version of Bucardo; a major new version, which will include truncate trigger support among many other things, is expected to be released in the next week or two. If you want to try this out for yourself, just run:

$ git clone http://bucardo.org/bucardo.git/

Bucardo does three types of replication; for this example, we'll be using the 'pushdelta' method, which is your basic "master to slaves" relationship. In addition to the master database (which we'll name A) and the slave database (which we'll name B), we'll create a third database for Bucardo itself.

$ initdb -D bcdata
$ initdb -D testA 
$ initdb -D testB 

(Technically, we are creating three new database clusters, and since we are doing this as the postgres user, the default database in each will be 'postgres'.)

Let's give them all unique port numbers:

$ echo port=5400 >> bcdata/postgresql.conf
$ echo port=5401 >> testA/postgresql.conf 
$ echo port=5402 >> testB/postgresql.conf 

Now start them all up:

$ pg_ctl start -D bcdata -l bc.log
$ pg_ctl start -D testA -l A.log
$ pg_ctl start -D testB -l B.log

We'll create a simple test table on both sides:

$ psql -d postgres -p 5401 -c 'CREATE TABLE trtest(id int primary key)'
$ psql -d postgres -p 5402 -c 'CREATE TABLE trtest(id int primary key)'

Before we go any further, let's install Bucardo itself. Bucardo is a Perl daemon that uses a central database to store its configuration information. The first step is to create the Bucardo schema. This, like almost everything else with Bucardo, is done with the 'bucardo_ctl' script. The install process is interactive:

$ bucardo_ctl install --dbport=5400

This will install the bucardo database into an existing Postgres cluster.
Postgres must have been compiled with Perl support,
and you must connect as a superuser

We will create a new superuser named 'bucardo',
and make it the owner of a new database named 'bucardo'

Current connection settings:
1. Host:          
2. Port:          5400
3. User:          postgres
4. PID directory: /var/run/bucardo
Enter a number to change it, P to proceed, or Q to quit: P

Version is: 8.4
Attempting to create and populate the bucardo database and schema
Database creation is complete

Connecting to database 'bucardo' as user 'bucardo'
Updated configuration setting "piddir"
Installation is now complete.

If you see any unexpected errors above, please report them to bucardo-general@bucardo.org

You should probably check over the configuration variables next, by running:
bucardo_ctl show all
Change any setting by using: bucardo_ctl set foo=bar

Because we don't want to tell the bucardo_ctl program our custom port each time we call it, we'll store that info into the ~/.bucardorc file:

$ echo dbport=5400 > ~/.bucardorc

Let's double check that everything went okay by checking the list of databases that Bucardo knows about:

$ bucardo_ctl list db
There are no entries in the 'db' table.

Time to teach Bucardo about our two new databases. The format for the add commands is: bucardo_ctl add [type of thing] [name of the thing within the database] [arguments in foo=bar format]. Note that the final argument to "add database" below is the actual database name ('postgres' in this case), while the name= option sets the alias Bucardo will use for it.

$ bucardo_ctl add database postgres name=master port=5401
Database added: master

$ bucardo_ctl add database postgres name=slave1 port=5402
Database added: slave1

Before we go any further, let's look at our databases:

$ bucardo_ctl list dbs
Database: master   Status: active
Conn: psql -h  -p 5401 -U bucardo -d postgres

Database: slave1   Status: active
Conn: psql -h  -p 5402 -U bucardo -d postgres

Note that by default we connect as the 'bucardo' user. This is a highly recommended practice, for safety and auditing. Since that user obviously does not exist in the newly created clusters, we need to create it in each:

$ psql -p 5401 -c 'create user bucardo superuser'
$ psql -p 5402 -c 'create user bucardo superuser'

Now we need to teach Bucardo about the tables we want to replicate:

$ bucardo_ctl add table trtest db=master herd=herd1
Created herd "herd1"
Table added: public.trtest

A herd is simply a named collection of tables. Typically, you put tables that are linked together by foreign keys or other logic into a herd so that they all get replicated at the same time.

The final setup step is to create a replication event, which in Bucardo is known as a 'sync':

$ bucardo_ctl add sync willow source=herd1 targetdb=slave1 type=pushdelta
NOTICE:  Starting validate_sync for willow
CONTEXT:  SQL statement "SELECT validate_sync('willow')"
Sync added: willow

This command actually did quite a bit of work behind the scenes, including creating all the supporting schemas, tables, functions, triggers, and indexes that Bucardo will need.

We are now ready to start Bucardo up. Simple enough:

$ bucardo_ctl start
Checking for existing processes
Starting Bucardo

Let's add a row to the master table and make sure it goes to the slave:

$ psql -p 5401 -c 'insert into trtest(id) VALUES (1)'
INSERT 0 1
$ psql -p 5402 -c 'select * from trtest'
 id
----
  1
(1 row)

Looks fine, so let's try out the truncate. On versions of Postgres prior to 8.4, there was no way for Bucardo (or Slony) to know that a truncate had been run, so the rows were removed from the master but not from the slave. We'll do a truncate and add a new row in a single transaction:

$ psql -p 5401 -c 'begin; truncate table trtest; insert into trtest values (2); commit'
COMMIT
$ psql -p 5402 -c 'select * from trtest'
 id
----
  2
(1 row)

It works! Let's clean up our test environment for good measure:

$ bucardo_ctl stop
$ pg_ctl stop -D bcdata
$ pg_ctl stop -D testA
$ pg_ctl stop -D testB

As mentioned, there are three types of syncs in Bucardo. The other type that can make use of truncate triggers is the 'swap' sync, aka "master to master". I've not yet decided on the behavior for such syncs, but one possibility is simply:

  • Database A gets truncated at time X
  • Bucardo truncates database B, then discards all delta rows older than X for both A and B, and all delta rows for B
  • Everything after X gets processed as normal (conflict resolution, etc.)
  • The same thing for a truncate on database B (truncate A, discard all older rows).

Second proposal:

  • Database A gets truncated at time X
  • We populate the delta table with every primary key in the table before truncation (assuming we can get at it)
  • That's it! Bucardo does its normal thing as if we just deleted a whole bunch of rows on A, and in theory deletes them from B as well.
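The second proposal could be sketched as a before-truncate trigger that records every current primary key as a pending delete. The table and column names here are hypothetical, and the real bucardo_delta layout may differ:

```sql
-- Sketch only: capture all surviving primary keys into the delta table
-- before the truncate removes them, so they replicate as ordinary deletes.
CREATE OR REPLACE FUNCTION capture_truncate() RETURNS trigger AS $$
BEGIN
    INSERT INTO bucardo.bucardo_delta (rowid, txntime)
        SELECT id, now() FROM trtest;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trtest_capture_truncate
    BEFORE TRUNCATE ON trtest
    FOR EACH STATEMENT
    EXECUTE PROCEDURE capture_truncate();
```

The trigger fires before the rows disappear, so the SELECT still sees them; downstream, Bucardo's normal delta processing would treat them like any other deleted rows.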

Comments on this strategy welcome!

Update: Clarified initdb cluster vs. database per comment #1 below, and added new truncation handling scheme for multi-master replication per comment #2.

6 comments:

Anonymous said...

The example is a bit confusing. Where you say you're creating the databases, you use the names testA, testB and bcdata. However, the commands given are not createdb's but initdb's which create database clusters (including a default database named "postgres"). The CREATE TABLE statements presumably operate on the latter.

In the commands that add the databases to Bucardo, e.g.,

$ bucardo_ctl add database postgres name=master port=5401

it's not entirely clear that "postgres" is the real database name (and not a database type) and the "name"s are more like aliases.

An instructive and succinct example, nonetheless.

Jon Jensen said...

That's a good explanation of truncate triggers, Greg.

You mention that your idea on how to handle truncates for swap syncs is "one possibility". Are there others that would be valid? None come to mind for me.

Semantically a truncate is equivalent to removing all rows any other atomic way (is there any other way with Postgres besides DELETE?), so the way you describe seems like the only valid way to handle a truncate. I'd be interested to hear of any others I'm not thinking of.

Greg Sabino Mullane said...

#1: Thanks, I've clarified the initdb part, and explained the bucardo_ctl "add" format a little better.

Jon: actually, the way I laid out won't really work, for as soon as you truncate B, *all* of its delta rows are meaningless. So things are good if we truncate A and Bucardo truncates B shortly afterwards, but a time gap will erase changes made on B that occurred after the truncate on A. The canonical way to do it I suppose would be to record all the current rows and enter them in the delta tables (as if they were deleted). Maybe we need an option to allow it to be done that "safe" way versus the "quick" way of just blowing away B. I'll update the post to reflect the alternate strategy.

Log Buffer said...

"Greg Sabino Mullane was on the 8.4 beat, too, with his piece on Bucardo and truncate triggers. [...]"

Log Buffer #154

Goran said...

Greg,

Re multi-master
Option of expanding the truncate seems the cleanest to me.

Option#2 (idea)
It occurs to me on a quick thought that if second level masters have deltas timestamped then if you would add a timestamped log of truncates on master it would be possible to replicate truncate on second level masters as following (pseudo) SQL:
DELETE FROM table
WHERE rec NOT IN
(SELECT rec FROM bucardo_delta WHERE txntime > truncate.time)
Which should preserve space in master delta and hence amount of data to transfer.

Greg Sabino Mullane said...

Goran: I've considered something like that, but it would be a radical shift from "row-based" comparison to "timestamp-based" comparisons. In other words, right now it does not matter how in sync two databases are with regards to their timeofday() calls. The problem is compounded by the fact that we don't track what type of action was performed (update vs insert), so someone could truncate the table on A, then have it reappear because someone updated a row on B before Bucardo came over and replicated the truncation. On the other hand, truncation is such a severe and (presumably) rare thing to do for MM replication, I'm willing to cut any solution some slack. :) I'm thinking overall a before truncate trigger that populates bucardo_delta with all current rows might be expensive with regards to the amount of stored rows, but safest overall: in the above scenario, the 'delete' on A would trump the update on B, and the row would be removed from B, rather than added back to A.