News

Welcome to End Point’s blog

Ongoing observations by End Point people

Listen/Notify improvements in PostgreSQL 9.0

Improved listen/notify is one of the new features of Postgres 9.0 I've been waiting for a long time. There are basically two major changes: everything is in shared memory instead of using system tables, and full support for "payload" messages is enabled.

Before I demonstrate the changes, here's a review of what exactly the listen/notify system in Postgres is. Basically, it is an inter-process signalling system, which uses the pg_listener system table to coordinate simple named events between processes. One or more clients connects to the database and issues a command such as:

LISTEN foobar;

The name foobar can be replaced by any valid name; usually the name is something that gives a contextual clue to the listening process, such as the name of a table. Another client (or even one of the original ones) will then issue a notification like so:

NOTIFY foobar;

Each client that is listening for the 'foobar' message will receive a notification that the sender has issued the NOTIFY. It also receives the PID of the sending process. Multiple notifications are collapsed into a single notice, and the notification is not sent until a transaction is committed.

Here's some sample code using DBD::Pg that demonstrates how the system works:

#!/usr/bin/env perl
# -*-mode:cperl; indent-tabs-mode: nil-*-

use strict;
use warnings;
use DBI;

my $dsn = 'dbi:Pg:dbname=test';
my $dbh1 = DBI->connect($dsn,'test','', {AutoCommit=>0,RaiseError=>1,PrintError=>0});
my $dbh2 = DBI->connect($dsn,'test','', {AutoCommit=>0,RaiseError=>1,PrintError=>0});

print "Postgres version is $dbh1->{pg_server_version}\n";

my $SQL = 'SELECT pg_backend_pid(), version()';
my $pid1 = $dbh1->selectall_arrayref($SQL)->[0][0];
my $pid2 = $dbh2->selectall_arrayref($SQL)->[0][0];
print "Process one has a PID of $pid1\n";
print "Process two has a PID of $pid2\n";

## Process one listens for a notice named "jtx"
$dbh1->do(q{LISTEN jtx});
$dbh1->commit();
## Process one checks for any notices received
print show_notices($dbh1);

## Process two sends a notice, but does not commit
$dbh2->do(q{NOTIFY jtx});
## Process one does not see the notice yet
print show_notices($dbh1);
## Process two sends the same notice again, then commits
$dbh2->do(q{NOTIFY jtx});
$dbh2->commit();

sleep 1; ## Ensure the notice has time to get to propogate
## Process two receives a single notice from process one
print show_notices($dbh1);

## Now that it has seen the notice, it reports nothing again:
print show_notices($dbh1);

sub show_notices { ## Function to return any notices received
       my $dbh = shift;
       my $messages = '';
       $dbh->commit();
       while (my $n = $dbh->func('pg_notifies')) {
          $messages .= "Got notice '$n->[0]' from PID $n->[1]\n";
       }
       return $messages || "No messages\n";
}

The output of the above script on a 8.4 Postgres server is:

Postgres version is 80401
Process one has a PID of 18238
Process two has a PID of 18239
No messages
No messages
Got notice 'jtx' from PID 18239
No messages

As expected, we got a notification only after the other process committed.

Note that because this is asychronous and involves the system tables, we added a sleep call to ensure that the notice had time to propagate so that the other processes will see it. Without the sleep, we usually see four "No messages" appear, as the script goes too fast for the pg_listener table to catch up.

Now for the aforementioned payloads. Payloads allow an arbitrary string to be attached to the notification, such that you can have a standard name like before, but you can also attach some specific text that the other processes can see. I added support for payloads to DBD::Pg back in June 2008, so let's modify the script a little bit to demonstrate the new payload mechanism:

...
## Process two sends two notices, but does not commit
$dbh2->do(q{NOTIFY jtx, 'square'});
$dbh2->do(q{NOTIFY jtx, 'square'});
## Process one does not see the notice yet
print show_notices($dbh1);
## Process two sends the same notice again, then commits
$dbh2->do(q{NOTIFY jtx, 'triangle'});
$dbh2->commit();
...
 ## This part changes: we get an extra item from our array:
 $messages .= "Got notice '$n->[0]' from PID $n->[1] message is '$n->[2]'\n";
...

Here's what the output looks like under version 9.0 of Postgres:

Postgres version is 90000
Process one has a PID of 19089
Process two has a PID of 19090
No messages
No messages
Got notice 'jtx' from PID 19090 message is 'square'
Got notice 'jtx' from PID 19090 message is 'triangle'
No messages

Note that the collapsing of identical messages into a single notification now takes into account the message as well, so we received two notifications in the above example for the three total notifications sent. To add a payload, we simply say NOTIFY, then the name of the notification, add a comma, and specify a payload as a quoted string. Of course, the payload string is still completely optional. If no payload is specified, DBD::Pg will simply treat the payload as an empty string (this is also the behavior when you request the payload using DBD::Pg against a pre-9.0 server, so all combinations should be 100% backwards compatible).

We also got rid of the sleep. Because we are now using shared memory instead of system tables, there is no lag whatsoever, and the other process can see the notices right away.

Another large advantage to removing the pg_listener table is that systems that make heavy use of it (such as the replication systems Bucardo and Slony) no longer have to worry about bloat in these tables.

The use of payloads also means that many application can be greatly simplified: in the past, one had to be creative in the name of your notifications in order to pass meta-information to your listener. For example, Bucardo uses a large collection of notifications, meaning that the Bucardo processes had to do the equivalent of things like this:

$dbh->do(q{LISTEN bucardo_reload_config});
$dbh->do(q{LISTEN bucardo_log_message});
$dbh->do(q{LISTEN bucardo_activate_sync_$sync});
$dbh->do(q{LISTEN bucardo_deactivate_sync_$sync});
$dbh->do(q{LISTEN bucardo_kick_sync_$sync});
...
while (my $notice = $dbh->func('pg_notifies')) {
 my ($name, $pid) = @$notice;
 if ($name eq 'bucardo_reload_config') {
 ...
 }
 elsif ($name =~ /bucardo_kick_sync_(.+)/) {
 ...
 }
...
}

We can instead do things like this:

$dbh->do(q{LISTEN bucardo});
...
while (my $notice = $dbh->func('pg_notifies')) {
 my ($name, $pid, $msg) = @$notice;
 if ($msg eq 'bucardo_reload_config') {
 ...
 }
 elsif ($msg =~ /bucardo_kick_sync_(.+)/) {
 ...
 }
...
}

I hope to add this support to Bucardo shortly; it's simply a matter of refactoring all the listen and notify calls into a function that does the right thing depending on the server version it is attached to.

5 comments:

Jon Jensen said...

Yes, these improvements are excellent.

When I first started using LISTEN and NOTIFY I was surprised to find they used normal identifiers as names instead of something more flexible and exact such as a string. But this new solution is arguably better than just using string channel names, since it encourages that the names be kept simple and any serious detail go into the payloads, and it's also backward-compatible.

Moving to shared memory seems like a win too. Although I wonder, are there any limits to the amount of shared memory allowed to be used up by LISTEN? I see in the NOTIFY docs that there's an 8000-byte payload limit per notify event, and an 8 GB queue size limit for all notifies.

Since many systems run with far less than 8 GB shared memory, does it still queue everything beyond a certain size out to the pg_listener system table? Is it limited to only part of shared memory? Can NOTIFY be used to tie up all the Postgres shared memory and (temporarily) break the server?

Greg Sabino Mullane said...

Jon: the pg_listener table is completely gone. There was a lot of discussion about what to do when the queue is full. From the original patch description: In a few words, the patch reimplements listen/notify as an slru-based queue which works similar to the sinval structure. Essentially it is a ring buffer on disk with pages mapped into shared memory for read/write access.. The long but interesting thread can be found here. I think the final consensus was that the caller (notifier) would eventually throw an exception on a full queue, but it's a pretty severe corner case.

Jon Jensen said...

Thanks for the pointer to the discussion, Greg. That clears up the edge case handling for me.

Jeff Boes said...

Note that you can use any valid Pg identifier as the LISTEN/NOTIFY event name:

LISTEN "to elevator music?";
NOTIFY "The President immediately";

This capability was in place in 7.x, if I recall correctly.

Jon Jensen said...

Jeff, yes, that's always been there, but along with it come the problems of identifiers, including the length limitation which has affected Bucardo, quoting vs. nonquoted case-flattening, and IIRC, potential trouble when using SQL reserved words.

I think the new payload option works around those problems nicely.