News

Welcome to End Point’s blog

Ongoing observations by End Point people

Bucardo replication trigger enabling

Bucardo is one of the trigger-based replication systems for Postgres (others include Slony and Londiste). All of these not only use triggers to gather information on what has changed, but they also disable triggers when copying things to remote databases. They do this to ensure that only the data itself gets copied, in as fast as manner as possible. This also has the effect of disabling foreign keys, which Postgres implements by use of triggers on the underlying tables. There are times, however, when you need a trigger on a target to fire (such as data masking). Here are four approaches to working around the disabling of triggers. The first two solutions will work with any replication system, but the third and fourth are specific to Bucardo.

First, let's understand how the triggers get disabled. A long time ago (Postgres 8.2 and older), triggers had to be disabled by direct changes to the system catalogs. Luckily, those days are over, and now this is done by issuing this command before copying any data:

SET session_replication_role = 'replica';

This prevents all normal triggers and rules from being activated. There are times, however, when you want certain triggers (or their effects) to execute during replication.

Let's use a simple hypothetical to illustrate all of these solutions. We will start with the Postgres built-in pgbench utility, The initialize option (-i) can be used to create and populate some tables:

$ createdb btest1
$ pgbench -i btest1
NOTICE:  table "pgbench_history" does not exist, skipping
NOTICE:  table "pgbench_tellers" does not exist, skipping
NOTICE:  table "pgbench_accounts" does not exist, skipping
NOTICE:  table "pgbench_branches" does not exist, skipping
creating tables...
100000 of 100000 tuples (100%) done (elapsed 0.16 s, remaining 0.00 s).
vacuum...
set primary keys...
done.

We want to replicate all four of the tables pgbench just created. Bucardo requires that a table have a primary key or a unique index to be replicated, so we will need to make an immediate adjustment to the pgbench_history table:

$ psql btest1 -c 'ALTER TABLE pgbench_history ADD hid SERIAL PRIMARY KEY'
ALTER TABLE

Now to make things a little more interesting. Let's add a new column to the pgbench_accounts table named "phone", which will hold the account owner's phone number. As this is confidential information, we do not want it to be available - except on the source database! For this example, database btest1 will be the source, and database btest2 will be the target.

$ psql btest1 -c 'ALTER TABLE pgbench_accounts ADD phone TEXT'
ALTER TABLE
$ createdb btest2 --template=btest1

To prevent the phone number from being revealed to anyone querying btest2, a trigger and supporting function is used to change the phone number to always display the word 'private'. Here is what they look like.

btest2=# CREATE OR REPLACE FUNCTION elide_phone()
  RETURNS TRIGGER
  LANGUAGE plpgsql
  AS $bc$
BEGIN
  NEW.phone = 'private';
  RETURN NEW;
END;
$bc$;
CREATE FUNCTION

btest2=# CREATE TRIGGER elide_phone
  BEFORE INSERT OR UPDATE
  ON pgbench_accounts
  FOR EACH ROW
  EXECUTE PROCEDURE elide_phone();
CREATE TRIGGER

Now that everything is setup, we can install Bucardo and teach it how to replicate those tables:

$ bucardo install 
This will install the bucardo database into ...
...
Installation is now complete.

$ bucardo add db A,B dbname=btest1,btest2
Added databases "A","B"

$ bucardo add sync pgb dbs=A,B tables=all
Added sync "pgb"
Created a new relgroup named "pgb"
Created a new dbgroup named "pgb"
  Added table "public.pgbench_accounts"
  Added table "public.pgbench_branches"
  Added table "public.pgbench_history"
  Added table "public.pgbench_tellers"

$ bucardo start

A demonstration of the new trigger is now in order. On the database btest2, we will update a few rows and attempt to set the phone number. However, our new trigger will overwrite our changes:

$ psql btest2 -c "update pgbench_accounts set abalance=123, phone='867-5309' where aid <= 3"
UPDATE 3

$ psql btest2 -c 'select aid,abalance,phone from pgbench_accounts order by aid limit 3'
 aid | abalance |  phone  
-----+----------+---------
   1 |      123 | private
   2 |      123 | private
   3 |      123 | private

So, all is as we expected: any changes made to this table have the phone number changed. Let's see what happens when the changes are done via Bucardo replication. Note that we are updating btest1 but querying btest2:

$ psql btest1 -c "update pgbench_accounts set abalance=99, phone='867-5309' WHERE aid <= 3"
UPDATE 3

$ psql btest2 -c 'select aid,abalance,phone from pgbench_accounts order by aid limit 3'
 aid | abalance |  phone   
-----+----------+----------
   1 |       99 | 867-5309
   2 |       99 | 867-5309
   3 |       99 | 867-5309

As you can see, our privacy safeguard is gone, as Bucardo disables the trigger on btest2 before making the changes. So what can we do? There are four solutions: set the trigger as ALWAYS, set the trigger as REPLICA, use Bucardo's customcode feature, or use Bucardo's customcols feature.

Solution one: ALWAYS trigger

The easiest way is to simply mark the trigger as ALWAYS, which means that it will always fire, regardless of what session_replication_role is set to. This is the best solution for most problems of this sort. Changing the trigger requires an ALTER TABLE command. Once done, psql will show you the new state of the trigger as well:

btest2=# \d pgbench_accounts
   Table "public.pgbench_accounts"
  Column  |     Type      | Modifiers 
----------+---------------+-----------
 aid      | integer       | not null
 bid      | integer       | 
 abalance | integer       | 
 filler   | character(84) | 
 phone    | text          | 
Indexes:
    "pgbench_accounts_pkey" PRIMARY KEY, btree (aid)
Triggers:
    elide_phone BEFORE INSERT OR UPDATE ON pgbench_accounts FOR EACH ROW EXECUTE PROCEDURE elide_phone()

btest2=# ALTER TABLE pgbench_accounts ENABLE ALWAYS TRIGGER elide_phone;
ALTER TABLE

btest2=# \d pgbench_accounts
   Table "public.pgbench_accounts"
  Column  |     Type      | Modifiers 
----------+---------------+-----------
 aid      | integer       | not null
 bid      | integer       | 
 abalance | integer       | 
 filler   | character(84) | 
 phone    | text          | 
Indexes:
    "pgbench_accounts_pkey" PRIMARY KEY, btree (aid)
Triggers firing always:
    elide_phone BEFORE INSERT OR UPDATE ON pgbench_accounts FOR EACH ROW EXECUTE PROCEDURE elide_phone()

That is some ugly syntax for changing the triggers, eh? (To restore a trigger to its default state, you would simply leave out the ALWAYS clause, so it becomes ALTER TABLE pgbench_accounts ENABLE TRIGGER elide_phone). Time to verify that the ALWAYS trigger fires even when Bucardo is updating the table:

$ psql btest1 -c "update pgbench_accounts set abalance=11, phone='555-2368' WHERE aid <= 3"
UPDATE 3

$ psql btest2 -c 'select aid,abalance,phone from pgbench_accounts order by aid limit 3'
 aid | abalance |  phone   
-----+----------+----------
   1 |       11 | private
   2 |       11 | private
   3 |       11 | private

Solution two: REPLICA trigger

Trigger-based replication solutions, you may recall from above, issue this command: SET session_replication_role = 'replica'. What this means is that all rules and triggers that are not of type replica are skipped (with the exception of always triggers of course). Thus, another solution is to set the triggers you want to fire to be of type "replica". Once you do this, however, the triggers will NOT fire in ordinary use - so be careful. Let's see it in action:

btest2=# ALTER TABLE pgbench_accounts ENABLE REPLICA TRIGGER elide_phone;
ALTER TABLE

btest2=# \d pgbench_accounts
   Table "public.pgbench_accounts"
  Column  |     Type      | Modifiers 
----------+---------------+-----------
 aid      | integer       | not null
 bid      | integer       | 
 abalance | integer       | 
 filler   | character(84) | 
 phone    | text          | 
Indexes:
    "pgbench_accounts_pkey" PRIMARY KEY, btree (aid)
Triggers firing on replica only:
    elide_phone BEFORE INSERT OR UPDATE ON pgbench_accounts FOR EACH ROW EXECUTE PROCEDURE elide_phone()

As before, we can test it out and verify the trigger is firing:

$ psql btest1 -c "update pgbench_accounts set abalance=22, phone='664-7665' WHERE aid <= 3"
UPDATE 3

$ psql btest2 -c 'select aid,abalance,phone from pgbench_accounts order by aid limit 3'
 aid | abalance |  phone   
-----+----------+----------
   1 |       22 | private
   2 |       22 | private
   3 |       22 | private

Solution three: Bucardo customcode

Bucardo supports a number of hooks into the replication process. These are called "customcodes" and consist of Perl code that is invoked by Bucardo. To solve the problem at hand, we will create some code for the "code_before_trigger_enable" hook - in other words, right after the actual data copying is performed. To create the customcode, we write the actual code to a text file, then do this:

$ bucardo add code nophone whenrun=before_trigger_enable sync=pgb src_code=./nophone.pl

This creates a new customcode named "nophone" that contains the code inside the local file "nophone.pl". It runs after the replication, but before the triggers are re-enabled. It is associated with the sync named "pgb". The content of the file looks like this:

my $info = shift;

return if ! exists $info->{rows};

my $schema = 'public';
my $table = 'pgbench_accounts';
my $rows = $info->{rows};
if (exists $rows->{$schema} and exists $rows->{$schema}{$table}) {
  my $dbh = $info->{dbh}{B};
  my $SQL = "UPDATE $schema.$table SET phone=? "
    . "WHERE aid = ? AND phone <> ?";
  my $sth = $dbh->prepare($SQL);
  my $string = 'private';
  for my $pk (keys %{ $rows->{$schema}{$table} }) {
    $sth->execute($string, $pk, $string);
  }
}
return;

Every customcode is passed a hashref of information from Bucardo. One of the things passed in a list of changed rows. At the top, we see that we exit right away (via return, as the customcodes become Perl subroutines) if there are no rows this round. Then we check that something has changed for the pgbench_accounts tables. We grab the database handle, also passed to the subroutine. Note that this is actually a DBIx::Safe handle, not a direct DBI handle. The difference is that certain operations, such as commit, are not allowed.

Once we have the handle, we walk through all the rows that have changed, and set the phone to something safe. The above code is a good approach, but we can make the UPDATE much smarter because we are using a modern Postgres which supports ANY, and a modern DBD::Pg that supports passing Perl arrays in and out. Once we combine those two, we can move the execute() out of the loop into a single call like so:

...
if (exists $rows->{$schema} and exists $rows->{$schema}{$table}) {
  my $dbh = $info->{dbh}{B};
  my $SQL = "UPDATE $schema.$table SET phone=?"
    . "WHERE aid = ANY(?) AND phone <> ?";
  my $sth = $dbh->prepare($SQL);
  my $string = 'private';
  $sth->execute($string, [ keys %{ $rows->{$schema}{$table} } ], $string);
}

Note that this solution requires Bucardo version 5.3.0 or better. Let's verify it:

$ psql btest1 -c "update pgbench_accounts set abalance=33, phone='588-2300' WHERE aid <= 3"
UPDATE 3

$ psql btest2 -c 'select aid,abalance,phone from pgbench_accounts order by aid limit 3'
 aid | abalance |  phone   
-----+----------+----------
   1 |       33 | private
   2 |       33 | private
   3 |       33 | private

Solution four: Bucardo customcols

The final way to keep the information in that column masked is to use Bucardo's 'customcols' feature. This allows rewriting of the command that grabs rows from the source databases. Bucardo uses COPY to grab rows from a source, DELETE to remove the rows if they exist on the target, and another COPY to add the rows to the target tables. Postgres supports adding a SELECT clause to a COPY command, as we will see below. To hide the values of the phone column using the customcols feature, we simply do:

$ bucardo add customcols public.pgbench_accounts "select aid,bid,abalance,filler,'private' as phone" db=B sync=pgb
New columns for public.pgbench_accounts: "select aid,bid,abalance,filler,'private' as phone" (for database B) (for sync pgb)

The list of columns must be the same as in the original table, but we can modify things! So rather than Bucardo doing this:

COPY (SELECT * FROM public.pgbench_accounts WHERE aid IN (1,2,3)) TO STDOUT

Bucardo will instead do this thanks to our customcols:

COPY (SELECT aid,bid,abalance,filler,'private' as phone FROM public.pgbench_accounts WHERE aid IN (1,2,3)) TO STDOUT

Let's verify it:

$ psql btest1 -c "update pgbench_accounts set abalance=44, phone='736-5000' WHERE aid <= 3"
UPDATE 3

$ psql btest2 -c 'select aid,abalance,phone from pgbench_accounts order by aid limit 3'
 aid | abalance |  phone   
-----+----------+----------
   1 |       44 | private
   2 |       44 | private
   3 |       44 | private

Those are the four approaches to firing (or emulating) triggers when using replication. Which one you choose depends on what exactly your trigger does, but overall, the best solution is probably the 'trigger ALWAYS', followed by 'Bucardo customcols'. If you have another solution, or some problem that is not covered by the above, please let me know in the comments.

2014 hack.summit() wrap-up #hacksummit

Seeing the proposed line-up for the 2014 hack.summit() virtual conference was the grown-up equivalent of seeing the line-up for some of the first Lollapalooza events. It was definitely an "All those people I want to see, and all in one place? *head asplode*" moments.

So, what is this conference with the incredibly nerdy name? In short, it's a selection of industry leading speakers presenting all on-line and streamed live. The "registration fee" was actually a choice between mentioning the conference on a few social media platforms, or making a donation to one of a number of programming non-profits. Seeing as I don't tweet, I made a donation, then signed in (using OAuth) via my Google+ account. It was a delightfully frictionless process.

The hack.summit() conference ran December 1st through December 4th, but I was only able to "attend" the last two days. Luckily for me, all of the live-streamed presentations are also available afterwards on the hacksummit site. They feel a little hidden away in the small menu in the upper left corner, but they're all there, available as YouTube videos.

So, why is was hack.summit() worth your time? It's got an amazing collection of very accomplished developers, thought leaders and experienced big cheeses of some companies that do some pretty impressive work. During the live event, the Crowdcast platform provided a great delivery mechanism for the streaming videos, as well as admin-created polls, a light-weight chat feature, and audience-voted questions for the presenters. Hack.summit() founder, Ed Roman, did a great job MC-ing the entire event, too. (And to whoever figured out how to game the voting system at a conference named hack.summit(), well played you rogue.)

In closing, I strongly recommend you do a few things: Go sign up right now to gain access to the presentation videos. Commit some time (make a deal with yourself, get approval to do a group viewing at work, whatever) to watch as many presentations as you can. Lastly, set a calendar reminder to keep an eye out for the hack.summit() 2015 that will hopefully happen.

Ecommerce in the Django World

Mezzanine (http://mezzanine.jupo.org/) is a powerful piece of software written for the Django Framework in Python that functions like a structured content management system similar to Drupal, WordPress, and others. Built on top of this CMS structure is a module which adds features for Ecommerce which is known as Cartridge (http://cartridge.jupo.org/).

Installing Cartridge/Mezzanine

Installing the Mezzanine CMS system from scratch can be accomplished in two methods, either by using the pip python package manager, or you can clone the current source from the git repository from the software maintainers. If you were planning to modify either Mezzanine or Cartridge to customize the setup for your own needs the latter would likely be preferable as you could then easily begin creating custom branches off the original source to track the customization work. However for this example I will show the pip method of installation:

pip install -U cartridge
Once installed, there is a mezzanine-project command which will allow you to create a new blank Mezzanine environment within a new directory, and in this case we will also send an option to instruct the command to install the Cartridge module as well.
mezzanine-project -a cartridge new_cartridge_project
At this point you will have a blank Mezzanine environment with the cartridge module installed, now the Django database must be populated with the model information for the application, and then the Django application server will be started.
cd new_cartridge_project
python manage.py createdb
python manage.py runserver
If all went well, you should see the startup messages for the Django application server which will list the version numbers of the various libraries it will use, and then should be up and listening on the loopback interface. At this point to complete the setup you would just need to point your httpd at this loopback socket to send connections there.

Cartridge Product Models

Products within Cartridge are defined within the python models for the application, there are three primary models to be concerned with in a Product definition, the Product, ProductVariation and ProductImage models specifically.

  • Product - Defines the primary attributes for a product like name, price, SKU, and can be populated with optional fields for sale prices, etc.
  • ProductVariation - Defines a variant of a product SKU, these would be most commonly used for things like product sizes and colors.
  • ProductImage - Defines the image for the picture of each product.
Each of these three models exist within Django as their own data sets, but then they have foreign key references from each Product to all of its Variants and Images. While you can alter these model definitions using the django shell and other methods within python, you can also update and manage the products within the admin interface for the django site.

Adding Products / Tracking Orders

Adding products through the admin interface for Django was relatively easy and quick:


Within this interface you would provide the details for the products name, published status, date range for the product to be on the site, and product description in addition to any Variants or Images for the product. This is where the Mezzanine foundation of Cartridge start to show through, where this process mirrors the creating of a content page within Mezzanine, but has added these attributes of the product to the definition.

In the same way, Product Variants, Discount Codes, and Sales can be created in much the same way within the admin interface. In this way, once you had a basic Cartridge setup in place on your server, within most use cases for setting up a simple web store, Cartridge would remove the need to do any further hacking of python code, and would allow any user familiar with a CMS workflow for creating and managing the objects within the environment to manage the store.

In addition, Cartridge provides order management within the admin interface as well:


From this interface you can update orders, create new orders, print the PDF invoices for the orders and track whether orders have been fulfilled and shipped to the customer.

Conclusions

Like other Ecommerce applications built on top of a Content Management System, Cartridge and Mezzanine work well together to create a stack that allows for both ease of getting started with the system, and a massive degree of customization possible. Each component of Cartridge is defined within Django, and can be rewritten and customized to serve any number of use cases. The setup process for Cartridge was pretty straightforward, but one recommendation I can make is that once you have a working system up, you should place all of the files for Mezzanine and Cartridge into a source code management system like Git. Like other web applications of this type, the complexity of each change to the system can create a possible point of failure. Once you start making changes to the Mezzanine configuration files and the python for the Models and Views for Cartridge, controlling those changes becomes critical. In this way Mezzanine and Cartridge are spanning the normal continuum of software with the polar extremes of ease of use and customization by providing a good example to start with in the setup, and also exposing the full range of customization to the developer.

Easier IE Site Testing With RemoteIE

Microsoft recently announced a new service which I'm finding very useful. RemoteIE lets you test your sites with IE (currently version 11) on Windows 10 Technical Preview. The service runs in Azure RemoteApp which is available for several clients including Android, iOS and Windows Phone. What's great about this is that you do not have to install and maintain your own virtual machine with VirtualBox or VMWare.

RemoteIE

To use RemoteIE you'll need a valid Microsoft account — it's easy to sign up if you don't already one. Once you have an account and have downloaded & installed the Azure RemoteApp client of your choice it's just a matter of starting it up and logging in. Happy Testing!