News

Welcome to End Point’s blog

Ongoing observations by End Point people

Garbage collection in your head, or how to vacation

Recently I returned from the longest (8 workdays) vacation I have ever taken from this job (almost 11 years). I made an interesting discovery which I'm happy to share with you:

Life goes on without you.

I spent most of the time aboard a small cruise ship touring Alaska's Inside Passage and Glacier Bay. During almost all of that, I was out of cell phone range (and, unlike a lot of my colleagues, I don't carry a smart phone but just a dumb $5 flip phone). With no wifi on the ship, I was completely cut off from the Internet for the longest stretch in at least the last 15 years – maybe the longest since I first got Internet at my house back in the mid-90s.

Life (on the Internet) goes on without you.

Facebook posts get posted, liked, commented on. Tweets happen, get re-tweeted. Emails are sent (for those that still use it, anyway). And life goes on.

I can't say I came back from vacation recharged in some stupendous way, but I think I'm better off than if I'd taken a shorter vacation in an Internet-connected location, checking up on the virtual world before breakfast and bedtime every day.

So take vacations, and take meaningful ones – disconnect from work. Don't worry about what's piling up, or what's happening in the online communities in which you participate. If you're going away for a week, really go away and leave work behind. If a crisis arises, make sure someone else is equipped to at least try to handle it, but don't go into your vacation planning for it to be interrupted.

If you can't do that, you should start preparing for it anyway. Train someone to be able to jump in and do your job (inefficiently, sure, but that's far better than "not at all"). Because quite frankly, if you can't cut the ties that bind you to that mission-critical production system on a voluntary, scheduled basis, then you are far too vulnerable to the random interruptions of life (car accident, death in the family, lengthy power failure for us telecommuters) that will come (oh, they will come).

Postgres migration speedup with table change analysis


(A Unicode rabbit face 🐰 will never be as cute
as this real bunny. Photo by Wade Simmons)

One of our clients recently reached out to us for help in upgrading their Postgres database. The use of the pg_upgrade program was not an option, primarily because the client was also taking the opportunity to change from their SQL_ASCII encoding to UTF-8. (If any of your databases, gentle reader, are still SQL_ASCII, please do the same!). Naturally, we also took advantage of the lack of pg_upgrade to enable the use of data checksums, another action we highly recommend. Although there were plenty of wrinkles, and stories to be told about this migration/upgrade, I wanted to focus on one particular problem we had: how to detect if a table has changed.

We needed to know if any applications were modifying certain tables because the speed of the migration was very important. If we could assert that no changes were made, there were some shortcuts available that would greatly speed things up. Initial testing showed that the migration was taking over eight hours, a time unacceptable to the client (no worries, we eventually reduced the time to under an hour!).

Looking closer, we found that over half that time was spent converting a single small (50MB) table from SQL_ASCII to UTF-8. How this conversion was performed is a story for another day, but suffice to say the table had some really, really messy bytes inside of it; the conversion program had to struggle mightily. When you are converting a database to a new encoding, it is imperative to examine every byte and make sure it gets changed to a format that Postgres will accept as valid UTF-8, or the entire table import will fail with an error similar to this:

ERROR:  invalid byte sequence for encoding "UTF8": 0xf4 0xa5 0xa3 0xa5

Looking closer at the data in the table showed that it might - just might! - be a historical table. In other words, it no longer receives updates, just selects. We really wanted this to be true, for it meant we could dump the whole table, convert it, and simply load the converted table into the new database (which took only a few seconds!). First, however, we had to confirm that the table was not changing.

Detecting changes may be done in several ways. For all of them, you can never prove that the table shall not change at some point in the future, but you can prove that it has not changed over a certain period of time. How you go about doing that depends on what kind of access you have. If you do not have super-user access, you could add a simple trigger to the table that updates another table when a update, insert, or delete is performed. Then, checking in on the second table will indicate if any changes have been made.

A better solution is to simply look at the underlying file that makes up the table. To do this, you need be a Postgres superuser or have access to the underlying operating system. Basically, we will trust the operating system's information on when the table was last changed to determine if the table itself has changed. Although not foolproof, it is an excellent solution. Let's illustrate it here. First: create a test table and add some rows:

$ psql
greg=# CREATE TABLE catbox AS SELECT 8675309::INT AS id FROM generate_series(1,1000);
SELECT 1000

Now we can use the pg_stat_file() function, which returns some basic information about a file on disk. With the help of the pg_relation_filepath() function, we can see when the table was last modified:

greg=# select * from pg_stat_file( pg_relation_filepath('catbox') ) \x\g
Expanded display is on.
-[ RECORD 1 ]+-----------------------
size         | 40960
access       | 2015-11-08 22:36:00-04
modification | 2015-11-08 22:36:00-04
change       | 2015-11-08 22:36:00-04
creation     | 
isdir        | f

Next we will revisit the table after some time (e.g. 24 hours) and see if the "modification" timestamp is the same. If it is, then the table has not been modified either. Unfortunately, the possibility of a false positive is possible due to VACUUM, which may change things on disk but does NOT change the data itself. (A regular VACUUM *may* modify the file, and a VACUUM FULL *always* modifies it).

greg=# select * from pg_stat_file( pg_relation_filepath('catbox') ) \x\g

-[ RECORD 1 ]+-----------------------
size         | 40960
access       | 2015-11-08 22:36:00-04
modification | 2015-11-08 22:36:00-04
change       | 2015-11-08 22:36:00-04
creation     | 
isdir        | f


greg=# vacuum catbox;
VACUUM

greg=# select * from pg_stat_file( pg_relation_filepath('catbox') );

2016-06-09 22:53:24-04
-[ RECORD 1 ]+-----------------------
size         | 40960
access       | 2015-11-08 22:36:00-04
modification | 2015-11-08 22:40:14-04
change       | 2015-11-08 22:40:14-04
creation     | 
isdir        | f

A second (and more foolproof) method is to simply generate a checksum of the entire table. This is a fairly straightforward approach; just pump the output of pg_dump to a checksum program:

$ pg_dump -t catbox --data-only | sha1sum
6f724565656f455072736e44646c207472536e61  -

The advantage here is that even a VACUUM FULL will not change the checksum. However, because pg_dump does no ORDER BY when dumping out the table, it is possible for the rows to be returned in a different order. To work around that, issue a VACUUM FULL yourself before taking the checksum. As before, come back later (e.g. 24 hours) and re-run the command. If the checksums match, then the table has not changed (and is probably no longer updated by the application). By using this method, we were able to verify that the large, SQL_ASCII byte-soup table was indeed not being updated, and thus we took it out of the direct migration.

Of course, that table needed to be part of the new database, but we simply dumped the table, ran the conversion program on it, and (four hours later), had a complete dump of the table that loads extremely fast into the new database.

That solved only one of the problems, however; another table was also slowing down the migration. Although it did not have the SQL_ASCII conversion issue, it was a large table, and took a large percentage of the remaining migration time. A quick look at this table showed it had a "creation_time" column as well as a SERIAL primary key, and was obviously being updated quite often. Close examination showed that it was possible this was an append-only table, such that older rows were never updated. This called for a similar approach: could we prove that a large chunk of the table was not changing? If we could, we could pre-populate the new database and copy over only the most recent rows during the migration, saving a good bit of time.

The previous tricks would not work for this situation, because the underlying file would change constantly as seen by pg_stat_file(), and a pg_dump checksum would change on every insert. We needed to analyze a slice of the table - in this particular case, we wanted to see about checksumming all rows except those created in the last week. As a primary key lookup is very fast, we used the "creation_time" column to determine an approximate primary key to start with. Then it was simply a matter of feeding all those rows into the sha1sum program:

greg=# CREATE TABLE catbox2 (id SERIAL PRIMARY KEY, creation_time TIMESTAMPTZ);
CREATE TABLE
greg=# INSERT INTO catbox2(creation_time) select now() - '1 year'::interval + (x* '1 hour'::interval) from generate_series(1,24*365) x;
INSERT 0 8760

greg=# select * from catbox2 where creation_time > now()-'1 week'::interval order by 1 limit 1
  id  |         creation_time         
------+-------------------------------
 8617 | 2016-06-11 10:51:00.101971-08

$ psql -Atc "select * from catbox2 where id < 8617 order by 1" | sha1sum
456272656d65486e6f203139353120506173733f  -

## Add some rows to emulate the append-only nature of this table:
greg=# insert into catbox2(creation_time) select now() from generate_series(1,1000)
INSERT 0 1000

## Checksums should still be identical:
$ psql -Atc "select * from catbox2 where id < 8617 order by 1" | sha1sum
456272656d65486e6f203139353120506173733f  -

Despite the large size of this table (around 10 GB), this command did not take that long to run. A week later, we ran the same commands, and got the same checksum! Thus, we were able to prove that the table was mostly append-only - or at least enough for our use case. We copied over the "old" rows, then copied over the rest of the rows during the critical production migration window.

In the future, this client will able to take advantage of pg_upgrade, but getting to UTF-8 and data checksums was absolutely worth the high one-time cost. There were several other tricks used to speed up the final migration, but being able to remove the UTF-8 conversion of the first table, and being able to pre-copy 99% of the second table accounted for the lion's share of the final speed improvements.

Liquid Galaxy for Hyundai Card Travel Library

image courtesy of retaildesignblog
End Point and AZero, a South Korean system integrator, have partnered to deploy a 10-screen Liquid Galaxy to the newly-opened Hyundai Card Travel Library in Seoul, South Korea. This project presented a number of unique challenges for our teams, but we have launched successfully, to the great satisfaction of AZero's client.

The Hyundai Card Travel Library is an incredible space: wall-to-wall maple bookshelves hold travel brochures, photo books, and other travel-related material. The Liquid Galaxy displays itself sits in a small alcove off the main library space. Being fully enclosed, viewers can control the lighting and get a full immersion experience through the two rows of five 47" screens arrayed in an wall-mounted semi-circle arc. The viewer can control the screens via the podium-mounted touchscreen and SpaceNav mouse controller.

We solved several technical challenges for this deployment: the extremely tight space made cabling and display configuration tricky. Also, this isn't a "standard" 7-screen single row deployment, but rather two rows of 5 screens each. Working with AZero, End Point reconfigured the Liquid Galaxy display configurations to account for this unique layout. NOTE: the Liquid Galaxy scales quite easily, and can be arrayed in any number of configurations. Other recent deployments include a 40-screen control room, with 10 columns of 4 screens each!

Intended as a travel planning platform, Hyundai provided a number of set tours to showcase on the Liquid Galaxy, such as "The Orient Express", "Rio!", or "European Capitals". Each tour shows an overview map as a graphic overlay, while the Liquid Galaxy hops from each destination on the route to the next within Google Earth. At each location, the potential traveler can drop into Google Street View and see the fully panoramic images of street scenes in Paris or the top of Sugarloaf Mountain in Brazil. This should allow potential travelers to virtually experience the tour locations, and make more informed decisions about which trip might suit their tastes. Beyond that, it's a pretty cool way to view the planet inside a travel library.

End Point CEO and NYSE Bell Ringing

Wall Street. Back where it all started for me some 35 years ago. Only instead of being an employee of Oppenheimer and Company, I had the experience and honor of representing End Point as one of 7 companies chosen to “ring the bell” at the New York Stock Exchange, chiming in Small Business Week for JPMorgan Chase. (They work with 4,000,000 small companies.)

The morning started early by going through security checks rivaling the airport, except I didn’t have to take my shoes off. After getting my nifty credential, we went back to the street where the President of the NYSE gave a welcoming speech, pointing out the buildings that are still standing from when Hamilton, Monroe and Aaron Burr all started their own banks. As well as where George Washington was sworn in.

All this while also getting free coffee from the main small business honoree, Gregory’s Coffee, and picture taking from the business paparazzi!

We then went inside the storied NYSE building and made our way to the trading floor. The surroundings were immensely impressive, as the Stock Exchange still inhabits a huge floor with gorgeous 40 foot mid-18th century ceilings high above, holding up all sorts of 21st century technology and equipment. In the center of it all is CNBC’ stage set, with the show Squawk Box airing live, and with the infamous Mad Money man himself, Jim Kramer, sitting in, talking about the impact the newly minted presumptive Republican Candidate (to be nameless) might have on the markets.

At about 9:15, the 7 small business owners and the head of marketing and small market banking for Chase made our way to the real stage, a perch overlooking the entire floor where the actual apparatus for the bell ringing takes place. In front of four live cameras, we waited for 9:30 to hit, and then the bell was rung....rather, the button was pressed to start the day’s trading. (Sorry they use a button now, not an actual clapper to ring the bell.) Aired live, we shook hands, smiled for a national audience, passed around compliments and enjoyed the moment.

A few moments we went back down to the floor where we were able to float around and I could watch the operation of what is still the heart of the financial world come to life around me.

The Exchange has changed in all those years since my days in one important way; Instead of thousands of crazed floor traders frantically buying and selling millions of shares using ticker tape and hand signals, there were maybe a couple of hundred standing by their booths, watching millions of shares being exchanged electronically. Looking almost bored, they calmly talked with each other sipping more of Gregory’s coffee.

I’ll dispense with the rest of the details of the day and the ensuing reception. Rather, I shall now explain the importance of the event as it relates to End Point and to our clients and my own history.

The first day I was on the job in Wall Street, the chairman of the company said to me something that to this day still guides me as to how I run End Point and I have repeated countless times to whomever will listen. He said, “Rick, to have a good restaurant, you have to have a great kitchen!” So, he had me learn all aspects of the company from back office, to accounting, to the fledgling world of computers to the front office and to how to work effectively with employees and customers alike.

End Point may be one of the smaller of the “small companies” chosen by Chase this day. But, we were selected because we are a company that personifies good management, great engineering, unrelenting commitment to our customers, and a company that has great potential. Why? Because we believe in having a great kitchen! Our engineering staff is exemplary, our clients are fully appreciative of our partnerships with them, and we are doing all we can to be a model business.

While I may not have rung the actual bell this year, our banker and Chase has every confidence – as do I – that one day we will.


Adding Bash Completion To a Python Script

Bash has quite a nice feature, you can write a command in a console, and then press <TAB> twice. This should should you possible options you can write for this command.

I will show how to integrate this mechanism into a custom python script with two types of arguments. What's more, I want this to be totally generic. I don't want to change it when I will change the options, or change config files.

This script accepts two types of arguments. One type contains mainly flags beginning with '--', the other type is a host name taken from a bunch of chef scripts.

Let's name this script show.py - it will show some information about the host. This way I can use it with:

show.py szymon

The szymon part is the name of my special host, and it is taken from one of our chef node definition files.

This script also takes huge number of arguments like:

show.py --cpu --memory --format=json

So we have two kinds of arguments: one is a simple string, one begins with --.

To implement the bash completion on double <TAB>, first I wrote a simple python script, which is prints a huge list of all the node names:

#!/usr/bin/env python

from sys import argv
import os
import json

if __name__ == "__main__":
    pattern = ""
    if len(argv) == 2:
        pattern = argv[1]

    chef_dir = os.environ.get('CHEF_DIR', None)
    if not chef_dir:
        exit(0)
    node_dirs = [os.path.join(chef_dir, "nodes"),
                 os.path.join(chef_dir, "dev_nodes")]
    node_names = []

    for nodes_dir in node_dirs:
        for root, dirs, files in os.walk(nodes_dir):
            for f in files:
                try:
                    with open(os.path.join(root, f), 'r') as nf:
                        data = json.load(nf)
                        node_names.append(data['normal']['support_name'])
                except:
                    pass

    for name in node_names:
        print name

Another thing was to get a list of all the program options. We used the below one liner. It uses the help information shown by the script. So each time the script changed its options, and it is shown when used show.py --help, the tab completion will have show these new options.

$CHEF_DIR/repo_scripts/show.py --help | grep '  --' | awk '{print $1}'

The last step to make all this work was making a simple bash script, which uses the above python script, and the one liner. I placed this script in a file $CHEF_DIR/repo_scripts/show.bash-completion.

_show_complete()
{
    local cur prev opts node_names
    COMPREPLY=()
    cur="${COMP_WORDS[COMP_CWORD]}"
    prev="${COMP_WORDS[COMP_CWORD-1]}"
    opts=`$CHEF_DIR/repo_scripts/show.py --help | grep '  --' | awk '{print $1}'`
    node_names=`python $CHEF_DIR/repo_scripts/node_names.py`

    if [[ ${cur} == -* ]] ; then
        COMPREPLY=( $(compgen -W "${opts}" -- ${cur}) )
        return 0
    fi

    COMPREPLY=( $(compgen -W "${node_names}" -- ${cur}) )
}

complete -F _show_complete show.py

The last thing was to source this file, so I've added the below line in my ~/.bashrc.

source $CHEF_DIR/repo_scripts/show.bash-completion

And now pressing the <TAB> twice in a console shows quite nice completion options:

$ show.py 
Display all 42 possibilities? (y or n)
... and here go all 42 node names ...
$ show.py h
... and here go all node names beginning with 'h' ...
$ show.py --
.. and here go all the options beginning with -- ...

The merchant login ID or password is invalid or the account is inactive, and to how to fix it in Spree

Authorize.net has disabled the RC4 cipher suite on their test server. Their production server update will follow soon. So, in order to ensure your, or your client's, site(s) do not experience any interruption in payment processing it is wise to place a test order in the Authorize.net test environment.

The projects I was testing were all Spree Gem (2.1.x). The Spree Gem uses the ActiveMerchant Gem (in Spree 2.1.x it's ActiveMerchant version 1.34.x). Spree allows you to sign into the admin and select which server your Authorize.net payment method will hit- production or test. There is another option for selecting a "Test Mode" transaction. The difference between a test server transaction and a test mode transaction is explained quite succinctly on the Authorize.net documentation. To summarize it, test server transactions are never sent to financial institutions for processing but are stored in Authorize.net (so you can see their details). Transactions in test mode however are not stored and return a transaction ID of zero.

I wanted to use my Authorize.net test account to ensure my clients were ready for the RC4 Cypher Suite disablement. I ran across a few strange things. First, for three sites, no matter what I did, I kept getting errors saying my Authorize.net account was either inactive or I was providing the wrong credentials. I signed in to Authorize.net and verified my account was active. I triple checked the credentials, they were right. So, I re-read the Spree docs thinking that perhaps I needed to use a special word or format to actually use the test server ("test" versus "Test" or something like that).

Below is a screenshot of the test payment method I had created and was trying to use.

Since I kept getting errors I looked through the Spree code, then the ActiveMerchant Gem that Spree is using.

Below, you can see that the ActiveMerchant is deciding which URL to use (test or live) based on the value of test? (line 15). active_merchant/lib/active_merchant/billing/gateways/authorize_net.rb

require 'nokogiri'

module ActiveMerchant #:nodoc:
  module Billing #:nodoc:
    class AuthorizeNetGateway < Gateway
      include Empty

      self.test_url = 'https://apitest.authorize.net/xml/v1/request.api'
      self.live_url = 'https://api2.authorize.net/xml/v1/request.api'

.
.
.
      def url
        test? ? test_url : live_url
      end

How and where is this set? Spree passes the ActiveMerchant Gem some data which the ActiveMerchant Gem uses to create Response objects. Below is the code where ActiveMerchant handles this data.
active_merchant/lib/active_merchant/billing/response.rb


module ActiveMerchant #:nodoc:
  module Billing #:nodoc:
    class Error < ActiveMerchantError #:nodoc:
    end

    class Response
      attr_reader :params, :message, :test, :authorization, :avs_result, :cvv_result, :error_code, :emv_authorization
.
.
.
      def test?
        @test
      end
.
.
.
      def initialize(success, message, params = {}, options = {})
        @success, @message, @params = success, message, params.stringify_keys
        @test = options[:test] || false
        @authorization = options[:authorization]
        @fraud_review = options[:fraud_review]
        @error_code = options[:error_code]
        @emv_authorization = options[:emv_authorization]

        @avs_result = if options[:avs_result].kind_of?(AVSResult)
          options[:avs_result].to_hash
        else
          AVSResult.new(options[:avs_result]).to_hash
        end

        @cvv_result = if options[:cvv_result].kind_of?(CVVResult)
          options[:cvv_result].to_hash
        else
          CVVResult.new(options[:cvv_result]).to_hash
        end
      end
    end
active_merchant/lib/active_merchant/billing/gateway.rb
      # Are we running in test mode?
      def test?
        (@options.has_key?(:test) ? @options[:test] : Base.test?)
      end

Now that I was more familiar with ActiveMerchant, I wanted to verify that Spree was passing the data as intended

I could see in spree/core/app/models/spree/gateway.rb that Spree was setting ActiveMerchant::Billing::Base.gateway_mode equal to the server param as a symbol. I verified it with some logging.

    def provider
      gateway_options = options
      gateway_options.delete :login if gateway_options.has_key?(:login) and gateway_options[:login].nil?
      if gateway_options[:server]
        ActiveMerchant::Billing::Base.gateway_mode = gateway_options[:server].to_sym
      end 
      @provider ||= provider_class.new(gateway_options)
    end 

At this point I was satisfied that Spree was sending a server param. I also knew Spree was setting Active Merchant's Base.gateway_mode as intended. I then reviewed active_merchant/lib/active_merchant/billing/gateway.rb once more

      # Are we running in test mode?
      def test?
        (@options.has_key?(:test) ? @options[:test] : Base.test?)
      end
and active_merchant/lib/active_merchant/billing/base.rb
      def self.test?
        self.gateway_mode == :test
      end

So, that's it! We know from the exceptions I raised that Spree is sending a test key and a test_mode key. They seem to be the same value but with different keys (I'm guessing that's a mistake), and they both just seem to indicate if the test mode checkbox was checked or not in the Spree admin. However, Base.test? is the server selection and comes from whatever anyone enters in the server input box in the Spree admin. So, we just need to update the ternary operator to check if @options[:test] (test mode) or Base.test? (test server) is true.

Since this is Spree, I created a decorator to override the test? method.

app/models/gateway_decorator.rb

ActiveMerchant::Billing::Gateway.class_eval do
  def test?
    @options.has_key?(:test) && @options[:test] || ActiveMerchant::Billing::Base.test?
  end 
end

Lastly, I placed some test orders and it all worked as intended.

Summary

Authorize.net is disabling the RC4 Cypher suite. If your site(s) uses that, your payment processing may be interrupted. Since the test environment has been updated by Authorize.net, you can see if your site(s) is compliant by posting test transactions to the test environment. If it works, then your site(s) should be compliant and ready when Authorize.net applies the changes to the production server.

Spree 2.1.x (and perhaps all other Spree versions) ALWAYS send the test key, so the ActiveMerchant Gem will always just use the boolean value of that key instead of ever checking to see what the server was set to. Further, this fix makes things a little bit more robust in my opinion by checking if test mode OR the test server was specified, rather than only checking which server (gateway_mode) was specified if the test key was absent.

Alternatively, you could probably make Spree only pass the test key if the value was true. Either way, if you are trying to send test orders to the test environment for a Spree site of at least some versions and have not implemented one of these changes, you will be unable to do so until you add a similar fix as I have described here. If you need any further assistance, please reach out to us at ask@endpoint.com.

Bucardo replication workarounds for extremely large Postgres updates

Bucardo is very good at replicating data among Postgres databases (as well as replicating to other things, such as MariaDB, Oracle, and Redis!). However, sometimes you need to work outside the normal flow of trigger-based replication systems such as Bucardo. One such scenario is when many changes need to be made to your replicated tables. And by a lot, I mean many millions of rows. When this happens, it may be faster and easier to find an alternate way to replicate those changes.

When a change is made to a table that is being replicated by Bucardo, a trigger fires and stores the primary key of the row that was changed into a "delta" table. Then the Bucardo daemon comes along, gathers a list of all rows that were changed since the last time it checked, and pushes those rows to the other databases in the sync (a named replication set). Although all of this is done in a fast and efficient manner, there is a bit of overhead that adds up when (for example), updating 650 million rows in one transaction.

The first and best solution is to simply hand-apply all the changes yourself to every database you are replicating to. By disabling the Bucardo triggers first, you can prevent Bucardo from even knowing, or caring, that the changes have been made.

To demonstrate this, let's have Bucardo replicate among five pgbench databases, called A, B, C, D, and E. Databases A, B, and C will be sources; D and E are just targets. Our replication looks like this: ( A <=> B <=> C ) => (D, E). First, we create all the databases and populate them:

## Create a new cluster for this test, and use port 5950 to minimize impact
$ initdb --data-checksums btest
$ echo port=5950 >> btest/postgresql.conf
$ pg_ctl start -D btest -l logfile

## Create the main database and install the pg_bench schema into it
$ export PGPORT=5950
$ createdb alpha
$ pgbench alpha -i --foreign-keys

## Replicated tables need a primary key, so we need to modify things a little:
$ psql alpha -c 'alter table pgbench_history add column hid serial primary key'

## Create the other four databases as exact copies of the first one:
$ for dbname in beta gamma delta epsilon; do createdb $dbname -T alpha; done

Now that those are done, let's install Bucardo, teach it about these databases, and create a sync to replicate among them as described above.

$ bucardo install --batch
$ bucardo add dbs A,B,C,D,E dbname=alpha,beta,gamma,delta,epsilon dbport=5950
$ bucardo add sync fiveway tables=all dbs=A:source,B:source,C:source,D:target,E:target

## Tweak a few default locations to make our tests easier:
$ echo -e "logdest=.\npiddir=." > .bucardorc

At this point, we have five databases all ready to go, and Bucardo is setup to replicate among them. Let's do a quick test to make sure everything is working as it should.

$ bucardo start
Checking for existing processes
Starting Bucardo

$ for db in alpha beta gamma delta epsilon; do psql $db -Atc "select '$db',sum(abalance) from pgbench_accounts";done | tr "\n" " "
alpha|0 beta|0 gamma|0 delta|0 epsilon|0

$ pgbench alpha
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
number of transactions per client: 10
number of transactions actually processed: 10/10
latency average: 0.000 ms
tps = 60.847066 (including connections establishing)
tps = 62.877481 (excluding connections establishing)

$ for db in alpha beta gamma delta epsilon; do psql $db -Atc "select '$db',sum(abalance) from pgbench_accounts";done | tr "\n" " "
alpha|6576 beta|6576 gamma|6576 delta|6576 epsilon|6576

$ pgbench beta
starting vacuum...end.
...
tps = 60.728681 (including connections establishing)
tps = 62.689074 (excluding connections establishing)
$ for db in alpha beta gamma delta epsilon; do psql $db -Atc "select '$db',sum(abalance) from pgbench_accounts";done | tr "\n" " "
alpha|7065 beta|7065 gamma|7065 delta|7065 epsilon|7065

Let's imagine that the bank discovered a huge financial error, and needed to increase the balance of every account created in the last two years by 20 dollars. Let's further imagine that this involved 650 million customers. That UPDATE will take a very long time, but will suffer even more because each update will also fire a Bucardo trigger, which in turn will write to another "delta" table. Then, Bucardo will have to read in 650 million rows from the delta table, and (on every other database in the sync) apply those changes by deleting 650 million rows then COPYing over the correct values. This is one situation where you want to sidestep your replication and handle things yourself. There are three solutions to this. The easiest, as mentioned, is to simply do all the changes yourself and prevent Bucardo from worrying about it.

The basic plan is to apply the updates on all the databases in the syncs at once, while using the session_replication_role feature to prevent the triggers from firing. Of course, this will prevent *all* of the triggers on the table from firing. If there are some non-Bucardo triggers that must fire during this update, you might wish to temporarily set them as ALWAYS triggers.

Solution one: manual copy

## First, stop Bucardo. Although not necessary, the databases are going to be busy enough
## that we don't need to worry about Bucardo at the moment.
$ bucardo stop
Creating ./fullstopbucardo ... Done

## In real-life, this query should get run in parallel across all databases,
## which would be on different servers:
$ QUERY='UPDATE pgbench_accounts SET abalance = abalance + 25 WHERE aid > 78657769;'

$ for db in alpha beta gamma delta epsilon; do psql $db -Atc "SET session_replication_role='replica'; $QUERY"; done | tr "\n" " "
UPDATE 83848570 UPDATE 83848570 UPDATE 83848570 UPDATE 83848570 UPDATE 83848570 

## For good measure, confirm Bucardo did not try to replicate all those rows:
$ bucardo kick fiveway
Kicked sync fiveway

$ grep Totals log.bucardo
(11144) [Mon May 16 23:08:57 2016] KID (fiveway) Totals: deletes=36 inserts=28 conflicts=0
(11144) [Mon May 16 23:09:02 2016] KID (fiveway) Totals: deletes=38 inserts=29 conflicts=0
(11144) [Mon May 16 23:09:22 2016] KID (fiveway) Totals: deletes=34 inserts=27 conflicts=0
(11144) [Tue May 16 23:15:08 2016] KID (fiveway) Totals: deletes=10 inserts=7 conflicts=0
(11144) [Tue May 16 23:59:00 2016] KID (fiveway) Totals: deletes=126 inserts=73 conflicts=0

Solution two: truncate the delta

As a second solution, what about the event involving a junior DBA who made all those updates on one of the source databases without disabling triggers? When this happens, you would probably find that your databases are all backed up and waiting for Bucardo to handle the giant replication job. If the rows that have changed constitute most of the total rows in the table, your best bet is to simply copy the entire table. You will also need to stop the Bucardo daemon, and prevent it from trying to replicate those rows when it starts up by cleaning out the delta table. As a first step, stop the main Bucardo daemon, and then forcibly stop any active Bucardo processes:

$ bucardo stop
Creating ./fullstopbucardo ... Done

$ pkill -15 Bucardo

Now to clean out the delta table. In this example, the junior DBA updated the "beta" database, so we look there. We may go ahead and truncate it because we are going to copy the entire table after that point.

# The delta tables follow a simple format. Make sure it is the correct one
$ psql beta -Atc 'select count(*) from bucardo.delta_public_pgbench_accounts'
650000000
## Yes, this must be the one!

## Truncates are dangerous; be extra careful from this point forward
$ psql beta -Atc 'truncate table bucardo.delta_public_pgbench_accounts'

The delta table will continue to accumulate changes as applications update the table, but that is okay - we got rid of the 650 million rows. Now we know that beta has the canonical information, and we need to get it to all the others. As before, we use session_replication_role. However, we also need to ensure that nobody else will try to add rows before our COPY gets in there, so if you have active source databases, pause your applications. Or simply shut them out for a while via pg_hba.conf! Once that is done, we can copy the data until all databases are identical to "beta":

$ ( echo "SET session_replication_role='replica'; TRUNCATE TABLE pgbench_accounts; " ; pg_dump beta --section=data -t pgbench_accounts ) | psql alpha -1 --set ON_ERROR_STOP=on
SET
ERROR:  cannot truncate a table referenced in a foreign key constraint
DETAIL:  Table "pgbench_history" references "pgbench_accounts".
HINT:  Truncate table "pgbench_history" at the same time, or use TRUNCATE ... CASCADE.

Aha! Note that we used the --foreign-keys option when creating the pgbench tables above. We will need to remove the foreign key, or simply copy both tables together. Let's do the latter:

$ ( echo "SET session_replication_role='replica'; TRUNCATE TABLE pgbench_accounts, pgbench_history; " ; pg_dump beta --section=data -t pgbench_accounts \
  -t pgbench_history) | psql alpha -1 --set ON_ERROR_STOP=on
SET
TRUNCATE TABLE
SET
SET
SET
SET
SET
SET
SET
SET
COPY 100000
COPY 10
 setval 
--------
     30
(1 row)
## Do the same for the other databases:
$ for db in gamma delta epsilon; do \
 ( echo "SET session_replication_role='replica'; TRUNCATE TABLE pgbench_accounts, pgbench_history; " ; pg_dump $db --section=data -t pgbench_accounts \
  -t pgbench_history) | psql alpha -1 --set ON_ERROR_STOP=on ; done

Note: if your tables have a lot of constraints or indexes, you may want to disable those to speed up the COPY. Or even turn fsync off. But that's the topic of another post.

Solution three: delta excision

Our final solution is a variant on the last one. As before, the junior DBA has done a mass update of one of the databases involved in the Bucardo sync. But this time, you decide it should be easier to simply remove the deltas and apply the changes manually. As before, we shut down Bucardo. Then we determine the timestamp of the mass change by checking the delta table closely:

$ psql beta -Atc 'select txntime, count(*) from bucardo.delta_public_pgbench_accounts group by 1 order by 2 desc limit 3'
2016-05-26 23:23:27.252352-04|65826965
2016-05-26 23:23:22.460731-04|80
2016-05-07 23:20:46.325105-04|73
2016-05-26 23:23:33.501002-04|69

Now we want to carefully excise those deltas. With that many rows, it is quicker to save/truncate/copy than to do a delete:

$ psql beta
beta=# BEGIN;
BEGIN
## To prevent anyone from firing the triggers that write to our delta table
beta=#LOCK TABLE pgbench_accounts;
LOCK TABLE
## Copy all the delta rows we want to save:
beta=# CREATE TEMP TABLE bucardo_store_deltas AS SELECT * FROM bucardo.delta_public_pgbench_accounts WHERE txntime <> '2016-05-07 23:20:46.325105-04';
SELECT 1885
beta=# TRUNCATE TABLE bucardo.delta_public_pgbench_accounts;
TRUNCATE TABLE
## Repopulate the delta table with our saved edits
beta=# INSERT INTO bucardo.delta_public_pgbench_accounts SELECT * FROM bucardo_store_deltas;
INSERT 0 1885
## This will remove the temp table
beta=# COMMIT;
COMMIT

Now that the deltas are removed, we want to emulate what caused them on all the other servers. Note that this query is a contrived one that may lend itself to concurrency issues. If you go this route, make sure your query will produce the exact same results on all the servers.

## As in the first solution above, this should ideally run in parallel
$ QUERY='UPDATE pgbench_accounts SET abalance = abalance + 25 WHERE aid > 78657769;'

## Unlike before, we do NOT run this against beta
$ for db in alpha gamma delta epsilon; do psql $db -Atc "SET session_replication_role='replica'; $QUERY"; done | tr "\n" " "
UPDATE 837265 UPDATE 837265 UPDATE 837265 UPDATE 837265 UPDATE 837265 

## Now we can start Bucardo up again
$ bucardo start
Checking for existing processes
Starting Bucardo

That concludes the solutions for when you have to make a LOT of changes to your database. How do you know how much is enough to worry about the solutions presented here? Generally, you can simply let Bucardo run - you will know when everything crawls to a halt that perhaps trying to insert 465 million rows at once was a bad idea. :)