Welcome to End Point's blog

Ongoing observations by End Point people.

Perusion has joined End Point!

We are excited to announce an expansion of End Point’s ecommerce clientele and our developer ranks! The ecommerce consulting company Perusion has joined End Point. Perusion was founded in 2002 by Mike Heins and Greg Hanson. It quickly became a small powerhouse in the open source ecommerce space, focusing on Interchange, Perl, and MySQL on Linux. We were pleased to welcome Perusion in a merger with End Point at the beginning of July.

Mike Heins is the original creator of MiniVend in 1996. In 2000, Mike’s consultancy and MiniVend were acquired by the ecommerce startup Akopia. With numerous improvements, including the addition of a new full-featured administrative back-office, the new open source ecommerce platform Interchange was created. Akopia was acquired by Red Hat in 2001, and in 2002 the Interchange project became independent, led by its creators and a group of other open source developers who maintain it to this day.

Greg Hanson is a serial entrepreneur and business manager and has worked extensively with data systems of many types. In the mid-1990s he started a computer products ecommerce company, Valuemedia, and oversaw every aspect of its evolution. Greg joined Mike to launch Perusion in 2002, and is now a client consultant and developer. He has shepherded several businesses from small mom & pop operations into strong companies providing goods and services around the world.

Josh Lavin began creating websites professionally in 1998 at his consultancy Kingdom Design. He grew his work into the ecommerce space, helping many companies sell their products online for the first time. Josh joined Perusion in 2007, bringing skills in marketing and user experience that are just as important as his development abilities. In recent years he has enjoyed moving client sites over to responsive front-end designs that work well on desktop, tablet, and mobile phone.

Perusion’s development and hosting clients have joined End Point as well. Some of the noteworthy sites are American Welding Society, Bluestone Perennials, Vervanté, Bulk Herb Store, Northern Sun, Penn Herb Company, Air Delights, and Solar Pathfinder.

At End Point we got our start in the industry in 1995 by creating dynamic database-backed websites for our clients. Our tools of choice in those early days were Linux, Apache, msql and MySQL, and Perl. In the late 1990s we began focusing more on ecommerce websites specifically, and we added MiniVend and Interchange to the mix.

Later we branched out into PostgreSQL, Ruby on Rails and Spree, Python and Django, Perl Dancer, NodeJS, and other platforms, while we continued to support and enhance Interchange and the sites running on it. Today we still host and develop many successful ecommerce sites running Interchange, taking many thousands of orders worth millions of dollars every day. So we are delighted to have Perusion join us as we continue to grow the Interchange-based part of our business.

We have already seen constructive collaboration in this merger, with longtime End Point employees able to add more helping hands to Perusion projects and further breadth of capabilities and depth of support, Perusion developers bringing their expertise to bear, and Perusion contacts leading to new projects and new business.

Perusion has always believed in going above and beyond the call of duty and has made their clients’ success their own goal. Likewise, we at End Point feel this is more than a business, and we value the personal relationships we have developed with each other and our clients over the years. We look forward to the new possibilities that are now available through this change, and are glad to welcome Perusion on board at End Point!

Ruby On Rails: Hash#slice With Default Values

Recently, I needed to add some functionality to an older section of code. This code initialized and passed around a reasonably sized set of various hashes, all with similar keys. As those hashes were accessed and manipulated, there were quite a few lines of code devoted to addressing boundary conditions within those hashes. For example, an if/else statement setting a default value for a given key if it didn't already exist. With all the added safety checks, the main method dragged on for several screens' worth of code. While puttering around amidst this section, I figured I'd be a good little Boy Scout and leave my campsite better than when I found it.

I figured a fairly easy way to do that would be to eliminate the need for all the extra if/else clauses laying around. All they were really doing was ensuring we ended up with a minimum set of hash keys. I decided to turn all the various hashes into instances of a very simple class inheriting from Ruby on Rails' HashWithIndifferentAccess along with some basic key management functionality.

My first draft came out looking something like:

class MyHash < HashWithIndifferentAccess
  def initialize(constructor = {})
    super
    self[:important_key_1] ||= "default 1"
    self[:important_key_2] ||= "default 2"
  end
end
This seemed to work fine. I didn't need to worry about ensuring "important keys" were present anymore. And it was perfectly viable to pass in one of the important keys as part of the initialization.

I soon discovered in my test suite that my code did not do exactly what I intended it to do. In the tests, I wanted to ensure several of my hash keys came out with the right values. I made use of MyHash#slice to ensure I ended up with the right subset of values for my given test. However, no matter what I did, I could not weed out the important keys:

1.9.3 :003 > MyHash.new({foo: 'bar', bar: 'lemon'}).slice(:bar)
=> {"important_key_1"=>"default 1", "important_key_2"=>"default 2", "bar"=>"lemon"}
I admit I was quite perplexed by this. I tried several re-writes of the initialize method looking for some version that didn't exhibit this strange slice behavior. Finally, I took to the Ruby on Rails and Ruby docs.

Looking at the source for slice, I found the problem:

keys.each_with_object(self.class.new)...
The slice method calls "new" on the class (which adds the default values) to create a fresh object, so it avoids clobbering the one the method is called on. Since I didn't feel like writing my own custom slice method, or trying to monkey patch Rails, I realized I was beaten. I needed to find a new solution.

After a little thought, I came up with this:

class MyHash < HashWithIndifferentAccess
  def self.build(constructor = {})
    h = new(constructor)
    h[:important_key_1] ||= "default 1"
    h[:important_key_2] ||= "default 2"
    h
  end
end
This does not fix the problem, but manages to sidestep it. Most Ruby on Rails veterans will be familiar with methods called "build" so it shouldn't be too clumsy to work with. I replaced all the entries in my code that called MyHash.new(...) with MyHash.build(...) and went on my merry way.
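
To illustrate, here is a hypothetical usage sketch (the keys and values are placeholders, not from the original code) showing why this works: build injects the defaults, while new, and therefore slice, no longer does:

h = MyHash.build(foo: 'bar', bar: 'lemon')
h[:important_key_1]  # => "default 1"
h[:important_key_2]  # => "default 2"
h.slice(:bar)        # => {"bar"=>"lemon"}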

img.bi, an encrypted image sharing service

After a fairly good experience with dnote installed on our own servers as an encrypted notes sharing service, my team decided that it would be nice to have a similar service for images.

We found a nice project called img.bi that is based on NodeJS, Python, Redis and a lot of client-side JavaScript.

The system is divided into two components: the HTML/JS frontend and a Python FastCGI API.

Unfortunately, the documentation is still in a very early stage, lacking a meaningful structure and a lot of needed information.

Here's an overview of the steps we followed to set up img.bi on our own server behind nginx.

First of all, we wanted to have as much as possible running as (and confined to) a regular user, which is always a good idea with such young and potentially vulnerable tools. We chose to use the imgbi user.

Then, since we wanted to keep the root user's environment (and overall system state) as clean as possible, we also decided to use pyenv. To be conservative, we chose the latest stable Python 2.7 release, 2.7.10.

git clone https://github.com/yyuu/pyenv.git ~/.pyenv
echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.bash_profile
echo 'export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.bash_profile
echo 'eval "$(pyenv init -)"' >> ~/.bash_profile
exec $SHELL -l
pyenv install -l  | grep 2\\.7
pyenv install 2.7.10
pyenv global 2.7.10
pyenv version
which python
python --version

In order to use img.bi, we also needed NodeJS and following the same approach we chose to use nvm and install the latest NodeJS stable version:

curl -o- https://raw.githubusercontent.com/creationix/nvm/v0.25.4/install.sh | bash
nvm install stable
nvm list
nvm use stable
nvm alias default stable
node --version

After that we updated pip and then installed all the needed Python modules:

pip install --upgrade pip
pip install redis m2crypto web.py bcrypt pysha3 zbase62 pyutil flup

Then it's time to clone the actual img.bi code from the GitHub repo, install a few missing dependencies and then use the bower and npm .json files to add the desired packages:

git clone https://github.com/imgbi/img.bi.git
cd img.bi/
npm install -g bower grunt grunt-cli grunt-multiresize
npm install -g grunt-webfont --save-dev
npm install
bower install

We also faced an issue which made Grunt fail to start correctly, complaining about an "undefined property" called "prototype". If you happen to have the same problem, just type:

cd node_modules/grunt-connect-proxy/node_modules/http-proxy
npm install eventemitter3@0.1.6
cd -

That basically installs the eventemitter3 NodeJS package locally to the grunt-connect-proxy module, so as to overcome the compatibility issue which in turn causes the error mentioned above.

Next, use your favourite editor to change the file config.json, which contains all the local configuration needed. In particular, our host is not exposed on the I2P or Tor networks, so we "visually" disabled those options.

# lines starting with "+" replace the ones starting with "-"
-  "name": "img.bi",
+  "name": "img.bi - End Point image sharing service",

-  "maxSize": "3145728",
+  "maxSize": "32145728",

-  "clearnet": "https://img.bi",
+  "clearnet": "https://imgbi.example",

-  "i2p": "http://imgbi.i2p",
+  "i2p": "http://NOTAVAILABLE.i2p",

-  "tor": "http://imgbifwwqoixh7te.onion",
+  "tor": "http://NOTAVAILABLE.onion",

Save and close the file. At this point you should be able to run "grunt" to build the project, but if it fails on the multiresize task, just run

grunt --force

to ignore the warnings.

That's about everything you need for the frontend part, so it's now time to take care of the API.

cd
git clone https://github.com/imgbi/img.bi-api.git
cd /home/imgbi/img.bi-api/

You now need to edit the two Python files which are the core of the API.

# edit code.py expired.py
-upload_dir = '/home/img.bi/img.bi-files'
+upload_dir = '/home/imgbi/img.bi-files'

Verify that there are no Python import errors (due to missing modules or the like) by running the code.py file directly.

./code.py

If that works okay, just create a symlink in the build directory so that the files created by the API are available to the frontend:

ln -s /home/imgbi/img.bi-files /home/imgbi/img.bi/build/download

And then it's time to spawn the actual Python daemon:

spawn-fcgi -f /home/imgbi/img.bi-api/code.py -a 127.0.0.1 -p 1234

The expired.py file is used by a cronjob which periodically checks whether there's any image/content that should be removed because its time has expired. First let's call the script directly, and if there's no error, create the crontab:

python /home/imgbi/img.bi-api/expired.py

crontab -e

@reboot spawn-fcgi -f /home/imgbi/img.bi-api/code.py -a 127.0.0.1 -p 1234
30 4 * * * python /home/imgbi/img.bi-api/expired.py

It's now time to install nginx and Redis (if you haven't already done so), and then configure them. For Redis the usual simple, basic installation will be just fine. The same is true for nginx, but we'll include our configuration/vhost file content here as an example /etc/nginx/sites-enabled/imgbi.example.conf for everyone who may need it:

upstream imgbi-fastcgi {
  server 127.0.0.1:1234;
}

server {
  listen 80;
  listen [::]:80;
  server_name imgbi.example;
  access_log /var/log/nginx/sites/imgbi.example/access.log;
  error_log /var/log/nginx/sites/imgbi.example/error.log;
  rewrite ^ https://imgbi.example/ permanent;
}

server {
  listen 443 ssl spdy;
  listen [::]:443 ssl spdy;
  server_name  imgbi.example;
  access_log /var/log/nginx/sites/imgbi.example/access.log;
  error_log /var/log/nginx/sites/imgbi.example/error.log;

  client_max_body_size 4G;

  include include/ssl-wildcard-example.inc;

  add_header Strict-Transport-Security max-age=31536000;
  add_header X-Frame-Options SAMEORIGIN;
  add_header X-Content-Type-Options nosniff;
  add_header X-XSS-Protection "1; mode=block";

  location / {
    root /home/imgbi/img.bi/build;
  }

  location /api {
    fastcgi_param QUERY_STRING $query_string;
    fastcgi_param REQUEST_METHOD $request_method;
    fastcgi_param CONTENT_TYPE $content_type;
    fastcgi_param CONTENT_LENGTH $content_length;

    fastcgi_param SCRIPT_NAME "";
    fastcgi_param PATH_INFO $uri;
    fastcgi_param REQUEST_URI $request_uri;
    fastcgi_param DOCUMENT_URI $document_uri;
    fastcgi_param DOCUMENT_ROOT $document_root;
    fastcgi_param SERVER_PROTOCOL $server_protocol;

    fastcgi_param GATEWAY_INTERFACE CGI/1.1;
    fastcgi_param SERVER_SOFTWARE nginx/$nginx_version;

    fastcgi_param REMOTE_ADDR $remote_addr;
    fastcgi_param REMOTE_PORT $remote_port;
    fastcgi_param SERVER_ADDR $server_addr;
    fastcgi_param SERVER_PORT $server_port;
    fastcgi_param SERVER_NAME $server_name;
    fastcgi_param HTTPS on;

    fastcgi_pass imgbi-fastcgi;
    fastcgi_keep_conn on;
  }
}

Well, that should be enough to get you started and at least have all the components in place. Enjoy your secure image sharing now.

Nothing more permanent than a temporary fix

A brief thought:

You may have heard the saying that nothing is more permanent than a temporary fix. Or that prototypes are things we just haven't yet recognized will be permanent. Or some variation on the theme.

As an illustration of this, I recently came across the initial commit to the source code repository of our endpoint.com website when we ported it to Ruby on Rails back in April 2007. Our then co-worker PJ's comment is a perfect example of how long-lasting some of our planned temporary work can be:

commit 2ee55da6ed953c049b3ef6f9f132ed3c1e0d4de9
Author: PJ Cabreras <pj@endpoint.com>
Date:   Wed Apr 18 13:07:46 2007 +0000

    Initial test setup of repository for mkcamp testing -- will probably throw away later
    
    git-svn-id: file:///home/camp/endpoint/svnrepo/trunk@1 7e1941c4-622e-0410-b359-a11864f70de7

It's wise to avoid big architecture up front for experimental things we don't know the needed shape and size of. But we should plan on iterating and being agile (in the real basic sense of the word), because we may never have the chance to start over from scratch. And starting over from scratch is often ill-advised in any case.

E-commerce website encryption changes

The big picture

Computer security is a moving target, and during the past few years it's been moving faster than ever.

In the e-commerce world, the PCI Security Standards Council sets the rules for what merchants and vendors must do to have what they consider to be a sufficiently secure environment to handle cardholder data such as credit card numbers, expiration dates, and card security codes.

PCI DSS 3.1, released on 15 April 2015, puts us all on notice that TLS 1.0 is considered unfit for e-commerce website encryption (HTTPS), and will be disallowed soon. The new rules specify that new software implementations must not use TLS versions prior to 1.1. Existing implementations must require TLS 1.1 or 1.2 no later than 30 June 2016.

They provide some guidance and explain what is expected in more detail.

Long ago we were required to disable SSL 2, and last year we were expected to disable SSL 3, the predecessor to TLS 1.0. That turned out to not be particularly hard or cause too many problems, because almost all systems that supported SSL 3 also supported TLS 1.0.

This time we are not so lucky. Many clients (such as browsers) and servers did not support TLS beyond version 1.0 until fairly recently. That means much more work is involved in meeting these new requirements than just changing some settings and restarting servers.

Almost every client (browser) and server that supports TLS 1.1 also supports TLS 1.2, and almost everything that doesn't support TLS 1.2 doesn't support TLS 1.1 either. So to keep things simpler here I'll just talk about TLS 1.0 vs. TLS 1.2 below, and TLS 1.1 can be assumed to apply as well where TLS 1.2 is mentioned.

At End Point we deploy, support, and host e-commerce sites for many customers, so I'll talk about the server side of this first. Note that servers can act as both server and client in TLS connections, since servers often make outgoing HTTPS connections as well as accepting incoming requests. Let's review the situation with each of the major Linux server operating systems.

Debian

Debian 8 is the current version, and supports TLS 1.2. It is scheduled to be supported until April 2020.

Debian 7 has planned support until May 2018, but unfortunately it does not support TLS 1.2.

Debian's support lifetime has historically depended on how quickly future releases come, but recently the project began to offer long-term support (LTS) for Debian 6, which was supposed to be at end of life, so it will be supported until February 2016. It also only supports TLS 1.0.

Ubuntu

Ubuntu's long-term support (LTS) server versions are supported for 5 years. Currently supported versions 12.04 and 14.04 both handle TLS 1.2.

Some sites are still using Ubuntu 10.04, which supports only TLS 1.0, but its support ended in April 2015, so it should not be used any longer in any case.

Red Hat Enterprise Linux (RHEL) and CentOS

Red Hat and CentOS are "enterprise" operating systems with a very long support lifetime of 10 years. Because that is so long, the oldest supported versions may become practically unusable due to changes in the world such as the deprecation of TLS 1.0.

RHEL/CentOS 7 is the current version, supported until June 2024. It supports TLS 1.2.

RHEL/CentOS 6 is supported until November 2020. It is mostly ok for TLS 1.2. One exception is that the bundled version of curl doesn't support TLS > 1.0 for some reason, so if you have applications making curl client calls to other systems, they may break without workarounds.

RHEL/CentOS 5 is the oldest version still supported, until March 2017, and it is very widely used, but it does not support TLS > 1.0.

Old server remediation

If you're on an older server that doesn't support TLS 1.2, the best thing to do is upgrade or migrate to a newer operating system, as soon as possible.

The common versions of OpenSSL that don't support TLS 1.2 also do not support Server Name Indication (SNI), which is used for hosting multiple HTTPS sites on the same IP address. SNI is now becoming more commonly used, since the main thing holding back adoption was browsers on old Windows XP systems that didn't support it, and those are now mostly dead. You can control whether you need SNI on the server side, avoiding it by continuing to get a separate IP address for each HTTPS site you host. But when you are a client of someone else's service that requires SNI, you'll wish you had it.

So migrate. That's easier said than done, of course. Moving to a new OS version involves a lot of new system library versions, language versions, web server version, etc. Some things aren't compatible. It takes work and time. That's life, so accept it and move ahead. Advocate it, schedule it, do it. But in the meantime, if you must cope with old servers, there are some not entirely terrible options.

You could use plain HTTP on a local private network to talk to a newer server running stunnel or an nginx proxy to do the TLS layer, or use a VPN if you have no private network.
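
As a minimal sketch of the nginx proxy idea, assuming a newer front-end host and a legacy back-end reachable at a private address, a vhost might look something like this (the hostname, certificate paths, and back-end address are placeholders for illustration):

server {
  listen 443 ssl;
  server_name shop.example.com;

  ssl_certificate     /etc/nginx/ssl/shop.example.com.crt;
  ssl_certificate_key /etc/nginx/ssl/shop.example.com.key;
  # only allow the newer protocol versions on the front end
  ssl_protocols       TLSv1.1 TLSv1.2;

  location / {
    # plain HTTP over the private network to the old server
    proxy_pass http://10.0.0.12:80;
    proxy_set_header Host $host;
    proxy_set_header X-Forwarded-Proto https;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
  }
}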

You can use a CDN in front of your site, which will certainly support TLS 1.2; that covers the path between the end user and the CDN, at least.

You can build your own versions of OpenSSL, any libraries that link to OpenSSL such as curl or wget, Apache or nginx, etc. This is tempting but is a terrible option, because you are almost certain to not update this hand-crafted stack often enough in the future to protect against new vulnerabilities in it. Sidestepping the operating system's native package management for core infrastructure software like this is usually a mistake.

You could avoid that problem by using someone else's backported parallel-install packages of all that, if you can find some, and if you think they're trustworthy, and if they're going to maintain them so you can get later updates. I'm not familiar with anyone doing this, but it may be out there and could be hired for the right ongoing price.

But the best bet is to start planning your upgrade or migration as soon as possible.

Browser support for TLS 1.2

Of course the other half of the connection is the client, primarily end-users' web browsers. On Wikipedia is a very detailed table showing various browsers' support of various features, broken down by version, here: TLS support history of web browsers. My summary follows:

Google Chrome and Mozilla Firefox have been automatically updating themselves for a long time, so unless you or your system administrator have disabled the auto-update, they will work with TLS 1.2.

Internet Explorer 8, 9, and 10 support TLS 1.2, but it is disabled by default. Not until IE 11 can TLS 1.2 be counted on to work.

Apple Safari 7, 8, and 9 for Mac OS X support TLS 1.2.

The built-in browser on Android < 4.4 doesn't support TLS > 1.0, and Android 4.4 has TLS > 1.0 disabled by default, which amounts to the same thing for most users. So anyone with Android < 5.0 will not be able to connect to your site unless they're using a separate and newer mobile browser such as Chrome or Firefox.

Browser support for TLS 1.0

I have not heard of any timelines being announced for browsers to disable TLS 1.0 yet. I suspect that's because there are still so many servers that only support TLS 1.0. But before too long we may start to see servers catch up and then I expect browsers will eventually disable TLS 1.0.

Other clients

There are too many non-browser clients to list here. We've already mentioned curl; there is also wget, and the web client libraries in Perl, Python, Ruby, and Java. PHP uses libcurl. NodeJS and Go are likely not from the operating system and are newer than it, so they may be more current. At any rate, some of those clients will be old and won't support TLS 1.2, so when other sites stop allowing TLS 1.0 connections, whatever you were talking to them for will stop working.
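
As a quick sanity check of one common client, recent curl builds let you force a protocol version, so something along these lines (the URL is a placeholder) shows whether your curl and its TLS library can negotiate TLS 1.2:

$ curl --tlsv1.2 -sI https://www.example.com/ | head -n 1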

PCI DSS will require client applications to stop using TLS 1.0 also, which may mean that applications need to be configured to require TLS 1.2 for outgoing HTTPS connections.

Summary

Your systems need to stop supporting TLS 1.0 by June 2016 at the latest. Start planning the migration now! We are available to help our current and new clients test, assess needs, and plan upgrades and migrations.
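
To check where a given server stands, you can ask openssl s_client to negotiate specific protocol versions; for example (replace the hostname with your own), a successful handshake with -tls1_2 and a failure with -tls1 is the end state you are aiming for:

$ openssl s_client -connect www.example.com:443 -tls1_2 < /dev/null
$ openssl s_client -connect www.example.com:443 -tls1   < /dev/null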


Selectively firing Postgres triggers

Being able to disable Postgres triggers selectively can be an important skill when doing tasks like bulk updates, in which you only want a subset of the triggers on the table to be fired. Read below for the long explanation, but the TL;DR version of the best solution is to set a WHEN clause on the trigger you wish to skip, making it conditional on a variable such as session_replication_role or application_name.

CREATE TRIGGER mytrig AFTER INSERT ON foobar FOR EACH ROW
  WHEN (current_setting('session_replication_role') <> 'local') EXECUTE PROCEDURE myfunc();
BEGIN;
SET LOCAL session_replication_role = 'local';
UPDATE foobar SET baz = 123;
COMMIT;

I decided to spin up a free Heroku "Hobby Dev" database to illustrate the solutions. Generating a test table was done by using the Pagila project, as it has tables which contain triggers. Heroku gives you a randomly generated user and database name. To install the Pagila schema, I did:

$ export H="postgres://vacnvzatmsnpre:2iCDp-46ldaFxgdIx8HWFeXHM@ec2-34-567-89.compute-1.amazonaws.com:5432/d5q5io7c3alx9t"
$ cd pagila-0.10.1
$ psql $H -q -f pagila-schema.sql
$ psql $H -q -f pagila-data.sql

Errors appeared on the import, but they can be safely ignored. One error was because the Heroku database does not have a user named "postgres", and the other error was due to the fact that the Heroku user is not a superuser. The data, however, was all intact. The sample data is actually quite funny, as the movie titles were semi auto-generated at some point. For example, seven random movie descriptions:

  • A Brilliant Panorama of a Madman And a Composer who must Succumb a Car in Ancient India
  • A Touching Documentary of a Madman And a Mad Scientist who must Outrace a Feminist in An Abandoned Mine Shaft
  • A Lackluster Reflection of a Eskimo And a Wretch who must Find a Fanny Pack in The Canadian Rockies
  • A Emotional Character Study of a Robot And a A Shark who must Defeat a Technical Writer in A Manhattan Penthouse
  • A Amazing Yarn of a Hunter And a Butler who must Defeat a Boy in A Jet Boat
  • A Beautiful Reflection of a Womanizer And a Sumo Wrestler who must Chase a Database Administrator in The Gulf of Mexico
  • A Awe-Inspiring Reflection of a Waitress And a Squirrel who must Kill a Mad Cow in A Jet Boat

The table we want to use for this post is named "film", and comes with two triggers on it, 'film_fulltext_trigger', and 'last_updated':

heroku=> \d film
                            Table "public.film"
        Column        |            Type             |       Modifiers
----------------------+-----------------------------+---------------------------------
 film_id              | integer                     | not null default 
                                                      nextval('film_film_id_seq'::regclass)
 title                | character varying(255)      | not null
 description          | text                        | 
 release_year         | year                        | 
 language_id          | smallint                    | not null
 original_language_id | smallint                    | 
 rental_duration      | smallint                    | not null default 3
 rental_rate          | numeric(4,2)                | not null default 4.99
 length               | smallint                    | 
 replacement_cost     | numeric(5,2)                | not null default 19.99
 rating               | mpaa_rating                 | default 'G'::mpaa_rating
 last_update          | timestamp without time zone | not null default now()
...
Triggers:
    film_fulltext_trigger BEFORE INSERT OR UPDATE ON film FOR EACH ROW EXECUTE 
       PROCEDURE tsvector_update_trigger('fulltext', 'pg_catalog.english', 'title', 'description')
    last_updated BEFORE UPDATE ON film FOR EACH ROW EXECUTE PROCEDURE last_updated()

The last_updated trigger calls the last_updated() function, which simply sets the last_update column to CURRENT_TIMESTAMP, which is often seen as its shorter-to-type form, now(). This is a handy metric to track, but there are times when you want to make changes and not update this field. A typical example is some sort of bulk change that does not warrant changing all the rows' last_update field. How to accomplish this? We need to ensure that the trigger does not fire when we do our UPDATE. The way many people are familiar with is to simply disable all triggers on the table. So you would do something like this:

BEGIN;
ALTER TABLE film DISABLE TRIGGER ALL;
UPDATE film SET rental_duration = 10;
ALTER TABLE film ENABLE TRIGGER ALL;
COMMIT;

When using Heroku, you are given a regular user, not a Postgres superuser, so the above will generate an error that looks like this:

ERROR:  permission denied: "RI_ConstraintTrigger_a_88776583" is a system trigger.

This is caused by the failure of a normal user to disable the internal triggers Postgres uses to maintain foreign key relationships between tables. So the better way is to simply disable the specific trigger like so:

BEGIN;
ALTER TABLE film DISABLE TRIGGER last_updated;
UPDATE film SET rental_duration = 10;
ALTER TABLE film ENABLE TRIGGER last_updated;
COMMIT;

This works on Heroku, but there are two major problems with the ALTER TABLE solution. First, the ALTER TABLE will take a very heavy lock on the entire table, meaning that nobody else will be able to access the table - even to read it! - until your transaction is complete (although Postgres 9.5 will reduce this lock!). The other problem with disabling triggers this way is that it is too easy to accidentally leave it in a disabled state (although the check_postgres program has a specific check for this!). Let's take a look at the lock, and double check that the trigger has been disabled as well:

heroku=> SELECT last_update FROM film WHERE film_id = 123;
        last_update         
----------------------------
 2015-06-21 16:38:00.891019
heroku=> BEGIN;
heroku=> ALTER TABLE film DISABLE TRIGGER last_updated;
heroku=> SELECT last_update FROM film WHERE film_id = 123;
heroku=> UPDATE film SET rental_duration = 10;
-- We need the subselect because we share with a gazillion other Heroku databases!
heroku=> select relation::regclass,mode,granted from pg_locks where database = 
heroku->   (select oid from pg_database where datname = current_database());
 relation |        mode         | granted 
----------+---------------------+---------
 pg_locks | AccessShareLock     | t
 film     | RowExclusiveLock    | t
 film     | AccessExclusiveLock | t  ## This is a very heavy lock!
## Version 9.5 and up will have a ShareRowExclusive lock only!
heroku=> ALTER TABLE film ENABLE TRIGGER last_updated;
heroku=> COMMIT;

-- This is the same value, because the trigger did not fire when we updated
heroku=> select last_update FROM film WHERE film_id = 123;
        last_update         
----------------------------
 2015-06-21 16:38:00.891019

What we really want is to use the powerful session_replication_role parameter to safely disable the triggers. The problem is that the canonical way to disable triggers, by setting session_replication_role to 'replica', will disable ALL triggers and rules, for ALL tables. This is not wanted. In our example, we want to stop the last_updated trigger from firing, but also want all the other user triggers to fire, as well as the hidden system triggers that are enforcing foreign key referential integrity.

You can set session_replication_role to one of three values: origin (the default), local, and replica. Setting it to "replica" is commonly used in replication systems such as Bucardo and Slony to prevent all rules and triggers from firing. It can also be used for careful bulk loading. Only triggers explicitly set as "replica triggers" will fire when the session_replication_role is set to 'replica'. The local setting is a little harder to understand, as it does not have a direct mapping to a trigger state, as 'origin' and 'replica' do. Instead, it can be thought of as an alias to 'origin' - same functionality, but with a different name. What use is that? Well, you can check the value of session_replication_role and do things differently depending on whether it is 'origin' or 'local'. Thus, it is possible to teach a trigger that it should not fire when session_replication_role is set to 'local' (or to fire only when it is set to 'local').
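
As an aside, here is what marking a trigger as a "replica trigger" looks like, reusing the placeholder names from the example at the top of this post (this is the opposite of our goal here, but it rounds out the picture of how the settings interact with triggers):

ALTER TABLE foobar ENABLE REPLICA TRIGGER mytrig;  -- fires only when session_replication_role = 'replica'
ALTER TABLE foobar ENABLE ALWAYS TRIGGER mytrig;   -- fires regardless of session_replication_role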

Thus, our previous problem of preventing the last_updated trigger from firing can be solved by careful use of the session_replication_role. We want the trigger to NOT fire when session_replication_role is set to 'local'. This can be accomplished in two ways: modification of the trigger, or modification of the underlying function. Each has its strengths and weaknesses. Note that session_replication_role can only be set by a superuser, which means I'll be switching from Heroku (which only allows connecting as a non-superuser) to a local Pagila database.

For the modify-the-function route, add a quick block at the top to short-circuit the trigger if the session_replication_role (srr) is set to 'local'. An advantage to this method is that all triggers that invoke this function will be affected. In the pagila database, there are 14 tables that have a trigger that calls the last_updated function. Another advantage is that the exception to the function firing is clearly visible in the function's definition itself, and thus easy to spot when you examine the function. Here is how you would modify the last_updated function so that it does nothing when in 'local' srr mode:

CREATE OR REPLACE FUNCTION public.last_updated()
RETURNS TRIGGER
LANGUAGE plpgsql
AS $bc$
BEGIN
  IF current_setting('session_replication_role') = 'local' THEN
    RETURN NEW;
  END IF;

  NEW.last_update = CURRENT_TIMESTAMP;
  RETURN NEW;
END
$bc$;

To invoke it, we change session_replication_role (temporarily!) to 'local', then make our changes. Observe how the value of last_update does not change when we are in 'local' mode:

pagila=# show session_replication_role \t\g
 origin

pagila=# begin;
BEGIN
pagila=# select last_update from film where film_id = 203;
 2015-06-21 16:38:00.711411

pagila=# update film set rental_duration = 10 WHERE film_id = 203;
UPDATE 1
pagila=# select last_update from film where film_id = 203;
 2015-06-21 16:38:03.543831
pagila=# commit;
COMMIT

pagila=# begin;
BEGIN
pagila=# set LOCAL session_replication_role = 'local';
SET
pagila=# select last_update from film where film_id = 203;
 2015-06-21 16:38:03.543831
pagila=# update film set rental_duration = 10 WHERE film_id = 203;
UPDATE 1
pagila=# select last_update from film where film_id = 203;
 2015-06-21 16:38:03.543831
pagila=# commit;
COMMIT

pagila=# show session_replication_role;
 origin

The second method for skipping a trigger by using session_replication_role is to modify the trigger definition itself, rather than changing the function. This has the advantage of not having to touch the function at all, and also allows you to see that the trigger has been modified when doing a \d of the table. Using ALTER TRIGGER only allows a rename, so we will need to drop and recreate the trigger. By adding a WHEN clause to the trigger, we can ensure that it does NOT fire when session_replication_role is set to 'local'. The SQL looks like this:

pagila=# begin;
BEGIN
pagila=# drop trigger last_updated ON film;
DROP TRIGGER
pagila=# create trigger last_updated before update on film for each row 
pagila-#   when (current_setting('session_replication_role') <> 'local') execute procedure last_updated();
CREATE TRIGGER
pagila=# commit;
COMMIT

Voila! As before, we can test it out by setting session_replication_role to 'local' and confirming that the function does not modify the last_update column. Before doing that, let's also change the function back to its original form, to keep things honest:

-- Restore the original version, with no session_replication_role logic:
pagila=# CREATE OR REPLACE FUNCTION public.last_updated() RETURNS TRIGGER LANGUAGE plpgsql
AS $bc$ BEGIN NEW.last_update = CURRENT_TIMESTAMP; RETURN NEW; END $bc$;
CREATE FUNCTION

-- Normal update will change the last_update column:
pagila=# select last_update from film where film_id = 203;
        last_update         
----------------------------
 2015-06-21 16:38:00.121011

pagila=# update film set rental_duration = 10 WHERE film_id = 203;
UPDATE 1
pagila=# select last_update from film where film_id = 203;
        last_update         
----------------------------
 2015-06-21 16:38:03.011004

pagila=# begin;
pagila=# set LOCAL session_replication_role = 'local';
SET
pagila=# update film set rental_duration = 10 WHERE film_id = 203;
UPDATE 1
pagila=# select last_update from film where film_id = 203;
        last_update         
----------------------------
 2015-06-21 16:38:03.011004

-- Show that we are not holding a heavy lock:
pagila=# select relation::regclass,mode,granted from pg_locks where relation::regclass::text = 'film';
 relation |       mode       | granted 
----------+------------------+---------
 film     | AccessShareLock  | t
 film     | RowExclusiveLock | t

pagila=# commit;
COMMIT

Those are the three main ways to selectively disable a trigger on a table: using ALTER TABLE to completely disable it (and invoking a heavy lock), having the function check session_replication_role (affects all triggers using it, requires superuser), and having the trigger use a WHEN clause (requires superuser). Sharp readers may note that being a superuser is not really required, as something other than session_replication_role could be used. Thus, a solution is to use a parameter that can be changed by anyone, that will not affect anything else, and can be set to a unique value. Here is one such solution, using the handy "application_name" parameter. We will return to the Heroku database for this one:

heroku=> drop trigger last_updated on film;
heroku=> create trigger last_updated before update on film for each row 
  when (current_setting('application_name') <> 'skiptrig') execute procedure last_updated();

heroku=> select last_update from film where film_id = 111;
 2015-06-21 16:38:00.365103
heroku=> update film set rental_duration = 10 WHERE film_id = 111;
UPDATE 1
heroku=> select last_update from film where film_id = 111;
 2015-06-21 16:38:03.101115

heroku=> begin;
BEGIN
heroku=> set LOCAL application_name = 'skiptrig';
SET
heroku=> update film set rental_duration = 10 WHERE film_id = 111;
UPDATE 1
heroku=> select last_update from film where film_id = 111;
 2015-06-21 16:38:03.101115

-- Show that we are not holding a heavy lock:
heroku=> select relation::regclass,mode,granted from pg_locks where database = 
heroku->   (select oid from pg_database where datname = current_database());
 relation |       mode       | granted 
----------+------------------+---------
 film     | AccessShareLock  | t
 film     | RowExclusiveLock | t

heroku=> commit;
COMMIT

So there you have it - four solutions to the problem of skipping a single trigger. Which to use depends on your circumstances. I prefer the WHEN + session_replication_role option, as it forces you to be a superuser, and is very visible when looking at the trigger via \d.

How fast is pg_upgrade anyway?

Back in the old days, upgrading Postgres required doing a pg_dump and loading the resulting logical SQL into the new database. This could be a very slow, very painful process, requiring a lot of downtime. While there were other solutions (such as Bucardo) that allowed little (or even zero) downtime, setting them up was a large complex task. Enter the pg_upgrade program, which attempts to upgrade a cluster with minimal downtime. Just how fast is it? I grew tired of answering this question from clients with vague answers such as "it depends" and "really, really fast" and decided to generate some data for ballpark answers.

Spoiler: it's either about 3.5 times as fast as pg_dump, or insanely fast at a flat 15 seconds or so. Before going further, let's discuss the methodology used.

I used the venerable pgbench program to generate some sample tables and data, and then upgraded the resulting database, going from Postgres version 9.3 to 9.4. The pgbench program comes with Postgres, and simply requires an --initialize argument to create the test tables. There is also a --scale argument you can provide to increase the amount of initial data - each increment increases the number of rows in the largest table, pgbench_accounts, by one hundred thousand rows. Here are the scale runs I did, along with the number of rows and overall database size for each level:

Effect of --scale

 --scale | Rows in pgbench_accounts | Database size
---------+--------------------------+--------------
     100 |               10,000,000 |       1418 MB
     150 |               15,000,000 |       2123 MB
     200 |               20,000,000 |       2829 MB
     250 |               25,000,000 |       3535 MB
     300 |               30,000,000 |       4241 MB
     350 |               35,000,000 |       4947 MB
     400 |               40,000,000 |       5652 MB
     450 |               45,000,000 |       6358 MB
     500 |               50,000,000 |       7064 MB
     550 |               55,000,000 |       7770 MB
     600 |               60,000,000 |       8476 MB
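
For reference, the test data for a given scale can be generated with something along these lines (the database name is a placeholder; the scale shown is the smallest one used above):

$ createdb pgbench_test
$ pgbench --initialize --scale 100 pgbench_test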

To test the speed of the pg_dump program, I used this simple command:

$ pg_dump postgres | psql postgres -q -p 5433 -f -

I did make one important optimization, which was to set fsync off on the target database (version 9.4). Although this setting should never be turned off in production (or any time you cannot replace all your data), upgrades like this are an excellent time to disable fsync. Just make sure you flip it back on again right away! There are some other minor optimizations one could make (especially boosting maintenance_work_mem), but for the purposes of this test, I decided that the fsync change was enough.
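
For example, turning fsync off on the new cluster can be as simple as this sketch, where $DATA2 is the new cluster's data directory as in the pg_upgrade command below (remember to revert the setting afterward):

# in the new cluster's postgresql.conf, temporarily:
fsync = off

$ pg_ctl -D $DATA2 reload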

For testing the speed of pg_upgrade, I used the following command:

$ pg_upgrade -b $BIN1 -B $BIN2 -d $DATA1 -D $DATA2 -P 5433

The speed difference can be understood because pg_dump rewrites the entire database, table by table, row by row, and then recreates all the indexes from scratch. The pg_upgrade program simply copies the data files, making the minimum changes needed to support the new version. Because of this, it will always be faster. How much faster depends on a lot of variables, e.g. the number and size of your indexes. The chart below shows a nice linear slope for both methods, yielding on average a 3.48x speed increase for pg_upgrade versus pg_dump:

pg_dump versus pg_upgrade

 --scale | Database size | pg_dump (seconds) | pg_upgrade (seconds) | Speedup
---------+---------------+-------------------+----------------------+--------
     100 |        1.4 GB |             210.0 |                 74.7 |    2.82
     150 |        2.1 GB |             305.0 |                 79.4 |    3.86
     200 |        2.8 GB |             457.6 |                122.2 |    3.75
     250 |        3.5 GB |             636.1 |                172.1 |    3.70
     300 |        4.2 GB |             832.2 |                215.1 |    3.87
     350 |        4.9 GB |            1098.8 |                320.7 |    3.43
     400 |        5.7 GB |            1172.7 |                361.4 |    3.25
     450 |        6.4 GB |            1340.2 |                426.7 |    3.15
     500 |        7.1 GB |            1509.6 |                476.3 |    3.17
     550 |        7.8 GB |            1664.0 |                480.0 |    3.47
     600 |        8.5 GB |            1927.0 |                  607 |    3.17

If you graph it out, you can see both of them having a similar slope, but with pg_upgrade as the clear winner:

I mentioned earlier that there were some other optimizations that could be done to make the pg_dump slightly faster. As it turns out, pg_upgrade can also be made faster. Absolutely, beautifully, insanely faster. All we have to do is add the --link argument. What this does is rather than copying the data files, it simply links them via the filesystem. Thus, each large data file that makes up the majority of a database's size takes a fraction of a second to link to the new version. Here are the new numbers, generated simply by adding a --link to the pg_upgrade command from above:

pg_upgrade --link is crazy fast

 --scale | Database size | pg_upgrade --link (seconds)
---------+---------------+----------------------------
     100 |        1.4 GB |                        12.9
     150 |        2.1 GB |                        13.4
     200 |        2.8 GB |                        13.5
     250 |        3.5 GB |                        13.2
     300 |        4.2 GB |                        13.6
     350 |        4.9 GB |                        14.4
     400 |        5.7 GB |                        13.1
     450 |        6.4 GB |                        13.0
     500 |        7.1 GB |                        13.2
     550 |        7.8 GB |                        13.1
     600 |        8.5 GB |                        12.9

No, those are not typos - an average of thirteen seconds despite the size of the database! The only downside to this method is that you cannot access the old system once the new system starts up, but that's a very small price to pay, as you can easily backup the old system first. There is no point in graphing these numbers out - just look at the graph above and imagine a nearly flat line traveling across the bottom of the graph :)

Are there any other options that can affect the time? While pgbench has a handy --foreign-keys argument I often use to generate a more "realistic" test database, both pg_dump and pg_upgrade are unaffected by any number of foreign keys. One limitation of pg_upgrade is that it cannot change the data checksum setting of a cluster. In other words, if you want to go from a non-checksummed cluster to a checksummed one, you need to use pg_dump or some other method. On the plus side, my testing found a negligible difference between upgrading a checksummed versus a non-checksummed cluster.

Another limitation of the pg_upgrade method is that all internal stats are blown away by the upgrade, so the database starts out in a completely unanalyzed state. This is not as much an issue as it used to be, as pg_upgrade will generate a script to regenerate these stats, using the handy --analyze-in-stages argument to vacuumdb (a usage sketch follows the chart below). There are a few other minor limitations to pg_upgrade: read the documentation for a complete list. In the end, pg_upgrade is extraordinarily fast and should be your preferred method for upgrading. Here is a final chart showing the strengths and weaknesses of the major upgrade methods.

Postgres upgrade methods compared
pg_dump
  Strengths:
  • Always works
  • Battle tested
  Weaknesses:
  • Slowest method
  • Maximum downtime
  • Requires lots of disk space

pg_upgrade
  Strengths:
  • Very fast
  • --link mode super fast
  Weaknesses:
  • Cannot always be used (finicky)
  • Stats are lost
  • Minimal but non-zero downtime

Bucardo
  Strengths:
  • Handles complex cases
  • Zero-downtime possible
  Weaknesses:
  • Complex to set up
  • Requires primary keys on large tables
  • Requires lots of disk space
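
For reference, regenerating the statistics after a pg_upgrade boils down to something like the following, which is essentially what the generated script runs (the port matches the new cluster used in these tests):

$ vacuumdb --all --analyze-in-stages -p 5433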

(As an addendum of sorts, pg_upgrade is fantastic, but the Holy Grail is still out of sight: true in-place upgrades. This would mean dropping in a new major version (similar to the way revisions can be dropped in now), and this new version would be able to read both old and new data file formats, and doing an update-on-write as needed. Someday!)