Welcome to End Point’s blog

Ongoing observations by End Point people

FOSDEM conference day 1

It's my first day of my first time at FOSDEM and I really loved it. There's simply no other way to put it.

I guess that the main difference with other conferences is the incredibly huge and diverse amount of topics and talks that you can attend in a short amount of time (two days).

While there's all this choice you won't be able to attend all of them, so you have to pick the one which interests you more. Being a SysAdmin/DevOps person I was more geared toward these kind of talks and I found plenty!

Most of the talks were revolving around "the cloud", how to manage it, best practices and other similar aspects.

Now even if you don't happen to be a huge cloud infrastructure fan there's always good content to squeeze out of these incredibly skilled people and how each of them tackled and overcame their own set of problems.

I was particularly impressed by a few talks, namely the one about OpenLISP by Luigi Iannone which explained with good detail and extremely clearly some parts of this new complex geographically location-independent routing protocol. Amazing and what a perfect time for such a nifty tool now that (maybe) finally IPv6 will start to be more widely used.

Then there was also the GlusterFS-centric talk from Niels de Vos which again was very helpful in having a good primer about this grat technology tool which I'll soon use in one of our internal experiments.

I also attended really good technical talks about Apache Mesos (Dave Lester), infrastructure management with Juju (Charles Butler), which were all good starting point to have an initial idea on these technologies or approach for people who could have never dealt with them before.

On the other end of the spectrum there also also some more abstract interesting ones which gave interesting ideas to start evaluating different high level approaches like the Consuming and contributing to open source infrastructure mangement code (Spencer Krum) one or the IAAS Beyond Infrastructure (Flavio Percoco).

All the talks I attended too were great and well organized, at least if you can bear with a huge flow of people moving around and squeezing you here and there because with all these good topics it's easy to imagine how many people were attracted here.

All in all a really great experience all around and it's not yet finished.

So now back to bed early as tomorrow there's another long list of interesting topics I'll try to chase during day 2.

A few PostgreSQL tricks

We ran into a couple of interesting situations recently, and used some helpful tricks to solve them, which of course should be recorded for posterity.

Unlogged tables

One of our customers needed a new database, created as a copy of an existing one but with some data obscured for privacy reasons. We could have done this with a view or probably any of several other techniques, but in this case, given the surrounding infrastructure, a new database, refreshed regularly from the original, was the simplest method. Except that with this new database, the regular binary backups became too large for the backup volume in the system. Since it seemed silly to re-provision the backup system (and talk the client into paying for it) to accommodate data we could throw away and recalculate at any time, we chose unlogged tables as an alternative.

"Unlogged," in this case, means changes to this table aren't written in WAL logs. This makes for better performance, but also means if the database crashes, these tables can't be recovered in the usual way. As a side effect, it also means these tables aren't copied via WAL-based replication, so the table won't show up in a hot standby system, for instance, nor will the table appear in a system restored from a WAL-based backup (pg_dump will still find them). Unlogged tables wouldn't give our application much of a performance boost in this case — the improved performance applies mostly to queries that modify the data, and ours were meant to be read-only. But before this change, the regular refresh process generated all kinds of WAL logs, and now they've all disappeared. The backups are therefore far smaller, and once again fit within the available space. Should the server crash, we'll have a little more work to do, regenerating these tables from their sources, but that's a scripted process and simple to execute.

Stubborn vacuum

Another fairly new customer has a database under a heavy and very consistent write load. We've had to make autovacuum very aggressive to keep up with bloat in several tables. When the vacuum process happens to clean all the tuples from the end of a table file, it tries to shrink the file and reclaim disk space, but it has to obtain a brief exclusive lock to do it. If it can't get one1, it gives up, and emits a log message you'll see if you are vacuuming in verbose mode:

INFO:  "some_big_table": stopping truncate due to conflicting lock request

Note that though the log message calls this process "truncating", it should not be confused with the "TRUNCATE TABLE" command, which (locks permitting) would reclaim quite a bit more disk space than we want it to. Anyway, when the shrinking operation succeeds, there is no log message, so if VACUUM VERBOSE doesn't say anything about "stopping truncate", it's because it was able to get its lock and shrink the table, or the table didn't need it in the first place. Because of this database's tendency to bloat, we'd like vacuum to be able to shrink tables regularly, but the query load is such that for some tables, it never gets the chance. We're working on mitigating that, but in the meantime, one stop-gap solution is to run VACUUM VERBOSE in a tight loop until you don't see one of those "stopping truncate" messages. In our case we do it like this:

for table in $problematic_tables; do
    echo "Vacuuming $table"
    ( timeout $timeout bash <<VACUUM
        while :; do
            vacuumdb -v -t $table $my_database 2>&1 | grep "stopping truncate" || break
) || echo "Timed out on table $table"

This script iterates through a list of tables we'd like to shrink, and vacuums each repeatedly, as quickly as possible, until the vacuum process fails to emit a "stopping truncate" message, or it finds it has spent eight minutes2 trying. Of course this whole technique is only useful in a few limited cases, but for our purposes we've found it helpful for managing bloat while we continue to work on the query patterns to reduce locking overall.

1. In version 9.0 and before, it simply tries to obtain the lock once. In 9.1 and later versions, it tries every few milliseconds for up to five seconds to obtain the lock.
2. There's nothing magic about eight minutes, it just works out well enough for our purposes.

Shrink XFS partition: Almost possible with LVM

If you happen to have reached this page because you're trying to shrink an XFS filesystem let's put things straight: sorry, that's not possible.

But before you go away you should know there's still hope and I'd like to show you how I used a little workaround to avoid reinstalling a Red Hat Enterprise Linux 7 or CentOS 7 VM using XFS dump/restore on-the-fly and LVM capabilities, both standard choices for the regular RHEL/CentOS 7 setup.

First of all let's clarify the situation I found myself in. For various reasons I had a CentOS 7 VM with everything already configured and working, installed not many days ago to test new software we're evaluating.

The VM itself is hosted on a dedicated server we manage on our own, so I had a certain degree of freedom to what I could do without the need of paying any additional fee. You may not be in this same situation, but you can probably try some similar solution for little money if you're using an "hourly-billed" VPS provider.

The problem was that, even if everything was working and configured, the virtual hard disk device attached to the machine was too big and on the wrong storage area of the virtualization hosting server.

There was also another minor glitch: The VM itself was using an old virtualization device driver (IDE-based) instead of the new VIRTIO one. Since I knew that the virtualized OS CentOS 7 is capable of using VIRTIO based devices I also took the chance to fix this.

Unfortunately, XFS is not capable of being shrunk at the moment (and for the foreseeable future) so what I needed to do was to:

  1. add a new virtual device correctly using VIRTIO and stored in the right storage area of the virtualization host server
  2. migrate all the filesystems to the new virtual device
  3. set the VM OS to be working from the newly-built partition
  4. dismiss the old virtual device

In my specific case this translated to connect to the virtualization hosting server, create a new LVM logical volume to host the virtual disk device for the VM and then add the new virtual device to the VM configuration. Unfortunately in order to have the VM see the new virtual device I had to shut it down.

While connected to the virtualization host server I also downloaded and attached to the VM the latest ISO of SysRescueCD which is a data rescue specialized Linux distribution. I'm specifically using this distro since it's one of the few which offers the XFS dump/restore tools on the live ISO.

Now the VM was ready to be booted with the SysRescueCD Live OS and then I could start working my way through all the needed fixes. If you're doing something similar, of course please make sure you have offsite backups and have double-checked that they're readable before doing anything else.

First of all inspect your dmesg output and find what is the source virtual device and what's the new target virtual device. In my case the source was /dev/sda and the target was /dev/vda

dmesg | less

Then create a partition on the new device (eg: /dev/vda1) as Linux type for the /boot partition; this should be of the same size as the source /boot partition (eg: /dev/sda1) and dedicate all the remaining space to a new LVM type partition (eg: /dev/vda2)

fdisk /dev/vda
# [create /boot and LVM partitions]

You could also mount and copy the /boot files or re-create them entirely if you need to change the /boot partition size. Since I kept /boot exactly the same so I could use ddrescue (a more verbose version of classic Unix dd).

ddrescue /dev/sda1 /dev/vda1

The next step is supposed to migrate the MBR and should be working but in my case the boot phase kept failing so I also needed to reinstall the bootloader via the CentOS 7 rescue system (not covered in this tutorial but briefly mentioned near the end of the article).

ddrescue -i0 -s512 /dev/sda /dev/vda

Then create the target LVM volumes.

pvcreate /dev/vda2
vgcreate fixed_centos_VG /dev/vda2
lvcreate -L 1G -n swap fixed_centos_VG
lvcreate -l 100%FREE -n root fixed_centos_VG
vgchange -a y fixed_centos_VG

Create the target XFS filesystem.

mkfs.xfs -L root /dev/fixed_centos_VG/root

And then create the swap partition.

mkfs.swap /dev/fixed_centos_VG/swap

Next create the needed mountpoints and mount the old source and the new empty filesystems.

mkdir /mnt/disk_{wrong,fixed}
mount /dev/fixed_centos_VG/root /mnt/disk_fixed
vgchange -a y wrong_centos_VG
mount /dev/centos_VG/root /mnt/disk_wrong

Now here's the real XFS magic. We'll use xfsdump and xfsrestore to copy the filesystem content (files, directory, special files) without having to care about files permission, type, extended ACLs or anything else. Plus since it's only moving the content of the filesystem we won't need to have a partition of the same size and it won't take as long as copying the entire block device as the process will just have to go through real used space.

xfs_dump -J - /mnt/disk_wrong | xfs_restore -J - /mnt/disk_fixed

If you want a more verbose output, leave out the -J option. After the process is done, be sure to carefully verify that everything is in place in the new partition.

ls -lhtra /mnt/disk_fixed/

Then unmount the disks and deactivate the LVM VGs.

umount /mnt/disk_{old,fixed}
vgchange -a n centos_VG
vgchange -a n fixed_centos_VG

At this point in order to avoid changing anything inside the virtualized OS (fstab, grub and so on), let's remove the old VG and rename the newer one with the same name the old one had.

vgremove centos_VG
pvremove /dev/sda2
vgrename {fixed_,}centos_VG

You should be now able to shutdown the VM again, detach the old disk and start the new VM which will be using the new smaller virtual device.

If the boot phase keeps failing, boot the CentOS installation media in rescue mode and after chroot-ing inside your installation run grub-install /dev/vda (targeting your new main device) to reinstall grub.

Only after everything is working as expected, proceed to remove the old unneeded device and remove it from the virtualization host server.

Social Innovation Summit 2014

In November, Josh Ausborne and I set up a Liquid Galaxy at the Sofitel Hotel in Redwood City for the 2014 Social Innovation Summit. Prior to the event the End Point content team worked together with Chris Busselle and Sasha Buscho from to create presentations featuring 10 grantee organizations.

With the Liquid Galaxy we were able to feature “Street View” panoramas of people enjoying the High Line in New York City, penguins standing on the shoreline for Penguin Foundation, and seals swimming underwater for Conservation International. The touchscreen and space navigator control device enabled users to view 360 degrees of the landscape as if they had been teleported to each location.

Displaying 2014-SIS-8x-114.jpg
image credit : Jensen Sutta

I was thrilled to see the team in front of the system sharing the larger narrative associated with each project. This highlights one of the many opportunities of the Liquid Galaxy - the opportunity to share, explore and collaborate in real time, in front of an immersive array of screens.

image credit : Chris Busselle

One of the prepared presentations highlighted a data collaboration with Polaris to fight human trafficking. With Google Earth, Chris Busselle identified a location in India that showed brick firing kilns. Chris navigated through this particular landscape in India and shared how these bricks are manufactured by enslaved workers. This was yet another revelation for me as I recognized how a satellite image overlayed in a virtual environment can be a touchpoint for a narrative which spans the globe.

As you might guess, it was a pleasure to work with the team ( we even got a hug at the end! ). The team’s passion for their work and the impact is having through technology is undeniable. I am excited to recognize this event as yet another proof in principle for how the Liquid Galaxy and Google Earth act as tools for awareness which can inspire positive change in the “real” world.

Displaying betterworld.png

Getting realtime output using Python Subprocess

The Problem

When I launch a long running unix process within a python script, it waits until the process is finished, and only then do I get the complete output of my program. This is annoying if I'm running a process that takes a while to finish. And I want to capture the output and display it in the nice manner with clear formatting.

Using the subprocess and shlex library

Python has a “batteries included” philosophy. I have used 2 standard libraries to solve this problem.
import subprocess 
import shlex 
  • subprocess - Works with additional processes
  • shlex - Lexical analysis of shell-style syntaxes


To run a process and read all of its output, set the stdout value to PIPE and call communicate().
import subprocess
process = subprocess.Popen(['echo', '"Hello stdout"'], stdout=subprocess.PIPE)
stdout = process.communicate()[0]
print 'STDOUT:{}'.format(stdout)
The above script will wait for the process to complete and then it will display the output. So now we are going to read the stdout line by line and display it in the console untill it completes the process.
output = process.stdout.readline()
This will read a line from the stdout.
The poll() method will return
  • the exit code if the process is completed.
  • None if the process is still running.
while True:
        output = process.stdout.readline()
        if output == '' and process.poll() is not None:
        if output:
            print output.strip()
    rc = process.poll()
The above will loop and keep on reading the stdout and check for the return code and displays the output in real time.
I had one more problem in parsing the shell commands to pass it to popen when I set the shell=False. Below is an example command:
rsync -avzXH --delete --exclude=*.swp --exclude=**/drivers.ini /media/lgisos/lg.iso root@42-a:/isodevice
To split the string using shell-like syntax I have used shlex library's split method.

Here is the final code looks like

def run_command(command):
    process = subprocess.Popen(shlex.split(command), stdout=subprocess.PIPE)
    while True:
        output = process.stdout.readline()
        if output == '' and process.poll() is not None:
        if output:
            print output.strip()
    rc = process.poll()
    return rc

Postgres session_replication role - Bucardo and Slony's powerful ally

One of the lesser known Postgres parameters is also one of the most powerful: session_replication_role. In a nutshell, it allows you to completely bypass all triggers and rules for a specified amount of time. This was invented to allow replication systems to bypass all foreign keys and user triggers, but also can be used to greatly speed up bulk loading and updating.

(Triggerfish picture by Shayne Thomas)

The problem with disabling triggers

Once upon a time, there were two replication systems, Slony and Bucardo, that both shared the same problem: triggers (and rules) on a "target" table could really mess things up. In general, when you are replicating table information, you only want to replicate the data itself, and avoid any side effects. In other words, you need to prevent any "post-processing" of the data, which is what rules and triggers may do. The disabling of those was done in a fairly standard, but very ugly method: updating the system catalogs for the tables in question to trick Postgres into thinking that there were no rules or triggers. Here's what such SQL looks like in the Bucardo source code:

$SQL = q{
    UPDATE pg_class
    SET    reltriggers = 0, relhasrules = false
    WHERE  (
$SQL .= join "OR\n"
    => map { "(oid = '$_->{safeschema}.$_->{safetable}'::regclass)" }
      grep { $_->{reltype} eq 'table' }
$SQL .= ')';

This had a number of bad side effects. First and foremost, updating the system catalogs is never a recommended step. While it is *possible*, it is certainly discouraged. Because access to the system catalogs do not follow strict MVCC rules, odd things can sometimes happen. Another problem is that editing the system catalogs causes locking issues, as well as bloat on the system tables themselves. Yet another problem is that it was tricky do get this right; even the format of the system catalogs change over time, so that your code would need to have alternate paths for disabling and enabling triggers depending on the version of Postgres in use. Finally, the size of the SQL statements needed grew with the number of tables to be replicated: in other words, you had to specifically disable and enable each table. All in all, quite a mess.

The solution to disabling triggers

The solution was to get away from editing the system catalogs altogether, and provide a cleaner way to temporarily disable all triggers and rules on tables. Jan Wieck, the inventor of Slony, wrote a new user parameter and named it "session_replication_role". As you can tell by the name, this is a session-level setting. In other words, only the current user will see the effects of setting this, and it will last as long as your session does (which is basically equivalent to as long as you are connected to the database). This setting applied to all tables, and can be used to instruct Postgres to not worry about triggers or rules at all. So the new code becomes:

$SQL = q{SET session_replication_role TO 'replica'};

Much cleaner, eh? (you may see session_replication_role abbreviated as s_r_r or simply srr, but Postgres itself needs it spelled out). You might have noticed that we are setting it to 'replica', and not 'on' and 'off'. The actual way this parameter works is to specify which types of triggers should be fired. Previous to this patch, triggers were all of one type, and the only characteristic they could have was "enabled" or "disabled". Now, a trigger can have one of four states: origin, always, replica, or disabled (stored in the 'tgenabled' field of the pg_trigger table as 'O', 'A', 'R', or 'D'). By default, all triggers that are created are of type 'origin'. This applies to the implicitly created system triggers used by foreign keys as well. Thus, when session_replication_role is set to replica, only triggers of the type 'replica' will fire - and not the foreign key enforcing ones. If you really need a user trigger to fire on a replica (aka target) table, you can adjust that trigger to be of type replica. Note that this trigger will *only* fire when session_replication_role is set to replica, and thus will be invisible in day to day use.

Once the replication is done, session_replication_role can be set back to the normal setting like so:

$SQL = q{SET session_replication_role TO 'origin'};

You can also set it to DEFAULT, which in theory could be different from origin as one can set the default session_replication_role to something other than origin inside of the postgresql.conf file. However, it is much cleaner to always specify the exact role you want; I have not come across a use case that required changing the default from origin.

This feature only exists in Postgres 8.3 or better. As such, Bucardo still contains the old system catalog manipulation code, as it supports versions older than 8.3, but it uses session_replication_role whenever possible. Slony always uses one or the other, as it made session_replication_role a backwards-incompatible change in its major version. Thus, to replicate versions of Postgres before 8.3, you need to use the older Slony 1.2

There are some good use cases other than a replication system for using this feature. The most common is simply bulk loading or bulk updating when you do not want the effects of the triggers, or simply do not want the performance hit. Remember that system triggers are disabled as well, so use this with care (this is one of the reasons you must be a superuser to change the session_replication_role parameter).

What if you are not a superuser and need to disable triggers and/or rules? You could create a wrapper function that runs as a superuser. The big downside to that is the all-or-nothing nature of session_replication_role. Once it is changed, it is changed for *everything*, so handing that power to a normal user could be dangerous. My colleague Mark Johnson came up with another great solution: a function that runs as the superuser, and does the old-style system catalog manipulations, but uses an ingenious foreign key trick to ensure that the matching "enable" function *must* be run. Great for fine-grained control of table triggers.

You might wonder about the other setting, "local". It's used mostly to have a third type of trigger, for times when you want "normal" triggers to fire, but want some way to differentiate from "origin" mode. Slony uses this setting when it does some of it DDL trickery, peruse the Slony documentation for more details.

Postgres will also show you what state a trigger is in when you are viewing a table using the "backslash-d" command inside of psql. Here are some examples. Remember that psql never shows "system-level" triggers, but they are there, as we shall see below. First, let's create two test tables linked by a foreign key, and a trigger with supporting function that raises a simple notice when fired:

greg=# create table foo (id int primary key);
greg=# create table bar(id int primary key, fooid int references foo(id));
greg=# insert into foo values (1),(2),(3);
greg=# insert into bar values (10,1), (11,2);

greg=# create function alertalert() returns trigger language plpgsql AS $$ BEGIN RAISE NOTICE 'cookie dough'; RETURN null; END $$;

greg=# create trigger mytrig after update on foo for each statement execute procedure alertalert();

Now that those are setup, let's see what psql shows us about each table:

greg=# \d foo
      Table ""
 Column |  Type   | Modifiers 
 id     | integer | not null
    "foo_pkey" PRIMARY KEY, btree (id)
Referenced by:
    TABLE "bar" CONSTRAINT "bar_fooid_fkey" FOREIGN KEY (fooid) REFERENCES foo(id)

greg=# \d bar
      Table ""
 Column |  Type   | Modifiers 
 id     | integer | not null
 fooid  | integer | 
    "bar_pkey" PRIMARY KEY, btree (id)
Foreign-key constraints:
    "bar_fooid_fkey" FOREIGN KEY (fooid) REFERENCES foo(id)

Everything looks good. Let's see the trigger in action:

greg=# update foo set id=id;
NOTICE:  cookie dough

Although the output of psql only shows a single trigger on the foo table, their are actually two others, created by the foreign key, which helps to enforce the foreign key relationship. We can see them by looking at the pg_trigger table:

greg=# select tgname, tgenabled, tgisinternal, tgconstraint from pg_trigger where tgrelid::regclass::text = 'foo';
            tgname            | tgenabled | tgisinternal | tgconstraint 
 RI_ConstraintTrigger_a_73776 | O         | t            |        45313
 RI_ConstraintTrigger_a_57179 | O         | t            |        45313
 mytrig                       | O         | f            |            0
(3 rows)

We can see that they are internal triggers (which prevents psql from showing them), and that they have an associated constraint. Let's make sure these triggers are doing their job by causing one of them to fire and complain that the underlying constraint is being violated:

## Try and fail to delete id 1, which is being referenced by the table bar:
greg=# delete from foo where id = 1;
ERROR:  update or delete on table "foo" violates foreign key constraint "bar_fooid_fkey" on table "bar"
DETAIL:  Key (id)=(1) is still referenced from table "bar".
## Check the name of the constraint referenced above by pg_trigger:
greg=# select conname, contype from pg_constraint where oid = 45313;
    conname     | contype 
 bar_fooid_fkey | f

Time to demonstrate the power and danger of the session_replication_role attribute. First let's set it to 'replica' and verify that all triggers fail to fire. We should be able to perform the "illegal" deletion we tried before, and an update should fail to raise any notice at all:

greg=# show session_replication_role;
greg=# set session_replication_role = 'replica';
greg=# delete from foo where id = 1;
greg=# update foo set id=id;
greg=# show session_replication_role;

Let's force our trigger to fire by setting it to replica:

greg=# alter table foo enable replica trigger mytrig;
greg=# \d foo
      Table ""
 Column |  Type   | Modifiers 
 id     | integer | not null
    "foo_pkey" PRIMARY KEY, btree (id)
Referenced by:
    TABLE "bar" CONSTRAINT "bar_fooid_fkey" FOREIGN KEY (fooid) REFERENCES foo(id)
Triggers firing on replica only:
greg=# set session_replication_role = 'replica';
greg=# update foo set id=id;
NOTICE:  cookie dough

So what is the consequence of the above DELETE command? The foreign key relationship is now a lie, as there are rows in bar that do not point to a row in foo!

greg=# select * from bar where not exists (select 1 from foo where fooid =;
 id | fooid 
  1 |     1
(1 row)

Ouch! This shows why session_replication_role is such a dangerous tool (indeed, this is the primary reason it is only allowed to be changed by superusers). If you find yourself reaching for this tool, don't. But if you really have to, double-check everything twice, and make sure you always change it back to 'origin' as soon as possible.

Elastic Beanstalk Whenever

I recently got the opportunity to pick up development on a Ruby on Rails application that was originally setup to run on AWS using their Elastic Beanstalk deployment tools. One of our first tasks was to move some notification hooks out of the normal workflow into scripts and schedule those batch scripts using cron.

Historically, I've had extremely good luck with Whenever. In my previous endeavors I've utilized Capistrano which Whenever merged with seamlessly. With how simple it was to integrate Whenever with Capistrano, I anticipated a similar experience dealing with Elastic Beanstalk. While the integration was not as seamless as Capistrano, I did manage to make it work.

My first stumbling block was finding documentation on how to do after or post hooks. I managed to find this forum post and this blog post which helped me out a lot. The important detail is that there is a "post" directory to go along with "pre" and "enact", but it's not present by default, so it can be easy to miss.

I used Marcin's delayed_job config as a base. The first thing I had to address was an apparent change in Elastic Beanstalk's configuration structure. Marcin's config has

  . /opt/elasticbeanstalk/support/envvars
but that file doesn't exist on the system I was working on. With a small amount of digging, I found:
  . /opt/elasticbeanstalk/containerfiles/envvars
in one of the other ebextensions. Inspecting that file showed a definition for and exportation of $EB_CONFIG_APP_CURRENT suggesting this is a similar file just stored in a different location now.

Another change that appears to have occurred since Marcin developed his config is that directories will be created automatically if they don't already exist when adding a file in the files section of the config. That allows us to remove the entire commands section to simplify things.

That left me with a config that looked like:

  mode: "000755"
  owner: root
  group: root
  content: |
    #! /usr/bin/env bash
    . /opt/elasticbeanstalk/containerfiles/envvars
    su -c "cd $EB_CONFIG_APP_CURRENT; bundle exec whenever --update-cron" - $EB_CONFIG_APP_USER

This command completed successfully but on staging the cron jobs failed to run. The reason for that was an environment mismatch. The runner entries inside the cron commands weren't receiving a RAILS_ENV or other type of environment directive so they were defaulting to production and failing when no database was found.

After some greping I was able to find a definition for RACK_ENV in:

Making use of it, I came up with this final version:
  mode: "000755"
  owner: root
  group: root
  content: |
    #! /usr/bin/env bash
    . /opt/elasticbeanstalk/containerfiles/envvars
    . /opt/elasticbeanstalk/containerfiles/envvars.d/sysenv
    su -c "cd $EB_CONFIG_APP_CURRENT; bundle exec whenever --update-cron --set='environment=$RACK_ENV'" - $EB_CONFIG_APP_USER

CentOS 7 on Hetzner server with more than 2 TB disk

We use a variety of hosting providers for ourselves and our clients, including Hetzner. They provide good servers for a great price, have decent support, and we've been happy with them for our needs.

Recently I was given the task of building out a new development server for one of our clients, and we wanted it to be set up identically to another one of their servers but with CentOS 7. I placed the order for the hardware with Hetzner and then began the procedure for installing the OS.

Hetzner provides a scripted install process that you can kick off after booting the machine into rescue mode. I followed this process and selected CentOS 7 and proceeded through the whole process without a problem. After rebooting the server and logging in to verify everything, I noticed that the disk space was capped at 2 TB, even though the machine had two 3 TB drives in it (in hardware RAID 1). I looked at the partitions and found the partition table was "msdos". Ah ha!

At this point painful memories of running into this problem before hit me. I reviewed our notes of what we had done last time, and felt like it was worth a shot even though this time I'm dealing with CentOS 7. I went through the steps up to patching anaconda and then found that anaconda for CentOS 7 is newer and the files are different. I couldn't find any files that care about the partition table type, so I didn't patch anything.

I then tried to run the CentOS 7 install as-is. This only got me so far because I then ran into trouble with NetworkManager timing out and not starting.

screen shot of CentOS 7 installer failing
A screenshot of the CentOS 7 installer failing (anaconda) similar to what I was seeing.

Baffled, I looked into what may have been causing the trouble and discovered that the network was not set up at all and it looked as if no network interfaces existed. WHAT?? At this point I dug through dmesg and found that the network interfaces did indeed exist but udevd had renamed them. Ugh!

Many new Linux distributions are naming network interfaces based on their physical connection to the system: those embedded on the motherboard get named em1, em2, etc. Apparently I missed the memo on this one, as I was still expecting eth0, eth1, etc. And from all indications, so was NetworkManager because it could not find the network interfaces!

Rather than spend more time going down this route, I decided to change gears and look to see if there was any way to patch the Hetzner install scripts to use a GPT partition table with my install instead of msdos. I found and read through the source code for their scripts and soon stumbled on something that just might solve my problem. In the file /root/.oldroot/nfs/install/ I found mention of a config variable FORCE_GPT. If this is set to "1" then it will try to use a GPT partition table unless it thinks the OS won't like it, and it thinks that CentOS won't like it (no matter the version). But if you set FORCE_GPT to "2" it will use a GPT partition table no matter what. This config setting just needs to be added to the file you edit where you list out your partitions and LVM volumes.

FORCE_GPT 2                                                                                                      

PART /boot ext3 512M                                                                                             
PART lvm   vg0  all                                                                                              
LV  vg0  swap swap   swap  32G                                                                                   
LV  vg0  root  /     ext4 100G                                                                                   
LV  vg0  home  /home ext4 400G                                                                                   

I then ran the installer script and added the secret config option and... Bingo! It worked perfectly! No need to manually patch anything or install manually. And now we have a CentOS 7 server with full 3 TB of disk space usable.

(parted) print                                                            
Model: DELL PERC H710 (scsi)
Disk /dev/sda: 3000GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags: pmbr_boot

Number  Start   End     Size    File system  Name  Flags
 3      1049kB  2097kB  1049kB                     bios_grub
 1      2097kB  539MB   537MB   ext3
 2      539MB   3000GB  2999GB                     lvm

PGConf.US NYC: "Choosing a Logical Replication System"

I'm excited to have my talk "Choosing a Logical Replication System" accepted to PGConf.US! I'll be speaking on Friday, March 27th from 2:00 - 2:50, as part of the Strategy track.

In this talk I will cover a variety of existing Logical Replication systems for PostgreSQL and go over some of the differences between requirements, supported capabilities, and why you might choose one system over another. I'll also cover some of the changes in PostgreSQL 9.4.

Read about the talk here.

SSH one-time passwords (otpw) on chromebook

A little while ago, I bought a Chromebook as an alternative to my sturdy-but-heavy laptop. So far, it has been great - quick boot up, no fan, long battery life, and light as a feather. Perfect for bringing from room to room, and for getting some work done in a darkened bedroom at night. The one large drawback was a lack of SSH, a tool I use very often. I'll describe how I used one-time passwords to overcome this problem, and made my Chromebook a much more productive tool.

The options for using SSH on Chrome OS are not that good. I downloaded and tried a handful of apps, but each had some significant problems. One flaw shared across all of them was a lack of something like ssh-agent, which will cache your SSH passphrase so that you don't have to type it every time you open a new SSH session. An option was to use a password-less key, or a very short passphrase, but I did not want to make everything less secure. The storage of the SSH private key was an issue as well - the Chromebook has very limited storage options, and relies on putting most things "in the cloud".

What was needed was a way to use SSH in a very insecure environment, while providing as much security as possible. Eureka! A one-time password system is exactly what I needed. Specifically, the wonderful otpw program. Chromebooks have a simple shell (accessed via ctrl-alt-t) that has SSH support. So the solution was to use one-time passwords and not store anything at all on the Chromebook.

Rather than trying to get otpw setup on all the servers I might need to reach, I simply set it up on my main laptop, carefully allowed incoming SSH connections, and now I can ssh from my Chromebook to my laptop. From there, to the world. Best of all, when I ssh in, I can use the already running ssh-agent on the laptop! All it takes is memorizing a single passphrase and securing a sheet of paper (which is far easier to secure than an entire Chromebook :)

Here are some details on how I set things up. On the Chromebook, nothing is needed except to open up a crosh tab with ctrl-alt-t, and run ssh. On the laptop side, the first step is to install the otpw program, and then configure PAM so that it uses it:

$ sudo aptitude install otpw-bin
$ sudo cat >> /etc/pam.d/ssh
  auth     required
  session  optional

That is the bare minimum, but I also wanted to make sure that only 'local' machines could SSH in. While there are a number of ways to do this, such as iptables or /etc/hosts.allow, I decided the best approach was to configure sshd itself. The "Match" directive instructs that the lines after it only take effect on a positive match. Thus:

$ sudo cat >> /etc/ssh/sshd_config
AllowUsers nobodyatall
Match Address,
AllowUsers greg
$ service ssh restart

The next step is to create the one-time password list. This is done with the otwp-gen program; here is the command I use:

$ otpw-gen -e 30 | lpr
Generating random seed ...

If your paper password list is stolen, the thief should not gain access to your account with this information alone. Therefore, you need to memorize and enter below a prefix password. You will have to enter that each time directly before entering the one-time password (on the same line).

When you log in, a 3-digit password number will be displayed.  It identifies the one-time password on your list that you have to append to the prefix password. If another login to your account is in progress at the same time, several password numbers may be shown and all corresponding passwords have to be appended after the prefix password. Best generate a new password list when you have used up half of the old one.

Enter new prefix password: 
Reenter prefix password: 

Creating '~/.otpw'.
Generating new one-time passwords ...

The otpw-gen command creates a file named .otpw in your home directory, which contains the hash of all the one-time passwords to use. In the example above, the -e controls the entropy of the generated passwords - in other words, how long they are. otpw-gen will not accept an entropy lower than 30, which will generate passwords that are five characters long. The default entropy, 48, generates passwords that are eight characters long, which I found a little too long to remember when trying to read from the printout in a dark room. :). Rather than show the list of passwords on the screen, or save them to a local file, the output goes directly to the printer. otpw-gen does a great job of formatting the page, and it ends up looking like this:

Here are some close-ups of what the passwords look like at various entropies:

Sample output with a low entropy of 30:
OTPW list generated 2015-07-12 13:23 on gregsbox

000 GGS%F  056 bTqut  112 f8iJs  168 lQVjk  224 gNG2x  280 -x8ke  336 egm5n
001 urHLf  057 a/Wwh  113 -PEpV  169 9ABpK  225 -K2db  281 babfX  337 feeED
002 vqrX:  058 rZszx  114 r3m8a  170 -UzX3  226 g74RI  282 gusBJ  338 ;Tr4m
003 fa%6G  059 -i4FZ  115 nPEaJ  171 o64FR  227 uBu:h  283 uBo/U  339 ;pYY8
004 -LYZY  060 vWDnw  116 f5Sb+  172 hopr+  228 rWXvb  284 rksPQ  340 ;v6GN
Sample output with the default entropy of 48:
OTPW list generated 2015-15-05 15:53 on gregsbox

000 tcsx qqlb  056 ougp yuzo  112 lxwt oitl  168 giap vqsj  224 vtvk rjc/
001 mfui ukph  057 wbpw aktt  113 kert wozj  169 ihed psyx  225 ducx pze=
002 wwsj hdcr  058 jmwa mguo  114 idtk zrzw  170 ecow fepm  226 ikru hty+
003 aoeb klnz  059 pvie fbfc  115 fmlb sptb  171 ftrd jotb  227 mqns ivq:
004 yclw hyml  060 slvj ezfi  116 djsy ycse  172 butg guzm  228 pfyv ytq%
005 eilj cufp  061 zlma yxxl  117 skyf ieht  173 vbtd rmsy  229 pzyn zlc/
Sample output with a high entropy of 79:
OTPW list generated 2015-07-05 18:74 on gregsbox

000 jeo SqM bQ9Y ato  056 AyT jsc YbU0 rXB  112 Og/ I3O 39nY W/Z
001 AFk W+5 J+2m e1J  057 MXy O9j FjA8 8q;  113 a6A 8R9 /Ofr E4s
002 02+ XPB 8B2S +qT  058 Cl4 6g2 /9Bk KO=  114 HEK vd3 T2TT Rr.
003 Exb jqE iK49 rfX  059 Qhz eU+ J2VG kwQ  115 aJ7 tg1 dJsr vf.
004 Bg1 b;5 p0qI f/m  060 VKz dpa G7;e 7jR  116 kaL OSw dC8e kx.

The final step is to SSH from the Chromebook to the laptop! Hit ctrl-alt-t, and you will get a new tab with a crosh prompt. From there, attempt to ssh to the laptop, and you will see the usual otpw prompt:

$ ssh greg@
Password 140: 

So you type in the passphrase you entered above when running the otpw-gen command, then pull out your sheet of paper and look up the matching password next to number 140. Voila! I am now connected securely to my more powerful computer, and can SSH from there to anywhere I am used to going to from my laptop. I can even run mutt as if I were at the laptop! A nice workaround for the limitations of the Chromebook.

Cleaner redirection in Perl Dancer

Recently I worked on a project using the Perl web application framework Dancer that had multiple paths to order a product:

 /product => /cart => /checkout => /receipt

That's the standard approach. Then there was a "phone order" approach:

 /create_order => /checkout => /receipt

A "phone order" is one taken down (usually by phone), where the user who is logged in is not the same as the user who "owns" the order. Thus, one user is ordering on behalf of another: the order must be recorded as part of the second user's order history, the various shipping and billing information must come from that user's stored information, and even the product pricing has to be calculated as though that customer were doing the ordering rather than the logged-in user.

As a consequence, the phone order page flow actually ended up as:

 get /create_order => post /create_order => /checkout

The submission of the /create_order page was processed in an environment that knew about this "proxy" ordering arrangement, thus could do some particularly special-case processing, and then the idea was to pass off to the /checkout page, which would finalize the order including payment information.

All well and good, but when it came time to implement this, I was faced with a minor inconvenience and a bad choice:

Since /checkout was itself a POSTed page, I needed to reach that page with a set of form parameters in hand. So my original plan was:

 post '/create_order' => sub {
   ... # do my special-case processing, and then:
   forward '/checkout', { param1 => $value1, ... };

While this works, the problem is that "forward" as a Dancer directive doesn't interact with the browser: it just interrupts your path handling of "/create_order" and resumes at "/checkout". So the browser, innocent of these shenanigans, remains on "/create_order". It would be so much cleaner (darn my OCD!) if the browser ended up at "/checkout".

That means you need to redirect the request, though. I.e.,

 post '/create_order' => sub {
   ... # do my special-case processing, and then:
   redirect '/checkout';  # hmm, something's missing here

Hmm, redirect doesn't support a parameter hash. Oh, well, no problem:

   redirect url_for('/checkout', { param1 => $value1, ... });

That gets the job done, but at a price: now instead of a nice, clean URL at my final destination, I get:


So, still not right. Some research and mailing-list inquiries let me to:

Why doesn't HTTP have POST redirect?

Short version: you can't get there from here. Redirection is supposed to be "idempotent", meaning you can repeat them without harm. That's why when you refresh the page after a form submission, browsers will ask for permission to re-submit the form rather than just silently refreshing the page.

So what's the option? Well, I can think of two approaches here:

One: instead of redirecting with parameters, store the parameters in the session:

 post '/create_order' => sub {
   ... # do my special-case processing
   session 'create_order_for_checkout' => { param1 => $value1, ... };
   redirect '/checkout';
 post '/checkout' => sub {
   my $params = (session 'create_order_for_checkout')
     || params();

Two: do away with the post handler for '/create_order' altogether, and move the processing inside the post handler for '/checkout'. The merits of that depend on how complex the /create_order handler is.

I'm leaning toward the first approach, definitely.

Spree Commerce invalid value for Integer(): \"09\"" in Spree::Checkout/update

Hello again all. I like to monitor the orders and exceptions of the Spree sites I work on to ensure everything is working as intended. One morning I noticed an unusual error: "invalid value for Integer(): \"09\"" in Spree::Checkout/update on a Spree 2.1.x site.

The Issue

Given that this is a Spree-powered e-commerce site, a customer's inability to checkout is quite alarming. In the backtrace I could see that a string of "09" was causing an invalid value for an integer. Why hadn't I seen this on every order in that case?

I went into the browser and completed some test orders. The bug seemed to affect only credit cards with a leading "0" in the expiration month, and then only certain expiration months. I returned to the backtrace and saw this error was occurring with Active Merchant. So, Spree was passing Active Merchant a string while Active Merchant was expecting an integer.

Armed with a clearer understanding of the problem, I did some Googling. I came across this post. This post describes the source of this issue as being the behavior of sprintf which I will describe below. This topic was discussed in the Ruby Forum.

Octal Numbers

As per Daniel Martin on the aforementioned post:

  • sprintf("%d",'08') ==> ArgumentError
  • sprintf("%d",'8') ==> "8"
  • sprintf("%d",'08'.to_i) ==> "8"
  • sprintf("%f",'08') ==> "8.000000"

As you can see, sprintf cannot convert '08' or '09' to a decimal. Matthias Reitlinger notes,

"%d tells sprintf to expect an Integer as the corresponding argument. Being given a String instead it tries to convert it by calling Kernel#Integer"

In the same post, we can review some documentation of Kernel#Integer

We can see here that if the argument being provided is a string (and it is since that is what Spree is sending), the "0" will be honored. Again, we know

sprintf("%d",'01') => "1" | sprintf("%d", 01) => "1"
sprintf("%d",'02') => "2" | sprintf("%d", 02) => "2"
sprintf("%d",'03') => "3" | sprintf("%d", 03) => "3"
sprintf("%d",'04') => "4" | sprintf("%d", 04) => "4"
sprintf("%d",'05') => "5" | sprintf("%d", 05) => "5"
sprintf("%d",'06') => "6" | sprintf("%d", 06) => "6"
sprintf("%d",'07') => "7" | sprintf("%d", 07) => "7"
sprintf("%d",'08') => error | sprintf("%d", 08) => error
sprintf("%d",'09') => error | sprintf("%d", 09) => error

By pre-prepending the "0" to the numbers, they are being marked as 'octal'. Wikipedia defines octal numbers as

"The octal numeral system, or oct for short, is the base-8 number system, and uses the digits 0 to 7. Octal numerals can be made from binary numerals by grouping consecutive binary digits into groups of three (starting from the right)."

So, 08 and 09 are not octal numbers.


This is why this checkout error did not occur on every order whose payment expiration month had a leading "0", only August (08) and September (09) were susceptible as the leading '0' indicates we are passing in an octal of which 08 and 09 are not valid examples of. So, I made Spree send integers (sprintf("%d",8) #=> "8" and sprintf("%d",9) #=> "9") so that the leading "0" would not get sent (thereby not trying to pass them as octals). I created a app/models/spree/credit_card_decorator.rb file with the contents

Spree::CreditCard.class_eval do
  def expiry=(expiry)
    if expiry.present?
      self[:month], self[:year] = expiry.delete(' ').split('/')
      self[:year] = "20" + self[:year] if self[:year].length == 2
      self[:year] = self[:year].to_i
      self[:month] = self[:month].to_i

After adding this, I tested it in the browser and there were no more checkout errors! I hope you've found this interesting and helpful, thanks for reading!

Angular Responsive Layout Directive

To all of you window.onResize aficionados, I dedicate this blog post because today we will be doing a lot of dynamic resizing in JavaScript. All of it will be done completely and effortlessly with my one-page long Angular directive.

Why do I need to attach an expensive onResize handler to my already overloaded page, you ask. The answer is very simple. Our app layout is pixel-perfect. Each element has the predefined width and margins. Yet, the app needs to look good on all kind of devices, from regular PC to tablet to iPhone. That's why I created the following Angular directive in /scripts/directives/tsResize.js:

.directive('tsResize', function($window) {
 return function(scope, element) {
   var w = angular.element($window);
   scope.getWindowDimensions = function () {
     return {
       'h': $window.innerHeight,
       'w': $window.innerWidth
              function (newValue, oldValue) {
     scope.windowHeight = newValue.h;
     scope.windowWidth = newValue.w;

     scope.mainContainerStyle = function () {
       if (newValue.w > 890) {
         return {};
       } else {
         val = newValue.w/890;
         return {
           '-webkit-transform': 'scale(' + val + ')',
           '-o-transform': 'scale(' + val + ')',
           '-ms-transform': 'scale(' + val + ')',
           'transform': 'scale(' + val + ')',
           'transform-origin': 'left -10px',
           '-webkit-transform-origin': 'left -10px'               
     scope.topBarStyle = function () {
       if (newValue.w > 890) { 
         return {};
       } else { 
         val = newValue.w/890;
         return {
           '-webkit-transform': 'scale(' + val + ')',
           '-o-transform': 'scale(' + val + ')',
           '-ms-transform': 'scale(' + val + ')',
           'transform': 'scale(' + val + ')',
           'transform-origin': '0 2px 0',
           '-webkit-transform-origin': '0 2px 0'  
    }, true);

   w.bind('resize', function () {

As you can see all the magic is done with transform:scale CSS attribute on the two of my main page components: the navigation and the contents container.

They styles are cross-browser.

return {
  '-webkit-transform': 'scale(' + val + ')',
  '-o-transform': 'scale(' + val + ')',
  '-ms-transform': 'scale(' + val + ')',
  'transform': 'scale(' + val + ')'             

It's important to set transform-origin, or the elements will be weirdly positioned on the page.

return {
  'transform-origin': '0 top',
  '-webkit-transform-origin': '0 top'                

The style calculations are attached to the changes of window dimensions.

scope.getWindowDimensions = function () {
  return {
    'h': $window.innerHeight,
    'w': $window.innerWidth
             function (newValue, oldValue) {

Few other things. My layout was sliced to the fixed width of 890px, that's why I took 890 as the pivotal point of my scale ratio formula. You should take the default width of the layout as the base of your calculation.

if (newValue.w > 890) {
  return {};
} else {
  val = newValue.w/890;
  return {
    '-webkit-transform': 'scale(' + val + ')',

With the directive in place it's time to plug it in:


Be sure to use style "display:block" or "display:inline-block" and "position:relative" for all the inside components of the scaled elements with the default display. Otherwise they do not obey the scaling enforcement and grow way too long prompting a scrollbar.

It all worked nicely and I was able to enjoy the smoothly resizing layout.

Web Development, Big Data and DevOps - OSI Days 2014, India

This is the second part of an article about the conference Open Source India, 2014 was held at Bengaluru, India. The first part is available here. The second day of the conference started with the same excitement level. I plan to attend talks covering Web, Big Data, Logs monitoring and Docker.

Web Personalisation

Jacob Singh started the first talk session with a wonderful presentation along with real-world cases which explained the importance of personalisation in the web. It extended to content personalisation for users and A/B testing (comparing two versions of a webpage to see which one performs better). The demo used the Acquia Lift personalisation module for the Drupal CMS which is developed by his team.

MEAN Stack

Sateesh Kavuri of Yodlee spoke about the MEAN stack which is a web development stack equivalent to popular LAMP stack. MEAN provides a flexible compatibility to web and mobile applications. He explained the architecture of MEAN stack.

He also provided an overview of each component involved in MEAN Stack.

MongoDB - NoSQL database with dynamic schema, in-built aggregation, mapreduce, JSON style document, auto-sharding, extensive query mechanism and high availability.

ExpressJS - A node.js framework to provide features to web and mobile applications.

AngularJS - seamless bi-directional model with extensive features like services and directives.

Node.js - A server side javascript framework with event based programming and single threaded (non blocking I/O with help of request queue).

Sails.js - MEAN Stack provisioner to develop applications quickly.

Finally he demonstrated a MEAN Stack demo application provisioned with help of Sails.js.

Moving fast with high performance Hack and PHP

Dushyant Min spoke about the way Facebook optimised the PHP code base to deliver better performance when they supposed to handle a massive growth of users. Earlier there were compilers HipHop for PHP(HPHPc) or HPHPi(developer mode) to convert the php code to C++ binary and executed to provide the response. After sometime, Facebook developed a new compilation engine called HipHop Virtual Machine(HHVM) which uses Just-In-Time(JIT) compilation approach and converts the code to HipHop ByteCode(HHBC). Both Facebook’s production and development environment code runs over HHVM.
Facebook also created a new language called Hack which is very similar to PHP which added static typing and many other new features. The main reason for Hack is to get the fastest development cycle to add new features and release frequent versions. Hack also uses the HHVM engine.

HHVM engine supports both PHP and Hack, also it provides better performance compare to Zend engine. So Zend Engine can be replaced with HHVM without any issues in the existing PHP applications to get much better performance. It is simple as below:

Also PHP code can be migrated to Hack by changing the <?php tag to <?hh and there are some converters (hackficator) available for code migration. Both PHP and Hack provide almost the same performance on the HHVM engine, but Hack has some additional developer-focussed features.

Application Monitoring and Log Management

Abhishek Dwivedi spoke about a stack to process the logs with various formats, myriad timestamp and no context. He explains a stack of tools to process the logs, store and visualize in a elegant way.

ELK Stack = Elasticsearch, LogStash, Kibana. The architecture and the data flow of ELK stack is stated below:

Elasticsearch - Open source full text search and analytics engine

Log Stash - Open source tool for managing events and logs which has following steps to process the logs

Kibana - seamlessly works with Elasticsearch and provides elegant user interface with various types of graphs

Apache Spark

Prajod and Namitha presented the overview of Apache Spark which is a real time data processing system. It can work on top of Hadoop Distributed FileSystem(HDFS). Apache Spark performs 100x faster in memory and 10x faster in disk compare to Hadoop. It fits with Streaming and Interactive scale of Big Data processing.

Apache Spark has certain features in processing the data to deliver the promising performance:

  • Multistep Directed Acyclic Graph
  • Cached Intermediate Data
  • Resilient Distributed Data
  • Spark Streaming - Adjust batch time to get the near real time data process
  • Implementation of Lambda architecture
  • Graphx and Mlib libraries play an important role

Online Data Processing in Twitter

Lohit Vijayarenu from Twitter spoke about the technologies used at Twitter and their contributions to Open Source. Also he explained the higher level architecture and technologies used in the Twitter microblogging social media platform.

The Twitter front end is the main data input for the system. The Facebook-developed Scribe log servers gather the data from the Twitter front end application and transfers the data to both batch and real time Big Data processing systems. Storm is a real time data processing system which takes care of the happening events at the site. Hadoop is a batch processing system which runs over historical data and generates result data to perform analysis. Several high level abstraction tools like PIG are used write the MR jobs. Along with these frameworks and tools at the high level architecture, there are plenty of Open Source tools used in Twitter. Lohit also strongly mentioned that in addition to using Open Source tools, Twitter contributes back to Open Source.


Neependra Khare from Red Hat given a talk and demo on Docker which was very interactive session. The gist of Docker is to build, ship and run any application anywhere. It provides good performance and resource utilization compared to the traditional VM model. It uses the Linux core feature called containerization. The container storage is ephemeral, so the important data can be stored in persistent external storage volumes. Slides can be found here.

DBD::Pg escaping placeholders with backslashes

The popularity of using JSON and JSONB within Postgres has forced a solution to the problem of question mark overload. JSON (as well as hstore) uses the question mark as an operator in its queries, and Perl DBI (esp. DBD::Pg) uses the question mark to indicate a placeholder. Version 3.5.0 of DBD::Pg has solved this by allowing the use of a backslash character before the question mark, to indicate it is NOT a placeholder. We will see some code samples after establishing a little background.

First, what are placeholders? They are special characters within a SQL statement that allow you to defer adding actual values until a later time. This has a number of advantages. First, it completely removes the need to worry about quoting your values. Second, it allows efficient re-use of queries. Third, it reduces network traffic as you do not need to send the entire query each time it is re-run. Fourth, it can allow for seamless translation of data types from Postgres to your client language and back again (for example, DBD::Pg translates easily between Perl arrays and Postgres arrays). There are three styles of placeholders supported by DBD::Pg - question marks, dollar-signs, and colon-names.

Next, what are Postgres operators? They are special symbols withing a SQL statement that perform some action using as inputs the strings to the left and right side of them. It sounds more complicated than it is. Take this query:

SELECT count(*) FROM pg_class WHERE relpages > 24;

In this case, the operator is ">" - the greater than sign. It compares the things on its left (in this case, the value of the relpages column) with the things on its right (in this case, the number 24). The operator will return true or false - in this case, it will return true only if the value on its left is larger than the value on its right. Postgres is extremely extensible, which means it is easy to add all types of new things to it. Adding your own operator is fairly easy. Here's an example that duplicates the greater-than operator, but with a ? symbol:

CREATE OPERATOR ? (procedure=int4gt, leftarg=integer, rightarg=integer);

Now the operator is ready to go. You should be able to run queries like this:

SELECT count(*) FROM pg_class WHERE relpages ? 24;

The list of characters that can make up an operator is fairly small. The documentation has the detailed rules, but the basic list is + - * / < > = ~ ! @ # % ^ & | ` ?. Note that an operator can consist of more than one character, for example, >=

A question mark inside a SQL query can be both a placeholder and an operator, and the driver has no real way to figure out which is which. The first real use of a question mark as an operator was with the geometric operators and then with the hstore module, which allows storing and querying of key/value pairs. It uses a lone question mark to determine if a given value appears as a key in a hstore column. For example, if the goal is to find all rows in which an hstore column contains the value foobar, the SQL would be:

SELECT * FROM mytable WHERE myhstorecol ? 'foobar';

However, if you were to try this via a Perl script using the question-mark placeholder style, DBD::Pg would get confused (and rightly so):

$sth = $dbh->prepare('SELECT * FROM mytable WHERE myhstorecol ? ?');
DBD::Pg::st execute failed: called with 1 bind variables when 2 are needed

Trying to use another placeholder style still does not work, as DBD::Pg still picks it up as a possible placeholder

$sth = $dbh->prepare('SELECT * FROM mytable WHERE myhstorecol ? $1');
Cannot mix placeholder styles "?" and "$1"

A few years ago, a solution was developed: by setting the database handle attribute "pg_placeholder_dollaronly" to true, DBD::Pg will ignore the question mark and only treat dollar-sign numbers as placeholders:

$dbh->{pg_placeholder_dollaronly} = 1;
$sth = $dbh->prepare('SELECT * FROM mytable WHERE myhstorecol ? $1');
## No error!

Then came JSON and JSONB. Just like hstore, they have three operators with question marks in them: ?, ?& and ?| - all of which will prevent the use of question-mark placeholders. However, some frameworks and supporting modules (e.g. SQL::Abstract and DBIx::Class) only support the question mark style of placeholder! Hence, another solution was needed. After some discussion on the dbi-users list, it was agreed that a backslash before a placeholder character would allow that character to be "escaped" and sent as-is to the database (minus the backslash). Thus, as of version 3.5.0 of DBD::Pg, the above query can be written as:

use DBD::Pg 3.5.0;
$SQL = "SELECT * FROM mytable WHERE hstorecol \\? ?");
$sth = $dbh->prepare($SQL);
# No error!
$SQL = "SELECT * FROM mytable2 WHERE jsoncol \\? ?");
$sth = $dbh->prepare($SQL);
# Still no error!

So, a fairly elegant solution. The only caveat is to beware of single and double quotes. The latter require two backslashes, of course. I recommend you always use double quotes and get in the habit of consistently using double backslashes. Not only will you thus never have to worry about single-vs-double, but it adds a nice little visual garnish to help that important backslash trick stand out a little more.

Much thanks to Tim Bunce for reporting this issue, herding it through dbi-users, and helping write the final DBD::Pg solution and code!

Riding the Elasticsearch River on a CouchDB: Part 1

As you may guessed from my perfect tan and rugged good looks, I am Phunk, your river guide. In this multi-part series, I will guide us through an exploration of Elasticsearch, its CouchDB/BigCouch River plugin, its source, the CouchDB document store, and the surrounding flora and fauna that are the Ruby on Rails based tools I created to help the DPLA project manage this ecosystem.

Before we get our feet wet, let's go through a quick safety briefing to discuss the terms I'll be using as your guide on this trip. Elasticsearch: A schema-less, JSON-based, distributed RESTful search engine. The River: An Elasticsearch plugin that automatically indexes changes in your upstream (heh) document store, in real-time. CouchDB: The fault-tolerant, distributed NoSQL database / document store. DPLA: The Digital Public Library of America open source project for which all this work was done.

Let's put on our flotation devices, don our metaphor helmets and cast off.

In an Elasticsearch + River + CouchDB architecture, all things flow from the CouchDB. For the DPLA project, we wanted to manage (create, update and delete) documents in our CouchDB document repository and have those changes automagically reflected in our Elasticsearch index. Luckily, CouchDB publishes a real-time stream (heh) of updates to its documents via its cleverly named "_changes" feed. Each change in that feed is published as a stand-alone JSON document. We'll look at that feed in more detail in a bit.

The River bridges (heh) the gap between CouchDB's _changes feed and Elasticsearch index. The plugin runs inside Elasticsearch, and makes a persistent TCP connection to CouchDB's _changes endpoint. When a new change is published to that endpoint, the River passes the relevant portions of that JSON up to Elasticsearch, which then makes the appropriate change to its index. Let's look at a simple timeline of what the River would see from the _changes feed during the creation of a new document in CouchDB, and then an update to that document:

A document is created in CouchDB, the _changes feed emits:


That same document is updated in CouchDB, the _changes feed emits:


It's tough to tell from this contrived example document, but the _changes feed actually includes the entire source document JSON for creates and updates. (I'll talk more about that in part 2.) From the above JSON examples, the River would pass the inner-most document containing the _id, _rev and my_field data up to Elasticsearch. Elasticsearch uses that JSON to update the corresponding document (keyed by _id) in its search index and voila, the document you updated in CouchDB is now updated in your Elasticsearch search index in real-time.

We have now gotten our feet wet with how a document flows from one end to the other in this architecture. In part 2, we'll dive deeper into the DevOps-heavy care, feeding, monitoring and testing of the River. We'll also look at some slick River tricks that can transform your documents before Elasticsearch gets them, and any other silly River puns I can come up with. I'll also be reading the entire thing in my best David Attenborough impression and posting it on SoundCloud.

DBD::Pg: one ping to rule them all

How can you tell if your database connection is still valid? One way, when using Perl, is to use the ping() method. Besides backslash-escaped placeholders, a revamped ping() method is the major change in the recently released version 3.5.0 of DBD::Pg, the Perl/DBI interface to Postgres. Before 3.5.0, there was a chance of false positives when using this method. In particular, if you were inside of a transaction, DBD::Pg did not actually attempt to contact the Postgres backend. This was definitely an oversight, and DBD::Pg now does the right thing.

Detecting a dead backend is a little trickier than it sounds. While libpq stores some state information for us, the only way to be sure is to issue a command to the backend. Additionally, we check the value of PQstatus in case libpq has detected a problem. Realistically, it would be far better if the Postgres protocol supported some sort of ping itself, just a simple answer/response without doing anything, but there is nothing like that yet. Fortunately, the command that is issued, /* DBD::Pg ping test, v3.5.0 */, is very lightweight.

One small side effect is that the ping() method (and its stronger cousin, the pg_ping() method) will both cancel any COPY that happens to be in progress. Really, you should not be doing that anyway! :) Calling the next copy command, either pg_getline() or pg_putline(), will tell you if the connection is valid anyway. Since the copy system uses a completely different backend path, this side effect is unavoidable.

Even this small change may cause some problems for applications, which relied on the previous false positive behavior. Leaving as a basic no-op, however, was not a good idea, so check if your application is using ping() sanely. For most applications, simple exception handling will negate to use ping() in the first place.