Welcome to End Point’s blog

Ongoing observations by End Point people

Google Liquid Galaxy Support

End Point is tremendously excited to announce our turnkey installation and support service for Google's Liquid Galaxy. Google has just today announced the release of Liquid Galaxy to the general public and to its Earth Enterprise customers. Liquid Galaxy is an astounding 3D immersive environment for Google Earth. End Point's turnkey installation service includes on-site booth construction, precise installation of compute nodes and displays, plus expert configuration of operating system, networking, and Google Earth client software. End Point also provides support for upgrades and troubleshooting.

End Point has extensive hands-on experience with installing and configuring Liquid Galaxy. We've installed a number of Galaxies for Google itself in the US and abroad. We also recently installed the Liquid Galaxy unveiled this month at the San Jose Tech Museum's spectacular new Silicon Valley Innovation Gallery. This is the first Liquid Galaxy deployed outside of a Google office or trade show.

Google's Liquid Galaxy is 3-dimensional without your having to wear funny glasses. Google's reference implementation specifies a circular booth 12'4" (3.75m) in diameter with 8 large high definition displays mounted in portrait orientation around the interior of the booth's perimeter. Each display is driven by its own computer. You simply walk in (bring some friends if you want) and you're inside of Google Earth.

If the screen in front of you is a northern view, then the screens to your right show what's to the east and the screens to your left the west. That's unless you've navigated yourself to an upside down position! When you dive down to ground level and navigate through Google Earth's 3D building models, not only do you get to see what is in front of you, but you see what is to the side also. (You'd see what is behind you too, except that the booth has an open doorway.)

The experience of Liquid Galaxy is unique and marvelous because your peripheral vision comes alive in it. Not until you've been inside Liquid Galaxy can you truly appreciate how fundamental and useful our peripheral vision is to our overall visual experience. The world has more than two dimensions, so it's particularly appropriate that Google Earth now does too.

See the End Point website for more info about Liquid Galaxy and our support offerings, including a price list. We've posted a nice YouTube video made by Eugenia Loli-Queru with Jason Holt, Google's lead engineer for the project, at the helm.

Gift Certificates: Easy to Say, Hard to Do

A recent blog article by Steph goes through the steps taken to handle implementation of gift certificates using Spree's coupon/discount model. At the end of the article, she rightly concludes that gift certificates are not that well suited to the discount model and need to be handled separately. Gift certificates are rightly handled as a method of payment -- they should have no effect on the amount of the sale, sales tax or shipping charges. This is an attempt to further that discussion.

Some people want to put rules on to how gift certificates can be used: Minimum purchase, expiration dates, etc. However, these are rules that apply to qualification: Does the person or the sale qualify to get a lower price? To my mind, these are not rules that can be applied to true gift certificates. A gift certificate is a business liability. The merchant has the money, and the customer should be allowed to spend it as she sees fit, without having to pass any qualifications separate from those that apply to everyone else. So gift certificates should be treated by the system as a form of payment, and customers should not have to qualify just to use them.

To implement a gift certificate system, we need to be able to sell them, record them, and redeem them.

Selling Gift Certificates:

The first thing to tackle is selling a gift certificate. The key thing here is that the sales tax is charged later, when the certificate is used to pay for merchandise. So we need to set up a non-taxable category of merchandise and put our certificate in it. If the site is selling a physical certificate, shipping can be charged, but this would typically be a flat rate for postage that does not vary with the amount of the certificate. No shipping should be charged on virtual gift certificates. This may not fit easily into standard shipping calculators, so something custom may be needed here.

I am assuming here that certificates sell at face value (a $50 certificate sells for $50). If not, we need a way to get the certificate value different from the amount paid for it. In any case the sale price needs to be at least the value of the certificate ($100 certificate costs $105) or we can get an runaway escalation effect, where one certificate is used to buy another ad infinitum. For that same reason, our gift certificates should be non-discountable. Otherwise a slick customer can use one to buy another, eventually getting free certificates. No thanks! Alternatively, we could prevent people from using one gift certificate to pay for another. For now, let's just say that certificates are non-discountable. Anyway, discounts apply to merchandise, and a gift certificate is not merchandise.

Recording Gift Certificates:

Having been sold, the gift certificate needs to go on file so it can be used. The record needs to include the purchaser, (who may not be the eventual redeemer) and the the original value of the certificate. In the database we need a table like "giftcerts," to which we can insert a row with the certificate number, the order number it was purchased from, and the certificate value. Ideally, having certificate numbers be pulled randomly from a large space, and taking care to ensure no duplicates are issued is sufficient. Usually a certificate just needs to be too hard to guess or to brute-force, but certainly, knowing one valid number, it should not be possible to guess another.

To redeem a card, a customer just needs to enter its number. Once it is presented for the first time, we can associate it with the logged-in username if we desire. Some merchants may want to ensure that a customer is registered to spend a gift certificate. This has two benefits: (1) we get better tracking of gift certificate sales, and (2) we can use the registration info to ensure that the person who spends the certificate is the one to whom it belongs.

Redeeming Gift Certificates:

When a certificate is used for purchase, the system needs to determine whether the code is eligible or not. To us, that means simply does the card have an available balance?

When applied to the purchase total, a valid certificate will either cover the entire purchase amount, or leave a balance due. If a balance is due, the system should require a credit card, possibly even another gift certificate, or other payment mode be provided to cover the remaining amount of the sale. To avoid race conditions, the system needs to implement special handling to make sure that no overspend takes place. If the certificates used cover the entire amount of the sale, we need to then bypass credit card processing.

If the customer edits the cart contents once a certificate is applied, we need to re-apply the certificate, adjusting the applied value. Likewise, when the order is finally submitted, we need to check each certificate again, and report failure and request new payment details if any part of the submitted payments does not go though. If our database is one that can handle transactions, we are in good shape -- we can process the payments one by one, handling the gift certificates first, and roll back if any part fails.

Of course we will need to record gift certificate usage, so that for each certificate, we are able to see how it has been spent and when. So, once a sale is placed that uses a certificate for payment, we need to insert rows into "giftcert_use" indicating the order number, certificate number and amount charged, one row for each certificate used by the customer.

Other thoughts:

In the case of returns for merchandise paid for by gift certificate, it may be best to issue a new gift-certificate as a store credit for the amount of the return. In fact this can be done for any return! The customer gets credit that can only be used at the store. This requires an admin capability to add new gift certificates with a comment or flag to indicate that the certificate was added as a refund of store credit.


So we see it's not so easy to handle gift certificates!. This simple discussion involves customizations to sales tax, shipping, and payment processing. There are race conditions to be handled and avoided, and and admin interface for issuing store credits. All that, and we did not cover auditing of gift certificates or providing reports to users.

What have I left out? Are there other important elements to be considered? Your comments are welcome!

Debugging jQuery

A recent reskin project for a client requires that we convert some of their old Prototype code to jQuery as well as create new jQuery code. I have not done much with converting Prototype to jQuery and I felt like my debugging tools for JavaScript were under par. So this morning I set out to find what was available for jQuery and I found this article on the subject.

I've used Firebug for some time now, but was unaware of some of the supporting plugins that would certainly help with debugging JavaScript. Some of the plugins and methods found in the article that I found immediately helpful were:

  • FireFinder: Makes it quite easy to verify that the selector values in your code are correct and that the proper elements are returned. I was able to immediately pinpoint problems with my selectors and this brought to light why certain events were not firing.
  • Firebug Console: Using the console.log function allowed me to check values without littering my code with alert statements.
  • FireQuery: At a glance this plugin for Firebug shows which elements have event handlers bound to them.
  • Firebug Breakpoints: Setting breakpoints and watch statements in your code makes it easier to see what is happening in the JavaScript code as it is executed instead of trying to figure out what happened after the code has run its course.

Thanks to the author of the article, Elijah Manor, for the in-depth information on debugging jQuery code.

PostgreSQL 8.4 in RHEL/CentOS 5.5

The announcement of end of support coming soon for PostgreSQL 7.4, 8.0, and 8.1 means that people who've put off upgrading their Postgres systems are running out of time before they're in the danger zone where critical bugfixes won't be available.

Given that PostgreSQL 7.4 was released in November 2003, that's nearly 7 years of support, quite a long time for free community support of an open-source project.

Many of our systems run Red Hat Enterprise Linux 5, which shipped with PostgreSQL 8.1. All indications are that Red Hat will continue to support that version of Postgres as it does all parts of a given version of RHEL during its support lifetime. But of course it would be nice to get those systems upgraded to a newer version of Postgres to get the performance and feature benefits of newer versions.

For any developers or DBAs familiar with Postgres, upgrading to a new version with RPMs from the PGDG or other custom Yum repository is not a big deal, but occasionally we've had a client worry that using a packages other than the ones supplied by Red Hat is riskier.

For those holdouts still on PostgreSQL 8.1 because it's the "norm" on RHEL 5, Red Hat gave us a gift in their RHEL 5.5 update. It now includes separate PostgreSQL 8.4 packages that may optionally be used on RHEL 5 instead of PostgreSQL 8.1. (Both can't be used on the same system at the same time.)

I know that getting these packages from Red Hat shouldn't be necessary, but for those who feel jittery about using 3rd-party packages, it's a good nudge to switch to Postgres 8.4 using Red Hat's supported packages. Thanks to Tom Lane at Red Hat for making this happen. Though I don't know whose idea it was, Tom is the author of all the RPM commitlog messages, so thanks, Tom!

This brings up a few other rhetorical questions: Will RHEL 6 ship with PostgreSQL 9.0? Will RHEL 5.6 have backported PostgreSQL 9.0 in similar postgresql90 packages? It'd be great to see each new PostgreSQL release have supported packages in RHEL so that there's even less reason to start a new project on an older version of Postgres. RHEL 5.5 with PostgreSQL 8.4 is a nice start in that direction.

Postgres configuration best practices

This is the first in an occasional series of articles about configuring PostgreSQL. The main way to do this, of course, is the postgresql.conf file, which is read by the Postgres daemon on startup and contains a large number of parameters that affect the database's performance and behavior. Later posts will address specific settings inside this file, but before we do that, there are some global best practices to address.

Version Control

The single most important thing you can do is to put your postgresql.conf file into version control. I care not which one you use, but go do it right now. If you don't already have a version control system on your database box, git is a good choice to use. Barring that, RCS. Doing so is extremely easy. Just change to the directory postgresql.conf is in. The process for git:

  • Install git if not there already (e.g. "sudo yum install git")
  • Run: git init
  • Run: git add postgresql.conf pg_hba.conf
  • Run: git commit -a -m "Initial commit"

For RCS:

  • Install as needed (e.g. "sudo apt-get install rcs")
  • Run: mkdir RCS
  • Run: ci -l postgresql.conf pg_hba.conf

Note that we also checked in pg_hba.conf as well. You want to check in any file in that directory you may possibly change. For most people, that only means postgresql.conf and pg_hba.conf, but if you use other files (pg_ident.conf) check those in as well.

Ideally you want the version checked in to be the "raw" configuration files that came with the system - in other words, before you started messing with them. Then you make your initial changes and check it in. From then on of course, you commit every time you change the file.

At a bare minimum, the version control system should be telling you:

  • Exactly what was changed
  • When it was changed
  • Who made the change
  • Why it was changed

The first two items happen automatically in all version control systems, so you don't have to worry about those. The third item, "who made the change", must be entered manually if on a shared account (e.g. postgres) and using RCS. If you are using git, you can simply set the environment variables GIT_AUTHOR_NAME and GIT_AUTHOR_EMAIL. For shared accounts, I have a custom bashrc file called "gregbashrc" that is called when I log in that sets those ENVs as well as a host of other items.

The fourth item, "why it was changed", is generally the content of the commit message. Never leave this blank, and be as descriptive and verbose as possible - someone later on will be grateful you did. It's okay to be repetitive and state the obvious. If this was done as part of a specific ticket number or project name, mention that as well.

Safe Changes

It's important that the changes you make to the postgresql.conf file (or other files) actually work and don't cause Postgres to be unable to parse the file, or handle a changed setting. Never make changes and restart Postgres, because if it doesn't work, you've got a broken config file, no Postgres daemon, and most likely unhappy applications and/or users. At the very least, do a reload first (e.g. /etc/init.d/postgresql reload or just kill -HUP the PID). Check the logs and see if Postgres was happy with your changes. If you are lucky, it won't even require a restart (some changes do, some do not).

A better way to test your changes is to make it on an identical test box. That way, all the wrinkles are ironed out before you make the changes on production and attempt a reload or restart.

Another way I've found handy is to simply start a new Postgres daemon. Sounds like a lot of work, but it's pretty automatic once you've done it a few times. The process generally looks like this, assuming your production postgresql.conf is in the "data" directory, and your changes are in data/

  • cd ..
  • initdb testdata
  • cp -f data/ testdata/
  • echo port=5555 >> testdata/postgresql.conf
  • echo max_connections=10 >> testdata/postgresql.conf

The max_connections is not strictly necessary, of course, but unless you are changing something that relies on that setting, it's nicer to keep it (and the resulting memory) low.

  • pg_ctl -D testdata -l test.log start
  • cat test.log
  • pg_ctl -D testdata stop
  • rm -fr testdata (or just keep it around for next time)

The test.log file will show you any problems that might have popped up with your changes, and once it works you can be fairly confident it will work for the "main" daemon as well, so to finish up:

  • cd data
  • mv -f postgresql.conf
  • git commit postgresql.conf -m "Adjusted random_page_cost to 2, per bug #4151"
  • kill -HUP `head -1`
  • psql -c 'show random_page_cost'

Keeping it Clean

The postgresql.conf file is fairly long, and can be confusing to read with its mixture of comments, in-line comments, strange wrapping, and the commented out vs. not-commented-out variables. Hence, I recommend this system:

  • Put a big notice at the top of the file asking people to make changes to the bottom
  • Put all important variables at the bottom, sans comments, one per line
  • Line things up
  • Put into logical groups.

This avoids having to hunt for settings, prevents the gotcha of when a setting is changed twice in the file, and makes things much easier to read visually. Here's what I put at the top of the postgresql.conf:


I then add a good 20+ empty lines, so anyone viewing the file is forced to focus on the all-caps message above.

The next step is to put all the settings you care about at the bottom of the file. Which ones should you care about? Any setting you have changed (obviously), any setting that you *might* change in the future, and any that you may not have changed, but someone may want to look up. In practice, this means a list of about 25 items. After aligning all the values to the right and breaking things into logical groups, here's what the bottom of the postgresql.conf looks like:

## Connecting
port                            = 5432
listen_addresses                = '*'
max_connections                 = 100

## Memory
shared_buffers                  = 400MB
work_mem                        = 1MB
maintenance_work_mem            = 1GB

## Disk
fsync                           = on
synchronous_commit              = on
full_page_writes                = on
checkpoint_segments             = 100

archive_mode                    = off
archive_command                 = ''
archive_timeout                 = 0

## Planner
effective_cache_size            = 18GB
random_page_cost                = 2

## Logging
log_destination                 = 'stderr'
logging_collector               = on
log_filename                    = 'postgres-%Y-%m-%d.log'
log_truncate_on_rotation        = off
log_rotation_age                = 1d
log_rotation_size               = 0
log_min_duration_statement      = 200
log_statement                   = 'ddl'
log_line_prefix                 = '%t %u@%d %p'

## Autovacuum
autovacuum                      = on
autovacuum_vacuum_scale_factor  = 0.1
autovacuum_analyze_scale_factor = 0.3

Because everything is in one place, at the bottom of the file, and not commented out, it's very easy to see what is going on. The groups above are somewhat arbitrary, and you can leave them out or create your own, but at least keep things grouped together as much as possible. When in doubt, use the same order as they appear in the original postgresql.conf.

Sometimes people change important settings in a group, such as for bulk loading of data. In this case, I usually make a separate group for it at the very bottom. This makes it easy to switch back and forth, and helps to prevent people from (for example) forgetting to switch fsync back on:

## Bulk loading only - leave 'on' for everyday use!
autovacuum                      = off
fsync                           = off
full_page_writes                = off

Ownership and permissions

All the conf files should be owned by the postgres user, and the configuration files should be world-readable if possible (indeed, it's a requirement for Debian based system that postgresql.conf be readable for psql to work!). Be careful about SELinux as well: it can get ornery if you do things like use symlinks.


One final note - make sure you are backing up your changes as well. PITR and pg_dump won't save your postgresql.conf! If you are checking things in to a remote version control system, then some of the pressure is off, but you should have some sort of policy for backing up all your conf files explicitly. Even if using a local git repo, tarring and copying up the whole thing is usually a very quick and cheap action.

Anonymous code blocks

With the release of PostgreSQL 9.0 comes the ability to execute "anonymous code blocks" in various of the PostgreSQL procedural languages. The idea stemmed from work back in autumn of 2009 that tried to respond to a common question on IRC or the mailing lists: how do I grant a permission to a particular user for all objects in a schema? At the time, the only solution short of manually writing commands to grant the permission in question on every object individually was to write a script of some sort. Further discussion uncovered several people that often found themselves writing simple functions to handle various administrative tasks. Many of those people, it turned out, would rather simply call one statement, rather than create a function, call the function, and then drop (or just ignore) the function they'd never need again. Hence, the new DO command.

The first language to support DO was PL/pgSQL. The PostgreSQL documentation provides an example to answer the original question: how do I grant permissions on everything to a particular user.

DO $$DECLARE r record;
    FOR r IN SELECT table_schema, table_name FROM information_schema.tables
             WHERE table_type = 'VIEW' AND table_schema = 'public'
        EXECUTE 'GRANT ALL ON ' || quote_ident(r.table_schema) || '.' || quote_ident(r.table_name) || ' TO webuser';

Notice that this doesn't actually tell us what language to use. If no language is specified, DO defaults to PL/pgSQL (which, in 9.0, is enabled by default). But you can use other languages as well:

DO $$
    BTW Calculate pi using Gregory-Leibniz series
    BTW This method does not converge particularly quickly...
    I HAS A T1
    I HAS A T2
    I HAS A PI ITZ 0.0

        T1 R QUOSHUNT OF 4.0 AN SUM OF 3.0 AN ITR
        T2 R QUOSHUNT OF 4.0 AN SUM OF 5.0 AN ITR
        ITR R SUM OF ITR AN 4.0
            YA RLY, GTFO
    VISIBLE "PI R: "

I tried to rewrite the GRANT function shown above in PL/LOLCODE for this example, until I discovered that some of PL/LOLCODE's limitations make it extremely difficult, if not impossible. So far as I know, PL/LOLCODE was the second language to support anonymous blocks, thanks to what turned out to be a relatively simple programming exercise. After finishing PL/LOLCODE's DO support, I decided to do the same for PL/Perl. I wasn't particularly surprised to find that PL/Perl was harder to extend than PL/LOLCODE; PL/Perl is a much more feature-rich (and hence, complicated) language and I wasn't as familiar with its internals. However, after my initial submission and with helpful commentary from several other people, Andrew Dunstan tied off the loose ends and got it committed. It looks like this:

DO $$
    my $row;
    my $rv = spi_exec_query(q{
        SELECT quote_ident(table_schema) || '.' || quote_ident(table_name) AS relname
        FROM information_schema.tables WHERE table_type = 'VIEW' AND table_schema = 'public'
    my $nrows = $rv->{processed};
    foreach my $i (0 .. $nrows - 1) {
        my $row = $rv->{rows}[$rn];
        spi_exec_query("GRANT ALL ON $row->{relname} TO webuser");
$$ LANGUAGE plperl;

DO wasn't the only thing to come from the pgsql-hackers discussion I mentioned above. In PostgreSQL 9.0, the GRANT command has also been modified, so it's now possible to grant permissions several objects in one stroke syntax. For instance:


pg_wrapper's very symbolic links

I like pg_wrapper. For a development environment, or testing replication scenarios, it's brilliant. If you're not familiar with pg_wrapper and its family of tools, it's a set of scripts in the postgresql-common and postgresql-client-common packages available in Debian, as well as Ubuntu and other Debian-like distributions. As you may have guessed pg_wrapper itself is a wrapper script that calls the correct version of the binary you're invoking – psql, pg_dump, etc – depending on the version of the database you want to connect to. Maybe not all that exciting in itself, but implied therein is the really cool bit: This set of tools lets you manage multiple installations of Postgres, spanning multiple versions, easily and reliably.

Well, usually reliably. We were helping a client upgrade their production boxes from Postgres 8.1 to 8.4. This was just before the 9.0 release, otherwise we'd consider moving the directly to that instead. It was going fairly smoothly until on one box we hit this message:

Could not parse locale out of pg_controldata output

Oops, they had pinned the older postgres-common version. An upgrade of those packages and no more error!

$ pg_lsclusters
Version Cluster   Port Status Owner    Data directory                     Log file
8.1     main      5439 online postgres /var/lib/postgresql/8.1/main       custom
Error: Invalid data directory

Hmm, interesting. Okay, so not quite, got a little bit more work to do. This one took some tracing through the code. The pg_wrapper scripts, if they don't already know it, look for the data directory in a couple of places. The first stop is the postgresql.conf file, specifically /etc/postgresql/<version>/<cluster-name>/postgresql.conf, looking for the data_directory parameter. But, in its transitional state at the time, the postgresql.conf was still a work in progress.

The second place it looks is a symlink in the same /etc/postgresql/<version>/<cluster-name>/ directory. While that's the old way of doing things, it at least let us get things looking reasonable:

# ln -s /var/lib/postgresql/8.4/main /etc/postgresql/8.4/main/pgdata
# /etc/init.d/postgresql-8.4 status
8.1     main      5439 online postgres /var/lib/postgresql/8.1/main       custom
8.4     main      5432 online postgres /var/lib/postgresql/8.4/main       custom

VoilĂ ! From there we were able to proceed with the upgrade, confident that the instance will behave as expected. And now, everything is running great!

As with most things that provide a simpler experience on the surface, there's additional complexity under the hood. But for now, we have one more client upgraded. Thanks, Postgres!

Listen/Notify improvements in PostgreSQL 9.0

Improved listen/notify is one of the new features of Postgres 9.0 I've been waiting for a long time. There are basically two major changes: everything is in shared memory instead of using system tables, and full support for "payload" messages is enabled.

Before I demonstrate the changes, here's a review of what exactly the listen/notify system in Postgres is. Basically, it is an inter-process signalling system, which uses the pg_listener system table to coordinate simple named events between processes. One or more clients connects to the database and issues a command such as:

LISTEN foobar;

The name foobar can be replaced by any valid name; usually the name is something that gives a contextual clue to the listening process, such as the name of a table. Another client (or even one of the original ones) will then issue a notification like so:

NOTIFY foobar;

Each client that is listening for the 'foobar' message will receive a notification that the sender has issued the NOTIFY. It also receives the PID of the sending process. Multiple notifications are collapsed into a single notice, and the notification is not sent until a transaction is committed.

Here's some sample code using DBD::Pg that demonstrates how the system works:

#!/usr/bin/env perl
# -*-mode:cperl; indent-tabs-mode: nil-*-

use strict;
use warnings;
use DBI;

my $dsn = 'dbi:Pg:dbname=test';
my $dbh1 = DBI->connect($dsn,'test','', {AutoCommit=>0,RaiseError=>1,PrintError=>0});
my $dbh2 = DBI->connect($dsn,'test','', {AutoCommit=>0,RaiseError=>1,PrintError=>0});

print "Postgres version is $dbh1->{pg_server_version}\n";

my $SQL = 'SELECT pg_backend_pid(), version()';
my $pid1 = $dbh1->selectall_arrayref($SQL)->[0][0];
my $pid2 = $dbh2->selectall_arrayref($SQL)->[0][0];
print "Process one has a PID of $pid1\n";
print "Process two has a PID of $pid2\n";

## Process one listens for a notice named "jtx"
$dbh1->do(q{LISTEN jtx});
## Process one checks for any notices received
print show_notices($dbh1);

## Process two sends a notice, but does not commit
$dbh2->do(q{NOTIFY jtx});
## Process one does not see the notice yet
print show_notices($dbh1);
## Process two sends the same notice again, then commits
$dbh2->do(q{NOTIFY jtx});

sleep 1; ## Ensure the notice has time to get to propogate
## Process two receives a single notice from process one
print show_notices($dbh1);

## Now that it has seen the notice, it reports nothing again:
print show_notices($dbh1);

sub show_notices { ## Function to return any notices received
       my $dbh = shift;
       my $messages = '';
       while (my $n = $dbh->func('pg_notifies')) {
          $messages .= "Got notice '$n->[0]' from PID $n->[1]\n";
       return $messages || "No messages\n";

The output of the above script on a 8.4 Postgres server is:

Postgres version is 80401
Process one has a PID of 18238
Process two has a PID of 18239
No messages
No messages
Got notice 'jtx' from PID 18239
No messages

As expected, we got a notification only after the other process committed.

Note that because this is asychronous and involves the system tables, we added a sleep call to ensure that the notice had time to propagate so that the other processes will see it. Without the sleep, we usually see four "No messages" appear, as the script goes too fast for the pg_listener table to catch up.

Now for the aforementioned payloads. Payloads allow an arbitrary string to be attached to the notification, such that you can have a standard name like before, but you can also attach some specific text that the other processes can see. I added support for payloads to DBD::Pg back in June 2008, so let's modify the script a little bit to demonstrate the new payload mechanism:

## Process two sends two notices, but does not commit
$dbh2->do(q{NOTIFY jtx, 'square'});
$dbh2->do(q{NOTIFY jtx, 'square'});
## Process one does not see the notice yet
print show_notices($dbh1);
## Process two sends the same notice again, then commits
$dbh2->do(q{NOTIFY jtx, 'triangle'});
 ## This part changes: we get an extra item from our array:
 $messages .= "Got notice '$n->[0]' from PID $n->[1] message is '$n->[2]'\n";

Here's what the output looks like under version 9.0 of Postgres:

Postgres version is 90000
Process one has a PID of 19089
Process two has a PID of 19090
No messages
No messages
Got notice 'jtx' from PID 19090 message is 'square'
Got notice 'jtx' from PID 19090 message is 'triangle'
No messages

Note that the collapsing of identical messages into a single notification now takes into account the message as well, so we received two notifications in the above example for the three total notifications sent. To add a payload, we simply say NOTIFY, then the name of the notification, add a comma, and specify a payload as a quoted string. Of course, the payload string is still completely optional. If no payload is specified, DBD::Pg will simply treat the payload as an empty string (this is also the behavior when you request the payload using DBD::Pg against a pre-9.0 server, so all combinations should be 100% backwards compatible).

We also got rid of the sleep. Because we are now using shared memory instead of system tables, there is no lag whatsoever, and the other process can see the notices right away.

Another large advantage to removing the pg_listener table is that systems that make heavy use of it (such as the replication systems Bucardo and Slony) no longer have to worry about bloat in these tables.

The use of payloads also means that many application can be greatly simplified: in the past, one had to be creative in the name of your notifications in order to pass meta-information to your listener. For example, Bucardo uses a large collection of notifications, meaning that the Bucardo processes had to do the equivalent of things like this:

$dbh->do(q{LISTEN bucardo_reload_config});
$dbh->do(q{LISTEN bucardo_log_message});
$dbh->do(q{LISTEN bucardo_activate_sync_$sync});
$dbh->do(q{LISTEN bucardo_deactivate_sync_$sync});
$dbh->do(q{LISTEN bucardo_kick_sync_$sync});
while (my $notice = $dbh->func('pg_notifies')) {
 my ($name, $pid) = @$notice;
 if ($name eq 'bucardo_reload_config') {
 elsif ($name =~ /bucardo_kick_sync_(.+)/) {

We can instead do things like this:

$dbh->do(q{LISTEN bucardo});
while (my $notice = $dbh->func('pg_notifies')) {
 my ($name, $pid, $msg) = @$notice;
 if ($msg eq 'bucardo_reload_config') {
 elsif ($msg =~ /bucardo_kick_sync_(.+)/) {

I hope to add this support to Bucardo shortly; it's simply a matter of refactoring all the listen and notify calls into a function that does the right thing depending on the server version it is attached to.

PostgreSQL odd checkpoint failure

Nothing strikes fear into the heart of a DBA like error messages, particularly ones which indicate that there may be data corruption. One such situation happened recently to us, when we ran into a recent unusual situation in an upgrade to PostgreSQL 8.1.21. We had updated the software and manually been running a REINDEX DATABASE command, when we started to notice some errors being reported on the front-end. We decided to dump the database in question to ensure we had a backup to return to, however we still ended up with more messages:

  pg_dump -Fc database1 > pgdump.database1.archive

  pg_dump: WARNING:  could not write block 1 of 1663/207394263/443523507
  DETAIL:  Multiple failures --- write error may be permanent.
  pg_dump: ERROR:  could not open relation 1663/207394263/443523507: No such file or directory
  CONTEXT:  writing block 1 of relation 1663/207394263/443523507
  pg_dump: SQL command to dump the contents of table "table1" failed: PQendcopy() failed.
  pg_dump: Error message from server: ERROR:  could not open relation 1663/207394263/443523507: No such file or directory
  CONTEXT:  writing block 1 of relation 1663/207394263/443523507
  pg_dump: The command was: COPY public."table1" (id, field1, field2, field3) TO stdout;

Looking at the pg_database contents revealed that 207394263 was not even the database in question. I connected to the aforementioned database and looked for a relation that matched that pg_class.oid, and barring that pg_class.relfilenode. This search revealed nothing. So where was the object itself living, and why were we getting this message?

We decided that since it appeared that something was awry with the database system in general, that we should take this opportunity to dump the tables in question. I proceeded to write a quick script to go through the database tables and dump each one individually using pg_dump's -t option. This worked for some of the tables, but not all of them, which would die with the same error. Looking at the pg_class.relpages field for the non-dumpable tables revealed that these were all the larger tables in the database. Obviously not good, since this is where the bulk of the data lay. However, we also noticed that the message that we got referenced the exact same filesystem path, so it appeared to be something separate from the table that was being dumped.

After some advice on IRC, we reviewed the logs for checkpoint logging, which revealed that checkpoints had been failing. This further meant that the database was in a state such that it could not be shut down cleanly, had we wanted to try to restart to see if that cleared up the flakiness. This further meant that we'd only be able to shutdown via a hard kill, which is definitely something to avoid, WAL or not, particularly since there had not been a checkpoint for some time. A manual CHECKPOINT further failed after a timeout.

Before we went down the road of forcing a hard server shutdown, we ended up just touching the specific relation path in question into existence and then running a CHECKPOINT. This time since the file existed, it was able to complete the checkpoint, and restore working order to the database. We successfully (and quickly) ran a full pg_dump, and went about the task of manually vetting a few of the affected tables, etc.

Our working theory for this is that somehow there was a dirty buffer that referenced a relation that no longer existed, and hence when the there was a checkpoint or other event which attempted to flush shared_buffers (i.e., the loading of a large relation which would require a flush of Least Recently Used pages as in the pg_dump case), the flush attempt for the missing relation failed, which aborted the checkpoint/other action.

After the file existed and PostgreSQL had successfully synched to disk, it was a single two-block file, of which the first block was completely empty and the second block looked like an index page (due to the layout/contents of the data). The most suggestive cause was that had been an interrupted REINDEX earlier in the day. Since this machine was showing no other signs of data corruption and everything else seemed reasonable, our best guess is that there was some race condition that had caused the relation's data to exist in memory even while the canceled REINDEX ensured that the actual relfile and the pg_class rows did not exist for the buffer.

Perl Testing - stopping the firehose

I maintain a large number of Perl modules and scripts, and one thing they all have in common is a test suite, which is basically a collection of scripts inside a "t" subdirectory used to thoroughly test the behavior of the program. When using Perl, this means you are using the awesome Test::More module, which uses the Test Anything Protocol (TAP). While I love Test::More, I often find myself needing to stop the testing entirely after a certain number of failures (usually one). This is the solution I came up with.

Normally tests are run as a group, by invoking all files named t/*.t; each file has numerous tests inside of it, and these individual tests issue a pass or a fail. At the end of each file, a summary is output stating how many tests passed and how many failed. So why is stopping after a failed test even needed? The reasons below mostly relate to the tests I write for the Bucardo program, which has a fairly large and complex test suite. Some of the reasons I like having fine-grained control of when to stop are:

  • Scrolling back through screens and screens of failing tests to find the point where the test began to fail is not just annoying, but a very unproductive use of my time.
  • Tests are very often dependent. If test #23 fails, it means there is a very good chance that most if not all of the subsequent tests are going to fail as well, and it makes no sense for me to look at fixing anything but test #23 first.
  • Tests can take a very long time to run, and I can't wait around for the errors to start appearing and hit ctrl-c. I need to kick them off, go do something else, and then come back and have the tests stop running immediately after the first failed test. Bucardo tests, for example, create and startup four different Postgres clusters, populates the databases inside each cluster with test data, installs a fresh copy of Bucardo, and *then* begins the real testing. No way I'm going to wait around for that to happen.
  • Debugging is greatly aided by having the tests stop where I want them to. Often tests after the failing one will modify data and otherwise destroy the "state" such that I cannot manually duplicate the error right then and there, and thus fix it easily.

For now, my solution is to override some of the methods from Test::More. I have a standard script that does this, and I 'use' this script after I 'use Test::More' inside my test scripts. For example, a test script might look like this:

#!/usr/bin/env perl

use strict;
use warnings;
use Data::Dumper;
use Test::More tests => 356;
use TestOverride;

sub some_function {
       my $arr = [];
       push @$arr => 4,9;
       return [$arr];

my $t = q{Function some_function() returns correct value when called with 'foo'};
my $value = some_function('foo');
my $res = [[3],[5]];
is_deeply( $value, $res, $t);


$t = q{Value of baz is 123};
is ($baz, 123, $t);

In turn, the TestOverride file contains this:

use Data::Dumper;
$Data::Dumper::Indent = 1;
$Data::Dumper::Terse = 1;
$Data::Dumper::Pad = '|';

use base 'Exporter';
our @EXPORT = qw{ is_deeply like pass is isa_ok ok };

my $bail_on_error = $ENV{TESTBAIL} || 0;

my $total_errors = 0;

sub is_deeply {

   # Return right away if the test passes
   my $rv = Test::More::is_deeply(@_);
   return $rv if $rv;

   if ($bail_on_error and ++$total_errors >= $bail_on_error) {
       my ($file,$line) = (caller)[1,2];
       Test::More::diag("GOT: ".Dumper $_[0]);
       Test::More::diag("EXPECTED: ".Dumper $_[1]);
       Test::More::BAIL_OUT "Stopping on a failed 'is_deeply' test from line $line of $file.";


} ## end of is_deeply

sub is {
   my $rv = Test::More::is(@_);
   return $rv if $rv;
   if ($bail_on_error and ++$total_errors >= $bail_on_error) {
       my ($file,$line) = (caller)[1,2];
       Test::More::BAIL_OUT "Stopping on a failed 'is' test from line $line of $file.";
} ## end of is

The is_deeply compares two arbitrary Perl structures (such as the arrayref here, but it can do hashes as well), and points out if they differ, and where. The "deeply" is because it will walk through the entire structure to find any differences. Good stuff.

Some things to note about the new is_deeply function: first, we simply pass in our parameters to the "real" is_deeply subroutine - the one found inside the Test::More package. If this passes (by returning true), we simply pass that truth back to the caller, and it's completely as if is_deeply had not been overwritten at all. However, if the test fails, Test::More::is_deeply will output a failure notice, but we check to see if the total number of failures for this test script ($total_errors) is greater than or equal to the threshold ($bail_on_error) that we set via then environment variable TESTBAIL. (Having it as an environment variable that defaults to zero allows the traditional behavior to be easily changed without editing any files).

If the number of failed tests is over our threshhold, we call the BAIL_OUT method from Test::More, which not only stops the current test script from running any more scripts, but stops any subsequent test files from running as well.

Before calling BAIL_OUT however, we also take advantage of the overriding to provide a little more detail about the failure. We output the line and file the test came from (because Test::More::is_deeply only sees that we are calling it from within the file). Most importantly, we output a complete dump of the expected and actual structures passed to is_deeply to be compared. The regular is_deeply only describes where the first mismatch occurs, but I often need to see the entire surrounding object. So rather than normal output looking like this:

not ok 1 - Function some_function() returns correct value when called with 'foo'
#   Failed test 'Function some_function() returns correct value when called with 'foo''
#   at test1.t line 18.
#     Structures begin differing at:
#          $got->[0] = '4'
#     $expected->[0] = '3'
# Looks like you planned 356 tests but ran 1.
# Looks like you failed 1 test of 1 run.

The new output looks like this:

not ok 1 - Function some_function() returns correct value when called with 'foo'
#   Failed test 'Function some_function() returns correct value when called with 'foo''
#   at line 23.
#     Structures begin differing at:
#          $got->[0] = '4'
#     $expected->[0] = '3'
# GOT: |[
# |  4,
# |  [
# |    9
# |  ]
# |]
# |  3
# |]
Bail out!  Stopping on a failed 'is_deeply' test from line 17 of test1.t.

Yes, the Test::Most module does some similar things, but I don't use it because it's yet another module dependency, it doesn't allow me to control the number of acceptable failures before bailing, and it doesn't show pretty output for is_deeply.

jQuery Auto-Complete in Interchange

"When all you have is a hammer, everything looks like a nail."

Recently, I've taken some intermediate steps in using jQuery for web work, in conjunction with Interchange and non-Interchange pages. (I'd done some beginner stuff, but now I'm starting to see nails, nails, and more nails.)

Here's how easy it was to add an auto-complete field to an IC admin page. In this particular application, a <select> box would have been rather unwieldy, as there were 400+ values that could be displayed.

<script src="//" type="text/javascript"></script>
<script type="text/javascript" src=""></script>
<script type="text/javascript" src=""></script>
<script type="text/javascript" src=""></script>

That's the requisite header stuff. Then you set up the internal list of autocomplete terms:

<script type="text/javascript">
var auto_list = "[perl]...[/perl]".split(" ");

The [perl] block just needs to emit a space-delimited list of the autocomplete terms. For instance,

[perl table="foo"]
return join(' ', map { $_->[0] }
@{ $Db{foo}->query('SELECT keycol FROM foo ORDER BY 1') });

Reducing bloat without locking

It's not altogether uncommon to find a database where someone has turned off vacuuming, for a table or for the entire database. I assume people do this thinking that vacuuming is taking too much processor time or disk IO or something, and needs to be turned off. While this fixes the problem very temporarily, in the long run it causes tables to grow enormous and performance to take a dive. There are two ways to fix the problem: moving rows around to consolidate them, or rewriting the table completely. Prior to PostgreSQL 9.0, VACUUM FULL did the former; in 9.0 and above, it does the latter. CLUSTER is another suitable alternative, which also does the latter. Unfortunately all these methods require heavy table locking.

Recently I've been experimenting with an alternative method -- sort of a VACUUM FULL Lite. Vanilla VACUUM can reduce table size when the pages at the end of a table are completely empty. The trick is to empty those pages of live data. You do that by paying close attention to the table's ctid column:

5432 josh@josh# \d foo
      Table ""
 Column |  Type   | Modifiers 
 a      | integer | not null
 b      | integer | 
    "foo_pkey" PRIMARY KEY, btree (a)

5432 josh@josh# select ctid, * from foo;
 ctid  | a | b 
 (0,1) | 1 | 1
 (0,2) | 2 | 2
(2 rows)

The ctid is one of several hidden columns found in each PostgreSQL table. It shows up in query results only if you explicitly ask for it, and tells you two values: a page number, and a tuple number. Pages are numbered sequentially from zero, starting with the first page in the relation's first file, and ending with the last page in its last file. Tuple numbers refer to entries within each page, and are numbered sequentially starting from one. When I update a row, the row's ctid changes, because the update creates a new version of the row and leaves the old version behind (see this page for explanation of that behavior).

5432 josh@josh# update foo set a = 3 where a = 2;
5432 josh@josh*# select ctid, * from foo;
 ctid  | a | b 
 (0,1) | 1 | 1
 (0,3) | 3 | 2
(2 rows)

Note the changed ctid for the second row. If I vacuum this table now, I'll see it remove one dead row version, from both the table and its associated index:

5432 josh@josh# VACUUM verbose foo;
INFO:  vacuuming ""
INFO:  scanned index "foo_pkey" to remove 1 row versions
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "foo": removed 1 row versions in 1 pages
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "foo_pkey" now contains 2 row versions in 2 pages
DETAIL:  1 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "foo": found 1 removable, 2 nonremovable row versions in 1 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
1 pages contain useful free space.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.

So given these basics, how can I make tables smaller? Let's build a bloated table:

5432 josh@josh# truncate foo;
5432 josh@josh*# insert into foo select generate_series(1, 1000);
INSERT 0 1000
5432 josh@josh*# delete from foo where a % 2 = 0;
5432 josh@josh*# select max(ctid) from foo;
(1 row)
5432 josh@josh# vacuum verbose foo;
INFO:  vacuuming ""
INFO:  scanned index "foo_pkey" to remove 500 row versions
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "foo": removed 500 row versions in 4 pages

I've filled the table with 1000 rows, and then deleted every other row. The last tuple is on the fourth page (remember they're numbered starting with zero), but since half the table is empty space, I can probably squish it into three or maybe just two pages. I'll start by moving the tuples on the last page off to another page, by updating them:

5432 josh@josh# begin;
5432 josh@josh*# update foo set a = a where ctid >= '(3,0)';
5432 josh@josh*# update foo set a = a where ctid >= '(3,0)';
5432 josh@josh*# update foo set a = a where ctid >= '(3,0)';
5432 josh@josh*# update foo set a = a where ctid >= '(3,0)';
5432 josh@josh*# commit;

Here I'm not changing the row at all, but the tuples are moving around into dead space earlier in the table; this is apparent because the number of rows affected decreases. For the first update or two, there's room enough on the page to store all the new rows, but after a few updates they have to start moving to new pages. Eventually the row count goes to zero, meaning there are no rows on or after page #3, so vacuum can truncate that page:

5432 josh@josh# vacuum verbose foo;
INFO:  vacuuming ""

INFO:  "foo": truncated 4 to 3 pages

It's important to note that I did this all within a transaction. If I hadn't, there's a possibility that vacuum would have reclaimed some of the dead space made by the updates, so instead of moving to different pages, the tuples would have moved back and forth within the same page.

There remains one problem: I can't remove index bloat, and in fact, all this tuple-moving causes more index bloat. I can't fix that completely, but in PostgreSQL 8.3 and later I can avoid creating too much new bloat by updating an unindexed column instead of an indexed one. In PostgreSQL 8.3 and later, the heap-only tuples (HOT) feature avoids modifying indexes if:

  1. the update touches only unindexed columns, and
  2. there's sufficient free space available for the tuple to stay on the same page.
Despite the index bloat caveat, this can be a useful technique to slim down particularly bloated tables without VACUUM FULL and its associated locking.

CSS Sprites and a "Live" Demo

I've recently recommended CSS sprites to several clients, but the majority don't understand what CSS sprites are or what their impact is. In this article I'll present some examples of using CSS sprites and their impact.

First, an intro: CSS sprites is a technique that uses a combination of CSS rules and a single background image that is an aggregate of many smaller images to display the image elements on a webpage. The CSS rules set the boundaries and offset that define the part of the image to show. I like to refer to the technique as analogous to the "Ouija board"; the CSS acts as the little [rectangular] magnifying glass to show only a portion of the image.

It's important to choose which images should be in a sprite based on how much each image is repeated throughout a site's design and how often it might be replaced. For example, design border images and icons will likely be included in a sprite since they may be repeated throughout a site's appearance, but a photo on the homepage that's replaced daily is not a good candidate to be included in a sprite. I also typically exclude a site's logo from a sprite since it may be used by externally linking sites. End Point uses CSS sprites, but only for a few elements:

End Point's CSS sprite image only contains borders and Twitter and LinkedIn images.

"Live" Demo

I wanted to implement CSS sprites to demonstrate their performance impact. I chose to examine the homepages of my favorite ski resorts in Utah, Snowbird and Alta. First, I used to get a benchmark for each homepage:

Without sprites, Alta's homepage loaded in 3.440 seconds with a repeat request load time of 1.643 seconds. 22 requests are made on the homepage. Without sprites, Snowbird's homepage loaded in 7.070 seconds with a repeat request load time of 3.146 seconds. 57 requests are made on the homepage.

After I benchmarked the two pages, I downloaded each homepage and its files and examined the images to build a CSS sprite image. I created the sprite images shown below. I chose to exclude the logo from each sprite, in addition to other time-sensitive images. Each sprited image contains navigation elements, icons, and a few homepage specific images.

My resulting sprited image for Alta.

My resulting sprited image for Snowbird.

I updated the HTML to remove individual image requests. Below are some examples:

<a href=""> <img src="./alta_files/banner_contact.gif" name="ContactUs" width="100" height="20" border="0"> </a>
<a href="/" class="sprite" id="banner_contact"></a>

<a href="" target="_self"> <img src="./alta_files/altaskishoplogo.jpg" alt="March Madness Logo" width="180" height="123" border="0"> </a>
<a href="" target="_self" id="skishoplogo" class="sprite"></a>
<div class="icon"><img src="./snowbird_files/icon_less_rain.gif" border="0" alt="Sunny" title="Sunny"></div>
<div class="icon sprite less_rain"></div>

<div><input type="image" src="./snowbird_files/btn_check_rates.gif" border="0" alt="Check Rates"></div>
<div class="sprite" id="check_rates"></div>

There are a few CSS tips to be aware of during CSS sprite implementation, such as:

  • Padding on sprited elements will affect the sprite position.
  • Links must have the "display:block" rule (combined with floating rules) to enforce a height and width.
  • Parts of the sprite that are repeating may not have any other elements along the repeating axis. For example, in End Point's sprite, the top and bottom repeating border are in the sprite. No other images may be included in the sprite to the left or right of the borders.

The CSS for the new sprites is shown below. For all sprited elements, a height and width is set in addition to a background position. The height, width, and background position are the rules that define the region of the sprited image to show.

Alta CSS sprite rules
.sprite { background-image: url(sprites.gif); display: block; float: left; }
a#banner_logo { width: 100px; height: 100px; }
div#banner_top { width: 600px; height: 100px; background-position: -100px 0px; }
a#banner_home { width: 100px; height: 20px; background-position: -700px 0px; }
a#banner_contact { width: 100px; height: 20px; background-position: -700px -20px; }
a#banner_sitemap { width: 100px; height: 20px; background-position: -700px -40px; }
a#banner_press { width: 100px; height: 20px; background-position: -700px -60px; }
a#banner_weather { width: 100px; height: 20px; background-position: -700px -80px; }
a#skishoplogo { width: 180px; height: 123px; background-position: -298px 123px; }
a#skihistory { width: 164px; height: 61px; background-position: -479px 123px; margin-right: 40px; }
a#altaenviron { width: 298px; height: 61px; background-position: 0px 123px; }
Snowbird CSS sprite rules
.sprite { background: url(sprites.gif); }
a.sprite { display: block; float: left; margin-right: 5px; border: 2px solid #FFF; }
div#headerWeather div.icon { width: 36px; height: 26px; }
div.less_rain { background-position: -118px -142px; }
div.sunny { background-position: -177px -116px; }
div.partly_cloudy { background-position: -138px -116px; }
a#facebook { width: 66px; height: 25px; background-position: 0px -142px; }
a#twitter { width: 55px; height: 25px; background-position: -66px -142px; }
a#youtube { width: 48px; height: 25px; background-position: 0px -116px; }
a#flickr { width: 94px; height: 25px; background-position: -48px -116px; }
a#picofday { width: 218px; height: 22px; border: none; margin: 2px 0px 0px 2px; }
div#check_rates { width: 89px; height: 21px; background-position: 0px -261px; margin: 10px 0px; float: right; }
div#br_corner_light { width: 17px; height: 14px; background-position: -89px -268px; }
div#bl_corner_light { width: 17px; height: 14px; background-position: -105px -268px; }

After spriting the images shown above, I returned to to examine the impact of the sprites. In most cases, I maintained the image format to limit the performance change to spriting only. I also did not change HTML or CSS even if I noticed other performance improvement opportunities. Here are the new results:

With sprites, Alta's homepage loaded in 2.768 seconds with a repeat request load time of 1.093 seconds. With sprites, Snowbird's homepage loaded in 6.289 seconds with a repeat request load time of 2.513 seconds.

A summary of differences shows:

  • Both pages decreased the number of requests by 10.
  • Alta decreased homepage load by 0.672 seconds, or 20% of the original page load. Snowbird decreased homepage load by 0.781 seconds, or 11% of the original page load.
  • On repeat views, Alta's homepage load would be decreased by 0.665 seconds, or 40% of the original repeated page load. On repeat view, Snowbird's homepage load would be decreased by 0.633 seconds, or 20% of the original repeat page load.

There is no reason to avoid sprites if a design has repeating elements or icons. An increase in performance will reduce load time for the customer. If a CDN is in place, CSS sprites can result in a decreased bandwidth cost. The improved performance also can indirectly positively influence search engine performance since search engines may use performance as an influencing factor in search. This improved performance can also make for a better mobile browsing experience. In our examples, the use of CSS sprites decreased first request page load time by 10-20%, but this amount may vary depending on the frequency of images used in a site's design.

Guidelines for Interchange site migrations

I'm involved at End Point often with Interchange site migrations. These migrations can be due to a new client coming to us and needing hosting or migrating from one server to another within our own infrastructure.

There are many different ways to do a migration, in the end though we need to hit on certain points to make sure that the migration goes smoothly. Below you will find steps which you can adapt for your specific migration.

During the start of the migration it might be a good time to introduce git for source control. You can do this by creating the repository and cloning it to /home/account/live, setting up .gitignore files for logs, counter files, gdbm files. Then commit the changes back to the repo and you've now introduced source control without much effort, improving the ability to make changes to the site in the future. This is also helpful to document the changes you make to the code base along the way during the migration in case you need to merge changes from the current production site before completing the migration.

  • Export all of the gdbm databases to their text file equivalents on the production server
  • Take a backup from production of the database, catalog, interchange server, htdocs
  • Setup an account
  • Create the database and user
  • Restore the database, catalog, interchange server and htdocs
  • Update the paths in interchange/bin for each script to point at the new location
  • Grep the restored code for hard coded paths and update those paths to the new locations. Better yet move these paths out to a catalog_local.cfg where environment specific information can go.
  • Grep the restored code for hard coded urls and use the [area] tag to generate the urls
  • Update the urls in products/variable.txt to point at the test domain
  • Update the sql settings in products/variable.txt to point at the new database using the new user
  • Remove the gdbm databases so they will be recreated on startup from the source text files
  • Install a local Perl if it's not already installed (./configure -des will compile and install Perl locally)
  • Install Bundle::InterchangeKitchenSink
  • Install the DBD module for MySQL or PostgreSQL
  • Review the code base looking for use statements in custom code and Require module settings in interchange.cfg. Install the Perl modules found into the local Perl.
  • Setup a non ssl and ssl virtual host using a temporary domain. Configure the temporary domain to use the SSL certificate from the production domain.
  • Firewall or password protect the virtual host so it is not accessible to the public
  • Generate a vlink using interchange/bin/compile and copy it into the cgi-bin directory and name it properly
  • Startup the new Interchange
  • Review error messages and resolve until Interchange will start properly
  • Test the site thoroughly, resolving issues as they appear. Make sure that checkout, charging credit cards, sending of emails, using the admin, etc all function.
  • Migrate any cron jobs running on the current production site, such as session expiration scripts
  • Setup logrotation for the new logs that will be created
  • Verify that you have access to make DNS changes
  • Set the TTL for the domain to a low value such as 5 minutes
  • Modify the new production site to respond to the production url, test by updating your hosts file to manually set the IP address of the domain
  • Shutdown the new Interchange
  • Restore a copy of the original backup for Interchange, the catalog and htdocs to /tmp on the production server
  • Shutdown the production Interchange, put up a maintenance note on the production site.
  • Take a backup of the production database and restore on the new server
  • Diff the Interchange, catalog and htdocs directory between /tmp and the current production locations, making note of the files that have changed since we took the original copy.
  • Copy the files that have changed, making sure to merge with any changes we have made on the new production site. Making sure to copy over all .counter and .autonumber files to the new production site.
  • Start Interchange on the new production server
  • Test the site thoroughly on the new production server, using the production url. Make sure that checkout with charging the credit card functions properly.
  • Resolve any remaining issues found during the testing
  • Setup the Interchange daemon to start at boot for this site in /etc/rc.d/rc.local or in cron using @reboot
  • Update DNS to point at the new production IP address
  • Update the TTL of the domain to a longer value
  • Open the site to the public by opening the firewall or removing the password protection
  • Keep an eye on the error logs for any issues that might crop up

This will hopefully give you a solid guide for performing an Interchange site migration from one server to another and some of the things to watch out for that might cause issues during the migrations.

Long Lasting SSH Multiplexing Made Simplish

My Digression

To start off digressing just a little, I am primarily a developer, lately on longer projects of relatively significant size which means that I stay logged in to the same system for weeks (so long as my X session, local connection, and remote server allow it). I'm also a big believer in lots of screen real estate and using it all when you have it, so I have two monitors running high resolutions (1920x1200), and I use anywhere from 4-6 virtual pages/desktops/screens/(insert term of preference here). On 2-3 of those spaces I generally keep 8 "little black screens" (as my wife likes to call them) or terminals open in 2 rows of 4 windows. In each of those terminals I stay logged in to the same system as long as those aforementioned outside sources will allow me. A little absurd I know...

If you are still with me you may be thinking "why don't you just use `screen`?", I do, in each of those terminals for the inevitable times when network hiccups occur, etc. Next you'll ask, "why don't you just use separate tabs/windows/partitions/(whatever they are called) within screen?", I do, generally I have 2-3 (or more) in each screen instance set to multiple locations on the remote file system. In general I've just found being able to look across a row of terminals at various different libs, a database client, tailed logs, etc. is much faster than toggling between windows (heaven forbid they'd be minimized), screen pages, virtual desktops, or even shell job control (though I use a good bit of that too). Inevitably I also seem to pop up an additional window (or two) every now and again and ssh into the same remote destination for the occasional one off command (particularly since I've never quite gotten the SSH key thing figured out with long running screens, potential future blog post?). So the bottom line, for any given sizable project I probably ssh to the same location a minimum of 8 times each time I set up my desktop and over the course of a particular X session maybe 20 (or more) times? (End digression)

The Point

As fast as SSHing is, waiting for a prompt when doing it that many times (particularly over the course of a year, or career) really starts to make the clock tick by (or at least it seems that way). Enter "multiplexing" which is essentially a wonderful feature in newer SSH that allows you to start one instance with a control channel that handles many of the slow parts (such as authentication) and so long as it is still running when you connect to the same remote location the new connection uses the existing control channel and is lightning fast getting you to the prompt. Simple enough, to turn on multiplexing you can add the following to your ~/.ssh/config:

# for multiplexing
ControlMaster auto
ControlPath ~/.ssh/multi/master-%r@%h:%p

The above indicates that a master should be used for each connection, and the location of where to store the master's control socket file. The %r, %h, and %p are expanded to the login, host, and port respectively which is usually enough to make it unique. This should be enough to start using multiplexing, but...and you knew there had to be one...when the master's control connection is lost all of the slaves to that connection lose their's as well. With the occasional hung terminal window, or accidental closing of it (if you can remember which is master to begin with), etc. you quickly find that when you normally would not lose connection in a separate terminal window you all of a sudden have lost all of your connections (8+ in my case) which is really painful. Here is where the fun comes in, I use the "-n" and "-N" flags to SSH in a terminal window when I first load up an X session and background the process:

> ssh -Nn user@remote &

The above redirects stdin from /dev/null (a necessary evil when backgrounding SSH procs), prevents the execution of a remote command (meaning we don't want a shell), and puts the new process in the local shell's background. Unfortunately, and the part that took me the longest to figure out, is that SSH really likes to have a TTY around (we aren't using the daemon after all) so simply killing the original terminal window will cause the SSH process to die and zap there went your control connection and all the little children. To get around this little snafu I follow the backgrounding of my SSH process with a bash specific built in disassociating it from the TTY:

> disown

Now I am free to close the original terminal window, the SSH process lives on in the background (as if it were a daemon) and keeps the control connection open so that whenever I use SSH to that remote location (or 'scp', etc.) I get an instant response.

Other notes

I could probably set this up to occur when I start X initially automatically, but with a flaky connection I end up needing to do it a few times per X session anyways and you would have to watch out for the sequence of events making sure it occurred after any ssh-agent set up was required.

I tried using 'nohup' and can't remember if I ran into an actual show stopper problem, or if I could just never quite get it to do what I wanted before I stumbled on bash's disown.