Welcome to End Point’s blog

Ongoing observations by End Point people

Multi-store Architecture for Ecommerce

Something that never seems to go out of style in ecommerce development is the request for multi-site or multi-store architecture running on a given platform. Usually there is interest in this type of setup to encourage build-out and branding of unique stores that have shared functionality.

A few stores driven by a multi-store architecture, developed with End Point support.

End Point has developed several multi-store architectures on open source ecommerce platforms, including College District (Interchange/Perl) and Fantegrate (Spree/Rails). Here's an outline of several approaches, with the advantages and disadvantages of each.

Option #1: Copy of Code Base and Database for Every Site

This option requires multiple copies of the ecommerce platform code base, and multiple database instances connected to each code base. The stores could even be installed on different servers. This solution isn't a true multi-store architecture, but it's certainly the first stop for a quick and dirty approach to deploy multiple stores.

The advantages to this method are:

  • Special template logic doesn't have to be written per site — the templates would simply follow the ecommerce platform's template pattern.
  • Relative to Option #3 described below, no custom database development is required.
  • Custom business logic may be more easily applied to a set of the stores, without affecting the other stores.

The disadvantages to this method are:

  • Maintenance of the applications can be time consuming, as changes must be applied to all instances.
  • Custom changes must be applied to all multi-store instances.
  • Users and administrator accounts are not shared across multiple stores.

Option #2: Single Code Base, Single Database

In this method, there is one copy of the source code that interacts with one database. The single database would be modified to contain a store-specific id on the product, order, and peripheral tables. The code base would also have to be modified to limit the visible products to the current store. In this method, the individual store may be identified by the domain or subdomain. With this method, there may also be code customization that allows for custom templates per store.

The advantages to this method are:

  • Relative to Option #1, maintenance for one code base is relatively simple.
  • User and administrator accounts are shared across multiple stores.
  • Super administrators may view and manage data from one backend administrative interface.

The disadvantages to this method are:

  • Rights and role management can be complicated.
  • Development is required for code and database customization.
  • Development is required for coding to handle flexible templating across stores.

This second option in multi-store architecture uses a data model with store-specific entries in various tables.
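The data-model change this option implies can be sketched as follows. The table, column, and domain names below are invented for illustration, with SQLite standing in for the real database: each core table gains a store_id, and every query is scoped to the store resolved from the request's domain.

```python
import sqlite3

# Hypothetical single-database schema: every core table carries a store_id.
# (Names here are invented for illustration; SQLite stands in for the real DB.)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stores (id INTEGER PRIMARY KEY, domain TEXT)")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, store_id INTEGER, name TEXT)")
conn.executemany("INSERT INTO stores VALUES (?, ?)",
                 [(1, "store-one.example.com"), (2, "store-two.example.com")])
conn.executemany("INSERT INTO products (store_id, name) VALUES (?, ?)",
                 [(1, "Widget"), (1, "Gadget"), (2, "Gizmo")])

# The application resolves the current store from the request's domain or
# subdomain, then scopes every product query to that store.
def visible_products(conn, domain):
    rows = conn.execute(
        "SELECT p.name FROM products p "
        "JOIN stores s ON s.id = p.store_id "
        "WHERE s.domain = ? ORDER BY p.id", (domain,))
    return [name for (name,) in rows]

print(visible_products(conn, "store-one.example.com"))  # ['Widget', 'Gadget']
```

The scoping condition has to be applied in every code path that touches store data, which is exactly the development cost this option trades for its single code base.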

Option #3: Single Code Base, Single Database with Schemas or Views Per Store

In this method, there is one copy of the source code that interacts with a database that has views specific to that store, or a schema specific to that store. In this case, the code base would not necessarily need customization since the data model it accesses should follow the conventions of the ecommerce platform. However, moderate database customization is required in this method. With this method, there may also be code customization that allows for custom templates per store.

The advantages to this method are:

  • Relative to Option #1, maintenance for one code base is relatively simple.
  • Relative to option #2, code base changes are minimal.
  • User accounts may or may not be shared across stores.
  • Relative to option #2, there may be a potential performance gain by removing time spent limiting data to the current store instance.

The disadvantage to this method is:

  • Customization and development is required for database configuration and management of multi-store database schemas.
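The schema-or-view idea above can be sketched like so (names are again invented, and SQLite views stand in for a real platform's schemas or views): the shared table carries a store_id, but each store's application queries a view that already filters on it, so the platform's own queries need no modification.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE all_products (id INTEGER PRIMARY KEY, store_id INTEGER, name TEXT)")
conn.executemany("INSERT INTO all_products (store_id, name) VALUES (?, ?)",
                 [(1, "Widget"), (1, "Gadget"), (2, "Gizmo")])

# One view per store, each presenting the shape the ecommerce platform
# expects; the code base queries its "products" with no store_id logic at all.
conn.execute("CREATE VIEW store1_products AS "
             "SELECT id, name FROM all_products WHERE store_id = 1 ORDER BY id")
conn.execute("CREATE VIEW store2_products AS "
             "SELECT id, name FROM all_products WHERE store_id = 2 ORDER BY id")

print([name for (_, name) in conn.execute("SELECT * FROM store2_products")])  # ['Gizmo']
```

In Postgres the same effect is more often achieved with one schema per store and a per-store search_path, which is where the database configuration and management cost of this option comes from.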

A tangential variation on the methods above is two different code bases with distinct functionality attached to one back-end web service and backing database, such as the architecture we implemented for Locate Express. A similar variation I've investigated before might use a Sinatra-driven front-end and a Rails-backed admin, such as RailsAdmin used in Piggybak.

College District has a collection of stores driven by a multi-store architecture, developed with End Point support.


In most cases for our clients, a cost-benefit analysis drives the decision between the three options described above. Option #1 might be an acceptable solution for someone interested in building out two or three stores, but the latter two options would be more suitable for someone interested in spinning up many additional instances quickly with lower long-term maintenance costs.

Rails 3 remote delete link handlers with Unobtrusive Javascript

I recently encountered a bug in a Rails 3 application that used a remote link_to tag to create a Facebook-style "delete comment" link using unobtrusive javascript. I had never worked with remote delete links like this before, so I figured I’d run through how I debugged the issue.

Here are the relevant parts of the models we're dealing with:

class StoredFile < ActiveRecord::Base
  has_many :comments, :dependent => :destroy
end

class Comment < ActiveRecord::Base
  belongs_to :user
  belongs_to :stored_file
end

Here's the partial that renders a single Comment (from the show.html.erb view for a StoredFile) along with a delete link if the current_user owns that single Comment:

<%= comment.content %> -<%= comment.user.first_name %>
<% if comment.user == current_user %>
  <%= link_to 'X', stored_file_comment_path(@stored_file, comment), :remote => true, :method => :delete, :class => 'delete-comment' %>
<% end %>

Here’s a mockup of the view with 3 comments:

At first, the bug seemed to be that the "X" wasn't actually a link and therefore didn't do anything. Clicking the "X" with Firebug enabled told a different story. There was a link there (hidden by sneaky CSS), and the Firebug console showed that it was sending the appropriate request to the correct URL: /stored_files/78/comments/25

The development.log file on the server corroborated the story and showed a successful delete:

Started DELETE "/stored_files/78/comments/25"
  Processing by CommentsController#destroy as JS
  Parameters: {"stored_file_id"=>"78", "id"=>"25"}
  SQL (0.6ms)  DELETE FROM "comments" WHERE "comments"."id" = 25
  Completed 200 OK in 156ms

So far, so good. Now I knew that our client code was making the correct request and that the Rails app was handling it appropriately. The existing code "worked," but it still didn't provide any UI feedback to inform the user. I needed to write some jQuery to handle the successful (HTTP 200) server response to our unobtrusive javascript call.

Normally, when writing my own handler (e.g. to handle a button click) to initiate an Ajax call with jQuery, I’d use $.ajax or $.post and use its built-in success handler. Something like this:

$.ajax({
    type: 'POST',
    url: 'my_url',
    data: { param_name1: 'param_value1' },
    success: function(data) {
        alert('Successfully did the needful!');
    }
});
It turns out that you still define an event handler when handling server responses to unobtrusive javascript; it's just that the syntax is very different. I needed to bind to the 'ajax:success' event fired by any of my comment delete links (where class="delete-comment", as specified in my link_to call).

$(document).on('ajax:success', '.delete-comment', function() {
    // .parent() is the div containing this "X" delete link
    $(this).parent().slideUp();
});
(Note that I happen to be using the newer jQuery 1.7+ .on() method and argument list instead of the older, more common .live() style. In this case, they are functionally equivalent, but the methods that .on() replaces are deprecated in jQuery 1.7.)

Now, when the user clicks the "X" to delete one of their own comments, the successful unobtrusive javascript call is detected and the div containing that single comment is neatly hidden with the help of jQuery's slideUp() method. 

Much better!


Liquid Galaxy at the World Oceans Summit

Jenifer Austin Foulkes, Product Manager for Google Ocean, contacted End Point about using Liquid Galaxy as a presentation platform at ocean-centric conferences around the world. The World Oceans Summit was held at the Capella Singapore Resort, February 22nd through February 24th, 2012, and Liquid Galaxy was there:

World Oceans Summit

Josh Tolley of End Point specially developed a Google Earth tour for the Liquid Galaxy featuring Hope Spots:

"Hope Spots are special places that are critical to the health of the ocean, Earth’s blue heart."
    - From the Sylvia Earle Alliance website.

The Underwater Earth Seaview Project was unveiled for the first time in Singapore. Liquid Galaxy showcased some of the amazing panoramas created by the project. The panoramas are beautiful in the immersive format that a Liquid Galaxy provides.

"We’re on a mission to reveal our oceans. Why? Because the biggest threat to life in our oceans is the fact that it is out of sight and out of mind.”
      - From the Underwater Earth website.

Gap Kim from Google Singapore worked hard to get approvals to bring a portable Liquid Galaxy to the World Ocean Summit and also to have it subsequently installed at the Google Singapore office.

The green light for proceeding came just 13 days before the beginning of the Summit. Work on Ocean tours and conversion of the panoramas had been underway for some time but a Liquid Galaxy needed to be built and ready for shipment in 5 days, so a full court press was needed! Adam Vollrath and Ben Goldstein spent many hours compressed into a few days assembling and testing the system and packing it for shipment. Josh Tolley continued work on the panoramas and tour. Kiel Christofferson had been out in the field upgrading several Liquid Galaxy systems and he provided some crucial last-minute upgrades for this Singapore Liquid Galaxy while flying back to New York from California. (Hooray for WiFi on airplanes!) For my part it was all about providing shipping/receiving, import specifications and coordination for delivery of the Liquid Galaxy in this very short time frame. I knew it would come together. With this crew it always does.

Cut to February 23, Singapore. Adam was there but the monitors (another story) and the Liquid Galaxy would not be there together until late. Adam adapted and worked overnight before a full day of presentation at the Summit. Jenifer provided this video on the closing day of the Summit:

My favorite part of this video is at the very end when we hear Jenifer say "Yaaay!" You always like to hear the client say that. After reaching across the globe to help her save the oceans it feels particularly good.

Gerard Drazba,
Liquid Galaxy Project Manager
End Point Corporation

Spring authentication plugin

One of our clients regularly deploys Pentaho with their application, and wanted their users to be able to log in to both applications with the same credentials. We could, of course, have copied the user information from one application to another, but Pentaho, and the Spring system it uses for authentication, allows us to be much more elegant.

Spring is often described as an "application development framework", or sometimes an "Inversion of Control container", which essentially means that you can redefine, at run time, exactly which objects perform various services within your application. That you have to navigate a bewildering and tangled web of configuration files in order to achieve this lofty goal, and that those files suffer from all the verbosity you'd expect from the combined forces of XML and Java, normally isn't as loudly proclaimed. Those inconveniences notwithstanding, Spring can let you do some pretty powerful stuff, like in our case, redefining exactly how a user gets authenticated by implementing a few classes and reciting the proper incantation in the configuration files.

Spring handles most of the plumbing in the Pentaho authentication process, and it all starts with this particular definition from one of the Spring configuration files:
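A representative sketch of such a definition follows. The bean ids and the exact ProviderManager class path here are assumptions, not copied from any particular Pentaho release, and they vary across Spring Security versions:

```xml
<bean id="authenticationManager"
      class="org.springframework.security.providers.ProviderManager">
  <property name="providers">
    <list>
      <!-- Our custom provider, consulted first -->
      <ref bean="customAuthenticationProvider" />
      <!-- Pentaho's stock providers follow -->
      <ref bean="daoAuthenticationProvider" />
      <ref bean="anonymousAuthenticationProvider" />
    </list>
  </property>
</bean>
```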

The authenticationManager bean contains a list of authentication processors, and here's where our custom code begins. These beans must implement the Spring AuthenticationProvider interface, and to customize the authentication we can just put a new bean in this list. The list is ordered; refer to the ProviderManager documentation for details, but essentially any of the beans in the list can accept a set of credentials. Here, we've simply added a reference to our new bean, to the beginning of the list. The new bean is indicated by name only; it needs to be defined elsewhere, like this:
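A representative sketch of such a bean definition, where the class name and the configuration property are placeholders:

```xml
<bean id="customAuthenticationProvider"
      class="com.example.auth.CustomAuthenticationProvider">
  <!-- Matched to a setProperty_1(String) setter in the class -->
  <property name="property_1" value="some value" />
</bean>
```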

Here we've mapped the bean name to a specific class name. We can also add whatever configuration properties we need. So long as there are corresponding setter methods in the actual code (e.g. "public void setProperty_1(String value)") these will Just Work. Along with these setter methods, we must implement the two methods in Spring's AuthenticationProvider interface. The most important is the aptly-named authenticate(), which Spring will call when a user tries to log in.

authenticate() gets one argument, an Authentication object, and when a user presents valid credentials, that's also what it returns. The first thing our new implementation needs to do is get the username and password from the authentication object:
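Sketched against Spring's Authentication interface, that first step looks roughly like this (a sketch, not a complete implementation):

```java
public Authentication authenticate(Authentication authentication)
        throws AuthenticationException {
    String username = authentication.getName();
    String password = authentication.getCredentials().toString();
    // ... validate the credentials, then assemble the user's roles ...
}
```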

At this point we've got two jobs to do: first, validate the credentials, and second (assuming the credentials are valid), determine what roles the user should be given. In this particular case we used the authentication database from the client's application to validate the credentials; that code will be completely different from one bean to the next, so I won't share it here, but the second part, finding the user's roles, is pretty consistent. Spring lets you set up as many roles as you'd like, giving each a different text name, and simply assumes all the different roles will mean something to the application (Pentaho, in this case) later on. For our setup, we look up the user's roles from a database, and then tell Spring about them like this:
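A sketch of that role-granting step, where lookUpRolesFor() stands in for a hypothetical helper that queries our roles table:

```java
// Roles looked up from our database, handed back to Spring as authorities
List<GrantedAuthority> authorities = new ArrayList<GrantedAuthority>();
for (String role : lookUpRolesFor(username)) {
    authorities.add(new GrantedAuthorityImpl(role));
}
// Returning a fully-populated token tells Spring the login succeeded
return new UsernamePasswordAuthenticationToken(username, password, authorities);
```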

At this point, the user has successfully logged in. If we didn't want to allow the user in, we could instead throw an AuthenticationException, and Spring would go on to the next AuthenticationProvider in the list. The beauty of it all is that now that I have a working plugin, I can modify my application's authentication system simply by modifying a configuration file or two (or three) and restarting the application.

Downloading a CSV File From the Django Admin

Django has a very nice admin panel. The admin panel is highly extensible, and some really cool enhancements can be built on top of it. One such enhancement is a custom action.

For the purpose of this article I've created a simple Django project with a simple application containing only one model. The file looks like this:

from django.db import models
from django.contrib import admin

class Stat(models.Model):
    code = models.CharField(max_length=100)
    country = models.CharField(max_length=100)
    ip = models.CharField(max_length=100)
    url = models.CharField(max_length=100)
    count = models.IntegerField()

class StatAdmin(admin.ModelAdmin):
    list_display = ('code', 'country', 'ip', 'url', 'count')

admin.site.register(Stat, StatAdmin)

I’ve also added a couple of rows in the database table for this model. The admin site for this model looks like this:

Now I want to be able to select some rows and download a CSV file right from the Django admin panel. The file should contain only the information about selected rows.

This can be done really easily with the admin actions mechanism. Above the table of rows there is an actions menu. There is one default action, "Delete selected stats". To use an action you select the rows, choose the action from the combo box, and press the OK button.

I will add another action there, which will be named "Download CSV file for selected stats".

Add the action.

First of all I will add the custom action. For this I will enhance the StatAdmin class with an actions field, and add the method called when someone runs this action (all changes from the previous version are highlighted):

class StatAdmin(admin.ModelAdmin):
    actions = ['download_csv']
    list_display = ('code', 'country', 'ip', 'url', 'count')

    def download_csv(self, request, queryset):
        pass

    download_csv.short_description = "Download CSV file for selected stats."

In the admin panel for the stats model you can notice that there is the new action. The above code doesn’t do anything useful, so let’s generate the CSV file.

The whole idea is to generate the CSV file in memory only, without touching the disk, and return it to the user without any redirection to another page (the file should download automatically after pushing the OK button).

Do it in small steps:

Generate the CSV

For this I will use the CSV module from Python’s standard library and the function now looks like this:

def download_csv(self, request, queryset):
    import csv
    f = open('some.csv', 'wb')
    writer = csv.writer(f)
    writer.writerow(["code", "country", "ip", "url", "count"])
    for s in queryset:
writer.writerow([s.code, s.country, s.ip, s.url, s.count])

After selecting two rows and running the action, it created a file some.csv in the main project directory with the following content.

That’s OK, generating the CSV file works, however it shouldn’t be stored on the disk.

Return the file directly into the browser.

I want to send the file to the client right after clicking on the OK button. This is fairly easy, the whole magic is to use proper HTTP headers. I will modify the method to look like this:

def download_csv(self, request, queryset):
    import csv
    from django.http import HttpResponse

    f = open('some.csv', 'wb')
    writer = csv.writer(f)
    writer.writerow(["code", "country", "ip", "url", "count"])

    for s in queryset:
writer.writerow([s.code, s.country, s.ip, s.url, s.count])


    f.close()
    f = open('some.csv', 'r')
    response = HttpResponse(f, content_type='text/csv')
    response['Content-Disposition'] = 'attachment; filename=stat-info.csv'
    return response

So the main changes are:

  • closed the file and reopened it for reading
  • added headers for the proper content type and file name

The result is that when someone clicks on the OK button, the browser automatically starts downloading the stat-info.csv file.

Don’t use disk.

The only thing left is to create the file in memory. For this I will use the StringIO module. It is a nice module implementing exactly the same interface as a file, so I can use it in place of one. StringIO operates only on memory, without any disk operations.

def download_csv(self, request, queryset):
    import csv
    from django.http import HttpResponse
    import StringIO

    f = StringIO.StringIO()
    writer = csv.writer(f)
    writer.writerow(["code", "country", "ip", "url", "count"])

    for s in queryset:
        writer.writerow([s.code, s.country, s.ip, s.url, s.count])

    f.seek(0)
    response = HttpResponse(f, content_type='text/csv')
    response['Content-Disposition'] = 'attachment; filename=stat-info.csv'
    return response

As you can see the changes are:

  • added import StringIO
  • changed opening the file to creating a new StringIO object
  • there is no reopening of the file, only a seek to set the marker at the beginning of the file

Everything is finished now. There is a new action in the admin panel which generates a new CSV file with information about chosen rows and it doesn’t do any browser redirection.

Perl, UTF-8, and binmode on filehandles

Original image by avlxyz

I recently ran into a Perl quirk involving UTF-8, standard filehandles, and the built-in Perl die() and warn() functions. Someone reported a bug in the check_postgres program in which the French output was displaying incorrectly. That is, when the locale was set to fr_FR, the French accented characters generated by the program were coming out as "byte soup" instead of proper UTF-8. Some other languages, English and Japanese among them, seemed to be fine. For example:

## English: "sorry, too many clients already"
## Japanese: "現在クライアント数が多すぎます"
## French expected: "désolé, trop de clients sont déjà connectés"
## French actual: "d�sol�, trop de clients sont d�j� connect�s"

That last line should be very familiar to anyone who has struggled with Unicode on a command line, with those question marks on an inverted background. Our problem was that the output of the script looked like the last line, rather than the one before it. The Japanese output, despite being chock full of Unicode, does have the same problem! More on that later.

I was able to duplicate the problem easily enough by setting my locale to fr_FR and having check_postgres output a message with some non-ASCII characters in it. However, as noted above, some languages were fine and some were not.

Before going any further, I should point out that this Perl script did have a use utf8; at the top of it, as it should. This does not dictate how things will be read in or output, but merely tells Perl that the source code itself contains UTF-8 characters. Now to the quirky parts.

I normally test my Perl scripts on the fly by adding a quick series of debugging statements with warn() or die(). Both go to stderr, so it is easy to separate your debugging statements from the normal output of the code. However, when I output the non-ASCII message in question immediately after it was defined in the script, it showed up as a normal, expected UTF-8 string. So I started tracking things through the code, to see if there was some point at which the apparently normal UTF-8 string got turned back into byte soup. It never did; I finally realized that although print was outputting byte soup, both warn() and die() were outputting UTF-8! Here's a sample script to better demonstrate the problem:


use strict;
use warnings;
use utf8;

my $msg = 'This is a micro symbol: µ';

print "print = $msg\n";
warn "warn = $msg\n";
die "die = $msg\n";

Now let's run it and see what happens:

print = This is a micro symbol: �
warn = This is a micro symbol: µ
die = This is a micro symbol: µ

So we've found one Perl quirk: the output of print() and warn() are different, as warn() manages to correctly output the string as UTF-8. Perhaps it is just that the stdout and stderr filehandles are using different encodings? Let's take a look by expanding the script and explicitly printing to both stdout and stderr. We'll also add some other Unicode characters, to emulate the difference between French and Japanese above:


use strict;
use warnings;
use utf8;

my $msg = 'This is a micro symbol: µ';
my $alert = 'The radioactive snowmen come in peace: ☢ ☃☃☃ ☮';

print STDOUT "print to STDOUT = $msg\n";
print STDOUT "print to STDOUT = $alert\n";

print STDERR "print to STDERR = $msg\n";
print STDERR "print to STDERR = $alert\n";

warn "warn = $msg\n";
warn "warn = $alert\n";

(Note: if you do not see small literal snowmen characters in the above script, you need to get a better browser or RSS reader!)

print to STDOUT = This is a micro symbol: �
Wide character in print at utf12 line 11.
print to STDOUT = The radioactive snowmen come in peace: ☢ ☃☃☃ ☮
print to STDERR = This is a micro symbol: �
Wide character in print at utf12 line 14.
print to STDERR = The radioactive snowmen come in peace: ☢ ☃☃☃ ☮
warn = This is a micro symbol: µ
warn = The radioactive snowmen come in peace: ☢ ☃☃☃ ☮

There are a number of things to note here. First, the stderr filehandle has the same problem as the stdout filehandle. So, while warn() and die() send things to stderr, there is some magic happening behind the scenes such that sending a string to them is *not* the same as sending it to stderr ourselves via a print statement. Which is a good thing overall, as it would be far weirder for stdout and stderr to have different encoding layers! The solution to this is simple enough: just force stdout to have the proper encoding by use of the binmode function:

binmode STDOUT, ':utf8';

Indeed, the one line above solved the original poster's problem; applying it to our test script shows that the stdout filehandle now outputs things correctly, unlike the stderr filehandle:

print to STDOUT = This is a micro symbol: µ
print to STDOUT = The radioactive snowmen come in peace: ☢ ☃☃☃ ☮
print to STDERR = This is a micro symbol: �
Wide character in print at utf12 line 16.
print to STDERR = The radioactive snowmen come in peace: ☢ ☃☃☃ ☮
warn = This is a micro symbol: µ
warn = The radioactive snowmen come in peace: ☢ ☃☃☃ ☮

The next thing to notice is that the snowmen alert message is displayed properly everywhere. Why is this? The answer lies in the fact that the micro symbol (and the accented French characters) fall into a range that *could* still be ASCII, as far as Perl is concerned. In the absence of any explicit guidance, Perl makes a best guess as to whether a string to be output contains UTF-8 characters or not. In the case of the French and "micro" strings, it guessed wrong, and the characters were output as ASCII.

In the case of the Japanese and "snowmen" strings, it still guessed wrong, even though the strings contained higher bytes that left no doubt we had left ASCII-land and were exploring the land of Unicode. In other words, even though they were still not coming out as pure UTF-8, there is no direct ASCII equivalent, so they appear as the characters one would expect. Note, however, that Perl still emits a wide character warning, for it recognizes that something is probably wrong. The warnings go away when we use binmode to force the encoding layer to :utf8.

The correct solution when dealing with UTF-8 is to be explicit and not let Perl make any guesses. Solutions vary, but here the fix was the combination of use utf8; and binmode STDOUT, ':utf8';. While I was able to duplicate the problem right away, the combination of Perl making inconsistent guesses and the odd behavior of warn() and die() turned this from a quick fix into a slightly longer investigation. Yes, Unicode and Perl have given me quite a few gray hairs over the years, but I always feel better when I look at how *other* languages handle Unicode. :)

World IPv6 Launch: 6 June 2012

For any of our readers who don’t know: The classic Internet Protocol (IPv4) supported around 4 billion IP addresses, but it recently ran out of free addresses. That the addresses would eventually all be used was no surprise. For more than a decade, a replacement called IPv6 has been under development and allows practically unlimited addresses.

Last year there was a one-day trial run called World IPv6 Day. Our own Josh Williams wrote about it here. It was the first major attempt for mainstream websites to enable dual-stack IPv4 and IPv6 networking, so that both the “old” and “new” Internet could access the same site. It was intended to bring to the surface any problems, and it went very well – most people never knew it was happening, which was the goal.

World IPv6 Launch is 6 June 2012 – The Future is Forever

This year there's a much bigger event planned: World IPv6 Launch, and this time IPv6 is meant to stay on. Google, Facebook, Yahoo!, Bing, and many other major sites are participating. A big advance over last year's event is that many ISPs and vendors of home networking gear are also participating. That means it won't just be a test that classic IPv4 still works for people alongside IPv6, but that for some customers, native IPv6 starts working end to end.

Last year, Josh mused that it would have been most appropriate for the day to be June 6, 6-6, for IPv6 and sixes everywhere. I suspect that wasn’t done because it fell on a Monday in 2011, and there are enough new support complaints backlogged from the weekend without adding IPv6 to the mix! But this year it is on June 6.

We got our own website running on IPv6 in time for last year's World IPv6 Day. A few months ago, we switched on IPv6 for a few new customer sites as well. For this year's event we plan to prepare most of our remaining internal infrastructure to be dual-stack, and to be ready to enable IPv6 for any customer sites upon request.

What can you do to join in and help free the Internet of its address shortage? Try an IPv6 test site to see how compatible your own current Internet access is. If your ISP doesn't yet offer IPv6, ask them when they will. Demand drives support. And visit Hurricane Electric's IPv6 Tunnel Broker to set up a free tunnel from your location to the IPv6 Internet. It's a very nice service.

Have fun, and if all goes well, the beginning of widespread adoption of IPv6 starts on June 6!

Tracking down PostgreSQL XYZ error: tablespace, database, and relfilenode

One of our Postgres clients recently had this error show up in their logs:

ERROR: could not read block 3 of relation 1663/18421/31582:
read only 0 of 8192 bytes

Because we were using the tail_n_mail program, the above error was actually mailed to us within a minute of it occurring. The message is fairly cryptic, but it basically means that Postgres could not read data from a physical file that represented a table or index. This is generally caused by corruption or a missing file. In this case, the "read only 0 of 8192" indicates this was most likely a missing file.

When presented with an error like this, it's nice to be able to figure out which relation the message is referring to. The word "relation" is Postgres database-speak for a generic object in the database: in this case, it is almost certainly going to be a table or an index. Both of those are, of course, represented by actual files on disk, usually inside of your data_directory. The number given, 1663/18421/31582, is in the standard X/Y/Z format Postgres uses to identify a file, where X represents the tablespace, Y is the database, and Z is the file.
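As a concrete sketch of that mapping, using this example's numbers and a typical data_directory (both assumptions for illustration), the triple becomes a path on disk like so:

```shell
# X/Y/Z from the error message: tablespace OID / database OID / relfilenode
X=1663; Y=18421; Z=31582
DATA_DIRECTORY=/var/lib/pgsql/data

# 1663 is pg_default, which lives in data_directory/base; any other X means
# looking up that OID's spclocation in pg_tablespace instead.
if [ "$X" = 1663 ]; then
    FILE="$DATA_DIRECTORY/base/$Y/$Z"
else
    FILE="<spclocation for tablespace OID $X>/$Y/$Z"
fi
echo "$FILE"
```

The sections below walk through each of the three lookups in turn.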

The first number, X, indicates which tablespace this relation belongs to. Tablespaces are physical directories mapped to internal names in the database. Their primary use is to allow you to put tables or indexes on different physical disks. The number here, 1663, is a very familiar one, as it almost always indicates the default tablespace, known as pg_default. If you do not create any additional tablespaces, everything will end up here. On disk, this will be the directory named base underneath your data_directory.

What if the relation you are tracking is not inside of the default tablespace? The number X represents the OID inside the pg_tablespace system table, which will let you know where the tablespace is physically located. To illustrate, let's create a new tablespace and then view the contents of the pg_tablespace table:

$ mkdir /tmp/pgtest
$ psql -c "CREATE TABLESPACE ttest LOCATION '/tmp/pgtest'"

$ psql -c 'select oid, * from pg_tablespace'
  oid  |  spcname   | spcowner | spclocation | spcacl | spcoptions
-------+------------+----------+-------------+--------+------------
  1663 | pg_default |       10 |             |        |
  1664 | pg_global  |       10 |             |        |
 78289 | ttest      |       10 | /tmp/pgtest |        |
(3 rows)

Thus, if X were 78289, it would lead us to the tablespace ttest, and we would know that the file we were ultimately looking for will be in the directory indicated by the spclocation column, /tmp/pgtest. If that column is blank, it means the directory to use is data_directory/base.

The second number in our X/Y/Z series, Y, indicates which database the relation belongs to. You can look this information up by querying the pg_database system table like so:

$ psql -xc 'select oid, * from pg_database where oid = 18421'
-[ RECORD 1 ]-+-----------
oid           | 18421
datname       | foobar
datdba        | 10
encoding      | 6
datcollate    | en_US.utf8
datctype      | en_US.utf8
datistemplate | f
datallowconn  | t
datconnlimit  | -1
datlastsysoid | 12795
datfrozenxid  | 1792
dattablespace | 1663
datacl        |

The columns may look different depending on your version of Postgres - the important thing here is that the number Y maps to a database via the oid column - in this case, the database foobar. We need to know which database so we can query the correct pg_class table in the next step. We did not have to worry about that until now, as the pg_tablespace and pg_database tables are two of the very few shared system catalogs.

The final number in our X/Y/Z series, Z, represents a file on disk. You can look up which relation it is by querying the pg_class system table of the correct database:

$ psql -d foobar -c "select relname,relkind from pg_class where relfilenode=31582"
 relname | relkind
---------+---------
(0 rows)

No rows, so as far as Postgres is concerned that file does not exist! Let's verify that this is the case by looking on the disk. Recall that X was the default tablespace, which means we start in data_directory/base. Once we are in that directory, we can look for the subdirectory holding the database we want (Y or 18421) - it is named after the OID of the database. We can then look for our relfilenode (Z or 31582) inside of that directory:

$ psql -c 'show data_directory'
   data_directory
---------------------
 /var/lib/pgsql/data
(1 row)

$ cd /var/lib/pgsql/data
/var/lib/pgsql/data $ cd base
/var/lib/pgsql/data/base $ cd 18421
/var/lib/pgsql/data/base/18421 $ stat 31582
stat: cannot stat `31582': No such file or directory

So in this case, we confirmed that the relfilenode was no longer there! Had the file still been present, we could probably surmise that the file on disk was corrupted somehow. If the relation were an index, the solution would be to simply run a REINDEX INDEX indexname on it, which recreates the entire index with a new relfilenode. If it were a table, then things get trickier: we could try a VACUUM FULL on it, which rewrites the entire table, but you would most likely need to go back to your last SQL backup or take a look at your PITR (Point-In-Time Recovery) server.

So why would a relfilenode file not exist on disk? There are a few possibilities:

→ We are looking in the wrong pg_class table (i.e. user error). Each database has its own copy of pg_class, with different relfilenodes. This means that each subdirectory corresponding to a database has its own set of files as well.

→ It may be a bug in Postgres. Unlikely, unless we have exhausted the other possibilities.

→ Bad RAM or a bad disk may have caused a flipped bit somewhere, for example changing the relfilenode from 12345 to 12340. Possible, but still unlikely.

→ The relfilenode file was removed by something. This is the most likely explanation. We've already hinted above at one way this could happen: a REINDEX. Since the client in this story was (is!) prudently running with log_statement = 'all', I was able to grep back through the logs and found that a REINDEX of a few system tables, including pg_depend, was kicked off a second before the error popped up. While it's impossible to know exactly what the missing relfilenode referred to, the REINDEX is as close to a smoking gun as we are going to get. So the query started, a REINDEX removed one of the indexes it was using, and then the error occurred as Postgres tried to access that index.

In this case, we were able to simply rerun the query, and it worked as expected. In normal everyday usage this error should not appear, even when reindexing system tables, but should something like this happen to you, at least you will know what those numbers mean. :)

Django and Virtual Environments

When you have to work with a bunch of different Python applications, the usual problem is that you have to deal with plenty of different packages in different versions. Each application needs its own set of libraries. Usually the versions of the libraries vary between the applications.

To solve all these problems you can create Python virtual environments. There is a great tool for this: virtualenv, which creates virtual environments for Python. Using it directly is not very convenient, however. There is a wrapper for it, called virtualenvwrapper, which wraps all the virtualenv commands into a couple of shell commands.

Let's assume that I need to work on two applications written in Django 1.2 and Django 1.3. Each of the applications needs a different set of packages in different versions. I will create two virtual environments.

Installing virtualenvwrapper on Ubuntu is pretty easy:

$ sudo apt-get install virtualenvwrapper

After the installation there are a couple of new commands. The basic one is mkvirtualenv, which creates a new environment. Let’s create one.

$ mkvirtualenv django_demo_12

This command automatically switches to the new environment, so you might notice that the prompt has changed. The prompt always starts with the name of the current virtual environment.

Let’s create another one, called django_demo_13 (to use Django 1.3 there).

(django_demo_12)$ mkvirtualenv django_demo_13

The list of environments is printed by the command workon, when called without arguments.

$ workon

As you can see, there are two environments ready to use. You can pass the name of a virtual environment as a parameter to the workon command. Now let’s install Django 1.2 in the environment django_demo_12.

First of all switch to the new environment:

$ workon django_demo_12

Now the prompt changed, so you can always be sure which Python virtual environment you are using.


Now Django should be installed. There are a couple of ways to install it. The one I prefer is to create a text file with the names and versions of all needed packages. This file will be named requirements.txt and will contain only this one line so far (other packages will be added later):

Django==1.2.7
To install the packages listed in the file, I will use the command "pip install -r requirements.txt":

(django_demo_12)$ pip install -r requirements.txt
Downloading/unpacking Django==1.2.7 (from -r requirements.txt (line 1))
Downloading Django-1.2.7.tar.gz (6.4Mb): 6.4Mb downloaded
Running egg_info for package Django

Installing collected packages: Django
Running install for Django
changing mode of build/scripts-2.7/ from 664 to 775

changing mode of /home/szymon/.virtualenvs/django_demo_12/bin/ to 775
Successfully installed Django
Cleaning up...

Now I can check which Django version is installed:

(django_demo_12)$ django-admin.py --version
1.2.7

Now I will create a standard Django project:

(django_demo_12)$ django-admin.py startproject django_demo_12

The only additional thing here is to move the requirements.txt file into the Django project:

(django_demo_12)$ mv requirements.txt django_demo_12

To create the application using Django 1.3 the steps are similar. The first thing is to switch to the other virtual environment:

(django_demo_12)$ workon django_demo_13

From this moment on it is almost the same as in the previous environment, except that the requirements file should contain:

Django==1.3
The commands are:

(django_demo_13)$ pip install -r requirements.txt
(django_demo_13)$ django-admin.py startproject django_demo_13
(django_demo_13)$ mv requirements.txt django_demo_13

So now there are two different Python environments, totally separated from each other. When I install something in one of them, it is not installed in the other, so I can have different packages for different Django versions.

The best way to install a package here is to update the requirements.txt file and run "pip install -r requirements.txt" once again. Later it will be easier to hand the whole project to another programmer, who can then run the same command on his computer and automatically install all needed packages (each in exactly the needed version).
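As a side note, a pinned line like Django==1.2.7 is just a package name and an exact version joined by ==. A rough sketch of how such a line breaks down (parse_pin is my own illustrative helper, not part of pip):

```python
def parse_pin(line):
    """Split a 'Name==version' requirements.txt line into name and version."""
    name, _, version = line.strip().partition("==")
    return name, version

print(parse_pin("Django==1.2.7"))  # → ('Django', '1.2.7')
```

Because the version is exact, pip resolves the same package every time, which is what makes the requirements file reproducible across machines.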

There is one simple command left. Sometimes you just want to remove the virtual environment from your path and use the standard Python libraries installed in the system. It can be done using this command:

(django_demo_13)$ deactivate

DevCamps setup with Ruby 1.9.3, rbenv, Nginx and Unicorn

I was working with Steph Skardal on the setup of a new DevCamps installation that was going to need to use Ruby 1.9.3, Rails 3, Unicorn and Nginx. This setup was going to be much different than a standard setup due to the different application stack that was required.

The first trick was getting Ruby 1.9.3 onto the server. We were using Debian Squeeze, which still only comes with Ruby 1.9.1. We wanted Ruby 1.9.3 for the increased overall speed and the significant speed increase with Rails 3. We decided on using rbenv for this task. It's a very easy-to-set-up utility that allows you to maintain multiple versions of Ruby in your system user account without the headache of adjusting anything but the PATH environment variable. It takes advantage of another easy-to-set-up utility called ruby-build to handle the actual installation of the Ruby source code.

A quick and easy version for setting up a user with this is as follows:

Ensure you are in the home directory. Then, clone the repository into a .rbenv directory:

git clone git:// .rbenv

Adjust your user's PATH to find the newly installed commands:

echo 'export PATH=$HOME/.rbenv/shims:$HOME/.rbenv/bin:$PATH' >> ~/.bash_profile

Install Ruby version 1.9.3-p0:

rbenv install 1.9.3-p0

Make Ruby version 1.9.3-p0 your default version every time you log in:

rbenv global 1.9.3-p0

Install the bundler gem for Ruby version 1.9.3-p0:

gem install bundler

Refresh rbenv to let it know the new bundler command exists:

rbenv rehash

Now you are ready to use the bundler gem to install any other gems required for the application.

The normal camps setup assumes you are going to be using Apache for the web server. In this case, we wanted to use Nginx due to memory constraints. We decided to use the proxy capability and just proxy through to Unicorn instead of having to build our own version of Nginx to use Passenger. To do this, we used a feature of the local-config file in camps that allows you to skip the Apache setup and use your own commands to start, stop, and restart your web server and application. Here is the example from our local-config that controls Nginx and Unicorn. This approach could also be used with Interchange or any other application if you need other services started when mkcamp is run.

httpd_start:/usr/sbin/nginx -c __CAMP_PATH__/nginx/nginx.conf
httpd_stop:pid=`cat __CAMP_PATH__/var/run/ 2>/dev/null` && kill $pid
httpd_restart:pid=`cat __CAMP_PATH__/var/run/ 2>/dev/null` && kill -HUP $pid || /usr/sbin/nginx -c __CAMP_PATH__/nginx/nginx.conf
app_stop:pid=`cat __CAMP_PATH__/var/run/ 2>/dev/null` && kill $pid
app_restart:pid=`cat __CAMP_PATH__/var/run/ 2>/dev/null` && kill $pid ; sleep 5 ;  __CAMP_PATH__/bin/start-app
The contents of the start-app script are simply:
cd __CAMP_PATH__ && bundle exec unicorn_rails -c __CAMP_PATH__/config/unicorn.conf.rb -D
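The __CAMP_PATH__ token in these entries is replaced with each camp's actual path when the camp is created. A minimal sketch of that substitution (the function name and the example camp path are mine, purely illustrative):

```python
def expand_camp_path(template, camp_path):
    """Replace the __CAMP_PATH__ placeholder with a camp's real path."""
    return template.replace("__CAMP_PATH__", camp_path)

# Hypothetical camp path, for illustration only
cmd = expand_camp_path("/usr/sbin/nginx -c __CAMP_PATH__/nginx/nginx.conf",
                       "/home/user/camp12")
print(cmd)  # → /usr/sbin/nginx -c /home/user/camp12/nginx/nginx.conf
```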

You could create one script that handles all aspects of start, stop, and restart if you wanted. This setup really wasn't much harder than a normal Ruby on Rails setup. The added time required to set up rbenv per camp user is offset by the fact that users can manage and try multiple versions of Ruby.

Lock up your keys

Locking hash keys with Hash::Util

It’s a given that you shouldn’t write Perl without “use strict”; it prevents all kinds of silent bugs involving misspelled and uninitialized variables. A similar aid for misspelled and uninitialized hash keys exists in the module “Hash::Util”.

By way of background: I was working on a long chunk of code that prepares an e-commerce order for storage in a database. Many of the incoming fields map directly to the table, but others do not. The interface between this code and the page which submits a large JSON structure was in flux for a while, so from time to time I had to chase bugs involving “missing” or “extra” fields. I settled on a restricted hash to help me squash these and future bugs.

The idea of a restricted hash is to clamp down on Perl’s rather loose “record” structure (by which I mean the common practice of using a hash to represent a record with named fields), which is great in some circumstances. While in most programming languages you must pre-declare a structure and live with it, in Perl hashes you can add new keys on the fly, misspellings and all. A restricted hash can only have a particular set of keys, but is still a hash for all other purposes.

An example:

my %hash = (aaa => 1, bbb => 2);

Attempts to reference $hash{ccc} will not return an error, only an undefined value. We can now lock the hash so that its current roster of keys will be constant:

    use Hash::Util qw(lock_keys);
    lock_keys(%hash);

and now $hash{ccc} is not only undefined, it’s a run-time error:

    Attempt to access disallowed key 'ccc' in a restricted hash

If we know the list of keys before the hash is initialized, we can set it up like this:

    my %hash;
    lock_keys(%hash, qw(aaa bbb ccc));

Keep in mind the values of $hash{aaa}, etc. are mutable (can be undefined, not exist, scalars, references, etc.), just like a normal hash.
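For readers who don't use Perl, the effect of a restricted hash can be approximated in Python with a mapping that rejects keys outside a fixed roster. This is a rough analogue of my own, not a Hash::Util equivalent:

```python
class LockedDict(dict):
    """A dict that only accepts a fixed roster of keys, like a restricted hash.

    Only writes are guarded here; Perl's restricted hashes also reject
    reads of disallowed keys.
    """
    def __init__(self, allowed, **fields):
        self._allowed = frozenset(allowed)
        super().__init__(**fields)
        for key in self:
            if key not in self._allowed:
                raise KeyError("disallowed key %r" % key)

    def __setitem__(self, key, value):
        if key not in self._allowed:
            raise KeyError("Attempt to access disallowed key %r" % key)
        super().__setitem__(key, value)

h = LockedDict(["aaa", "bbb", "ccc"], aaa=1, bbb=2)
h["ccc"] = 3       # fine: 'ccc' is in the roster
try:
    h["ddd"] = 4   # rejected, like the Perl run-time error
except KeyError as e:
    print("rejected:", e)
```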

What if our key roster needs to change over the course of the program? In my example, there were several kinds of transactions being sent via JSON, and I needed to validate and restrict fields based on the presence and values of other fields. E.g.,

    if ($hash{record_type} eq 'A') {
        # validate %hash for aaa, bbb, ccc
    }
    else {
        # validate %hash for aaa, bbb, ddd; ccc should not appear
    }

You can add to or modify the accepted keys as you go, but it’s a two-step process: not even Hash::Util can modify the keys of a locked hash, so you have to unlock and re-lock:

    my %hash;
    lock_keys(%hash, qw(record_type aaa bbb ccc));
    # …
    unlock_keys(%hash);
    if ($hash{record_type} eq 'A') {
        lock_keys(%hash, qw(record_type aaa bbb ccc));
    }
    else {
        lock_keys(%hash, qw(record_type aaa bbb ddd));
    }

Of course, that’s kind of wordy: we’d really rather just splice in a key here and there. Hash::Util has you covered, because you can retrieve the list of legal keys for a hash (even if it’s not currently locked):

    lock_keys_plus(%hash, qw(ddd));

adds ‘ddd’ to the list, keeping the previous keys as well. However, if any of the legal keys are not current keys, they won’t make it into the key roster. Instead, use:

    lock_keys_plus(%hash, (legal_keys(%hash), qw(more keys here)));

Everything shown here for hashes is also available for hashrefs: for instance, to lock up a hashref $hr:

    lock_ref_keys_plus($hr, (legal_ref_keys($hr), qw(other keys)));

Of course, adding all this locking and unlocking adds complexity to your code, so you should consider carefully whether it’s justified. In my case I had 60+ keys, in a nested structure, spanning 1500 lines of code – I just could not keep all the correct spellings in my head any more, so now when I write

    if ($opt->{order_status})

when I mean “transaction_status”, I’ll get a helpful run-time error instead of a silent skip of that block of code.

Are there other approaches? Yes, depending on your needs: JSON::Schema, for instance, will let you validate a JSON structure against a “golden master”. However, it does not prevent subsequent assignments to the structure, creating new keys on the fly (possibly in error). Moose would support a restricted object like this, but may add more complexity than you need, so Hash::Util may be the appropriate, lighter-weight approach.

Download Functionality for Rails Ecommerce

I recently had to build out downloadable product support for a client project running on Piggybak (a Ruby on Rails Ecommerce engine) with extensive use of RailsAdmin. Piggybak's core functionality does not support downloadable products, but it was not difficult to extend. Here are some steps I went through to add this functionality. While the code examples apply specifically to a Ruby on Rails application using paperclip for managing attachments, the general steps here would apply across languages and frameworks.

Data Migration

Piggybak is a pluggable ecommerce engine. To make any model inside your application "sellable", the class method acts_as_variant must be called in that class. This provides nice flexibility in defining various sellable models throughout the application. Given that I will sell tracks in this example, my first step in supporting downloadable content is adding an is_downloadable boolean and attached file fields to the migration for the sellable item. The migration looks like this:

class CreateTracks < ActiveRecord::Migration
  def change
    create_table :tracks do |t|
      # a bunch of fields specific to tracks

      t.boolean :is_downloadable, :null => false, :default => false

      # standard Paperclip attachment columns
      t.string :downloadable_file_name
      t.string :downloadable_content_type
      t.integer :downloadable_file_size
      t.datetime :downloadable_updated_at
    end
  end
end

Class Definitions

Next, I update my class definition to make tracks sellable and hook in paperclip functionality:

class Track < ActiveRecord::Base
  acts_as_variant

  has_attached_file :downloadable,
                    :path => ":rails_root/downloads/:id/:basename.:extension",
                    :url => "downloads/:id/:basename.:extension"
end

The important thing to note here is that the attached downloadable files must not be stored in the public root. Why? Because we don't want users to access the files via a URL through the public root. Downloadable files will be served via the send_file call, discussed below.


Piggybak's order model has_many shipments. In the case of an order that contains only downloadables, shipments can be empty. To accomplish this, I extend the Piggybak::Cart model using ActiveSupport::Concern to check whether or not an order is downloadable, with the following instance method:

module CartDecorator
  extend ActiveSupport::Concern

  module InstanceMethods
    def is_downloadable?
      items = self.items.collect { |li| li[:variant].item }
      items.all? { |i| i.is_downloadable }
    end
  end
end

Piggybak::Cart.send(:include, CartDecorator)

If all of the cart items are downloadable, the order is considered downloadable and no shipment is generated for this order. With this cart method, I show the FREE! value on the checkout page under shipping methods.

Forcing Log In

The next step for adding downloadable support is to add code to enforce user log in. In this particular project, I assume that downloads are not sent as email attachments, since the files may be extremely large. I add a has_downloadable method used to enforce log in:

module CartDecorator
  extend ActiveSupport::Concern

  module InstanceMethods
    def has_downloadable?
      items = self.items.collect { |li| li[:variant].item }
      items.any? { |i| i.is_downloadable }
    end
  end
end

Piggybak::Cart.send(:include, CartDecorator)

On the checkout page, a user is forced to log in if cart.has_downloadable?. After log in, the user bounces back to the checkout page.

Download List Page

After a user has purchased downloadable products, they'll need a way to access these files. Next, I create a downloads page which lists orders and their downloads.

With a user instance method (current_user.downloads_by_order), the download index page iterates through orders with downloads to display each order and its downloads. The user method for generating orders and downloads is shown here:

class User < ActiveRecord::Base
  def downloads_by_order
    self.piggybak_orders.inject([]) do |arr, order|
      downloads = []
      order.line_items.each do |line_item|
        downloads << line_item.variant.item if line_item.variant.item.is_downloadable?
      end

      arr << {
        :order => order,
        :downloads => downloads
      } if downloads.any?

      arr
    end
  end
end
The above method would be a good candidate for Rails low-level caching or another caching strategy, cleared after each user purchase, to minimize download lookups.

Sending Files

As I mentioned above, download files should not be stored in the public directory for public accessibility. From the download list page, the "Download Now" link maps to the following method in the downloads controller:

class DownloadsController < ApplicationController
  def show
    item = ProductType.find(params[:id])

    if current_user.downloads.include?(item)
      send_file "#{Rails.root}/#{item.downloadable.url(:default, false)}"
    else
      redirect_to(root_url, :notice => "You do not have access to this content.")
    end
  end
end
Note that there is additional verification here to check whether the current user's downloads include the requested download. The .url(:default, false) bit hides paperclip's cache buster (e.g. "?123456789") from the url in order to send the file.


This straightforward code accomplishes the major updates required for download support: storing and sending the file, enforcing log in, and handling shipping. In some cases, download support functionality may be more advanced, but the elements described here make up the most basic building blocks.

Read more about End Point's web development and consulting services!

RailsAdmin Import: Part 2

I recently wrote about importing data in RailsAdmin. RailsAdmin is a Rails engine that provides a nice admin interface for managing your data, which comes packed with configuration options.

In a recent Ruby on Rails ecommerce project, I've been using RailsAdmin, Piggybak (a Rails ecommerce gem supported by End Point), and have been building out custom front-end features such as advanced search and downloadable product support. When this client came to End Point with the project, we offered several options for handling data migration from a legacy system to the new Rails application:

  1. Create a standard migration file, which migrates data from the existing legacy database to the new data architecture. The advantage of this method is that it requires virtually no manual interaction during the migration process. The disadvantage is that it's basically a one-off solution and would never be useful again.
  2. Have the client manually enter data. This was a reasonable solution for several of the models that required 10 or fewer entries, but not feasible for the tables containing thousands of entries.
  3. Develop import functionality to plug into RailsAdmin which imports from CSV files. The advantage of this method is that it could be reused in the future. The disadvantage is that data exported from the legacy system would have to be cleaned up and formatted for import.

The client preferred option #3. Using a quick script for generating custom actions for RailsAdmin, I developed a new gem called rails_admin_import to handle import that could be plugged into RailsAdmin. Below are some technical details on the generic import solution.


Using ActiveSupport::Concern, the rails_admin_import gem extends ActiveRecord::Base to add the following class methods:

  • import_fields: Returns an array of fields that will be included in the import, excluding :id, :created_at, and :updated_at, belongs_to fields, and file fields.
  • belongs_to_fields: Returns an array of fields with belongs_to relationships to other models.
  • many_to_many_fields: Returns an array of fields with has_and_belongs_to_many relationships to other models.
  • file_fields: Returns an array of fields that represent data for Paperclip attached files.
  • run_import: Method for running the actual import, receives request params.

And the following instance methods:

  • import_files: sets attached files for object
  • import_belongs_to_data: sets belongs_to associated data for object
  • import_many_to_many_data: sets many_to_many associated data for object

The general approach here is that the import of files, belongs_to, many_to_many relationships, and standard fields makes up the import process for a single object. The run_import method collects success and failure messages for each object import attempt and those results are presented to the user. A regular ActiveRecord save method is called on the object, so the existing validation of objects during each save applies.

Working with Associated Data

One of the tricky parts here is how to handle import of fields representing associations. Given a user model that belongs to a state, country, and has many roles, how would one decide what state, country, or role value to include in the import?

I've solved this by including a dropdown to select the attribute used for mapping in the form. Each of the dropdowns contains a list of model attributes that are used for association mapping. A user can then select the associated mappings when they upload a file. In a real-life situation, I may import the state data via abbreviation, country via display name (e.g. "United States", "Canada") and role via the role name (e.g. "admin"). My data import file might look like this:

name            email  favorite_color      state  country        role
Steph Skardal          blue                CO     United States  admin
Aleks Skardal          green                      Norway         user
Roger Skardal          tennis ball yellow  UT     United States  dog
Milton Skardal         kibble brown        UT     United States  dog
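The mapping step can be sketched as a simple lookup: the attribute the admin selected is compared against each CSV value to find the associated record's id. The records and the resolve helper below are invented for illustration:

```python
# Hypothetical associated records, as they might exist in the database
countries = [
    {"id": 1, "name": "United States", "abbr": "US"},
    {"id": 2, "name": "Norway", "abbr": "NO"},
]

def resolve(records, attribute, value):
    """Return the id of the record whose chosen attribute matches the CSV value."""
    for record in records:
        if record[attribute] == value:
            return record["id"]
    return None  # no match: the association is left unset

# The admin chose to map countries by display name
print(resolve(countries, "name", "Norway"))  # → 2
```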

Many to Many Relationships

Many to many relationships are handled by allowing multiple columns in the CSV to correspond to the imported data. For example, there may be two columns for role on the user import, where users may be assigned to multiple roles. This may not be suitable for data with a large number of many to many assignments.
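Collecting several same-named CSV columns into one list can be sketched like this (the data is invented; note that csv.DictReader collapses duplicate headers, which is why the sketch zips against the header row manually):

```python
import csv
import io

# Two 'role' columns allow one row to assign multiple roles
data = "name,role,role\nSteph,admin,editor\nAleks,user,\n"
rows = list(csv.reader(io.StringIO(data)))
header = rows[0]

def roles_for(row):
    """Collect every non-empty value from the columns named 'role'."""
    return [value for name, value in zip(header, row) if name == "role" and value]

print(roles_for(rows[1]))  # → ['admin', 'editor']
```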

Import of File Fields

In this scenario, I've chosen to use open-uri to request existing files from a URL. The CSV must contain the URL for that file to be imported. The import process downloads the file and attaches it to the imported object.

self.class.file_fields.each do |key|
  if map[key] && !row[map[key]].nil?
    begin
      row[map[key]] = row[map[key]].gsub(/\s+/, "")
      format = row[map[key]].match(/[a-z0-9]+$/)
      open("#{Rails.root}/tmp/uploads/#{self.permalink}.#{format}", 'wb') { |file| file << open(row[map[key]]).read }
    rescue Exception => e
      self.errors.add(:base, "Import error: #{e.inspect}")
    end
  end
end

If the file request fails, an error is added to the object and presented to the user. This method may not be suitable for handling files that do not currently exist on a web server, but it was suitable for migrating a legacy application.
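The regex in the snippet above derives the file extension from the trailing run of lowercase letters and digits in the whitespace-stripped URL. The same logic in a standalone Python sketch (the function name is mine):

```python
import re

def extension_of(url):
    """Mirror the snippet's logic: strip whitespace, take the trailing [a-z0-9]+ run."""
    cleaned = re.sub(r"\s+", "", url)
    match = re.search(r"[a-z0-9]+$", cleaned)
    return match.group(0) if match else None

print(extension_of("http://example.com/files/track01.mp3"))  # → mp3
```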

Configuration: Display

Following RailsAdmin's example for setting configurations, I've added the ability to allow the import display to be set for each model.

config.model User do
  label :name
end

The above configuration will yield success and error messages labeled with the user's name.

Configuration: Excluded Fields

In addition to allowing a configurable display option, I've added the configuration for excluding fields.

config.model User do
  excluded_fields do
    [:reset_password_token, :reset_password_sent_at, :remember_created_at,
      :sign_in_count, :current_sign_in_at, :last_sign_in_at, :current_sign_in_ip,
      :last_sign_in_ip]
  end
end
The above configuration will exclude the specified fields during the import, and they will not display on the import page.

Configuration: Additional Fields and Additional Processing

Another piece of functionality that I found necessary for various imports was to hook in additional import functionality. Any model can have an instance method before_import_save that accepts the row of CSV data and map of CSV keys to perform additional tasks. For example:

def before_import_save(row, map)
  self.created_nested_items(row, map)
end

The above method will create nested items during the import process. This simple extensibility allows for additional data to be handled upon import, outside the realm of has_and_belongs_to_many and belongs_to relationships.

Fields for additional nested data can be defined with the extra_fields configuration, and are shown on the import page.

config.model User do
  extra_fields do
    [:field1, :field2, :field3, :field4]
  end
end

Hooking into RailsAdmin

As I mentioned above, I used a script to generate this Engine. Using RailsAdmin configurable actions, import must be added as an action:

config.actions do
  # keep the default actions...
  dashboard
  index
  new
  show
  edit
  delete
  # ...and add the custom import action
  import
end

And CanCan settings must be updated to allow for import if applicable, e.g.:

cannot :import, :all
can :import, User


My goal in developing this tool was to produce reusable functionality that could easily be applied to multiple models with different import needs, and to use this tool across Rails applications. I've already used this gem in another Rails 3.1 project to quickly import data that would otherwise be difficult to deal with manually. The combination of association mapping and configurability produces a flexibility that encourages reusability.

Feel free to review or check out the code here, or read more about End Point's services here.