Welcome to End Point’s blog

Ongoing observations by End Point people

Keep the Aisles Clean at Checkout

It's no mystery in ecommerce that checkout processing must flow smoothly for an effective store. Providing products or services in high demand doesn't mean much if they cannot be purchased, or the purchase process is so burdensome that would-be customers give up in frustration.

Unfortunately, checkout also tends to include the most volatile elements of a web store. It virtually always involves database writes, which can be hindered by locking. It often involves real-time network access to 3rd-party providers, with payment transactions being at the top of the list. It can involve complex inventory assessments, where high concurrency can make what's normally routine highly unpredictable. Meanwhile, your customers wait, while the app sifts through complexity and waits on responses from various services. If they wait too long, you might lose sales; even worse, you might lose customers.

Even armed with the above knowledge, it's all too easy to fall into the trap of expediency. A particular action is so logically suited to be included as part of the checkout routine, and a superficial evaluation makes it seem like such a low-risk operation. That action can be tucked in there just after we've passed all the hurdles and are assured the order won't be rejected--why, it'll be so simple, and all the data we need for the action are readily at hand.

Just such expediency was at the heart of a checkout problem that had been plaguing an Interchange client of ours for months. The client would receive regular complaints that checkouts were timing out or taking so long that the customer was reloading and trying again. Many times, these customers would come to find that their orders had been placed, but that the time to complete them was exceeding the web server's timeout (or their patience). In far less common instances, but still occurring regularly, log and transaction evidence existed that showed an order attempt produced a valid payment transaction, but there was no hint of the order in their database or even in the application's system logs.

I had seen the latter behavior before with other clients. If an action within order routing takes long enough, the Interchange server handling the request will be hammered by housekeeping. The telltale sign is the lack of log evidence for the attempt: order routes are logged at the end of the route's run, so when the run is interrupted, no logging occurs.

I added considerably more explicit real-time logging and picked off some of the low-hanging fruit--code practices that had often been implicated before as the culprit in these circumstances. After collecting enough data for problematic order attempts, I was able to isolate the volatility to mail-list maintenance. The client utilizes a 3rd-party provider for managing their various mail lists, and that provider's API was contacted during order routing with all the data the provider needed for managing said lists. The data transfer for the API was very simple, and in most cases would process in sub-second time. Unfortunately, it turned out that, in enough cases, the calls to the API would take 10s to even 100s of seconds to process.

The placement of maintaining mail lists within order routing was merely convenience. The success or failure of adding to the mail lists was insignificant compared to the success or failure of the order itself. Once identified, the API calls were moved into a post-order processing routine, which was specifically built to anticipate the demonstrated volatility. As a result, complaints from customers on long or timed-out checkouts have dwindled to near zero, and the mail-list maintenance is more reliable since the background process is designed to catch excessively long process calls and retry until we receive an affirmative response from the list maintainers.
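The retry behavior of such a post-order routine can be sketched roughly as follows. This is an illustration in Ruby, not the client's actual Interchange code; the MailListSync class, its parameters, and the block standing in for the provider's API call are all hypothetical.

```ruby
require 'timeout'

# Hypothetical background job: every name here is illustrative only.
class MailListSync
  attr_reader :attempts

  def initialize(max_attempts: 5, call_timeout: 10, pause: 0)
    @max_attempts = max_attempts   # give up after this many tries
    @call_timeout = call_timeout   # seconds allowed per API call
    @pause        = pause          # seconds to wait between tries
    @attempts     = 0
  end

  # The block performs the real API call and should return a truthy
  # value once the provider gives an affirmative response.
  def run
    while @attempts < @max_attempts
      @attempts += 1
      begin
        return true if Timeout.timeout(@call_timeout) { yield }
      rescue Timeout::Error, StandardError
        # An excessively long or failed call is simply retried.
      end
      sleep(@pause) if @pause > 0
    end
    false
  end
end

# Usage: simulate a provider that fails twice before answering.
calls = 0
job = MailListSync.new(max_attempts: 4, call_timeout: 1)
synced = job.run do
  calls += 1
  raise "provider unavailable" if calls < 3
  true
end
puts "synced=#{synced} after #{job.attempts} attempts"
```

Because the job runs outside of checkout, a slow or failing provider costs only background time; the customer's order has already been completed.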

When deciding what belongs within checkout processing, ideally limit that activity to only those actions absolutely imperative to the success of the order. For each piece of functionality, ask yourself (or your client): is the outcome of this action worth adding to the wait a customer experiences placing an order? Should the outcome of this action affect whether the order attempt is successful? If the answer to those questions is "no", account for that action outside of checkout. It may be more work to do so, but keeping the checkout aisles clean, without obstruction, should be paramount.

Spree on Rails 3: Part Two

Yesterday, I discussed my experiences on getting Rails 3 based Spree up and running. I've explained in several blog articles (here and here) that customizing Spree through extensions will produce the most maintainable code – it is not recommended to work directly with source code and make changes to core classes or views. Working through extension development was one of my primary goals after getting Spree up and running.

To create an extension named "foo", I ran rails g spree:extension foo. Similar to pre-Rails 3.0 Spree, a foo directory is created (albeit inside the sandbox/ directory) as a Rails Engine. The generator appends the foo directory details to the sandbox/ Gemfile. Without the Gemfile update, the Rails project won't include the new foo extension directory (and encompassed functionality). I reviewed the extension directory structure and files and found that foo/lib/foo.rb was similar to the *_extension.rb file of earlier versions.

# foo/lib/foo.rb (Rails 3 based Spree)
require 'spree_core'

module Foo
  class Engine < Rails::Engine

    config.autoload_paths += %W(#{config.root}/lib)

    def self.activate
      # Activation logic goes here.
      # A good use for this is performing
      # class_eval on classes that are defined
      # outside of the extension
      # (so that monkey patches are not
      # lost on subsequent requests in
      # development mode.)
    end

    config.to_prepare &method(:activate).to_proc
  end
end

# foo_extension.rb (pre-Rails 3 Spree)
class FooExtension < Spree::Extension
  version "1.0"
  description "Describe your extension here"
  url ""

  def activate
    # custom application functionality here
  end
end

I verified that the activate method was called in my extension with the following change:

require 'spree_core'

module Foo
  class Engine < Rails::Engine

    config.autoload_paths += %W(#{config.root}/lib)

    def self.activate
      Spree::BaseController.class_eval do
        logger.warn "inside base controller class eval"
      end
    end

    config.to_prepare &method(:activate).to_proc
  end
end

From here, the Spree documentation on extensions provides insight on further extension development. As I began to update an older extension, I ensured that my_extension/lib/my_extension.rb had all the necessary includes in the activate method and I copied over controller and library files to their new locations.
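As a standalone illustration of the class_eval approach used in activate, here is a minimal sketch in plain Ruby. It needs neither Rails nor Spree; BaseController and title_with_extension are hypothetical stand-ins for a class defined outside the extension and the behavior an extension might add.

```ruby
# Stand-in for a class defined outside the extension (hypothetical).
class BaseController
  def title
    "Store"
  end
end

module Foo
  # Reopen the existing class and add behavior, as an activate hook would.
  def self.activate
    BaseController.class_eval do
      def title_with_extension
        "#{title} (foo extension active)"
      end
    end
  end
end

# Calling activate more than once is harmless, which matters because a
# to_prepare hook runs again before each request in development mode.
Foo.activate
Foo.activate

puts BaseController.new.title_with_extension
```

Keeping the patch inside activate, rather than at the top level of a file, is what allows it to be reapplied after classes are reloaded.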

One issue that I came across was that migrations are not run with rake db:migrate and the public assets are not copied to the main project public directory on server restart. The documentation recommends building the migration within the application root (sandbox/), but this is not ideal to maintain modularity of extensions – each extension must include all of its migration files. To work around this, it was recommended to copy over the install rake tasks from one of the core gems that copies migrations and public assets:

namespace :foo do
  desc "Copies all migrations and assets (NOTE: This will be obsolete with Rails 3.1)"
  task :install do
    # Run both copy tasks, matching the two INFO lines printed by rake foo:install
    Rake::Task['foo:install:migrations'].invoke
    Rake::Task['foo:install:assets'].invoke
  end

  namespace :install do

    desc "Copies all migrations (NOTE: This will be obsolete with Rails 3.1)"
    task :migrations do
      source = File.join(File.dirname(__FILE__), '..', '..', 'db')
      destination = File.join(Rails.root, 'db')
      puts "INFO: Mirroring assets from #{source} to #{destination}"
      Spree::FileUtilz.mirror_files(source, destination)
    end

    desc "Copies all assets (NOTE: This will be obsolete with Rails 3.1)"
    task :assets do
      source = File.join(File.dirname(__FILE__), '..', '..', 'public')
      destination = File.join(Rails.root, 'public')
      puts "INFO: Mirroring assets from #{source} to #{destination}"
      Spree::FileUtilz.mirror_files(source, destination)
    end

  end
end

After creating the extension based migration files and creating the above rake tasks, one would run the following from the application (sandbox/) directory:

steph@machine:/var/www/spree/sandbox$ rake foo:install
(in /var/www/spree/sandbox)
INFO: Mirroring assets from /var/www/spree/sandbox/foo/lib/tasks/../../db to /var/www/spree/sandbox/db
INFO: Mirroring assets from /var/www/spree/sandbox/foo/lib/tasks/../../public to /var/www/spree/sandbox/public

steph@machine:/var/www/spree/sandbox$ rake db:migrate
(in /var/www/spree/sandbox)
# migrations run

Some quick examples of differences in project setup and extension generation between Rails 3.* and Rails 2.*:

# Rails 3.*
#clone project
#bundle install
rake sandbox
rails server
rails g spree:extension foo
rails g migration FooThing

# Rails 2.*
#clone project into "sandbox/"
rake db:bootstrap
script/generate extension Foo
script/generate extension_model Foo thing name:string start:date

Some of my takeaway comments after going through these exercises:

If there's anything I might want to learn about to work with edge Spree, it's Rails Engines. When you run Spree from source and use extensions, the architecture includes several layers of stacked Rails Engines:

Layers of Rails Engines in Spree with extensions.

After some quick googling, I found two helpful articles on Engines in Rails 3 here and here. The Spree API has been inconsistent until now - hopefully the introduction of Rails Engine will force the API to become more consistent which may improve the extension community.

I didn't notice much deviation of controllers, models, or views from previous versions of Spree, except for massive reorganization. Theme support (including Spree hooks) is still present in the core. Authorization in Spree still uses authlogic, but I heard rumors of moving to devise eventually. The spree_dash (admin dashboard) gem still is fairly lightweight and doesn't contain much functionality. Two fairly large code changes I noticed were:

  • The checkout state machine has been merged into order and the checkout model will be eliminated in the future.
  • The spree_promo gem has a decent amount of new functionality.

Browsing through the spree-user Google Group might reveal that there are still several kinks that need to be worked out on edge Spree. After these issues are worked out and the documentation on edge Spree is more complete, I will be more confident in making a recommendation to develop on Rails 3 based Spree.

Spree on Rails 3: Part One

A couple of weeks ago, I jumped into development on Spree on Rails 3. Spree is an open source Ruby on Rails ecommerce platform. End Point has been involved in Spree since its inception in 2008, and we continue to develop on Spree with a growing number of clients. Spree began to transition to Rails 3 several months ago. The most recent stable version of Spree (0.11.2) runs on Rails 2.*, but the edge code runs on Rails 3. My personal involvement with Rails 3 based Spree began recently; I waited to look at edge Spree until Rails 3 had a bit of momentum and until Rails 3 based Spree had more documentation and stability. My motivation for looking at it now was to determine whether End Point can recommend Rails 3 based Spree to clients and to share insight with my coworkers and other members of the Spree community.

First, I looked at the messy list of gems that have built up on my local machine throughout development of various Rails and Spree projects. I found this simple little script to remove all my old gems:


GEMS=`gem list --no-versions`
for x in $GEMS; do sudo gem uninstall $x --ignore-dependencies -a; done

Then, I ran gem install rails to install Rails 3 and dependencies. The following gems were installed:

abstract (1.0.0)
actionmailer (3.0.1)
actionpack (3.0.1)
activemodel (3.0.1)
activerecord (3.0.1)
activeresource (3.0.1)
activesupport (3.0.1)
arel (1.0.1)
builder (2.1.2)
bundler (1.0.2)
erubis (2.6.6)
i18n (0.4.1)
mail (2.2.7)
mime-types (1.16)
polyglot (0.3.1)
rack (1.2.1)
rack-mount (0.6.13)
rack-test (0.5.6)
rails (3.0.1)
railties (3.0.1)
rake (0.8.7)
thor (0.14.3)
treetop (1.4.8)
tzinfo (0.3.23)

Next, I cloned the Spree edge with the following command from here:

git clone

In most cases, developers will run Spree from the gem and not the source code (see the documentation for more details). In my case, I wanted to review the source code and identify changes. You might notice that the new spree core directory doesn't look much like the old one, which can be explained by the following: the Spree core code has been broken down into 6 separate core gems (api, auth, core, dash, promo, sample) that run as Rails Engines.

After checking out the source code, the first new task to run with edge Spree was bundle install. The bundler gem is installed by default in Rails 3. It works out of the box in Rails 3, and can work in Rails 2.3 with additional file and configuration changes. Bundler is a dependency management tool. Gemfile and Gemfile.lock in the Spree core specify which gems are required for the application. Several gems were installed with Spree's bundler configuration, including:

Installing webrat (0.7.2.beta.1) 
Installing rspec-rails (2.0.0.beta.19) 
Installing ruby-debug-base (0.10.3) with native extensions 
Installing ruby-debug (0.10.3) 
Installing state_machine (0.9.4) 
Installing stringex (1.1.0) 
Installing will_paginate (3.0.pre2) 
Using spree_core (0.30.0.beta2) from source at /var/www/spree 
Using spree_api (0.30.0.beta2) from source at /var/www/spree 
Using spree_auth (0.30.0.beta2) from source at /var/www/spree 
Using spree_dash (0.30.0.beta2) from source at /var/www/spree 
Using spree_promo (0.30.0.beta2) from source at /var/www/spree
Using spree_sample (0.30.0.beta2) from source at /var/www/spree

The only snag I hit during bundle install was that the nokogiri gem required two dependencies be installed on my machine (libxslt-dev and libxml2-dev).

To create a project and run all the necessary setup, I ran rake sandbox, which completed the tasks listed below. The tasks created a new project, completed the basic gem setup, installed sample data and images, and ran the sample data bootstrap migration. In some cases, Spree sample data will not be used – the latter two steps can be skipped. The sandbox/ application directory contained the folders that one might expect when developing in Rails (app, db, lib, etc.) and sandbox/ itself runs as a Rails Engine.

steph@machine:/var/www/spree$ rake sandbox
(in /var/www/spree)
         run  rails new sandbox -GJT from "."
      append  sandbox/Gemfile
         run  rails g spree:site -f from "./sandbox"
         run  rake spree:install from "./sandbox"
         run  rake spree_sample:install from "./sandbox"
         run  rake db:bootstrap AUTO_ACCEPT=true from "./sandbox"

After setup, I ran rails server, the new command for starting a server in Rails 3.*, and verified my site was up and running.

Hooray - it's up!

There wasn't much to getting a Rails 3 application up and running locally. I removed all my old gems, installed Rails 3, grabbed the repository, allowed bundler to install dependencies and worked through one snag. Then, I ran my Spree specific rake task to set up the project and started the server. Tomorrow, I'll share my experiences on extension development in Rails 3 based Spree.

check_postgres meets pgbouncer

Recently the already well-known PostgreSQL monitoring tool check_postgres gained the ability to monitor pgbouncer, the PostgreSQL connection pooling daemon, more closely. Previously check_postgres could verify pgbouncer was correctly proxying connections, and make sure its settings hadn't been modified. The pgbouncer administrative console reports many useful pgbouncer statistics and metrics; now check_postgres can monitor some of those as well.

pgbouncer's description of its pools consists of "client" elements and "server" elements. "Client" refers to connections coming from clients, and "server" to connections to the PostgreSQL server. The new check_postgres actions pay attention only to the pgbouncer "SHOW POOLS" command, which provides the following metrics:

  • cl_active: Connections from clients which are associated with a PostgreSQL connection. Use the pgb_pool_cl_active action.
  • cl_waiting: Connections from clients that are waiting for a PostgreSQL connection to service them. Use the pgb_pool_cl_waiting action.
  • sv_active: Connections to PostgreSQL that are in use by a client connection. Use the pgb_pool_sv_active action.
  • sv_idle: Connections to PostgreSQL that are idle, ready to service a new client connection. Use the pgb_pool_sv_idle action.
  • sv_used: PostgreSQL connections recently released from a client session. Use the pgb_pool_sv_used action.
  • sv_tested: PostgreSQL connections in process of being tested. Use the pgb_pool_sv_tested action.
  • sv_login: PostgreSQL connections currently logging in. Use the pgb_pool_sv_login action.
  • maxwait: The length of time the oldest waiting client has been waiting for a connection. Use the pgb_pool_maxwait action.

Most installations probably don't want any client connections stuck waiting for PostgreSQL connections to service them, meaning the cl_waiting and maxwait metrics ought to be zero. This example will check those two metrics and complain when they're nonzero, for a pgbouncer installation on port 5433 with pools "pgbouncer" and "maindb":

postgres@db:~$ ./ --action=pgb_pool_cl_waiting -p 5433 -w 3 -c 8
POSTGRES_PGB_POOL_CL_WAITING OK: (port=5433) pgbouncer=0 * maindb=0 | time=0.01 time=0.01

postgres@db:~$ ./ --action=pgb_pool_maxwait -p 5433 -w 5 -c 15 
POSTGRES_PGB_POOL_MAXWAIT OK: (port=5433) pgbouncer=0 * maindb=0 | time=0.01 time=0.01
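To make the role of the -w (warning) and -c (critical) thresholds concrete, here is a small Ruby sketch that reads the per-pool values out of an output line like the ones above and classifies them. The helper names are hypothetical, and treating a value at or above a threshold as tripping it is an assumption for illustration, not a statement of how check_postgres compares values internally.

```ruby
# Extract per-pool values from the message part (before the "|") of a line.
def pool_values(line)
  message = line.split('|').first
  pairs = message.scan(/(\w+)=(\d+)(?=\s|\z)/)
  pairs.map { |name, value| [name, value.to_i] }.to_h
end

# Assumed semantics: the worst pool decides the overall state, and a value
# at or above a threshold trips that level.
def pool_state(values, warning:, critical:)
  worst = values.values.max || 0
  return :critical if worst >= critical
  return :warning  if worst >= warning
  :ok
end

line = "POSTGRES_PGB_POOL_CL_WAITING OK: (port=5433) pgbouncer=0 * maindb=0 | time=0.01 time=0.01"
values = pool_values(line)
puts values.inspect
puts pool_state(values, warning: 3, critical: 8)
```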

The typical check_postgres filtering rules will work; to filter out a pool called "ignore_this_pool", for instance, add --exclude ignore_this_pool to the command line. Other connection options mean exactly what they would when connecting to PostgreSQL directly.

These new actions are available in the latest version from git.

Youth Debate and other client news

I want to draw attention to several of our clients who have been in the news lately:

The Youth Debate 2010 site is live and currently accepting question submissions from youth interested in hearing video responses from the DNC & RNC chairmen before the November midterm elections. It's a simple site, developed and deployed quickly with an eye toward handling a very high amount of traffic over a short period of time. We're excited to see what questions and answers come out of the project.

Jared Loftus, entrepreneur and owner of The College District, was profiled in a recent article about his business. We've written about Jared's business and some of the technical details underpinning his sites in past blog posts, including one upon launch of 4 additional sites and one comparing College District multi-site architecture to Spree.

Our client is a well-known retail modular carpet seller and a division of the public company Interface, Inc. We've been pleased to work with them to add new features to and support the operations of their ecommerce system for the past 3 years. Interface's founder, Ray Anderson, has been on a mission to reduce negative environmental impact made during their manufacturing process. He published a book, Confessions of a Radical Industrialist, and has been speaking about it and opening eyes to the possibilities for improvement.

We're always happy to see our clients doing interesting things and getting the attention they deserve!

Cross Browser Development: A Few CSS and JS Issues

Coding cross browser friendly JavaScript and CSS got you down? In a recent project, Ron, David, and I worked through some painful cross browser issues. Ron noted that he even banged his head against the wall over a couple of them :) Three of these issues come up frequently in my other projects full of CSS and JS development, so I wanted to share.

Variable Declaration in JS

In several cases, I noticed that excluding variable declaration ("var") resulted in broken JavaScript-based functionality in IE only. I typically include variable declaration when I'm writing JavaScript. In our project, we were working with legacy code and conflicting variable names may have been introduced, resulting in broken functionality. Examples of before and after:

Bad:

var display_cart_popup = function() {
    popup_id = '#addNewCartbox';
    left = (parseInt($(window).width()) - 772) / 2;
    // ...
};

Better:

var display_cart_popup = function() {
    var popup_id = '#addNewCartbox';
    var left = (parseInt($(window).width()) - 772) / 2;
    // ...
};

Bad:

address_display = '';

country = $(type+'_country').value;
address = $(type+'_address').value;
address2 = $(type+'_address2').value;
city = $(type+'_city').value;
state = $(type+'_state').value;
zip = $(type+'_zip').value;

Better:

var address_display = '';

var country = $(type+'_country').value;
var address = $(type+'_address').value;
var address2 = $(type+'_address2').value;
var city = $(type+'_city').value;
var state = $(type+'_state').value;
var zip = $(type+'_zip').value;

I researched this to gain more insight, but I didn't find much except a reiteration that when you create variables without the "var" declaration, they become global variables which may have resulted in conflicts. However, all the "learning JavaScript" documentation I browsed through includes variable declaration and there's no reason to leave it out for these lexically scoped variables.

Trailing Commas in JSON objects

According to JSON specifications, trailing commas are not permitted (e.g., obj = { "1" : 2, }). From my experience, JSON objects with trailing commas might work in Firefox and WebKit browsers, but they die silently in IE. Some recent examples:

Bad:

//JSON response from an ajax call
// if $add_taxes is not true, the carttotal element will be the last element of the list and it will end with a comma

{
  "response_message"    : '<?= $response_message ?>',
  "subtotal"            : <?= $subtotal ?>,
  "shipping_cost"       : <?= $shipping ?>,
  "carttotal"           : <?= $carttotal ?>,
<?php if($add_taxes) { ?>
  "taxes"               : <?= $taxes ?>
<?php } ?>
}

Better:

//JSON response from an ajax call
//No matter the value of $add_taxes, the carttotal element is the last element and it does not end in a comma

{
  "response_message"    : '<?= $response_message ?>',
  "subtotal"            : <?= $subtotal ?>,
  "shipping_cost"       : <?= $shipping ?>,
<?php if($add_taxes) { ?>
  "taxes"               : <?= $taxes ?>,
<?php } ?>
  "carttotal"           : <?= $carttotal ?>
}

Bad:

//Page load JSON object defined
//Last element in array will end in a comma

var fonts = {
[loop list=`$Scratch->{fonts}`]
    '[loop-param name]' : {
      'bold' : "[loop-param bold]",
      'italic' : "[loop-param italic]"
    },
[/loop]
};

Better:

//Page load JSON object defined
//A dummy object is appended to the fonts JSON object
//Additional logic is added elsewhere to determine if the object is a "dummy" or not

var fonts = {
[loop list=`$Scratch->{fonts}`]
    '[loop-param name]' : {
      'bold' : "[loop-param bold]",
      'italic' : "[loop-param italic]"
    },
[/loop]
    'dummy' : {}
};

Additional solutions to avoid the trailing comma include using join (Perl, Ruby) or implode (PHP), conditionally excluding the comma on the last element of the array, or using library methods to serialize data to JSON.
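As a rough sketch of two of those alternatives in Ruby: serializing the whole structure with the standard json library, or building the pairs first and joining them so that commas only ever appear between elements. The cart values and method names below are made up for illustration.

```ruby
require 'json'

# Hypothetical cart values standing in for the template variables.
def cart_response(add_taxes)
  response = {
    "response_message" => "Item added",
    "subtotal"         => 25.0,
    "shipping_cost"    => 5.0
  }
  response["taxes"] = 1.5 if add_taxes
  response["carttotal"] = add_taxes ? 31.5 : 30.0
  response
end

# Option 1: let the library serialize the whole structure.
def serialized(response)
  JSON.generate(response)
end

# Option 2: build each pair, then join; no trailing comma is possible.
def joined(response)
  pairs = response.map { |key, value| "#{key.to_json} : #{value.to_json}" }
  "{ " + pairs.join(", ") + " }"
end

puts serialized(cart_response(false))
puts joined(cart_response(true))
```

Either way, whether the optional "taxes" element is present no longer affects where the commas land.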

Floating Elements in IE

Often times, you'll get a design like the one shown below. There will be a static width and repeating components to span the entire width. You may programmatically determine how many repeating elements will be displayed, but using CSS floating elements yields the cleanest code.

Example of a given design with repeating elements to span a static width.

You start working in Chrome or Firefox and apply the following CSS rules:

CSS rules for repeating floating elements.

When you think you're finished, you load the page in IE and see the following. Bummer!

Floating elements wrap incorrectly in IE.

This is a pretty common scenario. In IE, if the combined width of consecutive floating elements is greater than or equal to 100% of the available width, the last floating element will jump down based on the IE float model. Instead of using floating elements, you might consider using tables or CSS position rules, but my preference is to use tables only for elements that need vertical align settings and to stay away from absolute positioning in general.

The simplest, most minimal change I've found to work can be described in a few steps. Let's say your floating elements are <div>s inside a <div> with an id of "products":

<div id="products">
  <div class="product">product 1</div>
  <div class="product">product 2</div>
  <div class="product last">product 3</div>
  <div class="product">product 4</div>
  <div class="product">product 5</div>
  <div class="product last">product 6</div>
</div>

And let's assume we have the following CSS:

div#products { width: 960px; }
div.product { float: left; width: 310px; margin-right: 15px; height: 100px; }
div.last { margin-right: 0px; }

Complete these steps:

  • First, add another div to wrap around the #products div, with an id of "outer_products"
  • Next, update the 'div#products' width to be greater than 960 pixels by several pixels.
  • Next, add a style rule for 'div#outer_products' to have a width of "960px" and overflow equal to "hidden".


<div id="outer_products">
  <div id="products">
    <div class="product">product 1</div>
    <div class="product">product 2</div>
    <div class="product last">product 3</div>
    <div class="product">product 4</div>
    <div class="product">product 5</div>
    <div class="product last">product 6</div>
  </div>
</div>


div#outer_products { width: 960px; overflow: hidden; }
div#products { width: 980px; }
div.product { float: left; width: 310px; margin-right: 15px; height: 100px; }
div.last { margin-right: 0px; }

The solution is essentially creating a "display window" (outer_products), where overflow is hidden, but the contents are allowed to span a greater width in the inside <div> (products).

The white border outlines the outer_products "display window".

Some other issues that I see less frequently include the double-margin IE6 bug, chaining CSS in IE, and using '#' vs. 'javascript:void(0);'.

Simple audio playback with Yahoo Mediaplayer

Recently I had need to show a list of MP3 files with a click-to-play interface.

I came upon a very simple self-contained audio player:

<script type="text/javascript" src=""></script>

The code to set up my links for playing was dirt-simple:

<script type="text/javascript">
var player = document.getElementById('player');
function add_to_player() {
    var link = this;
    // replace() returns a new string, so assign the result back to the player
    player.src = player.src.replace(/audioUrl=.*/,'audioUrl=' + link.href);
    return false;
}
var links = document.getElementsByTagName('A');
for (var i = 0; i < links.length; i++) {
    // Anchors expose their target as href, not src
    if (links[i].href.match(/\.mp3$/)) {
        links[i].onclick = add_to_player;
    }
}
</script>

You could use various ways to identify the links to be player-ized, but I chose to just associate the links with a class, "mp3":

<a class="mp3" href="/path/to/file.mp3">Audio File 1</a>

Obviously, if jQuery is in use for your page, you can reduce the code to an even smaller snippet.

git branches and rebasing

Around here I have a reputation for finding the tiniest pothole on the path to git happiness, and falling headlong into it while strapped to a bomb ...

But at least I'm dedicated to learning something each time. This time it involved branches, and how git knows whether you have merged that branch into your current HEAD.

My initial workflow looked like this:

 $ git checkout -b MY_BRANCH
   (some editing)
 $ git commit
 $ git push origin MY_BRANCH
 $ git checkout origin/master
 $ git merge --no-commit origin/MY_BRANCH
   (some testing and inspection)
 $ git commit
 $ git rebase -i origin/master

This last step was the trip-and-fall, although it didn't hurt me so much as launch me off my path into the weeds for a while. Once I did the "git rebase", git no longer knew that MY_BRANCH had been successfully merged into HEAD. So later, when I tried to delete the branch, git refused:

 $ git branch -d MY_BRANCH
 error: the branch 'MY_BRANCH' is not fully merged.

As I now understand it, the rebase rewrote the merged commits, so the commits on MY_BRANCH are no longer part of the history of HEAD. Git can't tell the two are related and refuses to delete the branch unless you supply -D. A relatively harmless situation, but it set off all sorts of alarms for me, as I thought I had messed up the merge somehow.
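A toy model may help show why the check fails. The Ruby sketch below treats history as a hash of commit ids to parent ids and asks whether the branch tip is reachable from HEAD. It is a simplified illustration with made-up commit ids, not git's actual implementation.

```ruby
# Is `candidate` reachable by walking parent links back from `tip`?
def ancestor?(graph, candidate, tip)
  stack = [tip]
  seen = {}
  until stack.empty?
    id = stack.pop
    return true if id == candidate
    next if seen[id]
    seen[id] = true
    stack.concat(graph.fetch(id, []))
  end
  false
end

graph = {
  "m1"    => [],            # tip of origin/master
  "b1"    => ["m1"],        # commit made on MY_BRANCH
  "merge" => ["m1", "b1"],  # HEAD right after the merge
  "b1'"   => ["m1"]         # HEAD after the rebase: same change, new id
}

puts ancestor?(graph, "b1", "merge")  # branch tip is in HEAD's history
puts ancestor?(graph, "b1", "b1'")    # rewritten history no longer contains it
```

In the second case the content of the branch is present, but only as a copy with a different identity, which is why the safe delete complains.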

Implementing Per Item Discounts in Spree

Discounts in Spree

For a good overview of discounts in Spree, the documentation is a good place to start. The section on Adjustments is particularly apropos.

In general, the way to implement a discount in Spree is to subclass the Credit class and attach or allow for attaching of one or more Calculators to your new discount class. The Adjustment class file has some good information on how this is supposed to work, and the CouponCredit class can be used as a template of how to do such an implementation.

What we Needed

For my purposes, I needed to apply discounts on a per Item basis and not to the entire order.

The issue with using adjustments as-is is that they are applied to the entire order and not to particular line items, so creating per line item discounts using this mechanism is not obviously straightforward. The good news is that there is nothing actually keeping us from using adjustments in this manner. We just need to modify a few assumptions.

Implementation Details

This is going to be a high-level description of what I did, with (hopefully) enough hints about the important parts to point someone who wants to do something similar in the right direction.

Analogous to the Coupon class in Spree, I create a Discount class. It holds the metadata about the discount: specifically, the product that the discount applies to and the business logic for determining under what circumstances to apply the discount and how much to apply.

There is also a DiscountCredit class which subclasses the Credit class. In this class I re-define two methods:

  • applicable? returns true when the discount applies to the line_item
  • calculate_adjustment calculates the amount of the discount based on the business rules.

I also add a couple of convenience methods:

  • line_item returns self.adjustment_source
  • discount returns

The trick (as an astute reader might infer from the convenience methods) is to assign the line_item that the discount applies to as the adjustment_source attribute of the credit object.

The adjustment generally expects that you will be setting this to something like an instance of the Discount class, but as long as we ensure that LineItems implement any interface constraints required by Adjustments, we should be okay.

To that end, I monkey patch the LineItem class in my extension to add a method called add_discount. This method creates a new instance of a DiscountCredit object and passes in itself as the adjustment_source. I then add this credit object to the adjustments on the order.

I also add a method to iterate through all of the discounts to look for one that might already be applied to this line_item instance. I use this method in the add_discount method to ensure that I don't add more than one credit per line item.

To bring this together, I monkey patch the Order class to add a method that iterates through all of the line items in the order and calls add_discount on each one. I add an after_save callback which calls this method to ensure that discounts are applied to all line items each time the order is updated.

That takes care of the mechanics of applying the discounts. From this point several things will be taken care of by Spree. Any discounts that are not applicable will get removed. The cart totals will get added up properly and discounts will be applied as adjustments.
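The mechanics described above can be sketched in plain Ruby. This is only an illustration under stated assumptions, not the extension's actual code: the classes below are simple stand-ins for the Spree models, and the DISCOUNTS registry, the percent attribute, the apply_discounts method name and the negative sign on credits are all hypothetical choices made for the example.

```ruby
# Plain-Ruby stand-ins for the Spree models discussed above.
# Real Spree classes are ActiveRecord models; these only mimic the parts we need.
LineItem = Struct.new(:product, :price, :quantity)

# Hypothetical Discount: the product it applies to and a percentage off.
class Discount
  attr_reader :product, :percent

  def initialize(product, percent)
    @product = product
    @percent = percent
  end
end

# Hypothetical registry standing in for a database lookup.
DISCOUNTS = [Discount.new("widget", 10)].freeze

class DiscountCredit
  attr_reader :adjustment_source

  # The line item itself is stored as the adjustment source.
  def initialize(adjustment_source)
    @adjustment_source = adjustment_source
  end

  # Convenience method: the adjustment source is the line item.
  def line_item
    adjustment_source
  end

  # Convenience method (assumed behavior): the discount for this line item's product.
  def discount
    DISCOUNTS.find { |d| d.product == line_item.product }
  end

  def applicable?
    !discount.nil?
  end

  # Assumption: a credit is expressed as a negative amount.
  def calculate_adjustment
    return 0 unless applicable?
    -(line_item.price * line_item.quantity * discount.percent / 100.0)
  end
end

class Order
  attr_reader :line_items, :adjustments

  def initialize(line_items)
    @line_items = line_items
    @adjustments = []
  end

  # Add at most one credit per line item, then drop credits that do not apply
  # (Spree does that removal itself; it is mimicked here).
  def apply_discounts
    line_items.each do |item|
      next if adjustments.any? { |credit| credit.line_item.equal?(item) }
      adjustments << DiscountCredit.new(item)
    end
    adjustments.select!(&:applicable?)
  end

  def total
    line_items.sum { |i| i.price * i.quantity } + adjustments.sum(&:calculate_adjustment)
  end
end

order = Order.new([LineItem.new("widget", 10.0, 2), LineItem.new("gadget", 5.0, 1)])
order.apply_discounts
order.apply_discounts # calling it again does not add a second credit
```

In this sketch only the widget line item ends up with a credit, and calling apply_discounts repeatedly (as an after_save callback would) leaves a single credit per discounted line item.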

Other things you might want to do

You may not want Spree to only display applied discounts at checkout as a (potentially) long list of credits tacked on to the end of the order.

For example, I found it useful to create some helpers to peek into the order adjustments and pull out the discount for a particular line item when displaying the cart. I also wanted to consolidate all of the discounts as a total amount under discount, rather than display them independently, so I modified the views that handled displaying the credits.

In my implementation, I found it more straightforward to forgo the use of calculators when implementing the business logic. But they would work just fine as part of the Discount class, and the DiscountCredit#calculate_adjustment method can call the calculator#calculate method to determine the amount to discount.


This approach works because Spree automatically consolidates products/variants into the same line_item in the cart. In my approach, I assigned discounts at the product level, but applied them at the variant level. This worked for me because I didn't have any variants in my data set.

A general solution would probably assign discounts at the product level (it's too annoying to track them on a per-variant basis) and further track enough information to ensure that a discount was properly applied to any valid line_items that contained variants of that product.


All in all, I found that most of the heavy lifting was already done by the Adjustments code. All it really took was looking at the assumptions behind how the credits were working from a slightly different angle to see how I could modify things to allow per line item discounts to be implemented.

SEO friendly redirects in Interchange

In the past, a few of my Interchange clients have wanted their site to do an SEO-friendly 301 redirect to a new page, for different reasons. Either a product had gone out of stock and wasn't going to return, or they had completely reworked their URL structure to be more SEO friendly and wanted the link juice to transfer to the new URLs. The normal way to handle this kind of request is to set up a bunch of Apache rewrite rules.

There were a few issues with going that route. The main one is that adding or removing rules means restarting or reloading Apache every time a change is made. The clients don't normally have the access to do this, so they would have to contact me. They also don't have access to modify the Apache virtual host file, so again they would have to contact me to add and remove rules. To avoid the editing issue, we could have put the rules in a .htaccess file and let them modify that, but this presents its own challenges because some text editors and FTP clients don't handle hidden files very well. Finally, even though basic rewrite rules are pretty easy to copy, paste and reuse, they can have nasty side effects if not done properly and can be difficult to troubleshoot. So I devised a way for clients to manage their 301 redirects using a simple database table and Interchange's Autoload directive.

The database table is very simple, with two fields. I called them old_url and new_url, with old_url as the primary key. The Autoload directive accepts a list of subroutines as its arguments, so this requires us to create two GlobalSubs: one to actually do the redirect and one to check the database and see if we need to redirect. The redirect sub is really straightforward and looks like this:

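sub redirect {
   my ($url, $status) = @_;
   # default to a temporary redirect unless the caller asks for another status
   $status ||= 302;
   $Vend::StatusLine = qq|Status: $status moved\nLocation: $url\n|;
   $::Pragma->{download} = 1;
   my $body = '';
   # flag the response as sent
   $Vend::Sent = 1;
   return 1;
}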

The code for the sub that checks to see if we need to redirect looks like this:

sub redirect_old_links {
   my $db = Vend::Data::database_exists_ref('page_redirects');
   my $dbh = $db->dbh();
   my $current_url = $::Tag->env({ arg => "REQUEST_URI" });
   my $normal_server = $::Variable->{NORMAL_SERVER};
   # load the redirect table into the session the first time through
   if ( ! exists $::Scratch->{redirects} ) {
       # name the columns so the fetch order doesn't depend on the table layout
       my $sth = $dbh->prepare(q{select old_url, new_url from page_redirects});
       my $rc  = $sth->execute();
       while ( my ($old,$new) = $sth->fetchrow_array() ) {
           $::Scratch->{redirects}{"$old"} = $new;
       }
   }
   if ( exists $::Scratch->{redirects}  ) {
       if ( exists $::Scratch->{redirects}{"$current_url"} ) {
           my $path = $normal_server.$::Scratch->{redirects}{"$current_url"};
           my $Sub = Vend::Subs->new;
           $Sub->redirect($path, '301');
       } else {
           # no redirect for this URL; let the request continue normally
           return;
       }
   }
}

We normally create these as two separate files in our own directory structure under the Interchange directory, called custom/GlobalSub, and then add the line include custom/GlobalSub/*.sub to the interchange.cfg file to make sure they get loaded when Interchange restarts. After those files are loaded, you need to tell the catalog to Autoload the subroutine, which you do with the Autoload directive in your catalog.cfg file like this:

Autoload redirect_old_links

After modifying your catalog.cfg file, you will need to reload your catalog to ensure the change takes effect. Once these things are in place, you should be able to add data to the page_redirects table, start a new session, and be redirected properly. When I was working on the system, I created an entry that redirected /cgi-bin/vlink/redirect_test.html to /cgi-bin/vlink/index.html so I could confirm that it was redirecting me properly.


Written and spoken communication involve language, and language builds on a lot of conventions. Sometimes choosing one convention over another is an easy way to reduce confusion and help you communicate more effectively. Here are a few areas I've noticed unnecessary confusion in communication, and some suggestions on how we can do better.

2-dimensional measurements

Width always comes first, followed by height. This is a longstanding printing and paper measurement custom. 8.5" x 11" = 8.5 inches wide by 11 inches high. Always. Of course it never hurts to say specifically if you're the one writing: 8.5" wide x 11" high, or 360px wide x 140px high.

If a third dimension comes into play, it goes last: 10" (horiz.) x 10" (vertical) x 4" (deep).


Dates

In file names, source code, databases, or spreadsheets, use something unambiguous and easily sortable. A good standard is ISO 8601, which orders dates from most significant to least significant, that is, year-month-day, or YYYY-MM-DD. For example, 2010-01-02 is January 2, 2010. If you need to store a date as an integer or shave off 2 characters, the terser YYYYMMDD is an option with the same benefits but a little less readability.

For easier human reading, try "2 January 2010", "2 Jan. 2010", or "January 2, 2010", which don't sort easily but are still unambiguous. The most confusing form in common use is 1/4/08 or 01/04/08, which is ambiguous whenever the year of century or the day of month are 12 or less. That's almost half of every month, and the first dozen years of each century! I've seen people mean by 01/04/08 any of April 8, 2001; April 1, 2008; or more commonly in the U.S., January 4, 2008. By avoiding this form entirely, you avoid a lot of confusion.
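To make the sorting and ambiguity points concrete, here is a small Ruby sketch using only the standard date library. The sample dates are arbitrary, and the three format strings are just the three readings of 01/04/08 mentioned above.

```ruby
require "date"

# ISO 8601 strings sort chronologically even when compared as plain text.
iso_dates = ["2010-10-04", "2008-01-04", "2010-01-02"]
sorted = iso_dates.sort

# Parse an ISO 8601 date, then render the terser and the human-readable forms.
parsed   = Date.iso8601("2010-01-02")
compact  = parsed.strftime("%Y%m%d")
readable = parsed.strftime("%-d %B %Y")

# The ambiguous form only has a meaning once you state which convention you assume.
us_reading  = Date.strptime("01/04/08", "%m/%d/%y") # month/day/year
eu_reading  = Date.strptime("01/04/08", "%d/%m/%y") # day/month/year
ymd_reading = Date.strptime("01/04/08", "%y/%m/%d") # year/month/day

puts sorted.inspect
puts compact, readable
puts us_reading, eu_reading, ymd_reading
```

The same eight characters parse to three different dates depending on the format you hand to the parser, which is exactly the guesswork a reader is forced to do.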

Time zones

When dealing with anyone who isn't at the same location you are, specify a time zone with every time. It's easy. So many of us travel or interact with people in remote locations that we shouldn't assume a single time zone.

You can save others some mental strain by translating times into the time zone of the majority of other participants, especially if there's an overwhelming majority in one particular time zone. It's polite.

In time zones, the word "standard" isn't just filler meaning "normal time zones" -- it specifically means "not daylight saving time"! So don't say "Eastern Standard Time" unless you really mean "Eastern Time outside of daylight saving", or are referring to somewhere that doesn't observe daylight saving time. It's simplest and most often correct in conversation to just say "Eastern Time". When people say "Something Standard Time" but daylight saving time is in effect, beware, because they probably actually just mean "Something Time, either daylight or not, whichever is in effect then". It's good to ask them and confirm what they meant.
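A short Ruby sketch shows why the distinction matters. It assumes the usual fixed offsets of UTC-05:00 for Eastern Standard Time and UTC-04:00 for Eastern Daylight Time, and an arbitrary meeting time of 3:00 PM on July 1, 2010.

```ruby
# The same wall-clock time, read once as "standard" and once as "daylight" time.
est = Time.new(2010, 7, 1, 15, 0, 0, "-05:00") # Eastern Standard Time offset
edt = Time.new(2010, 7, 1, 15, 0, 0, "-04:00") # Eastern Daylight Time offset

est_utc_hour = est.getutc.hour
edt_utc_hour = edt.getutc.hour

# The two readings are a full hour apart.
gap_in_seconds = est - edt

# Writing the offset out removes the ambiguity altogether.
unambiguous = edt.strftime("%Y-%m-%d %H:%M %:z")

puts est_utc_hour, edt_utc_hour, gap_in_seconds, unambiguous
```

Someone who writes "3 PM Eastern Standard Time" in July and means daylight time has named a moment an hour later than the one they intended.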

Just to keep things interesting, the "S" doesn't always mean "standard". British Summer Time is the British daylight saving time zone and is abbreviated BST.

Close of business

I find it better to avoid the terms "end of business day" or "close of business" because people often stop working at different times, and most of us communicate with people in many time zones. Why not just say what time you really mean?

Likewise, "by the end of the week" is ambiguous both about what time on the last day, and which day you consider the end of the week. The end of the work week? Whose work week? European calendars show Sunday as the end of the week, while American calendars most often show Saturday as the end of the week. Again, by just saying which day you mean, you can avoid causing confusion.

What conventions have you found helpful or harmful in communication?

Providing Database Handle for Interchange Testing

I've recently begun using the test-driven development approach on my projects with Perl's Test::More module. Most of my projects lately have been with Interchange, which has some hurdles to get around as far as test-driven development is concerned. Primarily this is because Interchange runs as a daemon and provides some readily available utilities, like the database handle. These utilities are not available to our tests, so they need to be made available as discussed below.

I develop Usertags, GlobalSubs and ActionMaps where applicable, as it helps keep the separation of business logic and views clear. I generally organize these to call a function within a Perl module so they can be tested properly. Most of these tags involve some sort of connection with the database to present information to the user, for which I use the Interchange ::database_exists_ref method.

When it comes to testing, I want to ensure that the test script invokes the same method. Otherwise, the script will not be testing the code as it's used in production.

Let's say you are building a Perl module that looks something like this:

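package YourMagic;
use strict;

sub do_something {
    my ($opt) = @_;

    # some code

    # look up the table first, so an unknown table returns undef instead of dying
    my $db = ::database_exists_ref($opt->{table})
        or return undef;
    my $dbh = $db->dbh
        or return undef;

    my $output;
    # ... more code
    return $output;
}

1;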


The ::database_exists_ref() method is not available to a test script, so it needs to be defined there. It should return an object with a dbh method, just as it does within Interchange. There is no need to test the method itself, as it is not part of the "what" that is being developed. The following code needs to be added to the test script so it can handle the type of database reference returned by Interchange.

use lib '/home/user/interchange/custom/lib';
use Test::More tests => 2;
use DBI;

# Here are the methods to provide proper reference to our database handle
sub ::database_exists_ref {
    my $table = shift;
    return undef if !$table;

    # return an object with a dbh method
    return bless({}, __PACKAGE__);
}

sub dbh {
    # define a dbh method
    my $db = DBI->connect('dsn', 'user', 'pass');

    return $db;
}

use YourMagic;

is(
    YourMagic::do_something(),
    undef,
    'do_something() returns undef when called with no arguments',
);

is(
    YourMagic::do_something(...),
    ...,
    'do_something() returns ...',
);

Now the do_something() method will call the ::database_exists_ref() defined in the test script when invoked. It is also worth noting that you can use ::database_exists_ref in the test script itself to look up information from the existing table that is valuable to test against.

This approach allows us to use, reuse, and add new tests without worrying about mock data during the initial development. You can be sure that the existing test scripts will function properly against the latest data that is available.

I will cover some other topics regarding Interchange Test Driven Development in future posts. For more information regarding Unit Testing in general see this post by Ethan.

Red Hat SELinux policy for mod_wsgi

Using SELinux, you can safely grant a process only the permissions it needs to perform its function, and no more. Linux distributions provide policies to enforce these limits on most software they package, but many aren't covered. We've made allowances for mod_wsgi on RHEL and CentOS 5 by extending Apache httpd's SELinux policy.

It seems the SELinux policy for Apache httpd is twice as large as any other package's. The folks at Red Hat have put a lot of work into making sure that attackers who manage to exploit httpd can't break out to the rest of your system, while still allowing the flexibility to serve most applications. Consult the httpd_selinux man page if messages in audit.log coincide with your error.

File Contexts

If you've created files and/or directories in /etc/httpd, make sure they have the proper file contexts so the daemon can read them:

  # restorecon -vR /etc/httpd

httpd can only serve files with an explicitly allowed file context. Configure the context of files and directories within your production code base using the semanage command:

  # semanage fcontext --add --ftype -- --type httpd_sys_content_t "/home/projectname/live(/.*)?"
  # semanage fcontext --add --ftype -d --type httpd_sys_content_t "/home/projectname/live(/.*)?"
  # restorecon -vR /home/projectname/live

View file contexts with ls -Z. Changes should generally be made with semanage and restorecon -vR.


Booleans

The httpd policy provides several boolean options for easy run-time configuration:

  • httpd_can_network_connect - Allows httpd to make network connections, including the local ones you'll be making to a database
  • httpd_enable_homedirs - Allows httpd to access /home/

Booleans are persistently set using the setsebool command with the -P flag:

  # setsebool -P httpd_can_network_connect on

WSGI Socket

When running in daemon mode, httpd and the mod_wsgi daemon communicate via a UNIX socket file. This should usually have a context of httpd_var_run_t. The standard Red Hat SELinux policy includes an entry for /var/run/wsgi.* to use this context, so it makes sense to put the socket there using the WSGISocketPrefix directive within your httpd configuration:

  WSGISocketPrefix run/wsgi

(Note that run/wsgi translates to /etc/httpd/run/wsgi which is symlinked to /var/run/wsgi.)

If socket communication fails, httpd returns a 503 "Temporarily Unavailable" error response.

SELinux Policy Module

In the course of our testing, SELinux denials like the following appeared:

  type=AVC msg=audit(1262803154.315:1851): avc:  denied  { execmem } for  pid=5337 comm="httpd" scontext=root:system_r:httpd_t:s0 tcontext=root:system_r:httpd_t:s0 tclass=process

Unusual behavior like this is usually best allowed by creating application-specific SELinux policy modules. If you cannot resolve these AVC errors by manipulating file contexts or booleans, collect all the errors into a single file and feed that into the audit2allow utility:

  # yum install policycoreutils
  # mkdir ~/tmp  # if this doesn't exist already
  # audit2allow --module wsgi < ~/tmp/pile_of_auditd_output > ~/tmp/wsgi.te

This will output source for a new policy module. You might review the .te file before compiling. Ours looks like this:

module wsgi 1.0;

require {
      type httpd_t;
      class process execmem;
}

#============= httpd_t ==============
allow httpd_t self:process execmem;

Compile this source into a new policy module and package it:

  # checkmodule -M -m -o ~/tmp/wsgi.mod ~/tmp/wsgi.te
  # semodule_package --outfile ~/tmp/wsgi.pp --module ~/tmp/wsgi.mod

Once created, the module may be installed permanently into any compatible system's SELinux configuration:

  # semodule --install ~/tmp/wsgi.pp

There's plenty of room for improvement here. The file contexts we assigned with semanage should be defined in a .fc source file and included within the policy module. And creating a new context just for the WSGI daemon to transition into would restrict it further, allowing only a subset of Apache httpd's abilities. Writing your own policy like this allows you much finer tuning of your processes' limits, while allowing their needed functionality.

Keep Your Tools Sharp To Avoid Personal Technical Debt

One of the things that really struck me when I started working here at End Point was how all of my co-workers possessed surprisingly deep knowledge of just about every tool they used in their work. Now, I've been developing web applications on Linux for years and I've certainly read my fair share of man pages. But, I've always tended to learn just enough about a specific tool or tool set to get my job done. That is, until I started working here.

I've always thought that a thirst for knowledge and an inquisitive nature are both prerequisites for becoming a good developer. Did you take apart your toys when you were a child because you wanted to see how they worked? Were you even able to put some of them back together such that they still worked?

I did, too. I wanted to know how everything worked. But, somewhere along the line I seem to have decided that there "wasn't time to learn about that" (where "that" was git rebase or mock objects in unit testing, or NoSQL databases like Cassandra) because I live and work in the real world. I have projects with milestones and deadlines. I have meetings and code to review. I have a life outside the office. These are common constraints, and I had often let them prevent me from learning something new. In retrospect, that was a tremendous cop-out.

Look around you. I bet you can find a co-worker with that same list of obligations who is learning something new right now. If not, I'm sure there are developers working for your company's competition that are learning something new right now. They are keeping their tools sharp and rust-free.

When you decide you "don't have time to learn about that," you trade a small time-savings now for less mastery of a tool, technology or methodology going forward. You quietly accumulate a little more "personal technical debt." This is the same kind of technical debt that affects projects and companies, but now it affects you directly and almost immediately. You'll start paying interest on it as soon as that meager time-savings is burned up.

So, where does this leave you and me? We still don't have any more hours in the day. How do we get our work done while continuing to keep our tools sharp by learning new things? Here's the secret:

Start out small and stick with it.

Dig into something small and relevant to your work today. Start using it to make yourself more efficient. If you're a Perl developer, learn basic unit testing with Test::More or start using Mouse to simplify and streamline your object-oriented development. If your app uses the Postgres open source database, learn more about how EXPLAIN ANALYZE can help you optimize your queries faster and cheaper than throwing more hardware at the problem. By starting out small, you'll find it easier to make that initial time investment, and you'll see a quicker return on it. You'll create a positive feedback loop almost immediately.

Once you're seeing these dividends, the next step might be to leverage your personal technical investment by sharing what you've learned with your peers and helping them do the same. Schedule a 30 minute training session every Wednesday in which a member of your team gives a quick and dirty talk about something they've learned. I'm talking black and white slides - no clip-art and no star wipes. The presentation file gets copied to your company's wiki or file server where people can grab it afterward. Get management to buy lunch. Make sure the food and the projector are set up 5 minutes early so no one feels their time is being wasted. At the 30 minute mark, pick next week's presenter. Anyone who wants to continue the discussion beyond the 30 minutes is free to do so.

So, again: Start out small and stick with it. You'll see an immediate payoff in terms of increased quality of work and productivity, which in turn, means more job satisfaction for you. And that will put you on a path towards the wizardry that my fellow End Pointers seem to perform every day.

Upgrading old versions of Postgres


The recent release of Postgres 9.0.0 at the start of October 2010 was not the only big news from the project. Also released were versions 7.4.30 and 8.0.26, which, as I noted in my usual PGP checksum report, are going to be the last publicly released revisions in the 7.4 and 8.0 branches. In addition, the 8.1 branch will no longer be supported by the end of 2010. If you are still using one of those branches (or something older!), this should be the incentive you need to upgrade as soon as possible. To be clear, this means that anyone running Postgres 8.1 or older is not going to get any official updates, including security and bug fixes.

A brief recap: Postgres uses major versions, containing two numbers, to indicate a major change in features and functionality. These are released about every two years. Each of these major versions has many revisions, which are released as often as needed. These revisions are designed to be completely binary compatible with the previous revision, meaning you can upgrade revisions very easily, with no dump and restore of the data needed.

Below are the options available for those running older versions of Postgres, from the most desirable to the least desirable. The three general options are to upgrade to the latest release (9.0 as I write this), migrate to a newer version, or stay on your release.

1. Upgrade to the latest release

This is the best option, as each new version of Postgres adds more features and becomes more efficient, all while maintaining the high code quality standards Postgres is known for. There are three general approaches to upgrading: pg_upgrade, pg_dump, and Bucardo / Slony.

Using pg_upgrade

The pg_upgrade utility is the preferred method for upgrading in the future. Basically, it rewrites your data directory from the "old" on-disk format to the "new" one. Unfortunately, pg_upgrade only works from version 8.3 and onwards, which means it cannot be used if you are coming from an older version. (This utility used to be called pg_migrator, in case you see references to that.)
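The version recap and the 8.3 cutoff for pg_upgrade can be expressed as a few lines of Ruby. This is only a sketch: the helper names are made up for illustration, and it assumes the three-part numbering described above, where the first two numbers form the major version and the third is the revision.

```ruby
# Split a version such as "8.0.26" into its major version and revision.
# Assumes the three-part scheme described above (major = first two numbers).
def parse_pg_version(version)
  parts = version.split(".").map { |part| Integer(part, 10) }
  { major: parts[0, 2], revision: parts[2] }
end

# Revisions within the same major version are binary compatible,
# so moving between them needs no dump and restore.
def same_major?(a, b)
  parse_pg_version(a)[:major] == parse_pg_version(b)[:major]
end

# pg_upgrade only works when the old server is 8.3 or newer.
def pg_upgrade_possible?(old_version)
  (parse_pg_version(old_version)[:major] <=> [8, 3]) >= 0
end

puts parse_pg_version("8.0.26").inspect
puts same_major?("7.4.29", "7.4.30")
puts pg_upgrade_possible?("7.4.30")
puts pg_upgrade_possible?("8.3.12")
```

Comparing the major version as an array of integers avoids the trap of comparing version strings as text or as decimal numbers.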

Dump and restore

The next best method is the tried and true "dump and restore". This involves using pg_dump to create a logical representation of the old database, and then loading it into your new database with pg_restore or psql. The disadvantage to this method is time - dump and reload can take a very, very long time for large databases. Not only does the data need to get loaded into the new database tables, but all the indexes must be recreated, which can be agonizingly slow.

Replication systems

A third option is to use a replication system such as Slony or Bucardo to help with the upgrade. With Slony, you can set up replication from the old version to the new version, and then fail over to the new version once replication is caught up and running smoothly. You can do something similar with Bucardo. Note that both systems can only replicate sequences, and tables containing primary keys or unique indexes. Bucardo has a "fullcopy" mode that will copy any table, regardless of primary keys, but it's slow as it's equivalent to a full dump and restore of the table. Note that Bucardo is really only tested on the 8.X versions: for anything older, you will need to use Slony.

Even if you cannot replicate all your tables, such systems can help a migration by replicating most of your data. For example, if you have a 750 GB table full of mostly historical data, you can have Bucardo start tracking changes to the table, set up a copy on the new version (perhaps by using warm standby or a snapshot to reduce load on the master), and then start Bucardo to catch up the rows that have changed since the changes were tracked. If you do this for all your large tables, the actual upgrade process can proceed with minimal downtime by shutting down the master, doing a pg_dump of only the non-tracked tables, and then pointing your apps at the new server.

2. Migrate to a newer version

Even if you don't go to 9.0, you may want to upgrade to a newer version. Why not go all the way to 9.0? There are only two good reasons not to. One, if your system's packaging system does not have 9.0 yet, or you have custom packaging requirements that prevent you from doing so. Two, if you have concerns about application compatibility between two versions. However, that latter concern should be minimal. The largest and most disruptive compatibility change appeared in version 8.3 with the removal of implicit casts. Since 8.2 is likely to be unsupported in the next couple years, you should be going to at least 8.3. And if you can go to 8.3, you can go to 9.0.

3. Stay on your release

This is obviously the least-desirable option, but may be necessary due to real-world constraints involving time, testing, compatibility with other programs, etc. At the bare minimum, make sure you are at least running the latest revision, e.g. 7.4.30 if running 7.4. Moving forward, you will need to keep an eye on the Postgres commits list and/or the detailed release notes for new versions, and examine if any of the fixed bugs apply to your version or your situation. If they do, you'll need to figure out how to apply the patch to your older version, and then release this new version into your environment. Sound risky? It gets worse, because your patch is only being used and tested by an extremely small pool of people, has no build farm support, and is not available to the Postgres developers. If you want to go this route, there are companies familiar with the Postgres code base (including End Point) that will help you do so. But know in advance that we are also going to push you very hard to upgrade to a modern, supported version instead (which we can help you with as well, of course :).

Spree Sample Data: Orders and Checkout

A couple of months ago, I wrote about setting up Spree sample data in your Spree project with fixtures to encourage consistent feature development and efficient testing. I discussed how to create sample product data and provided examples of creating products, option types, variants, taxonomies, and adding product images. In this article, I'll review the sample order structure more and give an example of data required for a sample order.

The first step for understanding how to set up Spree order sample data might require you to revisit a simplified data model to examine the elements that relate to a single order. See below for the interaction between the tables orders, checkouts, addresses, users, line items, variants, and products. Note that the data model shown here applies to Spree version 0.11 and there are significant changes with Spree 0.30.

Basic diagram for Spree order data model.

The data model shown above represents the data required to build a single sample order. An order must have a corresponding checkout and user. The checkout must have a billing and shipping address. To be valid, an order must also have line items that have variants and products. Here's an example of a set of fixtures to create this bare minimum sample data:

# orders
  id: 1
  user_id: 1
  number: "R00000001"
  state: new
  item_total: 20.00
  created_at: <%= %>
  completed_at: <%= %>
  total: 20.00
  adjustment_total: 0.00

# checkouts
  bill_address: address_1
  ship_address: address_1
  email: ''
  order_id: 1
  state: complete
  shipping_method: canada_post

# addresses (the record referenced above as address_1)
  firstname: Steph
  lastname: Powell
  address1: 12360 West Carolina Drive
  city: Lakewood
  state_id: 889445952
  zipcode: 80228
  country_id: 214
  phone: 000-000-0000

# line_items
  order_id: 1
  variant: test_variant
  quantity: 2
  price: 10.00

# variants (the record referenced above as test_variant)
  product: test_product
  price: 10.00
  cost_price: 5.00
  count_on_hand: 10
  is_master: true
  sku: 1-master

# products (the record referenced above as test_product)
  name: Test Product 1
  description: Lorem ipsum...
  available_on: <%= %>
  count_on_hand: 10
  permalink: test-product

# users
#copy Spree core to create a user with id=1
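When writing fixtures by hand it is easy to let the totals drift out of sync with the line items. The following Ruby sketch checks that the numbers in the sample above agree with each other. It is only an illustration: the record labels order_1 and line_item_1 are hypothetical, the YAML is a trimmed copy of the values shown above, and the check assumes that total is item_total plus adjustment_total.

```ruby
require "yaml"

# Trimmed copy of the fixture values above; the record labels are hypothetical.
fixtures = YAML.load(<<~YML)
  orders:
    order_1:
      id: 1
      item_total: 20.00
      adjustment_total: 0.00
      total: 20.00
  line_items:
    line_item_1:
      order_id: 1
      quantity: 2
      price: 10.00
YML

order = fixtures["orders"]["order_1"]

# Collect the line items that belong to this order and add them up.
items = fixtures["line_items"].values.select { |li| li["order_id"] == order["id"] }
item_total = items.sum { |li| li["quantity"] * li["price"] }

# Assumed relationship: total = item_total + adjustment_total.
consistent = item_total == order["item_total"] &&
             order["item_total"] + order["adjustment_total"] == order["total"]

puts "item_total=#{item_total} consistent=#{consistent}"
```

A check like this can run at the top of a test suite so that a mistyped quantity or price in a fixture fails loudly instead of producing confusing order totals later.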

After adding fixtures for the minimal order data required, you might be interested in adding peripheral data to test custom work or test new feature development. This peripheral data might include:

  • shipping methods: A checkout belongs to a shipping method, and has many shipping rates and shipments.
  • shipments: An order has many shipments. Shipments are also tied to the shipping method.
  • inventory units: An order has many inventory units, corresponding to each item in the order.
  • payments: Orders and checkouts have many payments that must cover the cost of an order. Multiple payments can be assigned to each order.
  • adjustments: Shipping charges and tax charges are tracked by adjustments, which belong to orders.
  • return authorizations: Return authorizations belong to orders and track returns on orders, tied to inventory_units in the order that are returned.

In my experience, I've worked with a few Spree projects where we created fixtures for setting peripheral sample data to test custom shipping and inventory management. Again, note that the data models described in this article are in place in Spree <= 0.11.0. Spree 0.30 will introduce data model changes to be discussed at a later date.

Seeking a Ruby, Rails, Spree developer

Today I realized that we never posted our job announcement on our own blog even though we'd posted it to several job boards. So here it is:

We are looking for a developer who can consult with our clients and develop Ruby web applications. Most of our needs center around Rails, Spree, and SQL, but Sinatra, DataMapper, and NoSQL are lately coming into play too.

End Point is a 15-year-old web consulting company based in New York City, with 20 full-time staff developers working remotely from around the United States. We prefer open source technology and do collaborative development with Git, GNU Screen, IRC, and voice.

Experience with mobile and location-based technologies is a plus.

Please email to apply.

Surge 2010 wrap-up

Following up on my earlier post about day 1 of the conference, here is an unsorted collection of what I felt were noteworthy observations made in talks at Surge 2010:

Web engineering as a separate discipline from computer science or software development started around 1999. It is interdisciplinary, involving human factors engineering, systems engineering, operations research, fault-tolerant design, and control systems engineering. (John Allspaw)

A real-time system is one in which the correctness of a system is tied to its timeliness. Eventual consistency is an oxymoron if timeliness is part of the data itself. Caching by CDNs can't solve our problems here. (Bryan Cantrill)

Pre-fab metrics are worth less (and maybe worthless) when not tied to something in your business. Message queues enable lots of new uses because of the ability to have multiple observers. See Esper (Java, GPL) for live ongoing SQL-like queries of messages from AMQP sources, etc. (Theo Schlossnagle)

On scaling up vs. out: If your numbers show "up" is enough, be happy you can keep your system simpler. (Theo Schlossnagle)

Anyone can only ever know the past in a distributed system. There's no such thing as global state in reality. Our systems are always at least slightly inconsistent with the world. "Eventually consistent" just acknowledges the reality of delay and focuses on measuring and dealing with that. (Justin Sheehy)

Reliability compared to resiliency: Being resilient means success of your mission despite partial failure of components. How do you deal with failure? Degrade, and know before your users do. (Justin Sheehy)

Build in monitoring during development, so it's not a bottleneck right before deployment. (John Allspaw)

Data comes from the devil. Models come from God. Data + Models = Insight. Data needs to be put in a prison (a model) and made to confess the truth. Measurement is a process. Numbers aren't right. What is the error range? Visualization is helpful, but analyze the raw data looking for anomalies (such as > 100% efficiency, etc.). VAMOOS = visualize, analyze, modelize, over & over till satisfied. (Neil Gunther)

Anycast for DNS alone tends to localize on the user's recursive resolver, not their actual location. Anycast for the actual content delivery automatically localizes on the user's actual location. (Tom Daly)

To scale up, add more capacity to do X, make system do X faster, or stop doing so much X. What makes a task take time? It's utilizing a resource, it's waiting for a response, or it's waiting for synchronization. "Shard early & often" is expensive & unnecessary for most situations. Sharding makes sense when write demand exceeds capacity. (Baron Schwartz)

Eight fallacies of distributed computing were discussed by Ruslan Belkin.

Mike Malone of SimpleGeo gave a fascinating talk on working with geolocation data in Cassandra. It'd be wonderful to see an open source release of their order-preserving partitioner that allows for range queries in a single dimension. Or to start with, just the slides from Mike's talk!

In summary, it was a very good conference!

Two Cool Things about Liquid Galaxy

I. It uses COTS Hardware.

Liquid Galaxy is suitable for use with COTS (Commodity Off The Shelf) hardware. Yes, Google Earth itself is rather resource intensive, so it helps performance to use feisty computers (including ones with SSDs), but it's still COTS hardware. Of course, the very cool thing about using COTS hardware is that the price is right and gets better all the time.

II. It uses a simple, elegant, and powerful master/slave configuration and communication approach.

Liquid Galaxy works by configuring its "slave" systems to have offsets from the point of view of the master system that the system's user navigates on. The slave systems "know" their locations relative to the master system. The master system broadcasts its location to the slaves via UDP packets. It's then up to the slave systems to figure out what portion of a Google Earth globe they need to retrieve themselves relative to the coordinates broadcast from the master system.
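
To make the idea concrete, here is a minimal sketch in Python of a master sending its view as a UDP datagram and a slave applying its own configured offset. The datagram layout (comma-separated latitude, longitude, altitude, heading), the function names, and the 36-degree offset are all assumptions made for illustration; they are not the actual format Google Earth uses. The demo sends to the loopback address, whereas a real master would broadcast to the whole subnet.

```python
import socket


def encode_view(lat, lon, altitude, heading):
    """Pack the master's camera pose into a datagram (hypothetical format)."""
    return f"{lat},{lon},{altitude},{heading}".encode("ascii")


def decode_view(payload):
    """Unpack a datagram produced by encode_view()."""
    lat, lon, altitude, heading = (
        float(field) for field in payload.decode("ascii").split(",")
    )
    return lat, lon, altitude, heading


def slave_heading(master_heading, yaw_offset_deg):
    """A slave adds its fixed, preconfigured offset to the master's heading."""
    return (master_heading + yaw_offset_deg) % 360.0


def demo():
    """Send one view over loopback UDP and return the slave's heading."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))  # let the OS pick a free port
    receiver.settimeout(2.0)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        view = encode_view(39.29, -76.61, 5000.0, 350.0)
        sender.sendto(view, receiver.getsockname())
        payload, _ = receiver.recvfrom(1024)
    finally:
        sender.close()
        receiver.close()
    _lat, _lon, _altitude, heading = decode_view(payload)
    # This slave is assumed to sit 36 degrees to the right of the master.
    return slave_heading(heading, 36.0)


if __name__ == "__main__":
    print(demo())
```

The point of the design is that the master only ever sends a handful of numbers. Each slave already knows its own offset and fetches its own imagery, so adding another screen adds no work for the master.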

With this approach it's easy to scale to a large number of slave systems. The Google engineering team for the project also provided an interesting extension to it: one or more remote Liquid Galaxies can be configured to mirror the views being displayed in a given "Master Galaxy". This will allow teams of users in different places to see the same Google Earth views, each within the awesome environment of their own Liquid Galaxy. The remote Liquid Galaxies essentially "play" the same views as are seen in the Master Galaxy, while only negligible network traffic passes from system to system.

See the End Point website for more info about Liquid Galaxy and our support offerings for it.

Surge 2010 day 1

Today (technically, yesterday) was the first day of the Surge 2010 conference in Baltimore, Maryland. The Tremont Grand venue is perfect for a conference. The old Masonic lodge makes for great meeting rooms, and the hallway connecting it to the hotel was a nice way to avoid today's heavy rain. The conference organization, scheduling, and Internet access have all been solid. Well done!

There were a lot of great talks, but I wanted to focus on just one that was very interesting to me: Artur Bergman's on scaling Wikia. Some points of interest:

  • They (ab)use Google Analytics to track things beyond the typical record of which pages were viewed by whom and when. For example, page load time as measured by JavaScript is sent to a separate GA profile so it can be analyzed apart from normal traffic. That is then correlated with exit rates to give an idea of the benefit of page delivery speed in terms of user stickiness.
  • They use the excellent Varnish reverse proxy cache.
  • 500 errors from the origin result in a static page served by Varnish, with error data hitting a separate Google Analytics profile.
  • Both their servers and their team are geographically distributed.
  • They've found SSDs (solid state disks) to be well worth the extra cost: fast, using less power in a given server, and requiring fewer servers overall. They have to use Linux software RAID because no hardware RAID controllers they've tested could keep up with the speed of SSDs. They have run into the known problems with write performance dropping as the drives fill and recycle, but haven't found it to be a problem when used on replaceable cache machines.
  • They run their own CDN, with nodes running Varnish, Quagga (for BGP routing), BIND, and Scribe. But they use Akamai for static content.
  • Even running Varnish with 1 second TTL can save your backend app servers when heavy traffic arrives! One hit per second is no problem; thousands may mean meltdown.
  • Serving stale cached content when the backend is down can be a good choice. It means most visitors will never know anything was wrong. (Depends on the site's functions, of course.)
  • Their backup datacenter in Iowa is in a former nuclear bunker. See monitoring graphs for it.
  • Wikia ops staff interact with their users via IRC. This "crowdsourced monitoring" has resulted in a competition between Wikia ops people and the users to see who can spot outages first.
  • Having their own hardware in multiple redundant datacenters has meant much more leverage in pricing discussions with datacenters. "We can just move."
  • They own their own hardware, and run on bare metal. At no time does user traffic pass through any virtualized systems at all. The performance just isn't there. They do use virtual machines for some external monitoring stuff.
  • They use Riak for N-master inter-datacenter synchronization, and RiakFS for sessions and files. RiakFS is for the "legacy" MediaWiki need for POSIX access to files, but they can serve those files to the general public from Riak's HTTP interface via Varnish cache.
  • They use VPN tunnels between datacenters. Sometimes using their own routes, even over multiple hops, leads to faster transit than going over the public Internet.
  • Lots of interesting custom VCL (Varnish Configuration Language) examples.

This had plenty of interesting things to consider for any web application architecture.
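
Two of the points above, the 1-second TTL and serving stale content when the backend is down, are easy to see in a toy model. What follows is only a sketch in Python of the caching idea, not Varnish itself and not Wikia's configuration. The `MicroCache` class, its parameters, and the fake backend in the usage example are all hypothetical names made up for illustration.

```python
import time


class MicroCache:
    """Tiny TTL cache that falls back to stale entries when the backend fails."""

    def __init__(self, fetch, ttl=1.0, clock=time.monotonic):
        self.fetch = fetch      # function that asks the backend for a page
        self.ttl = ttl          # seconds an entry counts as fresh
        self.clock = clock      # injectable so the behavior can be tested
        self.entries = {}       # key -> (time stored, value)

    def get(self, key):
        now = self.clock()
        entry = self.entries.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]     # fresh hit: the backend is not touched
        try:
            value = self.fetch(key)
        except Exception:
            if entry is not None:
                return entry[1]  # backend is down: serve the stale copy
            raise                # nothing cached, so the error must surface
        self.entries[key] = (now, value)
        return value


if __name__ == "__main__":
    backend_calls = []

    def backend(path):
        backend_calls.append(path)
        return f"rendered {path}"

    cache = MicroCache(backend, ttl=1.0)
    for _ in range(5000):
        cache.get("/wiki/Main_Page")
    # Thousands of requests inside one TTL window cost a single backend hit.
    print(len(backend_calls))
```

With a burst of requests arriving inside one TTL window, the backend sees one request per path instead of thousands. This sketch is single-threaded and ignores concurrent requests for the same expired entry, which a real cache also has to handle.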