Welcome to End Point’s blog

Ongoing observations by End Point people

Annotating Your Logs

We recently did some PostgreSQL performance analysis for a client with an application having some scaling problems. In essence, they wanted to know where Postgres was getting bogged down, and once we knew that we'd be able to target some fixes. But to get to that point, we had to gather a whole bunch of log data for analysis while the test software hit the site.

This is on Postgres 8.3 in a rather locked down environment, by the way. Coordinated pg_rotate_logfile() was useful, but occasionally it would seem to devolve to something resembling: "Okay, we're adding 60 more users ... now!" And I'd write down the time stamp, and figure out an appropriate place to slice the log file later.

Got me thinking: what if we could just drop an entry into the log file, and use it to filter things out later? My first instinct was to see whether a patch would be accepted, maybe a wrapper for ereport(), something easy. Turns out, it's even easier than that...

pubsite=# DO $$BEGIN RAISE LOG 'MARK: 60 users'; END;$$;
Time: 0.464 ms
pubsite=# DO $$BEGIN RAISE LOG 'MARK: 120 users'; END;$$;
Time: 0.378 ms
pubsite=# DO $$BEGIN RAISE LOG 'MARK: 360 users'; END;$$;
Time: 0.700 ms

Of course the above will only work on version 9.0 and up, since it relies on DO blocks. Previous versions that have PL/pgSQL turned on can just create a function that does the same thing. The "LOG" severity level is an informational message that's supposed to always make it into the log files. So with those in place, a grep through the log can reveal just where they appear, and sed can extract the sections of log between those lines and feed them into your favorite analysis utility:

postgres@mothra:~$ grep -n 'LOG:  MARK' /var/log/postgresql/postgresql-9.0-main.log 
19180:2011-03-31 20:20:37 EDT LOG:  MARK: 60 users
19478:2011-03-31 20:25:48 EDT LOG:  MARK: 120 users
20247:2011-03-31 20:32:15 EDT LOG:  MARK: 360 users
postgres@mothra:~$ sed -n '19180,19478p' /var/log/postgresql/postgresql-9.0-main.log | bin/ > 60users.html
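On servers older than 9.0, where DO blocks aren't available, a throwaway PL/pgSQL wrapper function gives the same effect. A minimal sketch (the function name is my own):

```sql
-- One-time setup, assuming PL/pgSQL is installed:
CREATE OR REPLACE FUNCTION log_mark(msg text) RETURNS void AS $$
BEGIN
    RAISE LOG 'MARK: %', msg;
END;
$$ LANGUAGE plpgsql;

-- Then, at each test milestone:
SELECT log_mark('60 users');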

Oh, and the performance problem? Turns out it wasn't Postgres at all: every query's average execution time varied only minimally as the concurrent user count was scaled higher and higher. But that's another story.

Postgres Build Farm Animal Differences

I'm a big fan of the Postgres Build Farm, a distributed network of computers that are constantly installing, building, and testing Postgres to detect any problems in the code. The build farm works best when there is a wide variety of operating systems and architectures doing the testing. Thus, while I have a rather common x86_64 Linux box available for testing, I try to make it a little unique to get better test coverage.

One thing I've been working on is clang support (clang is an alternative to gcc). Unfortunately, the latest version of clang has a bug that prevents it from building Postgres on Linux boxes. I submitted a small patch to the Postgres source to fix this, but it was decided that we'll wait until clang fixes their bug. Supposedly they have done so in their svn head, but I've not been able to get that to compile successfully.

So I also just installed gcc 4.6.0, the latest and greatest. Installing it was not easy (nasty problems with the MPFR dependency), but it's done now and working. It probably won't make any difference as far as the results, but at least my box is somewhat different from all the other x86_64 Linux boxes in the farm. :)

I've asked before on the list (with no response) about what sort of configuration changes could be made to expand the range of testing. The build farm itself provides a handful of things to choose from, and most of the animals in the farm have most of them configured (I have everything except "pam" and "vpath" enabled).

However, one thing I've thought about changing is NAMEDATALEN. It's basically a compile-time option that sets the maximum number of characters things like table names can have. It is set by default to 64, while the SQL spec wants it to be 128. The problem is that this causes some tests to fail, as they have a hard-coded assumption about the length.

The real problem of course is that Postgres' 'make check' is a very crude test. I've got some ideas on how to fix that, but that's another post for another day. So, anyone have other ideas on how to make my particular build farm member, and others like it, more useful?
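For what it's worth, the NAMEDATALEN edit itself is mechanical; it lives in src/include/pg_config_manual.h of the source tree. A sketch of the change, demonstrated here on a stand-in copy of the header line rather than a real checkout:

```shell
cd "$(mktemp -d)"
mkdir -p src/include
echo '#define NAMEDATALEN 64' > src/include/pg_config_manual.h

# The actual one-line change:
sed -i 's/NAMEDATALEN 64/NAMEDATALEN 128/' src/include/pg_config_manual.h

grep NAMEDATALEN src/include/pg_config_manual.h
# In a real tree you'd follow with: ./configure && make && make check
# (and expect some regression failures from the hard-coded lengths)
```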

Interactive Git: My New Found Friend(s)

As a software engineer I'm naturally inclined to be at least somewhat introverted :-). Combine that with the fact that End Point is PhysicalWaterCooler challenged and you have a recipe for two things to naturally occur: 1) talking to oneself (but then who doesn't do that really? no, really), and 2) finding friends in unusual places. Feeling a bit socially lacking after a personal residence move, I was determined to set out to find new friends, and so I found one. His name is "--interactive", or Mr. git add --interactive.

"How did we meet?" You ask. While working on a rather "long winded" project I started to notice myself sprinkling TODOs throughout the source code. Not a bad habit really (presuming they do actually get fixed eventually), but unfortunately the end result is having a lot of changed files in git that you don't really need to commit, but at the same time don't really need to see every time you want to review code. I'm fairly anal about reviewing code, so I was generally in the habit of running a `git status` followed by a `git diff` on every file mentioned by status. These are two great friends, but of late they just don't seem to be providing the inspiration they once did.

Enter my new friend `git add --interactive`. Basically he combines the two steps for me in a nice, neat, controlled way while adding a bit of spice to my life, in particular per-change inclusion capability. When running `git add` with the interactive flag you are provided with an overall index status immediately followed by a prompt. At that prompt you have the option "5: patch"; by entering "5" and then return, you are provided the index (basically) again. From that index you can select from a list of files whose patches you would like to review. For each reviewed patch you can then specify whether to include it for commit, skip it, divide (split) it into smaller patches for further review, or even edit it. When selecting the files to review it is simple to choose a range by entering a specially formatted string, i.e. "1-12,15-18,19". With --interactive the time it takes to review the code pending commit and skip through the TODOs is greatly reduced, something the client definitely appreciates.
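A typical session looks roughly like this (file names hypothetical, output abbreviated):

```
$ git add --interactive
           staged     unstaged path
  1:    unchanged        +3/-1 lib/foo.rb
  2:    unchanged        +2/-0 lib/bar.rb

*** Commands ***
  1: status       2: update       3: revert       4: add untracked
  5: patch        6: diff         7: quit         8: help
What now> 5
  1:    unchanged        +3/-1 lib/foo.rb
  2:    unchanged        +2/-0 lib/bar.rb
Patch update>> 1-2
...
Stage this hunk [y,n,q,a,d,s,e,?]?
```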

"But what about your other old friends?" You then ask. Well, as it turns out my spending so much time with interactive add made `git stash` feel a bit lonely, and it dawned on me that tracking those TODOs in the working tree at all may be a bit silly. What could a guy do? Perhaps these two friends might actually like to party together? As it turns out they had already been introduced and do like to party together (not sure why they couldn't have just invited me before, though it might have something to do with my past friendship with SVN and RCS).

Either way, to once and for all get those unsightly TODOs out from under my immediate purview while keeping other changes I still needed in the index, I found `git stash save --patch --no-keep-index "TODO Tracking"`. "save" instructs git stash to save a new stash; "--patch" tosses it into an interactive mode similar to the one described above for add; "--no-keep-index" instructs stash not to keep in the working tree the changes that are added to the created stash; and the "TODO Tracking" is just a message to make it easy for a human to understand what the stash contains (I made this one up for my specific immediate purpose). This leaves my working tree and index clean for me to do more pressing work, knowing that when I have the time or need to restore those past TODOs I can, so that they may be worked on as well. Note that I've not really used this technique much (read: I've just done it now for the first time) so we'll see if it really is that useful, but the interactive patching I've used and it is definitely worth it.

As a further sidebar, I was discussing multiple commit indexes in a Git repo with someone in the #yui channel, and as soon as I found the above it occurred to me that using multiple stashes which you pop could work in effect the same way, though I don't know if there is a way to add patches to an already-created stash. That might make a neat feature to investigate and/or request from the Git core.

Just so you aren't too concerned, there is still a place in my heart for `git add` and `git status` even if I don't see them as frequently as I once did.

ActiveProduct -- Just the Spree Products


I wanted to see how difficult it would be to cut out the part of Spree that makes up the product data model and turn it into a self sufficient Rails 3 engine. I followed the tutorial here to get started with a basic engine plugin. Since it sounded good and nobody else seems to be using it, I decided to call this endeavor ActiveProduct.

Which Bits?

The next step was to decide which parts I needed to get. Minimally, I need the models and migrations that support them. If that works out, then I can decide what to do about the controllers and views later.

That said, the lines between what is needed to create a product and the rest of the system are not always so clear cut. You have to cut somewhere, though, so I cut like this:

  • Image
  • InventoryUnit
  • OptionType
  • OptionValue
  • ProductGroup
  • ProductOptionType
  • ProductProperty
  • Product
  • ProductScope
  • Property
  • Prototype
  • Variant


Each of these has a model file in spree/core/app/models, which I just copied over to the app/models directory of my engine.


It'd be convenient if I could have just carved the appropriate parts out of the schema.rb file for the migration. But said file does not appear to be in evidence. Building a Spree instance and trying to coerce one out of it just seemed too annoying, so I did something else.

I started from the first migration, removed all of the table definitions that didn't interest me and manually applied all of the migrations in the migration file to the remaining definitions. By manually applied, I mean I went through each migration file one at a time and made the specified change to the original set of definitions. There are, of course, all kinds of reasons why this is a terrible idea. For a reasonably small set of tables with a simple set of relations, the trade-off isn't too bad.

Migration Generator

With a single migration in hand, I followed the tutorial here as a guide to create a generator for it in the engine. With the migration set up as a generator, I went to my sandbox Rails app and ran the migration by doing the following:

$ rails g active_product create_migration_file
$ rake db:migrate

Did it Work?

At this point, with the tables in the database and the model files in place, it was time to see if things worked.

$ rails console
irb(main):001:0> p = Product.new

...and I got a big wall of error messages. So I could not even instantiate the class, much less start using it. Well, I kind of expected that.

Resolving Dependencies

Missing Constants

Following the error messages, I started with an unresolved dependency on delegate_belongs_to. A little sleuthing led me into the Spree lib directory where a copy of this plugin lives. Some further knocking around the interwebs led me to this project, which appears to be the canonical version of the plugin. Since I was trying to create a standalone module, I wanted to set this up as an external dependency (which I will defer figuring out how to enforce in the engine until later).

As an aside, (the newer version of) delegate_belongs_to has an issue with an API change in ActiveRecord for Rails 3. A version that will at least load with ActiveProduct can be found here.

The active_product engine currently assumes that delegates_belongs_to is available in the project that it is installed in. I set it up as a normal plugin in the vendor/plugins directory.
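To make the dependency concrete, here is a tiny pure-Ruby stand-in for what delegate_belongs_to provides (the real plugin hooks into ActiveRecord associations; all the names below are illustrative, not Spree's actual code):

```ruby
# Minimal stand-in: expose attributes of an associated object as if
# they were the model's own.
module DelegateBelongsTo
  def delegate_belongs_to(association, *attrs)
    attrs.each do |attr|
      define_method(attr) { send(association).send(attr) }
      define_method("#{attr}=") { |v| send(association).send("#{attr}=", v) }
    end
  end
end

Variant = Struct.new(:price, :sku)

class Product
  extend DelegateBelongsTo
  attr_accessor :master
  delegate_belongs_to :master, :price, :sku
end

product = Product.new
product.master = Variant.new(19.99, 'W136')
product.price  # => 19.99, read through product.master
```

With the real plugin, Spree's Product delegates attributes through to an associated record in much the same way.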

Circular Dependencies

With that out of the way, the next error seemed to be about Variants and Scope. In spree/core/lib/scopes there are a couple of files that interact with the Product and Variant classes in a somewhat messy way. In order to make use of the scopes that are defined there, I needed to pull them in. Ultimately, it probably makes sense to include the changes directly into the affected class files. Since I was still experimenting here, I just moved them to approximately the same place in the engine.

It turns out that the dependency relationships between Product, Variant, and the Scopes module are pretty complex. I spent a fair amount of time trying to sort them out manually, but was unable to find any reasonable way. Eventually, I decided to give up and fall back on the auto loader to handle it for me.

The auto-loader in Rails seems to cover a multitude of sins. A well behaved independent module will need to remove all of these interdependencies. There are a couple of significant problems with leaving them be:

  • Other potential users of such a module will not necessarily make sure that everything is auto-loaded, so this module would just be broken. Sinatra comes to mind.
  • Interdependencies increase complexity. While complexity is not inherently bad, it can be a source of bugs and errors, so it ought to be avoided when possible.

To start with, I moved the scope.rb file and the scope directory to active_product/lib/auto. And I added the following to the ActiveProduct module definition in active_product/lib/active_product/engine.rb:

module ActiveProduct
  class Engine < Rails::Engine
    config.autoload_paths += %W(#{config.root}/lib/auto)
  end
end

There are two interesting things about this to note:

  • The engine lib directory is not auto-loaded in the same way that app/models, app/controllers, etc are. There apparently is no convention for loading a lib directory. I picked 'lib/auto', but there are not any constraints on what can be added.
  • The engine has its own config variable that is loaded and honored as part of the Rails app config.

Now What?

Now when I tried to instantiate the class I found that it called a couple of methods that I'm not quite sure what I am going to do with yet. These are:


make_permalink is provided by an interestingly named Railslove module in the spree_core lib and seems harmless enough. For now, I commented the call out of the Product lib.

search_methods is provided by the MetaSearch plugin which is a Spree dependency. Search is neat, but I'll sort it out later. Again, I commented it out and will deal with it if it causes problems.

Where are we now?

I can now instantiate a new product object from the console. That seems to somewhat validate the effort to isolate the module. You may be tempted to ask if you could use that product instance for anything; saving a copy to the data store, for instance. Here's a hint: circular dependencies.

The code that I've worked on up till now can be seen here.

Referral Tracking with Google Analytics

It's pretty easy to use Google Analytics to examine referral traffic, including using custom referral tracking codes. Here's how:

Once you have referrers or affiliates that plan to link to your site, you can ask that those affiliates append a unique tracking ID to the end of the URL. For example, I'll use the following referral IDs to track metrics from Milton and Roger's websites to End Point's site.
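The tagged links might look something like this (the parameter name and values here are hypothetical):

```
http://www.endpoint.com/?ref=milton
http://www.endpoint.com/?ref=roger
```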


After you've seen some traffic build up from those affiliates, you must create two Custom Advanced Segments in Google Analytics:

Follow the link to create an Advanced Segment. The New Advanced Segment page.

Once you've landed on the New Advanced Segment page, you create a custom segment by dragging "Landing Page" from the "Content" tab to define the criteria, and set the condition to "Contains" with your unique referral identifier as the value.

Roger's Referral Traffic Milton's Referral Traffic

That's it! You now have custom Advanced Segments defined to track referral or affiliate data. You can select the Advanced Segments from any metrics page:

All traffic compared to referral traffic from Milton and Roger's sites.

Traffic from Milton's website only.

You can also examine conversion driven by the affiliate. For example, how does conversion driven by one affiliate compare to the entire site's conversion? On our site, conversion is measured by contact form submission — but on ecommerce sites, you can measure conversion in the form of purchases relative to different affiliates.

Roger's Referral conversion versus conversion of the entire site. Roger's doing pretty good!

One potential disadvantage to this method of affiliate tracking is that you are creating duplicate content in Google by introducing additional URLs. You may want to use the rel="canonical" tag on the homepage to minimize duplicate content in search engine indexes. A very similar alternative that bypasses adding a referral ID would be to create custom segments defined by Source and Referral Path; however, the method described in this article is valuable for sites that may have a redirect between the referral site and the landing URL (a link to a redirect that retains the referral information).
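The canonical hint is a single line in the homepage's head; for example (URL hypothetical):

```html
<link rel="canonical" href="http://www.example.com/" />
```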

Google Analytics is a great tool that allows you to measure analytics such as the ones shown in this post. It's fairly standard for all of our clients to request Google Analytics installation. Google announced last week that a new Google Analytics platform will be rolled out soon, which includes a feature update to multiple segments that will allow us to examine traffic from multiple affiliates without showing "All Visits".

Note that the data presented in this article is fictitious.
I don't think Milton and Roger (shown above) will be linking to End Point's site any time soon!

Liquid Galaxy Project in GSoC 2011!

We're very happy to announce The Liquid Galaxy Project has been accepted as a Mentoring Organization for the Google Summer of Code (GSoC) Program!

Talented students interested in Liquid Galaxy, panoramic systems, and the opportunity to be paid a stipend of $5000 by Google to develop and enhance open source code should go to the project's Ideas Page and make a proposal to the project for review:

We look forward to student submissions on these exciting projects that will make Liquid Galaxy an even more impressive system, including:

  • Controlling Liquid Galaxy with novel input devices like a Wiimote, Android phone, or Kinect
  • Patching other open-source software enabling multi-system panoramic display
  • Production of panoramic video, audio, and photography
  • Your own innovative idea!

Lazy Image Loading in JavaScript with jQuery

This week I had a duh moment while working on a little jQuery-driven interface for a side hobby.

I've occasionally used or attempted to do image preloading in JavaScript, jQuery, and YUI. Preloading images happens after a page is loaded: follow-up image requests are made for images that may be needed, such as hover images, larger sizes of thumbnail images on the page, or images below the fold that do not need to load at page request time. Adobe Fireworks spits out this code for preloading images, which is a bit gross because the JavaScript is typically inline and it doesn't take advantage of common JavaScript libraries. But this is probably acceptable for standalone HTML files that get moved between various locations during design iterations.

<body onload="MM_preloadImages('/images/some_image.png','/images/some_other_image.png')">

I found many examples of preloading images with jQuery that look something like this:

jQuery.preloadImages = function() {
  for (var i = 0; i < arguments.length; i++) {
    jQuery("<img>").attr("src", arguments[i]);
  }
};

I implemented this method, but in my code the preloading was happening asynchronously and I needed to find something that would execute some other behavior after the image was loaded. Before I found the solution I wanted, I tried using jQuery's get method and tested jQuery's ready method, but neither was suitable for the desired behavior. I came across jQuery's load event, which binds an event handler to the "load" JavaScript event and can be used on images. So, I came up with the following bit of code to lazily load images:

var img = $('<img>')
  .attr('src', some_image_source);
if ($(img).height() > 0) {
  // do something awesome
} else {
  var loader = $('<img>')
    .attr('src', 'images/ajax_loader.gif');
  $(img).load(function() {
    // do something awesome (the same awesome thing as above)
    $(loader).remove();
  });
}

So my bit of code creates a new image element. If the image's height is greater than 0 because it's already been requested, it does some awesome method. If its height is 0, it displays an ajax loader image, then does the same awesome method and removes the ajax loader image. See the screenshot below to get an idea of how this is used.

The image on the left has been loaded and resized to fit its frame. The image to be displayed on the right is loading.

Interestingly enough, the code above works in IE 8, Chrome, and Firefox, but it appears that IE handles image loading a bit differently than the other two browsers — I haven't investigated this further. This lazy-image loading reduces unnecessary requests made to pre-load images that may or may not be accessed by users and the added touch of an ajax loader image communicates to the user that the image is loading. I haven't added a response for image load failure, which might be important, but for now the code makes the assumption that the images exist.

I found a few jQuery plugins for lazy image loading, but I think they might be overkill in this situation. One of the jQuery plugins I found is based on YUI's ImageLoader, a utility that similarly delays the loading of images.

Product Personalization for Ecommerce on Interchange with Scene7

One of the more challenging yet rewarding projects Richard and I have worked on over the past year has been an ecommerce product personalization project with Paper Source. I haven't blogged about it much, but wanted to write about the technical challenges of the project in addition to shamelessly self-promote (a bit).

Personalize this and many other products at Paper Source.

Paper Source runs on Interchange and relies heavily on JavaScript and jQuery on the customer-facing side of the site. The "personalization" project allows you to personalize Paper Source products like wedding invitations, holiday cards, stationery, and business cards and displays the dynamic product images with personalized user data on the fly using Adobe's Scene7. The image requests are made to an external location, so our application does not need to run Java to render these dynamic personalized product images.

Technical Challenge #1: Complex Data Model

To say the data model is complex is a bit of an understatement. Here's a "blurry" vision of the data model for the tables driving this project. The number of tables from this project has begun to exceed the number of Interchange core tables.

A snapshot of the data model driving the personalization project functionality.

To give you an idea of what business needs the data model attempts to meet, here are just a few snapshots and corresponding explanations:

At the highest level, there are individual products that can be personalized.
Each product may or may not have different color options, or what we refer to as colorways. The card shown here has several options: gravel, moss, peacock, night, chocolate, and black. Clicking on each colorway here will update the image on the fly.
In addition to colorways, each product will have corresponding paper types and print methods. For example, each product may be printed on white or cream paper and each product may have a "digital printing" option or a letterpress option. Colorways shown above apply differently to digital printing and letterpress options. For example, letterpress colorways are typically a subset of digital printing colorways.
Each card has a set of input fields with corresponding fonts, sizes, and ink colors. The fields can be single-line inputs or multi-line text boxes. All cards have their own specific set of data to control the input fields – one card may have 4 sections with 1 text field in each section while another card may have 6 sections with 1 text field in some sections and 2 text fields in other sections. In most cases, inks are limited between card colorways. For example, black ink is only offered on the black colorway card, and blue ink is only offered on the blue colorway card.
Each card also has a set of related items assigned to it. When users toggle between card colorways, the related item thumbnails update to match the detail option. This allows users to see an entire suite of matching cards: a pink wedding invite, RSVP, thank you, and stationery or a blue business card, matching letterhead stationery, and writing paper shown here.
In addition to offering the parent product, envelopes are often tied to the products. In most cases, there are default envelope colors tied to products. For example, if a user selected a blue colorway product, the blue envelope would show as the default on the envelopes page.
In addition to managing personalization of the parent products, the functionality also meets the business needs to offer customization of return address printing on envelopes tied to products. For example, here is a personalized return address printing tied to my wedding invitation.

Technical Challenge #2: Third Party Integration with Limited Documentation

There are always complexities that come up when implementing a third-party service in a web application. In the case of this project, there is a fairly complex structure for image requests made to Scene7. For dynamic invitations, cards, and stationery, the image requests look like these examples (wrapped here for readability):

anchor=-50,-50&size=2000,2000&layer=1&src=is{PaperSource/A7_env_back_closed_sfwhite}&
anchor=2900,-395&rotate=-90&op_usm=1,1,8,0&resMode=sharp&qlt=95,1&pos=100,50&
size=1800,1800&layer=3&src=fxg{PaperSource/W136-122208301?&imageres=150}&anchor=0,0&
op_usm=1,1,1,0&pos=500,315&size=1732,1732&effect=0&resMode=sharp&fmt=jpg

anchor=-50,50&size=2000,2000&layer=1&src=is{PaperSource/4bar_env_back_closed_fig}&
anchor=0,0&pos=115,375&size=1800,1800&layer=2&src=fxg{PaperSource/4barV_white_background_key}&
anchor=0,0&rotate=-90&pos=250,1757&size=1733,1733&layer=3&
src=fxg{PaperSource/ST57-2011579203301?&$color_fig=true&$color_black=false&
$color_chartreuse=false&$color_espresso=false&$color_moss=false&$color_peacock=false&
$ink_0=780032&$ink_2=780032&$ink_1=780032&imageres=150}&anchor=0,0&op_usm=2,1,1,0&
pos=255,513&size=1721,1721&resMode=sharp&effect=0&fmt=jpg

anchor=-50,-50&size=2000,2000&layer=1&src=is{PaperSource/4bar_env_back_closed_night}&
anchor=0,0&pos=115,375&size=1800,1800&layer=2&src=fxg{PaperSource/4bar_white_sm}&
anchor=0,0&rotate=-90&pos=250,1757&size=1733,1733&layer=3&src=fxg{PaperSource/W139-201203301?}&
anchor=0,0&op_usm=2,1,1,0&pos=255,513&size=1721,1721&resMode=sharp&effect=0&fmt=jpg

Each argument is significant to the dynamic image: background envelope color, card colorway, ink color, card positioning, envelope positioning, image quality, image format, and paper color are just a few of the factors controlled by the image arguments. And part of the challenge was dealing with the lack of documentation while building the logic to render the dynamic images.
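To give a feel for how such requests get assembled, here's a hypothetical sketch (not Paper Source's actual code) that builds the layered parameter string from a list of layers; only the parameter names mirror the examples above, everything else is made up:

```javascript
// Build a Scene7-style layered image request from a list of layers.
// Each layer contributes its src plus optional positioning parameters.
function scene7Url(base, layers, fmt) {
  var params = [];
  for (var i = 0; i < layers.length; i++) {
    var layer = layers[i];
    params.push('layer=' + i);
    params.push('src=' + layer.src);
    if (layer.anchor) params.push('anchor=' + layer.anchor);
    if (layer.pos)    params.push('pos=' + layer.pos);
    if (layer.size)   params.push('size=' + layer.size);
  }
  params.push('fmt=' + (fmt || 'jpg'));
  return base + '?' + params.join('&');
}

// Example: an envelope background layer plus a card artwork layer.
var url = scene7Url('PaperSource/A7_card', [
  { src: 'is{PaperSource/A7_env_back_closed_sfwhite}',
    anchor: '-50,-50', size: '2000,2000' },
  { src: 'fxg{PaperSource/W136-122208301?&imageres=150}',
    anchor: '0,0', pos: '500,315', size: '1732,1732' }
]);
```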


As I mentioned above, this has been a challenging and rewarding project. Paper Source has sold personalizable products for a couple of years now. They continue to move their old personalized products to use this new functionality including many stationery products moved yesterday. Below are several examples of Paper Source products that I created with the new personalized functionality.

Google 2-factor authentication

About a month ago, Google made available to all users their new 2-factor authentication, which they call 2-step authentication. In addition to the customary username and password, this optional new feature requires that you enter a 6-digit number that changes every 30 seconds, generated by the Google Authenticator app on your Android, BlackBerry, or iPhone. The app looks like this:

This was straightforward to set up and has worked well for me in the past month. It would thwart bad guys who intercept your password in most cases. It would also lock you out of your Google account if you lose your phone and your emergency scratch codes. :)

I was happy to see this is all based on some open standards under development, and Google has made this even more useful by releasing an open source PAM module called google-authenticator. With that PAM module, a Linux system administrator can require a Google Authenticator code in addition to password authentication for login.
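For reference, wiring the module into login authentication is a small PAM and sshd change; roughly like this, though file locations and policy vary by distro:

```
# /etc/pam.d/sshd -- ask for a verification code after the password:
auth required pam_google_authenticator.so

# /etc/ssh/sshd_config -- let PAM drive the challenge-response prompt:
ChallengeResponseAuthentication yes
UsePAM yes
```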

I tried this out on a CentOS x86_64 system and found it fairly straightforward to set up. I ran into two minor gotchas which were reported by others as well:

  • The Makefile calls sudo directly, which it shouldn't -- I was running a minimal installation without sudo installed, and in any case the administrator should decide when to become root and how. (Issue 17)
  • The Makefile installs into /lib/security instead of /lib64/security. This has since been fixed. (Issue 6)

After build and installation it was easy to generate a secret key for each individual user account. The key is stored in the user's home directory, which Issue 4 notes has some downsides, and the resolution to Issue 24 provides a partial workaround for this. The home directory seems like a nice default to me.

In the end, I found the google-authenticator module isn't suitable for my regular ssh use due to no fault of its developers. I normally use SSH public key authentication and that's handled by OpenSSH natively, separately from PAM, and thus bypasses 2-factor authentication entirely. So I can have 2-factor authentication with password authentication, but not with public key authentication, which is really what I want.

Does anyone know of a way to configure things that way? I wasn't able to find a way, so I'm not planning on using this for shell logins right now. But it's still a nice option for Google logins right now, and I expect the google-authenticator project will advance over time.

Presenting at PgEast

I'm excited to be going to the upcoming PostgreSQL East Conference. This will be both my first PostgreSQL conference to attend, as well as my first time presenting. I will be giving a talk on Bucardo entitled Bucardo: More than Just Multi-Master. I'll be in NYC for the conference, so I'll get to work for a couple days at our company's main office as well.

I look forward to learning more about PostgreSQL, putting some names and faces with some IRC nicks, and socializing with others in the PostgreSQL community; after all, Postgres' community is one of its strongest assets.

Hope to see you there!

Liquid Galaxy in The New York Times

We got a charge at End Point seeing an article on the front page of The New York Times website this last Sunday about the @america cultural center exhibit the US State Department opened in Indonesia this last December. The article features the Liquid Galaxy that End Point installed on Google's behalf there. There was no mention of End Point, but heck, we're only a few clicks away from the Liquid Galaxy link in the article!:

The article also appeared on page A6 of the publication's New York print edition, leading off the International Section of the paper with a big 9" by 6" color photo with the Liquid Galaxy in the background--the same photo that appears on the website.

Kiel made the trip from New York to Jakarta to do the installation. After he installed the Liquid Galaxy in the exhibition space he moved it with local help to a ball room nearby where there was a State Department gala for the opening of @america. Kiel got dressed up in black (not so unusual for him) and staffed the booth to make sure there were no glitches for the event, then he oversaw the move back to home base.

This Liquid Galaxy was a bit different from past ones for us since Jakarta uses 230 volt power, so there were a few equipment differences from what we'd used before. Also, it ended up being a seven section system rather than eight sections because space at the exhibition is tight and setting up just seven sections provided the clearance needed for wheelchairs in the area around the Liquid Galaxy to conform with US ADA requirements.

On an international project like this it wasn't surprising that there were extra bureaucratic and logistical issues to be worked through and many odd-hours phone calls and emails. Ultimately the project had to be delivered on a very tight timeline. It was certainly a pleasure working with Google and with the embassy and its contractors in Jakarta on the project. There were a lot of knowledgeable and committed people who stepped up to help get the Liquid Galaxy set up in time for the @america opening.


jQuery and Long-Running Web App Processes: A Case Study

I was recently approached by a client's system administrator with a small but interesting training/development project. The sys-admin, named Rod, had built a simple intranet web application that used a few PHP pages, running on an Amazon EC2 instance, to accept some user input and kick off multiple long-running server side QA deployment processes. He wanted to use Ajax to start the process as well as incrementally display its output, line-by-line, on the same web page. However, waiting for the entire process to finish to display its output was a poor user experience, and he wasn't able to get an Ajax call to return any output incrementally over the lifetime of the request.

Rod asked me to help him get his web app working and train him on what I did to get it working. He admitted that this project was a good excuse for him to learn a bit of jQuery (a good example of keeping your tools sharp) even if it wasn't necessarily the best solution in this case. I have always enjoyed training others, so we fired up Skype, got into an IRC channel, and dove right in.

First, I started with the javascript development basics:
  1. Install the Firebug add-on for Firefox
  2. Use Firebug's Console tab to watch for javascript errors and warnings
  3. Use Firebug's Net tab to monitor what ajax calls your app is making, and what responses they are getting from the server
  4. Replace all those hateful debug alert() calls with console.log() calls, especially for ajax work

A special note about console.log(): Some browsers (including any Firefox that doesn't have Firebug installed) do not natively supply a console object. To work around this, we defined the following console.log stub at the very top of our single javascript file:

if (typeof console === 'undefined') {
    console = { log: function() {} };
}

Rod's javascript was a mix of old-school javascript that he had remembered from years ago, and new-school jQuery he had recently pulled from various tutorials. His basic design was this: when the user clicked the "Deploy" button, it should kick off two separate Ajax requests. "Request A" would initiate a POST request to deploy.php, which made a number of system calls to slow-running external scripts and logged their output to a temporary logfile on the server. "Request B" would make a GET request to getoutput.php every 2 seconds (which simply displayed the contents of said logfile) and display its output in a scrollable div element on the page.

Hearing Rod describe it to me, I wondered if he might be headed down the wrong path with his design. But, he already had put time into getting the server-side code working and did not want to change direction at this point. Discussing it with him further, it became clear that he did not want to re-write the server-side code and that we could in fact make his current design produce working code with teachable concepts along the way.

To start, Rod told me that his "ajax POST request (Request A) wasn't firing." As the Russian proverb says, "Trust, but verify." So, we opened Firebug's Net tab, clicked the web app's Deploy button (actually its only button - Steve Jobs look out) and saw that the ajax request was in fact firing. However, it was not getting back a successful HTTP 200 status code and as such, was not getting handled by jQuery as Rod expected. Expanding the ajax request in the Net tab let us see exactly what name/value data was getting POSTed. We spotted a typo in one of his form input names and fixed it. Now Request A was clearly firing, POSTing the correct data to the correct URL, and getting recognized as successful by jQuery. (More on this in a bit.)

Rod's code was making Request A from within a jQuery event handler defined for his form's Deploy button. But, he was making Request B via an HTML onClick attribute within that same HTML tag. He was getting all sorts of strange results with that setup, depending on which request returned first, whether Request B's function call correctly returned false to prevent the entire form from getting POSTed to itself, and so on. Consolidating logic and control into event handlers that are defined in one place is preferable to peppering a web page with HTML onClick, onChange, etc. javascript calls. So, we refactored his original jQuery event handler and onClick javascript call into the following code snippet:

//global variable for display_output() interval ID
var poll_loop;

$(".deploy_button").click(function() {
    $.ajax({
        type: "POST",
        url: "deploy.php",
        data: build_payload(),
        beforeSend: function() {
            $('#statusbox').html("Running deployment...");
        },
        success: function() {
            console.log('Qa-run OK');
            //previously called via an onClick
            poll_loop = setInterval(function() {
                display_output("#statusbox", 'getoutput.php');
            }, 2000);
        },
        error: function() {
            console.log('Qa-run failed.');
        }
    });
    return false;
});

That $.ajax(...) call is our jQuery code that initiates the Request A ajax call and defines anonymous functions to call based on the HTTP status code of Request A. If Request A returns an HTTP 200 status code from the server, the anonymous function defined for the 'success:' key will be executed. If any other HTTP code is returned, the anonymous function defined for the (optional) 'error:' key is executed. We refactored the onClick's call to display_output() into the 'success:' function above. Now, it only gets called if Request A is successful, which is the only time we'd want it to execute.

The body of the 'success:' anonymous function calls setInterval() to create an asynchronous (in that it does not block other javascript execution) javascript loop that calls display_output() every 2 seconds. The setInterval() function returns an "interval ID" that is essentially a reference to that interval. We save that interval ID to the 'poll_loop' variable that we intentionally make global (by declaring it with 'var' outside any containing block) so we can cancel the interval later.

Here is the display_output() function that makes Request B and gets called every 2 seconds:

function display_output(elementSelector, sourceUrl) {
    $(elementSelector).load(sourceUrl);
    var html = $(elementSelector).html();
    if (html.indexOf("EODEPLOY") > 0) {
        clearInterval(poll_loop);
        alert('Deployment Finished.');
    }
    if (html.indexOf("DEPLOY_ERROR") > 0) {
        clearInterval(poll_loop);
        alert('Deployment FAILED.');
    }
}

That .load() method is jQuery shorthand for making an ajax GET request and assigning the returned HTML/text into the element object on which it's called. Because the display_output() function is responsible for terminating the interval that calls it, we need to define our end cases. If either "EODEPLOY" (for a successful deployment) or "DEPLOY_ERROR" (for a partially failed deployment) appear as strings within the resulting HTML, we call clearInterval() to stop the infinite loop, and alert the user accordingly. If neither of our end cases are encountered, display_output() will be executed again in 2 seconds.

As it stands, the poll_loop interval will run indefinitely if the server-side code somehow fails to ever return the two strings we're looking for. I left that end case as an exercise up to Rod, but suggested he add a global variable that could be used to measure the number of display_output() calls or the elapsed time since the Deploy button was clicked, and end the loop once an upper limit was hit.
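One way to sketch that safety valve (the names and the 150-call cap are my own assumptions, not Rod's code): count the polls and cancel the interval once the upper limit is hit.

```javascript
// Hypothetical upper bound: 150 polls * 2 seconds = 5 minutes
var MAX_POLLS = 150;
var poll_count = 0;
var poll_loop = null;

function poll_once() {
    poll_count += 1;
    if (poll_count > MAX_POLLS) {
        if (poll_loop !== null) clearInterval(poll_loop);
        return false;   // give up; the server never reported an end case
    }
    // display_output("#statusbox", 'getoutput.php') would run here
    return true;
}

// wired up the same way as before:
// poll_loop = setInterval(poll_once, 2000);
```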

Other suggested features that Rod and I discussed but I've omitted from this article include:
  1. Client-side input validation using javascript regular expressions
  2. Matching server-side input validation because sometimes the call is coming from inside the house
  3. Adding a unique identifier that is passed as part of both Request A and Request B to better identify requests and to prevent temp file naming conflicts from multiple concurrent users.
  4. Packaging display_output()'s "Deployment FAILED" output and providing a button to easily send the output to Rod's team

I'm sure there are a ton of other possible solutions for a project like this. For example, I know that Jon and Sonny developed a more advanced polling solution for another client, using YUI's AsyncQueue. Without getting too deeply into the server-side design, I'm curious to hear how other people might approach this problem. What do you think?

A Ruby on Rails Tag Cloud Tutorial with Spree

A tag cloud from a recent End Point blog post.

Tag clouds have become a fairly popular way to present data on the web. One of our Spree clients recently asked End Point to develop a tag cloud reporting user-submitted search terms in his Spree application. The steps described in this article can be applied to a generic Rails application with a few adjustments.

Step 1: Determine Organization

If you are running this as an extension on Spree pre-Rails 3.0 versions, you'll create an extension to house the custom code. If you are running this as part of a Rails 3.0 application or Spree Rails 3.0 versions, you'll want to consider creating a custom gem to house the custom code. In my case, I'm writing a Spree extension for an application running on Spree 0.11, so I create an extension with the command script/generate extension SearchTag.

Step 2: Data Model & Migration

First, the desired data model for the tag cloud data should be defined. Here's what mine will look like in this tutorial:

Next, a model and migration must be created to introduce the class, table, and its fields. In Spree, I run script/generate extension_model SearchTag SearchRecord and update the migration file to contain the following:

class CreateSearchRecords < ActiveRecord::Migration
  def self.up
    create_table :search_records do |t|
      t.string :term
      t.integer :count, :null => false, :default => 0
    end
  end

  def self.down
    drop_table :search_records
  end
end

I also add a filter method to my model to be used later:

class SearchRecord < ActiveRecord::Base
  def self.filter(term)
    # lowercase first so the character filter below keeps letters
    term.downcase
      .gsub(/\+/, ' ')
      .gsub(/\s+/, ' ')
      .gsub(/^\s+/, '')
      .gsub(/\s+$/, '')
      .gsub(/[^0-9a-z\s-]/, '')
  end
end
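A quick standalone run of a cleanup chain like this one (a sketch; downcasing first is my assumption so the final character filter keeps letters):

```ruby
def filter(term)
  term.downcase
      .gsub(/\+/, ' ')            # '+' separators become spaces
      .gsub(/\s+/, ' ')           # collapse runs of whitespace
      .gsub(/^\s+/, '')           # trim leading whitespace
      .gsub(/\s+$/, '')           # trim trailing whitespace
      .gsub(/[^0-9a-z\s-]/, '')   # strip everything else
end

filter("  Blue+Widgets!! ")   # => "blue widgets"
```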

Step 3: Populating the Data

After the migration has been applied, I'll need to update the code to populate the data. I'm going to add an after filter on every user search. In the case of using Spree, I update search_tag_extension.rb to contain the following:

def activate
  Spree::ProductsController.send(:include, Spree::SearchTagCloud::ProductsController)
end

And my custom module contains the following:

module Spree::SearchTagCloud::ProductsController
  def self.included(controller)
    controller.class_eval do
      controller.append_after_filter :record_search, :only => :index
    end
  end

  def record_search
    if params[:keywords]
      term = SearchRecord.filter(params[:keywords])
      return if term == ''
      record = SearchRecord.find_or_initialize_by_term(term)
      record.update_attribute(:count, record.count+1)
    end
  end
end

The module appends an after filter to the products#index action. The after filter method cleans the search term and creates a record or increments the existing record's count. If this is added directly into an existing Rails application, this bit of functionality may be added directly into one or more existing controller methods to record the search term.

Step 4: Reporting the Data

To present the data, I create a controller with script/generate extension_controller SearchTag Admin::SearchTagClouds first. I update config/routes.rb with a new action to reference the new controller:

map.namespace :admin do |admin|
  admin.resources :search_tag_clouds, :only => [:index]
end

And I update my controller to calculate the search tag cloud data, shown below. The index method retrieves all of the search records, sorts them, and grabs the top x results, where x is a configuration value defined by the administrator. The method then determines the linear equation for scaling each search_record.count to a font size ranging from 8 pixels to 25 pixels. The order of terms is randomized (.shuffle) and the linear equation applied. This linear shift can be applied to different types of data. For example, if a tag cloud is to show products with a certain tag, the totals per tag must be calculated and scaled linearly.

class Admin::SearchTagCloudsController < Admin::BaseController
  def index
    # highest counts first, so the first entry holds the max count
    search_records = SearchRecord.all.collect { |r| [r.count, r.term] }.sort.reverse
    max = search_records.empty? ? 1 : search_records.first.first

    # solution is: a*x_factor - y_shift = font size
    # max font size is 25, min is 8
    x_factor = (Spree::SearchTagCloud::Config[:max] -
      Spree::SearchTagCloud::Config[:min]) / max.to_f
    y_shift = max.to_f*x_factor - Spree::SearchTagCloud::Config[:max]

    @results = search_records.shuffle.inject([]) do |a, b|
      a.push([b[0].to_f*x_factor - y_shift, b[1]])
    end
  end
end

The data is presented to the user in the following view:

<h3>Tag Cloud:</h3>
<div id="tag_cloud">
<% @results.each do |b| %>
<span style="font-size:<%= b[0] %>px;"><%= b[1] %></span>
<% end -%>
</div>

Step 5: Adding Flexibility

In this project, I added configuration variables for the total number of terms displayed, and maximum and minimum font size using Spree's preference architecture. In a generic Rails application, this may be a nice bit of functionality to include with the preferred configuration architecture.

Example tag cloud from the extension. Additional modifications can be applied to change the overall styling or color of individual search terms.


These steps are pretty common for introducing new functionality into an existing application: data migration and model, manipulation on existing controllers, and presentation of results with a new or existing controller and view. Following MVC convention in Rails keeps the code organized and methods simple. In the case of Spree 0.11, this functionality has been packaged into a single extension that is abstracted from the Spree core. The code can be reviewed here, with a few minor differences.

Ecommerce on Sinatra: A Shopping Cart Story

In a couple recent articles, I wrote about the first steps for developing an ecommerce site in Ruby on Sinatra. Or, here's a visual summary of the articles:

In the first article, a single table data model existed with a couple of Sinatra methods defined. In the second article, users and products were introduced to the data model. The Sinatra app still has minimal customer-facing routes (get "/", post "/") defined, but also introduces backend admin management to view orders and manage products.

In this article, I introduce a shopping cart. With this change, I modify the data model to tie in orderlines, where orderlines has a belongs_to relationship with orders and products. I'll make the assumption that for now, a cart is a set of items and their corresponding quantities.

The new data model with tables orderlines, products, orders, and users.

An Important Tangent

First, let's discuss cart storage options, which is an important topic for an ecommerce system. Several cart storage methods are described below:

  • Conventional SQL database models: Conventional SQL (MySQL, PostgreSQL, etc.) tables can be set up to store shopping cart items, quantities, and additional information. This can be nice if designed so that cart information matches the existing data model (e.g. orders & orderlines), so data can be clean and easy to work with using object-relational mappers or direct SQL. For example this makes it easy for administrative tools to report on abandoned carts. One disadvantage of this kind of storage is that it increases database I/O at the already limited chokepoint of a master database. Another disadvantage is that you need to eventually clean up data as users abandon their carts, or deal with tables that grow large much more quickly than the orders tables. For example, Spree, an open source Ruby on Rails ecommerce platform that End Point works with frequently, stores carts in the database (order & line_items table), and for one of our clients, approximately 66% of the order data is from abandoned carts.
  • Serialized object store: Here cart items, quantities, and additional information is stored in a session object and serialized to disk in files, key/value stores like memcached, in NoSQL databases (some of which can scale horizontally fairly nicely), or even as a BLOB in an SQL database. Sessions are assigned a random ID string and linked to users either by a cookie or in the URL (note: tracking session IDs in URLs has become less common due to its interference with caching and search engine indexing). This type of storage is very convenient for developers and tends to perform fairly well. However, if there is heavy server load, saving the session at the end of every request can introduce a bottleneck, especially when multiple application servers are using a single shared session data store. Also, the developer convenience can turn into a mess if the session becomes a dumping ground for ephemeral data that becomes permanent, or which causes pages to be un-RESTful as they're not based solely on the URL. Interchange, an open source Perl ecommerce framework that End Point works with often, uses this method of cart storage by default.
  • Cookie cart storage: Cart items, quantities, and additional information can be stored directly in cookies in the user's browser. Cookies don't add any server storage overhead, but do add network overhead to each request, and have limited storage space. Typically, you'd only want to store information in cookies that is fine in the untrusted environment of users' browsers, such as SKU and quantity. You can introduce hashing to protect integrity if you want to include custom pricing, or reversible encryption of the data to store sensitive data, such as personalized product options or personal information.
  • JavaScript stored carts: An uncommon (but possible) cart-storage method is to store cart items, quantities, and additional information in a JavaScript data structure in the browser's memory. This does not introduce any server-side load as storage and processing occurs on the client side. This could be done where front-end view manipulation occurs entirely by web service requests and JavaScript DOM manipulation: A user comes to the web store, products are rendered and listed with an AJAX request to the web service and a user manipulates the cart. All of this happens while the user never leaves the page. The cart object continues to reflect the user's cart and is only sent to the server when the user is ready to finalize their order, along with billing, shipping, and other order information. This type of ecommerce solution isn't SEO-friendly by default because it does not readily display all content, and closing the browser window for the store could lose the cart. But it might be suitable in some situations, and using new HTML 5 LocalStorage would add permanence and make this a more palatable option. End Point recently built a web service based YUI JavaScript application for Locate Express, but ecommerce is not a component of their system.

Back to the App

For this demo, I chose to go with cookie-based cart storage for several reasons. At the lowest level, I define a few different structures for the cart:

  • cookie format: e.g. "2:1;18:2;". Semicolon-delimited items, where each item's product id and quantity are separated by a colon. This is the simplest cart format, stored to the cookie.
  • hash format: e.g. { 2 => 1, 18 => 2 }. Keys are product ids; the corresponding values are quantities. This format makes the cart items easy to manipulate (update, remove, add) and does not require database lookups (potentially saving database bandwidth).
  • object format: e.g.
    >> @cart ="2:1;18:2")
    >> @cart.items.inspect
    [{ :product => #<Product with id of 2>,
       :quantity => 1 },
     { :product => #<Product with id of 18>,
       :quantity => 2 }]
    >>  # sum of (item_cost*quantity)
    The cart object is created whenever the cart and its items are displayed, such as on the actual cart page. Cart construction requires read requests from the database.

Next up, I define several Cart class methods for interacting with the cart:

# class method to convert cart from cookie format to hash format
def self.to_hash(cookie)
  cookie ||= ''
  cookie.split(';').inject({}) do |hash, item|
    hash[item.split(':')[0]] = (item.split(':')[1]).to_i
    hash
  end
end

# class method to convert cart from hash format to cookie format
def self.to_string(cart)
  cookie = ''
  cart.each do |k, v|
    cookie += "#{k.to_s}:#{v.to_s};" if v.to_i > 0
  end
  cookie
end

# class methods for adding, removing, and updating items: each
# converts the cart from cookie format to hash format, performs
# its operation, then returns the cart in cookie format
def self.add(cookie, params)
  cart = to_hash(cookie)
  cart[params[:product_id]] ||= 0
  cart[params[:product_id]] += params[:quantity].to_i
  to_string(cart)
end

def self.remove(cookie, product_id)
  cart = to_hash(cookie)
  cart[product_id] = 0
  to_string(cart)
end

def self.update(cookie, params)
  cart = to_hash(cookie)
  cart.each { |k, v| cart[k] = params[:quantity][k].to_i }
  to_string(cart)
end

# instance attributes (items, total): the constructor pulls product
# info from the database and calculates the cart total upon
# initialization
attr_accessor :items
attr_accessor :total

def initialize(cookie='')
  self.items = []
  cookie ||= ''
  cookie.split(';').each do |item|
    self.items << {
      :product => Product.find(item.split(':')[0]),
      :quantity => (item.split(':')[1]).to_i }
  end = self.items.sum { |item| item[:product].price * item[:quantity] }
end
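As a sanity check, simplified standalone versions of the two conversion methods (hypothetical sketches mirroring the Cart class methods, not the class itself) round-trip like this:

```ruby
# cookie format -> hash format
def to_hash(cookie)
  (cookie || '').split(';').inject({}) do |hash, item|
    product_id, quantity = item.split(':')
    hash[product_id] = quantity.to_i
    hash
  end
end

# hash format -> cookie format, dropping zero-quantity items
def to_string(cart)
  cart.inject('') { |cookie, (k, v)| v.to_i > 0 ? cookie + "#{k}:#{v};" : cookie }
end

cart = to_hash("2:1;18:2;")    # => { "2" => 1, "18" => 2 }
cart["18"] += 1                # add one more of product 18
cart["2"] = 0                  # remove product 2
to_string(cart)                # => "18:3;"
```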

I define some Sinatra methods to work with my cart methods. I also update the order completion action to store orderline information:

# Build our cart when a GET request is made to "/cart"
app.get '/cart' do
  @cart =["cart"])
  erb :cart,
    :locals => {
      :params => {
        :order => {},
        :credit_card => {}
      }
    }
end

# The POST and GET requests to add, update, and remove items use
# the Cart class methods. Each sets the cart cookie with a path of
# '/' and redirects to /cart. '/cart/add' do
  response.set_cookie("cart",
    { :value => Cart.add(request.cookies["cart"], params),
      :path => '/' })
  redirect "/cart"
end '/cart/update' do
  response.set_cookie("cart",
    { :value => Cart.update(request.cookies["cart"], params),
      :path => '/' })
  redirect "/cart"
end

app.get '/cart/remove/:product_id' do |product_id|
  response.set_cookie("cart",
    { :value => Cart.remove(request.cookies["cart"], product_id),
      :path => '/' })
  redirect "/cart"
end

# During order processing, orderlines are created and assigned to
# the current order, and the payment gateway authorizes the order
# total. If the transaction succeeds, the cart cookie is set to an
# empty string; if not, the cart cookie is not modified.
cart =["cart"])
cart.items.each do |item|
  Orderline.create({ :order_id =>,
    :product_id => item[:product].id,
    :price => item[:product].price,
    :quantity => item[:quantity] })
end
gateway_response = gateway.authorize(( * 100).to_i, credit_card)


From the top: the changes described here introduce the orderlines table, a cart object and methods to manage items in the user's cart and several Sinatra methods for working with the cart object. The homepage is updated to list items and add to cart form fields and the existing order processing method is updated to store data into the orderlines table.

Below are some screenshots from the resulting app with shopping cart functionality: the homepage, cart page, and the empty cart screenshot.


The code described in this article is part of an ongoing Sinatra-based ecommerce application available here. The repository has several branches corresponding to the previous articles and potential future articles. I'd like to thank Jon for contributing to the section in this article regarding cart storage options.

API gaps: an Android MediaPlayer example

Many programming library APIs come with several levels of functionality, including the low-level but flexible way, and the high-level and simpler but limited way. I recently came across a textbook case of this in Android's Java audio API, in the MediaPlayer class.

We needed to play one of several custom Ogg Vorbis audio files in the Locate Express Android app to alert the user to various situations.

Getting this going initially was fairly straightforward:

In this simplified version of our PlaySound class we pass in the app resource ID of the sound file, and using the MediaPlayer.create() method is about as simple as can be.

We keep a map of playing sound files so that external events can stop all playing sounds at once in a single call.

We set an OnCompletionListener to clean up after ourselves if the sound plays to its end without interruption.

Everything worked fine. Except for a pesky volume problem in real-world use. MediaPlayer uses Android's default audio stream, which seemed to be STREAM_MUSIC. That plays the audio files fine, but has an interesting consequence during the actual playing: You can't turn the volume down or up because the volume control outside of any specific media-playing contexts affects the STREAM_RING volume, not the one we're playing on. Practically speaking, that's a big problem because if the music stream was turned up all the way and the alert goes off at full volume in a public place, you have no way to turn it down! (Not a hypothetical situation, as you may guess ...)

Switching to STREAM_RING would be the obvious and hopefully simple thing to do, but calling the MediaPlayer.setAudioStreamType() method must be done before the MediaPlayer state machine enters the prepared state, and MediaPlayer.create() moves it there automatically. The convenience is our undoing!

Switching over to the low-level way of doing things turns out to be a bit of a pain because:

  1. There's no interface to pass an Android resource ID to one of the setDataSource() methods. Instead, we have to use a file path, file descriptor, or URI.
  2. The easiest of those options seemed to be a URI, and doing a little research, the format comes to light: android.resource://
  3. We have to handle IOException which wasn't throwable using the higher-level MediaPlayer.create() invocation.

Putting it all together, we end up with:

It's not so hard, but it's not the one-line addition of MediaPlayer.setAudioStreamType() that I expected. This is an example of how the API lacking a MediaPlayer.setDataSource(Context, int) for the resource ID makes a simple change a lot more painful than it really needs to be -- especially since the URI variation could easily be handled behind the scenes by MediaPlayer.

I later took a look at the Android MediaPlayer class source to see how the create() method does its work:

Instead of creating a URI, the authors chose to go the file descriptor route, and they check for exceptions just like I had to. It seems more cumbersome to have to open the file, get the descriptor, and manually call getStartOffset() and getLength() in the call to setDataSource, but perhaps there's some benefit there.

This gap between low-level and high-level interfaces is another small lesson I'll remember both when using and creating APIs.

There's one final unanswered question I had earlier: Was STREAM_MUSIC really the default output stream? Empirically that seemed to be the case, but I didn't see it stated explicitly anywhere in the documentation. To find out for sure we have to delve into the native C++ code that backs MediaPlayer, in libmedia/mediaplayer.cpp, and sure enough, in the constructor the default is set:

mStreamType = AudioSystem::MUSIC;

My experience with Android so far has been that it's well documented, but it's been very nice to be able to read the source and see how the core libraries are implemented when needed.

SSH: piping data in reverse

I found myself ssh'd several hops away and needing to copy output from a script back to localhost. Essentially, I wanted a way to pipe the data backwards through my SSH connection so I could capture it locally. Since I utilize .ssh/config extensively, I could connect to the server in question from localhost with a single ssh command. Bringing the data back the other way, however, would have been a multi-step process: saving a temporary file, copying it to a commonly accessible location with the right permissions and authentication set up, or intermediately sshing to each node along the path. In short, it exceeded my laziness threshold. So instead, I did the following:

[me@localhost]$ ssh user@remote nc -l 11235 > output.file  # long, complicated connection hidden behind .ssh/config + ProxyCommand

[me@remotehost]$ perl -ne 'print if /startpat/ .. /endpat/' file/to/be/extracted | nc localhost 11235

I ended up choosing an arbitrary port and running a remote listen process via ssh, which passes any output directed to that remote port back through the connection, where I capture it as STDOUT on my local machine. There are a couple reasons I think this setup is nicer than just running ssh user@remote perl -ne ... directly:

  • You can take your time to figure out the exact command invocation you would like to use—i.e., you can twiddle with the local command output, then when you're happy with the output, pipe it back.
  • You avoid extra worries about escaping/quoting issues. Particularly if you're running a complicated pipeline remotely, it's hard to craft the exact remote command you would like ssh to execute without a few missteps, or at least a concerted effort to review/verify. (Anyone who's tried to pass arguments containing whitespace to a remote command will know the pain I'm talking about.)