Welcome to End Point’s blog

Ongoing observations by End Point people

Importing Comments into Disqus using Rails

It seems everything is going to the cloud, even comment systems for blogs. Disqus is a platform for offloading the ever-growing feature set users expect from commenting systems. Their website boasts over a million sites using the platform, a robust feature set, and good performance. But before you can drink the Kool-Aid, you've got to get your data into their system.

If you're using one of the common blog platforms such as WordPress or Blogger, Disqus provides fairly direct routes for automatically importing your existing comment content. Those with an unsupported platform or a hand-rolled blog are left to export their comments into XML using WordPress's WXR format.

Disqus leaves a lot up to the exporter, providing only one page in their knowledge base describing what they call a Custom XML Import Format. In my experience the import error messages were cryptic, and my email support request is still unanswered 5 days later. (OK, so it was Christmas weekend!)

So let's get into the nitty-gritty details. First, the sample code provided in this article is based on Rails 3.0.x, but should work with Rails 3.1.x as well. Rails 2.x would work just as well by modifying the way the Rails environment is booted in the first lines. I chose to create a script that dumps its output to standard output, which can be piped into a file for upload. Let's see some of the setup work.

Setting up a Rails script

I chose to place the script in the RAILS_ROOT/script directory and named it wxr_export.rb. This allows me to call the script with the Rails 2.x style syntax (ahh, the nostalgia):

script/wxr_export.rb > comments.xml

This fires up the full Rails environment, executes our Ruby code, and pipes the standard output to a file called comments.xml. Pretty straightforward, but it's not that often Rails developers think about creating these kinds of scripts, so it's worth discussing the setup mechanics.

#!/usr/bin/env ruby
require File.expand_path('../../config/boot', __FILE__)
require File.expand_path('../../config/environment', __FILE__)

I think the first line is best explained by this excerpt from Ruby Programming:

First, we use the env command in the shebang line to search for the ruby executable in your PATH and execute it. This way, you will not need to change the shebang line on all your Ruby scripts if you move them to a computer with Ruby installed in a different directory.

The next two lines boot the correct Rails environment (development, test, production). It's worth briefly explaining the syntax of these two somewhat cryptic lines. File.expand_path converts a pathname to an absolute pathname. If passed only the first string, it evaluates relative to the current working directory, but since we pass __FILE__ we are asking it to use the current file's path as the starting point.
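To make the path resolution concrete, here is a small plain-Ruby sketch (the /app/script path is hypothetical):

```ruby
# File.expand_path resolves the relative path against the second
# argument, so a script at /app/script/wxr_export.rb finds its
# app's config/boot regardless of the current working directory.
path = File.expand_path('../../config/boot', '/app/script/wxr_export.rb')
puts path  # => /app/config/boot
```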

The config/boot.rb file is well documented in the Rails guides which explains that boot.rb defines the location of your Gemfile, hooks up Bundler, which adds the dependencies of the application (including Rails) to the load path, making them available for the application to load.

The config/environment.rb file is also well documented and effectively loads the Rails packages you've specified, such as ActiveModel, ActiveSupport, etc.

Exporting WXR content

Having finally loaded our Rails environment in a way we can use it, we are ready to actually build the XML we need. First, let's set up our XML document and the general format we'll use to populate our file:

# script/wxr_export.rb

xml = Builder::XmlMarkup.new(:target => STDOUT, :indent => 2)

xml.instruct! :xml, :version=>"1.0", :encoding=>"UTF-8"

xml.rss 'version' => "2.0",
        'xmlns:content' => "http://purl.org/rss/1.0/modules/content/",
        'xmlns:dsq' => "http://www.disqus.com/",
        'xmlns:dc' => "http://purl.org/dc/elements/1.1/",
        'xmlns:wp' => "http://wordpress.org/export/1.0/" do
  xml.channel do
    Articles.all.each do |article|
      if should_be_exported?(article)
        xml.item do

          # Article XML goes here

          article.comments.each do |comment|
            # Comments XML goes here
          end # article.comments.each
        end   # xml.item
      end     # if should_be_exported?
    end       # Articles.all.each
  end         # xml.channel
end           # xml.rss

This is the general form for the WXR format as described by Disqus's knowledge base article. Note that you need to nest the comments inside each specific Article's XML. I found that I needed to filter some of my output so I added a helper function called should_be_exported? which can be defined at the top of the script. This would allow you to exclude Articles without comments, or whatever criteria you might find helpful.
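As a sketch, a minimal should_be_exported? might simply skip articles with no comments (OpenStruct stands in for a real Article here, and the filter criteria are illustrative only):

```ruby
require 'ostruct'

# Hypothetical filter: export only articles that have comments.
def should_be_exported?(article)
  article.comments.any?
end

with_comments    = OpenStruct.new(:comments => ['Nice post!'])
without_comments = OpenStruct.new(:comments => [])

puts should_be_exported?(with_comments)     # => true
puts should_be_exported?(without_comments)  # => false
```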

With our basic format in place, let's look at the syntax for exporting the Article fields. Keep in mind that the fields you'll want to pull from in your system will likely be different, but the intention is the same.

Inside the Article XML block

# script/wxr_export.rb

# Inside the Article XML block

xml.title article.title

xml.link create_url_for(article)

# CDATA-wrap your HTML content; article.body here stands in for your content column
xml.content(:encoded) { |x| x << "<![CDATA[#{article.body}]]>" }

xml.dsq(:thread_identifier) { |x| x << article.id }

xml.wp(:post_date_gmt) { |x| x << article.created_at.utc.to_formatted_s(:db) }

xml.wp(:comment_status) { |x| x << "open" } #all comments open

Let's look at each of these fields one by one:

  • xml.title: This is pretty straightforward, just the plain text title of the blog article.
  • xml.link: Disqus can use URLs to determine which comments to display on your page, so it asks you to provide a URL associated with this article. I found that for this particular app it would be easier to write another helper function to generate the URLs than to use the Rails routes. If you wish to use the Rails routes (and I suggest you do), then I suggest checking out this excellent post for using routes outside of views.
  • xml.content(:encoded): The purpose of this field is clear, but the syntax is not. Hope this saves you some time and headache!
  • xml.dsq(:thread_identifier): The other way Disqus can identify your article is by a unique identifier. This is strongly recommended over the use of a URL. We'll just use your unique identifier in the database.
  • xml.wp(:post_date_gmt): The thing to keep in mind here is that we need the date in a very particular format. It needs to be in YYYY-MM-DD HH:MM:SS 24-hour format and adjusted to GMT which typically implies UTC. Rails 3 makes this very easy for us, bless their hearts.
  • xml.wp(:comment_status): This app wanted to leave all comments open. You may have different requirements so consider adding a helper function.
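For reference, to_formatted_s(:db) is ActiveSupport; the same YYYY-MM-DD HH:MM:SS UTC timestamp Disqus expects can be produced in plain Ruby with strftime:

```ruby
# Produce the 24-hour, database-style timestamp in UTC
# without ActiveSupport.
t = Time.utc(2011, 12, 25, 14, 30, 5)
puts t.strftime('%Y-%m-%d %H:%M:%S')  # => 2011-12-25 14:30:05
```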

Inside the Comment XML block

article.comments.each do |comment|
  xml.wp(:comment) do

    xml.wp(:comment_id) { |x| x << comment.id }

    xml.wp(:comment_author) do |x|
      if comment.user.present? && comment.user.name.present?
        x << comment.user.name
      else
        x << ""
      end
    end

    xml.wp(:comment_author_email) do |x|
      if comment.user.present? && comment.user.email.present?
        x << comment.user.email
      else
        x << ""
      end
    end

    xml.wp(:comment_author_url) do |x|
      if comment.user.present? && comment.user.url.present?
        x << comment.user.url
      else
        x << ""
      end
    end

    xml.wp(:comment_author_IP) { |x| x << "" }

    xml.wp(:comment_date_gmt) { |x| x << comment.created_at.utc.to_formatted_s(:db) }

    # CDATA-wrap the comment text; comment.body stands in for your comment content column
    xml.wp(:comment_content) { |x| x << "<![CDATA[#{comment.body}]]>" }

    xml.wp(:comment_approved) { |x| x << 1 } #approve all comments

    xml.wp(:comment_parent) { |x| x << 0 }

  end #xml.wp(:comment)
end #article.comments.each

Again, let's inspect this one field at a time:

  • xml.wp(:comment_id): Straightforward, a simple unique identifier for the comment.
  • xml.wp(:comment_author): Because some commenters may not have a user associated with them, I added some extra checks to make sure the author's user and name were present. I'm sure there's a way to shorten the number of lines used, but I was going for readability here. I'm not certain it was necessary to include the blank string, but after some of the trouble I had importing, I wanted to minimize the chance of strange XML syntax issues.
  • xml.wp(:comment_author_email): More of the same safeguards against empty data.
  • xml.wp(:comment_author_url): More of the same safeguards against empty data.
  • xml.wp(:comment_author_IP): We were not collecting user IP data, so I put in some bogus data which Disqus did not seem to mind.
  • xml.wp(:comment_date_gmt): See xml.wp(:post_date_gmt) above for comments about date/time format.
  • xml.wp(:comment_content): See xml.content(:encoded) above for comments about encoding content.
  • xml.wp(:comment_approved): Two options here, 0 or 1. Typically you'd want to automatically approve your existing comments, unless of course you wanted to give a moderator a huge backlog of work.
  • xml.wp(:comment_parent): This little field turned out to be the cause of a lot of trouble for me. In the comments on Disqus's XML example, it says parent id (match up with wp:comment_id), so initially I just put the comment's ID in this field. This returned the very unhelpful error * url * URL is required, for which I still have an unanswered support email in to Disqus. By trial and error, I found that by just setting the comment_parent to zero, I could successfully upload my comment content. If you are using threaded comments, I suspect this field will be of more importance to you than it was to me. When I hear from Disqus, I will update this article with more information.
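If your comments were threaded, a hedged sketch of the parent lookup might fall back to zero for top-level comments (parent_id is an assumed column; OpenStruct stands in for a real comment):

```ruby
require 'ostruct'

# Hypothetical helper: emit the parent comment's id, or 0 when the
# comment is top-level, matching wp:comment_parent semantics.
def comment_parent_for(comment)
  comment.parent_id || 0
end

top_level = OpenStruct.new(:parent_id => nil)
reply     = OpenStruct.new(:parent_id => 42)

puts comment_parent_for(top_level)  # => 0
puts comment_parent_for(reply)      # => 42
```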

Labeling input boxes including passwords

I'm currently working on a new site, and one aspect of the design is that many of the form fields do not have labels near the input boxes; instead they use labels inside the input box that fade away when text is entered. The label is also supposed to reappear if the box is cleared out. Originally I thought this was a pretty easy problem and quickly wrote some jQuery to do it. The path I went down first was to set the textbox's value to the label we wanted displayed and then clear it on focus. This worked fine; however, I hit a stumbling block when it came to password input boxes. My solution did not work properly because text in a password box is hidden, so the label would be hidden as well. Most people would probably understand what went in each box, but I didn't want to risk confusing anyone, so I needed to find a better solution.

I did some searching for jQuery and labels for password inputs and turned up several solutions. The first one actually put another text box on top of the password input, but that seemed prone to issues. The solution I ultimately decided to use is called In-Field Labels, a jQuery plugin by Doug Neiner. In this solution Doug has floating labels that appear over the top of the textbox; they dim slightly when focus is gained and then disappear completely when typing begins. The plugin does not mess with the value in the input box at all.

It was fairly easy to get up and running. I added the plugin to the page, created some styling for the labels, added label tags with the class of 'overlay' for each input box and called $('label.overlay').inFieldLabels();. This was all that was needed to get us going.

Normal view

Focus in the password box

Typing in the password box

The effect is pretty cool and it provides a good interface for the user as they are reminded up until the time they type in the box what they are supposed to enter.

Converting CentOS 6 to RHEL 6

A few years ago I needed to convert a Red Hat Enterprise Linux (RHEL) 5 development system to CentOS 5, as our customer did not actively use the system any more and no longer wanted to renew the Red Hat Network entitlement for it. Making the conversion was surprisingly straightforward.

This week I needed to make a conversion in the opposite direction: from CentOS 6 to RHEL 6. I didn't find any instructions on doing so, but found a RHEL 6 to CentOS 6 conversion guide with roughly these steps:

yum clean all
mkdir centos
cd centos
rpm --import RPM-GPG-KEY-CentOS-6
rpm -e --nodeps redhat-release-server
rpm -e yum-rhn-plugin rhn-check rhnsd rhn-setup rhn-setup-gnome
rpm -Uhv --force *.rpm
yum upgrade

I then put together a plan to do more or less the opposite of that. The high-level overview of the steps is:

  1. Completely upgrade the current CentOS and reboot to run the latest kernel, if necessary, to make sure you're starting with a solid system.
  2. Install a handful of packages that will be needed by various RHN tools.
  3. Log into the Red Hat Network web interface and search for and download onto the server the most recent version of these packages for RHEL 6 x86_64:
    • redhat-release-server-6Server
    • rhn-check
    • rhn-client-tools
    • rhnlib
    • rhnsd
    • rhn-setup
    • yum
    • yum-metadata-parser
    • yum-rhn-plugin
    • yum-utils
  4. Install the Red Hat GnuPG signing key.
  5. Forcibly remove the package that identifies this system as CentOS.
  6. Forcibly upgrade to the downloaded RHEL and RHN packages.
  7. Register the system with Red Hat Network.
  8. Update any packages that now need it using the new Yum repository.

The exact steps I used today to convert from CentOS 6.1 to RHEL 6.2 (with URL session tokens munged):

yum upgrade
shutdown -r now
yum install dbus-python libxml2-python m2crypto pyOpenSSL python-dmidecode python-ethtool python-gudev usermode
mkdir rhel
cd rhel
wget ''
wget ''
wget ''
wget ''
wget ''
wget ''
wget ''
wget ''
wget ''
wget ''
rpm --import fd431d51.txt
rpm -e --nodeps centos-release
rpm -e centos-release-cr
rpm -Uhv --force *.rpm
rpm -e yum-plugin-fastestmirror
yum clean all
yum upgrade

I'm expecting to use this process a few more times in the near future. It is very useful when working with a hosting provider that does not directly support RHEL, but provides CentOS, so we can get the new servers set up without needing to request a custom operating system installation that may add a day or two to the setup time.

Given the popularity of both RHEL and CentOS, it would be neat for Red Hat to provide a tool that would easily switch, at least "upgrading" from CentOS to RHEL to bring more customers into their fold, if not the other direction!

Rails Request-Based Routing Constraints in Spree

I recently adopted an unreleased ecommerce project running Spree 0.60.0 on Rails 3.0.9. The site used a Rails routing constraint and wildcard DNS to dynamically route subdomains to the “dispatch” action of the organizations_controller. If a request’s subdomain component matched that regular expression, it was routed to the dispatch method. Here's the original route:

match '/' => 'organizations#dispatch', :constraints => { :subdomain => /.+/ }

The business requirement driving this feature was that a User could register an Organization by submitting a form on the site. Once that Organization was marked "approved" by an admin, that Organization would become accessible at their own subdomain - no server configuration required.

For marketing reasons, we decided to switch from subdomains to top-level subdirectories. This meant standard RESTful routes wouldn’t cut it. In order to handle this, I created a routing constraint class called OrgConstraint. This routing constraint class works in tandem with a tweaked version of that original route.

match '*org_url' => 'organizations#show', :constraints => OrgConstraint.new

The :constraints param takes an instance of a class (not a class name) that responds to a matches? predicate method returning true or false. If matches? returns true, the request is routed to that controller#action; otherwise the route is treated like any other non-matching route. Here’s the entire OrgConstraint class:

class OrgConstraint
  def matches?(request)
    Organization.valid_url? request.path_parameters[:org_url]
  end
end

Note how Rails automatically passes the request object to the matches? method. Also note how the relative url of the request is available via the :org_url symbol - the same identifier we used in the route definition. The Organization.valid_url? class method encapsulates the logic of examining a simple cached (via Rails.cache) hash consisting of organization urls as keys and true as their value.
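A minimal sketch of that lookup, with a plain hash standing in for the Rails.cache-backed map (the class body and sample urls here are illustrative only):

```ruby
# Hypothetical Organization with a url-validity check backed by a
# cached { url => true } hash instead of a per-request database query.
class Organization
  def self.url_map
    # stand-in for a Rails.cache fetch of approved organization urls
    @url_map ||= { 'acme' => true, 'widgets-inc' => true }
  end

  def self.valid_url?(url)
    url_map[url] == true
  end
end

puts Organization.valid_url?('acme')     # => true
puts Organization.valid_url?('unknown')  # => false
```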

The final step in this process is, of course, the organizations_controller’s show method. It now needs to look for that same :org_url param that the route definition creates, in the standard params hash we all know and love:

def show
  @organization = Organization.find(params[:id]) if params[:id]
  # from routing constraint
  @organization ||= Organization.find_by_url(params[:org_url]) if params[:org_url]
end

I should point out that Rails instantiates exactly one instance of your routing constraint class when it first loads your routes. This means you’ll want to ensure your class’s design will respond appropriately to changes in any underlying data. This is one of the reasons the Organization class caches the { org_url => true } hash rather than using instance variables within the OrgConstraint class.
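The single-instance behavior is easy to demonstrate in plain Ruby: anything cached in the constraint's own instance variables is frozen at boot time (class and data names here are illustrative):

```ruby
# A constraint that snapshots data in an ivar at initialization will
# never see later changes, because Rails builds it exactly once.
class StaleConstraint
  def initialize(urls)
    @urls = urls.dup  # snapshot taken when routes load
  end

  def matches?(path)
    @urls.include?(path)
  end
end

approved = ['acme']
constraint = StaleConstraint.new(approved)
approved << 'new-org'  # organization approved after "boot"

puts constraint.matches?('acme')     # => true
puts constraint.matches?('new-org')  # => false, the snapshot is stale
```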


Modifying Models in Rails Migrations

As migrations have piled up in projects that I work on, one problem seems to come up fairly consistently. New changes to models can break migrations.

This can happen a number of different ways. One way is to break old migrations. Another is for the changes to be made to the file before the migration is run (timing issues with version control).

While these can be (and usually are) considered coordination rather than technical issues, sometimes you just need to handle them and move on.

One case I'd like to cover here is removing or changing associations. At the time the migration is expected to run, the file for the model class will have been updated already, so it is hard to use that in the migration itself, even though it would be useful.

In this case I found myself with an even slightly trickier example. I have a model that contains some address info. Part of that is an association to an external table that lists the states. So part of the class definition was like so:

class Contact < ActiveRecord::Base
  belongs_to :state
end

What I needed to do in the migration was to remove the association and introduce another field called "state" which would just be a varchar field representing the state part of the address. The two problems the migration would encounter were:

  1. The state association would not exist at the time it ran
  2. And even if it did, there would be a name conflict between it and the new column I wanted

To get around these restrictions I did this in my migration:

Contact.class_eval do
  belongs_to :orig_state,
             :class_name => "State",
             :foreign_key => "state_id"
end

This creates a different association named "orig_state" using the states table for the Contact class. I can now use my original migration code more-or-less as is, and still create a new state column.
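A plain-Ruby illustration of the class_eval technique, reopening a class at migration time without touching the model file on disk (the stand-in method body is hypothetical):

```ruby
# class_eval reopens Contact and adds behavior only for the duration
# of this process, leaving the model file on disk untouched.
class Contact; end

Contact.class_eval do
  def orig_state
    'New York'  # stand-in for the belongs_to :orig_state lookup
  end
end

puts Contact.new.orig_state  # => New York
```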

Another problem I had was that the table had about 300 rows of data that failed one of the validations called "validate_names". I didn't feel like sorting it out, so I just added the following code to the above class_eval block:

define_method(:validate_names) do
  # no-op: skip the failing validation during this migration
end

With these two modifications to the Contact class, I was able to use the simple migration with all of my Rails associations to do what I needed, without resorting to the more complex hand-crafted SQL that would have been required to avoid referring to the model classes in the migration at all.

Nifty In-Button Confirmation

I've been working on a personal email client after work, called Warm Sunrise, that forces me to keep a manageable inbox. One of the goals of the project was to get to a zero inbox every day, so I needed a 'Delete All' button that was easy to use without running the risk of accidentally deleting emails. I took a look at JavaScript's confirm, which is jarring, and jQuery's dblclick, which doesn't provide any feedback to the user after the first click, leaving the user to wonder why their emails weren't deleted.

Given these options, I built my own button using Rails 3.1, jQuery, and CoffeeScript, that better fit the goals I set out with. It requires a double click, but gives the user a confirmation in the button itself, without any sort of timeout. You can see a video of it in action here:

Starting with app/views/letters/index.html.erb, I generated the buttons using Rails helpers and Twitter's Bootstrap classes:

<%= link_to 'Write letter', new_letter_path, :class => "btn primary pull-right far-right" %>
<%= link_to 'Delete all', '#', :class => "btn pull-right no_danger", :id => "delete_all" %>
<%= link_to 'Are you sure?', delete_all_letters_path, :method => :post, :class => "btn pull-right danger confirm", :id => "delete_all", :style => "display:none;" %>

Notice that the 'Delete all' button doesn't actually specify a URL, and the 'Are you sure?' link's style is set to "display:none".

Here's the relationship I set up in my models:


# Letter model
belongs_to :user

# User model
has_many :letters, :dependent => :destroy

I set up config/routes.rb with an explicit route for this action:

post 'delete_all_letters' => 'letters#delete_all'

Finally, I finished this lot by adding the delete_all action to my app/controllers/letters_controller.rb:

def delete_all
  # destroy the letters; in a multi-user app you would scope this,
  # e.g. current_user.letters.destroy_all
  Letter.destroy_all

  respond_to do |format|
    format.html { redirect_to letters_url, notice: 'Successfully deleted all letters.' }
    format.json { head :ok }
  end
end

CoffeeScript is a beautiful language that compiles to JavaScript, and I prefer it to JavaScript itself. You can read more about it here. Let's take a look at the CoffeeScript that makes this button work:

$('a#delete_all.no_danger').hover ->
  $(this).click ->
    $('a#delete_all.confirm').show()

$('a#delete_all.no_danger').mouseleave ->
  $(this).removeClass('danger')

$('a#delete_all.danger').mouseleave ->
  $('a#delete_all.no_danger').show()

Since the button's text changes to a confirmation on the first click, this approach works better for my purposes than JavaScript's dblClick method. Check the video to see what it looks like in action.

Let's take a look at what this compiles to in plain JavaScript, too, since this is the only thing the browser sees:

$('a#delete_all.no_danger').hover(function() {
    return $(this).click(function() {
        return $('a#delete_all.confirm').show();
    });
});

$('a#delete_all.no_danger').mouseleave(function() {
    return $(this).removeClass('danger');
});

$('a#delete_all.danger').mouseleave(function() {
    return $('a#delete_all.no_danger').show();
});

Not shown in the video: I also modified index.html.erb to show the 'Delete all' button only when the user's inbox is not empty.

<%= link_to 'Write letter', new_letter_path, :class => "btn primary pull-right far-right" %>
<% if !@letters.empty? %>
    <%= link_to 'Delete all', '#', :class => "btn pull-right no_danger", :id => "delete_all" %>
    <%= link_to 'Are you sure?', delete_all_letters_path, :method => :post, :class => "btn pull-right danger confirm", :id => "delete_all", :style => "display:none;" %>
<% end %>

Sanitizing supposed UTF-8 data

As time passes, it's clear that Unicode has won the character set encoding wars, and UTF-8 is by far the most popular encoding, and the expected default. In a few more years we'll probably find discussion of different character set encodings to be arcane, relegated to "data historians" and people working with legacy systems.

But we're not there yet! There's still lots of migration to do before we can forget about everything that's not UTF-8.

Last week I again found myself converting data. This time I was taking data from a PostgreSQL database with no specified encoding (so-called "SQL_ASCII", really just raw bytes), and sending it via JSON to a remote web service. JSON uses UTF-8 by default, and that's what I needed here. Most of the source data was in either UTF-8, ISO Latin-1, or Windows-1252, but some was in non-Unicode Chinese or Japanese encodings, and some was just plain mangled.

At this point I need to remind you about one of the most unusual aspects of UTF-8: it has limited valid forms. Legacy encodings typically used all or most of the 255 code points in their 8-bit space (leaving point 0 for traditional ASCII NUL). While UTF-8 is compatible with 7-bit ASCII, it does not allow every possible 8-bit byte in every position. See the Wikipedia summary of invalid byte sequences to know what can be considered invalid.
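The validity rules are easy to see from Ruby (the post's tooling is Perl, but the byte rules are identical):

```ruby
# "\xC3\xA9" is a valid two-byte UTF-8 sequence (é);
# a 0xC0 byte is never legal anywhere in UTF-8.
good = "caf\xC3\xA9"
bad  = "caf\xC0\xA9"

puts good.valid_encoding?  # => true
puts bad.valid_encoding?   # => false
```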

We had no need to try to fix the truly broken data, but we wanted to convert everything possible to UTF-8 and at the very least guarantee no invalid UTF-8 strings appeared in what we sent.

I previously wrote about converting a PostgreSQL database dump to UTF-8, and used the Perl CPAN module IsUTF8.

I was going to use that again, but looked around and found an even better module, exactly targeting this use case: Encoding::FixLatin, by Grant McLean. Its documentation says it "takes mixed encoding input and produces UTF-8 output" and that's exactly what it does, focusing on input with mixed UTF-8, Latin-1, and Windows-1252.

It worked as advertised, very well. We would need to use a different module to convert some other legacy encodings, but in this case this was good enough and got the vast majority of the data right.

There's even a standalone fix_latin program designed specifically for processing Postgres pg_dump output from legacy encodings, with some nice examples of how to use it.

One gotcha is similar to a catch that David Christensen reported with the Encode module in a blog post here about a year ago: If the Perl string already has the UTF-8 flag set, Encoding::FixLatin immediately returns it, rather than trying to process it. So it's important that the incoming data be a pure byte stream, or that you otherwise turn off the UTF-8 flag, if you expect it to change anything.

Along the way I found some other CPAN modules that look useful for cases where I need more manual control than Encoding::FixLatin gives:

  • Search::Tools::UTF8 - test for and/or fix bad ASCII, Latin-1, Windows-1252, and UTF-8 strings
  • Encode::Detect - use Mozilla's universal charset detector and convert to UTF-8
  • Unicode::Tussle - ridiculously comprehensive set of Unicode tools that has to be seen to be believed

Once again Perl's thriving open source/free software community made my day!

Hurray for tracking configuration files in source control

In a number of places we've started tracking configuration files in git. It's great for Postgres configs, Apache or nginx, DNS zone files, Nagios, all kinds of things. A few clients have private offsite repos we push to, like at GitHub, but for the most part they're independent repos. It's still great for keeping track of what was changed when, and by whom.

In one case we have a centralized Nagios instance that does little more than receive passive checks from a number of remote systems. I'd set up the checks on the remote systems but hadn't loaded that configuration yet. However, while getting the central system set up, muscle memory kicked in and I suddenly had a half-red console as Nagios loaded stale data.

We don't need a flood of false alerts over email, but I don't want to completely revert the config and lose all those services...

[root nagios]# git stash; service nagios restart; git stash apply
Saved working directory and index state WIP on master: 0e9113b Made up commit for blog
HEAD is now at 0e9113b Made up commit for blog
Running configuration check...done.
Stopping nagios: .done.
Starting nagios: done.
# On branch master
# (etc)

Green! A small victory, for sure, but it shows one more advantage of modern SCMs.

Preventing Global Variables in JavaScript

JavaScript's biggest problem is its dependence on global variables
--Douglas Crockford, JavaScript: The Good Parts

Recently I built out support for affiliate management into LocateExpress’s Sinatra app using JavaScript and YUI.

I used a working page from the admin, Service Providers, as a starting point to get something up and running for affiliates quickly. By the time I finished, the Affiliates page worked great, but forms on the Service Provider page no longer populated with data.

Identifying a misbehaving global variable

There were no errors in the console, and the forms on the Service Providers page remained broken even after restoring an old copy of service_providers.js. As it turns out, a global variable, edit_map, was defined within service_providers.js, and again in the copied affiliates.js. Credit for spotting the problem goes to Brian Miller.

The fix was as simple as moving edit_map's declaration into the file's YUI sandbox, so that version of edit_map wouldn't be visible to any other pages in the admin.

Preventing global variables

As projects grow and complexity increases, it becomes easier and easier to overlook global variables and thus run into this tough-to-debug problem. Douglas Crockford’s JavaScript: The Good Parts covers several workarounds to using global variables.

Rather than declaring variables globally, like this:

var edit_map = { 'business[name]' : 'business_name' };

the author recommends declaring them at the beginning of functions whenever possible:

YUI().use("node", "io", "json",
function(Y) {
    var edit_map = { 'business[name]' : 'business_name' };
});

In all other cases, he suggests using Global Abatement, which prevents your global variables from affecting other libraries. For example,

var LocateExpress = {};
LocateExpress.edit_map = { 'business[name]' : 'business_name' };

YUI().use("node", "io", "json",
function(Y) {
    return LocateExpress.edit_map;
});

I highly recommend JavaScript: The Good Parts to learn about the best JavaScript has to offer and workarounds for its ugly side. The author also wrote a very popular code-checker, JSLint, which could help debug this nasty problem by highlighting implicit global variables.

Using Gmail at Work

The Short Story

For those who don't care about why, just how...
  1. Create a new Gmail account
  2. Set up Mail Fetcher
  3. Set up sending email from another account, and make it your default
  4. Verify you send and receive as your corporate address by default using the web client
  5. Set up your mobile
  6. From your mobile, open the Google Sync settings page and check Enable "Send Mail As" for this device (tested only on iOS)
  7. Verify you send and receive as your corporate address by default using your mobile client
  8. Set up Google Authorship with your domain's email address

The Long Story

Here at End Point there are a lot of opinions about email clients. Our hardcore folks like Alpine, while for most people Evolution, Thunderbird, or Outlook will do. As a Gmail user since September 2004, I found I needed to figure out how to get our corporate email system to work with my preferred client.

My first reaction was to have Gmail act as an IMAP client. I found (as many others had) that Gmail does not support IMAP integration with other accounts. However, Gmail does have a POP email client known as Mail Fetcher. I found that Gmail does support encrypted connections via POP, so use them if your email server supports them. Combined with HTTPS enabled by default, access to the Gmail web client seemed sufficiently secure.

I now needed to send email not as my Gmail address, but as my End Point address. Google has well documented how to send email from another account. Again encrypted SMTP is supported and is strongly recommended. Also be sure to make your corporate email account the default account so you will always use your corporate email address and not the Gmail address.

After verifying I was sending and receiving email properly, I needed to get my mobile set up. There are a variety of options available for all the mobile platforms. On my iPhone, I had several other accounts already set up and found the native client to be acceptable. I decided I would configure the native iPhone email app to access Gmail, as well as Contacts and Calendar, using Google's support for Microsoft's ActiveSync protocol, which Google has licensed and rebranded as Google Sync.

I had used Google Sync for other Exchange accounts at my previous job and found it worked very well. However, there are some known issues, like not being able to accept event invitations received via POP. It's worth checking these issues out to see if there are any blockers for you.

After setting up "Google Sync" on my iPhone, I tested again, and found that by default, it would use my Gmail account as my default outgoing email account, despite the setting in the Gmail web client. I needed to use my corporate address here at End Point for sending mail from mobile; I thought I was sunk!

Fortunately, it seems I had overlooked a section in the Google Sync setup documentation, labeled "Enable Send Mail As feature". This solved my problem: from my iOS device, I checked Enable "Send Mail As" for this device, which tells Google Sync to use the default outgoing account I had specified in the web client.

One requirement here at End Point which this configuration does not meet is support for PGP encryption/decryption of messages. There is a Chrome plugin that claims to offer support, but as the authors of this post highlight:

There may also be resistance from crypto users – who already are a security-conscious lot – to trusting private keys and confidential messages to a set of PGP functions folded inside some JavaScript running inside a browser.

I'd have to say I agree. After following the instructions to install the plugin, I balked when it asked for my private key; I just didn't feel comfortable. Despite this shortfall, most End Point email isn't encrypted end-to-end anyway. And I can feel good knowing that my "last mile" connections to End Point's servers are encrypted, using encrypted POP, SMTP, and HTTPS.

Liquid Galaxy at Le Pavillon de l'Arsenal in Paris

Today there was an exciting opening of a new 48-screen Liquid Galaxy display at Le Pavillon de l'Arsenal in Paris. The configuration and use of this display is distinct from other Liquid Galaxies in that it consists of six columns and eight rows of 55" bezel-less displays set out on the floor to show just the city of Paris. This Liquid Galaxy replaced a physical model of Paris that was previously set up in the same space. It has four podiums with touch screens that visitors can use to navigate about Paris. The museum produced an impressive video showing the setup of this project:

End Point had the pleasure of working for and with Google on this project. Pierre Lebeau of Google spearheaded the project, at least from our point of view. Pierre's quick and clever thinking from a high-level perspective and his leadership were crucial for getting the project done on schedule. He's posted a nice blog article about the project. In addition to the Googlers on site, our engineers also had the opportunity to see the talented museum staff at work and to work with JCDecaux, who set up and are supporting the Planar Clarity displays. Kiel and Adam spent a couple of weeks each on the installation and customization (Adam is still there), and a lot of preparation beyond the on-site work was required. So, hats off to Kiel and Adam!

Some new functionality and configuration for us that was incorporated in this setup of Liquid Galaxy included:

  • Driving four displays with each of the rack-mounted computers, rather than the one or two displays that we have been accustomed to for each computer of the system
  • Restricting the overall area of the display to just a specific region of the map, i.e., Paris in this case
  • Deploying a new web interface developed by Google for the touch screen
  • Integrating a new window manager to hide the menu bars in the displays
  • Enabling the use of multiple podiums to control the display

While all the Liquid Galaxies that we have worked on and set up previously provided a wrap-around view, the Liquid Galaxy in Le Pavillon de l'Arsenal simply provides a large flat-panel view. A particular challenge therefore was figuring out how to display Google Earth's spherical view (necessitated by a single camera viewpoint) upon a flat display surface. With a lot of attention to detail and a reasonable amount of experimentation with various configuration parameters we organized the 48 different viewports to provide a crisp display while balancing the need for predictable user control.

My next visit to Paris will definitely be including a visit to Le Pavillon de l'Arsenal!

Sunspot, Solr, Rails: Working with Results

Having worked with Sunspot and Solr in several large Rails projects now, I've gained some knowledge about working with result sets optimally. Here's a brief explanation on working with results or hits from a search object.

MVC Setup

When working with Sunspot, searchable fields are defined in the model:

class Thing < ActiveRecord::Base
  searchable do
    text :field1, :stored => true
    text :field2
    string :field3, :stored => true
    integer :field4, :multiple => true
  end
end

The code block above will include field1, field2, field3, and field4 in the search index of things. A keyword or text search on things will search field1 and field2 for matches. field3 and field4 may be used for scoping, or limiting the search result set based on specific values of field3 or field4.

In your controller, a new search object is created with the appropriate scoping and keyword values, shown below. Pagination is also added inside the search block.

class ThingsController < ApplicationController
  def index
    @search = do
      # full-text search
      fulltext params[:keyword]

      if params.has_key?(:field3)
        with :field3, params[:field3]
      end
      if params.has_key?(:field4)
        with :field4, params[:field4]
      end

      paginate :page => params[:page], :per_page => 25
    end
  end
end

In the view, one can iterate through the result set, where results is an array of Thing instances.

<% @search.results.each do |result| -%>
<h2><%= result.field3 %></h2>
<%= result.field1 %>
<% end -%>

Working with Hits

The above code works. It works nicely as long as you display a limited number of results per page and instantiation of things is not expensive. But the above code will run the query below for every search, and subsequently instantiate a Ruby object for each of the things found. This can become sluggish when the result set is large or the items themselves are expensive to instantiate.

# development.log
Thing Load (0.9ms)  SELECT "things".* FROM "things" WHERE "things"."id" IN (6, 12, 7, 13, 8, ...)

An optimized way to work with search result sets is to work directly with hits. @search.hits is an array of Sunspot::Search::Hit objects, each representing the raw information returned by Solr for a single item. Hit objects provide access to stored field values, identified by the :stored option in the model's searchable definition. The model definition looks the same. The controller may now look like this:

class ThingsController < ApplicationController
  def index
    search = do
      # full-text search
      fulltext params[:keyword]

      if params.has_key?(:field3)
        with :field3, params[:field3]
      end
      if params.has_key?(:field4)
        with :field4, params[:field4]
      end
    end

    @hits = search.hits.paginate :page => params[:page], :per_page => 25
  end
end

And working with the data in the view may look like this:

<% @hits.each do |hit| -%>
<h2><%= hit.stored(:field3) %></h2>
<%= hit.stored(:field1) %>
<% end -%>

In some cases, you may want to introduce an additional piece of logic prior to pagination, which is the case with the most recent Rails application I've been working on:


    filtered_results = []

    search.hits.each do |hit|
      if hit.stored(:field3) == "some arbitrary value"
        filtered_results << hit
      elsif hit.stored(:field1) == "some other arbitrary value"
        filtered_results << hit
      end
    end

    @hits = filtered_results.paginate :page => params[:page], :per_page => 25

Sunspot and Solr are rich with functionality and features that can add value to a Rails application, but it's important to identify areas of the application where database calls can be minimized and lazy loading can be optimized for better performance. The standard log file and database log file are good places to start looking.

Christmas Tree Commerce in 2011

Took a bit of a break today to get one of those perennial activities out of the way: the great Christmas tree shop. Much hasn't changed about this time-honored tradition: don the hats and gloves (well, at least until global warming takes over), pile the family in the car, and hit the ATM for a bundle of cash to pass "under the table." Not so fast: this is 2011, and Christmas tree lots aren't what they used to be.

Rest assured, much of the experience hasn't changed: you still get to wade up and down aisles of freshly cut firs, trying to select just the right balance of fat vs. thin, tall vs. short, density vs. ornament-hanging potential. There is still some haggling over price (if you are lucky) and the inevitable chainsawing, bundling, and twining to the top of the old station wagon (well, SUV). But today did have one big difference, one that our e-commerce clients, and even more so our bricks-and-mortar clients, should be mindful of: the "cash box" with the flip-up lid and stacks of tens and twenties had been replaced by an iPad with a card reader. This Christmas tree lot has gone high tech, all the way. The iPad totaled the order and, with card reader attached, took my payment, allowed me to sign the screen with a finger, and e-mailed me my receipt.

As much as I appreciated the simplicity and convenience of paying it is a tough argument to make that it is that much better from the consumer side, paying in cash was pretty simple before too, but the secret here is for the vendor. This particular vendor (not exactly Apple) has eight tree lots around town, but what the iPad has done for this little, short lived merchant is provide real time inventory tracking, supply management, resource management, and the underpinnings of customer relationship management. In an instant they are able to see which lots are having the most foot traffic, which lots trend at which times, which are running low on a particular sort of tree, and make adjustments to stock and resourcing accordingly. I will be quite surprised if I don't receive an e-mail from them next year reminding me just where their lot is located, and that it is time to buy the centerpiece of holiday decorations.

Of course next year it will be 2012, and credit card mag strips are really so 2011, so if I need to bring anything other than my cellphone I'll be just a little disappointed. (Did I mention they provide delivery? I guess in case you drive a smart car.)


Running Integration Tests in Webkit Without a Browser

As your Ruby web applications increase in UI complexity, they get harder to test using the standard Cucumber or even RSpec integration test suites. This is because of the introduction of JavaScript in your UI. You've probably been there before. Here's the use case: you've written your integration tests for your Rails app, and up to this point you've been able to get away with not tagging your Cucumber scenarios with the "@javascript" tag. Everything is going smoothly, and then it's time to implement that one UI feature that requires an Ajax call or some JavaScript to hide or unhide a crucial piece of the user experience. This must be covered by your integration tests.

So you go through the pain of setting up cucumber to work with selenium and tag your scenario as javascript so that it will run the test in the browser. At first, there's this thrill of excitement as you get to see Firefox load, and then run through a series of steps, executing your tests and then seeing them pass. Job done.

But maybe there's a different scenario at play here. What if you don't do your development in an environment that has a browser? At End Point, we are strong advocates of doing development on the same environment that your app is going to run on; it eliminates unexpected issues down the road. We believe in it so much, actually, that we've created DevCamps, which allows you to set up development environments on a server.

Obviously, your Selenium-based tests are not going to work here without some effort to get them running headless.

The good folks at thoughtbot have come up with a great solution to this and it is called capybara-webkit. Capybara webkit assumes that you are using capybara for your testing framework. If you are using webrat, the transition is fairly smooth. You'll probably only need to change a few minor details in your tests.

What capybara-webkit does for you is enable you to run your tests inside of webkit. This will simulate an environment that will be very close to what you would see in Google Chrome or Safari as well as many mobile browsers. I've found that except for some edge cases, it covers Firefox and IE as well.

To install capybara-webkit you will need to install the Qt development toolkit. It's fairly straightforward, so I'll just refer you to the GitHub wiki page for instructions for the various platforms. On Ubuntu, I just ran the following:

sudo apt-get install libqt4-dev

If you are installing on a server environment, you'll also need to install Xvfb. You can do that in Ubuntu with the following command:

sudo apt-get install xvfb

It's a little outside the scope of this blog post to go into the other details of setting up Xvfb. The important thing is to set it up to run on display 99. Another important note is that you don't have to set it up to run on boot; we will start it when we run our tests if it isn't already running.

The next step is to configure your cucumber tests to use the capybara-webkit driver. To do that, add

gem "capybara-webkit"

to your Gemfile in the development and test group. Then in your env.rb file for cucumber add the following lines:

Capybara.javascript_driver = :webkit

In some cases, I've found it helpful to also specify a server port and app_host as follows:

Capybara.server_port = '8000'
Capybara.app_host = 'http://localhost:8000'

Now your tests are setup to run in webkit. The final step is running the tests. To do this, you'll need to run them from within xvfb. You can do that with the following command:

xvfb-run bundle exec cucumber

I've created an alias for this and dropped it in my .bashrc file. Here's my entry, but you can set it up anyway you'd like.

alias xcuke="xvfb-run bundle exec cucumber"

Now running tests is as simple as running xcuke from the root of my Rails application.

There are a couple of big benefits to running capybara-webkit. First is speed: in my experience, tests run much faster than they do in Selenium. Second, all JavaScript errors are dumped to STDOUT so you can see them in the output of your cucumber tests. Third, all of your tests run in webkit instead of rack, so you get a test environment that behaves more like a real browser.

Thanks to the guys at thoughtbot for putting together this awesome gem.

Semaphore limits and many Apache instances on Linux

On some of our development servers, we run many instances of the Apache httpd web server on the same system. By "many", I mean 30 or more separate Apache instances, each with its own configuration file and child processes. This is not unusual on DevCamps setups with many developers working on many projects on the same server at the same time, each project having a complete software stack nearly identical to production.

On Red Hat Enterprise Linux 5, with somewhere in the range of 30 to 40 Apache instances on a server, you can run into failures at startup time with this error or another similar one in the error log:

[error] (28)No space left on device: Cannot create SSLMutex

The exact error will depend on what Apache modules you are running. The "space left on device" error does not mean you've run out of disk space or free inodes on your filesystem, but that you have run out of SysV IPC semaphores.

You can see what your limits are like this:

# cat /proc/sys/kernel/sem
250 32000 32 128
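The four numbers are, in order, SEMMSL (maximum semaphores per array), SEMMNS (maximum semaphores system-wide), SEMOPM (maximum operations per semop() call), and SEMMNI (maximum number of semaphore arrays). A quick way to check the live limits and current usage is a sketch like this (ipcs output varies by system):

```shell
# SEMMSL SEMMNS SEMOPM SEMMNI, in that order
cat /proc/sys/kernel/sem

# list currently allocated SysV semaphore arrays, if ipcs is available;
# each Apache instance may hold one or more arrays
ipcs -s 2>/dev/null || true
```

Comparing the number of allocated arrays against SEMMNI makes it easy to see how close a crowded server is to the ceiling.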

I typically double those limits by adding this line to /etc/sysctl.conf:

kernel.sem = 500 64000 64 256

That makes sure you'll get the change at the next boot. To make the change take immediate effect:

# sysctl -p

With those limits I've run 100 Apache instances on the same server.

Working with constants in Ruby

Ruby is designed to put complete power into the programmer's hands and with great power comes great responsibility! This includes the responsibility for freezing constants. Here's an example of what someone might THINK is happening by default with a constant.

class Foo
  DEFAULTS = [:a, :b]
end

default = Foo::DEFAULTS
default << :c
Foo::DEFAULTS #=> [:a, :b, :c]  WHOOPS!

As you can see, assigning a new variable from a constant lets you modify what you thought was a constant! Needless to say, such a bug would be very difficult to track down in a real application. Let's see how we might improve on this design. First, let's freeze our constant.

class Foo
  DEFAULTS = [:a, :b].freeze
end

default = Foo::DEFAULTS
default << :c #=> RuntimeError: can't modify frozen array

Now we'll get very specific feedback about offending code. The question is how can we use our constant now as a starting point for array, and still be able to modify it later? Let's look at some more code.

Foo::DEFAULTS.frozen? #=> true
Foo::DEFAULTS.clone.frozen? #=> true, this was my first guess, but it turns out we need...
Foo::DEFAULTS.dup.frozen? #=> false

It's worth reading the docs on clone and dup to understand their difference. In short, clone replicates the internal state of the object, including its frozen state, while dup creates a new instance of the object. There was one more question I needed to answer: what would happen when I appended another frozen array to a non-frozen array? Let's look to the code again!

default = Foo::DEFAULTS.dup  # not frozen
new_default = default + [:c].freeze
new_default.frozen? #=> false

So it seems that the initial state of the object carries the frozen state, allowing you to append frozen arrays without having to dup them. The moral of the story here is don't make assumptions about Ruby! One of the best ways to challenge your assumptions is with unit tests.
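Following that advice, the behavior above can be pinned down with a few plain assertions (a standalone sketch with no test framework, so it stays self-contained):

```ruby
# Standalone assertions capturing the freeze/clone/dup behavior described above.
DEFAULTS = [:a, :b].freeze

raise "constant should be frozen"  unless DEFAULTS.frozen?
raise "clone carries frozen state" unless DEFAULTS.clone.frozen?
raise "dup should not be frozen"   if DEFAULTS.dup.frozen?

# Appending a frozen array to an unfrozen one yields an unfrozen result.
combined = DEFAULTS.dup + [:c].freeze
raise "result should not be frozen" if combined.frozen?
raise unless combined == [:a, :b, :c]

# Attempting to mutate the constant itself raises (FrozenError on modern
# Rubies, which is a RuntimeError subclass).
mutated = false
begin
  DEFAULTS << :d
  mutated = true
rescue RuntimeError
end
raise "frozen constant was mutated" if mutated
raise unless DEFAULTS == [:a, :b]

puts "all assertions passed"
```

Dropping assertions like these into a spec file keeps the frozen-constant contract from silently regressing.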

Performing Bulk Edits in Rails: Part 2

This is the second article in the series on how to perform a bulk edit in Rails. Let's recap our user's story from Part 1.

  • User makes a selection of records and clicks "Bulk Edit" button
  • User works with the same form they would use for a regular edit, plus
    • check boxes are added beside each attribute to allow the user to indicate that the attribute should be affected by the bulk edit
    • only attributes which are the same among selected records should be populated in the form

Part 1 addressed the first part of our user story. Now that we have our user's selection, we need to create an interface to allow them to select attributes affected by the bulk edit. Let's start with the form we'll use to POST our input.

# app/controllers/bulk_edits_controller.rb

def new
  @foos = Foo.find(params[:foo_ids]) # params collected by work done in Part 1
  @foo =
end

# app/views/bulk_edit/new.html.erb

<%= form_for @foo, :url => "/bulk_edits" do |f| %>
  <% @foos.each do |foo| %>
    <%= hidden_field_tag "foo_ids[]", %>
  <% end %>
  <%= render "foos/form", :f => f %>
  <%= f.submit %>
<% end %>

Let's first look at how we formed our form_for tag. Although this is a form for a Foo object, we don't want to POST to foos_controller#create, so we add :url => "/bulk_edits", which will POST to bulk_edits_controller#create. Additionally, we need to send along the foo_ids we eventually want to bulk update. Finally, we don't want to re-create the form we already have for Foo: by sharing one master form, we'll make long-term maintenance easier. Now that we've got our form posting to the right place, let's see what modifications we will need to make to our standard form to allow the user to highlight attributes they want to modify.

# app/views/foos/_form.html.erb

<%= check_box_tag "bulk_edit[]", :bar %>
<%= f.label :bar %>
<%= f.text_field :bar %>

Bulk edit check boxes appear in front of field names to let users know which fields will be modified.

We've added another check_box_tag to the form to record which attributes the user will select for bulk updating. However, we only want to display this when we're doing a bulk edit. Let's tweak this a bit further.

# app/views/foos/_form.html.erb

<%= bulk_edit_tag :bar %>
<%= f.label :bar %>
<%= f.text_field :bar %>

# app/helpers/foos_helper.rb

def bulk_edit_tag(attr)
  check_box_tag("bulk_edit[]", attr) if bulk_edit?
end

def bulk_edit?
  params[:controller] == "bulk_edits"
end
With these modifications to the form in place, the user can now specify which fields are eligible for bulk editing. Now we need the logic to determine how to populate the bar attribute based on the user's selection. This way, the user will see that an attribute is the same across all selected items. Let's revise our bulk edit controller.

# app/controllers/bulk_edits_controller.rb

def new
  @foos = Foo.find(params[:foo_ids])
  matching_attributes = Foo.matching_attributes_from(@foos)
  @foo =
end

# app/models/foo.rb

def self.matching_attributes_from(foos)
  matching = {}
  attributes_to_match = attribute_names  # see ActiveRecord's attribute_names for more details

  foos.each do |foo|
    attributes_to_match.each do |attribute|
      value = foo.__send__(attribute)  # invokes the method named by attribute; the underscored version avoids namespace issues

      if matching[attribute].nil?
        matching[attribute] = value  # assume it's a match
      elsif matching[attribute] != value
        matching[attribute] = ""  # on the first mismatch, empty the value, but don't make it nil
      end
    end
  end

  matching
end
Only fields which are the same across all selected records will be populated. Other fields will be left blank by default.

With Foo.matching_attributes_from generating a hash of matching attributes, the form will only populate fields which match across all of the user's selected items. With our form in place, the last step is to actually perform the bulk edit.
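The reduction can also be exercised outside Rails with plain hashes (a sketch; the attribute names are hypothetical, and hashes stand in for ActiveRecord objects):

```ruby
# Plain-Ruby sketch of the matching-attributes reduction: keep a value only
# while every record agrees on it; blank it (but don't nil it) on the first
# mismatch, so the form field renders empty.
records = [
  { "bar" => "same", "baz" => 1 },
  { "bar" => "same", "baz" => 2 },
]

matching = {}
records.each do |record|
  record.each do |attribute, value|
    if matching[attribute].nil?
      matching[attribute] = value   # first sighting: assume it matches
    elsif matching[attribute] != value
      matching[attribute] = ""      # first mismatch: blank it, but not nil
    end
  end
end

matching #=> {"bar"=>"same", "baz"=>""}
```

Here "bar" agrees across both records and survives, while "baz" differs and is blanked, mirroring what the user sees in the pre-populated form.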

# app/controllers/bulk_edits_controller.rb

def create
  if params.has_key?(:bulk_edit)
    foos = Foo.find(params[:foo_ids])
    foos.each do |foo|
      # create a hash of eligible attributes and the user's values
      eligible_params = {}
      params[:bulk_edit].each do |eligible_attr|
        eligible_params.merge!(eligible_attr => params[:foo][eligible_attr])
      end

      # update each record, but only with the eligible attributes
      foo.update_attributes(eligible_params)
    end
  end
end
We've now completed the entire user story. Users are able to use check boxes to identify which attributes should be bulk updated. They also get to see which attributes match across their selection. Things are, of course, always more involved in a real production application. Keep in mind this example does not make good use of mass assignment protection, such as using attr_accessible and forcing an empty whitelist of attributes via config.active_record.whitelist_attributes = true. This is a best practice that should be implemented anytime you need server-side validation of your forms.

Additionally, there may be cases where you want to perform bulk edits of more complex attributes, such as nested attributes. Consider appending your additional attributes to the array and then tweaking the eligible-attributes logic. Also consider implementing a maximum number of records which can be bulk edited at a time; you wouldn't want your server to time out. Good luck!

Kamelopard Release

After completing no small amount of refactoring, I'm pleased to announce a new release of Kamelopard, a Ruby gem for generating KML. KML, as with most XML variants, requires an awful lot of typing to write by hand; Kamelopard makes it all much easier by mechanically generating all the repetitive XML bits and letting the developer focus on content. An example of this appears below, but first, here's what has changed most recently:

  • All KML output comes via Ruby's REXML library, rather than simply as string data that happens to contain XML. This not only makes it much harder for Kamelopard developers to mess up basic syntax, it also allows examination and modification of the KML data using XML standards such as XPath.
  • Kamelopard classes now live within a module, preventing namespace collisions. This is important for any large-ish library, and probably should have been done all along. Previous to this, some classes had awfully strange names designed to prevent namespace collisions; these classes have been changed to simpler, more intuitive names now that collisions aren't a problem.
  • Perhaps the biggest change is the incorporation of a large and (hopefully) comprehensive test suite. I'm a fan of test-driven development, but didn't start off on the right foot with Kamelopard. It originally shipped with a Ruby script that tried a few examples and hoped it didn't crash; that has been replaced with a full RSpec-based test suite, including tests for each class and, in particular, extensive tests of the KML output to ensure it meets the KML specification. Run these tests from the Kamelopard source with the command
    rspec spec/*

Now for some code. We recently got a data set containing several thousand locations, describing the movement of an aircraft on final approach and landing, with the request that we turn it into a Google Earth tour, where the viewer would follow the aircraft's path, flight simulator style. The actual KML result is over 56,000 lines, but the KML code is fairly simple:

require 'rubygems'
require 'kamelopard'
require 'csv'

CSV.foreach(ARGV[0]) do |row|
    time = row[0]
    lon = row[1].to_f
    lat = row[2].to_f
    alt = row[3].to_f

    p = lon, lat, alt, :absolute
    c =, get_heading, get_tilt, get_roll, :absolute)
    f = c, nil, pause, :smooth
end

puts Kamelopard::Document.instance.get_kml_document.to_s

Along with some trigonometry and linear algebra to calculate the heading, tilt, and roll, and a CSV file of data points, the script above is all it took; the KML result runs correctly in Google Earth without further modification. Kamelopard has been published to, so installation is simply

gem install kamelopard
Give it a try!

Book Recommendation: Ghost in the Wires

I recently listened to Ghost in the Wires by Kevin Mitnick as an audiobook during my long Thanksgiving vacation drives. This non-fiction book is a first-person account about Kevin Mitnick's phone and computer break-in (or what he claims to be ethical hacking) adventures in the late eighties and early nineties, and it touches on the following legal proceedings from 1995 on. A couple of interesting things stood out to me:

  • Kevin's tactics revolve around social engineering, or techniques that capitalize on flaws in "human hardware" to gain information. The book was an eye opener in terms of how easily Kevin gained access to systems, as there are countless examples of Kevin's ability to gain credibility, pretext, introduce diversions, etc.
  • Another highlight of the book for me was learning details of how bug reports were exploited to gain sensitive information. Kevin gained access to bug reports on proprietary software to exploit the software and gain access to the systems running the software. I don't think of my own clients' bug reports as an extremely valuable source of information for exploiting vulnerabilities to gain user information, but there have been a few instances in the past where bugs could have been used maliciously.

Follow-up Comments

One thing that strikes me is how the internet and technology have changed since Kevin's infringements, specifically in the development of open source software. End Point works with open source operating systems, packages, monolithic ecommerce applications, and modular open source elements (e.g. Rails gems, CPAN modules). Bug reports on open source applications are easily accessible. For example, here is an article on the security vulnerabilities in recent versions of Rails.

The responsibility of keeping up with security updates shifts to the website owner leveraging these open source solutions (or the hosting provider and/or developer in some cases). I spoke with a few developers a couple of years ago about how public WordPress security vulnerabilities enable unethical hackers to easily gain access to sites running WordPress without the security updates. With the increased popularity of open source and visibility of security vulnerabilities, it's important to keep up with security updates, especially those which might make sensitive user information available.

With the advancement in technology, security processes should become a normal part of development. For example, End Point has standard security processes in place such as use of ssh keys, firewalls for server access, and PGP encryption. Our clients also follow PCI compliance regulations regarding storing credit card numbers and security numbers in encrypted form only, or in some cases not at all if a third party payment processor is used. It's nice to use a third party service for storing credit card data since the responsibility of storing sensitive cardholder data shifts to the third party (however, the interaction between your site and the third party must be protected).

This is an interesting read (or listen) that I recommend to anyone working with sensitive information in the tech field. Learning about the social engineering techniques was fascinating in itself and technical bits are scattered throughout the book which make it suitable for tech-savvy and non-tech-savvy readers.

Global Variables in Interchange Jobs

Those familiar with writing global code in Interchange are certainly familiar with the number of duplicate references of certain global variables in different namespaces. For example, the Values reference is found in both the main namespace ($::Values) as well as in Vend::Interpolate ($Values usually from within usertags). One can also access the Values reference through the Session reference, which itself can be found in main ($::Session), Vend ($Vend::Session), and Vend::Interpolate ($Session usually from within usertags) namespaces with, e.g., $::Session->{values}. Most times, as long as context allows, any of those access points are interchangeable, and there's a good mix you see from developers using all of them.

In recent work for a client, I had developed an actionmap that incorporated access to the session for some of its coding--certainly not an uncommon occurrence. When I work in global space, I tend to use the main namespace references since they are available in all contexts within Interchange (or so I thought). The actionmap was constructed, tested, and put into production, where it worked as expected.

After a short period of operation, the client came to us and noted that in their actual operating procedure, the actionmap must process many more data points than we had it operate on in testing, causing it to take much more time. Thus, for their usual workload, they found the process was timing out and Interchange housekeeping was reaping the process.

After a brief discussion, we decided the expedient course of action was to convert the work from a browser-initiated actionmap into an Interchange job. The code was easily exposed as a usertag as well, so in very short order we had the same functionality available as a job, where the job was now triggered by the browser access previously running the actionmap.

The change resolved the immediate problem, so now all work was completing, but the client brought a new issue to our attention. The reporting from the job was not as it was supposed to be. None of the code had been modified in the changeover, and the code when run as an actionmap produced the proper reporting.

I eventually tracked the problem down to that session access. When the code ran in the context of the job, the Session reference was not copied into the main (or, as it turns out, Vend::Interpolate) namespace. Without the assumed session values in place, the report produced invalid output.

To demonstrate, I constructed a simple usertag to dump the reference addresses of the 5 mentioned global variables:

UserTag  ic-globals  Routine <<EOR
sub {
    return <<EOP;
      \$Session: $Session
\$Vend::Session: $Vend::Session
    \$::Session: $::Session

       \$Values: $Values
     \$::Values: $::Values
EOP
}
EOR

I then created both a test page and an IC job that only called [ic-globals]. Running them both demonstrates the problem quite clearly.

From test page:

      $Session: HASH(0xb0e1898)
$Vend::Session: HASH(0xb0e1898)
    $::Session: HASH(0xb0e1898)

       $Values: HASH(0xb0e1dd8)
     $::Values: HASH(0xb0e1dd8)

Output from job:

$Vend::Session: HASH(0xb221fa0)

       $Values: HASH(0x926ddd8)
     $::Values: HASH(0x926ddd8)

Interchange jobs provide yet another context in which you must consider your global variable usage. In particular, if you find that code executed in the context of a job produces inconsistencies with the same code in other contexts, review your global variable usage and confirm those variables are what you assume they are.

Appending one PDF to another using PDF Toolkit

Ever need to manipulate PDFs? Prefer the command line? Us too. Imagine you have a contract in PDF format. When people print, sign, and re-scan the contract, that's good documentation of the signature, but the clarity of the original machine-readable text is lost and the file's size is unnecessarily large. One solution is to append the scanned signature page to the original contract document.

There are many PDF editors out there which address this need. One command line solution that works well is PDF Labs's PDF Toolkit. Let's look at how we would use PDF Toolkit to append one document to another.

pdftk contract.pdf scanned_contract.pdf cat output original_and_signed_contract.pdf

With this command we now have both contracts in their entirety. What we really want is to just take the signature page and append it. Let's revise our command a bit to only take the signature page using what PDF Toolkit calls handles.

pdftk A=contract.pdf B=scanned_contract.pdf cat A B5 output contract_with_signature_attached.pdf

We've assigned each document to a handle (A and B), which allows us to define the order of the output as well as the pages we want to select for the output. With the argument B5 PDF Toolkit knows we only want the fifth page of the scanned_contract.pdf. Ranges are also supported, so we could write something like B4-5 too.
Unfortunately, the scanned contract was scanned upside down, so let's rotate it 180 degrees by changing our page selection to B5-endS: the 5-end range selects page 5 through the last page, and the S suffix rotates those pages to the south (180 degrees).

pdftk A=contract.pdf B=scanned_contract.pdf cat A B5-endS output contract_with_signature_attached.pdf

One notable issue I encountered while rotating individual pages was the inability to rotate and append only the first page. When specifying an option like B1-endS, the entire "B" document would be rotated and appended instead of just the first page. One other gotcha to remember: escape spaces and special characters when providing the names of documents. For example, if our document was named "scanned contract.pdf" we would need to do this:

pdftk contract.pdf scanned\ contract.pdf cat output signed_contract.pdf
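If you're driving pdftk from Ruby, the standard library can handle that escaping for you. A minimal sketch, using hypothetical filenames:

```ruby
require 'shellwords'

# Input filenames, one containing a space (hypothetical examples):
inputs = ["contract.pdf", "scanned contract.pdf"]

# Escape each name so the shell treats it as a single argument
escaped = inputs.map { |f| Shellwords.escape(f) }
cmd = "pdftk #{escaped.join(' ')} cat output signed_contract.pdf"
# cmd is now: pdftk contract.pdf scanned\ contract.pdf cat output signed_contract.pdf
```

From there the command can be handed to system or backticks without worrying about whitespace in the names.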

The PDF Toolkit is licensed under GNU General Public License (GPL) Version 2. PDF Labs's website provides a host of other examples including how to encrypt, password-protect, and repair PDFs.

Performing Bulk Edits in Rails: Part 1

This will be the first article in a series, outlining how to implement a bulk edit in Rails 3.1.1 (although most any version of Rails will do). Today we'll be focusing on a simple user interface to allow the user to make a selection of records. But first, let's look at our user story.

The user story

  • User makes a selection of records and clicks "Bulk Edit" button
  • User works with the same form they would use for a regular edit, plus
    • check boxes are added by each attribute to allow the user to indicate this variable should be affected by the bulk edit
    • only attributes which are the same among selected records should be populated in the form
An example UI from Google's AdWords interface for selecting multiple records for an action.

Sounds straightforward, right? Well, there are a couple of gotchas to be worked out along the way.

Capturing the user's selection

We'd like to offer the user a form with check boxes so that when it's submitted, our controller receives an array of IDs we can pass to our ActiveRecord finder. This is best implemented with check_box_tag, which is not auto-magically wired to an ActiveRecord object. That makes sense in this case because we don't want our form manipulating a record; we simply want to send our user's selection of records along to a new page. Let's see what this looks like.

# app/views/search/_results.html

<% @foos.each do |foo| %>
  <%= check_box_tag "foo_ids[]", foo.id %>
<% end %>

# when posted looks like
# "foo_ids"=>["4", "3", "2"]

Because we now have an array of the selected IDs, it becomes very easy to work with our user's selection.
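To see why the [] suffix on the field name matters, here's what Ruby's own stdlib makes of such a query string; note that Rack/Rails additionally strips the brackets from the key, presenting it as params[:foo_ids]. A quick sketch:

```ruby
require 'cgi'

# The query string a browser would send for three checked boxes:
query = "foo_ids[]=4&foo_ids[]=3&foo_ids[]=2"

parsed = CGI.parse(query)
# CGI.parse keeps the brackets in the key; the repeated field collapses to an array
parsed["foo_ids[]"]  # => ["4", "3", "2"]
```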
# app/controller/bulk_edit_controller.rb

def new
  # let's make sure we got what we expected: an array of at least two IDs
  if params[:foo_ids].is_a?(Array) && params[:foo_ids].length > 1
    @foos = Foo.find(params[:foo_ids])
  else
    redirect_to search_path
  end
end
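The guard clause in that controller boils down to a small predicate. As a standalone sketch (the method name is mine, not part of the controller):

```ruby
# A selection is valid for bulk edit only if it's an array of two or more IDs
def valid_bulk_selection?(foo_ids)
  foo_ids.is_a?(Array) && foo_ids.length > 1
end

valid_bulk_selection?(["4", "3", "2"])  # => true
valid_bulk_selection?(["4"])            # => false
valid_bulk_selection?("4")              # => false
```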

Refining the UI with Javascript and CSS

It's not enough just to have these check boxes. We need our "Bulk Edit" button to appear only when the user has made an appropriate selection. Let's update our view code to give our tags some class.

# app/views/search/_results.html

<%= form_tag new_bulk_edit_path, :method => "GET", :id => "bulk-edit-form" do %>
  <%= submit_tag "Bulk Edit", :id => "bulk-edit-submit" %>
<% end %>

<div class="search_results">
  <% @foos.each do |foo| %>
    <%= check_box_tag "foo_ids[]", foo.id, false, :class => "downloadable" %>
  <% end %>
</div>

# app/assets/stylesheets/search.css

#bulk-edit-submit { display: none; }

We've added the downloadable class to our check boxes, along with a simple form that sends data to the new_bulk_edit_path. This path corresponds to the new action, which you typically don't post forms to (which is why we needed to be explicit about using the GET method). In this case, however, we need this information before we can proceed with a new bulk edit. We've also hidden the submit button by default; we'll need some Javascript to show and hide it.

# app/assets/javascripts/search.js

$('.downloadable').click(function() {     //when an element of class downloadable is clicked
  var check_count = $('.downloadable:checked').size();  //count the number of checked elements
  if( check_count > 1 ) {
    $('#bulk-edit-submit').show();  //enough records selected, reveal the Bulk Edit button
  } else {
    $('#bulk-edit-submit').hide();  //hide the button again
  }
});

At this point, you might have noticed that we're submitting a form with no fields in it! We could simply wrap our form_tag around our search results, but we may not always want this. For example, what if we need multiple forms to send our selection to different controllers in our application? Right now we're working on a bulk edit, but you know the client is expecting a bulk download as well, and we can't wrap the same search results partial in multiple forms. Let's see how we would populate our form using more Javascript.

# app/assets/javascripts/search.js

$('#bulk-edit-form').submit(function() {  //When the bulk-edit form is submitted
  $('#bulk-edit-form input:checked').remove();  //clear any previously cloned checked elements from the form
  var selected_items = $('.downloadable:checked').clone();
  selected_items.hide().appendTo($(this));  //copy the current selection into the form
  return true;  //VERY IMPORTANT, needed to actually submit the form
});

This is a simple, unobtrusive way to give your forms a little more flexibility. It's also a good example of how to use :checked as a modifier on our jQuery selector.

Namespacing and Refactoring our Javascript

Knowing you'll need to implement a bulk-download form later in this same style, let's refactor out this cloning functionality.

# app/assets/javascripts/search.js

$('#bulk-edit-form').submit(function() {
  MyAppName.clone_downloadable_checkboxes_to($(this));  //You MUST wrap "this" inside $()
  return true;
});

if(!window.MyAppName) {
  MyAppName = {};  //Initialize namespace for javascript functions
}

MyAppName.clone_downloadable_checkboxes_to = function(destination) {
  destination.find('input:checked').remove();  //clear any previously cloned elements
  var selected_items = $('.downloadable:checked').clone();
  selected_items.hide().appendTo(destination);
};

One of the big highlights here is namespacing our Javascript function. While the chances are low that someone else has clone_downloadable_checkboxes_to in the global namespace, it's always best to use proper namespaces.

Well, we've made it through the first part of our user story. The user can now check their boxes, and submit a form to the appropriate Rails resource. Stay tuned to see how we implement the second half of our user's story.

Advanced Rights and Roles Management in Rails

I've been working with Phunk, Brian, and Evan on a large Rails 3.1 project that has included several unique challenges. One of these challenges is a complex rights, roles, and accessibility system, which I'll discuss here.

Before I wrote any code, I researched existing authorization systems, and came across this article which lists a few of the popular authorization gems in Rails. After reading through the documentation of several of the more advanced current authorization gems, I found that none offered the level of complexity we needed, where rights are layered on top of roles and can be mapped out to specific actions. Because the client and my team were most familiar with acl9, we chose to work with it and layer rights on top of the existing access control subsystem. Here's a look at the data model we were looking for:

The data model shows a has_and_belongs_to_many (or many-to-many) relationship between users and roles, and roles and rights. Things are an example model, which belong_to users. Rights map out to methods in the controller that can be performed on thing instances.


Starting from the admin interface, a set of rights can be assigned to a role, a standard has_and_belongs_to_many relationship:

The admin interface includes ability to assign roles to users, another has_and_belongs_to_many relationship:

And the user model has an instance method to determine if the user's rights include the current method or right:

class User < ActiveRecord::Base
  def can_do_method?(method)
    # assumes each Right record has a name attribute matching the method
    self.rights.detect { |r| r.name == method }.present?
  end
end
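Outside of ActiveRecord, the detect-based lookup behaves like this; the Right struct and the sample right names below are stand-ins for the real records:

```ruby
# Stand-in for the Right model (name is an assumed attribute)
Right = Struct.new(:name)
rights = [Right.new("example_right1"), Right.new("example_right2")]

def can_do_method?(rights, method)
  # detect returns the first matching element, or nil when nothing matches
  !rights.detect { |r| r.name == method }.nil?
end

can_do_method?(rights, "example_right1")  # => true
can_do_method?(rights, "example_right9")  # => false
```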

At the controller level without abstraction, we use the access control system to determine if the user has the ability to do that particular action, by including a conditional on the rule. Note that in these examples, the user also must be logged in, which is connected to the application's authentication system (devise).

class ThingsController < ApplicationController
  access_control do
    allow logged_in, :to => :example_right1, :if => :allow_example_right1?
    allow logged_in, :to => :example_right2, :if => :allow_example_right2?
    allow logged_in, :to => :example_right3, :if => :allow_example_right3?
  end

  def allow_example_right1?
    current_user.can_do_method?("example_right1")
  end
  def example_right1
    # actual method on Thing instance
  end

  def allow_example_right2?
    current_user.can_do_method?("example_right2")
  end
  def example_right2
    # actual method on Thing instance
  end

  def allow_example_right3?
    current_user.can_do_method?("example_right3")
  end
  def example_right3
    # actual method on Thing instance
  end
end

The controller is simplified with the following abstraction. The access control statements do not need to be modified for each new potential method/right, but the method itself must be defined.

class ThingsController < ApplicationController
  access_control do
    allow logged_in, :to => :generic_method, :if => :allow_generic_method?
  end

  def allow_generic_method?
    current_user.can_do_method?(params[:action])
  end
  def generic_method
    send(params[:action])  # dispatch to the actual method requested
  end

  def example_right1
    # actual method on Thing instance
  end
  def example_right2
    # actual method on Thing instance
  end
  def example_right3
    # actual method on Thing instance
  end
end

And don't forget the handler for Acl9::AccessDenied exceptions, inside the ApplicationController, which handles both JSON and HTML responses:

class ApplicationController < ActionController::Base
  # Rescuing from any Access denied messages, generic JSON response or redirect and flash message
  rescue_from Acl9::AccessDenied do |exception|
    respond_to do |format|
      format.json do
        render :json => { :success => false, :message => "You do not have access to do this action." }
      end
      format.html do
        flash[:error] = 'You do not have access to view this page.'
        redirect_to root_url
      end
    end
  end
end

Note that in actuality, our application has additional complexities, such as:

  • The relationship between rights and $subject is polymorphic, where $subject is a user or a role. This slightly complicates the has_and_belongs_to_many relationship between rights and users or roles. The can_do_method? predicate is updated to consider both user assigned rights and role assigned rights.
  • Performance is a consideration in this application, so Rails low-level caching may be leveraged to minimize accessibility lookup.
  • There is a notion of a global right and an ownership-level right: a user with an ownership-level right may perform a certain method only if they own the thing, while a user with a global right can perform the method regardless of ownership. This complicates our can_do_method? predicate further, as it must determine whether the user has the global right or the ownership-level right for that method on that thing.
  • A few methods have more complex business logic which determine whether or not a user has the ability to do that method. In those cases, an additional access_control allow rule is created, and distinct conditional predicate is used to determine if the user can do that method (i.e. allow_generic_method? is not used for these actions).

Other than the additional complexities, leveraging acl9's access control subsystem makes for a clean rights and roles management solution. Stay tuned for a follow-up article on leveraging this data model in combination with Rails' attr_accessible functionality to create elegant server-side validation.

Finding PostgreSQL temporary_file problems with tail_n_mail

Image by Flickr user dirkjanranzijn

PostgreSQL does as much work as it can in RAM, but sometimes it needs to (or thinks that it needs to) write things temporarily to disk. Typically, this happens on large or complex queries in which the required memory is greater than the work_mem setting.

This is usually an unwanted event: not only is going to disk much slower than staying in memory, but it can cause I/O contention. For very large queries that are not run very often, writing to disk can be warranted, but in most cases you will want to adjust the work_mem setting. Keep in mind that this is a very flexible setting, and can be adjusted globally (via the postgresql.conf file), per-user (via the ALTER USER command), and dynamically within a session (via the SET command). A good rule of thumb is to set it to something reasonable in your postgresql.conf (e.g. 8MB), and set it higher for specific users known to run complex queries. When you discover that a particular query run by a normal user requires a lot of memory, adjust the work_mem for that particular query or set of queries.
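The three levels of adjustment might look like this; the role name and sizes here are only examples:

```sql
-- postgresql.conf: a reasonable global default
-- work_mem = 8MB

-- per-user: give a reporting role more room for its complex queries
ALTER USER report_user SET work_mem = '256MB';

-- per-session: raise it dynamically for just this connection
SET work_mem = '64MB';

-- confirm the currently active value
SHOW work_mem;
```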

How do you tell when your work_mem needs adjusting, or more to the point, when Postgres is writing files to disk? The key is the setting in postgresql.conf called log_temp_files. By default it is set to -1, which does no logging at all. Not very useful. A better setting is 0, which is my preferred value: it logs all temporary files that are created. Setting log_temp_files to a positive number will only log entries whose on-disk size is greater than that number (in kilobytes). Entries about temporary files used by Postgres will appear like this in your log file:

2011-01-12 16:33:34.175 EST LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp16501.0", size 130220032

The only important part is the size, in bytes. In the example above, the size is 124 MB, which is not that small of a file, especially as it may be created many, many times. So the question becomes, how can we quickly parse the files and get a sense of which queries are causing excess writes to disk? Enter the tail_n_mail program, which I recently tweaked to add a "tempfile" mode for just this purpose.
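For a quick sense of what such a line contains, here's how one might pull the path and size out in Ruby; the regular expression is my own, not taken from tail_n_mail:

```ruby
line = '2011-01-12 16:33:34.175 EST LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp16501.0", size 130220032'

if line =~ /temporary file: path "([^"]+)", size (\d+)/
  path  = $1
  bytes = $2.to_i
  megabytes = bytes / (1024.0 * 1024)  # 130220032 bytes is about 124.19 MB
end
```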

To enable this mode, just include "tempfile" in the name of your config file, and have it match the lines containing the temporary file information. It's also recommended that you make use of the tempfile_limit parameter, which limits the results to the top X entries, as the report can otherwise get very verbose. An example config file and an example invocation via cron:

$ cat tail_n_mail.tempfile.myserver.txt

## Config file for the tail_n_mail program
## This file is automatically updated
## Last updated: Thu Nov 10 01:23:45 2011
MAILSUBJECT: Myserver tempfile sizes
INCLUDE: temporary file

FILE: /var/log/pg_log/postgres-%Y-%m-%d.log

$ crontab -l | grep tempfile

## Mail a report each morning about tempfile usage:
0 5 * * * bin/tail_n_mail tnm/tail_n_mail.tempfile.myserver.txt --quiet

For the client I wrote this for, we run this once a day and it mails us a nice report giving the worst tempfile offenders. The queries are broken down in three ways:

  • Largest overall temporary file size
  • Largest arithmetic mean (average) size
  • Largest total size across all the same query
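Given per-query lists of temp file sizes, those three rankings are simple aggregates. A sketch with made-up numbers:

```ruby
# Hypothetical parsed data: normalized query => temp file sizes seen (bytes)
sizes_by_query = {
  "SELECT ... ORDER BY creation_date DESC" => [560_000_000, 1_070_000_000],
  "CREATE TEMPORARY TABLE tmp_sales ..."   => [12_000_000, 15_000_000, 9_000_000],
}

reports = sizes_by_query.map do |query, sizes|
  { query:   query,
    largest: sizes.max,               # largest single temporary file
    mean:    sizes.sum / sizes.size,  # arithmetic mean size
    total:   sizes.sum }              # total size across all runs of this query
end

# The "top offender" under the total-size ranking:
top_by_total = reports.max_by { |r| r[:total] }[:query]
```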

Here is a slightly edited version of an actual tempfile report email:

Date: Mon Nov  7 06:39:57 2011 EST
Total matches: 1342
Matches from [A] /var/log/pg_log/2011-11-08.log: 1241
Matches from [B] /var/log/pg_log/2011-11-09.log:  101
Not showing all lines: tempfile limit is 5

  Top items by arithmetic mean    |   Top items by total size
    860 MB (item 5, count is 1)   |   17 GB (item 4, count is 447)
    779 MB (item 1, count is 2)   |    8 GB (item 2, count is 71)
    597 MB (item 7, count is 1)   |    6 GB (item 334, count is 378)
    597 MB (item 8, count is 1)   |    6 GB (item 46, count is 104)
    596 MB (item 9, count is 1)   |    5 GB (item 3, count is 63)

[1] From file B Count: 2
Arithmetic mean is 779.38 MB, total size is 1.52 GB
Smallest temp file size: 534.75 MB (2011-11-08 12:33:14.312 EST)
Largest temp file size: 1024.00 MB (2011-11-08 16:33:14.121 EST)
First: 2011-11-08 05:30:12.541 EST
Last:  2011-11-09 03:12:22.162 EST
SELECT o.order_number, TO_CHAR(o.creation_date, 'YYYY-MM-DD HH24:MI:SS') AS order_date
FROM orders o
JOIN order_summary os ON (os.order_id = o.id)
JOIN customer c ON (o.customer = c.id)
ORDER BY o.creation_date DESC

[2] From file A Count: 71
Arithmetic mean is 8.31 MB, total size is 654 MB
Smallest temp file size: 12.12 MB (2011-11-08 06:12:15.012 EST)
Largest temp file size: 24.23 MB (2011-11-08 19:32:45.004 EST)
First: 2011-11-08 06:12:15.012 EST
Last:  2011-11-09 04:12:14.042 EST
CREATE TEMPORARY TABLE tmp_sales_by_month AS SELECT * FROM sales_by_month_view;

While it still needs a little polishing (such as showing which file each smallest/largest entry came from), it has already been an indispensable tool for finding queries that cause I/O problems via frequent and/or large temporary files.