Importing Comments into Disqus using Rails

It seems everything is going to the cloud, even comment systems for blogs. Disqus is a platform for offloading the ever growing feature set users expect from commenting systems. Their website boasts over a million sites using their platform and offers a robust feature set and good performance. But before you can drink the Kool-Aid, you've got to get your data into their system.

If you're using one of the common blog platforms such as WordPress or Blogger, there are fairly direct routes Disqus makes available for automatically importing your existing comment content. For those with an unsupported platform or a hand-rolled blog, you are left with exporting your comments into XML using WordPress's WXR standard.

Disqus leaves a lot up to the exporter, providing only one page in their knowledge base for using what they describe as a Custom XML Import Format. In my experience the import error messages were cryptic, and my email support request is still unanswered 5 days later. (Ok, so it was Christmas weekend!)

So let's get into the nitty gritty details. First, the sample code provided in this article is based on Rails 3.0.x, but should work with Rails 3.1.x as well. Rails 2.x would work just as well by modifying the way the Rails environment is booted in the first lines. I chose to create a script that dumps its output to standard output, which can be piped into a file for upload. Let's see some of the setup work.

Setting up a Rails script

I chose to place the script in the RAILS_ROOT/script directory and named it wxr_export.rb. This would allow me to call the script with the Rails 2.x style syntax (ahh, the nostalgia):

script/wxr_export.rb > comments.xml

This fires up the full Rails environment, executes our Ruby code, and pipes the standard output to a file called comments.xml. Pretty straightforward, but it's not that often Rails developers think about creating these kinds of scripts, so it's worth discussing the setup mechanics.

#!/usr/bin/env ruby
require File.expand_path('../../config/boot', __FILE__)
require File.expand_path('../../config/environment', __FILE__)

I think the first line is best explained by this excerpt from Ruby Programming:

First, we use the env command in the shebang line to search for the ruby executable in your PATH and execute it. This way, you will not need to change the shebang line on all your Ruby scripts if you move them to a computer with Ruby installed in a different directory.

The next two lines are essentially asking the script to boot the correct Rails environment (development, testing, production). It's worth briefly explaining the syntax of these two somewhat cryptic lines. File.expand_path converts a pathname to an absolute pathname. If passed only the first string, it would evaluate it relative to the current working directory, but since we pass __FILE__ we are asking it to use the current file's path as the starting point.
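For example (using a hypothetical application path purely for illustration), the relative path and __FILE__ combine like this:

# __FILE__ would be something like "/path/to/app/script/wxr_export.rb"
File.expand_path('../../config/boot', '/path/to/app/script/wxr_export.rb')
# => "/path/to/app/config/boot"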

The config/boot.rb file is well documented in the Rails guides, which explain that boot.rb defines the location of your Gemfile and hooks up Bundler, which adds the dependencies of the application (including Rails) to the load path, making them available for the application to load.

The config/environment.rb file is also well documented and effectively loads the Rails packages you've specified, such as ActiveModel, ActiveSupport, etc.

Exporting WXR content

Having finally loaded our Rails environment in a way we can use it, we are ready to actually build the XML we need. First, let's set up our XML document and the general format we'll use to populate our file:

# script/wxr_export.rb

xml = Builder::XmlMarkup.new(:target => STDOUT, :indent => 2)

xml.instruct! :xml, :version=>"1.0", :encoding=>"UTF-8"

xml.rss 'version' => "2.0",
        'xmlns:content' => "http://purl.org/rss/1.0/modules/content/",
        'xmlns:dsq' => "http://www.disqus.com/",
        'xmlns:dc' => "http://purl.org/dc/elements/1.1/",
        'xmlns:wp' => "http://wordpress.org/export/1.0/" do
 
  xml.channel do
    Article.all.each do |article|
      if should_be_exported?(article)
        xml.item do

          # Article XML goes here

          article.comments.each do |comment|

            # Comments XML goes here

          end # article.comments.each
        end   # xml.item
      end     # if should_be_exported?
    end       # Article.all.each
  end         # xml.channel
end           # xml.rss

This is the general form for the WXR format as described by Disqus's knowledge base article. Note that you need to nest the comments inside each specific Article's XML. I found that I needed to filter some of my output so I added a helper function called should_be_exported? which can be defined at the top of the script. This would allow you to exclude Articles without comments, or whatever criteria you might find helpful.
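As a rough sketch, such a helper could be as simple as this (the criterion here, skipping articles with no comments, is just an assumption; substitute whatever rules fit your content):

# script/wxr_export.rb, near the top, before the XML is built
def should_be_exported?(article)
  article.comments.any?
end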

With our basic format in place, let's look at the syntax for exporting the Article fields. Keep in mind that the fields you'll want to pull from in your system will likely be different, but the intention is the same.

Inside the Article XML block

# script/wxr_export.rb

# Inside the Article XML block

xml.title article.title

xml.link create_url_for(article)

xml.content(:encoded) { |x| x << "" }

xml.dsq(:thread_identifier) { |x| x << article.id }

xml.wp(:post_date_gmt) { |x| x << article.created_at.utc.to_formatted_s(:db) }

xml.wp(:comment_status) { |x| x << "open" } #all comments open

Let's look at each of these fields one by one:

  • xml.title: This is pretty straightforward, just the plain text title of the blog article.
  • xml.link: Disqus can use URLs for determining which comments to display on your page, so it asks you to provide a URL associated with this article. I found that for this particular app, it would be easier to write another helper function to generate the URLs than to use the Rails routes (a sketch of such a helper appears after this list). If you wish to use the Rails routes (and I suggest you do), then I suggest checking out this excellent post for using routes outside of views.
  • xml.content(:encoded): The purpose of this field is clear, but the syntax is not. Hope this saves you some time and headache!
  • xml.dsq(:thread_identifier): The other way Disqus can identify your article is by a unique identifier. This is strongly recommended over the use of a URL. We'll just use your unique identifier in the database.
  • xml.wp(:post_date_gmt): The thing to keep in mind here is that we need the date in a very particular format. It needs to be in YYYY-MM-DD HH:MM:SS 24-hour format and adjusted to GMT which typically implies UTC. Rails 3 makes this very easy for us, bless their hearts.
  • xml.wp(:comment_status): This app wanted to leave all comments open. You may have different requirements so consider adding a helper function.
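For reference, here is a hypothetical create_url_for helper (the domain and URL scheme are made up), along with a quick check that the :db format produces the date string Disqus expects:

# script/wxr_export.rb -- hypothetical helper, adjust to your own URL scheme
def create_url_for(article)
  "http://www.example.com/articles/#{article.id}"
end

Time.utc(2011, 12, 27, 15, 30, 0).to_formatted_s(:db)  # => "2011-12-27 15:30:00"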

Inside the Comment XML block

article.comments.each do |comment|
  
  xml.wp(:comment) do

    xml.wp(:comment_id) { |x| x << comment.id }

    xml.wp(:comment_author) do |x| 
      if comment.user.present? && comment.user.name.present?
        x << comment.user.name
      else
        x << ""
      end 
    end 
                  
    xml.wp(:comment_author_email) do |x| 
      if comment.user.present? && comment.user.email.present?
        x << comment.user.email
      else
        x << ""
      end 
    end 

    xml.wp(:comment_author_url) do |x|
      if comment.user.present? && comment.user.url.present?
        x << comment.user.url
      else
        x << ""
      end
    end

    xml.wp(:comment_author_IP) { |x| x << "255.255.255.255" }

    xml.wp(:comment_date_gmt) { |x| x << comment.created_at.utc.to_formatted_s(:db) }

    xml.wp(:comment_content) { |x| x << "" }

    xml.wp(:comment_approved) { |x| x << 1 } #approve all comments

    xml.wp(:comment_parent) { |x| x << 0 }

  end #xml.wp(:comment)
end #article.comments.each

Again, let's inspect this one field at a time:

  • xml.wp(:comment_id): Straightforward, a simple unique identifier for the comment.
  • xml.wp(:comment_author): Because some commenters may not have a user associated with them, I added some extra checks to make sure the comment's user and the user's name were present. I'm sure there's a way to shorten the number of lines used, but I was going for readability here. I'm not certain it was necessary to include the blank string, but after some of the trouble I had importing, I wanted to minimize the chance of strange XML syntax issues.
  • xml.wp(:comment_author_email): More of the same safeguards against empty data.
  • xml.wp(:comment_author_url): More of the same safeguards against empty data.
  • xml.wp(:comment_author_IP): We were not collecting user IP data, so I put in some bogus data which Disqus did not seem to mind.
  • xml.wp(:comment_date_gmt): See xml.wp(:post_date_gmt) above for comments about date/time format.
  • xml.wp(:comment_content): See xml.content(:encoded) above for comments about encoding content.
  • xml.wp(:comment_approved): Two options here, 0 or 1. Typically you'd want to automatically approve your existing comments, unless of course you wanted to give a moderator a huge backlog of work.
  • xml.wp(:comment_parent): This little field turned out to be the cause of a lot of trouble for me. The comments on Disqus's XML example say parent id (match up with wp:comment_id), so initially I just put the comment's own ID in this field. This returned the very unhelpful error * url * URL is required, for which I still have an unanswered support email in to Disqus. By trial and error, I found that by just setting comment_parent to zero, I could successfully upload my comment content. If you are using threaded comments, I suspect this field will be of more importance to you than it was to me. When I hear from Disqus, I will update this article with more information.
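If your comments do track their parent (say, via a parent_id column), one plausible mapping, purely an assumption and untested here, would be:

xml.wp(:comment_parent) { |x| x << (comment.parent_id || 0) }  # hypothetical parent_id column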

Labeling input boxes including passwords

I'm currently working on a new site, and one aspect of the design is that many of the form fields do not have labels next to the input boxes; instead, they use labels inside the input box that fade away when text is entered. The label is also supposed to reappear if the box is cleared out. Originally I thought this was a pretty easy problem and quickly wrote some jQuery to handle it. The path I went down first was to set the textbox's value to the label we wanted displayed and then clear it on focus. This worked fine, however I hit a stumbling block when it came to password input boxes. My solution did not work properly because text in a password box is masked, so the label would be hidden as well. Most people would probably understand what went in each box, but I didn't want to risk confusing anyone, so I needed to find a better solution.

I did some searching for jQuery labels for password inputs and turned up several solutions. The first one actually put another text box on top of the password input, but that seemed prone to issues. The solution I ultimately decided to use is In-Field Labels, a jQuery plugin by Doug Neiner. In this solution the floating labels appear over the top of the textbox, dim slightly when the field gains focus, and then disappear completely when typing begins. The plugin does not touch the value in the input box at all.

It was fairly easy to get up and running. I added the plugin to the page, created some styling for the labels, added label tags with the class 'overlay' for each input box, and called $('label.overlay').inFieldLabels();. That was all that was needed to get going.
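A minimal sketch of what that setup implies (the field id and name here are placeholders, and the label still needs CSS positioning it over its input):

<label for="user_password" class="overlay">Password</label>
<input type="password" id="user_password" name="user[password]" />

<script>
  $(function() {
    // attach the floating in-field labels once the DOM is ready
    $('label.overlay').inFieldLabels();
  });
</script>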

Normal view

Focus in the password box

Typing in the password box

The effect is pretty cool, and it provides a good interface for the user, who is reminded right up until they start typing in the box what they are supposed to enter.

Converting CentOS 6 to RHEL 6

A few years ago I needed to convert a Red Hat Enterprise Linux (RHEL) 5 development system to CentOS 5, as our customer did not actively use the system any more and no longer wanted to renew the Red Hat Network entitlement for it. Making the conversion was surprisingly straightforward.

This week I needed to make a conversion in the opposite direction: from CentOS 6 to RHEL 6. I didn't find any instructions on doing so, but found a RHEL 6 to CentOS 6 conversion guide with roughly these steps:

yum clean all
mkdir centos
cd centos
wget http://mirror.centos.org/centos/6.0/os/x86_64/RPM-GPG-KEY-CentOS-6
wget http://mirror.centos.org/centos/6.0/os/x86_64/Packages/centos-release-6-0.el6.centos.5.x86_64.rpm
wget http://mirror.centos.org/centos/6.0/os/x86_64/Packages/yum-3.2.27-14.el6.centos.noarch.rpm
wget http://mirror.centos.org/centos/6.0/os/x86_64/Packages/yum-utils-1.1.26-11.el6.noarch.rpm
wget http://mirror.centos.org/centos/6.0/os/x86_64/Packages/yum-plugin-fastestmirror-1.1.26-11.el6.noarch.rpm
rpm --import RPM-GPG-KEY-CentOS-6
rpm -e --nodeps redhat-release-server
rpm -e yum-rhn-plugin rhn-check rhnsd rhn-setup rhn-setup-gnome
rpm -Uhv --force *.rpm
yum upgrade

I then put together a plan to do more or less the opposite of that. The high-level overview of the steps is:

  1. Completely upgrade the current CentOS and reboot to run the latest kernel, if necessary, to make sure you're starting with a solid system.
  2. Install a handful of packages that will be needed by various RHN tools.
  3. Log into the Red Hat Network web interface and search for and download onto the server the most recent version of these packages for RHEL 6 x86_64:
    • redhat-release-server-6Server
    • rhn-check
    • rhn-client-tools
    • rhnlib
    • rhnsd
    • rhn-setup
    • yum
    • yum-metadata-parser
    • yum-rhn-plugin
    • yum-utils
  4. Install the Red Hat GnuPG signing key.
  5. Forcibly remove the package that identifies this system as CentOS.
  6. Forcibly upgrade to the downloaded RHEL and RHN packages.
  7. Register the system with Red Hat Network.
  8. Update any packages that now need it using the new Yum repository.

The exact steps I used today to convert from CentOS 6.1 to RHEL 6.2 (with URL session tokens munged):

yum upgrade
shutdown -r now
yum install dbus-python libxml2-python m2crypto pyOpenSSL python-dmidecode python-ethtool python-gudev usermode
mkdir rhel
cd rhel
wget 'https://content-web.rhn.redhat.com/rhn/public/NULL/redhat-release-server/6Server-6.2.0.3.el6/x86_64/redhat-release-server-6Server-6.2.0.3.el6.x86_64.rpm?__gda__=XXX_YYY&ext=.rpm'
wget 'https://content-web.rhn.redhat.com/rhn/public/NULL/rhn-check/1.0.0-73.el6/noarch/rhn-check-1.0.0-73.el6.noarch.rpm?__gda__=XXX_YYY&ext=.rpm'
wget 'https://content-web.rhn.redhat.com/rhn/public/NULL/rhn-client-tools/1.0.0-73.el6/noarch/rhn-client-tools-1.0.0-73.el6.noarch.rpm?__gda__=XXX_YYY&ext=.rpm'
wget 'https://content-web.rhn.redhat.com/rhn/public/NULL/rhnlib/2.5.22-12.el6/noarch/rhnlib-2.5.22-12.el6.noarch.rpm?__gda__=XXX_YYY&ext=.rpm'
wget 'https://content-web.rhn.redhat.com/rhn/public/NULL/rhnsd/4.9.3-2.el6/x86_64/rhnsd-4.9.3-2.el6.x86_64.rpm?__gda__=XXX_YYY&ext=.rpm'
wget 'https://content-web.rhn.redhat.com/rhn/public/NULL/rhn-setup/1.0.0-73.el6/noarch/rhn-setup-1.0.0-73.el6.noarch.rpm?__gda__=XXX_YYY&ext=.rpm'
wget 'https://content-web.rhn.redhat.com/rhn/public/NULL/yum/3.2.29-22.el6/noarch/yum-3.2.29-22.el6.noarch.rpm?__gda__=XXX_YYY&ext=.rpm'
wget 'https://content-web.rhn.redhat.com/rhn/public/NULL/yum-metadata-parser/1.1.2-16.el6/x86_64/yum-metadata-parser-1.1.2-16.el6.x86_64.rpm?__gda__=XXX_YYY&ext=.rpm'
wget 'https://content-web.rhn.redhat.com/rhn/public/NULL/yum-rhn-plugin/0.9.1-36.el6/noarch/yum-rhn-plugin-0.9.1-36.el6.noarch.rpm?__gda__=XXX_YYY&ext=.rpm'
wget 'https://content-web.rhn.redhat.com/rhn/public/NULL/yum-utils/1.1.30-10.el6/noarch/yum-utils-1.1.30-10.el6.noarch.rpm?__gda__=XXX_YYY&ext=.rpm'
wget https://www.redhat.com/security/fd431d51.txt
rpm --import fd431d51.txt
rpm -e --nodeps centos-release
rpm -e centos-release-cr
rpm -Uhv --force *.rpm
rpm -e yum-plugin-fastestmirror
yum clean all
rhn_register
yum upgrade

I'm expecting to use this process a few more times in the near future. It is very useful when working with a hosting provider that does not directly support RHEL, but provides CentOS, so we can get the new servers set up without needing to request a custom operating system installation that may add a day or two to the setup time.

Given the popularity of both RHEL and CentOS, it would be neat for Red Hat to provide a tool that would easily switch, at least "upgrading" from CentOS to RHEL to bring more customers into their fold, if not the other direction!

Rails Request-Based Routing Constraints in Spree

I recently adopted an unreleased ecommerce project running Spree 0.60.0 on Rails 3.0.9. The site used a Rails routing constraint and wildcard DNS to dynamically route subdomains to the “dispatch” action of the organizations_controller. If a request’s subdomain component matched the regular expression in the route's constraint, it was routed to the dispatch method. Here's the original route:

match '/' => 'organizations#dispatch', :constraints => { :subdomain => /.+/ }

The business requirement driving this feature was that a User could register an Organization by submitting a form on the site. Once that Organization was marked "approved" by an admin, that Organization would become accessible at their own subdomain - no server configuration required.

For marketing reasons, we decided to switch from subdomains to top-level subdirectories. This meant RESTful routes (e.g. domain.com/organizations/143) wouldn’t cut it. In order to handle this, I created a routing constraint class called OrgConstraint. This routing constraint class works in tandem with a tweaked version of that original route.

match '*org_url' => 'organizations#show', :constraints => OrgConstraint.new

The :constraints param takes an instance of a class (not a class name) that responds to a matches? predicate method returning true or false. If matches? returns true, the request is routed to that controller#action. Otherwise, the route is treated like any other non-matching route. Here’s the entire OrgConstraint class:

class OrgConstraint
  def matches?(request)
    Organization.valid_url? request.path_parameters[:org_url]
  end
end

Note how Rails automatically passes the request object to the matches? method. Also note how the relative URL of the request is available via the :org_url symbol - the same identifier we used in the route definition. The Organization.valid_url? class method encapsulates the logic of examining a simple cached (via Rails.cache) hash with organization URLs as keys and true as the values.
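A minimal sketch of what that class method might look like, assuming an approved scope, a url column, and a made-up cache key:

class Organization < ActiveRecord::Base
  def self.valid_url?(url)
    url_map = Rails.cache.fetch("organization_url_map") do
      approved.inject({}) { |map, org| map[org.url] = true; map }
    end
    url_map.key?(url)
  end
end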

The final step in this process is, of course, the organizations_controller’s show method. It now needs to look for that same :org_url param that the route definition creates, in the standard params hash we all know and love:

def show
  @organization = Organization.find(params[:id]) if params[:id]  
  # from routing constraint
  @organization ||= Organization.find_by_url(params[:org_url]) if params[:org_url]  
  ...
end

I should point out that Rails instantiates exactly one instance of your routing constraint class when it first loads your routes. This means you’ll want to ensure your class’s design will respond appropriately to changes in any underlying data. This is one of the reasons the Organization class caches the { org_url => true } hash rather than using instance variables within the OrgConstraint class.


 

Modifying Models in Rails Migrations

As migrations have piled up in projects that I work on, one problem seems to come up fairly consistently. New changes to models can break migrations.

This can happen a number of different ways. One way is for new model code to break old migrations. Another is for the changes to be made to the model file before the migration is run (timing issues with version control).

While these can be (and usually are) considered coordination rather than technical issues, sometimes you just need to handle them and move on.

One case I'd like to cover here is removing or changing associations. At the time the migration is expected to run, the file for the model class will have been updated already, so it is hard to use that in the migration itself, even though it would be useful.

In this case I found myself with an even slightly trickier example. I have a model that contains some address info. Part of that is an association to an external table that lists the states. So part of the class definition was like so:

class Contact < ActiveRecord::Base
  belongs_to :state
  ...
end

What I needed to do in the migration was to remove the association and introduce another field called "state" which would just be a varchar field representing the state part of the address. The two problems the migration would encounter were:

  1. The state association would not exist at the time it ran
  2. And even if it did, there would be a name conflict between it and the new column I wanted

To get around these restrictions I did this in my migration:

Contact.class_eval do
  belongs_to :orig_state,
             :class_name => "State",
             :foreign_key => "state_id"
end

This creates a different association named "orig_state" using the states table for the Contact class. I can now use my original migration code more-or-less as is, and still create a new state column.
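Putting it together, the migration could then look something like this sketch (the State#name attribute, the use of update_attribute, and dropping state_id at the end are all assumptions about this particular schema, not the exact migration used):

class ReplaceContactStateAssociation < ActiveRecord::Migration
  def self.up
    Contact.class_eval do
      belongs_to :orig_state,
                 :class_name => "State",
                 :foreign_key => "state_id"
    end

    add_column :contacts, :state, :string
    Contact.reset_column_information

    Contact.all.each do |contact|
      contact.update_attribute(:state, contact.orig_state.name) if contact.orig_state
    end

    remove_column :contacts, :state_id
  end
end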

Another problem I had was that the table had about 300 rows of data that failed one of the validations called "validate_names". I didn't feel like sorting it out, so I just added the following code to the above class_eval block:

define_method(:validate_names) do
  true
end

With these two modifications to the Contact class, I was able to use a simple migration, with all of my Rails associations, to do what I needed, without resorting to hand-crafting the more complex SQL that would have been required to avoid referring to the model classes in the migration at all.

Nifty In-Button Confirmation

I've been working on a personal email client after work, called Warm Sunrise, that forces me to keep a manageable inbox. One of the goals of the project was to get to a zero inbox every day, so I needed a 'Delete All' button that was easy to use without running the risk of accidentally deleting emails. I took a look at JavaScript's confirm, which is jarring, and jQuery's dblclick, which doesn't provide any feedback to the user after the first click, leaving the user to wonder why their emails weren't deleted.

Given these options, I built my own button using Rails 3.1, jQuery, and CoffeeScript, that better fit the goals I set out with. It requires a double click, but gives the user a confirmation in the button itself, without any sort of timeout. You can see a video of it in action here:


Starting with app/views/letters/index.html.erb, I generated the buttons using Rails helpers and Twitter's Bootstrap classes:

<%= link_to 'Write letter', new_letter_path, :class => "btn primary pull-right far-right" %>
<%= link_to 'Delete all', '#', :class => "btn pull-right no_danger", :id => "delete_all" %>
<%= link_to 'Are you sure?', delete_all_letters_path, :method => :post, :class =>"btn pull-right danger confirm", :id => "delete_all", :style => "display:none;" %>

Notice that the 'Delete all' button doesn't actually specify a URL, and the 'Are you sure?' link's style is set to "display:none".

Here's the relationship I set up in my models:

app/models/letter.rb

belongs_to :user

app/models/user.rb

has_many :letters, :dependent => :destroy

I set up config/routes.rb with a route matching the explicit path used by the 'Are you sure?' link:

post 'delete_all_letters' => 'letters#delete_all'

Finally, I finished this lot by adding the delete_all action to my app/controllers/letters_controller.rb:

def delete_all 
    current_user.letters.delete_all

    respond_to do |format|
        format.html { redirect_to letters_url, notice: 'Successfully deleted all letters.' }
        format.json { head :ok }
    end 
end 

CoffeeScript is a beautiful language that compiles to JavaScript; I prefer it to writing JavaScript directly. You can read more about it here. Let's take a look at the CoffeeScript that makes this button work:

$('a#delete_all.no_danger').hover( ->
    $(this).addClass('danger')
    $(this).click( ->
        $('a#delete_all.no_danger').hide()
        $('a#delete_all.confirm').show()
    )   
)
$('a#delete_all.no_danger').mouseleave( ->
    $(this).removeClass('danger')
)
$('a#delete_all.danger').mouseleave( ->
    $(this).hide()
    $('a#delete_all.no_danger').show()
)

Since the button's text changes to a confirmation on the first click, this works better for my purposes than jQuery's dblclick method. Check the video to see what it looks like in action.

Let's take a look at what this compiles to in plain JavaScript, too, since this is the only thing the browser sees:

$('a#delete_all.no_danger').hover(function() {
    $(this).addClass('danger');
    return $(this).click(function() {
        $('a#delete_all.no_danger').hide();
        return $('a#delete_all.confirm').show();
    });
});
$('a#delete_all.no_danger').mouseleave(function() {
    return $(this).removeClass('danger');
});
$('a#delete_all.danger').mouseleave(function() {
    $(this).hide();
    return $('a#delete_all.no_danger').show();
});

Not shown in the video, but I modified index.html.erb to only show the 'Delete all' button when the user's inbox isn't already empty.

<%= link_to 'Write letter', new_letter_path, :class => "btn primary pull-right far-right" %>
<% if !@letters.empty? %>
    <%= link_to 'Delete all', '#', :class => "btn pull-right no_danger", :id => "delete_all" %>
    <%= link_to 'Are you sure?', delete_all_letters_path, :method => :post, :class =>"btn pull-right danger confirm", :id => "delete_all", :style => "display:none;" %>
<% end %>

Sanitizing supposed UTF-8 data

As time passes, it's clear that Unicode has won the character set encoding wars, and UTF-8 is by far the most popular encoding, and the expected default. In a few more years we'll probably find discussion of different character set encodings to be arcane, relegated to "data historians" and people working with legacy systems.

But we're not there yet! There's still lots of migration to do before we can forget about everything that's not UTF-8.

Last week I again found myself converting data. This time I was taking data from a PostgreSQL database with no specified encoding (so-called "SQL_ASCII", really just raw bytes), and sending it via JSON to a remote web service. JSON uses UTF-8 by default, and that's what I needed here. Most of the source data was in either UTF-8, ISO Latin-1, or Windows-1252, but some was in non-Unicode Chinese or Japanese encodings, and some was just plain mangled.

At this point I need to remind you about one of the most unusual aspects of UTF-8: it has limited valid forms. Legacy encodings typically used all or most of the 255 code points in their 8-bit space (leaving point 0 for the traditional ASCII NUL). While UTF-8 is compatible with 7-bit ASCII, it does not allow any possible 8-bit byte in any position. See the Wikipedia summary of invalid byte sequences to know what can be considered invalid.
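For instance, a lone 0xE9 byte is a perfectly good Latin-1 "é" but is not valid UTF-8 on its own, which a strict decode will reject (a small illustrative check, not part of the conversion itself):

use Encode ();

eval { Encode::decode('UTF-8', "\xE9", Encode::FB_CROAK) };
print "not valid UTF-8\n" if $@;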

We had no need to try to fix the truly broken data, but we wanted to convert everything possible to UTF-8 and at the very least guarantee no invalid UTF-8 strings appeared in what we sent.

I previously wrote about converting a PostgreSQL database dump to UTF-8, and used the Perl CPAN module IsUTF8.

I was going to use that again, but looked around and found an even better module, exactly targeting this use case: Encoding::FixLatin, by Grant McLean. Its documentation says it "takes mixed encoding input and produces UTF-8 output" and that's exactly what it does, focusing on input with mixed UTF-8, Latin-1, and Windows-1252.
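Its basic usage is about as small as it gets; a minimal sketch based on the module's documented interface:

use Encoding::FixLatin qw(fix_latin);

my $mixed_bytes = "caf\xE9 and caf\xC3\xA9";   # Latin-1 and UTF-8 mixed together
my $utf8_text   = fix_latin($mixed_bytes);     # both come out as UTF-8 "café"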

It worked as advertised, very well. We would need to use a different module to convert some other legacy encodings, but in this case this was good enough and got the vast majority of the data right.

There's even a standalone fix_latin program designed specifically for processing Postgres pg_dump output from legacy encodings, with some nice examples of how to use it.

One gotcha is similar to a catch that David Christensen reported with the Encode module in a blog post here about a year ago: If the Perl string already has the UTF-8 flag set, Encoding::FixLatin immediately returns it, rather than trying to process it. So it's important that the incoming data be a pure byte stream, or that you otherwise turn off the UTF-8 flag, if you expect it to change anything.

Along the way I found some other CPAN modules that look useful for cases where I need more manual control than Encoding::FixLatin gives:

  • Search::Tools::UTF8 - test for and/or fix bad ASCII, Latin-1, Windows-1252, and UTF-8 strings
  • Encode::Detect - use Mozilla's universal charset detector and convert to UTF-8
  • Unicode::Tussle - ridiculously comprehensive set of Unicode tools that has to be seen to be believed

Once again Perl's thriving open source/free software community made my day!

Hurray for tracking configuration files in source control

In a number of places we've started tracking configuration files in git. It's great for Postgres configs, Apache or nginx, DNS zone files, Nagios, all kinds of things. A few clients have private offsite repos we push to, like at GitHub, but for the most part they're independent repos. It's still great for keeping track of what was changed when, and by whom.

In one case we have a centralized Nagios instance that does little more than receive passive checks from a number of remote systems. I'd set up the checks on the remote systems but hadn't loaded that configuration into the central instance yet. However, while getting the central system set up, muscle memory kicked in and I suddenly had a half-red console as it loaded in stale data.

We don't need a flood of false alerts over email, but I don't want to completely revert the config and lose all those services...

[root nagios]# git stash; service nagios restart; git stash apply
Saved working directory and index state WIP on master: 0e9113b Made up commit for blog
HEAD is now at 0e9113b Made up commit for blog
Running configuration check...done.
Stopping nagios: .done.
Starting nagios: done.
# On branch master
# (etc)

Green! A small victory, for sure, but it shows one more advantage of modern SCMs.

Preventing Global Variables in JavaScript

JavaScript's biggest problem is its dependence on global variables
--Douglas Crockford, JavaScript: The Good Parts

Recently I built out support for affiliate management into LocateExpress.com’s Sinatra app using JavaScript and YUI.

I used a working page from the admin, Service Providers, as a starting point to get something up and running for affiliates quickly. By the time I finished, the Affiliates page worked great, but forms on the Service Provider page no longer populated with data.

Identifying a misbehaving global variable

There were no errors in the console, and the forms on the Service Providers page remained broken even after restoring an old copy of service_providers.js. As it turns out, a global variable, edit_map, was defined within service_providers.js, and again in the copied affiliates.js. Credit for spotting the problem goes to Brian Miller.

The fix was as simple as moving edit_map's declaration into the file's YUI sandbox, so that version of edit_map wouldn't be visible to any other pages in the admin.

Preventing global variables

As projects grow and complexity increases, it becomes easier and easier to overlook global variables and thus run into this tough-to-debug problem. Douglas Crockford’s JavaScript: The Good Parts covers several workarounds to using global variables.

Rather than declaring variables globally, like this:

var edit_map = { 'business[name]' : 'business_name' };

the author recommends declaring them at the beginning of functions whenever possible:

YUI().use("node", "io", "json",
function(Y) {
    var edit_map = { 'business[name]' : 'business_name' };
    ...
});

In all other cases, he suggests using Global Abatement, which prevents your global variables from affecting other libraries. For example,

var LocateExpress = {};
LocateExpress.edit_map = { 'business[name]' : 'business_name' };

YUI().use("node", "io", "json",
function(Y) {
    ...
    return LocateExpress.edit_map;
});

I highly recommend JavaScript: The Good Parts to learn about the best JavaScript has to offer and workarounds for its ugly side. The author also wrote a very popular code-checker, JSLint, which could help debug this nasty problem by highlighting implicit global variables.

Using Gmail at Work

The Short Story

For those who don't care about why, just how...
  1. Create a new Gmail account
  2. Set up Mail Fetcher
  3. Set up "send mail from another account" and make it your default
  4. Verify you send and receive as your corporate address by default using the web client
  5. Set up your mobile
  6. From your mobile go to m.google.com/sync and Enable "Send Mail As" for this device (tested only on iOS)
  7. Verify you send and receive as your corporate address by default using your mobile client
  8. Set up Google Authorship with your domain's email address

The Long Story

Here at End Point there are a lot of opinions about email clients. Our hardcore folks like Alpine, while for most people Evolution, Thunderbird, or Outlook will do. As a Gmail user since September 2004, I found I needed to figure out how to get our corporate email system to work with my preferred client.

My first reaction was to have Gmail act as an IMAP client. I found (as many others had) that Gmail does not support IMAP integration with other accounts. However, Gmail does have a POP email client known as Mail Fetcher. I found that Gmail does support encrypted connections via POP, so use them if your email server supports them. Combined with HTTPS by default for the Gmail web client, this seemed sufficiently secure.

I now needed to send email not as my Gmail address, but as my End Point address. Google has well documented how to send email from another account. Again encrypted SMTP is supported and is strongly recommended. Also be sure to make your corporate email account the default account so you will always use your corporate email address and not the Gmail address.

After verifying I was sending and receiving email properly, I needed to get my mobile set up. There are a variety of options available for all the mobile platforms. On my iPhone, I had several other accounts already set up and found the native client to be acceptable. I decided I would configure the native iPhone email app to access Gmail, as well as Contacts and Calendar, using Google's support for Microsoft's ActiveSync protocol, which Google has licensed and rebranded as Google Sync.

I had used Google Sync for other Exchange accounts at my previous job and found it worked very well. However, there are some known issues, like not being able to accept event invitations received via POP. It's worth checking these issues out to see if there are any blockers for you.

After setting up "Google Sync" on my iPhone, I tested again, and found that by default it would use my Gmail account as my default outgoing email account, despite the setting in the Gmail web client. I needed to use my corporate address here at End Point for sending mail from my mobile; I thought I was sunk!

Fortunately, it seems I overlooked a section in the Google Sync setup documentation, labeled "Enable Send Mail As feature". This feature solved my problem by having me go to m.google.com/sync from my iOS device and check Enable "Send Mail As" for this device. This tells Google Sync to use the default outgoing account I had specified in the web client.

One requirement here at End Point which this configuration does not meet is support for PGP encryption/decryption of messages. There is a Chrome plugin that claims to offer support, but as the authors from this post highlight:

There may also be resistance from crypto users – who already are a security-conscious lot – to trusting private keys and confidential messages to a set of PGP functions folded inside some JavaScript running inside a browser.

I'd have to say I agree. After following the instructions to install the plugin, I balked when it asked for my private key; I just didn't feel comfortable. Despite this shortfall, most End Point email isn't encrypted end-to-end anyway. However, I can feel good knowing that my "last mile" connection to End Point's servers is encrypted, using encrypted POP, SMTP, and HTTPS.

Liquid Galaxy at Le Pavillon de l'Arsenal in Paris

Today there was an exciting opening of a new 48-screen Liquid Galaxy display at Le Pavillon de l'Arsenal in Paris. The configuration and use of this display is distinct from other Liquid Galaxies in that it consists of six columns and eight rows of 55" bezel-less displays set out on the floor to show just the city of Paris. This Liquid Galaxy replaced a physical model of Paris that was previously set up in the same space. It has four podiums with touch screens that visitors can use to navigate about Paris. The museum produced an impressive video showing the setup of this project:

End Point had the pleasure of working for and with Google on this project. Pierre Lebeau of Google spearheaded the project—at least from our point of view. Pierre's quick and clever thinking from a high-level perspective and his leadership were crucial for getting the project done on schedule. He's posted a nice blog article about the project. In addition to the Googlers on site our engineers also had the opportunity to see the talented museum staff at work and to work with JCDecaux who set up and are supporting the Planar Clarity displays. Kiel and Adam spent a couple of weeks each on the installation and customization (Adam is still there) and there was a lot of preparation beyond the on-site work that was required. So, hats off to Kiel and Adam!

Some new functionality and configuration for us that was incorporated in this setup of Liquid Galaxy included:

  • Driving four displays with each of the rack-mounted computers, rather than the one or two displays that we have been accustomed to for each computer of the system
  • Restricting the overall area of the display to just a specific region of the map, i.e., Paris in this case
  • Deploying a new web interface developed by Google for the touch screen
  • Integrating a new window manager to hide the menu bars in the displays
  • Enabling the use of multiple podiums to control the display.

While all the Liquid Galaxies that we have worked on and set up previously provided a wrap-around view, the Liquid Galaxy in Le Pavillon de l'Arsenal simply provides a large flat-panel view. A particular challenge therefore was figuring out how to display Google Earth's spherical view (necessitated by a single camera viewpoint) upon a flat display surface. With a lot of attention to detail and a reasonable amount of experimentation with various configuration parameters we organized the 48 different viewports to provide a crisp display while balancing the need for predictable user control.

My next visit to Paris will definitely include a visit to Le Pavillon de l'Arsenal!

Sunspot, Solr, Rails: Working with Results

Having worked with Sunspot and Solr in several large Rails projects now, I've gained some knowledge about working with result sets optimally. Here's a brief explanation on working with results or hits from a search object.

MVC Setup

When working with Sunspot, searchable fields are defined in the model:

class Thing < ActiveRecord::Base
  searchable do
    text :field1, :stored => true
    text :field2
    string :field3, :stored => true
    integer :field4, :multiple => true
  end
end

The code block above will include field1, field2, field3, and field4 in the search index of things. A keyword or text search on things will search field1 and field2 for matches. field3 and field4 may be used for scoping, or limiting the search result set based on specific values of field3 or field4.

In your controller, a new search object is created with the appropriate scoping and keyword values, shown below. Pagination is also added inside the search block.

class ThingsController < ApplicationController
  def index
    @search = Sunspot.search(Thing) do
      #fulltext search
      fulltext params[:keyword]

      #scoping
      if params.has_key?(:field3)
        with :field3, params[:field3]
      end 
      if params.has_key?(:field4)
        with :field4, params[:field4]
      end

      paginate :page => params[:page], :per_page => 25
    end
    @search.execute!
  end
end

In the view, one can iterate through the result set, where results is an array of Thing instances.

<% @search.results.each do |result| -%>
<h2><%= result.field3 %></h2>
<%= result.field1 %>
<% end -%>

Working with Hits

The above code works. It works nicely as long as you don't display many results on one page and instantiating things is not expensive. But the above code will call the query below for every search, and subsequently instantiate Ruby objects for each of the things found. This can become sluggish when the result set is large or the items themselves are expensive to instantiate.

# development.log
Thing Load (0.9ms)  SELECT "things".* FROM "things" WHERE "things"."id" IN (6, 12, 7, 13, 8, ...)

An optimized way to work with search result sets is to work directly with hits. @search.hits is an array of Sunspot::Search::Hit objects, each representing the raw information returned by Solr for a single item. Hit objects provide access to stored field values, identified by the :stored option in the model's searchable definition. The model definition looks the same. The controller may now look like this:

class ThingsController < ApplicationController
  def index
    search = Sunspot.search(Thing) do
      #fulltext search
      fulltext params[:keyword]

      #scoping
      if params.has_key?(:field3)
        with :field3, params[:field3]
      end 
      if params.has_key?(:field4)
        with :field4, params[:field4]
      end
    end
    search.execute!

    @hits = search.hits.paginate :page => params[:page], :per_page => 25
  end
end

And working with the data in the view may look like this:

<% @hits.each do |hit| -%>
<h2><%= hit.stored(:field3) %></h2>
<%= hit.stored(:field1) %>
<% end -%>

In some cases, you may want to introduce an additional piece of logic prior to pagination, which is the case with the most recent Rails application I've been working on:

    ...
    search.execute!

    filtered_results = []

    search.hits.each do |hit|
      if hit.stored(:field3) == "some arbitrary value"
        filtered_results << hit
      elsif hit.stored(:field1) == "some other arbitrary value"
        filtered_results << hit
      end
    end
   
    @hits = filtered_results.paginate :page => params[:page], :per_page => 25

Sunspot and Solr are rich with functionality and features that can add value to a Rails application, but it's important to identify areas of the application where database calls can be minimized and lazy loading can be optimized for better performance. The standard log file and database log file are good places to start looking.

Christmas Tree Commerce in 2011

Took a bit of a break today to get one of those perennial activities out of the way: the great Christmas tree shop. Much hasn't changed about this time-honored tradition: don the hats and gloves (well, at least until global warming takes over), pile the family in the car, and hit the ATM to get a bundle of cash to pass "under the table." Not so fast; this is 2011 and Christmas tree lots aren't what they used to be.

Rest assured, much of the experience hasn't changed: you still get to wade up and down aisles of freshly cut firs, trying to select just the right balance of fat vs. thin, tall vs. short, density vs. ornament hanging potential, and there is still some haggling over price (if you are lucky) and the inevitable chainsawing, bundling, and twining to the top of the old station wagon (well, SUV). But today did have a big difference, and one that our e-commerce clients, and more so our bricks and mortar clients, should be particularly mindful of: the "cash box" with the flip-up lid and stacks of tens and twenties had been replaced by an iPad with a card reader. This Christmas tree lot has gone high tech, all the way. The iPad totaled the order and, with the card reader attached, took my payment, allowed me to sign the screen with a finger, and e-mailed me my receipt.

As much as I appreciated the simplicity and convenience of paying, it's a tough argument to make that it is that much better from the consumer side (paying in cash was pretty simple before too); the real win here is for the vendor. This particular vendor (not exactly Apple) has eight tree lots around town, and what the iPad has done for this little, short-lived merchant is provide real time inventory tracking, supply management, resource management, and the underpinnings of customer relationship management. In an instant they are able to see which lots are having the most foot traffic, which lots trend at which times, which are running low on a particular sort of tree, and make adjustments to stock and resourcing accordingly. I will be quite surprised if I don't receive an e-mail from them next year reminding me just where their lot is located, and that it is time to buy the centerpiece of holiday decorations.

Of course next year it will be 2012, and credit card mag strips are really so 2011, so if I need to bring anything other than my cellphone I'll be just a little disappointed. (Did I mention they provide delivery? I guess in case you drive a smart car.)

@Happy_Holidays!

Running Integration Tests in Webkit Without a Browser

As your Ruby web applications increase in UI complexity, they get harder to test using the standard Cucumber or even RSpec integration test suites. This is because of the introduction of JavaScript into your UI. You've probably been there before. Here's the use case: you've written your integration tests for your Rails app and up to this point, you've been able to get away with not tagging your cucumber scenarios with the "@javascript" tag. Everything is going smoothly and then it's time to implement that one UI feature that is going to require an Ajax call or some JavaScript to hide or unhide a crucial piece of the user experience. This must be included in your integration tests.

So you go through the pain of setting up cucumber to work with selenium and tag your scenario as javascript so that it will run the test in the browser. At first, there's this thrill of excitement as you get to see Firefox load, and then run through a series of steps, executing your tests and then seeing them pass. Job done.

But maybe there's a different scenario at play here. What if you don't do your development in an environment that has a browser? At End Point, we are strong advocates of doing development on the same environment that your app is going to run on. It eliminates unexpected issues down the road. We believe in it so much, actually, that we've created DevCamps, which lets you set up development environments on a server.

Obviously, your selenium based tests are not going to work here without some work to get it to run headless.

The good folks at thoughtbot have come up with a great solution to this, and it is called capybara-webkit. capybara-webkit assumes that you are using Capybara for your testing framework. If you are using webrat, the transition is fairly smooth. You'll probably only need to change a few minor details in your tests.

What capybara-webkit does for you is enable you to run your tests inside of webkit. This will simulate an environment that will be very close to what you would see in Google Chrome or Safari as well as many mobile browsers. I've found that except for some edge cases, it covers Firefox and IE as well.

To install capybara-webkit you will need to install the Qt development toolkit. It's fairly straightforward, so I'll just refer you to the GitHub wiki page for instructions for the various platforms. In Ubuntu, I just ran the following:

sudo apt-get install libqt4-dev

If you are installing on a server environment, you'll also need to install Xvfb. You can do that in Ubuntu with the following command:

sudo apt-get install xvfb

It's a little outside the scope of this blog post to go into the other little things you need to set up with Xvfb. The important thing is that you set it up to run on display 99. Another important note is that you don't have to set it up to run on boot. We will be starting it up when we run our tests if it isn't running.

The next step is to configure your cucumber tests to use the capybara-webkit driver. To do that, add

gem "capybara-webkit"

to your Gemfile in the development and test group. Then in your env.rb file for cucumber add the following line:

Capybara.javascript_driver = :webkit

In some cases, I've found it helpful to also specify a server port and app_host as follows:

Capybara.server_port = '8000'
Capybara.app_host = 'http://localhost:8000'

Now your tests are setup to run in webkit. The final step is running the tests. To do this, you'll need to run them from within xvfb. You can do that with the following command:

xvfb-run bundle exec cucumber

I've created an alias for this and dropped it in my .bashrc file. Here's my entry, but you can set it up anyway you'd like.

alias xcuke="xvfb-run bundle exec cucumber"

Now running tests is as simple as running xcuke from the root of my Rails application.

There are a few big benefits to running capybara-webkit. First is speed. In my experience tests run much faster than they do in Selenium. Second, all JavaScript errors are dumped to STDOUT so you can see them in the output of your cucumber tests. Third, all of your tests are run in WebKit instead of Rack, so you get a test environment that acts more like a real browser.

Thanks to the guys at thoughtbot for putting together this awesome gem.

Semaphore limits and many Apache instances on Linux

On some of our development servers, we run many instances of the Apache httpd web server on the same system. By "many", I mean 30 or more separate Apache instances, each with its own configuration file and child processes. This is not unusual on DevCamps setups with many developers working on many projects on the same server at the same time, each project having a complete software stack nearly identical to production.

On Red Hat Enterprise Linux 5, with somewhere in the range of 30 to 40 Apache instances on a server, you can run into failures at startup time with this error or another similar one in the error log:

[error] (28)No space left on device: Cannot create SSLMutex

The exact error will depend on what Apache modules you are running. The "space left on device" error does not mean you've run out of disk space or free inodes on your filesystem, but that you have run out of SysV IPC semaphores.

You can see what your limits are like this:

# cat /proc/sys/kernel/sem
250 32000 32 128
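Those four values are, in order, SEMMSL (maximum semaphores per semaphore set), SEMMNS (maximum semaphores system-wide), SEMOPM (maximum operations per semop call), and SEMMNI (maximum number of semaphore sets). You can also inspect the limits and the semaphore sets currently allocated with ipcs:

# ipcs -ls
# ipcs -s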

I typically double those limits by adding this line to /etc/sysctl.conf:

kernel.sem = 500 64000 64 256

That makes sure you'll get the change at the next boot. To make the change take immediate effect:

# sysctl -p

With those limits I've run 100 Apache instances on the same server.

Working with constants in Ruby

Ruby is designed to put complete power into the programmer's hands and with great power comes great responsibility! This includes the responsibility for freezing constants. Here's an example of what someone might THINK is happening by default with a constant.

class Foo
  DEFAULTS = [:a, :b]
end

#irb
default = Foo::DEFAULTS
default << :c
Foo::DEFAULTS #=> [:a, :b, :c]  WHOOPS!

As you can see, assigning a new variable from a constant lets you modify what you thought was a constant! Needless to say, such an assumption would be very difficult to track down in a real application. Let's see how we might improve on this design. First, let's freeze our constant.

class Foo
  DEFAULTS = [:a, :b].freeze
end

#irb
default = Foo::DEFAULTS
default << :c #=>  ERROR can't modify frozen array

Now we'll get very specific feedback about offending code. The question is: how can we use our constant as a starting point for an array, and still be able to modify it later? Let's look at some more code.

Foo::DEFAULTS.frozen? #=> true
Foo::DEFAULTS.clone.frozen? #=> true, this was my first guess, but it turns out we need...
Foo::DEFAULTS.dup.frozen? #=> false

It's worth reading the docs on clone and dup to understand their differences, but in short, clone replicates the internal state of the object (including its frozen state) while dup creates a new instance of the object. There was one more question I needed to answer: what would happen when I appended another frozen array to a non-frozen array? Let's look to the code again!

default = Foo::DEFAULTS.dup  #not frozen
new_default = default + [:c].freeze
new_default.frozen? # false

So it seems the receiver's initial state determines the frozen state of the result, allowing you to append frozen arrays without having to dup them. The moral of the story here is don't make assumptions about Ruby! One of the best ways to challenge your assumptions is with unit tests.

Performing Bulk Edits in Rails: Part 2

This is the second article in the series on how to perform a bulk edit in Rails. Let's recap our user's story from Part 1.

  • User makes a selection of records and clicks "Bulk Edit" button
  • User works with the same form they would use for a regular edit, plus
    • check boxes are added by each attribute to allow the user to indicate this variable should be affected by the bulk edit
    • only attributes which are the same among selected records should be populated in the form

Part 1 addressed the first part of our user story. Now that we have our user's selection, we need to create an interface to allow them to select attributes affected by the bulk edit. Let's start with the form we'll use to POST our input.

# app/controllers/bulk_edits_controller.rb

def new
  @foos = Foo.find(params[:stored_file_ids]) #params collected by work done in Part 1
  @foo = Foo.new
end


# app/views/bulk_edit/new.html.erb

<%= form_for @foo, :url => "/bulk_edits" do |f| %>
  <% @foos.each do |foo| %>
    <%= hidden_field_tag "foo_ids[]", foo.id %>
  <% end %>
  <%= render "foos/form", :f => f %>
  <%= f.submit %>
<% end %>

Let's first look at how we formed our form_for tag. Although this is a form for a Foo object, we don't want to POST to foos_controller#create, so we add :url => "/bulk_edits", which will POST to bulk_edits_controller#create. Additionally, we need to send along the foo_ids we eventually want to bulk update. Finally, we don't want to re-create the form we already have for Foo. By modifying one master form, we'll make long term maintenance easier. Now that we've got our form posting to the right place, let's see what modifications we'll need to make to our standard form to allow the user to highlight attributes they want to modify.

# app/views/foos/_form.html.erb

<%= check_box_tag "bulk_edit[]", :bar %>
<%= f.label :bar %>
<%= f.text_field :bar %>


Bulk edit check boxes appear in front of field names to let users know which fields will be modified.

We've added another check_box_tag to the form to record which attributes the user will select for bulk updating. However, we only want to display this when we're doing a bulk edit. Let's tweak this a bit further.

# app/views/foos/_form.html.erb

<%= bulk_edit_tag :bar %>
<%= f.label :bar %>
<%= f.text_field :bar %>

# app/helpers/foos_helper.rb

def bulk_edit_tag(attr)
  check_box_tag("bulk_edit[]", attr) if bulk_edit?
end

def bulk_edit?
  params[:controller] == "bulk_edits"
end

With these modifications to the form in place, the user can now specify which fields are eligible for bulk editing. Now we need the logic to determine how to populate the bar attribute based on the user's selection. This way, the user will see that an attribute is the same across all selected items. Let's revise our bulk edit controller.

# app/controllers/bulk_edit_controller.rb

def new
  @foos = Foo.find(params[:foo_ids])
  matching_attributes = Foo.matching_attributes_from(@foos)
  @foo = Foo.new(matching_attributes)
end


# app/models/foo.rb

def self.matching_attributes_from(foos)

  matching = {}
  attributes_to_match = Foo.new.attribute_names  #see attribute_names for more details

  foos.each do |foo|

    attributes_to_match.each do |attribute|

      value = foo.__send__(attribute)  #see send, invokes the method identified by symbol, use underscore version to avoid namespace issues

      if matching[attribute].nil?
        matching[attribute] = value  #assume it's a match

      elsif matching[attribute] != value
        matching[attribute] = "" #on the first mismatch, empty the value, but don't make it nil

      end

    end

  end

  matching  #return the hash of attributes shared by all selected records
end



Only fields which are the same across all selected records will be populated. Other fields will be left blank by default.


With Foo.matching_attributes_from generating a hash of matching attributes, the form will only populate fields which match across all of the user's selected items. With our form in place, the last step is to actually perform the bulk edit.

# app/controllers/bulk_edits_controller.rb
def create
  if params.has_key? :bulk_edit

    foos = Foo.find(params[:foo_ids])
    foos.each do |foo|

      eligible_params = {}
      params[:bulk_edit].each do |eligible_attr|

        #create hash of eligible attributes and the user's values
        eligible_params.merge!(eligible_attr => params[:foo][eligible_attr])

      end

      #update each record, but only with eligible attributes
      foo.update_attributes(eligible_params)

    end
  end
end

We've now completed the entire user story. Users are able to use check boxes to identify which attributes should be bulk updated. They also get to see which attributes match across their selection. Things are, of course, always more involved in a real production application. Keep in mind this example does not make good use of mass assignment protection using attr_accessible, nor does it force an empty whitelist of attributes by setting config.active_record.whitelist_attributes = true. This is a best practice that should be implemented any time you need server-side validation of your forms.

Additionally, there may be cases where you want to perform bulk edits of more complex attributes, such as nested attributes. Consider appending your additional attributes to the Foo.new.attribute_names array and then tweaking the eligible attributes logic. Also consider implementing a maximum number of records which can be bulk edited at a time; you wouldn't want your server to time out. Good luck!