
Character encoding in Perl: decode_utf8() vs decode('utf8')

When doing some recent encoding-related work in Perl, I found myself in a situation which seemed fairly inexplicable. I had a function which took data encoded as UTF-8, ran Encode::decode_utf8() on that data to convert it to Perl's internal character format, then converted the "wide" characters to numeric entities using HTML::Entities::encode_entities_numeric(). Logging/printing of the data on input confirmed that it was properly formatted UTF-8, as did running `iconv -f utf8 -t utf8 output.log >/dev/null` for review purposes (iconv reports an error if its input is not valid UTF-8).

However, when I ended up processing the data, it was as if I had not run the decode function at all. In this case, the character in question was € (Unicode code point U+20AC). The expected behavior from encode_entities_numeric() would be to turn any of the high-bit characters in the Perl string (i.e., all Unicode code points > 0x80) into the corresponding numeric entity (&#x20AC; for € in this case). However, instead of that single character's numeric entity appearing in the output, the entities which appeared were &#xE2;&#x82;&#xAC;, i.e., the three raw UTF-8 octets for €, with each octet being treated as an independent character instead of as part of the whole encoded sequence.

What was particularly confusing was that extracting the relevant parts from the script in question resulted in the expected answer, so it was clearly not an issue of HTML::Entities not being able to deal with Unicode characters, as this code snippet demonstrates:

$ perl -MHTML::Entities+encode_entities_numeric -MEncode -e '$c=qq{\xE2\x82\xAC}; print encode_entities_numeric(decode_utf8($c))'
--> &#x20AC;

In the actual, non-extracted version of the code, I was scratching my head. This was exhibiting the signs of doubly-encoded data, however I couldn't see how that could be the case. There were no PerlIO layers (e.g., :utf8 or :encoding) at play, and the data I was outputting to a log file for verification purposes was being written via a brand new filehandle from a bare open(). I verified in multiple ways that the raw octets being passed in to the function were not doubly-encoded (printing the raw code points, counting the lengths of the runs of octets and verifying that these matched the length of the UTF-8 encoded value for the represented characters, etc.). The more things I tried, the more puzzled I got. Finally, I changed the Encode::decode_utf8() call to an Encode::decode('utf8') call, providing the encoding explicitly. At this point, the processing pipeline started working as expected, and high-bit characters were being output as their full numeric entities.

Since the documentation for decode_utf8 indicated that it should be identical to decode('utf8'), I went to the Encode source to find out why it worked with the version that specified the encoding explicitly. I found that decode_utf8() does one additional thing that the regular decode('utf8') does not: before processing via the regular decode() function, decode_utf8 first checks the UTF-8 flag of the data being passed in, and if it is set, returns the data without further decoding*. My best guess is that this is to prevent errors if someone attempts to decode UTF-8 data in a string which is already in Perl's internal format, providing a caller-friendly interface that will DWYM in most expected cases.
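Based on that reading of the Encode source, the early-return behavior can be sketched roughly like this (a simplified illustration of the logic described above, not the actual Encode code):

use Encode ();

# Rough approximation of what decode_utf8() was doing: if the scalar
# already carries the UTF-8 flag, hand it back untouched; otherwise
# decode the octets from UTF-8 into Perl's internal character format.
sub decode_utf8_approximation {
    my ($octets, $check) = @_;
    return $octets if Encode::is_utf8($octets);   # early return on flagged data
    return Encode::decode('utf8', $octets, $check || 0);
}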

Armed with this knowledge, I verified that, for some reason, the data being passed into the function already had the UTF-8 flag set, so using the explicit decode('utf8') in lieu of decode_utf8() fixed the issue for me. (Tracking down why the UTF-8 flag was set on this data was out of scope for this exercise, but is the true fix.) And just to verify that this was in fact the cause of the issue at hand, here's our example, modified slightly (we use the utf8::upgrade function to turn the UTF-8 flag on for the data, so that decode_utf8 sees already-flagged data rather than raw octets):

$ perl -l -MHTML::Entities+encode_entities_numeric -MEncode -Mutf8 -e '$c=qq{\xE2\x82\xAC}; utf8::upgrade($c); print encode_entities_numeric(decode_utf8($c))'
--> &#xE2;&#x82;&#xAC;

* The UTF-8 flag is more or less an implementation detail of how Perl is able to deal with legacy 8-bit binary data in no particular encoding (i.e., raw octets, which it treats as Latin-1) as well as the full range of Unicode data, handling both efficiently and in a backwards-compatible manner.

SearchToolbar and dropped Interchange sessions

A new update to Interchange's robots.cfg can be found here. This update adds "SearchToolbar" to the NotRobotUA directive, which is used to exclude certain user agent strings when determining whether an incoming request is from a search engine robot or not. The SearchToolbar add-on for IE and Firefox is being used more widely, and we have received reports that users of this add-on are unable to add items to their cart, check out, etc. You may remember a similar issue with the Ask.com toolbar that we discussed in this post. If you are using Interchange, you should download the latest robots.cfg and restart Interchange.

Using "diff" and "git" to locate original revision/source of externally modified files

I recently ran into an issue where I had a source file of unknown version which had been substantially modified from its original form, and I wanted to find the version of the originating software it had originally come from in order to compare the changes. This file could have come from any of the 100 tagged releases in the repository, so obviously a hand-review approach was out of the question. While there were certainly clues in the source file (e.g., copyright dates to narrow down the range of commits to review), I thought up and used the following technique.

Here are our considerations:

  • We know that the number of changes to the original file is likely small compared to the size of the file overall.
  • Since we're trying to uncover a likely match for review purposes, exactness is not required; i.e., if there are lines in common with future releases, we're interested in the changes, so a revision with the fewest changes is preferred over finding the *exact* version of the file that this was originally based on.

The basic thought, then, is that we want to take the content of the unversioned file (i.e., the file that was changed) and find the revision of the corresponding file in the repository with the least number of changes, which we'll measure as the count of the lines in the source code diff. This struck me as similar to the copy detection that git does, insofar as it can detect content that is similar to some source content with a certain amount of tolerance for changes from the base. The difference in this case is that we're comparing content across a number of refs rather than across all of the blobs in a single ref. This recipe distilled down to the following bash command:

for ref in $(git tag);
do
    echo -n "$ref ";
    diff -w <(git show $ref:/path/to/versioned/file 2>/dev/null) modified_file | wc -l;
done | sort -k2 -n

The result of running this command is a list of the tags in the repository ordered by how similar their version of the file is to the target content (most similar first). A few comments:

  • We iterate through all tags in the project; while there could indeed be changes to the relevant file in intermediate versions, due to the way the release worked it's likely the original file was based on a released (aka tagged) version.
  • We're using diff's -w option, as the content may have changed spaces to tabs or vice versa, depending on the editor/editing habits of the original user. This helps us ensure that the changes that we're focusing on are the ones that change something substantial.
  • We're doing a numeric sort so the lines with the least number of changes show up at the top.
  • For the specific case I used this technique with, several revisions were tied for the fewest changed lines. Upon reviewing this smaller set of revisions (using the git diff rev1 rev2 -- path/to/content syntax), it turned out that the file in question had remained unchanged in each of these revisions, so any one of them was useful for my purposes.
  • The flexibility in the version detection works in this case because this was an isolated part of the system that did not have any changes or dependencies. If there had been important changes to the system as a whole independent of the changes to this file (but which had an effect on the operation of this specific part), we would need a more exact method of identifying the file.

Dissecting a Rails 3 Spree Extension Upgrade

A while back, I wrote about the release of Spree 0.30.* here and here. I didn't describe extension development in depth because I hadn't developed any Rails 3 extensions of substance for End Point's clients. Last month, I worked on an advanced reporting extension for a client running on Spree 0.10.2. I spent some time upgrading this extension to be compatible with Rails 3 because I expect the client to move in the direction of Rails 3 and because I wanted the extension to be available to the community since Spree's reporting is fairly lightweight.

Just a quick rundown on what the extension does: It provides incremental reports such as revenue, units sold, and profit (calculated as sales minus cost) in daily, weekly, monthly, quarterly, and yearly increments. It reports geodata to show revenue, units sold, and profit by [US] states and countries. There are also two special reports that show top products and customers. The extension allows administrators to limit results by order date, "store" (for Spree's multi-site architecture), product, and taxon. Finally, the extension provides the ability to export data in PDF or CSV format using the Ruport gem. One thing to note is that this extension does not include new models – this matters only because Rails 3 introduced significant changes to ActiveRecord, which are not described in this article.

Screenshots of the Spree Advanced Reporting extension.

To deconstruct the upgrade, I examined a git diff of the master and rails3 branch. I've divided the topics into Rails 3 and Spree specific categories.

Rails 3 Specific

SafeBuffers

In my report extension, I utilize Ruport's to_html method, which converts a Ruport table object to an HTML table. With the upgrade to Rails 3, Ruport's to_html was spitting out escaped HTML because of the new default XSS protection. The change is described here, and addressing it required using the raw helper to output unescaped HTML:

diff --git a/app/views/admin/reports/top_base.html.erb b/app/views/admin/reports/top_base.html.erb
index 6cc6b70..92f2118 100644
--- a/app/views/admin/reports/top_base.html.erb
+++ b/app/views/admin/reports/top_base.html.erb
@@ -1,4 +1,4 @@
-<%= @report.ruportdata.to_html %>
+<%= raw @report.ruportdata.to_html %>

Common Deprecation Messages

While troubleshooting the upgrade, I came across the following warning:

DEPRECATION WARNING: Using #request_uri is deprecated. Use fullpath instead. (called from ...)

I made several changes to address the deprecation warnings, did a full round of testing, and moved on.

diff --git a/app/views/admin/reports/_advanced_report_criteria.html.erb b/app/views/admin/reports/_advanced_report_criteria.html.erb
index ba69a2e..6d9c3f9 100644
--- a/app/views/admin/reports/_advanced_report_criteria.html.erb
+++ b/app/views/admin/reports/_advanced_report_criteria.html.erb
@@ -1,11 +1,11 @@
 <% @reports.each do |key, value| %>
-  <option <%= request.request_uri == "/admin/reports/#{key}" ? 'selected="selected" ' : '' %>
-  value="<%= send("#{key}_admin_reports_url".to_sym) %>">
+  <option <%= request.fullpath == "/admin/reports/#{key}" ? 'selected="selected" ' : '' %>
+  value="<%= send("admin_reports_#{key}_url".to_sym) %>">
     <%= t(value[:name].downcase.gsub(" ","_")) %>
   

Integrating Gems

An exciting change in Rails 3 is the advancement of Rails::Engines to allow easier inclusion of mini-applications inside the main application. In an ecommerce platform, it makes sense to break up the system components into Rails Engines. Extensions become gems in Spree, and gems can be released through rubygems.org. A gemspec, shown below, is required for my extension to be treated as a gem by the main application. Componentizing elements of a larger platform into gems may become popular with the advancement of Rails::Engines / Railties.

diff --git a/advanced_reporting.gemspec b/advanced_reporting.gemspec
new file mode 100644
index 0000000..71f00a8
--- /dev/null
+++ b/advanced_reporting.gemspec
@@ -0,0 +1,22 @@
+Gem::Specification.new do |s|
+  s.platform    = Gem::Platform::RUBY
+  s.name        = 'advanced_reporting'
+  s.version     = '2.0.0'
+  s.summary     = 'Advanced Reporting for Spree'
+  s.homepage    = 'http://www.endpoint.com'
+  s.author = "Steph Skardal"
+  s.email = "steph@endpoint.com"
+  s.required_ruby_version = '>= 1.8.7'
+
+  s.files        = Dir['CHANGELOG', 'README.md', 'LICENSE', 'lib/**/*', 'app/**/*']
+  s.require_path = 'lib'
+  s.requirements << 'none'
+
+  s.has_rdoc = true
+
+  s.add_dependency('spree_core', '>= 0.30.1')
+  s.add_dependency('ruport')
+  s.add_dependency('ruport-util') #, :lib => 'ruport/util')
+end

Routing Updates

With the release of Rails 3 came a major rewrite of the router and the integration of rack-mount. The Rails 3 release notes on Action Dispatch provide a good starting point for resources. In the case of my extension, I rewrote the contents of config/routes.rb:

Before
map.namespace :admin do |admin|
  admin.resources :reports, :collection => {
    :sales_total => :get,
    :revenue   => :get,
    :units   => :get,
    :profit   => :get,
    :count   => :get,
    :top_products  => :get,
    :top_customers  => :get,
    :geo_revenue  => :get,
    :geo_units  => :get,
    :geo_profit  => :get,
  }
  map.admin "/admin",
    :controller => 'admin/advanced_report_overview',
    :action => 'index'
end
After
Rails.application.routes.draw do
  #namespace :admin do
  #  resources :reports, :only => [:index, :show] do
  #    collection do
  #      get :sales_total
  #    end
  #  end
  #end
  match '/admin/reports/revenue' => 'admin/reports#revenue', :via => [:get, :post]
  match '/admin/reports/count' => 'admin/reports#count', :via => [:get, :post]
  match '/admin/reports/units' => 'admin/reports#units', :via => [:get, :post]
  match '/admin/reports/profit' => 'admin/reports#profit', :via => [:get, :post]
  match '/admin/reports/top_customers' => 'admin/reports#top_customers', :via => [:get, :post]
  match '/admin/reports/top_products' => 'admin/reports#top_products', :via => [:get, :post]
  match '/admin/reports/geo_revenue' => 'admin/reports#geo_revenue', :via => [:get, :post]
  match '/admin/reports/geo_units' => 'admin/reports#geo_units', :via => [:get, :post]
  match '/admin/reports/geo_profit' => 'admin/reports#geo_profit', :via => [:get, :post]
  match "/admin" => "admin/advanced_report_overview#index", :as => :admin 
end

Spree Specific

Transition extension to Engine

The biggest change in the transition to Rails 3 based Spree is that extensions become Rails Engines. In Spree 0.11.*, the extension class inherits from Spree::Extension, and the path of activation for extensions starts in initializer.rb, where the ExtensionLoader is called to load and activate all extensions. In Spree 0.30.*, extensions inherit from Rails::Engine, which is a subclass of Rails::Railtie. Making an extension a Rails::Engine allows it to hook into all parts of the Rails initialization process and interact with the application object. A Rails engine allows you to run a mini application inside the main application, which is at the core of what a Spree extension is – a self-contained Rails application that is included in the main ecommerce application to introduce new features or override core behavior.

See the diffs between versions here:

Before
diff --git a/advanced_reporting_extension.rb b/advanced_reporting_extension.rb
deleted file mode 100644
index f75d967..0000000
--- a/advanced_reporting_extension.rb
+++ /dev/null
@@ -1,46 +0,0 @@
-# Uncomment this if you reference any of your controllers in activate
-# require_dependency 'application'
-
-class AdvancedReportingExtension < Spree::Extension
-  version "1.0"
-  description "Advanced Reporting"
-  url "http://www.endpoint.com/"
-
-  def self.require_gems(config)
-    config.gem "ruport"
-    config.gem "ruport-util", :lib => 'ruport/util'
-  end
-
-  def activate
-    Admin::ReportsController.send(:include, AdvancedReporting::ReportsController)
-    Admin::ReportsController::AVAILABLE_REPORTS.merge(AdvancedReporting::ReportsController::ADVANCED_REPORTS)
-
-    Ruport::Formatter::HTML.class_eval do
-      # Override some Ruport functionality
-    end
-  end
-end
After
diff --git a/lib/advanced_reporting.rb b/lib/advanced_reporting.rb
new file mode 100644
index 0000000..4e6fee6
--- /dev/null
+++ b/lib/advanced_reporting.rb
@@ -0,0 +1,50 @@
+require 'spree_core'
+require 'advanced_reporting_hooks'
+require "ruport"
+require "ruport/util"
+
+module AdvancedReporting
+  class Engine < Rails::Engine
+    config.autoload_paths += %W(#{config.root}/lib)
+
+    def self.activate
+      #Dir.glob(File.join(File.dirname(__FILE__), "../app/**/*_decorator*.rb")) do |c|
+      #  Rails.env.production? ? require(c) : load(c)
+      #end
+
+      Admin::ReportsController.send(:include, Admin::ReportsControllerDecorator)
+      Admin::ReportsController::AVAILABLE_REPORTS.merge(Admin::ReportsControllerDecorator::ADVANCED_REPORTS)
+
+      Ruport::Formatter::HTML.class_eval do
+        # Override some Ruport functionality
+      end
+    end
+
+    config.to_prepare &method(:activate).to_proc
+  end
+end

Required Rake Tasks

Rails Engines in Rails 3.1 will allow migrations and public assets to be accessed from engine subdirectories, but in the meantime a work-around is required to get an engine's migrations and assets into the main application directory. There are a few options for handling Engine migrations and assets; Spree recommends a couple of rake tasks that copy them to the application root, shown here:

diff --git a/lib/tasks/install.rake b/lib/tasks/install.rake
new file mode 100644
index 0000000..c878a04
--- /dev/null
+++ b/lib/tasks/install.rake
@@ -0,0 +1,26 @@
+namespace :advanced_reporting do
+  desc "Copies all migrations and assets (NOTE: This will be obsolete with Rails 3.1)"
+  task :install do
+    Rake::Task['advanced_reporting:install:migrations'].invoke
+    Rake::Task['advanced_reporting:install:assets'].invoke
+  end
+
+  namespace :install do
+    desc "Copies all migrations (NOTE: This will be obsolete with Rails 3.1)"
+    task :migrations do
+      source = File.join(File.dirname(__FILE__), '..', '..', 'db')
+      destination = File.join(Rails.root, 'db')
+      puts "INFO: Mirroring assets from #{source} to #{destination}"
+      Spree::FileUtilz.mirror_files(source, destination)
+    end
+
+    desc "Copies all assets (NOTE: This will be obsolete with Rails 3.1)"
+    task :assets do
+      source = File.join(File.dirname(__FILE__), '..', '..', 'public')
+      destination = File.join(Rails.root, 'public')
+      puts "INFO: Mirroring assets from #{source} to #{destination}"
+      Spree::FileUtilz.mirror_files(source, destination)
+    end
+  end
+end

Relocation of hooks file

A minor change with the extension upgrade is the relocation of the hooks file. Spree hooks allow you to interact with core Spree views, described in more depth here and here.

diff --git a/advanced_reporting_hooks.rb b/advanced_reporting_hooks.rb
deleted file mode 100644
index fcb5ab5..0000000
--- a/advanced_reporting_hooks.rb
+++ /dev/null
@@ -1,43 +0,0 @@
-class AdvancedReportingHooks < Spree::ThemeSupport::HookListener
-  # custom hooks go here
-end

diff --git a/lib/advanced_reporting_hooks.rb b/lib/advanced_reporting_hooks.rb
new file mode 100644
index 0000000..cca155e
--- /dev/null
+++ b/lib/advanced_reporting_hooks.rb
@@ -0,0 +1,3 @@
+class AdvancedReportingHooks < Spree::ThemeSupport::HookListener
+  # custom hooks go here
+end

Adoption of "Decorator" naming convention

A common behavior in Spree extensions is to override or extend core controllers and models. With the upgrade, Spree adopts the "decorator" naming convention:

Dir.glob(File.join(File.dirname(__FILE__), "../app/**/*_decorator*.rb")) do |c|
  Rails.env.production? ? require(c) : load(c)
end
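For reference, a decorator file picked up by that glob might look something like this (a hypothetical example following the naming convention, not code from this extension):

# app/controllers/admin/reports_controller_decorator.rb (hypothetical)
Admin::ReportsController.class_eval do
  # add or override behavior on the core Spree controller here
  def my_custom_report
    # ...
  end
end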

I prefer to extend the controllers and models with module includes, but the decorator convention also works nicely.

Gem Dependency Updates

With the transition to Rails 3, I found that there were changes related to dependency upgrades. Spree 0.11.* uses searchlogic 2.3.5, and Spree 0.30.1 uses searchlogic 3.0.0.*. Searchlogic is the gem that performs the order search in my report, pulling orders within a certain time frame or tied to a specific store. I didn't dig into the searchlogic upgrade changes themselves, but I referenced Spree core's use of searchlogic to determine the required updates:

diff --git a/lib/advanced_report.rb b/lib/advanced_report.rb
@@ -13,11 +15,26 @@ class AdvancedReport
     self.params = params
     self.data = {}
     self.ruportdata = {}
+
+    params[:search] ||= {}
+    if params[:search][:created_at_greater_than].blank?
+      params[:search][:created_at_greater_than] =
+        Order.first(:order => :completed_at).completed_at.to_date.beginning_of_day
+    else
+      params[:search][:created_at_greater_than] =
+        Time.zone.parse(params[:search][:created_at_greater_than]).beginning_of_day rescue ""
+    end
+    if params[:search][:created_at_less_than].blank?
+      params[:search][:created_at_less_than] =
+        Order.last(:order => :completed_at).completed_at.to_date.end_of_day
+    else
+      params[:search][:created_at_less_than] =
+        Time.zone.parse(params[:search][:created_at_less_than]).end_of_day rescue ""
+    end
+
+    params[:search][:completed_at_not_null] ||= "1"
+    if params[:search].delete(:completed_at_not_null) == "1"
+      params[:search][:completed_at_not_null] = true
+    end
     search = Order.searchlogic(params[:search])
-    search.checkout_complete = true
     search.state_does_not_equal('canceled')
-
-    self.orders = search.find(:all)
+    self.orders = search.do_search 
 
     self.product_in_taxon = true
     if params[:advanced_reporting]

Rakefile diffs

Finally, there are substantial changes to the Rakefile related to rake tasks and the testing framework. These didn't impact my development directly. Perhaps when I get into more significant testing on another extension, I'll dig deeper into the code changes here.

diff --git a/Rakefile b/Rakefile
index f279cc8..f9e6a0e 100644
# lots of stuff

For those interested in learning more about the upgrade process, I recommend reviewing the Rails 3 Release Notes in addition to reading up on Rails Engines, as they are an important part of Spree's core and extension architecture. The advanced reporting extension described in this article is available here.

PostgreSQL 9.0 High Performance Review

I recently had the privilege of reading and reviewing the book PostgreSQL 9.0 High Performance by Greg Smith. While the title of the book suggests that it may be relevant only to PostgreSQL 9.0, there is in fact a wealth of information to be found which is relevant for all community supported versions of Postgres.

Achieving the highest performance with PostgreSQL is definitely something which touches all layers of the stack, from your specific disk hardware, OS, and filesystem to the database configuration, connection/data access patterns, and queries in use. This book gathers up a lot of the information and advice that I've seen bandied about on the IRC channel and the PostgreSQL mailing lists and presents it in one place.

While seemingly related, I believe some of the main points of the book could be summed up as:

  1. Measure, don't guess. From the early chapters, which cover the lowest-level considerations such as disk hardware and configuration, to the later chapters, which cover topics such as query optimization, replication, and partitioning, considerable emphasis is placed on determining the metrics by which to measure performance before and after specific changes. This is the only way to determine the impact of the changes you make.
  2. Tailor to your specific needs/workflows. While there are many good rules of thumb out there when it comes to configuration and tuning, this book emphasizes the process of refining those more general numbers and tailoring the configuration/setup to your specific database's needs.
  3. Review the information the database system itself gives you. Information provided by the pg_stat_* views can be useful in identifying bottlenecks in queries and unused or underused indexes.

This book also introduced me to a few goodies which I had not encountered previously. One of the more interesting ones is the pg_buffercache contrib module. This suite of functions allows you to peek at the internals of the shared_buffers cache to get a feel for which relations are heavily accessed on a block-by-block basis. The examples in the book show this being used to more accurately size shared_buffers based on the actual number of accesses to specific portions of different relations.
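For illustration, a query along the lines of the examples in the pg_buffercache documentation (this is my own sketch, not an excerpt from the book) shows which relations in the current database occupy the most shared_buffers pages:

-- Top 10 relations by number of shared_buffers pages currently cached
SELECT c.relname, count(*) AS buffers
  FROM pg_buffercache b
  JOIN pg_class c ON b.relfilenode = c.relfilenode
  JOIN pg_database d ON b.reldatabase = d.oid
 WHERE d.datname = current_database()
 GROUP BY c.relname
 ORDER BY buffers DESC
 LIMIT 10;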

I found the book to be well-written (always a plus when reading technical books) and felt it covered quite a bit of depth given its ambitious scope. Overall, it was an informative and enjoyable read.

PostgreSQL 9.0 Admin Cookbook

I've been reading through the recently published book PostgreSQL 9.0 Admin Cookbook of late, and found that it satisfies an itch for me, at least for now. Every time I get involved in a new project, or work with a new group of people, there's a period of adjustment where I get introduced to new tools and new procedures. I enjoy seeing new (and not uncommonly, better) ways of doing the things I do regularly. At conferences I'll often spend time playing "What's on your desktop" with people I meet, to get an idea of how they do their work, and what methods they use. Questions about various peoples' favorite window manager, email reader, browser plugin, or IRC client are not uncommon. Sometimes I'm surprised by a utility or a technique I'd never known before, and sometimes it's nice just to see minor differences in the ways people do things, to expand my toolbox somewhat. This book did that for me.

As the title suggests, authors Simon Riggs and Hannu Krosing have organized their book like a cookbook, made up of simple "recipes" organized in subject groups. Each recipe covers a simple topic, such as "Connecting using SSL", "Adding/Removing tablespaces", and "Managing Hot Standby", with detail sufficient to guide a user from beginning to end. Of course, in many of the more complex cases some amount of detail must be skipped, and in general this book probably won't provide its reader with an in-depth education, but it will provide a framework to guide further research into a particular topic. It includes a description of the manuals and the locations of some of the mailing lists to get the researcher started.

I've used PostgreSQL for many different projects and been involved in the community for several years, so I didn't find anything in the book that was completely unfamiliar. But PostgreSQL is an open source project with a large community. There exists a wide array of tools, many of which I've never had occasion to use. Reading about some of them, and seeing examples in print, was a pleasant and educational experience. For instance, one recipe describes "Selective replication using Londiste". My tool of choice for such problems is generally Bucardo, so I'd not been exposed to Londiste's way of doing things. Nor have I used pgstatspack, a project for collecting various statistics and metrics from database views which is discussed under "Collecting regular statistics from pg_stat_* views".

In short, the book gave me the opportunity to look over the shoulder of experienced PostgreSQL users and administrators to see how they go about doing things, and compare to how I've done them. I'm glad to have had the opportunity.

Ruby on Rails versus CakePHP: A Syntax Comparison Guide

My time is typically split between Interchange and Spree development, but in a recent project for JackThreads, I jumped back into CakePHP code. CakePHP is one of the more popular PHP MVC frameworks and is inspired by Rails. I decided to put together a quick syntax comparison guide between CakePHP and Rails since I occasionally have to look up how to do some Rails-y thing in CakePHP.

Basic

Ruby on Rails CakePHP
MVC Code Inclusion
Ruby on Rails: Rails is typically installed as a gem and source code lives in the user's gem library. In theory, a modified version of the Rails source code can be "frozen" to your application, but I would guess this is pretty rare.
CakePHP: CakePHP is typically installed in the application directory in a "cake/" directory. The "app/" directory contains application specific code. From my experience, this organization has allowed me to easily debug CakePHP objects, but didn't do much more for me.
Application Directory Structure
Ruby on Rails:
app/
  controllers/ models/ views/ helpers/
lib/
config/
public/
  javascripts/ images/ stylesheets/
vendors/
  plugins/ extensions/
CakePHP:
controllers/
models/
views/
  layouts/ elements/ ...
config/
webroot/
tmp/
plugins/
vendors/
Notes: In Rails, layouts live in app/views/layouts/. In CakePHP, layouts live in views/layouts/ and helpers live in views/helpers/.
Creating an Application
Ruby on Rails:
rails new my_app # Rails 3 after gem installation
rails my_app # Rails <3
CakePHP: Download the compressed source code and create an application with the recommended directory structure.

Models

Ruby on Rails CakePHP
Validation
class Zone < ActiveRecord::Base
  validates_presence_of :name
  validates_uniqueness_of :name
end
class User extends AppModel {
  var $name = 'User';
  var $validate = array(
    'email' => array(
      'email-create' => array(
        'rule' => 'email',
        'message' => 'Invalid e-mail.',
        'required' => true,
        'on' => 'create'
      )
    )
  );
}
Relationships
class Order < ActiveRecord::Base
  belongs_to :user
  has_many :line_items
end
class Invite extends AppModel {
  var $name = 'Invite';
  var $belongsTo = 'User';
  var $hasMany = 'Campaigns';
}
Special Relationships
class Address < ActiveRecord::Base
  has_many :billing_checkouts,
    :foreign_key => "bill_address_id",
    :class_name => "Checkout"
end
class Foo extends AppModel {
  var $name = 'Foo';
  var $hasMany = array(
    'SpecialEntity' => array(
      'className' => 'SpecialEntity',
      'foreignKey' => 'entity_id',
      'conditions' =>
  array('Special.entity_class' => 'Foo'),
      'dependent' => true
    ),
  );
}

Controllers

Ruby on Rails CakePHP
Basic Syntax
class FoosController < ActionController::Base
  helper :taxons
  actions :show, :index

  include Spree::Search

  layout 'special'
end
class FooController extends AppController {
  var $name = 'Foo';
  var $helpers = array('Server', 'Cart');
  var $uses = array('SpecialEntity','User');
  var $components = array('Thing1', 'Thing2');
  var $layout = 'standard';
}
Notes: CakePHP and Rails use similar helper and layout declarations. In CakePHP, the $uses array specifies the models required by the controller, while in Rails all application models are available without an explicit include. In CakePHP, the $components array specifies the component classes used by the controller, while in Rails you would use "include ClassName" to include a module.
Filters
class FoosController < ActionController::Base
  before_filter :load_data, :only => :show
end
class FooController extends AppController {
  var $name = 'Foo';

  function beforeFilter() {
    parent::beforeFilter();
    //do stuff
  } 
}
Setting View Variables
class FoosController < ActionController::Base
  def index
    @special_title = 'This is the Special Title!'
  end
end
class FooController extends AppController {
  var $name = 'Foo';

  function index() {
    $this->set('title',
      'This is the Special Title!');
  }
}

Views

Ruby on Rails CakePHP
Variable Display
<%= @special_title %>
<?= $special_title ?>
Looping
<% @foos.each do |foo| -%>
<%= foo.name %>
<% end -%>
<?php foreach($items as $item): ?>
<?= $item['name']; ?>
<?php endforeach; ?>
Partial Views or Elements
<%= render :partial => 'shared/view_name',
  :locals => { :b => "abc" } %>
<?php echo $this->element('account_menu',
  array('page_type' => 'contact')); ?>
Notes: In Rails, partial views typically can live anywhere in the app/views directory. A shared view will typically be seen in the app/views/shared/ directory and a model specific partial view will be seen in the app/views/model_name/ directory. In CakePHP, partial views are referred to as elements and live in the views/elements directory.
CSS and JS
<%= javascript_include_tag
  'my_javascript',
  'my_javascript2' %>
<%= stylesheet_link_tag
  'my_style' %>
<?php
  $html->css(array('my_style.css'),
    null, array(), false);
  $javascript->link(array('my_javascript.js'),
    false);
?>

Routing

Ruby on Rails CakePHP
Basic
# Rails 3
match '/cart',
  :to => 'orders#edit',
  :via => :get,
  :as => :cart
# Rails <3
map.login '/login',
  :controller => 'user_sessions',
  :action => 'new'
Router::connect('/refer',
  array('controller' => 'invites',
        'action' => 'refer'));
Router::connect('/sales/:sale_id',
  array('controller' => 'sale',
        'action' => 'show'),
  array('sale_id' => '[0-9]+')); 
Nested or Namespace Routing
# Rails 3
namespace :admin do
  resources :foos do
    collection do
      get :revenue
      get :profit
    end
  end
end

# Rails <3
map.namespace :admin do |admin|
  admin.resources :foos, :collection => {
    :revenue            => :get,
    :profit             => :get,
  }
end
-

Logging

Ruby on Rails CakePHP
Where to? Rails: tmp/log/production.log or tmp/log/debug.log. CakePHP: tmp/logs/debug.log or tmp/logs/error.log
Logging Syntax
Rails.logger.warn "steph!" # Rails 3
logger.warn "steph!" # Rails <3
or
RAILS_DEFAULT_LOGGER.warn "steph!"
$this->log('steph!', LOG_DEBUG);

If you are looking for guidance on choosing one of these technologies, below are common arguments. In End Point's case, we choose whatever technology makes the most sense for the client. We implemented a nifty solution for JackThreads to avoid a complete rewrite, described here in detail. We also work with existing open source ecommerce platforms such as Interchange and Spree and try to choose the best fit for each client.

Pick Me!

Arguments for Ruby on Rails:
  • Ruby is prettier than PHP.
  • Rails' object-oriented programming implementation is more elegant than CakePHP's.
  • Rails routing is far superior to CakePHP routing.
  • Deployment and writing migrations are simpler with built-in or peripheral tools.
  • Rails documentation is better than CakePHP's.
Arguments for CakePHP:
  • CakePHP has better performance than Rails. UPDATE: This appears to be a rumor. Benchmark data suggests that Rails performs better than CakePHP.
  • PHP is better supported by hosting providers than Rails.
  • PHP developers are typically less expensive than Ruby/Rails developers.

Mongol Rally

This summer, End Point was pleased to be one of several sponsors of team One Steppe Beyond in the 2010 Mongol Rally. Team member Christopher Letters is the son of the owners of Crotchet Classical, a longtime End Point ecommerce client. Chris reports that they had a great time on the rally driving 10,849 miles to their destination in Mongolia!

You can read their dispatches from the road on their Mongol Rally team page.

Each team raises money for a charity, a minimum of £1000 per team. Team One Steppe Beyond chose the Christina Noble Children's Foundation which has a project in Ulaanbaatar, Mongolia.

Congratulations to team members Christopher, Dominic, and Thomas for finishing the race! It was obviously quite an adventure and for a good cause.

Liquid Galaxy Sysadmin+ Wanted

End Point Corporation is hiring a motivated and creative GNU/Linux systems administrator. The work will primarily involve installing, supporting, maintaining, and developing infrastructure improvements for Google Liquid Galaxy systems. Liquid Galaxy is an impressive panoramic system for Google Earth and other applications. Check it out!

Responsibilities:

  • Set up and upgrade Liquid Galaxy Systems at client locations. (Some travel is required, including internationally.)
  • Do on site and remote troubleshooting and support.
  • Participate in ongoing work to improve the system with automation, monitoring, and customizing configurations to clients' needs.
  • Provide first-class customer service.

Requirements:

  • BS degree or equivalent experience
  • At least 3 years of experience with Linux systems administration
  • Strong scripting skills in shell, and also Python, Ruby, Perl or PHP
  • Proven technical troubleshooting and performance tuning experience
  • Excellent analytical abilities along with a strong sense of ownership and urgency, plus the drive and ability to rise to new challenges and master new skills
  • Awareness and knowledge about security issues
  • Good communication skills
  • The basic physical fitness for putting together and breaking down the hardware components of the system

If you have experience with any of the following it is likely to be useful:

  • Geospatial systems
  • Sketchup, Building Maker, Blender, general 3D modelling
  • OpenGL application development
  • Image processing
  • Video capture, processing, and production technologies
  • Puppet.

While we have a strong preference that this position be a hire for our New York City office where most of our Liquid Galaxy team is located, we don't entirely rule out the possibility of hiring someone who works out of his or her home office if the fit is right.

Please email jobs@endpoint.com to apply.

Utah Open Source Conference 2010 part 1

It's been a little over a month since the 2010 Utah Open Source Conference, and I decided to take a few minutes to review talks I enjoyed and link to my own talk slides.

Magento: Mac Newbold of Code Greene spoke on the Magento ecommerce framework for PHP. I'm somewhat familiar with Magento, but a few things stood out:

  • He finds the Magento Enterprise edition kind of problematic because Varien won't support you if you have any unsupported extensions. Some of his customers had problems with Varien support and went back to the community edition.
  • Magento is now up to around 30 MB of PHP files!
  • As I've heard elsewhere, serious customization has a steep learning curve.
  • The Magento data model is an EAV (Entity-Attribute-Value) model. To get 50 columns of output requires 50+ joins between 8 tables (one EAV table for each value datatype).
  • There are 120 tables total in default install -- many core features don't use the EAV tables for performance reasons.
  • Another observation I've heard in pretty much every conversation about Magento: It is very resource intensive. Shared hosting is not recommended. Virtual servers should have a minimum of 1/2 to 1 GB RAM. Fast disk & database help most. APC cache recommended with at least 128 MB.
  • A lot of front-end things are highly adjustable from simple back-end admin options.
  • Saved credit cards are stored in the database, and the key is on the server. I didn't get a chance to ask for more details about this. I hope it's only the public part of a public/secret keypair!

It was a good overview for someone wanting to go beyond marketing feature lists.

Node.js: Shane Hansen of Backcountry.com spoke on Node, comparing it to Tornado and Twisted in Python. He calls JavaScript "Lisp in C's clothing", and says its culture of asynchronous, callback-driven code patterns makes Node a natural fit.

Performance and parallel processing are clearly strong incentives to look into Node. The echo server does 20K requests/sec. There are 2000+ Node projects on GitHub and 500+ packages in npm (Node Package Manager), including database drivers, web frameworks, parsers, testing frameworks, payment gateway integrations, and web analytics.

A few packages worth looking into further:

  • express - web microframework like Sinatra
  • Socket-IO - Web Sockets now; falls back to other things if no Web Sockets available
  • hummingbird - web analytics, used by Gilt.com
  • bespin - "cloud JavaScript editor"
  • yui3 - build HTML via DOM, eventbus, etc.
  • connect - like Ruby's Rack

I haven't played with Node at all yet, and this got me much more interested.

Metasploit: Jason Wood spoke on Metasploit, a penetration testing (or just plain penetrating!) tool. It was originally in Perl, and now is in Ruby. It comes with 590 exploits and has a text-based interactive control console.

Metasploit uses several external apps: nmap, Maltego (proprietary reconnaissance tool), Nessus (no longer open source, but GPL version and OpenVAS fork still available), Nexpose, Ratproxy, Karma.

The reconnaissance modules include DNS enumeration, and an email address collector that uses the big search engines.

It can backdoor PuTTY, PDFs, audio, and more.

This is clearly something you've got to experiment with to appreciate. Jason posted his Metasploit talk slides which have more detail.

So Many Choices: Web App Deployment with Perl, Python, and Ruby: This was my talk, and it was a lot of fun to prepare for, as I got to take time to see some new happenings I'd missed in these three language communities' web server and framework space over the past several years.

The slides give pointers to a lot of interesting projects and topics to check out.

My summary was this. We have an embarrassment of riches in the open source web application world. Perl, Python, and Ruby all have very nice modern frameworks for developing web applications. They also have several equivalent solid options for deploying web applications. If you haven't tried the following, check them out:

That's about half of my notes on talks, but all I have time for now. I'll cover more in a later post.

(Image|Graphics)Magick trick for monitoring or visualizations

It's a good time for all when we start poking fun at the visual assault of stereotypical PPT Presentations. On the other hand, when data is presented in an effective visual format, human brains are able to quickly grasp the ideas involved and pick out important pieces of information, or "outliers".

Without getting into a long trumpeting session about the usefulness of data visualization (there are plenty of books on the subject), I'd like to jump directly into a Magick trick or two for creating simple visualizations.

Let's imagine we've got a group of eight machines serving a particular purpose. Now let's say I want quick insight into not only the internal activity of all eight machines, but also what the systems believe they are sending to their displays.

With a little magick (of the ImageMagick or GraphicsMagick variety), we can save ourselves from running "ps" and "free" and from having to be in the same room (or the same country) as the system we're checking up on.

First, let's organize some simple output from the system:

$ echo -en "$(hostname) GPID: $( pgrep googleearth-bin ). APPID: $( pgrep -u root -f sbin/apache2 ).\nCRASH: $( ls -1 ${HOME}/.googleearth/crashlogs/ | wc -l ). MEMF: $( awk '/^MemFree/ {print $2$3}' /proc/meminfo )."

Which gives us output something like this:

lg1 - GPID: 5265. APPID: 10452.
CRASH: 3. MEMF: 4646240kB.

Cool, but we want to combine this with the imagery supposedly being currently displayed by X. So, we turn it into an image that we can overlay, like this:

$ echo -en "$(hostname) GPID: $( pgrep googleearth-bin ). APPID: $( pgrep -u root -f sbin/apache2 ).\nCRASH: $( ls -1 ${HOME}/.googleearth/crashlogs/ | wc -l ). MEMF: $( awk '/^MemFree/ {print $2$3}' /proc/meminfo )." | \
convert -pointsize 18 -background '#00000080' -fill white text:- -trim -bordercolor '#00000080' -border 5x5 miff:/tmp/text

This is one long command and might be hard to read, but it is simply using "convert" to turn the text output into a semi-transparent "miff" image for later use. It would be very easy to put the stat collection into a script on each host, but we're just going with quick and dirty at the moment.

Second, let's get our little overlay image composited with a screenshot from X:

$ DISPLAY=:0 import -window root miff:- | composite -gravity south -geometry +0+3 miff:/tmp/text miff:- -resize 600 miff:/tmp/$(hostname).miff

So, in a single pipeline we imported a screenshot of the root window, then used "composite" to overlay our semi-transparent stats image and resize the whole thing to be a bit more manageable.

Finally, we want to perform these things across all the systems and be left with something we can quickly glance at to see if there are obvious problems. So, let's create a quick shell loop and execute our commands via ssh, placing the resize/composite burden on the shoulders of each individual system (be sure to escape variables for remote interpolation!):

#!/bin/bash

#collect data first
for system in `seq 1 8`; do
 ssh user@$system "

echo -en \"\$(hostname) GPID: \$( pgrep googleearth-bin ). APPID: \$( pgrep -u root -f sbin/apache2 ).\nCRASH: \$( ls -1 \${HOME}/.googleearth/crashlogs/ | wc -l ). MEMF: \$( awk '/^MemFree/ {print \$2\$3}' /proc/meminfo )." | \
convert -pointsize 18 -background '#00000080' -fill white text:- -trim -bordercolor '#00000080' -border 5x5 miff:/tmp/text;

DISPLAY=:0 import -window root miff:- | \
composite -gravity south -geometry +0+3 miff:/tmp/text miff:- -resize 600 miff:-" >/tmp/system${system}.miff;

done

#make a montage of the data
montage -monitor -background black -tile 8x1 -geometry +5+0 \
 /tmp/system{6,7,8,1,2,3,4,5}.miff \
 /tmp/system-montage.png && rm -f /tmp/system?.miff

With something so simple, we can quickly view from New York what's happening on systems installed in California, like so:

montage example

Speeding up the Spree demo site

There's a lot that can be done to speed up Spree, and Rails apps in general. Here I'm not going to deal with most of that. Instead I want to show how easy it is to speed up page delivery using standard HTTP server tuning techniques, demonstrated on demo.spreecommerce.com.

First, let's get a baseline performance measure from the excellent webpagetest.org service using their remote Internet Explorer 7 tests:

  • First page load time: 2.1 seconds
  • Repeat page load time: 1.5 seconds

The repeat load is faster because the browser has images, JavaScript, and CSS cached, but it still has to check back with the server to make sure they haven't changed. Full details are in this initial report.
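Schematically (an illustrative exchange, not a capture from the demo site), each cached asset still costs a conditional request and a 304 response on a repeat view, whereas with far-future Expires headers (configured below) the browser reuses its copy without contacting the server at all:

GET /images/logo.png HTTP/1.1
If-Modified-Since: Mon, 01 Nov 2010 10:00:00 GMT

HTTP/1.1 304 Not Modified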

The demo.spreecommerce.com site is run on a Xen VPS with 512 MB RAM, CentOS 5 i386, Apache 2.2, and Passenger 2.2. There were several things to tune in the Apache httpd.conf configuration (collected into a single snippet after this list):

  • mod_deflate was already enabled. Good. That's a big help.
  • Enable HTTP keepalive: KeepAlive On and KeepAliveTimeout 3
  • Limit Apache children to keep RAM available for Rails: StartServers 5, MinSpareServers 2, MaxSpareServers 5
  • Limit Passenger pool size to 2 child processes (down from the default 6), to queue extra requests instead of using slow swap memory: PassengerMaxPoolSize 2
  • Enable browser & intermediate proxy caching of static files: ExpiresActive On and ExpiresByType image/jpeg "access plus 2 hours" etc. (see below for full example)
  • Disable ETags which aren't necessary once Expires is enabled: FileETag None and Header unset ETag
  • Disable unused Apache modules: free up memory by commenting out LoadModule proxy, proxy_http, info, logio, usertrack, speling, userdir, negotiation, vhost_alias, dav_fs, autoindex, most authn_* and authz_* modules
  • Disable SSLv2 (for security and PCI compliance, not performance): SSLProtocol all -SSLv2 and SSLCipherSuite ALL:!ADH:!EXPORT56:RC4+RSA:+HIGH:+MEDIUM:-LOW:-SSLv2:-EXP
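Collected into one place, the non-Expires changes from the list above amount to roughly the following httpd.conf settings (values as tuned for this particular 512 MB VPS; the Expires rules are shown separately near the end of this post):

KeepAlive On
KeepAliveTimeout 3

StartServers     5
MinSpareServers  2
MaxSpareServers  5

PassengerMaxPoolSize 2

FileETag None
Header unset ETag

SSLProtocol all -SSLv2
SSLCipherSuite ALL:!ADH:!EXPORT56:RC4+RSA:+HIGH:+MEDIUM:-LOW:-SSLv2:-EXP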

After making these changes, without tuning Rails, Spree, or the database at all, a new webpagetest.org run reports:

  • First page load time: 1.2 seconds
  • Repeat page load time: 0.4 seconds

That's an easy improvement, a reduction of 0.9 seconds for the initial load and 1.1 seconds for a repeat load! Complete details are in this follow-on report.

The biggest wins came from enabling HTTP keepalive, which allows serving multiple files from a single HTTP connection, and enabling static file caching which eliminates the majority of requests once the images, JavaScript, and CSS are cached in the browser.

Note that many of the resource-limiting changes I made above to Apache and Passenger would be too restrictive if more RAM or CPU were available, as is typical on a dedicated server with 2 GB RAM or more. But when running on a memory-constrained VPS, it's important to put such limits in place or you'll practically undo any other tuning efforts you make.

I wrote about these topics a year ago in a blog post about Interchange ecommerce performance optimization. I've since expanded the list of MIME types I typically enable static asset caching for in Apache. Here's a sample configuration snippet to put in the <VirtualHost> container in httpd.conf:

    ExpiresActive On
    ExpiresByType image/gif   "access plus 2 hours"
    ExpiresByType image/jpeg  "access plus 2 hours"
    ExpiresByType image/png   "access plus 2 hours"
    ExpiresByType image/tiff  "access plus 2 hours"
    ExpiresByType text/css    "access plus 2 hours"
    ExpiresByType image/bmp   "access plus 2 hours"
    ExpiresByType video/x-flv "access plus 2 hours"
    ExpiresByType video/mpeg  "access plus 2 hours"
    ExpiresByType video/quicktime "access plus 2 hours"
    ExpiresByType video/x-ms-asf  "access plus 2 hours"
    ExpiresByType video/x-ms-wm   "access plus 2 hours"
    ExpiresByType video/x-ms-wmv  "access plus 2 hours"
    ExpiresByType video/x-ms-wmx  "access plus 2 hours"
    ExpiresByType video/x-ms-wvx  "access plus 2 hours"
    ExpiresByType video/x-msvideo "access plus 2 hours"
    ExpiresByType application/postscript        "access plus 2 hours"
    ExpiresByType application/msword            "access plus 2 hours"
    ExpiresByType application/x-javascript      "access plus 2 hours"
    ExpiresByType application/x-shockwave-flash "access plus 2 hours"
    ExpiresByType image/vnd.microsoft.icon      "access plus 2 hours"
    ExpiresByType application/vnd.ms-powerpoint "access plus 2 hours"
    ExpiresByType text/x-component              "access plus 2 hours"

Of course you'll still need to tune your Spree application and database, but why not tune the web server to get the best performance you can there?

Keep the Aisles Clean at Checkout

It's no mystery in ecommerce that checkout processing must flow smoothly for an effective store. Providing products or services in high demand doesn't mean much if they cannot be purchased, or the purchase process is so burdensome that would-be customers give up in frustration.

Unfortunately, checkout also tends to include the most volatile elements of a web store. It virtually always involves database writes, which can be hindered by locking. It often involves real-time network access to 3rd-party providers, with payment transactions being at the top of the list. It can involve complex inventory assessments, where high concurrency can make what's normally routine highly unpredictable. Meanwhile, your customers wait, while the app sifts through complexity and waits on responses from various services. If they wait too long, you might lose sales; even worse, you might lose customers.

Even armed with the above knowledge, it's all too easy to fall into the trap of expediency. A particular action is so logically suited to be included as part of the checkout routine, and a superficial evaluation makes it seem like such a low-risk operation. That action can be tucked in there just after we've passed all the hurdles and are assured the order won't be rejected--why, it'll be so simple, and all the data we need for the action are readily at hand.

Just such expediency was at the heart of a checkout problem that had been plaguing an Interchange client of ours for months. The client would receive regular complaints that checkouts were timing out or taking so long that the customer was reloading and trying again. Many times, these customers would come to find that their orders had been placed, but that the time to complete them was exceeding the web server's timeout (or their patience). In far less common instances, but still occurring regularly, log and transaction evidence existed that showed an order attempt produced a valid payment transaction, but there was no hint of the order in their database or even in the application's system logs.

In the latter case, I had seen this behavior before with other clients. If an action within order routing takes long enough, the Interchange server handling the request will be hammered by housekeeping. The telltale sign is the lack of log evidence for the attempt, since order routes are logged at the end of the route's run; when that's interrupted, no logging occurs.

I added considerably more explicit real-time logging and picked off some of the low-hanging fruit: code practices that had often been implicated before as the culprit in these circumstances. After collecting enough data from problematic order attempts, I was able to isolate the volatility to mail-list maintenance. The client utilizes a 3rd-party provider for managing their various mail lists, and that provider's API was contacted during order routing with all the data the provider needed for managing said lists. The data transfer for the API was very simple, and in most cases would process in sub-second time. Unfortunately, it turned out that, in enough cases, the calls to the API would take tens or even hundreds of seconds to process.

The placement of mail-list maintenance within order routing was merely a convenience. The success or failure of adding to the mail lists was insignificant compared to the success or failure of the order itself. Once identified, the API calls were moved into a post-order processing routine, which was specifically built to anticipate the demonstrated volatility. As a result, complaints from customers about long or timed-out checkouts have dwindled to near zero, and the mail-list maintenance is more reliable, since the background process is designed to catch excessively long calls and retry until we receive an affirmative response from the list maintainers.
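As a generic illustration of the pattern (a hypothetical sketch, not the client's actual Interchange code), checkout is reduced to recording the request, and a separate background job makes the slow API call with a hard timeout, leaving failures queued for retry:

# During order routing: only record what needs to be sent. This is a fast
# local insert and cannot hang on the third-party mail-list API.
# ($dbh is a DBI handle; $api stands in for whatever client the provider offers.)
sub queue_maillist_signup {
    my ($dbh, $email, $list) = @_;
    $dbh->do(
        q{INSERT INTO maillist_queue (email, list, status) VALUES (?, ?, 'pending')},
        undef, $email, $list,
    );
}

# Later, from cron: work through the queue with a timeout, marking successes
# done and leaving failures 'pending' so they are retried on the next run.
sub process_maillist_queue {
    my ($dbh, $api) = @_;
    my $rows = $dbh->selectall_arrayref(
        q{SELECT id, email, list FROM maillist_queue WHERE status = 'pending'},
        { Slice => {} },
    );
    for my $row (@$rows) {
        my $ok = eval {
            local $SIG{ALRM} = sub { die "timeout\n" };
            alarm 30;
            my $result = $api->subscribe($row->{email}, $row->{list});
            alarm 0;
            $result;
        };
        alarm 0;    # clear the alarm if the eval died
        $dbh->do(q{UPDATE maillist_queue SET status = 'done' WHERE id = ?},
                 undef, $row->{id}) if $ok;
    }
}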

When deciding what belongs within checkout processing, ideally limit that activity to only those actions absolutely imperative to the success of the order. For each piece of functionality, ask yourself (or your client): is the outcome of this action worth adding to the wait a customer experiences placing an order? Should the outcome of this action affect whether the order attempt is successful? If the answer to those questions is "no", account for that action outside of checkout. It may be more work to do so, but keeping the checkout aisles clean, without obstruction, should be paramount.

Spree on Rails 3: Part Two

Yesterday, I discussed my experiences on getting Rails 3 based Spree up and running. I've explained in several blog articles (here and here) that customizing Spree through extensions will produce the most maintainable code – it is not recommended to work directly with source code and make changes to core classes or views. Working through extension development was one of my primary goals after getting Spree up and running.

To create an extension named "foo", I ran rails g spree:extension foo. Similar to pre-Rails 3.0 Spree, a foo directory is created (albeit inside the sandbox/ directory) as a Rails Engine. The generator appends the foo directory details to the sandbox/ Gemfile; without the Gemfile update, the Rails project won't include the new foo extension directory (and the functionality it encompasses). I reviewed the extension directory structure and files and found that foo/lib/foo.rb was similar to the old *_extension.rb file.

New
require 'spree_core'

module Foo
  class Engine < Rails::Engine

    config.autoload_paths += %W(#{config.root}/lib)

    def self.activate
      # Activation logic goes here.  
      # A good use for this is performing
      # class_eval on classes that are defined
      # outside of the extension 
      # (so that monkey patches are not 
      # lost on subsequent requests in 
      # development mode.)
    end

    config.to_prepare &method(:activate).to_proc
  end
end
Old
class FooExtension < Spree::Extension
  version "1.0"
  description "Describe your extension here"
  url "http://www.endpoint.com/"

  def activate
    # custom application functionality here
  end 
end

I verified that the activate method was called in my extension with the following change:

require 'spree_core'

module Foo
  class Engine < Rails::Engine

    config.autoload_paths += %W(#{config.root}/lib)

    def self.activate
      Spree::BaseController.class_eval do
        logger.warn "inside base controller class eval"
      end
    end

    config.to_prepare &method(:activate).to_proc
  end
end

From here, the Spree Documentation on Extensions provides insight for further extension development. As I began to update an older extension, I made sure that my_extension/lib/my_extension.rb had all the necessary includes in the activate method, and I copied controller and library files over to their new locations.
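
As a rough illustration (the decorator module name below is hypothetical, not something Spree provides), a ported my_extension/lib/my_extension.rb follows the same engine pattern shown above, with the old activate contents moved into self.activate:

require 'spree_core'

module MyExtension
  class Engine < Rails::Engine

    config.autoload_paths += %W(#{config.root}/lib)

    def self.activate
      # Re-open a core class here so the change is re-applied on each
      # request in development mode; MyExtension::OrderDecorator is a
      # hypothetical module living under this extension's lib/ directory.
      Order.class_eval do
        include MyExtension::OrderDecorator
      end
    end

    config.to_prepare &method(:activate).to_proc
  end
end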

One issue I came across was that extension migrations are not run with rake db:migrate and the extension's public assets are not copied to the main project's public directory on server restart. The documentation recommends building the migration within the application root (sandbox/), but this is not ideal for maintaining the modularity of extensions; each extension should include all of its own migration files. To work around this, it was recommended to copy the install rake tasks from one of the core gems, which copy migrations and public assets:

namespace :foo do
  desc "Copies all migrations and assets (NOTE: This will be obsolete with Rails 3.1)"
  task :install do
    Rake::Task['foo:install:migrations'].invoke
    Rake::Task['foo:install:assets'].invoke
  end

  namespace :install do

    desc "Copies all migrations (NOTE: This will be obsolete with Rails 3.1)"
    task :migrations do
      source = File.join(File.dirname(__FILE__), '..', '..', 'db')
      destination = File.join(Rails.root, 'db')
      puts "INFO: Mirroring assets from #{source} to #{destination}"
      Spree::FileUtilz.mirror_files(source, destination)
    end

    desc "Copies all assets (NOTE: This will be obsolete with Rails 3.1)"
    task :assets do
      source = File.join(File.dirname(__FILE__), '..', '..', 'public')
      destination = File.join(Rails.root, 'public')
      puts "INFO: Mirroring assets from #{source} to #{destination}"
      Spree::FileUtilz.mirror_files(source, destination)
    end

  end
end

After creating the extension based migration files and creating the above rake tasks, one would run the following from the application (sandbox/) directory:

steph@machine:/var/www/spree/sandbox$ rake foo:install
(in /var/www/spree/sandbox)
INFO: Mirroring assets from /var/www/spree/sandbox/foo/lib/tasks/../../db to /var/www/spree/sandbox/db
INFO: Mirroring assets from /var/www/spree/sandbox/foo/lib/tasks/../../public to /var/www/spree/sandbox/public

steph@machine:/var/www/spree/sandbox$ rake db:migrate
(in /var/www/spree/sandbox)
# migrations run

Some quick examples of differences in project setup and extension generation between Rails 3.* and Rails 2.* based Spree:

New
#clone project
#bundle install
rake sandbox
rails server
rails g spree:extension foo
rails g migration FooThing
Old
#clone project into "sandbox/"
rake db:bootstrap
script/server
script/generate extension Foo
script/generate extension_model Foo thing name:string start:date

Some of my takeaway comments after going through these exercises:

If there's one thing I want to learn more about in order to work with edge Spree, it's Rails Engines. When you run Spree from source and use extensions, the architecture includes several layers of stacked Rails Engines:


Layers of Rails Engines in Spree with extensions.

After some quick googling, I found two helpful articles on Engines in Rails 3 here and here. The Spree API has been inconsistent until now; hopefully the introduction of Rails Engines will force the API to become more consistent, which may strengthen the extension community.

I didn't notice much deviation in controllers, models, or views from previous versions of Spree, aside from massive reorganization. Theme support (including Spree hooks) is still present in the core. Authentication in Spree still uses Authlogic, but I've heard rumors of an eventual move to Devise. The spree_dash (admin dashboard) gem is still fairly lightweight and doesn't contain much functionality. Two fairly large code changes I noticed were:

  • The checkout state machine has been merged into the order model, and the checkout model will be eliminated in the future.
  • The spree_promo gem has a decent amount of new functionality.

Browsing through the spree-user Google Group reveals that there are still several kinks to be worked out in edge Spree. After these issues are resolved and the documentation for edge Spree is more complete, I will be more confident recommending development on Rails 3 based Spree.

Spree on Rails 3: Part One

A couple of weeks ago, I jumped into development on Spree on Rails 3. Spree is an open source Ruby on Rails ecommerce platform. End Point has been involved in Spree since its inception in 2008, and we continue to develop on Spree with a growing number of clients. Spree began its transition to Rails 3 several months ago. The most recent stable version of Spree (0.11.2) runs on Rails 2.*, but the edge code runs on Rails 3. My personal involvement with Rails 3 based Spree began recently; I waited to look at edge Spree until Rails 3 had a bit of momentum and until Rails 3 based Spree had more documentation and stability. My motivation for looking at it now was to determine whether End Point can recommend Rails 3 based Spree to clients, and to share insight with my coworkers and other members of the Spree community.

First, I looked at the messy list of gems that had built up on my local machine throughout development of various Rails and Spree projects. I found this simple little script to remove all of my old gems:

#!/bin/bash

GEMS=`gem list --no-versions`
for x in $GEMS; do sudo gem uninstall $x --ignore-dependencies -a; done

Then, I ran gem install rails to install Rails 3 and dependencies. The following gems were installed:

abstract (1.0.0)
actionmailer (3.0.1)
actionpack (3.0.1)
activemodel (3.0.1)
activerecord (3.0.1)
activeresource (3.0.1)
activesupport (3.0.1)
arel (1.0.1)
builder (2.1.2)
bundler (1.0.2)
erubis (2.6.6)
i18n (0.4.1)
mail (2.2.7)
mime-types (1.16)
polyglot (0.3.1)
rack (1.2.1)
rack-mount (0.6.13)
rack-test (0.5.6)
rails (3.0.1)
railties (3.0.1)
rake (0.8.7)
thor (0.14.3)
treetop (1.4.8)
tzinfo (0.3.23)

Next, I cloned the Spree edge with the following command from here:

git clone http://github.com/railsdog/spree.git

In most cases, developers will run Spree from the gem and not from the source code (see the documentation for more details). In my case, I wanted to review the source code and identify changes. You might notice that the new Spree core directory doesn't look much like the old one, which is explained by the following: the Spree core code has been broken into six separate core gems (api, auth, core, dash, promo, sample) that run as Rails Engines.

After checking out the source code, the first new task to run with edge Spree was bundle install. Bundler is a dependency management tool; the bundler gem is installed by default in Rails 3 and works there out of the box, and it can work in Rails 2.3 with additional file and configuration changes. The Gemfile and Gemfile.lock in the Spree core specify which gems are required for the application. Several gems were installed by Spree's bundler configuration, including:

Installing webrat (0.7.2.beta.1) 
Installing rspec-rails (2.0.0.beta.19) 
Installing ruby-debug-base (0.10.3) with native extensions 
Installing ruby-debug (0.10.3) 
Installing state_machine (0.9.4) 
Installing stringex (1.1.0) 
Installing will_paginate (3.0.pre2) 
Using spree_core (0.30.0.beta2) from source at /var/www/spree 
Using spree_api (0.30.0.beta2) from source at /var/www/spree 
Using spree_auth (0.30.0.beta2) from source at /var/www/spree 
Using spree_dash (0.30.0.beta2) from source at /var/www/spree 
Using spree_promo (0.30.0.beta2) from source at /var/www/spree
Using spree_sample (0.30.0.beta2) from source at /var/www/spree

The only snag I hit during bundle install was that the nokogiri gem required two system libraries to be installed on my machine (libxslt-dev and libxml2-dev).
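
On a Debian or Ubuntu system that typically amounts to something like the following (package names can vary by distribution):

sudo apt-get install libxml2-dev libxslt-dev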

To create a project and run all the necessary setup, I ran rake sandbox, which completed the tasks listed below. The tasks created a new project, completed the basic gem setup, installed sample data and images, and ran the sample data bootstrap migration. In some cases Spree sample data will not be used, and the latter two steps can be skipped. The sandbox/ application directory contained the directories one might expect when developing in Rails (app, db, lib, etc.), and sandbox/ itself runs as a Rails Engine.

steph@machine:/var/www/spree$ rake sandbox
(in /var/www/spree)
         run  rails new sandbox -GJT from "."
      append  sandbox/Gemfile
         run  rails g spree:site -f from "./sandbox"
         run  rake spree:install from "./sandbox"
         run  rake spree_sample:install from "./sandbox"
         run  rake db:bootstrap AUTO_ACCEPT=true from "./sandbox"

After setup, I ran rails server, the new command for starting a server in Rails 3.*, and verified my site was up and running.


Hooray - it's up!

There wasn't much to getting a Rails 3 Spree application up and running locally. I removed all of my old gems, installed Rails 3, grabbed the repository, allowed bundler to install dependencies, and worked through one snag. Then I ran the Spree-specific rake task to set up the project and started the server. Tomorrow, I'll share my experiences with extension development in Rails 3 based Spree.

check_postgres meets pgbouncer

Recently the already well-known PostgreSQL monitoring tool check_postgres gained the ability to monitor pgbouncer, the PostgreSQL connection pooling daemon, more closely. Previously check_postgres could verify pgbouncer was correctly proxying connections and make sure its settings hadn't been modified. The pgbouncer administrative console reports many useful pgbouncer statistics and metrics; now check_postgres can monitor some of those as well.

pgbouncer describes its pools in terms of "client" elements and "server" elements. "Client" refers to connections coming from clients, and "server" to connections to the PostgreSQL server. The new check_postgres actions pay attention only to the pgbouncer "SHOW POOLS" command, which provides the following metrics (a sample invocation of that command follows the list):

  • cl_active: Connections from clients which are associated with a PostgreSQL connection. Use the pgb_pool_cl_active action.
  • cl_waiting: Connections from clients that are waiting for a PostgreSQL connection to service them. Use the pgb_pool_cl_waiting action.
  • sv_active: Connections to PostgreSQL that are in use by a client connection. Use the pgb_pool_sv_active action.
  • sv_idle: Connections to PostgreSQL that are idle, ready to service a new client connection. Use the pgb_pool_sv_idle action.
  • sv_used: PostgreSQL connections recently released from a client session. Use the pgb_pool_sv_used action.
  • sv_tested: PostgreSQL connections in process of being tested. Use the pgb_pool_sv_tested action.
  • sv_login: PostgreSQL connections currently logging in. Use the pgb_pool_sv_login action.
  • maxwait: The length of time the oldest waiting client has been waiting for a connection. Use the pgb_pool_maxwait action.
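
For reference, these numbers come straight from the pgbouncer admin console, which is reached by connecting to the special "pgbouncer" database on pgbouncer's port with a user listed in its admin_users or stats_users setting; roughly like this (the user name is just a placeholder):

postgres@db:~$ psql -p 5433 -U someadmin pgbouncer -c 'SHOW POOLS;'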

Most installations probably don't want any client connections stuck waiting for PostgreSQL connections to service them, meaning the cl_waiting and maxwait metrics ought to be zero. These examples check those two metrics and complain when they climb past the given warning (-w) and critical (-c) thresholds, for a pgbouncer installation on port 5433 with pools "pgbouncer" and "maindb":

postgres@db:~$ ./check_postgres.pl --action=pgb_pool_cl_waiting -p 5433 -w 3 -c 8
POSTGRES_PGB_POOL_CL_WAITING OK: (port=5433) pgbouncer=0 * maindb=0 | time=0.01 time=0.01

postgres@db:~$ ./check_postgres.pl --action=pgb_pool_maxwait -p 5433 -w 5 -c 15 
POSTGRES_PGB_POOL_MAXWAIT OK: (port=5433) pgbouncer=0 * maindb=0 | time=0.01 time=0.01

The typical check_postgres filtering rules work here as well; to filter out a pool called "ignore_this_pool", for instance, add --exclude ignore_this_pool to the command line. Other connection options mean exactly what they would when connecting to PostgreSQL directly.
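
For instance, here is the cl_waiting check from above, skipping that pool (output omitted):

postgres@db:~$ ./check_postgres.pl --action=pgb_pool_cl_waiting -p 5433 -w 3 -c 8 --exclude ignore_this_pool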

These new actions are available in the latest version from git.

Youth Debate and other client news

I want to draw attention to several of our clients who have been in the news lately:

The Youth Debate 2010 site is live and currently accepting question submissions from youth interested in hearing video responses from the DNC & RNC chairmen before the November midterm elections. It's a simple site, developed and deployed quickly with an eye toward handling a very high amount of traffic over a short period of time. We're excited to see what questions and answers come out of the project.

Jared Loftus, entrepreneur and owner of The College District, was profiled in a recent BusinessReport.com article about his business. We've written about Jared's business and some of the technical details underpinning his sites in past blog posts, including one upon launch of 4 additional sites and one comparing College District multi-site architecture to Spree.

Our client FLOR.com is a well-known retailer of modular carpet and a division of the public company Interface, Inc. We've been pleased to work with them for the past three years, adding new features to their ecommerce system and supporting its operations. Interface's founder, Ray Anderson, has been on a mission to reduce the negative environmental impact of their manufacturing process. He published a book, Confessions of a Radical Industrialist, and has been speaking about it and opening eyes to the possibilities for improvement.

We're always happy to see our clients doing interesting things and getting the attention they deserve!

Cross Browser Development: A Few CSS and JS Issues

Coding cross browser friendly JavaScript and CSS got you down? In a recent project, Ron, David, and I worked through some painful cross browser issues. Ron noted that he even banged his head against the wall over a couple of them :) Three of these issues come up frequently in my other projects involving CSS and JS development, so I wanted to share.

Variable Declaration in JS

In several cases, I noticed that omitting variable declaration ("var") resulted in broken JavaScript-based functionality in IE only. I typically include variable declaration when I'm writing JavaScript. In our project, we were working with legacy code, and conflicting variable names may have been introduced, resulting in broken functionality. Examples of before and after:

Bad
var display_cart_popup = function() {
    popup_id = '#addNewCartbox';
    left = (parseInt($(window).width()) - 772) / 2;
    ...
};
Better
var display_cart_popup = function() {
    var popup_id = '#addNewCartbox';
    var left = (parseInt($(window).width()) - 772) / 2;
    ...
};
Bad
...
address_display = '';

country = $(type+'_country').value;
address = $(type+'_address').value;
address2 = $(type+'_address2').value;
city = $(type+'_city').value;
state = $(type+'_state').value;
zip = $(type+'_zip').value;
...
Better
...
var address_display = '';

var country = $(type+'_country').value;
var address = $(type+'_address').value;
var address2 = $(type+'_address2').value;
var city = $(type+'_city').value;
var state = $(type+'_state').value;
var zip = $(type+'_zip').value;
...

I researched this to gain more insight, but I didn't find much beyond a reiteration that variables created without the "var" declaration become global variables, which may have resulted in conflicts here. In any case, all of the "learning JavaScript" documentation I browsed through includes variable declaration, and there's no reason to leave it out for these lexically scoped variables.

Trailing Commas in JSON Objects

According to the JSON specification, trailing commas are not permitted (e.g., obj = { "1" : 2, }). In my experience, JSON objects with trailing commas might work in Firefox and WebKit-based browsers, but they die silently in IE. Some recent examples:

Bad

//JSON response from an ajax call
// if $add_taxes is not true, the carttotal element will be the last element of the list and it will end with a comma

{
  "response_message"    : '<?= $response_message ?>',
  "subtotal"            : <?= $subtotal ?>, 
  "shipping_cost"       : <?= $shipping ?>, 
  "carttotal"           : <?= $carttotal ?>, 
<?php if($add_taxes) { ?>
  "taxes"               : <?= $taxes ?>
<?php } ?>
}

Better
//JSON response from an ajax call
//No matter the value of $add_taxes, the carttotal element is the last element and it does not end in a comma

{
  "response_message"    : '<?= $response_message ?>',
  "subtotal"            : <?= $subtotal ?>, 
  "shipping_cost"       : <?= $shipping ?>,  
<?php if($add_taxes) { ?>
  "taxes"               : <?= $taxes ?>,
<?php } ?>
  "carttotal"           : <?= $carttotal ?>
}

Bad
//Page load JSON object defined
//Last element in array will end in a comma

var fonts = {
[loop list=`$Scratch->{fonts}`]
    '[loop-param name]' : {
      'bold' : "[loop-param bold]",
      'italic' : "[loop-param italic]"
    },[/loop]
};

Better
//Page load JSON object defined
//A dummy object is appended to the fonts JSON object
//Additional logic is added elsewhere to determine if the object is a "dummy" or not

var fonts = {
[loop list=`$Scratch->{fonts}`]
    '[loop-param name]' : {
      'bold' : "[loop-param bold]",
      'italic' : "[loop-param italic]"
     },[/loop]
    'dummy' : {}
};

Additional solutions to avoid the trailing comma include using join (Perl, Ruby) or implode (PHP), conditionally excluding the comma on the last element of the array, or using library methods to serialize data to JSON.
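
For the last option, here is a quick sketch in Ruby (the values are hypothetical stand-ins for the real order data; PHP's json_encode or Perl's JSON module serve the same purpose): build the structure first and let a serializer produce the string, so a trailing comma can never sneak in:

require 'json'

# Hypothetical stand-ins for the real order data
response_message = 'Item added to cart'
subtotal, shipping, carttotal, taxes = 10.00, 2.50, 13.38, 0.88
add_taxes = true

response = {
  'response_message' => response_message,
  'subtotal'         => subtotal,
  'shipping_cost'    => shipping,
  'carttotal'        => carttotal
}
response['taxes'] = taxes if add_taxes

puts response.to_json  # the serializer never emits a trailing comma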

Floating Elements in IE

Oftentimes you'll get a design like the one shown below: a static overall width with repeating components spanning the entire width. You may programmatically determine how many repeating elements will be displayed, but using CSS floating elements yields the cleanest code.


Example of a given design with repeating elements to span a static width.

You start working in Chrome or Firefox and apply the following CSS rules:


CSS rules for repeating floating elements.

When you think you're finished, you load the page in IE and see the following. Bummer!


Floating elements wrap incorrectly in IE.

This is a pretty common scenario. In IE, if the combined widths of consecutive floating elements are greater than or equal to 100% of the available width, the latter floating element will jump down, per the IE float model. Instead of using floating elements, you might consider using tables or CSS positioning rules, but my preference is to use tables only for elements that need vertical-align settings and to stay away from absolute positioning entirely.

The simplest, most minimal change I've found to work can be described in a few steps. Let's say your floating elements are <div>s inside a <div> with an id of "products":

<div id="products">
  <div class="product">product 1</div>
  <div class="product">product 2</div>
  <div class="product" class="last">product 3</div>
  <div class="product">product 4</div>
  <div class="product">product 5</div>
  <div class="product" class="last">product 6</div>
</div>

And let's assume we have the following CSS:

<style>
div#products { width: 960px; }
div.product { float: left; width: 310px; margin-right: 15px; height: 100px; }
div.last { margin-right: 0px; }
</style>

Complete these steps:

  • First, add another div to wrap around the #products div, with an id of "outer_products"
  • Next, increase the 'div#products' width to several pixels more than 960.
  • Next, add a style rule for 'div#outer_products' to have a width of "960px" and overflow equal to "hidden".

Yielding:

<div id="outer_products">
  <div id="products">
    <div class="product">product 1</div>
    <div class="product">product 2</div>
    <div class="product" class="last">product 3</div>
    <div class="product">product 4</div>
    <div class="product">product 5</div>
    <div class="product" class="last">product 6</div>
  </div>
</div>

And:

<style>
div#outer_products { width: 960px; overflow: hidden; }
div#products { width: 980px; }
div.product { float: left; width: 310px; margin-right: 15px; height: 100px; }
div.last { margin-right: 0px; }
</style>

The solution is essentially creating a "display window" (outer_products), where overflow is hidden, but the contents are allowed to span a greater width in the inside <div> (products).


The white border outlines the outer_products "display window".

Some other issues that I see less frequently include the double-margin IE6 bug, chaining CSS in IE, and using '#' vs. 'javascript:void(0);'.