
MySQL Integer Size Attributes

MySQL has those curious size attributes you can apply to integer data types. For example, when creating a table, you might see:

mysql> CREATE TABLE foo (
    -> field_ti tinyint(1),
    -> field_si smallint(2),
    -> field_int int(4),
    -> field_bi bigint(5)
    -> );
Query OK, 0 rows affected (0.05 sec)

mysql> desc foo;
+-----------+-------------+------+-----+---------+-------+
| Field     | Type        | Null | Key | Default | Extra |
+-----------+-------------+------+-----+---------+-------+
| field_ti  | tinyint(1)  | YES  |     | NULL    |       |
| field_si  | smallint(2) | YES  |     | NULL    |       |
| field_int | int(4)      | YES  |     | NULL    |       |
| field_bi  | bigint(5)   | YES  |     | NULL    |       |
+-----------+-------------+------+-----+---------+-------+
4 rows in set (0.03 sec)

mysql>

I had always assumed those size attributes were limiters, MySQL's way of providing some sort of constraint on the integers allowed in the field. While doing some recent work for a MySQL client, I attempted to enforce the range of a tinyint according to that assumption. In reality, I only wanted a sign field, and would have liked to apply a CHECK (field IN (-1,1)) constraint, but without check constraint support I figured at least keeping obviously incorrect data out would be better than nothing.

I wanted to see what MySQL's behavior would be on data entry that failed the limiters. I was hoping for an error, but expecting truncation. What I discovered was neither.

mysql> INSERT INTO foo (field_ti) VALUES (-1);
Query OK, 1 row affected (0.00 sec)

mysql> SELECT field_ti FROM foo;
+----------+
| field_ti |
+----------+
|       -1 |
+----------+
1 row in set (0.00 sec)

mysql> INSERT INTO foo (field_ti) VALUES (1);
Query OK, 1 row affected (0.00 sec)

mysql> SELECT field_ti FROM foo;
+----------+
| field_ti |
+----------+
|       -1 |
|        1 |
+----------+
2 rows in set (0.00 sec)

mysql> INSERT INTO foo (field_ti) VALUES (10);
Query OK, 1 row affected (0.00 sec)

mysql> SELECT field_ti FROM foo;
+----------+
| field_ti |
+----------+
|       -1 |
|        1 |
|       10 |
+----------+
3 rows in set (0.00 sec)

mysql> INSERT INTO foo (field_ti) VALUES (100);
Query OK, 1 row affected (0.00 sec)

mysql> SELECT field_ti FROM foo;
+----------+
| field_ti |
+----------+
|       -1 |
|        1 |
|       10 |
|      100 |
+----------+
4 rows in set (0.00 sec)

mysql>

Two possible conclusions followed immediately: either the limiter feature was horribly broken, or those apparent sizes didn't represent a limiter feature. A full review of MySQL's Numeric Types documentation provided the answer:

MySQL supports an extension for optionally specifying the display width of integer data types in parentheses following the base keyword for the type. For example, INT(4) specifies an INT with a display width of four digits. This optional display width may be used by applications to display integer values having a width less than the width specified for the column by left-padding them with spaces. (That is, this width is present in the metadata returned with result sets. Whether it is used or not is up to the application.) The display width does not constrain the range of values that can be stored in the column.

And, so, the lesson is repeated: Beware assumptions.
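
For what it's worth, since the display width can't constrain anything, one fallback is to enforce the range at the application layer. Here is a minimal sketch, assuming purely for illustration that the client application were built on Ruby on Rails with ActiveRecord (the actual client code wasn't necessarily Rails):

# Hypothetical ActiveRecord model backed by the tinyint(1) column above.
# The validation, not the column definition, is what keeps out anything
# other than the two allowed sign values.
class Foo < ActiveRecord::Base
  self.table_name = 'foo'   # the example table above isn't pluralized
  validates :field_ti, inclusion: { in: [-1, 1] }
end

Foo.new(field_ti: 10).valid?   # => false
Foo.new(field_ti: -1).valid?   # => true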

A Product Variant Code Challenge

A while ago, I came across a cute little Ruby challenge that looked interesting:

Given an array of arrays of possible values, enumerate all combinations that can occur, preserving order. For instance:

Given: [[1,2,3], [4,5,6], [7,8,9]], calculate the same result as the code below, but do so with an arbitrary size array:

combos = []
[1,2,3].each do |v1|
  [4,5,6].each do |v2|
    [7,8,9].each do |v3|
      combos << [v1, v2, v3]
    end
  end
end
combos

Entries can be written using one or more functions, and may optionally be written as a class extension (i.e. Array)....

And now some context for why I thought this was applicable to ecommerce: Let's imagine you have a product. Then, let's imagine that the product has variations, or variants. We'll say we have an arbitrary-size array of "option types", each with an arbitrary number of items. Here, we have option types of size, color, and printed logo, which yields multiple variations, variants, or combinations of a single product:

  • Size: Large, Medium, Small
  • Color: Red, Blue
  • Logo: Twitter, Facebook, Tumblr, Blogger

And let's ground this in a real-life example data model, the one covered in a previous article on Spree's product data model.

Given this context, we can say that the outer array dimension represents the product's option types, each with its own values, and that this challenge is asking for a simple method to create a list of all possible variants or variations of the product. In our t-shirt example, the expected result is:

[["large", "blue", "twitter"],
["large", "blue", "facebook"],
["large", "blue", "tumblr"],
["large", "blue", "blogger"],
["large", "red", "twitter"],
["large", "red", "facebook"],
["large", "red", "tumblr"],
["large", "red", "blogger"],
["medium", "blue", "twitter"],
["medium", "blue", "facebook"],
["medium", "blue", "tumblr"],
["medium", "blue", "blogger"],
["medium", "red", "twitter"],
["medium", "red", "facebook"],
["medium", "red", "tumblr"],
["medium", "red", "blogger"],
["small", "blue", "twitter"],
["small", "blue", "facebook"],
["small", "blue", "tumblr"],
["small", "blue", "blogger"],
["small", "red", "twitter"],
["small", "red", "facebook"],
["small", "red", "tumblr"],
["small", "red", "blogger"]]

Unfortunately, the original contest only received 2 submissions, so I wanted to open the door here for more solutions, in Ruby or any other language. Please include a link to a gist or other code sample, and in a few weeks I'll update the post with links to the submissions, including my own solution. It's a nice little puzzle to take on in your language of choice.
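
For anyone who wants a nudge before diving in, here is one minimal sketch of a possible approach (certainly not the only one, nor an "official" answer), leaning on Ruby's built-in Array#product to compute the Cartesian product:

# Array#product already computes the Cartesian product of its receiver
# and arguments, preserving the same ordering as the nested loops above.
def combos(arrays)
  return [] if arrays.empty?
  first, *rest = arrays
  first.product(*rest)
end

p combos([[1,2,3], [4,5,6], [7,8,9]])
# => [[1, 4, 7], [1, 4, 8], [1, 4, 9], [1, 5, 7], ... [3, 6, 9]]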

Three Liquid Galaxy Projects Accepted for Google Summer of Code 2011

Yesterday the student proposals accepted for the Google Summer of Code program were announced. The Liquid Galaxy Project, participating in the program for the first time, accepted three proposals. These projects will dramatically extend the functionality of the Liquid Galaxy system:

New Control Input Devices and Distributed GL Rendering -- Paul Hunkin, mentored by Andrew Leahy
Paul is a PhD student and has been running Liquid Galaxy on his university display wall for some time now. This project will initially target Microsoft's Kinect as an input device for Liquid Galaxy, and may also leverage ClusterGL for distributed GL Rendering of Google Earth and other OpenGL applications.

Network Sync in Irrlicht -- Ben Wright, mentored by Ben Goldstein
Google Earth is certainly a "killer app" for the Liquid Galaxy platform, but there are many other applications that could easily be enhanced to coordinate multiple instances, each rendering a portion of a panoramic view. This project will add network sync support to the open source 3D graphics engine Irrlicht, enabling this for the many 3D applications built on that engine.

Android Phone Accelerometer as Liquid Galaxy Input Device -- Reese Butler, mentored by Adam Vollrath
Currently, the most popular method of controlling Liquid Galaxy setups is through the use of a 3Dconnexion SpaceNavigator and a touchscreen. This project aims to improve Liquid Galaxy's accessibility by adding Android phones to the list of input devices. Since many people already own Android phones, this project would effectively eliminate the need to purchase additional peripherals to control a Liquid Galaxy setup, while enabling innovative ways of interacting with the system.

The GSoC 2011 coding period officially begins May 24th and wraps up in August. Follow these projects and other development on the project's mailing list.

Google's Summer of Code program makes an enormous contribution to the Open Source community and provides great (paid!) experience with Open Source development for many talented students. At End Point we are thrilled that The Liquid Galaxy Project will benefit from this great program.

Postgres query caching with DBIx::Cache

A few years back, I started working on a module named DBIx::Cache which would add a caching layer at the database driver level. The project that was driving it got put on hold indefinitely, so it's been on my long-term todo list to release what I did have to the public in the hope that someone else may find it useful. Hence, I've just released version 1.0.1 of DBIx::Cache. Consider it the closest thing Postgres has at the moment for query caching. :) The canonical webpage:

http://bucardo.org/wiki/DBIx-Cache

You can also grab it via git, either directly:

git clone git://bucardo.org/dbixcache.git/

or through the indispensable github:

https://github.com/bucardo/dbixcache

So, what does it do exactly? Well, the idea is that queries that are repeated often and/or are very expensive to run should be cached somewhere, such that the database does not have to redo all the same work just to return the same results over and over to the client application. Currently, the best you can hope for with Postgres is that things are in RAM from being run recently. DBIx::Cache changes this by caching the results somewhere else. The default destination is memcached.

DBIx::Cache acts as a transparent layer around your DBI calls. You can control which queries, or classes of queries get cached. Most of the basic DBI methods are overridden so that rather than query Postgres, they actually query memcached as needed (or other caching layer - could even query back into Postgres itself!). Let's look at a simple example:

use strict;
use warnings;
use Data::Dumper;
use DBIx::Cache;
use Cache::Memcached::Fast;

## Connect to an existing memcached server, 
## and establish a default namespace
my $mc = Cache::Memcached::Fast->new(
  {
    servers   => [ { address => 'localhost:11211' } ],
    namespace => 'joy',
  });

## Rather than DBI->connect, use DBIx::Cache->connect
## Tell it what to use as our caching source
## (the memcached server above)
my $dbh = DBIx::Cache->connect('', '', '',
  { RaiseError => 1,
    dxc_cachehandle => $mc
});

## This is an expensive query that takes 30 seconds to run:
my $SQL = 'SELECT * FROM analyze_sales_data()';

## Prepare this query
my $sth = $dbh->prepare($SQL);

## Run it ten times in a row.
## The first time takes 30 seconds, the other nine return instantly.
for (1..10) {
    my $count = $sth->execute();
    my $info = $sth->fetchall_arrayref({});
    print Dumper $info;
} 

In the above, the prepare($SQL) is actually calling the DBIx::Cache::prepare method. This parses the query and tries to determine if it is cacheable or not, then stores that decision internally. Regardless of the result, it calls DBI::prepare (which is technically DBD::Pg::prepare), and returns the result. The magic comes in the call to execute() later on. As you might imagine, this is also actually the DBIx::Cache::execute() method. If the query is not cacheable, it simply runs it as normal and returns. If it is cacheable, and this is the first time it is run, DBIx::Cache runs an EXPLAIN EXECUTE on the original statement, and parses out a list of all tables that are used in this query. Then it caches all of this information into memcached, so that subsequent runs using the same list of arguments to execute() don't need to do that work again.

Finally, we come to fetchall_arrayref(). The first time it is run, we simply call the parent methods and get the data back. Then we build unique keys and store the results of the query into memcached. Finally, we mark the execute() as fully cached. Thus, on subsequent calls to execute(), we don't actually execute anything on the database server, but simply return the count as stashed inside of memcached (in the case of execute, this is the number of affected rows). For the various fetch() methods, we do the same thing - rather than fetch things from the database (via DBI, DBD::Pg, and libpq), we get the results from memcached (frozen via Data::Dumper), and then unpack and return them. Since we don't actually need to do any work against the database, everything returns as fast as we can query memcached - which is in general very fast indeed.

Most of the above is working, but the piece that is not yet written is the cache invalidation. DBIx::Cache knows which tables belong to which queries, so in theory you could have, for example, an UPDATE/INSERT/DELETE trigger on table X which calls DBIx::Cache and tells it to invalidate all items related to table X, so that the next call to prepare() or execute() or fetch() will not find any memcached matches and will re-run the whole query and store the results. You could also simply handle that in your application, of course, and have it decide when to invalidate items.

It's been a while since I've really looked at the code, but as far as I can tell it is close to being usable in a real application. :) Patches and questions welcome!

RHEL 5 SELinux initscripts problem

I ran into a strange problem updating Red Hat Enterprise Linux 5 a few months ago, and just ran into it again and this time better understood what went wrong.

The problem was serious: After a `yum upgrade` of a RHEL 5 x86_64 server with SELinux enforcing, it never came back after a reboot. Logging into the console I could see that it was stuck in single user mode because there were no init scripts! Investigation showed that indeed the initscripts package was completely missing.

I searched on bugzilla.redhat.com looking for any reported problems and didn't find any. I reinstalled initscripts, rebooted, and the server was fine, but it was not happytimes to have that unexpected downtime.

Tonight I ran into the problem again, this time on a build server where downtime didn't matter so I could investigate more leisurely.

The error from yum looked like this (the same problem that affected initscripts also affected the screen package):

Downloading Packages:
screen-4.0.3-4.el5.i386.rpm          | 559 kB      00:00
Running rpm_check_debug
Running Transaction Test
Finished Transaction Test
Transaction Test Succeeded
Running Transaction
groupadd: unable to open group file
error: %pre(screen-4.0.3-4.el5.i386) scriptlet failed, exit status 10
error:   install: %pre scriptlet failed (2), skipping screen-4.0.3-4.el5

Updated:
  screen.i386 0:4.0.3-4.el5

Complete!
# cat /selinux/enforce
1

The way I dealt with this initially was to temporarily disable SELinux enforcing, update the package, then reboot (to also load a kernel update):

# setenforce 0
# yum -y upgrade
# shutdown -r now

But following up on the specific error message showed:

# ls -lFaZ /etc/group
-rw-r--r--  root root system_u:object_r:file_t:s0      /etc/group

Aha! The SELinux context is wrong. Given that this has happened on a couple of different machines, I'm guessing some past upgrade broke the context. What should it be? Let's check /etc/passwd for reference:

# ls -lFaZ /etc/passwd
-rw-r--r--  root root system_u:object_r:etc_t:s0       /etc/passwd

That etc_t context was confirmed as correct for /etc/group by checking another working server. To fix:

# chcon system_u:object_r:etc_t:s0 /etc/group

Then the updates proceed fine. It would be nice to find out what past action set the context wrong on /etc/group.

Determining dominant image color

This grew out of a misunderstanding of a client’s request, so it never saw the light of day, but I thought it was an interesting problem.

The request was to provide a “color search” of products, i.e., “show me all the orange products”. Finding the specific products was not a challenge since it was just a database query. Instead I was interested in how to choose a “representative image” from among the available images for that product. (And as it turns out, the image filename gave me that information, but let’s assume you don’t have that luxury: how do you tell, from a group of images, which one is “more orange” than the others?)

Of course, this depends on the composition of the image. In this case, I knew that the majority were solid-color (or at most two- or three-color) products on a white background. The approach I settled on was to severely pixellate the image down to something like 20x20 pixels (an arbitrary size; this could be very dependent on the images under study, or the graphics library in use). If you also supply a color palette restricted to the colors you are interested in matching (e.g., primary and secondary colors, plus perhaps black, white, and gray), you end up with a roster of the colors represented.
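
To make that concrete, here is a minimal sketch of the pixellate-and-tally idea, assuming the RMagick bindings to ImageMagick (the original work didn't settle on a particular graphics library, and the palette and filename below are made up for illustration):

require 'rmagick'   # older RMagick versions use: require 'RMagick'

# Restricted palette of colors we care about matching (illustrative values).
PALETTE = {
  'red'    => [255, 0, 0],
  'orange' => [255, 128, 0],
  'blue'   => [0, 0, 255],
  'gray'   => [128, 128, 128],
  'black'  => [0, 0, 0],
  'white'  => [255, 255, 255],
}

# Snap an [r, g, b] triple to the nearest palette entry by squared distance.
def nearest_palette_color(rgb)
  PALETTE.min_by { |_name, ref| ref.zip(rgb).map { |a, b| (a - b)**2 }.inject(:+) }.first
end

image = Magick::Image.read('product.jpg').first
small = image.scale(20, 20)    # severely pixellate down to 20x20
tally = Hash.new(0)

small.each_pixel do |pixel, _col, _row|
  # Quantum values run 0..QuantumRange; scale them down to 0..255.
  rgb = [pixel.red, pixel.green, pixel.blue].map { |q| (q * 255.0 / Magick::QuantumRange).round }
  tally[nearest_palette_color(rgb)] += 1
end

# Ignore the white background and report the most represented color.
dominant = tally.reject { |name, _count| name == 'white' }.max_by { |_name, count| count }
puts "Dominant color: #{dominant.first}" if dominant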

Another approach makes use of the powerful ImageMagick library. There’s a huge list of examples and instructions at http://www.imagemagick.org/Usage/quantize/, but for my purposes this short sample will do:


$ convert Waffle.jpg -scale 1x1\! -format '%[pixel:u]' info:-
rgb(219,166,94)

Here we reduce an image to a single pixel, then report the RGB value of that color. After that it’s just a matter of determining how close this “average” is to your desired color.
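
The same nearest-color arithmetic works here as well. For example, a quick check of how far the reported average is from a target color (the target values are just for illustration):

# Squared Euclidean distance in RGB space between the single-pixel
# "average" from convert and a hypothetical target orange.
average = [219, 166, 94]   # rgb(219,166,94) from the output above
orange  = [255, 128, 0]

distance = orange.zip(average).map { |a, b| (a - b)**2 }.inject(:+)
puts distance   # the image whose average gives the smallest distance is the "most orange"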

Use ZIP+4, except when you shouldn't

The USPS provides a handy API for looking up postal rates on the fly. Recently it started failing for code that had been working for a while, so I investigated. I found a couple of different problems with it:

  • First, the "service description" field had been "augmented" by including copyright symbols via HTML mark-up. That meant internal comparisons started to fail, so I "canonicalized" all the responses by stripping out various things from both sides of my comparison.
        $string =~ s{&(?:[a-z/;&])+}{}gis;  # strip HTML entities such as &copy;
        $string =~ s/[^a-z]//gis;           # drop everything that isn't a letter
        $string =~ s/^\s+//;                # trim leading whitespace
        $string =~ s/\s+$//;                # trim trailing whitespace
        $string =~ s/\s+/ /gis;             # collapse internal whitespace
    
  • Second, I found that the API inexplicably rejects 9-digit ZIP codes, the "ZIP+4" format. That's right, you can't look up a domestic shipping rate for a 9-digit ZIP. The documentation linked above specifically calls for 5-digit ZIPs. If you pass a 9-digit ZIP to the API, it doesn't smartly recognize that you've given it too much info and just use what it needs. Instead, it throws an error.

So the API got too clever in one regard, and not clever enough where it counts.

Virtual Page Tracking and Goals with Google Analytics

Sometimes I come across websites that don't use RESTful URLs during checkout, or whose checkout URLs are overly unique (containing an order number, for example), and I need to implement Goal Tracking in Google Analytics on these user interactions. I've also had to implement Goal Tracking in a non-ecommerce web application where tabbed on-page browsing guides users through a 3-step process. Examples of situations that pose challenges to traditional page tracking in Google Analytics include:

  • Throughout Interchange's checkout, URLs are posts to "/process", which makes the user interactions difficult to distinguish.
  • Throughout Spree's checkout, URLs are posts to "/order/:id/edit", which are distinct and can be difficult to aggregate.
  • In a Sinatra application we developed recently, the single page URL is "/locate.html", but tabbed browsing occurs through three unique steps.

Google Analytics Tagging

To add Goal Tracking by URL, pages must first be tagged as "virtual pages". Implementing virtual page tracking in Google Analytics is as simple as including a new virtual page URL in the _trackPageview action:

<script type="text/javascript">

  var _gaq = _gaq || [];
  _gaq.push(['_setAccount', 'UA-XXXXXXXX-1']);
- _gaq.push(['_trackPageview']);
+ _gaq.push(['_trackPageview', '/cart/basket']);

  (function() {
    var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
    ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
    var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
  })();

</script>

In the cases I described above, here are the tags used for each:

Interchange

On a three-page checkout process, the following virtual pages were tagged:
  • Shopping Basket: /cart/basket
  • Checkout: /cart/checkout
  • Receipt: /cart/receipt

Spree

On a multi-page checkout process, the following virtual pages were tagged:
  • Shopping Basket: /checkout/cart
  • Registration (optional): /checkout/register
  • Address: /checkout/address
  • Delivery: /checkout/delivery
  • Payment: /checkout/payment
  • Confirm: /checkout/confirm
  • Receipt: /checkout/complete

Sinatra App

On a three-step process, the following virtual pages were tagged:
  • Step 1: /step1
  • Step 2: /step2
  • Step 3: /step3
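
For the Spree case, rather than hard-coding a path in each template, the virtual page can be derived from the checkout state. Here is a minimal sketch of a view helper, assuming a Spree-like order object with a state attribute; the helper name and the state-to-path mapping are my own, not something provided by Spree or Google Analytics:

# Map a Spree-like checkout state to the virtual page used in the goal funnel.
CHECKOUT_PAGES = {
  'cart'         => '/checkout/cart',
  'registration' => '/checkout/register',
  'address'      => '/checkout/address',
  'delivery'     => '/checkout/delivery',
  'payment'      => '/checkout/payment',
  'confirm'      => '/checkout/confirm',
  'complete'     => '/checkout/complete',
}

def virtual_checkout_path(order)
  CHECKOUT_PAGES.fetch(order.state, '/checkout/cart')
end

# The helper's return value then feeds the _trackPageview call in the layout, e.g.:
#   _gaq.push(['_trackPageview', '<%= virtual_checkout_path(@order) %>']);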

Goal Definition

After tagging is complete, you'll need to define a Goal in Google Analytics to begin tracking it.

First, you'll want to name your goal. Here I've named the goal "Purchase". In this case, the Goal is a URL Destination based on our new tags.

The Goal URL is set to "/checkout/complete" for the Spree example described above. In this case, I have it set to a Regular Expression Match, but an Exact Match will work as well. I haven't assigned a Goal Value and I've left the match case insensitive for now.

Next, I add the steps involved leading up to the goal URL. This will allow you to visualize a goal funnel and see where users may be abandoning checkout. In this example, each of the steps of the funnel correspond to a checkout step (shopping basket, optional register page, individual checkout steps, receipt/complete page).

Note that this screenshot was taken from the new Google Analytics interface. The old Google Analytics interface may appear slightly different, but contains similar content.

Looking at the Data

With virtual page tracking and goal tracking, you can see how many goals have been completed during a certain time frame:



You can also compare regular segments of traffic, e.g. users who found your site through search versus referrals:



You can compare advanced segments of visitors: e.g. marketing campaign #1 versus marketing campaign #2. This type of comparison may give you more actionable metrics to determine which marketing campaign to put more resources into.



Last but not least, you can visualize the conversion funnel to see where you might be losing customers. This is also actionable: you may be able to A/B test changes to the checkout process (or whatever other conversion you're tracking) to improve conversion.

DBD::Pg query cancelling in Postgres

A new version of DBD::Pg, the Perl driver for PostgreSQL, has just been released. In addition to fixing some memory leaks and other minor bugs, this release (version 2.18.0) introduces support for the DBI method known as cancel(). A giant thanks to Eric Simon, who wrote this new feature. The new method is similar to the existing pg_cancel() method, except it works on synchronous rather than asynchronous queries. I'll show an example of both below.

DBD::Pg has been able to handle asynchronous queries for a while now. Basically, that means you don't have to wait around for the database to finish a query. Your application can do other things while the query runs, then check back later to see if it has completed and grab the results. The way to cancel an already kicked-off asynchronous query is with the pg_cancel() method (the other asynchronous methods are pg_ready and pg_result, which have no synchronous equivalents).

The prefix "pg_" is used because there is no corresponding built-in DBI method to override, and the convention is to prefix everything custom to a driver with the driver's prefix, in our case 'pg'. Here's an example showing one possible use of asynchronous queries using DBD::Pg in some Perl code:

  ## We are connecting to two servers and running expensive 
  ## queries on both. We kick both off right away, then wait 
  ## for them both to finish. Our total wait time is thus
  ## max(server1,server2) rather than sum(server1,server2)

  use strict;
  use warnings;
  use DBI;
  use DBD::Pg qw{ :async };

  my $dsn1 = 'dbi:Pg:dbname=sales;host=example1.com';
  my $dsn2 = 'dbi:Pg:dbname=sales;host=example2.com';

  my $dbh1 = DBI->connect($dsn1, '', '', {AutoCommit=>0, RaiseError=>1});
  my $dbh2 = DBI->connect($dsn2, '', '', {AutoCommit=>0, RaiseError=>1});

  my $SQL = 'SELECT gather_yearly_sales_data()';
  print "Kicking off a long, expensive query on database one\n";
  ## Normally, a do() will not return until the query is complete
  ## However, the async flag causes it to return immediately
  $dbh1->do($SQL, {pg_async => PG_ASYNC});

  print "Kicking off a long, expensive query on database two\n";
  $dbh2->do($SQL, {pg_async => PG_ASYNC});

  ## Both queries are running in the 'background'
  ## We have to wait for both, so it doesn't matter which one we wait for here
  ## However, if it's been over 2 minutes, we'll cancel both and quit
  my $time = 0;
  while ( ! $dbh1->pg_ready() ) {
    sleep 1;
    if ($time++ > 120) {
      print "Taking too long, let's cancel the queries\n";
      $dbh1->pg_cancel();
      $dbh2->pg_cancel();
      $dbh1->rollback();
      $dbh2->rollback();
      die "No sales data was retrieved\n";
    }
  }

  ## We know that database 1 has finished, so we read in the results
  my $rows1 = $dbh1->pg_result();
  ## We then grab results from database 2
  ## This will block until done, which is okay
  my $rows2 = $dbh2->pg_result();

The new method, simply known as cancel(), will kill any synchronously running query. One of its main uses is to time out a query using Perl's built-in alarm function. However, since the built-in alarm function has some quirks, we will instead use the much safer POSIX::SigAction method. Another example:

  ## We are running a series of queries against a database, but if
  ## the whole thing is taking over 30 seconds, we want to cancel
  ## the currently running query and move on to something else.

  use strict;
  use warnings;
  use DBI;
  use DBD::Pg qw{ :async };
  use POSIX qw{ :signal_h };  ## for SIGALRM, sigaction, and the SigSet/SigAction classes

  my $dsn = 'dbi:Pg:dbname=dq';

  my $dbh = DBI->connect($dsn, '', '', {AutoCommit=>0, RaiseError=>1});

  ## Setup all the POSIX alarm plumbing
  my $mask = POSIX::SigSet->new(SIGALRM);
  my $action = POSIX::SigAction->new(
    sub { die "TIMEOUT\n" },
    $mask,
  );
  my $oldaction = POSIX::SigAction->new();
  sigaction( SIGALRM, $action, $oldaction );

  ## Prepare the queries
  my $upd = $dbh->prepare('UPDATE foobar SET x=? WHERE y=?');
  my $inv = $dbh->prepare('SELECT refresh_inventory(?)');

  ## Yes, a double eval. Async is looking better all the time :)
  eval {
    eval {
      alarm 30;
      for my $y (12,24,48) {
        print "Adjusting widget #$y\n";
        $upd->execute(555,$y);
        print "Recalculating inventory\n";
        $inv->execute($y);
      }
    };
    alarm 0; ## Turn off our alarm
    die "$@\n" if $@; ## Bubble the error to the outer eval
  };
  if ($@) { ## Something went wrong
    if ($@ =~ /TIMEOUT/) {
      print "Queries are taking too long! Cancelling\n";
      ## We don't know which one is still running, and don't care
      ## It's safe to cancel a non-active statement handle
      $upd->cancel() or die qq{Failed to cancel the query!\n};
      $inv->cancel() or die qq{Failed to cancel the query!\n};
      $dbh->rollback();
      die "Who has time to wait 30 seconds anymore?";
    }
    ## Some other non-alarm error, so we simply:
    die $@;
  }

  print "Updates are complete\n";
  $dbh->commit();
  exit;

Got an interesting use case for asynchronous queries or the new $sth->cancel()? Let me know!

Ruby, Rails, and Ecommerce

I'm a big fan of Ruby. And I like Rails too. Lately, I've been investigating several Ruby, Rails, and Rails ecommerce framework options (a follow-up to discussing general ecommerce options). I've also recently written about developing ecommerce on Sinatra (one, two, and three). Most of End Point's clients are ecommerce clients, so we've seen it all in terms of feature requests: third party integrations like QuickBooks, search, and PayPal; product features like best sellers, recommended items, and related items; checkout features like one-page checkout and guest checkout; backend features like advanced reporting, sales management, and inventory management. Our services also include hosting and database consulting for many of our ecommerce clients, so we have a great understanding of what it takes to run an ecommerce solution.

When it comes to ecommerce development, someone who likes coding in Ruby (like me) has a few options:

  • Ruby DSL (e.g. Sinatra)
  • Pure Rails
  • Open Source Ecommerce on Rails: Spree, ROR-Ecommerce, Substruct. End Point admittedly has the most experience with Spree.

Here's a rundown of some pros and cons of each option:

Ruby DSL

Pros:
  • Removes the bloat of Rails and may have better performance at the onset.
  • API friendly (i.e. writing APIs to handle both web and mobile apps is pleasant).
  • Ruby gems are available, such as ActiveMerchant, ActiveShipping, Paperclip.
  • Routing in a DSL might be easier to work with than Rails routing.
  • Extremely flexible.

Cons:
  • Minimal view and form helpers.
  • A smaller community than Rails (likely).
  • As your application grows, you essentially need to implement conventions that mimic those of Rails to keep code organized.
  • Too much flexibility can result in convoluted, disorganized code.

Rails

Pros:
  • Large and active community.
  • Ruby gems are available, such as ActiveMerchant, ActiveShipping, Paperclip.
  • A lot of documentation is available.
  • The elegant, modern MVC conventions allow for efficient development.

Cons:
  • Performance can be an issue at the onset of the project (relative to a Ruby DSL), but there are plenty of caching options that can address this.
  • The bloat in Rails might include things your application doesn't need, which can add complication.
  • You are essentially reinventing the wheel if you are building a standard ecommerce application.

Open Source Ecommerce on Rails

Pros:
  • Community code, expertise, and experience can be leveraged.
  • You aren't reinventing the wheel.
  • The elegant, modern MVC conventions in Rails allow for efficient development.

Cons:
  • The framework has been shaped by the clients on it. If it's a young framework, aspects of the ecommerce application can be under- or overdeveloped (e.g. purchase order system, inventory management, ability to scale).
  • The framework can be overly complicated as a result of being genericized, which can cause problems in development and performance.
  • You are at the mercy of the maintainers of the framework for decisions involving extensibility, long-term maintenance and support, and feature inclusion in the project.

Or, here's another way to look at things: generalizations I've made after developing in each of the scenarios.

Another thing to consider is what might happen when application changes are required a year after the project was initially implemented. Here's what might happen in each of the scenarios:

  • Ruby DSL: You still have to write custom code. You may pick up using the conventions and organization where you left off. You may be able to use open source gems in your application. Ruby probably hasn't changed a ton. The Ruby DSL (e.g. Sinatra) may have changed.
  • Rails: You still have to write custom code in Rails. You may be able to use open source gems in your application. Rails may have changed if there was a major release, but chances are your current version of Rails is still supported.
  • Open Source Ecommerce on Rails: You may or may not have to write custom code because the community may provide the functionality you need. The platform may have changed significantly, and there's a chance that your current version of the platform is no longer supported.

Any one of the options presented in this article may be suitable for our clients: Sinatra for a small, simple, custom app; Rails for a complex application with custom needs; open source ecommerce on Rails for a client that follows the standard ecommerce mold. As consultants, our goal is to choose the best tool for a client to meet their business needs, long-term goals, and budget.

One final note to add is that my generalizations discussed here regarding open source ecommerce on Rails and open source projects in general are highly dependent on the maturity of the framework. As an open source project matures, its stability and flexibility may improve.