Welcome to End Point's blog

Ongoing observations by End Point people.

NoSQL at RailsConf 2010: An Ecommerce Example

Even more so than Rails 3, NoSQL was a popular technical topic at RailsConf this year. I haven't had much exposure to NoSQL except for reading a few articles written by Ethan (Quick Thoughts on NoSQL Live Boston Conference, NoSQL Live: The Dynamo Derivatives (Cassandra, Voldemort, Riak), and Cassandra, Thrift, and Fibers in EventMachine), so I attended a few sessions to learn more.

First, it was reinforced several times that if you can read JSON, you should have no problem comprehending NoSQL. So, it shouldn't be too hard to jump into code examples! Next, I found it helpful when one of the speakers presented a high-level categorization of NoSQL stores, whether or not the categories meant much to me at the time:

  • Key-Value Stores: Advantages include that this is the simplest possible data model. Disadvantages include that range queries are not straightforward and modeling can get complicated. Examples include Redis, Riak, Voldemort, Tokyo Cabinet, MemcacheDB.
  • Document stores: Advantages include that the value associated with a key is a document that exposes a structure that allows some database operations to be performed on it. Examples include CouchDB, MongoDB, Riak, FleetDB.
  • Column-based stores: Examples include Cassandra, HBase.
  • Graph stores: Advantages include that this allows for deep relationships. Examples include Neo4j, HypergraphDB, InfoGrid.
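
To make the key-value trade-off concrete, here is a minimal sketch in plain Ruby. A Hash stands in for the store and the product data is made up for illustration; none of the databases listed above is actually used. It shows why lookup by key is trivial while a range query is "not straightforward": the store sees each value as an opaque string, so the application has to fetch and decode everything before it can filter.

```ruby
require 'json'

# A plain Hash stands in for a key-value store (illustration only).
# Each value is an opaque JSON string as far as the store is concerned.
KV_STORE = {
  'product:1' => { 'name' => 'Mug', 'price' => 8 }.to_json,
  'product:2' => { 'name' => 'Poster', 'price' => 15 }.to_json,
  'product:3' => { 'name' => 'T-shirt', 'price' => 20 }.to_json
}.freeze

# Lookup by exact key: the one operation the store does cheaply.
def find_product(store, id)
  raw = store["product:#{id}"]
  raw && JSON.parse(raw)
end

# Range query: the store cannot help, so every value is fetched
# and decoded before the application filters on price.
def names_in_price_range(store, range)
  store.values
       .map { |raw| JSON.parse(raw) }
       .select { |doc| range.cover?(doc['price']) }
       .map { |doc| doc['name'] }
end

puts find_product(KV_STORE, 1)['name']
puts names_in_price_range(KV_STORE, 10..20).inspect
```

A document store narrows this gap because it understands the structure inside the value and can run the filter itself.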

In one NoSQL talk, Flip Sasser presented an example to demonstrate how an ecommerce application might be migrated to use NoSQL, which was the most efficient (and very familiar) way for me to gain an understanding of NoSQL use in a Rails application. Flip introduced the models used in the code that follows: users, products, purchases, and notifications, plus friend relationships between users.

In the transition to NoSQL, the transaction model stays as is. As a purchase is created, the Notification.create method is called.

class Purchase < ActiveRecord::Base
  after_create :create_notification

  # model relationships
  # model validations

  def total
    quantity * product.price
  end

  protected

  # Publishes a notification describing this purchase.
  def create_notification
    noun = quantity == 1 ? product.name : product.name.pluralize
    Notification.create(
      :action => "purchased #{quantity == 1 ? 'a' : quantity} #{noun}",
      :description => "Spent a total of #{total}",
      :item => self,
      :user => user
    )
  end
end

Flip moves the Product class to a document store because it needs a lot of flexibility to handle diverse product metadata. The structure of a product is defined in the Product class and nowhere else; there is no separate table schema to keep in sync.

Before

class Product < ActiveRecord::Base
  serialize :info, Hash
end

After

class Product
  include MongoMapper::Document

  key :name, String
  key :image_path, String

  key :info, Hash

  timestamps!
end
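
As a rough illustration of what "diverse product metadata" means in practice, the sketch below uses plain Ruby hashes as stand-ins for documents. MongoMapper is not used here, and the product names, image paths, and info keys are all invented for the example. The point is only that two documents in the same collection can carry entirely different keys under info without any schema change.

```ruby
require 'json'

# Plain hashes stand in for stored documents (illustration only).
# Each product carries different keys under :info.
PRODUCTS = [
  { name: 'T-shirt', image_path: '/images/tshirt.png',
    info: { sizes: %w[S M L], color: 'blue' } },
  { name: 'E-book', image_path: '/images/ebook.png',
    info: { pages: 240, formats: %w[pdf epub] } }
].freeze

# Returns the names of products whose info contains the given key.
def products_with_info(products, key)
  products.select { |product| product[:info].key?(key) }
          .map { |product| product[:name] }
end

puts JSON.pretty_generate(PRODUCTS.first)
puts products_with_info(PRODUCTS, :pages).inspect
```

With the serialized-Hash approach in the "Before" version, the same data can be stored, but the database sees the info column only as text and cannot query inside it.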

The Notification class is moved to a key-value store (the lpush and lrange calls in the code are Redis list commands). After a user completes a purchase, the create method pushes a notification onto the list of each follower who is to receive it.

Before

class Notification < ActiveRecord::Base
  # model relationships
  # model validations
end

After

require 'ostruct'

class Notification < OpenStruct
  class << self
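    # Pushes one JSON-encoded entry onto each follower's list.
    # Red is presumably a Redis client set up elsewhere in the application.
    def create(attributes)
      message = "#{attributes[:user].name} #{attributes[:action]}"
      payload = {
        :message => message,
        :description => attributes[:description],
        :timestamp => Time.now
      }.to_json
      attributes[:user].follower_ids.each do |follower_id|
        Red.lpush("user:#{follower_id}:notifications", payload)
      end
    end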
  end
end

The User model remains an ActiveRecord model and uses the Devise gem for user authentication, but it is modified to read notifications from the key-value store; each notification is now an OpenStruct rather than an ActiveRecord object. The result is that whenever a user's friend makes a purchase, the user is notified of the purchase. In this simple example, a purchase contains one product only.

Before

class User < ActiveRecord::Base
  # user authentication here
  # model relationships

  def notifications
    Notification.where("friend_relationships.friend_id = notifications.user_id OR notifications.user_id = #{id}").
      joins("LEFT JOIN friend_relationships ON friend_relationships.user_id = #{id}")
  end
end

After

class User < ActiveRecord::Base
  # user authentication here
  # model relationships

  def followers
    User.where('users.id IN (friend_relationships.user_id)').
      joins("JOIN friend_relationships ON friend_relationships.friend_id = #{id}")
  end

  def follower_ids
    followers.map(&:id)
  end

  # Reads this user's whole notification list and wraps each entry.
  def notifications
    (Red.lrange("user:#{id}:notifications", 0, -1) || []).map do |notification|
      Notification.new(ActiveSupport::JSON.decode(notification))
    end
  end
end
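
The write and read paths above can be tried without Rails or a Redis server using the sketch below. FakeListStore, notify_followers, and notifications_for are hypothetical names introduced only for this illustration; the store imitates just the two list commands used in the example and is not the Redis client. It shows the trade being made: the work moves to write time (one list push per follower), so that reading a user's notifications is a single list read with no joins.

```ruby
require 'json'
require 'ostruct'

# In-memory stand-in for the two list commands used in the example.
# Hypothetical helper for illustration only; not the Redis client.
class FakeListStore
  def initialize
    @lists = {}
  end

  # Prepends a value, like LPUSH; returns the new list length.
  def lpush(key, value)
    (@lists[key] ||= []).unshift(value).length
  end

  # Returns elements from start to stop inclusive, like LRANGE.
  # Negative indexes count from the end; a missing key gives [].
  def lrange(key, start, stop)
    (@lists[key] || [])[start..stop] || []
  end
end

# Write side: one list entry per follower.
def notify_followers(store, user_name, follower_ids, action)
  payload = { message: "#{user_name} #{action}" }.to_json
  follower_ids.each do |follower_id|
    store.lpush("user:#{follower_id}:notifications", payload)
  end
end

# Read side: a single list read, newest entry first.
def notifications_for(store, user_id)
  store.lrange("user:#{user_id}:notifications", 0, -1)
       .map { |raw| OpenStruct.new(JSON.parse(raw)) }
end

store = FakeListStore.new
notify_followers(store, 'Alice', [2, 3], 'purchased a Mug')
notify_followers(store, 'Bob', [2], 'purchased 2 Posters')
puts notifications_for(store, 2).map(&:message).inspect
```

Because entries are pushed onto the head of the list, user 2 sees Bob's purchase before Alice's. A user with many followers makes each purchase more expensive to record, which is the cost of this design.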

The disadvantages of the NoSQL and RDBMS hybrid are that data portability is limited and ActiveRecord plugins can no longer be used with the models that have moved. But the general idea is that performance justifies the move to NoSQL for some data. In several sessions I attended, the speakers reiterated that you will likely never be in a situation where you'll only use NoSQL, but that it's another tool available to suit performance-related business needs. I later spoke with a few Spree developers and we concluded that the NoSQL approach may work well in some applications for product and variant data, where it offers improved performance with flexibility, but we didn't come to an agreement on where else this approach may be applied.

Learn more about End Point's Ruby on Rails Development or Ruby on Rails Ecommerce Services.

4 comments:

Ethan Rowe said...

Steph, thanks for your usual informative posting. :)

It's interesting the degree to which MongoDB is of interest to the Rails community. I think people find the idea of working with deeply nested objects inside of a single "row" or "document" appealing. But what problem is it really solving? It makes sharding marginally less hideous than with a traditional RDBMS, because you cram the related data into a single logical entity and thus don't have to worry about crafting the data partitioning strategy to be foreign-key-sensitive. But sharding is still fundamentally hideous, rather like the query syntax MongoDB expects people to use. I can't help but think that MongoDB's apparent mass appeal owes largely to the ease with which one can work with it from the application layer. It's rather like MySQL in that respect, and indeed the project itself feels like the intellectual offspring of MySQL. Who cares if your data is badly organized and tends towards meaninglessness if it's easy right now?

Cassandra gives some good modeling flexibility that, with some finessing, can feel rather like the nested document approach Mongo promotes, while at the same time actually giving something of meaningful value. Control over consistency, true high availability, write scaling, etc. Of course, how many systems actually need these things?

It should be noted that a lot of NoSQL solutions use Thrift as their interface, and the Ruby Thrift client can leave one underwhelmed. And Ruby itself is just not the best for high-volume data processing. Don't expect to instantly process huge volumes of data in your Rails app just because you started using some distributed NoSQL database; you'll have a variety of application-level bottlenecks and may well be better off implementing a data management service in one of the JVM-based languages. I've been trying to pick my favorite Cassandra client in Scala for exactly that reason.

Thanks again.
- Ethan

Steph Powell said...

Thanks for the supplementary information and comments, Ethan.

FWIW, I attended Michael Koziarski's talk on an Introduction to Cassandra and CassandraObject at the conference, where infrastructure was discussed, but I didn't think it fit into this article even though it was interesting.

Ethan Rowe said...

A talk on CassandraObject? That's cool. I tried working with it a couple months back. A lot of good work has gone into it. However, I ultimately opted against using it. Specifically, it disregards (or did at the time; this may have changed) columns that you haven't told it about. I opted instead to build an attribute-mapping class that lets you perform simple column-to-attribute mappings, or more complex ones where you can map column names to collection attributes via pattern matching. This seemed to me like a more effective way to address the situation in which you may have any number of some structure within a single row, which is not an uncommon situation in these loosely-structured schemas.

Anyway, CassandraObject seems like it has some good stuff to offer; it just didn't meet my needs at the time.

- Ethan

Flip Sasser said...

Holy crap, a vanity Google search turned up a blog post about my presentation! Wow! I'm so happy my presentation actually HELPED someone - I thought I totally bombed!