PostgreSQL 9.0 High Performance Review

I recently had the privilege of reading and reviewing the book PostgreSQL 9.0 High Performance by Greg Smith. While the title of the book suggests that it may be relevant only to PostgreSQL 9.0, there is in fact a wealth of information to be found which is relevant for all community supported versions of Postgres.

Achieving the highest performance with PostgreSQL is definitely something that touches all layers of the stack, from your specific disk hardware, OS, and filesystem to the database configuration, connection/data access patterns, and queries in use. This book gathers up a lot of the information and advice that I've seen bandied about on the IRC channel and the PostgreSQL mailing lists and presents it in one place.

I believe the main points of the book can be summed up as follows:

  1. Measure, don't guess. From the early chapters, which cover the lowest-level considerations such as disk hardware and configuration, to the later chapters, which cover topics such as query optimization, replication, and partitioning, considerable emphasis is placed on determining the metrics by which to measure performance before and after specific changes. This is the only way to determine the impact of the changes you make.
  2. Tailor to your specific needs/workflows. While there are many good rules of thumb out there when it comes to configuration and tuning, this book emphasizes the process of refining those more general numbers and tailoring the configuration to your specific database's needs.
  3. Review the information the database system itself gives you. The pg_stat_* views can be useful in identifying query bottlenecks and unused or underused indexes (see the example query after this list).
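
As a minimal sketch of this kind of check, the query below lists the least-used user indexes along with their on-disk size; it is run here through psql against a hypothetical database named yourdb. Indexes with idx_scan at or near zero and a large on-disk footprint are candidates for closer review:

psql -d yourdb <<'SQL'
-- Least-used user indexes, largest first among the rarely-scanned ones
SELECT schemaname, relname, indexrelname, idx_scan,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC, pg_relation_size(indexrelid) DESC
LIMIT 10;
SQL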

This book also introduced me to a few goodies which I had not encountered previously. One of the more interesting ones is the pg_buffercache contrib module. This suite of functions allows you to peek at the internals of the shared_buffers cache to get a feel for which relations are heavily accessed on a block-by-block basis. The examples in the book show this being used to more accurately size shared_buffers based on the actual number of accesses to specific portions of different relations.
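
For example, a query along the following lines (adapted from the standard pg_buffercache documentation; the database name yourdb is a placeholder, and the contrib module must already be installed) shows which relations currently occupy the most shared buffers:

psql -d yourdb <<'SQL'
-- Relations with the most blocks currently resident in shared_buffers
SELECT c.relname, count(*) AS buffers
FROM pg_buffercache b
JOIN pg_class c ON b.relfilenode = c.relfilenode
WHERE b.reldatabase IN (0, (SELECT oid FROM pg_database
                            WHERE datname = current_database()))
GROUP BY c.relname
ORDER BY buffers DESC
LIMIT 10;
SQL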

I found the book to be well-written (always a plus when reading technical books) and felt it covered quite a bit of depth given its ambitious scope. Overall, it was an informative and enjoyable read.

PostgreSQL 9.0 Admin Cookbook

I've been reading through the recently published book PostgreSQL 9.0 Admin Cookbook of late, and found that it satisfies an itch for me, at least for now. Every time I get involved in a new project, or work with a new group of people, there's a period of adjustment where I get introduced to new tools and new procedures. I enjoy seeing new (and not uncommonly, better) ways of doing the things I do regularly. At conferences I'll often spend time playing "What's on your desktop" with people I meet, to get an idea of how they do their work, and what methods they use. Questions about various peoples' favorite window manager, email reader, browser plugin, or IRC client are not uncommon. Sometimes I'm surprised by a utility or a technique I'd never known before, and sometimes it's nice just to see minor differences in the ways people do things, to expand my toolbox somewhat. This book did that for me.

As the title suggests, authors Simon Riggs and Hannu Krosing have organized their book similarly to a cookbook, made up of simple "recipes" organized in subject groups. Each recipe covers a simple topic, such as "Connecting using SSL", "Adding/Removing tablespaces", and "Managing Hot Standby", with detail sufficient to guide a user from beginning to end. Of course in many of the more complex cases some amount of detail must be skipped, and in general this book probably won't provide its reader with an in depth education, but it will provide a framework to guide further research into a particular topic. It includes a description of the manuals, and locations of some of the mailing lists to get the researcher started.

I've used PostgreSQL for many different projects and been involved in the community for several years, so I didn't find anything in the book that was completely unfamiliar. But PostgreSQL is an open source project with a large community. There exists a wide array of tools, many of which I've never had occasion to use. Reading about some of them, and seeing examples in print, was a pleasant and educational experience. For instance, one recipe describes "Selective replication using Londiste". My tool of choice for such problems is generally Bucardo, so I'd not been exposed to Londiste's way of doing things. Nor have I used pgstatspack, a project for collecting various statistics and metrics from database views which is discussed under "Collecting regular statistics from pg_stat_* views".

In short, the book gave me the opportunity to look over the shoulder of experienced PostgreSQL users and administrators to see how they go about doing things, and compare to how I've done them. I'm glad to have had the opportunity.

Ruby on Rails versus CakePHP: A Syntax Comparison Guide

My time is typically split between Interchange and Spree development, but in a recent project for JackThreads, I jumped back into CakePHP code. CakePHP is one of the more popular PHP MVC frameworks and is inspired by Rails. I decided to put together a quick syntax comparison guide between CakePHP and Rails since I occasionally have to look up how to do some Rails-y thing in CakePHP. In each comparison below, the Ruby on Rails example is shown first, followed by the CakePHP equivalent.

Basic

MVC Code Inclusion

Rails: Rails is typically installed as a gem, and its source code lives in the user's gem library. In theory, a modified version of the Rails source code can be "frozen" to your application, but I would guess this is pretty rare.

CakePHP: CakePHP is typically installed in the application directory in a "cake/" directory, while the "app/" directory contains application-specific code. In my experience, this organization made it easy to debug CakePHP objects, but didn't do much more for me.
Application Directory Structure

Rails:
app/
  controllers/ models/ views/ helpers/
lib/
config/
public/
  javascripts/ images/ stylesheets/
vendor/
  plugins/ extensions/

CakePHP:
controllers/
models/
views/
  layouts/ elements/ ...
config/
webroot/
tmp/
plugins/
vendors/

Notes: In Rails, layouts live in app/views/layouts/. In CakePHP, layouts live in views/layouts/ and helpers live in views/helpers/.
Creating an Application
rails new my_app # Rails 3 after gem installation
rails my_app # Rails <3
Download the compressed source code and create an application with the recommended directory structure.

Models

Validation
class Zone < ActiveRecord::Base
  validates_presence_of :name
  validates_uniqueness_of :name
end
class User extends AppModel {
  var $name = 'User';
  var $validate = array(
    'email' => array(
      'email-create' => array(
        'rule' => 'email',
        'message' => 'Invalid e-mail.',
        'required' => true,
        'on' => 'create'
      )
    )
  );
}
Relationships
class Order < ActiveRecord::Base
  belongs_to :user
  has_many :line_items
end
class Invite extends AppModel {
  var $name = 'Invite';
  var $belongsTo = 'User';
  var $hasMany = 'Campaigns';
}
Special Relationships
class Address < ActiveRecord::Base
  has_many :billing_checkouts,
    :foreign_key => "bill_address_id",
    :class_name => "Checkout"
end
class Foo extends AppModel {
  var $name = 'Foo';
  var $hasMany = array(
    'SpecialEntity' => array(
      'className' => 'SpecialEntity',
      'foreignKey' => 'entity_id',
      'conditions' =>
  array('Special.entity_class' => 'Foo'),
      'dependent' => true
    ),
  );
}

Controllers

Basic Syntax
class FoosController < ActionController::Base
  helper :taxons
  actions :show, :index

  include Spree::Search

  layout 'special'
end
class FooController extends AppController {
  var $name = 'Foo';
  var $helpers = array('Server', 'Cart');
  var $uses = array('SpecialEntity','User');
  var $components = array('Thing1', 'Thing2');
  var $layout = 'standard';
}
Notes: CakePHP and Rails use similar helper and layout declarations. In CakePHP, the $uses array declares the models required by the controller, while in Rails all application models are available without an explicit include. In CakePHP, the $components array declares the component classes used by the controller, while in Rails you would use "include ModuleName" to mix a module into the controller.
Filters
class FoosController < ActionController::Base
  before_filter :load_data, :only => :show
end
class FooController extends AppController {
  var $name = 'Foo';

  function beforeFilter() {
    parent::beforeFilter();
    //do stuff
  } 
}
Setting View Variables
class FoosController < ActionController::Base
  def index
    @special_title = 'This is the Special Title!'
  end
end
class FooController extends AppController {
  var $name = 'Foo';

  function index() {
    $this->set('title',
      'This is the Special Title!');
  }
}

Views

Variable Display
<%= @special_title %>
<?= $special_title ?>
Looping
<% @foos.each do |foo| -%>
<%= foo.name %>
<% end -%>
<?php foreach($items as $item): ?>
<?= $item['name']; ?>
<?php endforeach; ?>
Partial Views or Elements
<%= render :partial => 'shared/view_name',
  :locals => { :b => "abc" } %>
<?php echo $this->element('account_menu',
  array('page_type' => 'contact')); ?>
Notes: In Rails, partial views typically can live anywhere in the app/views directory. A shared view will typically be seen in the app/views/shared/ directory and a model specific partial view will be seen in the app/views/model_name/ directory. In CakePHP, partial views are referred to as elements and live in the views/elements directory.
CSS and JS
<%= javascript_include_tag
  'my_javascript',
  'my_javascript2' %>
<%= stylesheet_link_tag
  'my_style' %>
<?php
  $html->css(array('my_style.css'),
    null, array(), false);
  $javascript->link(array('my_javascript.js'),
    false);
?>

Routing

Basic
# Rails 3
match '/cart',
  :to => 'orders#edit',
  :via => :get,
  :as => :cart
# Rails <3
map.login '/login',
  :controller => 'user_sessions',
  :action => 'new'
Router::connect('/refer',
  array('controller' => 'invites',
        'action' => 'refer'));
Router::connect('/sales/:sale_id',
  array('controller' => 'sale',
        'action' => 'show'),
  array('sale_id' => '[0-9]+')); 
Nested or Namespace Routing
# Rails 3
namespace :admin do
  resources :foos do
    collection do
      get :revenue
      get :profit
    end
  end
end

# Rails <3
map.namespace :admin do |admin|
  admin.resources :foos, :collection => {
    :revenue            => :get,
    :profit             => :get,
  }
end
(No CakePHP example given.)

Logging

Where to? Rails: log/production.log or log/development.log. CakePHP: tmp/logs/debug.log or tmp/logs/error.log.
Logging Syntax
Rails.logger.warn "steph!" # Rails 3
logger.warn "steph!" # Rails <3
or
RAILS_DEFAULT_LOGGER.warn "steph!"
$this->log('steph!', LOG_DEBUG);

If you are looking for guidance on choosing one of these technologies, below are common arguments. In End Point's case, we choose whatever technology makes the most sense for the client. We implemented a nifty solution for JackThreads to avoid a complete rewrite, described here in detail. We also work with existing open source ecommerce platforms such as Interchange and Spree and try to choose the best fit for each client.

Pick Me!

Arguments for Ruby on Rails:

  • Ruby is prettier than PHP.
  • Rails' object-oriented programming implementation is more elegant than CakePHP's.
  • Rails routing is far superior to CakePHP routing.
  • Deployment and writing migrations are simpler with built-in or peripheral tools.
  • Rails documentation is better than CakePHP's.

Arguments for CakePHP:

  • CakePHP has better performance than Rails. UPDATE: This appears to be a rumor. Benchmark data suggests that Rails performs better than CakePHP.
  • PHP is better supported by hosting providers than Rails.
  • PHP developers are typically less expensive than Ruby/Rails developers.

Mongol Rally

This summer, End Point was pleased to be one of several sponsors of team One Steppe Beyond in the 2010 Mongol Rally. Team member Christopher Letters is the son of the owners of Crotchet Classical, a longtime End Point ecommerce client. Chris reports that they had a great time on the rally, driving 10,849 miles to their destination in Mongolia!

You can read their dispatches from the road on their Mongol Rally team page.

Each team raises money for a charity, a minimum of £1000 per team. Team One Steppe Beyond chose the Christina Noble Children's Foundation which has a project in Ulaanbaatar, Mongolia.

Congratulations to team members Christopher, Dominic, and Thomas for finishing the race! It was obviously quite an adventure and for a good cause.

Liquid Galaxy Sysadmin+ Wanted

End Point Corporation is hiring a motivated and creative GNU/Linux systems administrator. The work will primarily involve installing, supporting, maintaining, and developing infrastructure improvements for Google Liquid Galaxy systems. Liquid Galaxy is an impressive panoramic system for Google Earth and other applications. Check it out!

Responsibilities:

  • Set up and upgrade Liquid Galaxy Systems at client locations. (Some travel is required, including internationally.)
  • Do on site and remote troubleshooting and support.
  • Participate in ongoing work to improve the system with automation, monitoring, and customizing configurations to clients' needs.
  • Provide first-class customer service.

Requirements:

  • BS degree or equivalent experience
  • At least 3 years of experience with Linux systems administration
  • Strong scripting skills in shell, and also Python, Ruby, Perl or PHP
  • Proven technical troubleshooting and performance tuning experience
  • Excellent analytical abilities along with a strong sense of ownership and urgency, plus the drive and ability to rise to new challenges and master new skills
  • Awareness and knowledge about security issues
  • Good communication skills
  • The basic physical fitness for putting together and breaking down the hardware components of the system

If you have experience with any of the following it is likely to be useful:

  • Geospatial systems
  • Sketchup, Building Maker, Blender, general 3D modelling
  • OpenGL application development
  • Image processing
  • Video capture, processing, and production technologies
  • Puppet

While we have a strong preference that this position be a hire for our New York City office where most of our Liquid Galaxy team is located, we don't entirely rule out the possibility of hiring someone who works out of his or her home office if the fit is right.

Please email jobs@endpoint.com to apply.

Utah Open Source Conference 2010 part 1

It's been a little over a month since the 2010 Utah Open Source Conference, and I decided to take a few minutes to review talks I enjoyed and link to my own talk slides.

Magento: Mac Newbold of Code Greene spoke on the Magento ecommerce framework for PHP. I'm somewhat familiar with Magento, but a few things stood out:

  • He finds the Magento Enterprise edition kind of problematic because Varien won't support you if you have any unsupported extensions. Some of his customers had problems with Varien support and went back to the community edition.
  • Magento is now up to around 30 MB of PHP files!
  • As I've heard elsewhere, serious customization has a steep learning curve.
  • The Magento data model is an EAV (Entity-Attribute-Value) model. Getting 50 columns of output requires 50+ joins across 8 tables (one EAV value table per datatype); see the sketch after this list.
  • There are 120 tables total in the default install -- many core features don't use the EAV tables for performance reasons.
  • Another observation I've heard in pretty much every conversation about Magento: It is very resource intensive. Shared hosting is not recommended. Virtual servers should have a minimum of 1/2 to 1 GB RAM. Fast disk & database help most. APC cache recommended with at least 128 MB.
  • A lot of front-end things are highly adjustable from simple back-end admin options.
  • Saved credit cards are stored in the database, and the key is on the server. I didn't get a chance to ask for more details about this. I hope it's only the public part of a public/secret keypair!
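
To make the EAV point concrete, here is a hedged sketch of the kind of SQL involved; the table names come from a stock Magento 1.x schema, the database name magento_db is a placeholder, and real Magento queries also have to account for store-scoped value overrides. The point is simply that each additional attribute costs another join against a per-datatype value table:

mysql magento_db <<'SQL'
-- Pull two attributes (name, price) for products: one value-table join per attribute
SELECT e.entity_id,
       name_val.value  AS name,
       price_val.value AS price
FROM catalog_product_entity e
LEFT JOIN eav_attribute name_attr
       ON name_attr.entity_type_id = e.entity_type_id
      AND name_attr.attribute_code = 'name'
LEFT JOIN catalog_product_entity_varchar name_val
       ON name_val.entity_id = e.entity_id
      AND name_val.attribute_id = name_attr.attribute_id
LEFT JOIN eav_attribute price_attr
       ON price_attr.entity_type_id = e.entity_type_id
      AND price_attr.attribute_code = 'price'
LEFT JOIN catalog_product_entity_decimal price_val
       ON price_val.entity_id = e.entity_id
      AND price_val.attribute_id = price_attr.attribute_id
LIMIT 10;
SQL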

It was a good overview for someone wanting to go beyond marketing feature lists.

Node.js: Shane Hansen of Backcountry.com spoke on Node, comparing it to Tornado and Twisted in Python. He calls JavaScript "Lisp in C's clothing", and says its culture of asynchronous, callback-driven code patterns makes Node a natural fit.

Performance and parallel processing are clearly strong incentives to look into Node. The echo server does 20K requests/sec. There are 2000+ Node projects on GitHub and 500+ packages in npm (Node Package Manager), including database drivers, web frameworks, parsers, testing frameworks, payment gateway integrations, and web analytics.

A few packages worth looking into further:

  • express - web microframework like Sinatra
  • Socket-IO - Web Sockets now; falls back to other things if no Web Sockets available
  • hummingbird - web analytics, used by Gilt.com
  • bespin - "cloud JavaScript editor"
  • yui3 - build HTML via DOM, eventbus, etc.
  • connect - like Ruby's Rack

I haven't played with Node at all yet, and this got me much more interested.

Metasploit: Jason Wood spoke on Metasploit, a penetration testing (or just plain penetrating!) tool. It was originally in Perl, and now is in Ruby. It comes with 590 exploits and has a text-based interactive control console.
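
As a rough illustration of that console (the module chosen here is just a well-known example, not one from the talk), a session starts from the shell and then drops into Metasploit's own prompt:

$ msfconsole
msf > search ms08_067
msf > use exploit/windows/smb/ms08_067_netapi
msf > show options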

Metasploit uses several external apps: nmap, Maltego (proprietary reconnaissance tool), Nessus (no longer open source, but GPL version and OpenVAS fork still available), Nexpose, Ratproxy, Karma.

The reconnaissance modules include DNS enumeration, and an email address collector that uses the big search engines.

It can backdoor PuTTY, PDFs, audio, and more.

This is clearly something you've got to experiment with to appreciate. Jason posted his Metasploit talk slides which have more detail.

So Many Choices: Web App Deployment with Perl, Python, and Ruby: This was my talk, and it was a lot of fun to prepare for, as I got to take time to see some new happenings I'd missed in these three language communities' web server and framework space over the past several years.

The slides give pointers to a lot of interesting projects and topics to check out.

My summary was this: We have an embarrassment of riches in the open source web application world. Perl, Python, and Ruby all have very nice modern frameworks for developing web applications. They also have several equivalent solid options for deploying web applications. If you haven't tried them lately, check out the projects linked in the slides.

That's about half of my notes on talks, but all I have time for now. I'll cover more in a later post.

(Image|Graphics)Magick trick for monitoring or visualizations

It's a good time for all when we start poking fun at the visual assault of stereotypical PowerPoint presentations. On the other hand, when data is presented in an effective visual format, human brains are able to quickly grasp the ideas involved and pick out important pieces of information, or "outliers".

Without getting into a long trumpeting session about the usefulness of data visualization (there are plenty of books on the subject), I'd like to jump directly into a Magick trick or two for creating simple visualizations.

Let's imagine we've got a group of eight machines serving a particular purpose. Now let's say I want quick insight into not only the internal activity of all eight machines, but also what the systems believe they are sending to their displays.

With a little magick (of the ImageMagick or GraphicsMagick variety), we can save ourselves from running "ps" and "free" and from having to be in the same room (or the same country) as the system we're checking up on.

First, let's organize some simple output from the system:

$ echo -en "$(hostname) GPID: $( pgrep googleearth-bin ). APPID: $( pgrep -u root -f sbin/apache2 ).\nCRASH: $( ls -1 ${HOME}/.googleearth/crashlogs/ | wc -l ). MEMF: $( awk '/^MemFree/ {print $2$3}' /proc/meminfo )."

Which gives us output something like this:

lg1 GPID: 5265. APPID: 10452.
CRASH: 3. MEMF: 4646240kB.

Cool, but we want to combine this with the imagery that X believes it is currently displaying. So, we turn the text into an image that we can overlay, like this:

$ echo -en "$(hostname) GPID: $( pgrep googleearth-bin ). APPID: $( pgrep -u root -f sbin/apache2 ).\nCRASH: $( ls -1 ${HOME}/.googleearth/crashlogs/ | wc -l ). MEMF: $( awk '/^MemFree/ {print $2$3}' /proc/meminfo )." | \
convert -pointsize 18 -background '#00000080' -fill white text:- -trim -bordercolor '#00000080' -border 5x5 miff:/tmp/text

This is one long command and might be hard to read, but it is simply using "convert" to turn the text output into a semi-transparent "miff" image for later use. It would be very easy to put the stat collection into a script on each host, but we're just going with quick and dirty at the moment.

Second, let's get our little overlay image composited with a screenshot from X:

$ DISPLAY=:0 import -window root miff:- | composite -gravity south -geometry +0+3 miff:/tmp/text miff:- -resize 600 miff:/tmp/$(hostname).miff

So, in a single pipeline we imported a screenshot of the root window, then used "composite" to overlay our semi-transparent stats image and resize the whole thing to be a bit more manageable.

Finally, we want to perform these things across all the systems and be left with something we can quickly glance at to see if there are obvious problems. So, let's create a quick shell loop and execute our commands via ssh, placing the resize/composite burden on the shoulders of each individual system (be sure to escape variables for remote interpolation!):

#!/bin/bash

#collect data first
for system in `seq 1 8`; do
 ssh user@$system "

echo -en \"\$(hostname) GPID: \$( pgrep googleearth-bin ). APPID: \$( pgrep -u root -f sbin/apache2 ).\nCRASH: \$( ls -1 \${HOME}/.googleearth/crashlogs/ | wc -l ). MEMF: \$( awk '/^MemFree/ {print \$2\$3}' /proc/meminfo ).\" | \
convert -pointsize 18 -background '#00000080' -fill white text:- -trim -bordercolor '#00000080' -border 5x5 miff:/tmp/text;

DISPLAY=:0 import -window root miff:- | \
composite -gravity south -geometry +0+3 miff:/tmp/text miff:- -resize 600 miff:-" >/tmp/system${system}.miff;

done

#make a montage of the data
montage -monitor -background black -tile 8x1 -geometry +5+0 \
 /tmp/system{6,7,8,1,2,3,4,5}.miff \
 /tmp/system-montage.png && rm -f /tmp/system?.miff

With something so simple, we can quickly view from New York what's happening on systems installed in California, like so:

montage example

Speeding up the Spree demo site

There's a lot that can be done to speed up Spree, and Rails apps in general. Here I'm not going to deal with most of that. Instead I want to show how easy it is to speed up page delivery using standard HTTP server tuning techniques, demonstrated on demo.spreecommerce.com.

First, let's get a baseline performance measure from the excellent webpagetest.org service using their remote Internet Explorer 7 tests:

  • First page load time: 2.1 seconds
  • Repeat page load time: 1.5 seconds

The repeat load is faster because the browser has images, JavaScript, and CSS cached, but it still has to check back with the server to make sure they haven't changed. Full details are in this initial report.

The demo.spreecommerce.com site is run on a Xen VPS with 512 MB RAM, CentOS 5 i386, Apache 2.2, and Passenger 2.2. There were several things to tune in the Apache httpd.conf configuration:

  • mod_deflate was already enabled. Good. That's a big help.
  • Enable HTTP keepalive: KeepAlive On and KeepAliveTimeout 3
  • Limit Apache children to keep RAM available for Rails: StartServers 5, MinSpareServers 2, MaxSpareServers 5
  • Limit Passenger pool size to 2 child processes (down from the default 6), to queue extra requests instead of using slow swap memory: PassengerMaxPoolSize 2
  • Enable browser & intermediate proxy caching of static files: ExpiresActive On and ExpiresByType image/jpeg "access plus 2 hours" etc. (see below for full example)
  • Disable ETags which aren't necessary once Expires is enabled: FileETag None and Header unset ETag
  • Disable unused Apache modules: free up memory by commenting out LoadModule proxy, proxy_http, info, logio, usertrack, speling, userdir, negotiation, vhost_alias, dav_fs, autoindex, most authn_* and authz_* modules
  • Disable SSLv2 (for security and PCI compliance, not performance): SSLProtocol all -SSLv2 and SSLCipherSuite ALL:!ADH:!EXPORT56:RC4+RSA:+HIGH:+MEDIUM:-LOW:-SSLv2:-EXP

After making these changes, without tuning Rails, Spree, or the database at all, a new webpagetest.org run reports:

  • First page load time: 1.2 seconds
  • Repeat page load time: 0.4 seconds

That's an easy improvement, a reduction of 0.9 seconds for the initial load and 1.1 seconds for a repeat load! Complete details are in this follow-on report.

The biggest wins came from enabling HTTP keepalive, which allows serving multiple files from a single HTTP connection, and enabling static file caching which eliminates the majority of requests once the images, JavaScript, and CSS are cached in the browser.
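
A quick way to sanity-check these changes from the outside is to look at the response headers for a static asset. This is just a sketch: the asset path below is a placeholder, and the exact headers will vary with your configuration:

$ curl -sI http://demo.spreecommerce.com/images/example.png | \
  grep -iE 'connection|keep-alive|expires|cache-control|etag'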

Note that many of the resource-limiting changes I made above to Apache and Passenger would be too restrictive if more RAM or CPU were available, as is typical on a dedicated server with 2 GB RAM or more. But when running on a memory-constrained VPS, it's important to put such limits in place or you'll practically undo any other tuning efforts you make.

I wrote about these topics a year ago in a blog post about Interchange ecommerce performance optimization. I've since expanded the list of MIME types I typically enable static asset caching for in Apache. Here's a sample configuration snippet to put in the <VirtualHost> container in httpd.conf:

    ExpiresActive On
    ExpiresByType image/gif   "access plus 2 hours"
    ExpiresByType image/jpeg  "access plus 2 hours"
    ExpiresByType image/png   "access plus 2 hours"
    ExpiresByType image/tiff  "access plus 2 hours"
    ExpiresByType text/css    "access plus 2 hours"
    ExpiresByType image/bmp   "access plus 2 hours"
    ExpiresByType video/x-flv "access plus 2 hours"
    ExpiresByType video/mpeg  "access plus 2 hours"
    ExpiresByType video/quicktime "access plus 2 hours"
    ExpiresByType video/x-ms-asf  "access plus 2 hours"
    ExpiresByType video/x-ms-wm   "access plus 2 hours"
    ExpiresByType video/x-ms-wmv  "access plus 2 hours"
    ExpiresByType video/x-ms-wmx  "access plus 2 hours"
    ExpiresByType video/x-ms-wvx  "access plus 2 hours"
    ExpiresByType video/x-msvideo "access plus 2 hours"
    ExpiresByType application/postscript        "access plus 2 hours"
    ExpiresByType application/msword            "access plus 2 hours"
    ExpiresByType application/x-javascript      "access plus 2 hours"
    ExpiresByType application/x-shockwave-flash "access plus 2 hours"
    ExpiresByType image/vnd.microsoft.icon      "access plus 2 hours"
    ExpiresByType application/vnd.ms-powerpoint "access plus 2 hours"
    ExpiresByType text/x-component              "access plus 2 hours"

Of course you'll still need to tune your Spree application and database, but why not tune the web server to get the best performance you can there?