Welcome to End Point's blog

Ongoing observations by End Point people.

Building Xpdf on Ubuntu

It may happen that you need to use Xpdf, even though it no longer ships with Ubuntu and is considered ... outdated? buggy? insecure? In any case, it still renders some PDFs that Poppler-based viewers such as Evince don't, or allows some troublesome PDFs to print as fonts and line art instead of a rasterized mess.

Here's how I built and installed xpdf 3.02 on Ubuntu 11.04 (Natty Narwhal) x86_64:

sudo apt-get install libfreetype6-dev libmotif-dev
wget ftp://ftp.foolabs.com/pub/xpdf/xpdf-3.02.tar.gz  # now 3.03 is current
tar xzpf xpdf-3.02.tar.gz
cd xpdf-3.02
./configure --with-freetype2-library=/usr/lib/x86_64-linux-gnu \
    --with-freetype2-includes=/usr/include/freetype2 \
    --with-Xm-library=/usr/lib \
    --with-Xm-includes=/usr/include/Xm
make
# see lots of warnings!
sudo make install

That's it. Not as nice as the old native Debian/Ubuntu packages, but gets the job done.

Giraffes and Liquid Galaxy

We build lots and lots of Google Earth tours, typically for use with a Liquid Galaxy. These tours tell Google Earth to start at one point and fly to another, displaying various images along the way. They're written in an XML-based language called Keyhole Markup Language, or KML. Various KML elements describe points on the globe, 3D objects, tracks through space, or other items, and a series of Google-specific KML extensions can define an automated tour in which Google Earth flies through these objects.

Now for a bit of dogma: writing large volumes of XML by hand is a drag. Writing KML is no exception. So when I started building KML tours, I was keen to build or find tools to make it easier. For my first KML project, making a tour through each End Point employee's home city, I built a Perl script to convert a file of employee data to KML. This worked well enough, but I soon learned that typing out all the KML was just half the pain. The other half was adjusting camera paths, camera angles, timing, and all sorts of other little things. Unfortunately all these adjustments had to be done by trial and error, and all by hand. Getting Google Earth's camera to circle a particular point, for instance, takes about three screens' worth of KML code. If a point's coordinates were off or a tour's timing wasn't quite right, I had to find the right spot in a vast field of KML, tweak a number, and try it all again.

Enter Kamelopard. Named, as dictated by long-standing End Point tradition, after a strange animal, Kamelopard is a Ruby gem designed to make all this easier ("Camelopard" is the ancient English word for "giraffe" and several mythical and heraldic giraffe-like creatures; the various modern giraffe sub-species are grouped under the name Giraffa camelopardalis). Right now it's not much more than a set of classes which mirror the basic KML objects -- or most of them, anyway -- but already it has proven itself very useful. For instance, for one project we're working on, a Ruby script of about 150 lines uses Kamelopard to digest a set of data files and produce a 12,000 line KML file. Fortunately, Kamelopard is sufficiently flexible that I can finish adjusting all the timing and camera angles and other details in the Ruby code, without having to dig through the KML output itself; we can regenerate the KML with whatever tweaks we need, as often as we need it.

As a very simple example, here's a Kamelopard script to turn a CSV file into KML. The CSV contains the name, description, latitude, and longitude of a bunch of placemarks (e.g. "Mount Everest","The world's tallest mountain","27d58m37.15s N","86d55m15.93s E"), and the script turns them into KML, including a Google Earth tour that will fly past each in turn. The KML result amounts to about 45 lines per placemark, so the Ruby version is much more concise and much simpler to modify.

require 'rubygems'
require 'kamelopard'
require 'csv'

CSV.foreach('placemarks.csv') do |a|
    pl = Placemark.new a[0], point(a[3], a[2])
    pl.description = a[1]

    mod_popup_for pl, 1
    fly_to pl, 5
    pause 3
    mod_popup_for pl, 0
end

puts get_kml

Along with its library of classes, Kamelopard has a (fairly unorganized, ad hoc) set of helper functions which I hope will one day evolve into a full-fledged domain-specific language. While they're currently loosely organized and none too intelligent, there are a few jewels worth noting. For instance, my multi-page KML to orbit a point takes one line of Kamelopard. Conversion from one latitude/longitude notation to another happens automatically (converting 24°10'44.3994" N, 55°44'25.7274" E to the decimal notation KML requires becomes a dreary task when done by hand for ten or twenty different points). Finally, Kamelopard includes a helper function for Google's Geocoding API, so users can translate addresses and location names to latitude and longitude.
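
As a plain Ruby illustration (this is not Kamelopard's own API, just the arithmetic it saves you from), the coordinate above converts to decimal degrees like this:

# Convert degrees/minutes/seconds to the decimal degrees KML expects.
def dms_to_decimal(degrees, minutes, seconds, hemisphere)
  sign = %w(S W).include?(hemisphere) ? -1 : 1
  sign * (degrees + minutes / 60.0 + seconds / 3600.0)
end

puts dms_to_decimal(24, 10, 44.3994, 'N')   # => 24.178999833...
puts dms_to_decimal(55, 44, 25.7274, 'E')   # => 55.740479833...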

Kamelopard's future development will most likely follow whatever lines our clients, and any contributors who care to help out, most wish to pursue. I suspect it will include increased automation of tour timing, so developers don't need to work out the timing of each step in a tour by hand, and more helper functions for creating objects, so users can get more done in a line of code without instantiating classes directly.

I imagine the audience is fairly small, but I'd welcome any feedback, suggestions, patches, and interest.

PostgreSQL log analysis / PGSI

End Point recently started working with a new client (a startup in stealth mode, cannot name names, etc.) who is using PostgreSQL because of the great success some of the people starting the company have had with Postgres in previous companies. One of the things we recommend to our clients is a regular look at the database to see where the bottlenecks are. A good way to do this is by analyzing the logs. The two main tools for doing so are PGSI (Postgres System Impact) and pgfouine. We prefer PGSI for a few reasons: the output is better, it considers more factors, and it does not require you to munge your log_line_prefix setting quite as badly.

Both programs work basically the same way: given a large number of log lines from Postgres, normalize the queries, see how long they took, and produce some pretty output. If you only want to look at the longest queries, it's usually enough to set your log_min_duration_statement to something sane (such as 200), and then run a daily tail_n_mail job against it. This is what we are doing with this client; it sends a daily report that looks like this:

Date: Mon Aug 29 11:22:33 2011 UTC
Host: acme-postgres-1
Minimum duration: 2000 ms
Matches from /var/log/pg_log/postgres-2011-08-29.log: 7

[1] (from line 227)
2011-08-29 08:36:50 UTC postgres@maindb [25198]
LOG: duration: 276945.482 ms statement: COPY public.sales 
(id, name, region, item, quantity) TO stdout;

[2] (from line 729)
2011-08-29 21:29:18 UTC tony@quadrant [17176]
LOG: duration: 8229.237 ms execute dbdpg_p29855_1: SELECT 
id, singer, track FROM album JOIN artist ON artist.id = 
album.singer WHERE id < 1000 AND track <> 1

However, the PGSI program was born of the need to look at all the queries in the database, not just the slowest-running ones; the cumulative effect of many short queries can have much more of an impact on the server than a smaller number of long-running queries. Thus, PGSI looks not only at how long a query takes to run, but also at how many times it ran in a given period and how often it runs (the mean interval between executions). All of this information is put together to give each normalized query a score, known as the "system impact". Like the costs in a Postgres explain plan, this is a unit-less number of little importance in and of itself; the important thing is to compare it to the other queries to see the relative impact. We also have that report emailed out; it looks similar to this (this is a text version of the HTML produced):

Log file: /var/log/pg_log/postgres-2011-08-29.log

 * SELECT (24)
 * UPDATE (1)

Query System Impact : SELECT

 Log activity from 2011-08-29 11:00:01 to 2011-08-29 11:15:01

   +----------------------------------+
   |   System Impact: | 0.15          |
   |   Mean Duration: | 1230.95 ms    |
   | Median Duration: | 1224.70 ms    |
   |     Total Count: | 411           |
   |   Mean Interval: | 4195 seconds  |
   |  Std. Deviation: | 126.01 ms     |
   +----------------------------------+

 SELECT *
  FROM albums
  WHERE track <> ? AND artist = ?
  ORDER BY artist, track

At this point you may be wondering how we get all the queries into the log. This is done by setting log_min_duration_statement to 0. However, most (but not all!) clients do not want full logging 24 hours a day, as this creates some very large log files. So the solution we use is to analyze only a slice of the day. It depends on the client, but we aim for about 15 minutes during a busy time of day. Thus, the sequence of events is:

  1. Turn on "full logging" by dropping log_min_duration_statement to zero
  2. Some time later, set log_min_duration_statement back to what it was (e.g. 200)
  3. Extract the logs from the time it was set to zero to when it was flipped back.
  4. Run PGSI against the log subsection we pulled out
  5. Mail the results out

All of this is run by cron. The first problem is how to update the postgresql.conf file and have Postgres re-read it, all automatically. As covered previously, we use the modify_postgres_conf script for this.

The exact incantation looks like this:

0 11 * * * perl bin/modify_postgres_conf --quiet \
  --pgconf /etc/postgresql/9.0/main/postgresql.conf \
  --change log_min_duration_statement=0
15 11 * * * perl bin/modify_postgres_conf --quiet \
  --pgconf /etc/postgresql/9.0/main/postgresql.conf \
  --change log_min_duration_statement=200 --no-comment
## The above are both one line each, but split for readability here

This changes log_min_duration_statement to 0 at 11 AM, and then back to 200 fifteen minutes later. We use the --quiet argument because this runs from cron, and we don't want any output from modify_postgres_conf on success. We do want a comment when we flip it to 0, as that is the temporary state and we want anyone viewing the postgresql.conf file at that time (or just doing a "git diff") to realize it. We don't want a comment when we flip it back, as the timestamp in the comment would cause git to think the file had changed.

Now for the tricky bit: extracting out just the section of logs that we want and sending it to PGSI. Here's the recipe I came up with for this client:

16 11 * * * tac `ls -1rt /var/log/pg_log/postgres*log \
  | tail -1` \
  | sed -n '/statement" changed to "200"/,/statement" changed to "0"/ p' \
  | tac \
  | bin/pgsi.pl --quiet > tmp/pgsi.html && bin/send_pgsi.pl
## Again, the above is all one line

What does this do? First, it finds the latest file in the /var/log/pg_log directory that starts with 'postgres' and ends with 'log'. Then it uses the tac program to spool the file backwards, one line at a time ('tac' is the opposite of 'cat'). Then it pipes that output to the sed program, which prints out all lines starting with the one where we changed the log_min_duration_statement to 200, and ending with the one where we changed it to 0 (the reverse of what we actually did, as we are reading it backwards). Finally, we use tac again to put the lines back in the correct order, pipe the output to pgsi, write the output to a temporary file, and then call a quick Perl script named send_pgsi.pl which mails the temporary HTML file to some interested parties.

Why do we use tac? Because we want to read the file backwards, so as to make sure we get the correct slice of log files as delimited by the log_min_duration_statement changes. If we simply started at the beginning of the file, we might encounter other similar changes that were made earlier and not by us.

All of this is not foolproof, of course, but it does not have to be: it is very easy to run manually if something (for example the sed recipe) goes wrong, as the log file will still be there. Yes, it's also possible to grab the ranges in other ways (such as Perl), but I find sed the quickest and easiest. As tempting as it was to write Yet Another Perl Script to extract the lines, sometimes a few chained Unix programs can do the job quite nicely.
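
For the curious, a rough Perl equivalent of the sed range extraction uses the flip-flop operator; this is only an illustration (the log file name is the one from the report above), not what the cron job actually runs:

tac /var/log/pg_log/postgres-2011-08-29.log \
  | perl -ne 'print if /changed to "200"/ .. /changed to "0"/' \
  | tac | bin/pgsi.pl --quiet > tmp/pgsi.html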

jQuery and hidden elements

Out of sight, not out of mind.

While extending some jQuery functionality on a page, I noted that a form element's "change" handler wasn't being invoked, which meant some of the page initialization code would be left out. So I helpfully added it:

$('input[name=foobar]').change(...).change();

What I failed to contemplate was how this would impact another page, which just happened to have some (but not all) of the same form elements, and referenced the same JS code at page load. Specifically, my page's sibling had:

<input name="foobar" type="hidden">

And of course this broke in production.

Well, that's interesting. A change handler on a hidden form input field isn't usually all that useful, so I figured out what I really needed was:

$('input[name=foobar]').filter(':visible').change(...).change();

It happens that the ".filter()" step is actually more efficient than doing it all in one selector ("input[name=foobar]:visible"), because of some obscurities within jQuery. That little discovery was of value to an End Point co-worker, who realized she could shave a little time off a page load elsewhere, so my minor page malfunction will redeem itself.

Remove specific CGI variables using Apache

Sometimes you need to remove a single CGI variable from a query string in your published URLs; for instance, one of our clients' sites had been spidered with a session id in the generated links, so we wanted to ensure that those URLs would be appropriately updated when re-spidered. Apache's mod_rewrite to the rescue!

The following snippet serves to rewrite any URL which has a query string parameter named id to the exact same URL without that CGI variable. Since mod_rewrite uses PCRE, we can use this to our advantage by using \b word-break anchors to ensure we're only picking up a CGI variable with exactly that name, so (say) id=bar will be removed but tid=foo will pass through the rewrite unaltered.

# already assume RewriteEngine on

RewriteCond %{QUERY_STRING} ^(.*)\bid=(\w*)\b(.*)
RewriteRule (.*) $1?%1%3 [R=301,L]

Note in the above example that we know the range of values that was present for the id variable, so \w is sufficient to capture the full value to remove in this case. $N is a backreference to the Nth group of the RewriteRule pattern, and %N is a backreference to the Nth group of the last matched RewriteCond pattern. We use [R=301] to trigger an external 301 redirect, which search engines should treat as pointing to the now-canonical rewritten URL.

To validate this covered all cases, I tested against URLs without the id variable defined at all, ones with id=foo at the beginning, middle, end, and as the sole item in the query string and verified it worked as expected in all cases. I'd earlier had multiple cases to handle each of the above scenarios, but mod_rewrite is smart enough to handle the query string merging so you don't end up with URLs like http://example.com/?&foo=1 (note ampersand) when the query string portion is ?id=2&foo=1.
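
A quick way to spot-check the behavior is to look at the redirect header with curl (the hostname here is just a stand-in); using the query string from the example above, the Location header should come back without the id parameter:

curl -sI 'http://example.com/?id=2&foo=1' | grep -i '^Location'
# Location: http://example.com/?foo=1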

We put this rule in front of everything else so if the request matches, it'll trigger before any other special handling is considered. Hope this helps someone else!

Changing postgresql.conf from a script

The modify_postgres_conf script for Postgres allows you to change your postgresql.conf file from the command line, via a cron job, or any time when you want to automate the process.

Postgres runs as a background daemon. The configuration parameters it runs with are stored in a file named postgresql.conf. To change the behavior of Postgres, one must usually edit this file, and then tell Postgres that you have made the changes. Sometimes all that is needed is to 'HUP' or reload Postgres. Most changes fall into this category. Other changes require a full restart of Postgres, which entails disconnecting all current clients.

Thus, to make a change, one must edit the file, find the item to change (the file consists of "name = value" lines), change it, then send a signal to the main Postgres process so it picks up the change. Finally, you should then connect to Postgres to make sure it is still running and has accepted the latest change.
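
Done by hand, that dance looks something like this (the paths and the parameter are only examples, and on Debian you would typically reload via the init script or pg_ctlcluster):

# 1. Edit the file and change the line you care about
sudo vi /etc/postgresql/9.0/main/postgresql.conf    # e.g. random_page_cost = 2

# 2. Tell the running postmaster to re-read its configuration (the "HUP"/reload)
sudo /etc/init.d/postgresql reload

# 3. Connect and confirm the change actually took effect
psql -c 'SHOW random_page_cost;'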

Doing this automatically (such as via a cron script) is very difficult. One method, if you are doing something simple like toggling between two known configuration files, is to simply store copies of both files and replace them, like this example cronjob:

30 10 * * * cp -f conf/postgresql.conf.1 /etc/postgresql.conf; /etc/init.d/postgresql reload
50 10 * * * cp -f conf/postgresql.conf.2 /etc/postgresql.conf; /etc/init.d/postgresql reload

The major problem with that approach, as I quickly learned when I tried it, is that despite nobody making changes to the postgresql.conf file in *years*, a few days after I put the above change in place, someone decided to edit postgresql.conf. At 10:30AM the next day, their changes were blown away. A better way is to simply write a program to make the change for you. Thus, the modify_postgres_conf.pl script.

The basic usage is to tell the script where the conf file is, and list what changes you want to make. Here's an example that will change the random_page_cost to 2 on a Debian system:

./modify_postgres_conf.pl --pgconf /etc/postgresql/9.0/main/postgresql.conf --change random_page_cost=2

Here is exactly what the script does for the above statement:

  • For each item to be changed, we:
    • Ask the database what the current value is (and die if that parameter does not exist)
    • If the current and new value are the same, do nothing
    • Otherwise, open (and flock) the configuration file and change the parameter
  • If no changes were made, exit
  • Otherwise, close the configuration file
  • Figure out the Postgres PID and send it a HUP signal (sketched below)
  • Reconnect to the database and confirm each change has taken effect
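
The HUP step looks roughly like this; a sketch only, not the script's actual code, and the pid file path is just an example for a Debian layout:

# The first line of postmaster.pid holds the postmaster's PID
open my $fh, '<', '/var/lib/postgresql/9.0/main/postmaster.pid' or die $!;
chomp(my $pid = <$fh>);
close $fh;

# Ask the postmaster to re-read postgresql.conf
kill 'HUP', $pid or die "Could not signal process $pid: $!\n";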

By default, it adds a comment after the changed value as well, to help in tracking down who made the change. A diff of the postgresql.conf file after running the example above produces:

diff -r1.1 postgresql.conf
499c499
< random_page_cost = 4
---
> random_page_cost = 2 ## changed by modify_postgres_conf.pl on Wed Aug 10 13:31:34 2011

The addition of the comment can be stopped by adding a --no-comment argument. If the script runs successfully, it also returns two items of information: the size and name of the current Postgres log file. This is useful so you know exactly where in the log this change took place. Note that this only works for items that are already explicitly set in your configuration file. However, as discussed before, you should already have every item you might possibly change explicitly listed at the bottom of the file. Whitespace is preserved as well, for those (like me) who like to keep things lined up neatly inside the file (see examples in the link above).

Here are some more examples of the script in action:

$ ./modify_postgres_conf.pl --pgconf /etc/postgresql/9.0/main/postgresql.conf --change random_page_cost=2
114991 /var/log/postgres/postgres-2011-08-10.log

$ ./modify_postgres_conf.pl --pgconf /etc/postgresql/9.0/main/postgresql.conf --change random_page_cost=2
No change made: value of "random_page_cost" is already 2

$ ./modify_postgres_conf.pl --pgconf /etc/postgresql/9.0/main/postgresql.conf \
> --change random_page_cost=2 \
> --change log_statement=ddl \
> --change log_min_duration_statement=100

No change made: value of "random_page_cost" is already 2
118459 /var/log/postgres/postgres-2011-08-10.log

$ ./modify_postgres_conf.pl --pgconf /etc/postgresql/9.0/main/postgresql.conf \
> --change default_statitics_target=200 --no-comment
There is no Postgres variable named "default_statitics_target"!

$ ./modify_postgres_conf.pl --pgconf /etc/postgresql/9.0/main/postgresql.conf \
> --change default_statistics_target=200 --no-comment
123396 /var/log/postgres/postgres-2011-08-10.log

Note that we make no attempt to automatically check changes in to version control: as you will see in an upcoming blog post on a real-life use case, such a checkin is usually not wanted, as we are making temporary changes.

This is a fairly simple Perl script, but I thought I would put it out there in the hopes of helping others out (and preventing the reinventing of wheels). Of course, if you find a bug or want to write a patch for it, those are welcome additions at any time! The code can be found on github:

git clone git://github.com/bucardo/modify_postgres_config.git

The rails_admin gem in Ecommerce

Update: Since writing this article, this gem has had several updates. All rails_admin configuration must now be pulled out of ActiveRecord models and into the initializer. The look and feel has also been updated. Read more about it here. Additionally, End Point has released a Ruby on Rails ecommerce engine with a sleek admin interface driven by rails_admin. Read more about it here.

I recently installed the rails_admin gem into a new Rails project. This particular site is currently running on Interchange, but it is not an ecommerce business, so Interchange is overkill for the site. The client has recently decided to make the switch to Rails (and DevCamps) with our help, and they have a moderate budget to do so. For the first increment, we considered installing phpMyAdmin for them to work directly with the data until a nice admin interface was built, but instead I spent a bit of time installing the rails_admin gem which has been on my mind for the last 6 months. The result: I'm extremely impressed with what rails_admin has to offer.

To show some examples of rails_admin in action, I'll use the data model from a Sinatra ecommerce setup I recently wrote about, because it has a basic ecommerce data model that should be universally understood by my coworkers and clients.

To create a new site, I went through the standard Rails 3.0 installation steps:

  • gem install rails -v=3.0.9 to install Rails
  • rails new mystore to create a new Rails application
  • In my gemfile, I added:
    gem 'devise'
    gem 'rails_admin', :path => 'vendor/gems/rails_admin'
    gem 'ckeditor' # for WYSIWYG editing
    gem 'paperclip' # and installed imagemagick, for image attachments
    
  • sudo bundle install to install the project gems
  • rake rails_admin:install to install rails_admin, which also runs devise setup and migrations
  • rails generate ckeditor:install to install ckeditor
  • Copied my Sinatra data migrations and models over to my Rails application (except for the User, which is now replaced with devise's User)
  • rake db:migrate to apply all migrations
  • rails s to start my server
  • Also, I created a user from the rails console

Note that I also tried to use Rails 3.1 and the master branch of rails_admin, but I was experiencing a few bugs, so I decided to stick with Rails 3.0.9 for this article. Coincidentally, the rails_admin master branch just jumped to Rails 3.1 very recently. There are a few things to be aware of with the Rails 3.1 Asset Pipeline.

Out of the Box

After I got everything up and running, my first look at /admin was promising. It shows the number of records per model, recent history, and a list of models for navigation on the right side:

rails_admin has a nice DSL to modify the admin interface. I applied a few updates to my application, described below.

Remove Model / Tab

First, I wanted to hide Users from the /admin, which required the change below. Note that you can also pass a block to determine the visibility of a model, which may be valuable if you want to limit the visibility of models to specific roles or users.

class User < ActiveRecord::Base
  ...

  rails_admin do
    visible false # or visible { some block }
  end
end


Users Tab removed from navigation

List Views


Product List View, rails_admin Out of the Box

Next, I updated my products list view in three steps: limit the list view to the name and price fields, add a default sort on the name, and format the price.

First, limit the listing of fields:

class Product < ActiveRecord::Base
  rails_admin do
    list do
      field :name
      field :price
    end
  end
end

Then add a default sort by the :name field:

class Product < ActiveRecord::Base
  rails_admin do
    list do
      sort_by :name
      field :name
      field :price
    end
  end
end

Finally, add formatting to the price field:

class Product < ActiveRecord::Base
  rails_admin do
    list do
      sort_by :name
      field :name
      field :price do
        formatted_value do
          sprintf "$%.2f", value
        end
      end
    end
  end
end

Here's a screenshot of the Products list view after these updates:

WYSIWYG-ing It

Next, I want to add a WYSIWYG editor for one of my fields, which can be accomplished with the following change:

class Page < ActiveRecord::Base
  rails_admin do
    edit do
      include_all_fields
      field :content, :text do
        ckeditor true
      end
    end
  end
end


ckeditor used for content field.

Paperclip Image Attachments

rails_admin also works nicely with Paperclip, a very popular Rails gem for image attachments. Paperclip requires that ImageMagick be installed on the server. I added the following code to my product model, which already had the Paperclip reference to an attached image and the migration to introduce Paperclip's required attachment fields.

class Product < ActiveRecord::Base
  rails_admin do
    edit do
      include_all_fields
      field :image do
        thumb_method :thumb
      end
    end
  end
end


The products edit view is now a multipart form to allow for image upload.
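
For reference, the Paperclip side of the model (not shown above) looks roughly like this; the :thumb style is whatever you point thumb_method at, and the geometry is just an example:

class Product < ActiveRecord::Base
  # Paperclip attachment with a small style for rails_admin's thumbnail
  has_attached_file :image,
                    :styles => { :thumb => "100x100>" }
end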

And to update the show view, I made this change:

class Product < ActiveRecord::Base
  ...
  rails_admin do
    show do
      include_all_fields
      field :image do
        thumb_method :thumb
      end
    end
  end
end


Thumb image added to product show view.

Overriding Views

Another common update in rails_admin might be the need to override views to change the look of the backend interface. I accomplished this by copying the rails_admin partial views into my Rails application and updating them to include "My Store" branding:

# app/views/rails_admin/main/_title.html.haml
%h1.title
  = link_to rails_admin_dashboard_path do
    %span.red My
    %span.white Store


After overriding the header view, "My Store" now appears in the header.


And an override of app/views/layouts/rails_admin/main.html.haml removes "Rails Admin | 2011" from the bottom of the page.

Conclusion

There's tons(!) more you can do with the gem, and it's documented thoroughly here. rails_admin comes out of the box with export to CSV, JSON, and XML, which makes it a nice base for building simple APIs. It recognizes the Rails associations has_many, belongs_to, has_and_belongs_to_many, etc. It also includes user history and filtering of items, and handles authentication with devise, which has become a very popular user authentication choice in Rails. I found a few potential disadvantages:

  • Some type of import functionality is missing (needed for my client)
  • ActiveRecord is the only ORM supported, but that's fine with me.
  • It's right on the bleeding edge of Rails, which makes it ideal for new Rails applications but means it can't always be used for older apps.
  • I am curious to see if performance suffers on applications with a large number of records.

Despite the potential disadvantages, I've been extremely impressed with its functionality and with how much development time it can save, leaving more time for custom, business-centric functionality for a client.

Note that a couple of popular alternatives to rails_admin are ActiveAdmin and Typus.

What next?

Since I've already built a Sinatra front-end ecommerce application, I might try to get my Rails admin running with a Sinatra frontend by following the tips in this article. It'll be a little more complex here since I need to map admin and user routes to Rails and other routes to Sinatra, but the article covers the general idea for dispatching the routes. Why do it this way? Because you get the best of both worlds: a nice Rails backend for the CRUD interface and API management, and a speedy Sinatra-driven frontend with simple paths to define product navigation, product pages, content pages, and the cart and checkout process (which is not standard RESTful behavior). You can also leverage Ruby gem functionality in both Sinatra and Rails.

Rails Optimization: Digging Deeper

I recently wrote about raw caching performance in Rails and advanced Rails performance techniques. In the latter article, I explained how to use a Rails low-level cache to store lists of things during the index or list request. This technique works well for list pages, but it doesn't necessarily apply to requests to an individual thing, or what is commonly referred to as the "show" action in Rails applications.

In my application, the "show" action loaded at ~200 ms/request with low concurrency, with the use of Rails fragment caching. And with high concurrency, the requests shot up to around 2000 ms/request. This wasn't cutting it! So, I pursued implementing full-page caching with a follow-up AJAX request, outlined by this diagram:

First, the fully cached page is loaded (quickly). Next, an AJAX request is made to retrieve access information. It returns a JSON object indicating whether or not there is a user, and if so, whether that user has edit access to the thing being shown. If there is no user, the page stays as is. If there is a user but he does not have edit permissions, the log out button is shown and the username is populated. If there is a user and he has edit permissions, the log out button is shown, the username is populated, and additional buttons requiring edit access are shown.

The Code

To cache the full page, I use the caches_page method, and cache only on requests of HTML format (other formats are not cached):

class ThingsController < ApplicationController
 caches_page :show, :if => Proc.new { |c| c.request.format.html? }
 ...
end

My access level request looks something like this:

def accessibility
  respond_to do |format|
    format.json do
      render :json => {
        :logged_in => current_user ? current_user.as_json(:only => [:id, :username]) : false,
        :can_edit => current_user ? Thing.find(params[:id]).can_edit?(current_user) : false }
    end
  end 
end 

My HTML has some bits of code sprinkled throughout it:

...
<a href="#" id="edit_thing" class="requires_editability">Edit</a>
...
<a href="#" id="my_account" class="requires_logged_in"><!-- no username yet --></a>
...

My jQuery AJAX request looks something like the code shown below. Note that I remove elements that do not apply to the current request:

$.ajax({
  type: 'GET',
  cache: false,
  url: editability_path,  //editability_path is defined in the HTML (a JavaScript variable)
  dataType: "JSON",
  error: function(xhr){
    $('.require_editability,.require_logged_in').remove();
  },
  success: function(results) {
    if(results.logged_in) {
      $('.require_logged_in').show();
      $('#my_account').html(results.logged_in.username);
      if(results.can_edit) {
        $('.require_editability').show();
      } else {
        $('.require_editability').remove();
      }
    } else {
      $('.require_editability,.require_logged_in').remove();
    }
  }
});

And don't forget the sweeper to clear the fully cached page after edits (or other ActiveRecord callbacks):

class ThingSweeper < ActionController::Caching::Sweeper

  observe Thing

  def after_save(record)
    expire_page :controller => :things, :action => :show, :id => record.id
  end
end
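
One detail worth noting: the standard way to activate a sweeper is to register it in the controller that performs the edits. The controller and action names here just mirror the examples above:

class ThingsController < ApplicationController
  caches_page :show, :if => Proc.new { |c| c.request.format.html? }

  # Run ThingSweeper around the actions that modify things
  cache_sweeper :thing_sweeper, :only => [:create, :update, :destroy]
end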

Additional Notes

There are some additional notes to mention:

  • If a user were to hack the AJAX or JavaScript, server-side validation is still performed when an "edit" action is submitted. In other words, if a hacker somehow enabled an edit button to show up and post an edit, the server-side response would prohibit the update because the hacker does not have the appropriate access.
  • HTML changes were made to accommodate this caching behavior, which was a bit tricky. The HTML has to handle all potential use cases (no user, user & no edit access, user & edit access). jQuery itself can also be used to introduce new elements per use case.
  • The access level AJAX request also hits more low-level Rails caches: for example, the array of things that a user has edit permission on is cached, and that cache is cleared with standard Rails sweepers (a minimal sketch of this kind of cache follows this list). With this additional caching component, the access level AJAX request hits the database minimally.
  • Performance optimization scenarios such as this make an argument against inline editing of resources. If there were a backend admin interface to allow editing of things, full-page caching would be more straightforward to implement.
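
Here is a minimal sketch of that kind of low-level cache; the editable_thing_ids method, the association, and the cache key are hypothetical, not the application's actual code:

class User < ActiveRecord::Base
  has_many :things

  # Cache the ids of things this user may edit; a sweeper (or model
  # callback) deletes this cache entry whenever permissions change.
  def editable_thing_ids
    Rails.cache.fetch("user_#{id}_editable_thing_ids") do
      things.where(:editable => true).map(&:id)
    end
  end
end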

Conclusion

With this functionality, fully cached pages are served with an average of less than 5 ms/request, and the AJAX accessibility request appears to be around 20 ms/request (although this is harder to test with simple command line tools). This is an improvement over the 200 ms/request initially implemented. Additionally, requests at a high concurrency don't bog down the system as much.

DevCamps news

DevCamps is a system for managing development, integration, staging, and production environments. It was developed by End Point for, and with the help of, some of our ecommerce clients. It grew over the space of several years, and really started to become its own standalone project in 2007.

Camps are a behind-the-scenes workhorse of our web application development at End Point, and don't always get much attention because everyone's too busy using camps to get work done! But this summer a few things are happening.

In early July we unveiled a redesign of the devcamps.org website that features a more whimsical look, a better explanation of what camps are all about, and endorsements by business and developer users. Marko Bijelic of Hipinspire did the design. Take a look:


www.devcamps.org

In less than two weeks, on August 17, I'm going to be giving a talk on camps at YAPC::EU in Riga, Latvia. YAPC::EU is Europe's annual Perl conference, and will be a nice place to talk about camps.

Many Perl developers are doing web applications, which is camps' main focus, so that's reason enough. But camps also started around the Interchange application server, which is written in Perl. And the camp system is currently implemented in Perl as well.

We've set up a lot of camp systems for Perl web applications. So even though we've also set up camp systems for web applications using Ruby on Rails, Sinatra, Django, and PHP, it's a nice homecoming to talk about camps to Perl enthusiasts.

Debian Postgres readline psql problem and the solutions

There was a bit of a controversy back in February as Debian decided to replace libreadline with libedit, which affected a number of apps, the most important of which for Postgres people is the psql utility. They did this because psql links to both OpenSSL and readline, and although psql is compatible with both, they are not compatible with each other!

By compatible, I mean that the licenses they use (OpenSSL's and readline's) are not, in one strict interpretation, allowed to be used together. Debian attempts to live by the letter and spirit of the law as closely as possible, and thus determined that they could not bundle both together. Interestingly, Red Hat does still ship psql using OpenSSL and readline; apparently their lawyers reached a different conclusion. Or perhaps they, as a business, are being more pragmatic than strictly legal, as it's very unlikely there would be any consequence for violating the licenses in this way.

While libreadline (the library for GNU readline) is a feature rich, standard, mature, and widely used library, libedit (sadly) is not as developed and has some important bugs and shortcomings (including no home page, apparently, and no Wikipedia page!). This resulted in frustration for many Debian users, who found that their command-line history commands in psql no longer worked, and worse, psql no longer supported non-ASCII input! Since I came across this problem recently on a client machine, I thought I would lay out the current solutions.

The first and easiest solution is to simply upgrade. Debian has made a "workaround" by forcing psql to use the readline library when it is invoked.

The next best solution, for those rare cases when you cannot upgrade, is to apply Debian's solution yourself by patching the 'pg_wrapper' program that Debian uses. In order to support running different versions of Postgres on the same box in a sane and standard fashion, Debian uses some wrapper scripts around some of the Postgres command-line utilities such as psql. Thus, the psql command in /usr/bin/psql is actually a symlink to the shell script pg_wrapper, which parses some arguments and then calls the actual psql binary, which is no longer in the default path. So, to apply the Debian fix, just patch your pg_wrapper file like so:

*** pg_wrapper  2011/07/18 03:46:49     1.1
--- pg_wrapper  2011/07/18 03:48:23
***************
*** 94,100 ****
  }
  
  error 'Invalid PostgreSQL cluster version' unless -d "/usr/lib/postgresql/$version";
! my $cmd = get_program_path (((split '/', $0)[-1]), $version);
  error 'pg_wrapper: invalid command name' unless $cmd;
  unshift @ARGV, $cmd;
  exec @ARGV;
--- 94,110 ----
  }
  
  error 'Invalid PostgreSQL cluster version' unless -d "/usr/lib/postgresql/$version";
! my $cmdname = (split '/', $0)[-1];
! my $cmd = get_program_path ($cmdname, $version);
! 
! # libreadline is a lot better than libedit, so prefer that                                                                  
! if ($cmdname eq 'psql') {
!     my @readlines = sort(</lib/libreadline.so.*>, </lib/*/libreadline.so.*>);
!     if (@readlines) {
!       $ENV{'LD_PRELOAD'} = ($ENV{'LD_PRELOAD'} or '') . ':' . $readlines[-1];
!     }
! }
! 
  error 'pg_wrapper: invalid command name' unless $cmd;
  unshift @ARGV, $cmd;
  exec @ARGV;

As you can see, what Debian has done is set the LD_PRELOAD environment variable to point to the libreadline shared object, which means that when psql is started, it uses the libreadline library instead of libedit. This is great news for Debian users. I'm unconvinced of how "legal" this is per Debian's standards, but then I'm in the camp that thinks they are interpreting all the licensing around this in the wrong way, and should have just left libreadline alone.

The second best solution, after patching pg_wrapper, is to simply define LD_PRELOAD yourself, either globally or per user.
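
For example, something along these lines in a shell profile; the library path varies by release and architecture, so check what your system actually has first:

# Find the readline shared library installed on this box
ls /lib/libreadline.so.* /lib/*/libreadline.so.* 2>/dev/null

# Then preload it for psql only, e.g. in ~/.bashrc
alias psql='LD_PRELOAD=/lib/libreadline.so.6 psql'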

Another solution is to use the 'rlwrap' program, which is a wrapper around some arbitrary program (in this case, psql) which routes the user input through readline. So a quick alias would be:

alias p='rlwrap psql --no-readline'

(Yes, we could also use -n, but it's an alias and thus we don't have to type it out each time, so it's better to be more verbose). The rlwrap solution is a quick hack, and I do not recommend it, as it still leaves out many psql features, such as autocompletion and ctrl-c support.

All of this is not strictly Debian's fault. If you read the various Debian bug reports as well as some of the Postgres mailing list threads about this topic, you will find there is plenty of finger pointing going around. It seems to me the least guilty party here is readline itself, whose only fault is that it is GPL and not a better license ;). Debian should take a little blame, both for being too strict in what is obviously a very uncharted legal licensing mess, and for making this change so quickly without any announcement and apparently without realizing how many things would break. The worst offender appears to be OpenSSL, which apparently is being stubborn about changing its license to allow linking with the GPL readline. I'll throw a little bit of blame towards libedit as well, merely for its inability to keep up with 20th century ideas like Unicode (because whose database doesn't need more 麟?).

The current Debian "solution" has stilled the waters a little bit, but we (Postgres) really need a long-term solution. Or solutions, as the case may be. As with my previous post, the big question there is "who shall put the bell on the cat"? I'd like to see Debian itself fund some work into improving libedit, since they are strongly encouraging use of it over libreadline. That's solution one: improve libedit such that it becomes a decent readline replacement. This is nice because as great as libreadline is, it's one of the only pieces of Postgres that used the GPL, and it would be nice to get rid of it for that reason alone (the other big one is PostGIS).

Another solution is to replace OpenSSL, since they apparently are never going to change their license, despite it being in everyone's best interest. GnuTLS is an oft-mentioned replacement, which seems to be production ready, unlike libedit. The problem here is that psql has a lot of "openssl-isms" in the code. However, that is something that can be accomplished by the Postgres community.

Another option is to get readline to make an exception so it can play nicely with OpenSSL. Not only is this unlikely to happen, I think it's a band-aid and I'd rather see the above two actions happen instead.

So, in summary, there are really two ways out of this mess: fix up libedit (hello Debian community) and allow Postgres support for GnuTLS (or other non-OpenSSL system for that matter) (hello Postgres community).

For those wanting to dig into this some more, Greg Smith's excellent summation in this thread is a great read.