
Rails 3.1: Upgrading a Simple App - Part 2

I recently wrote about upgrading a simple Rails app, which involved applying routing, mailer, ActiveRecord, etc. updates to my Rails 2.1.2 application. An equally important part of the upgrade is working with the asset pipeline, a framework that creates an architecture for managing JavaScript, CSS, and image assets in your Rails 3.1 application.

File Reorganization

Prior to the upgrade, my assets were organized in the following structure:

RAILS_ROOT/
  public/
    javascripts/
      jquery.site.js
      jquery.home.js
      jquery.services.js
      jquery.team.js
      jquery.bios.js
      ...
    stylesheets/
      site.css
    images/
      .. a lot of images ..

As you can see, the JavaScript files were already split into page-specific code included only where needed, while the application had one global stylesheet covering the entire site. In general, this organization followed the performance best practice of minimizing HTTP requests.

In Rails 3.1, the generators encourage you to build out individual JavaScript and CSS files for each controller, creating those files whenever you generate a controller. In development those files are served individually, but in production the compiled application.js and application.css are served by default (note: you can control the compiled file names and include more than one compiled file; see the sketch after the directory tree below). With this organization in mind, I reorganized my JavaScript and stylesheet assets into the following structure:

RAILS_ROOT/
  app/
    assets/
      javascripts/
        application.js
        bios.js
        clients.js
        contact.js
        home.js
        services.js
        team.js
      stylesheets/
        application.css
        bios.css
        clients.css
        contact.css
        home.css
        services.css
        sitemap.css
        team.css
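
As noted above, the compiled output is configurable. Here's a minimal sketch of what that looks like; "admin" is a hypothetical extra bundle for illustration, not a file from this app:

# config/application.rb
# application.js and application.css are compiled by default; any extra
# standalone bundle must be added to the precompile list explicitly.
config.assets.precompile += %w( admin.js admin.css )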

I also moved external JavaScript code to the vendor/assets directory, which can be explicitly included in application.js, to separate application-specific JavaScript from external libraries. Note that Rails will look in app/assets, vendor/assets, and lib/assets for assets by default, and additional locations may be added by updating the Rails.application.config.assets.paths variable (a quick sketch follows the tree below).

RAILS_ROOT/
  vendor/
    assets/
      javascripts/
        excanvas.min.js
        jquery.flot.js
        jquery.lightbox.js
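
If your external code lives somewhere Sprockets doesn't already search, the load path can be extended. A quick sketch, where the extra directory is hypothetical:

# config/application.rb
config.assets.paths << Rails.root.join('vendor', 'assets', 'flash')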

To enforce loading order of JavaScript files, application.js contains the following. Note that jQuery-ujs is not included because the application does not have any AJAX form submissions.

//= require jquery
//= require excanvas.min
//= require jquery.flot
//= require jquery.lightbox
//= require_tree .

I decided not to reorganize images at this point, since many of our blog articles reference on-site images and I didn't want to fold blog article updates into this upgrade.

Learning the Asset Tasks

After I reorganized my assets, I was (sort of) ready to go! I read up on the rake tasks for working with assets: rake assets:clean removes all compiled assets, and rake assets:precompile compiles all assets named in config.assets.precompile. After precompiling, I ran the application in both development and production to verify that assets were served individually in development and in compiled form in production.
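
For reference, the round trip looks like this from the shell (the RAILS_ENV prefix is the usual convention for production compilation):

# remove everything previously compiled into public/assets
bundle exec rake assets:clean
# compile all assets named in config.assets.precompile
RAILS_ENV=production bundle exec rake assets:precompile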

JavaScript Namespacing

As soon as I got the app up and running, I noticed problems with interactive functionality that relied on JavaScript. After some debugging, I tracked the cause down to conflicting JavaScript function names in the compiled application.js; this hadn't been an issue before the upgrade because those files were served on distinct pages. I added basic JavaScript namespacing to sort this out.

Instead of:

/* home.js */
var shift_right = function() {
  //...
};
var shift_left = function() {
  //...
};

/* clients.js */
var shift_right = function() {
  //...
};
var shift_left = function() {
  //...
};

I tried:

/* home.js */
var home = {
  shift_right: function() {
    //...
  },
  shift_left: function() {
    //...
  }
};

/* clients.js */
var clients = {
  shift_right: function() {
    //...
  },
  shift_left: function() {
    //...
  }
};

These updates sorted out the JavaScript errors.

Sass

One of the great things about Rails 3.1 is that it makes using Sass (and its SCSS syntax) very easy. I've written about Sass a couple of times before (here and here) and am happy to leverage its functionality. I renamed the stylesheets with a *.scss extension to enable SCSS rendering, and introduced variables, which let you easily represent and update values, like colors, used globally:

$blue: #195065;
a { text-decoration: none; color: $blue; }
.menu { border-top: 1px solid $blue; }
...

And I introduced nesting, a great tool for cutting down on retyped style definitions, which in turn reduces the risk of mislabeling styles:

/* Sitemap */
.sitemap { background: #FFF url(/images/manhattan.jpg) right 100px no-repeat; 
  p { margin: 0px 0px 3px 25px; }
  p.indent { margin-left: 35px; }
  a { color: #404040; font-weight: normal; }
}

Read more about other great Sass features like Mixins, Selector Inheritance, Functions, and Operations here.
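
As a taste of the first of those, here's what a simple mixin might look like (a generic sketch, not from this site's stylesheet):

/* define reusable rounded-corner styles once... */
@mixin rounded($radius: 5px) {
  -moz-border-radius: $radius;
  -webkit-border-radius: $radius;
  border-radius: $radius;
}
/* ...and apply them wherever needed */
.menu { @include rounded(3px); }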

Conclusion

Closing thoughts:

  • I spent about as much time on the asset pipeline updates as on the non-asset-pipeline updates described in the previous article. A coworker commented that he was surprised the upgrade took so long to work through. The upgrade to Rails 3.1 is not trivial, but there are plenty of great resources out there; I found the Rails Guides particularly helpful.
  • I did not leverage CoffeeScript, a language that compiles to JavaScript much as SCSS compiles to CSS. I don't have experience with CoffeeScript, and this didn't feel like the project to start learning it.
  • I did not reorganize any image assets. However, I've done this in another Rails 3.1 application and have found it to be relatively painless.

PostgreSQL Serializable and Repeatable Read Switcheroo

PostgreSQL allows for different transaction isolation levels to be specified. Because Bucardo needs a consistent snapshot of each database involved in replication to perform its work, the first thing that the Bucardo daemon does when connecting to a remote PostgreSQL database is:

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE READ WRITE;

The 'READ WRITE' bit puts us in read/write mode, just in case the entire database has been set to read only (a quick and easy way to make your slave databases non-writeable!). It also sets the transaction isolation level to 'SERIALIZABLE'. At least, it used to. Now Bucardo uses 'REPEATABLE READ', like this:

SET TRANSACTION ISOLATION LEVEL REPEATABLE READ READ WRITE;

Why the change? Version 9.1 of PostgreSQL introduced SSI (Serializable Snapshot Isolation). How it actually works is a little complicated (follow the link for more detail), but before 9.1, PostgreSQL was only *sort of* serializing transactions when you asked for serializable mode: what it was really doing was repeatable read, without truly serializing the transactions. In 9.1, PostgreSQL performs *true* serializable transactions. It also adds a distinct 'internal' isolation level, 'repeatable read', which does exactly what the old 'serializable' used to do. Finally, if you request 'repeatable read' on a pre-9.1 database, it is silently upgraded to the old 'serializable' mode.

So in summary: if your application was using 'SERIALIZABLE' before, you can now replace that with 'REPEATABLE READ' and get the exact same behavior as before, regardless of the version. Of course, if you want *true* serializable transactions, use SERIALIZABLE. It will continue to mean the same as 'REPEATABLE READ' on pre-9.1 databases, and provide true serializability in 9.1 and beyond. (I haven't determined yet if Bucardo is going to use this new level, as it comes with a little bit of overhead.)
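
One way to convince yourself of this is to ask the server which level you actually got. A quick sketch:

BEGIN;
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ READ WRITE;
-- On 9.1 and later this reports 'repeatable read'; pre-9.1 servers
-- handle the same request with their old 'serializable' behavior.
SHOW transaction_isolation;
COMMIT;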

Since this can be a little confusing, here's a handy chart showing how version 9.1 changed the meaning of SERIALIZABLE, and added a new 'internal' isolation level:

Requested isolation level | Internal level, 9.0 and earlier | Version comparison      | Internal level, 9.1 and later
--------------------------+---------------------------------+-------------------------+------------------------------
READ UNCOMMITTED          | Read committed                  | Exact same              | Read committed
READ COMMITTED            | Read committed                  | Exact same              | Read committed
REPEATABLE READ           | Serializable (old)              | Functionally identical  | Repeatable read
SERIALIZABLE              | Serializable (old)              | True SSI is 9.1 only!   | Serializable (true)

Congratulations and thanks to Kevin Grittner and Dan Ports for making true serializability a reality!

Another Post-Postgres Open Post

Well, that was fun! I've always found attending conferences to be an invigorating experience. The talks are generally very informative, it's always nice to put a face to names seen online in the community, and between the "hall track", lunches, and after-session social activities it's difficult to not find engaging discussions.

My favorite presentations:

  • Scaling servers with Skytools -- seeing what it takes to balance several high-velocity nodes was intriguing.
  • Mission Impossible -- lots of good arguments for why Postgres can be an equivalent, nay, better replacement for an enterprise database.
  • The PostgreSQL replication protocol -- even if I never intend to write something that'll interact with it directly, knowing how something like the new streaming replication works under the hood goes a long way toward keeping it running well.
  • True Serializable Transactions Are Here! -- I'll admit I haven't had a chance to fully check out the changes to Serializable, so getting to hear some of the reasoning and stepping through some of the use cases was quite helpful.

But what of my talks? Monitoring went well -- it seemed to get the message out. There was a lot of "gee, I have Postgres, and Nagios, but they're not talkin'. Now they can!" So hopefully, with a little more visibility into how the database is doing, the tools can boost confidence in business environments that aren't as sure about Postgres, and help keep existing installations in place. I think the Bucardo presentation had me a bit more animated for some reason. That one also led to some interesting questions from the audience, and a couple of challenges for the Bucardo project.

All in all, great work everyone!

Rails 3.1: Upgrading a Simple App - Part 1

Here at End Point, I've worked with a few Rails 3 applications in production and a couple of Rails 3.1 apps in development, so I've become familiar with the new features and functionality including the Rails 3.1 Asset Pipeline that I mentioned earlier this year. I thought it was a good time to upgrade our website to Rails 3.1 and share the experience.

To start, here's a quick summary of our website:

  • Simple Rails application running on Rails 2.1.2 with no database
  • Static pages throughout the site, fully cached
  • Rake tasks to generate partials throughout the site to display dynamic blog content
  • Site uses a moderate amount of jQuery and jQuery plugins.
  • Site is optimized in terms of asset serving (ETags, Expires headers, CSS sprites, etc.)

While I've worked with a few Rails 3 apps, I hadn't been involved in an actual upgrade before. There are plenty of resources out there with upgrade advice, including a few RailsCasts (one, two, and three). My favorite resource was the rails_upgrade gem, now officially supported by Rails to help with the upgrade process. I followed the instructions to install it as a plugin (script/plugin install git://github.com/rails/rails_upgrade.git) in our site's application, in a fresh git branch (on a camp, of course!).

The rails_upgrade plugin provides a few new rake tasks for checking compatibility, upgrading the routes, creating a Gemfile, and upgrading configuration. For me, the most valuable was the rake rails:upgrade:check task. Here's what the output looked like for this app:

Deprecated session secret setting
Previously, session secret was set directly on ActionController::Base; it's now config.secret_token.
More information: http://lindsaar.net/2010/4/7/rails_3_session_secret_and_session_store

The culprits: 
 - config/initializers/session_store.rb

Old router API
The router API has totally changed.
More information: http://yehudakatz.com/2009/12/26/the-rails-3-router-rack-it-up/

The culprits: 
 - config/routes.rb

New file needed: config/application.rb
You need to add a config/application.rb.
More information: http://omgbloglol.com/post/353978923/the-path-to-rails-3-approaching-the-upgrade

The culprits: 
 - config/application.rb

Deprecated constant(s)
Constants like RAILS_ENV, RAILS_ROOT, and RAILS_DEFAULT_LOGGER are now deprecated.
More information: http://litanyagainstfear.com/blog/2010/02/03/the-rails-module/

The culprits: 
 - app/views/layouts/application.rhtml
 - ...

Soon-to-be-deprecated ActiveRecord calls
Methods such as find(:all), find(:first), finds with conditions, and the :joins option will soon be deprecated.
More information: http://m.onkey.org/2010/1/22/active-record-query-interface

The culprits: 
 - app/views/blog_archive/_ruby_on_rails.html.erb
 - ...

Deprecated AJAX helper calls
AJAX javascript helpers have been switched to be unobtrusive and use :remote => true instead of having a seperate function to handle remote requests.
More information: http://www.themodestrubyist.com/2010/02/24/rails-3-ujs-and-csrf-meta-tags/

The culprits: 
 - app/views/blog_archive/_company.html.erb
 - ...

Deprecated ActionMailer API
You're using the old ActionMailer API to send e-mails in a controller, model, or observer.
More information: http://lindsaar.net/2010/1/26/new-actionmailer-api-in-rails-3

The culprits: 
 - app/controllers/contact_controller.rb

Old ActionMailer class API
You're using the old API in a mailer class.
More information: http://lindsaar.net/2010/1/26/new-actionmailer-api-in-rails-3

The culprits: 
 - app/models/contact_form.rb

As you can see, the upgrade check spits out a list of necessary and recommended upgrades and the corresponding *culprits*. It's also nice that the task provides documentation in the form of a link for each message. Studying the source of the plugin, I found additional examples of upgrade messages: named_scope updates, validate_on_* syntax, test_help path updates, gem bundling configuration, Rails generator API syntax updates, messaging on known broken plugins (e.g. searchlogic, cucumber, nifty-generators), and deprecation of ERb helper and AJAX calls.
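
To give a flavor of one of those checks, here is the named_scope change the plugin warns about, as a generic sketch (the Post model is illustrative, not from this app):

# Rails 2.3
class Post < ActiveRecord::Base
  named_scope :published, :conditions => { :published => true }
end

# Rails 3.x: named_scope becomes scope, built on the new query interface
class Post < ActiveRecord::Base
  scope :published, where(:published => true)
end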

I went through and applied my updates, according to the checklist. Notable updates were:

Routing updates

Before

ActionController::Routing::Routes.draw do |map|
  map.root :controller => 'home', :action => 'index'
  map.connect 'contact/submit', :controller => 'contact', :action => 'submit'
  map.connect ':controller/:id'
  map.connect '*path', :controller => 'redirect' 
end

After

Endpoint::Application.routes.draw do
  root :to => 'home#index'
  match 'contact/submit' => 'contact#submit'
  match ':controller(/:id)', :action => :index
  match '*path' => 'redirect#index'
end

Introduction of a Gemfile

source 'http://rubygems.org'

gem 'rails', '3.1.0'
gem 'json'

# Gems used only for assets and not required
# in production environments by default.
group :assets do
  gem 'sass-rails',   '~> 3.1.0'
  gem 'coffee-rails', '~> 3.1.0'
  gem 'uglifier'
end

gem 'jquery-rails'
gem 'fastercsv'
gem 'execjs'
gem 'therubyracer'
gem 'rake', '0.8.7'

Renaming rhtml files

Something that didn't come up in the rails upgrade check, but that is required for a working app, is renaming all *.rhtml files to *.html.erb, briefly described here.

Basic Asset Management

To get the basic app working, I moved the public/stylesheets and public/javascripts to the new app/assets directories to start. I did not move the images out of the public/ directory because several of the images in the application are referenced by blog articles.

Database-less Application

I followed the directions here combined with a bit of troubleshooting to configure a Rails 3.1 app that does not require a database.
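
The gist of that approach, as a hedged sketch: replace require 'rails/all' in config/application.rb with only the frameworks the app actually uses, leaving Active Record out.

# config/application.rb
# require 'rails/all'   # <-- replaced by the individual railties below
require 'action_controller/railtie'
require 'action_mailer/railtie'
require 'sprockets/railtie'   # the asset pipeline
# active_record/railtie is intentionally omitted: no database here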

Conclusion

The upgrade was a relatively painless process, although it still took a few hours for even this most basic of applications, with only a handful of controllers, routes, and one mailer. My experience suggests that with a more complex application, the upgrade will take considerably longer. This simple app doesn't do much with remote forms and links, so I didn't spend any time upgrading it to work with the jquery-ujs gem. I also obviously didn't wrestle with any Rails 3.1 ActiveRecord issues, since the application is database-less. Both of these items may add significant overhead to the upgrade process.

I spent a significant amount of time working with the new asset pipeline and restructuring the assets, which I plan to describe in Part 2 of the upgrade. Stay tuned!

Headed out to PgWest next week

I'm gearing up to head out to attend and speak at the PgWest PostgreSQL conference in sunny San Jose. (Does anyone have directions...?)

I'm excited to again meet and mingle with more PostgreSQL experts and enthusiasts and look forward to the various talks, technical discussions, and social opportunities. My talk will be on Bucardo and many uses for it as a general tool. It'll also cover additional changes coming down the pipe in Bucardo 5.

I look forward to seeing everyone!

Bucardo, 9.1, and you!

A little bit of bad news for Bucardo fans: Greg Sabino Mullane won't be making Postgres Open due to scheduling conflicts. But not to worry -- I'll be giving the "Postgres masters, other slaves" talk in his place.

In looking over the slides, one thing that catches my eye is how quickly Bucardo is adopting PostgreSQL 9.1 features. Specifically, Unlogged Tables will be very useful in boosting performance where Bucardo stages information about changed rows for multi-database updates. I also wonder if the enhanced Serializable Snapshot Isolation would be helpful in some situations. Innovation encouraging more innovation, gotta love open source!

If I haven't said it before, thanks to everyone who made Postgres 9.1 possible. Some of the other enhancements are just as exciting. For instance, I'm eager to see some creative uses for writable CTEs. And it'll be very interesting to see what additional Foreign Data Wrappers pop up over time.

Now, back to packing...

OpenSSH known_hosts oddity

A new version of the excellent OpenSSH was recently released, version 5.9. As you'd expect from such widely-used mature software, there are lots of minor improvements to enjoy rather than anything too major.

But what I want to write about today is a little surprise in how ssh handles multiple cached host keys in its known_hosts files.

I had wrongly thought that ssh stopped scanning known_hosts when it hit the first hostname or IP address match, such as happens with lookups in /etc/hosts. But that isn't how it works. The sshd manual reads:

It is permissible (but not recommended) to have several lines or different host keys for the same names. This will inevitably happen when short forms of host names from different domains are put in the file. It is possible that the files contain conflicting information; authentication is accepted if valid information can be found from either file.

The "files" it refers to are the global /etc/ssh/known_hosts and the per-user ~/.ssh/known_hosts.

The surprise is what happens when there are multiple host key entries in ~/.ssh/known_hosts for, say, 10.0.0.1. If the first one has a non-matching host key, the ssh client tries the second one, and so on, until it runs out of matching IP address entries to check. If none has a matching host key, the ssh client reports the offending line number of the last matching IP address entry, but gives no indication that there are earlier mismatches as well.

This is actually kind of convenient if you have scripts that simply append new host keys to the end of the known_hosts file, and it makes sense given that hostname wildcards and multiple hostnames per line are allowed. It's fine -- it just isn't what I expected, and it's good to know.
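
Relatedly, if you want to see every cached key for a given host, or clear them all out after a legitimate re-key, ssh-keygen operates on known_hosts directly:

# list all known_hosts entries matching this host
ssh-keygen -F 10.0.0.1
# remove them all (the original file is saved as known_hosts.old)
ssh-keygen -R 10.0.0.1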

CSS Fixed, Static Position Toggle

In a recent Rails project, I had to implement a simple but nifty CSS trick. A request came in to give a DOM element fixed positioning, meaning that as the user scrolls through the page, the element stays in one place while the rest of the page moves. This is pretty common behavior for menu bars that stay pinned along one border of the window as a user navigates the site. This situation was a bit trickier, though, because the menu bar that needed fixed positioning started a few hundred pixels down the page, below the header and navigation content.

I came up with a nifty way of using jQuery to toggle the menu bar CSS between fixed and static positioning. The code uses jQuery's scroll event handler to adjust the menu bar's CSS position setting as the user scrolls through the page. If the window scroll position is past the menu's original top offset, the menu gets fixed positioning at the top of the window; otherwise, it gets static positioning. Here's what the code looks like:

var head_offset = jQuery('#fixed_header').offset();
jQuery(window).scroll(function() {
    if(jQuery(window).scrollTop() < head_offset.top) {
        jQuery('#fixed_header').css({ position: "static"}); 
    } else {
        jQuery('#fixed_header').css({ position: "fixed", top: "0px" }); 
    }   
}); 

And perhaps the most effective demonstration of this behavior comes in the form of a video, created with Screencast-O-Matic. I also tried capturing with Jing, another handy tool for quick screenshots and screencasts. Note that the header content has CSS adjustments for demo purposes only.

Postgres Open: One week to go!

Wow, time flies, Postgres Open is almost upon us!

I'll be there giving a talk Thursday morning on monitoring tools and techniques, and possibly helping with the Bucardo 5 replication session Friday afternoon. Sadly, I'll need to catch a flight shortly after that, so there won't be much time to explore Chicago around everything going on. But at least it'll be nice to get out to a conference again!

SQL errors in Interchange

Interchange has a little feature whereby errors in a [query] tag are reported back to the session just like form validation errors. That is, given the intentional syntax error here:

[query ... sql="select 1 from foo where 1="]

Interchange will paste the error from your database in

  $Session->{errors}{'table foo'}

That's great, but it comes with a price: now you have a potential for a page with SQL in it, which site security services like McAfee will flag as "SQL injection failures". Sometimes you just don't want your SQL failures plastered all over for the world to see.

Simple solution:

  DatabaseDefault LOG_SESSION_ERROR 0

in your Interchange configuration file, possibly constrained so it only affects production (because you'd love to see your SQL errors when you are testing, right?).

Ruby on Rails Performance Overview

Over the last few months, I've been involved in a Ruby on Rails (version 2.3) project with a strong need for performance improvements. Here, I'll summarize some of the methods and tools used for performance optimization on this application.

Fragment Caching

Before I started on the project, there was already a significant amount of fragment caching in use throughout the site. In its most basic form, fragment caching wraps a cache block around existing view code:

<% cache "product-meta-#{product.id}" do %>
  <%# insert view code %>
<% end %>

And Rails sweepers are used to clear the cached fragments, with code that looks something like the example below. In our application, the sweeper attaches cache-clearing methods to model callbacks such as after_save, after_create, and before_update.

class ProductSweeper < ActionController::Caching::Sweeper
  observe Product

  def after_save(record)
    expire_fragment "product-meta-#{record.id}"
  end
end

Fragment caching is a good way to reuse small modular view components throughout the site. In this application, fragment caches tended to contain object metadata shown on various index list pages and single item show pages.

Page Caching

I did not initially add page caching to the application because the system has complex role management, where users can have edit access at an object, class, or super level. Later, however, I investigated advanced techniques to leverage full page caching, described in depth here. The benefit was that the application server was not hit at all for full page requests; a quick AJAX request made after the page loads determines the user's access level.
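
A rough sketch of the idea (the controller, current_user, and can_edit? names are assumptions for illustration, not the project's actual code):

class ProductsController < ApplicationController
  # the fully rendered page is cached and served to everyone
  caches_page :show

  # a small, uncached endpoint the page calls via AJAX after load
  # to decide whether to reveal edit controls
  def access
    product = Product.find(params[:id])
    render :json => { :can_edit => current_user.can_edit?(product) }
  end
end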

Raw SQL methods

Another performance technique I employed on this application was using raw SQL rather than standard ActiveRecord methods to look up association data. The application uses ActsAsTaggable, a gem that enables you to tag objects. Its simplified data model includes a polymorphic relationship in the taggings table to the tagged items (products and categories).

In the application, the front-end required that we pull the most popular 25 tags for a specific class. Working with the objects and their associations, one might use the following code:

def self.tag_list
  # collect every tag name across all products, count occurrences,
  # then return the 25 most frequent
  tags = Product.all.collect { |p| p.tag_list }.flatten
  counts = tags.inject({}) { |h, t| h[t] ||= 0; h[t] += 1; h }
  counts.sort_by { |k, v| v }.reverse.first(25)
end

However, this request is quite sluggish because it has to iterate through each object and its tags. I wrote raw SQL to generate the Tag objects instead, which runs at least 10 times faster than the standard ActiveRecord association lookup:

def self.tag_list
  Tag.find_by_sql("SELECT ts.tag_id AS id, t.name FROM taggings ts
    JOIN tags t ON ts.tag_id = t.id
    WHERE taggable_type = 'Product'
    GROUP BY ts.tag_id, t.name
    ORDER BY COUNT(*) DESC LIMIT 25")
end

Typically, using ActiveRecord find methods and the item associations may yield more readable code and require minimal knowledge of the underlying database structure. But in this example, having an understanding of the database model and how to work with it gave a significant performance bump. This technique was also combined with fragment caching.

Rails Low Level Caching

Next up, there were several opportunities through the site to use Rails low level caching. Here's one example of a simple use of Rails low level caching, which pulls a list of products that the user has owner or creator rights to:

class User < ActiveRecord::Base 
  def products
    Rails.cache.fetch("user-products-#{self.id}") do
      self.roles
        .find(:all, :conditions => {:authorizable_type => 'Product', :name => ['owner','creator']})
        .collect(&:authorizable)
        .uniq
        .compact
        .sort_by{|a| a.updated_at}
    end
  end
end

Rails low level caching makes sense for data that is used across various actions but has additional computation applied to it: we can't cache the result at the page request or action level, but we can cache the computed data itself. I also used Rails low level caching on the search index pages, which is described more in depth here.
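
The flip side of low level caching is expiry. Here's a sketch of one way to handle it, assuming a Role model with a user_id column (names are illustrative, not the project's code):

class Role < ActiveRecord::Base
  after_save    :expire_user_products
  after_destroy :expire_user_products

  private

  # drop the per-user cache entry built in User#products above
  def expire_user_products
    Rails.cache.delete("user-products-#{user_id}")
  end
end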

HTML Asset related Performance

In addition to server-side optimization, I investigated several avenues of HTML asset related performance optimization:

  • Extensive use of CSS Sprites
  • Consolidation and minification of JS and CSS. Note that Rails 3.1 introduces new functionality to improve the process of serving minified and consolidated JS and CSS.
  • HTML caching, gzipping, and Expires headers

Tools Used

Throughout performance tweaking, I used the following tools:

Conclusion

There are a few Rails caching techniques that I did not use in this application, such as action caching and SQL caching. The Rails caching overview provides a great summary of caching techniques, but does not cover Rails low level caching. Another great resource for performance optimization is Yahoo's Best Practices for Speeding Up Your Web Site, though it focuses on asset-related optimization opportunities. I typically recommend pursuing optimization on both the server-side and asset-related fronts.

Bucardo PostgreSQL replication to other tables with customname


Image by Flickr user Soggydan

(Don't miss the Bucardo5 talk at Postgres Open in Chicago)

Work on the next major version of Bucardo is wrapping up (version 5 is now in beta), and two new features have been added in this major version. The first, called customname, allows you to replicate to a table with a different name; this is a feature people have been requesting for a long time, and it even allows you to replicate between differently named Postgres schemas. The second, called customcols, allows you to replicate to different columns on the target: not only a subset, but columns with different names (and types), along with other neat tricks.

The "customname" options allows changing of the table name for one or more targets. Bucardo replicates tables from the source databases to the target databases, and all tables must have the same name and schema everywhere. With the customname feature, you can change the target table names, either globally, per database, or per sync.

We'll go through a full example here, using a stock 64-bit RedHat 6.1 EC2 box (ami-5e837b37). I find EC2 a great testing platform - not only can you try different operating systems and architectures, but (as my own personal box is very customized) it is great to start afresh from a stock configuration.

First, let's turn off SELinux, install the EPEL rpm, update the box, and install a few needed packages.

echo 0 > /selinux/enforce
wget http://download.fedoraproject.org/pub/epel/6/i386/epel-release-6-5.noarch.rpm        
rpm -ivh epel-release-6-5.noarch.rpm
yum update
yum install emacs-nox perl-DBIx-Safe perl-DBD-Pg git postgresql-plperl
cpan boolean

The yum update takes a while to run, but I always feel better when things are up to date. Next, we will create a new database cluster, create the /var/run/bucardo directory that Bucardo uses to store its PIDs, adjust the ultraconservative stock pg_hba.conf file, and start Postgres up:

service postgresql initdb
mkdir /var/run/bucardo
chown postgres.postgres /var/run/bucardo
emacs /var/lib/pgsql/data/pg_hba.conf                                        
service postgresql start

For the pg_hba.conf configuration file: because we want to connect to the database as the bucardo user without actually logging into that account, we allow access using the 'md5' (password) method instead of 'ident'. Since we don't want to bother creating a password for the postgres user, we still allow those connections via ident. The relevant lines in pg_hba.conf end up like this:

# TYPE   DATABASE   USER       METHOD
local    all        postgres   ident                          
local    all        all        md5                          

At this point, we (as the postgres user) download and install Bucardo itself:

su - postgres
git clone git://bucardo.org/bucardo.git
cd bucardo
perl Makefile.PL
make
sudo make install                                      
bucardo install    # (enter 'p' and keep the default values)

We are now ready to start testing out the new customname feature. First we will need some data to replicate! For this demo we are going to use one of the handy sample datasets from the dbsamples project. The one we will use has a few small tables with information about towns in France. Note that the tarball does not (sadly) contain a top-level directory, so we have to create one ourselves. We will then create three identical databases holding the data from that file.

wget http://pgfoundry.org/frs/download.php/935/french-towns-communes-francaises-1.0.tar.gz                
mkdir frenchtowns
cd frenchtowns
tar xvfz ../french-towns-communes-francaises-1.0.tar.gz
psql -c 'create database french1'
psql french1 -q -f french-towns-communes-francaises.sql
psql -c 'create database french2 template french1'
psql -c 'create database french3 template french1'
psql -c 'create database french4 template french1'

Bucardo is installed but does not know what to do yet, so we will teach Bucardo about each of the databases and add in all the tables, grouping them into a herd in the process. Finally, we create a sync in which french1 and french2 are both source (master) databases, and french3 and french4 are target (slave) databases.

bucardo add db f1 db=french1
bucardo add db f2 db=french2
bucardo add db f3 db=french3
bucardo add db f4 db=french4
bucardo add all tables herd=fherd
bucardo add sync wildstar herd=fherd dbs=f1=source,f2=source,f3=target,f4=target

Before starting it up, I usually raise the debug level, as it gives a much clearer picture of what is going on in the logs. It does make the logs a lot more crowded, so it is not recommended for production use:

echo log_level=DEBUG >> ~/.bucardorc

Next, we start Bucardo up and make sure everything is working as it should. Scanning the log.bucardo file that is generated is a great way to do this:

bucardo start
sleep 3
tail log.bucardo

If all goes well, you should see something very similar to this in the last lines of your log.bucardo file:

(972) [Sat Sep  3 16:18:54 2011] KID Total time for sync "wildstar" (0 rows): 0.05 seconds
(966) [Sat Sep  3 16:18:55 2011] CTL Got NOTICE ctl_syncdone_wildstar from 973 (line 1624)
(966) [Sat Sep  3 16:18:55 2011] CTL Kid 973 has reported that sync wildstar is done
(966) [Sat Sep  3 16:18:55 2011] CTL Sending NOTIFY "syncdone_wildstar" (line 1709)
(954) [Sat Sep  3 16:18:55 2011] MCP Got NOTICE syncdone_wildstar from 967 (line 749)
(954) [Sat Sep  3 16:18:55 2011] MCP Sync wildstar has finished
(954) [Sat Sep  3 16:18:55 2011] MCP Sending NOTIFY "syncdone_wildstar" (line 812)
(954) [Sat Sep  3 16:18:56 2011] MCP Got NOTICE syncdone_wildstar from 957 (Bucardo DB) (line 749)

From the above, we see that a KID finished running the sync we created, without finding any changed rows to replicate. Then there is some chatter between the different Bucardo processes. Now to test out the customname feature. We'll rename one of the tables, tell Bucardo about the change, reload the sync, and verify that all is still being replicated.

psql french3 -c 'ALTER TABLE regions RENAME TO tesla'
bucardo add customname regions tesla db=f3
bucardo reload wildstar
psql french3 -c 'truncate table tesla cascade'
TRUNCATE
psql french3 -t -c 'select count(*) from tesla'
0
psql french1 -c 'update regions set name=name'
UPDATE 26
psql french3 -t -c 'select count(*) from tesla'
26

In the above, the update on the regions table in the french1 database fires a trigger that notifies Bucardo that some rows have changed; Bucardo then has a KID copy the rows from the source database french1 to the other source database french2, as well as to the targets french3 and french4. The final internal DELETE and COPY on database french3 is done against the tesla table rather than the regions table.

The customname feature cannot be used to change the tables in a source database, as they must all be the same (for obvious reasons). We can, however, specify that a different schema be used for a target, as well as a different table. This only applies to Postgres targets, as other database types (e.g. MySQL) do not use schemas. Let's see that in action:

psql french4 -c 'create schema banana'
psql french4 -c 'alter table regions set schema banana'
psql french4 -c 'truncate table banana.regions cascade'
bucardo add customname regions banana.regions db=f4
bucardo reload wildstar
psql french4 -t -c 'select count(*) from banana.regions'
0
psql french2 -c 'update regions set name=name'
UPDATE 26
psql french4 -t -c 'select count(*) from banana.regions'
26

As before, the update on a source causes the changes to propagate to the other source database, as well as both targets. Note that the ALTER TABLE also mutated the associated sequence for the table, so there will be warnings in Bucardo's logs about the DEFAULT values for the primary keys in the regions' tables being different. Since this post is getting long, I will save the discussion of customcols for another day.