End Point

News

Welcome to End Point's blog

Ongoing observations by End Point people.

Saving time with generate_series()

I was giving a presentation once on various SQL constructs, and, borrowing an analogy I'd seen elsewhere, described PostgreSQL's generate_series() function as something you might use in places where, in some other language, you'd use a FOR loop. One attendee asked, "So, why would you ever want a FOR loop in a SQL query?" A fair question, and one that I answered using examples later in the presentation. Another such example showed up recently on a client's system where the ORM was trying to be helpful, and chose a really bad query to do it.
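For anyone who hasn't used it, here's a trivial illustration of the FOR-loop flavor of generate_series() (my own example, not from the client's system):

SELECT n, n * n AS n_squared
FROM generate_series(1, 5) AS s(n);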

The application in question was trying to display a list of records, and allow the user to search through them, modify them, filter them, etc. Since the ORM knew users might filter on a date-based field, it wanted to present a list of years containing valid records. So it did this:

SELECT DISTINCT DATE_TRUNC('year', some_date_field) FROM some_table;

In fairness to the ORM, this query wouldn't be so bad if some_table only had a few hundred or thousand rows. But in our case it has several tens of millions. This query results in a sequential scan of each of those records, in order to build a list of, as it turns out, about fifty total years. There must be a better way...

The better way we chose turns out to be, in essence, this: find the years of the maximum and minimum date values in the date field, construct a list of all years between the minimum and maximum, inclusive, and see which ones exist in the table. This date field is indexed, so finding its maximum and minimum is very fast:

SELECT
    DATE_TRUNC('year', MIN(some_date_field)) AS mymin,
    DATE_TRUNC('year', MAX(some_date_field)) AS mymax
FROM some_table

Here's where the FOR loop idea comes in, though it's probably better described as an "iterator" rather than a FOR loop specifically: for each year between mymin and mymax inclusive, I want a database row. The analogy may not hold terribly well, but the technique is very useful, because it will create a list of all the possible years I might be interested in, and it will do it with just two scans of the some_date_field index, rather than a sequential scan of millions of rows.

SELECT
    generate_series(EXTRACT(YEAR FROM mymin)::INTEGER,
                    EXTRACT(YEAR FROM mymax)::INTEGER) AS yearnum
FROM (
    SELECT
        DATE_TRUNC('year', MIN(some_date_field)) AS mymin,
        DATE_TRUNC('year', MAX(some_date_field)) AS mymax
    FROM some_table
) minmax_tbl

Now I simply have to convert these year numbers back into dates, and see which ones exist in the underlying table:

SELECT
    yearbegin::timestamptz
FROM
    (
        SELECT
            to_date(yearnum::text, 'YYYY') AS yearbegin
        FROM (
            SELECT
                generate_series(EXTRACT(YEAR FROM mymin)::INTEGER,
                                EXTRACT(YEAR FROM mymax)::INTEGER) AS yearnum
            FROM (
                SELECT
                    DATE_TRUNC('year', MIN(some_date_field)) AS mymin,
                    DATE_TRUNC('year', MAX(some_date_field)) AS mymax
                FROM some_table
            ) minmax_tbl
        ) yearnum_tbl
    ) beginend_tbl
WHERE
    EXISTS (
        SELECT 1 FROM some_table
        WHERE
            some_date_field BETWEEN yearbegin AND yearbegin + INTERVAL '1 year'
    )
ORDER BY yearbegin ASC
;

As expected, this probes the some_date_field index twice, to get the maximum and minimum date values, and then once for each year between those values. Because of some strangely-dated data in there, that means nearly 10,000 index probes, but that's still much faster than scanning the entire table.

Postgres Bug Tracking - Help Wanted!

Once again there is talk in the Postgres community about adopting the use of a bug tracker. The latest thread, on pgsql-hackers, was started by someone asking about the status of their patch. Or rather, asking an even better meta-question about how one finds out the status of a PostgreSQL bug report or patch. Sadly, the answer is that there is no standard way, other than sending emails until someone replies one way or another. The current process works something like this:

  1. Someone finds a bug
  2. They send an email to pgsql-bugs@postgresql.org OR they use the web form, which grabs a sequential number and mails the report to pgsql-bugs@postgresql.org. Nothing else is done or stored; it just sends the email.
  3. Someone replies about the bug OR nobody replies about the bug.
  4. After a fix is found, which may involve some emails on other mailing lists, someone replies on the original thread that the bug is fixed. Maybe.

As you can see, there is some room for improvement there. Some of the biggest and most glaring holes in the current system:

  • No way to search previous / existing bugs
  • No way to tell the status of a bug
  • No way to categorize and group bugs (per version, per platform, per component, per severity, etc.)
  • No way to know who is working on a bug
  • No way to prevent things from slipping through the cracks

Luckily, the above problems have been solved for many, many years now by a wide variety of bug tracking software. There have traditionally been three obstacles to getting a bug tracker working for the Postgres project:

Inertia

The current system is, in a very literal sense, "good enough", so it's hard to impose the inevitable short-term pain of a new system when there always seem to be more pressing matters to attend to.

Doesn't Make Julienne Fries

Everyone wants a different set of features, and getting all the hackers involved to agree on even a simple subset of desired features is pretty difficult. This is sort of similar to the crusade by me and others to get git adopted as the replacement version control system; there were some strong voices for competing systems (e.g. Mercurial).

Who Will Put the Bell on the Cat?

Everyone talks about the problem, and there have even been some attempts over the years to implement some sort of system, but the problem remains that setting up such a system, getting it smoothly integrated into the project's work flow, and then maintaining said system is a non-trivial task. Especially when you can't be assured of buy-in from some of the major players.

I'm hopeful that the recent thread indicates a slight shift of late in global acceptance of the need for a bug tracking system. The question is, which one, and who is going to take the time to write something? I'm really hoping someone who has been lurking in the background will step up and help create something wonderful (okay, we can start with 'decent' :), perhaps even someone with experience setting up bug tracking systems. Certainly Postgres must be one of the last major open source projects without a bug tracker; there is plenty of hard-won experience out there to be learned from. It would also be ideal if the person or persons were *not* Postgres hackers of any sort, as taking the time to build and maintain this system would definitely take time away from their other hacking tasks. On the other hand, one could argue that a bug tracker is a vital piece of project infrastructure that is potentially as important as any other work that goes on. I certainly think so.

End Point at Internet Retailer 2011

We're excited to announce that End Point will be in full force at Internet Retailer 2011. A team of six (Rick, Ben, Jon, Ron, Carl, and myself) will be managing a booth during the 3-day Conference and Exhibition in San Diego, June 14th-17th.

We're hoping to reach out to future clients to discuss our services, including ecommerce development and support, database development and support, systems architecture, hosting, and Liquid Galaxy build, maintenance, and support. We'll also have a Liquid Galaxy display to show off as part of our exhibit. We have a couple of conference passes, so hopefully we'll attend a few sessions and share some details about them here on our blog!

Liquid Galaxy

Please come by our booth at the conference to meet us or to say hi! We're excited to talk to you about what we can do to help your business succeed.

Spree Performance Benchmarking

I see a lot of questions regarding Spree performance in the spree-user group, but they are rarely answered with metrics. I put together a quick script using the generic benchmarking tool ab to gather some data. Obviously, how well a site performs and scales is highly dependent on the hosting and on the resource consumption of the web application, so the data here needs to be taken with a grain of salt. Another thing to note is that only two of the following use cases are running on Rails 3.0 — many of our current Spree clients are on Spree 0.11.2 or older. I also included one non-Spree Rails ecommerce application, in addition to a few non-Rails applications for comparison. All of the tests were run from my home network, so in theory there shouldn't be a bias in the performance tests toward sites running on End Point servers.
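The test runs boiled down to ab invocations along these lines (the URLs are placeholders; each concurrency level and page was benchmarked separately):

ab -n 100 -c 2  http://example.com/
ab -n 100 -c 20 http://example.com/
ab -n 100 -c 2  http://example.com/products/some-product
ab -n 100 -c 20 http://example.com/products/some-product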

All tests: ab -n 100. The four result columns are -c 2 homepage | -c 20 homepage | -c 2 product page | -c 20 product page.

Client #1 (Spree 0.11.2; Hosting: 4 cores, 512 MB RAM; DB: MySQL; # Products: <100)
    Requests per second:   7.49 | 24.75 | 6.49 | 19.87
    Time per request (ms): 266.889 | 808.041 | 307.997 | 1006.552

Client #2 (Spree 0.11.2; Hosting: Engine Yard, medium instance; DB: MySQL; # Products: 100s)
    Requests per second:   5.32 | 20.28 | 5.36 | 18.03
    Time per request (ms): 375.713 | 986.309 | 373.289 | 1109.524

Client #3 (Spree 0.9.0; Hosting: 4 cores, 1 GB RAM; DB: PostgreSQL; # Products: <100)
    Requests per second:   4.91 | 25.39 | 1.98 | 6.54
    Time per request (ms): 407.135 | 787.782 | 1011.875 | 3060.062

(Former) Client #4 (Spree 0.11.2; Hosting: Unknown; DB: PostgreSQL; # Products: >5000)
    Requests per second:   20.69 | 8.84 | 10.15 | 19.28
    Time per request (ms): 96.673 | 2262.105 | 196.996 | 1037.146

Client #5 (Spree 0.11.2; Hosting: Engine Yard, small instance; DB: MySQL; # Products: 1)
    Requests per second:   12.28 | 16.23 | N/A | N/A
    Time per request (ms): 162.909 | 1231.945 | N/A | N/A

Client #6 (Spree 0.40; Hosting: 4 cores, 1 GB RAM; DB: MySQL; # Products: 50-100)
    Requests per second:   3.61 | 8.93 | 2.96 | 3.06
    Time per request (ms): 553.569 | 2240.657 | 675.306 | 6539.433

Spree Demo (Spree: Edge; Hosting: Heroku, 2 dynos; DB: Unknown; # Products: 100s)
    Requests per second:   8.17 | 12.79 | 4.7 | 5.48
    Time per request (ms): 244.831 | 1563.642 | 425.27 | 3652.927

Client #7 (custom Rails ecommerce app; Hosting: 1.0 GB RAM; DB: MySQL; # Products: 1000s)
    Requests per second:   5.43 | 29.8 | 4.45 | 23.14
    Time per request (ms): 368.409 | 671.082 | 448.962 | 864.24

Interchange Demo (Hosting: 4 cores, 2 GB RAM; DB: MySQL; # Products: >500)
    Requests per second:   7.41 | 55.27 | 7.5 | 13.93
    Time per request (ms): 269.942 | 361.875 | 266.492 | 1435.51

Client #8 (PHP site serving fully cached pages with nginx, no app server or DB hits; Hosting: 4 cores, 4 GB RAM)
    Requests per second:   10.81 | 30.54 | 6.05 | 9.87
    Time per request (ms): 184.994 | 654.858 | 330.727 | 2027.092

Magento Demo (Hosting: Unknown; DB: Unknown; # Products: 100s)
    Requests per second:   4.26 | 44.85 | 2.68 | 36.29
    Time per request (ms): 469.831 | 445.931 | 745.472 | 551.11

Here's the same data in graphical form:

Requests per Second

Time Per Request (ms)

We expect to see high performance on some of the sites that have had significant performance optimization. On smaller VPSes, we expect to see the server choke at higher concurrency.

Raising open file descriptor limits for Dovecot and nginx

Recently we've needed to increase some limits in two excellent open-source servers: Dovecot, for IMAP and POP email service, and nginx, for HTTP/HTTPS web service. These are running on different servers, both using Red Hat Enterprise Linux 5.

First, let's look at Dovecot. We have a somewhat busy mail server, and as it grew busier, it occasionally hit connection limits even though the server itself still had plenty of available capacity.

Raising the number of processes in Dovecot is easy. Edit /etc/dovecot.conf and change from the prior (now commented-out) limits to the new limits:

#login_max_processes_count = 128
login_max_processes_count = 512

and later in the file:

#max_mail_processes = 512
max_mail_processes = 2048

However, with those settings Dovecot won't start at all, due to a shortage of available file descriptors. There are various ways to change that, including munging the init scripts, changing the system defaults, etc. The most standard and least invasive way to do so with this RHEL 5 Dovecot RPM package is to edit /etc/sysconfig/dovecot and add:

ulimit -n 131072

That sets the shell's maximum number of open file descriptors allowed in the init script /etc/init.d/dovecot before the Dovecot daemon is run. The default ulimit -n is 1024, so here we increased it to an arbitrarily big enough number (2 * 64K) to handle the new limits and then some.

Similarly, on another server we needed to increase the number of connections allowed per nginx worker process from the default 1024 for a very high-capacity HTTP caching proxy server.

We edited /etc/nginx/nginx.conf and changed the events block like this:

events {
    worker_connections  65536;
}

But then nginx wouldn't start at all. The same problem and same solution applied. We edited /etc/sysconfig/nginx to add:

ulimit -n 131072

And now nginx has enough file descriptors to start.

Changing the limits this way also has the benefit of surviving an upgrade, because /etc/sysconfig files are marked in RPM as configuration files that should have any changes preserved.

Coding Tips from RailsConf 2011

A couple of the sessions I attended on Day 1 of RailsConf 2011 were along the lines of how to write good Rails code: Keeping Rails on the Tracks by Mikel Lindsaar and Confident Code by Avdi Grimm. Here's a mishmash summary from these talks. Although the talks didn't focus on Ruby and Rails techniques, both talks had plenty of examples and tips for writing maintainable code that apply to my world of consulting and development.

In the first talk, Mikel talked about what he meant by keeping Rails on the Tracks and the balance that we might often experience between just working code and sexy code. Maintainable code lands in the middle there somewhere. And he briefly touched on the fact that budget matters in writing maintainable code, trade-offs are part of life, and that you are selling a solution and not code, which means that clients don't really care about the technology as long as it's maintainable.

Mikel's first pro tip for building a maintainable Rails app is to treat deployment as a first class citizen by making deployment trivial, taking advantage of existing tools like Chef and Puppet. Using Bundler will help you avoid additional pain, but be careful about locking your Gemfile to git repos that you don't own, since you can't control those other repos. This really speaks to End Point's camps project — it's so easy to deploy changes on many of our campified Perl-based Interchange sites, but more difficult for clients that aren't on camps. The bottom line is that having trivial deployment saves time & money.

Mikel also mentioned several performance tips that make clients happy, listed below. I wasn't sure how these recommendations fit into the talk on how to keep Rails on the tracks by writing maintainable code, but nonetheless here they are:

  • combining JS, CSS, CSS sprites, utilizing the Rails 3.1 asset pipeline
  • caching optimization: fragment caching, action caching, page caching
  • avoid a bloated session, and avoid storing objects in the session
  • push things out to the browser if possible to minimize data and web-app load

Another topic that Mikel touched on was how being smart can be stupid. He recommends not using before and after filters for object instantiation, and minimizing their use to state modifications such as authentication, authorization, or things related to the user session. Mikel mentioned that while metaprogramming is sexy and has its place, that place is not in your Rails app, because it's harder for other developers (and even yourself) to understand what's automagically happening when you look at the code 3 months after you wrote it.

Mikel mentioned a few examples of using the right tool for the job. He discussed two examples where simple SQL made a huge difference: a Ruby report whose run time went from 45 minutes down to 15 seconds, and a PostgreSQL COPY statement that brought a data migration from 150 hours down to 74 minutes. Mikel also noted that Cucumber is not unit testing, so just write unit tests!

Confident Code

Next up, Avdi gave a nice talk about writing confident code and explained the standard narrative of a method or function:

  • gather input
  • perform work
  • deliver results
  • handle failures

When gathering input, Avdi recommends that developers employ strategies to deal with uncertain inputs, such as coercion (e.g. to_s, to_i), rejection (a guard clause, or something more complex that returns from the method early), or ignoring invalid inputs. He also talked about his approach of having zero tolerance for nil in Ruby, because it's overused and can be indicative of too many things: an error, missing data, default behavior, or an uninitialized variable.
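As a rough illustration of the coercion and rejection strategies (my own sketch, not code from the talk):

# Rejection: a guard clause bails out early on clearly invalid input.
# Coercion: normalize the remaining inputs to the types we actually want.
def ship_order(order_id, note)
  raise ArgumentError, "order_id required" if order_id.nil?

  order_id = order_id.to_i
  note     = note.to_s.strip

  puts "Shipping order ##{order_id} (#{note})"
end

ship_order("42", :rush)  # => Shipping order #42 (rush)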

In part 2 of the narrative, perform work, I liked Avdi's comments about how conditionals should be reserved for business logic. For example, consider the following two conditionals:

A) if post.published?  ... end
B) if post ... end

The first line of code most likely represents business logic, while the second may not. We want to minimize instances of the second kind, where conditionals are not representative of business logic. Avdi also touched on writing in the confident styles of chaining and iteration, such as the style used in jQuery, where a selector that matches nothing simply does nothing.

In part 3 of the narrative, deliver results, Avdi suggested returning a special case or null object (or raising an exception) rather than a meaningless nil when there are no results. Finally, for handling failures, Avdi suggested writing the happy path first and keeping a visual divide in the code between the happy and unhappy paths.
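For instance, a minimal null object along these lines (my own sketch, not from the talk) lets callers drop their nil checks:

# A stand-in object returned instead of nil when a lookup fails;
# callers can always call .name without checking for nil first.
class GuestUser
  def name
    "Guest"
  end
end

class User
  attr_reader :name
  def initialize(name)
    @name = name
  end
end

USERS = { 1 => User.new("Alice") }

def find_user(id)
  USERS.fetch(id, GuestUser.new)
end

puts find_user(1).name   # => Alice
puts find_user(42).name  # => Guest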

Conclusion

My takeaways from the talks are:

  • write code for people first (even yourself), then computers. This seemed to be a recurring recommendation at RailsConf.
  • writing code is communicating the business logic. Make sure it's clear and componentized, and that each method has a single responsibility.
  • like photography, sometimes the art is more about what you leave out rather than what you include.

While these takeaways aren't novel, I did like the insight into how both speakers approach development.

Only Try This At Home

Taken by Josh 6 years to the day before the release of 9.1 beta 1

For the record, 9.1 is gearing up to be an awesome release. I was tinkering and testing PostgreSQL 9.1 Beta 1 (... You are beta testing, too, right?) ... and some of the new PL/Python features caught my eye. These are minor among all the really cool high profile features, to be sure. But it made me think back to a little bit of experimental code written some time ago, and how these couple language additions could make a big difference.

For one reason or another I'd just hit the top level postgresql.org website, and suddenly realized just how many Postgres databases it took to put together what I was seeing on the screen. Not only does it power the content database that generated the page, of course, but even the lookup of the .org went through Afilias and their Postgres-backed domain service. It's a pity the DBMS couldn't act as the middle layer between those.

Or could it?

That's a shortened form of it just for demonstration purposes (the original one had things like a table browser) ... but it works. For example, on this test 9.1 install, hit http://localhost:8000/public/webtest and the following table appears:

generate_serieslhrnd
100.548577250913
211.70926172473
311.24841631576
(etc)......

Note the use of two specific 9.1 features, though. The plpy object now contains nice query-building helper utilities like quote_ident that you may be familiar with from other languages. This also makes use of subtransactions, which help recover from database errors. That's important here: something like a typo in a table name will generate an error from Postgres, and without a subtransaction in place the database will end the transaction and ignore any subsequent commands the function tries to run.
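The original function is too long to repeat here, but a stripped-down sketch of those two features (my own example; the function name and error text are just for illustration) might look like this:

CREATE OR REPLACE FUNCTION table_rowcount(tname text) RETURNS text AS $$
# quote_ident (new in 9.1) safely quotes a user-supplied table name
query = "SELECT count(*) AS n FROM %s" % plpy.quote_ident(tname)
try:
    # an explicit subtransaction (also new in 9.1) lets the function
    # recover from the error instead of ending the whole transaction
    with plpy.subtransaction():
        rows = plpy.execute(query)
except plpy.SPIError:
    return "Error code 404.  Message: Table not found."
return "%s has %s rows" % (tname, rows[0]["n"])
$$ LANGUAGE plpythonu;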

But with that in place, the page shows the 404 error, and picks up where it left off with subsequent requests:

Error code 404.

Message: Table not found.

By the way, if it's not clear by now, don't take this anywhere near a production database, if for no other reason than that a transaction will be held open as long as the function runs. That will hold back all the nice maintenance work that keeps things running efficiently. Still, I think it helps show off what just a handful of lines of code can do in a powerful language like PL/Python. I'm sure that with the right module PL/PerlU could do something very similar. But even more, I think it shows how Postgres is growing and innovating by leaps and bounds, seemingly every day!

JavaScript and APIs at RailsConf 2011

A big trend of RailsConf 2011 has been the presence of JavaScript, whether it be talk of CoffeeScript, node.js, Sproutcore, backbone.js, etc. I attended Yehuda Katz's talk on Building Rails Apps for the Rich Client on Day 4 of the conference.

Part 1: Rails versus Sinatra Smackdown

Yehuda gave a two-part talk about developing API-heavy applications. The first part addressed why you would develop in Rails rather than Sinatra when your application is API-heavy and doesn't appear to be utilizing valuable parts of Rails. This is fairly applicable to a couple of Sinatra projects that I've done at End Point — I like Sinatra a lot, but at some point you begin to replicate parts of Rails. Yehuda explained that because it's easy to develop web applications efficiently using Rails conventions, developers can become forgetful or ignorant of the underlying functionality of Rails that doesn't ship with something like Sinatra, much like how working with an ORM can breed developers who aren't familiar with the underlying database interactions. The checklist of things Rails manages that we might forget about includes:

  • ActionDispatch internals
  • Session deserialization
  • Browser standards mode
  • Cookie abstraction and management
  • Deserialization of content types like JSON, XML
  • Reloading (by comparison Sinatra doesn't reload automagically unless you use a tool like shotgun)
  • Handling IP Spoofing
  • Routing / Complex Routing
  • Browser Caching (ETags, Last-Modified)
  • Content negotiation to automatically support different MIMEs
  • ActionMailer

This is a nice checklist of items to review when you are considering building in Sinatra, as a reminder of the things that won't necessarily be trivial to implement.

Part 2: Consistent APIs

The second part of Yehuda's talk covered his experience building rich-client applications with APIs. His general observation is that while Rails follows universal conventions, like the ActiveRecord convention that foreign keys be named {table_name}_id, there isn't a comparable convention for APIs. While there has been decent support for building APIs in Rails since 2006, the missing documentation on what to generate has produced APIs with limited consistency. He gave the following pro tips for developing an API:

  • Return a JSON object with one or more keys that is iterable. Rather than returning
    { "product_title": "Some Awesome Product." }
    
    return something that has one or more keys and is iterable, such as:
    {
        "products": [
            {
                "id": 1,
                "title": "Some Awesome Product."
            }
        ]
    }
    
  • Avoid nested resources like /users/1/orders. Instead, always make the request do a GET on /orders or a PUT on /orders/1. This will help maintain consistency (and maintainability) with simple resource requests.
  • In general, create a convention and follow conventions. Essentially, apply the principles of Rails here to produce maintainable and consistent code.

Finally, Yehuda presented the bulk_api gem to address the use case where you need to work with a set of items, but want to avoid making a separate request for each item and also avoid fetching all items and parsing through them on the client side. An ecommerce example of this might be applicable in inventory management: let's say you have ~30 unrelated listed inventory units that need to be updated to an on_hand state. From the admin screen, you may click the checkboxes next to those 30 items and then "Update". Rather than sending a separate request to update each item, the bulk_api gem will make a single request to update the inventory unit status for all of the objects. Similarly, you can make a GET request to the bulk API to retrieve a set of objects, such as looking at a set of products on page 2 of a product listing. Yehuda went into more detail on authentication and authorization with this gem, so I'd recommend reading the documentation here.

Conclusion

Yehuda concludes that Rails is great for JSON APIs and that we should continue to be building abstractions similar to ActiveRecord abstractions that follow conventions in a sensible way. Hooray for convention over configuration.

Sass at RailsConf 2011

With the inclusion of the Scss gem in Rails 3.1, RailsConf is a nice time to get a refresher on Sass/Scss functionality. Sass describes itself as syntactically awesome stylesheets: a CSS meta-language built to provide more powerful functionality for manipulating website appearances with efficiency and elegance. Note that Sass has two syntaxes, and the examples presented in this article use the newer Scss syntax. Around the time of RailsConf two years ago, Sass was included in Spree, an open-source Ruby on Rails ecommerce framework that End Point supports. At the time, I was skeptical about Sass's inclusion in Spree because it wasn't being leveraged to its full potential, and I had hopes of taking advantage of Sass, but a few months later it was removed from the core. Since then, I haven't worked with Sass on other projects, but I hope to do so moving forward after being reminded of its features and of the fact that it will be included in Rails 3.1 as a default. I attended Chris Eppstein's talk on Sass, and below I explain a few features related to real-life use cases of CSS manipulation.

Variables

While working on a new feature, your client says, "I want this to be the same red that we use all over the site." This is exactly what I experienced while working on Paper Source to build out new Quickview/popup functionality shown in the image below.

Sass variables can be defined and then included in various selectors. Need to change the styling for all those selectors that use this variable? Just change the variable instead of grepping through the code for the color and different variations of the color. Here's an example of what variable definition and use might look like:

$red_error: #FF3300;
.error {
    color: $red_error;
}
.qty_alert {
    color: $red_error;
}

Nesting

Instead of writing code with repeated selectors, take full advantage of nesting in Sass. Below is a good example of a repeated styling selector I use for Paper Source's custom wedding invitation feature:

.editor_box {}
  .editor_box label { font-size: 11px; display: block; padding: 2px 0px; font-style: italic; }
  .editor_box label a { font-size: 9px; color: #82B3CC; }
  .editor_box .fonts { float: left; width: 176px; margin: 0px 5px 5px 0px; }
    .editor_box .fonts select { width: 176px; margin: 0px; }

In Sass, this would look like:

.editor_box {
    label {
        font-size: 11px;
        display: block;
        padding: 2px 0px;
        font-style: italic;
        a { 
            font-size: 9px;
            color: #82B3CC;
        }
    }
    .fonts { 
        float: left;
        width: 176px;
        margin: 0px 5px 5px 0px;
        select {
            width: 176px; margin: 0px;
        }
    }
}

While the nested functionality doesn't produce fewer lines of code, it does do a better job of following the DRY principle and is more readable in Sass form.

Mixins

Next, while building out a new feature, a client says, "I want you to build a product thumbnail page that has equivalent styling to our other thumbnail pages." See the thumbnails in the image below that share similar styling on multiple product navigation pages through Paper Source's site.

Mixins are reusable sections of CSS that can be included in other selectors to reduce duplicate property and value definitions. A simple example, shown below, includes the product thumbnail mixin in two distinct classes.

@mixin product_thumbnail {
    a { 
        color: $blue;
        text-decoration: none;
        &:hover {
            color: $red;
        }
    }
    img { padding-top: 10px; }
}
.category_products {
    @include product_thumbnail;
}
.special_products {
    @include product_thumbnail;
}

Transformations

Next, while building out another(!) new feature, the client comments, "I want this to be a color close to the other one, but a little different. Can you try what you think looks good?" The image below is a screen snippet with a jQuery-based accordion with different colors representing open and closed sections.

Transformations are calculations that can be applied to values, such as color manipulations like saturate, desaturate, lighten, darken, and greyscale. In the jQuery accordion scenario, the Sass might look like this to show a lightened version of blue on the inactive accordion regions:

$blue: #99CCCC;
.active {
    background-color: $blue;
}
.inactive {
    background-color: lighten($blue, 25%);
}

Importing Stylesheets

From a performance perspective, it's ideal to have a single compiled CSS file included on each page. This can be difficult for maintainability as you try to manage an extremely large stylesheet. The CSS @import directive can pull in additional stylesheets, but each imported file requires an additional HTTP request. Sass's approach allows you to break your styles into files containing rules and other Sass functionality, with a single file created at compile time. In the Paper Source example, we could do the following to include styles for thumbnails on various pages:

_product_thumbnails.scss
@mixin product_thumbnail {
    a { 
        color: $blue;
        text-decoration: none;
        &:hover { color: $red; }
    }
    img { padding-top: 10px; }
}

category.scss
@import "product_thumbnails";
.category_products {
    @include product_thumbnail;
}

landing_page.scss
@import "product_thumbnails";
.special_products {
    @include product_thumbnail;
}

Check out the Sass documentation to read more about Sass and its features, including the abstractions, calculations, and selector inheritance covered by Chris.

CSS Sprites with Compass

Compass is an additional gem that can be installed on top of Sass. In my opinion, the best feature of Compass mentioned in the talk was automated CSS spriting. CSS spriting is a technique where an aggregate of images is served as one image, and CSS is used to show only the relevant portion of that image for each DOM element. I've built a few different scripts in Ruby and Perl using ImageMagick that automatically build sprites, and I was pleasantly surprised to hear that there is a feature in Compass that handles this. With Compass installed, CSS sprites in Sass might look like the code below, where the images in a directory are automagically compiled into a single sprited image and the corresponding CSS rules are generated.

@import "wrapper_elements/*.png";
$wrapper-sprite-dimensions: true;
@include all-wrapper-sprites;

Conclusion

Admittedly, the examples shown in this blog article come from a site that runs on Perl-based Interchange, but I used these examples because I can distinctly remember each of these use cases. It might not be quite as easy to include Sass here with Interchange as it will be in Rails 3.1, where Scss/Sass is included as a new default.

Rails 3 at RailsConf 2011

A keynote by DHH kicked off Day 2 of RailsConf, where much of his talk was spent discussing new asset behavior in Rails 3.1. Here's a run down of a few topics I found worth noting:

Asset Pipeline

DHH started by explaining how although a Rails application has lovely organization in terms of Ruby files, the assets (stylesheets, images, and JavaScripts) have become a junk drawer where junk continues to pile in. Once there's more than a handful of files, the broken window theory applies and no one tries to maintain organization of those assets. This gets nasty, like a honey badger.

With Rails 3.1, the asset pipeline addresses the junk drawer. Asset directories (images, stylesheets, and JavaScripts) are now created under app, lib, and vendor, as they pertain to the main app, plugin dependencies, or new library dependencies like a jQuery calendar date select plugin. There's also the introduction of appending to config.assets.paths, which lets you serve assets from additional arbitrary directories. The new asset pipeline allows you to organize these assets differently, which encourages JavaScript files to be stored based on their level of abstraction, and the asset pipeline combined with Bundler lets you track the jQuery version with the jquery-rails gem, yielding better maintainability.
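For example, adding an extra asset directory is one line of configuration (the path below is hypothetical, just to show the mechanism):

# config/application.rb
module MyStore
  class Application < Rails::Application
    # Assets can also live outside app/assets, lib/assets, and vendor/assets:
    config.assets.paths << Rails.root.join("app", "flash", "assets")
  end
end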

New Defaults

Rails 3.1 now assumes CoffeeScript and SCSS (Sass) as defaults. Jason discussed a Sass talk he attended yesterday at BohConf, which covered things like nesting to reduce duplicate code and variables to improve maintainability. I haven't worked with CoffeeScript much, so I'll just link to the CoffeeScript documentation and possibly attend a CoffeeScript talk tomorrow. The argument between setting defaults and setting no defaults was revisited, and defaults won. The new defaults use Bundler to include Sass and CoffeeScript in the Gemfile:

gem 'sass'
gem 'coffee-script'

And these can simply be commented out of the dependency list if desired. In my case, if I were developing a Rails app tomorrow with a limited budget, I might choose to use Sass since I've worked with it before, but pass on CoffeeScript until I learned more and felt confident working with it.

Scalability and Compiling

Another question that comes up with these new defaults is how the assets scale and compile. A new rake task is introduced:

rake assets:precompile

The rake task goes through the load path to precompile application JavaScript and CSS into application-*md5_hash*.js and application-*md5_hash*.css and to copy the images over to the application's public directory. This file-based approach ensures that users will request the correct application file, in addition to keeping the older compiled files around. Finally, compression tools are built straight into Rails: uglifier for JavaScript compression and scss for CSS compression. There is no penalty for writing comments or whitespace-rich code with these compression tools built in.


We need a photo break. A Honey Badger.

The second talk I attended was "SOLID Design Principles Behind the Rails 3 Refactoring" by José Valim, a member of the Rails core team.

Single Responsibility Principle

José spent the most time talking about the single responsibility principle: a class should have one and only one purpose. He discussed the evolution of ActionView::Base, which was responsible for tracking details, finding templates, compiling templates, and rendering, and which was gradually divided into the following components and responsibilities in Rails 3:

  • View Path: holds resolvers
  • Resolver: finds template. The resolvers abstraction no longer restricts templates to the filesystem (can be anywhere in the app, web service, or even database) which simplifies testing and therefore improves maintainability.
  • Lookup Context: tracks details, lookup context object is passed to the view.
  • AV::Renderer: renders templates
  • ActionView::Base: renders context

By applying the single responsibility principle to the view rendering functionality in Rails, modularization now allows us to extend or override individual points of the process (such as grabbing a template from a CMS-driven database, or passing a different lookup context object to the view) and ensure maintainability by enabling more testable code.

José talked about the other SOLID principles as well, but some pertain to static languages more so than Ruby, as the book was originally written with static languages in mind. These included:

  • the Open/Closed Principle
  • the Liskov Substitution Principle
  • the Interface Segregation Principle
  • the Dependency Inversion Principle

Since I missed a few points here and there, feel free to check out the conference keynote videos here, and I'll add a link to José's talk when it becomes available. Undoubtedly, Rails 3 and related topics will continue to be a highlight of the conference, and I look forward to sharing more!

RailsConf 2011 - Day One

Today was the first official day of RailsConf 2011. As with most technical conferences, this one spent the first day with tutorials and workshops. For those of us without paid access to the tutorial sessions, the BohConf was a nice way to spend our first day of the four-day event.

BohConf is described as the "unconference" of RailsConf. It's a loosely organized collection of presentations, mini-hackathons and barcamp-style meetings. I spent the first half of Monday at the BohConf. Of particular interest to me was Chris Eppstein and Scott Davis' introduction to Sass and Compass. I've dabbled with Sass in the past but only recently learned of Compass.

Sass is a great way to construct your CSS without the tedious duplication that's typical of most modern stylesheets. Introducing programming features like variables, inheritance, and nested blocks, Sass makes it easy to keep your source material concise and logical. Once your source declarations are ready, compile your production stylesheets with Sass or Compass.

Compass is effectively a framework for easy construction and deployment of stylesheets using Sass. To hear Scott describe it, "Compass is to Sass as Rails is to Ruby". Together they're a very attractive combination for the Ruby developer who also dabbles in design (and who doesn't these days?). Truth be told, while I'm very impressed with the capabilities of Sass, I worry about the trend of re-introducing logic into the presentation layer. My mom raised me to abstract the presentation layer for the ease of graphic designers, and that rule has suited me well to date. Time will tell.

In the afternoon I wandered into a sponsored workshop from VMware. Dave McCrory and Dekel Tankel led a demonstration of their new CloudFoundry service. A nice reward for attending was getting instant approval of your CloudFoundry.com beta registration. Although I felt mildly guilty for it afterwards, I took advantage of this opportunity to get an extra beta account (my personal request had been approved a week earlier).

Dave introduced everyone to the CloudFoundry beta offering and discussed a vague roadmap for the open source project and their own commercial VM product slated for early 2012. They emphasized that the CloudFoundry core project will remain open source, and that interested parties can fork it on github, hack on the code, and deploy their own private "micro-clouds". Dave even hinted that at least one startup has based their PHP PaaS service on CloudFoundry.

Once all the attendees had received their beta accounts, Dekel walked us through the installation and basic command-line usage of the vmc utility. I was able to immediately vmc login with my Beta account credentials and change my password with vmc passwd. I should note that before logging in, I had to choose my target server with vmc target api.cloudfoundry.com. Why is this important? This will allow developers to easily switch between targeted environments. For example, I could install CloudFoundry on my workstation or development server and "target" it as my development or staging environment. Once my tests pass, I can quickly switch targets and push the changes to production.

They had us follow along by recreating a sample Ruby application designed to check if one Twitter account follows another. The examples were simple and easy to follow. Once we had our Gemfile, controller and views in place, we had to bundle package all of the dependencies. Once this was complete, a quick vmc push myappname and we were live!

Unfortunately, most (if not all) of us encountered a gotcha with this example. Because all of our instantiated applications reside behind the same IP address on CloudFoundry's network, we quickly hit Twitter's API quota. I'm not sure if this will be a problem once CloudFoundry officially launches, but it's something to keep in mind. And while it was useful to vmc logs myappname to debug this problem, an astute attendee brought up the fact that there was no way to tail application logs in CloudFoundry. This is a glaring oversight and one I hope they rectify before the Beta is finished.

The workshop continued with an introduction to binding services like Redis, Mongo, or MySQL. We added new functionality to our existing application that introduced a Redis backend to store leaderboard information for Twitter activity. Lastly, Dekel demonstrated the ease of scaling our applications with vmc instances myappname 5. This simple command instantiates new copies of the application behind their dynamic load balancer. Incidentally, they're not currently offering any sort of scalable backend storage, so keep that in mind before you try to launch any production sites on their Beta service. Now you'd never do that, would you? ;-)
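For reference, the command-line workflow from the session boiled down to roughly this sequence (the app name is a placeholder):

# point the vmc client at a CloudFoundry instance and authenticate
vmc target api.cloudfoundry.com
vmc login
vmc passwd

# vendor the gem dependencies, then deploy
bundle package
vmc push myappname

# inspect the logs and scale out to 5 instances
vmc logs myappname
vmc instances myappname 5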

The first day of RailsConf 2011 was impressive. I came last year as an exhibitor and am really excited that I get to attend this year for the conference sessions. I can't wait to see all the talks tomorrow!

DBD::Pg and the libpq COPY bug

(image by kvanhorn)

Version 2.18.1 of DBD::Pg, the Perl driver for Postgres, was just released. This was to fix a serious bug in which we were not properly clearing things out after performing a COPY. The only time the bug manifested, however, is if an asynchronous query was done immediately after a COPY finished. I discovered this while working on the new version of Bucardo. The failing code section was this (simplified):

## Prepare the source
my $srccmd = "COPY (SELECT * FROM $S.$T WHERE $pkcols IN ($pkvals)) TO STDOUT";
$fromdbh->do($srccmd);

## Prepare each target
for my $t (@$todb) {
    my $tgtcmd = "COPY $S.$T FROM STDIN";
    $t->{dbh}->do($tgtcmd);
}

## Pull a row from the source, and push it to each target
while ($fromdbh->pg_getcopydata($buffer) >= 0) {
    for my $t (@$todb) {
        $t->{dbh}->pg_putcopydata($buffer);
    }
}

## Tell each target we are done with COPYing
for my $t (@$todb) {
    $t->{dbh}->pg_putcopyend();
}

## Later on, run an asynchronous command on the source database
$sth{track}{$dbname}{$g} = $fromdbh->prepare($SQL, {pg_async => PG_ASYNC});
$sth{track}{$dbname}{$g}->execute();

This gave the error "another command is already in progress". This error did not come from Postgres or DBD::Pg, but from libpq, the underlying C library which DBD::Pg uses to talk to the database. Strangely enough, taking out the async part and running the exact same command produced no errors.

After tracking back through the libpq code, it turns out that DBD::Pg was only calling PQgetResult a single time after the copy ended. I can see why this was done: the docs for PQputCopyEnd state: "After successfully calling PQputCopyEnd, call PQgetResult to obtain the final result status of the COPY command. One can wait for this result to be available in the usual way. Then return to normal operation." What's not explicitly stated is that you need to call PQgetResult again, and keep calling it until it returns null, to "clear out the message queue". In this case, PQgetResult pulled back a 'c' message from Postgres, via the frontend/backend protocol, indicating that the copy data was complete. However, what it really needed was to call PQgetResult two more times, once to get back a 'C' (indicating the COPY statement was complete), and again for a 'Z' (indicating the backend was ready for a new query). Technically, there was nothing stopping libpq from sending a fresh query except that its own internal flag, conn->asyncStatus, is not reset on a simple end of copy, but only when 'Z' is encountered. Thus, DBD::Pg 2.18.1 now calls PQgetResult until it returns null.
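In libpq terms, the fix amounts to a drain loop after the copy ends, along these lines (a sketch, not the actual DBD::Pg patch):

/* After PQputCopyEnd(), keep consuming results until libpq reports
 * there are none left; only then is the connection ready for the
 * next (possibly asynchronous) command. */
PGresult *res;
while ((res = PQgetResult(conn)) != NULL)
    PQclear(res);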

If your application is encountering this bug and you cannot upgrade to 2.18.1 yet, the solution is simple: perform a non-asynchronous query between the end of the copy and the start of the asynchronous query. It can be any query at all, so the above code could be cured with:

...
## Tell each target we are done with COPYing
for my $t (@$todb) {
    $t->{dbh}->pg_putcopyend();
    $t->{dbh}->do('SELECT 123');
}

## Later on, run an asynchronous command on the source database
$fromdbh->do('SELECT 123');
$sth{track}{$dbname}{$g} = $fromdbh->prepare($SQL, {pg_async => PG_ASYNC});
$sth{track}{$dbname}{$g}->execute();

Why does the non-asynchronous command work? Doesn't it check conn->asyncStatus as well? The secret is that PQexecStart has this bit of code in it:

    /*
     * Silently discard any prior query result that application didn't eat.
     * This is probably poor design, but it's here for backward compatibility.
     */
    while ((result = PQgetResult(conn)) != NULL)

Wow, that code looks familiar! So it turns out that the only reason this was not spotted earlier is that non-asynchronous commands (e.g. those using PQexec) were silently clearing out the message queue, kind of as a little favor from libpq to the driver. The async function, PQsendQuery, is not as nice, so it does the correct thing and fails right away with the error seen above (via PQsendQueryStart).

Locally served YUI3

For the vast majority of sites, serving static JS and CSS files, such as those required by YUI or jQuery, from a CDN makes great sense: improved performance through caching and geography, reduced load, improved uptime, leveraging some large corporations' resources, etc. Unfortunately, as soon as you hit an SSL-secured page you can't use a CDN's resources securely, a common situation with ecommerce sites and administration panels. In the YUI case that means doing at least a little bit of extra configuration and maintenance of an installed library base (an all too common task in typical server-side development that's becoming more common as libraries are maintained for client-side usage as well). Toss in "combo loading" and all of a sudden it feels like the software development cycle can't just be discarded because you are working on the web; maybe this is really what they meant by Web 2.0. But I digress...

Since I'm familiar with and working on YUI3, here is how we have it working for non-CDN based serving for an administration panel developed for a newer generation of an Interchange site. Our custom application uses YUI3 (core), modules from the YUI3 Gallery, and a few modules that are pulled in using the YUI 2in3 version compatibility layer. Now down to business...

Each of the libraries is provided by the YUI team through github.com. Since our project has git under it (you are using git, right?), we can set up submodules to help us track new versions as they are released (or really be on the bleeding edge ahead of releases). To start, we choose a location from which to serve the files; this is fairly arbitrary, and I happened to choose /vendor. Then set up new submodules for each of the libraries in use:

  • git submodule add http://github.com/yui/yui3 vendor/yui3
  • git submodule add http://github.com/yui/yui3-gallery vendor/yui3-gallery
  • git submodule add http://github.com/yui/2in3 vendor/yui-2in3

Follow that with the normal git submodule init, git submodule update cycle to pull down the HEAD version of each library. At this point the library files are available for local serving, and the current version is pegged against the superproject.

With the files locally stored they can be served locally. To do so we need to adjust the "seed" file for the YUI core, the loader, and any CSS files loaded on the page. For instance,

<!-- YUI Seed, Loader, and our primary source -->
<script type="text/javascript" src="/combo?vendor/yui3/build/yui/yui.js&vendor/yui3/build/loader/loader.js"></script>

(For now ignore the "/combo" portion of this, more on that later.)

With the yui.js and loader.js files loading from a local resource YUI will automatically know where to look for other YUI3 core files. But for non-core files (such as gallery modules or 2in3 loaded modules) we need to provide more information to the loader through the YUI_config global variable.

The above configuration has parameters for both combo and non-combo loading. Lines 4-7 tell the loader where to find core modules explicitly, though the loader should default to these based on the location of the seed file. Starting on line 9 there are definitions for the additional groups of files that we now want locally served. The first group, lines 11-20, is needed to load our local gallery files. Lines 25-44, as the comment above them indicates, are for loading specific CSS files that were not properly handled by the first group, though I've yet to track down why (more robust skin support may fix this issue). Lines 46-62 bring in YUI 2in3; note that the specific minor version of 2 is pegged in lines 47 and 50. With this configuration in place, and properly located files on the filesystem, you should be able to use any core, gallery, or 2in3 modules loaded from your local system.
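The full configuration is too long to reproduce here, but a stripped-down sketch of the kind of YUI_config group definition it describes (my own abbreviation, using the /vendor paths from above) looks roughly like this:

YUI_config = {
    combine: true,
    comboBase: '/combo?',
    root: 'vendor/yui3/build/',
    groups: {
        gallery: {
            combine: true,
            comboBase: '/combo?',
            root: 'vendor/yui3-gallery/build/',
            // any module whose name starts with "gallery-" is served from this group
            patterns: { 'gallery-': {} }
        }
    }
};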

But what about "combo loading"? Combo loading allows us to reduce the number of round-trip requests made to pull down the dependencies by concatenating a set of files together (either JS or CSS, but not both in one request). When serving from the CDN you get this for free, because Yahoo! provides a combo loading service. You can see a PHP-based loading service provided by the YUI team in the phploader project, https://github.com/yui/phploader/. Because we are working with Interchange and aren't using PHP, I've written a similar script to be used as plain ole' boring CGI; it can be found at github. In the example code this script would be set up to run at the root of our URL space, specifically as "/combo", and the script itself will need to know where to find the local root, "vendor", on the physical file system. (It is important to note that combo loading CSS files takes some special parsing to make images and other referenced files work correctly. Also note that my version does not work with includes in the CSS.)

Finally, one of the biggest benefits of this setup is being able to test literally any commit in any of the modules while leaving all else the same. This can allow you to determine when a bug first entered the picture (see git bisect) or to test against development branches ahead of releases. To test against the latest and greatest between releases, you simply go to one of the submodule paths, fetch any recent updates, check out the master (or any) branch, merge from the origin branch, and test; the commands are spelled out below. If all your tests pass, then you git add the submodule path in the superproject, git commit, and know that your stack is now up to date. Hopefully some day the gallery modules themselves will be git submodules, so that maintaining a local gallery will not involve pulling down all modules, and an application can peg (or update) a specific module as needed; essentially you'd have your own gallery.
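That update cycle, using the yui3 submodule as the example, looks something like this:

cd vendor/yui3
git fetch origin
git checkout master
git merge origin/master
# ... run your tests against the updated library ...

cd ../..
git add vendor/yui3
git commit -m "Update yui3 submodule to latest master"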

Benchmarking in Perl: Map versus For Loop

Last week, I was coding in Perl for an Interchange project. I've been in and out of Perl and Ruby a lot lately. While working on the project, I came across the following bit of code and wanted to finally sit down and figure out how to use Perl's map function on it.

my @options;
for my $obj (@$things) {
    push @options, {
        value => $obj->{a},
        label => $obj->{b}
    };        
}
return \@options;

I'm a big fan of Ruby's inject method and in general a fan of the Enumerable Module, but I have a brain block when it comes to using the map method in both Perl and Ruby. I spent a little time investigating and working on a small local Perl script to test the implementation of the map method. I came up with the following:

return [ map {
    {
        value => $_->{a},
        label => $_->{b}
    }
} @$things ];

After that, I wanted to make sure the code change was justified. The Interchange application that is the source of this code is built for performance, so I wanted to ensure this change didn't hinder performance. It's been a while since I've done benchmarking in Perl, so I also had to refresh my memory regarding using the Benchmark module. I came up with:

#!/usr/bin/perl

use Benchmark;

my $count = 1000000;
my $things = [
    {'a' => 123, 'b' => 456, 'c' => 789 },
    {'a' => 456, 'b' => 789, 'c' => 123 }
];

#Test definitions as methods to mimic use in application
my $test1 = sub {
    my @options;
    for my $obj (@$things) {
        push @options, {
            value => $obj->{a},
            label => $obj->{b} 
        };
    }
    return \@options;
};
my $test2 = sub {
    return [ map {
        { 
            value => $_->{a},
            label => $_->{b}
        }
    } @$things ];
};

#Benchmark tests & results.
$t0 = Benchmark->new;
$test1->() for(1..$count);
$t1 = Benchmark->new;
$td = timediff($t1, $t0);
print "the code for test 1 took:",timestr($td),"\n";

$t0 = Benchmark->new;
$test2->() for(1..$count);
$t1 = Benchmark->new;
$td = timediff($t1, $t0);

print "the code for test 2 took:",timestr($td),"\n";

The results were:

Test # Before (For Loop) After (Map)
1 5 sec 4 sec
2 5 sec 4 sec
3 5 sec 5 sec
4 5 sec 5 sec
5 6 sec 4 sec
6 6 sec 4 sec
7 6 sec 4 sec
8 5 sec 5 sec
9 5 sec 4 sec
10 5 sec 4 sec
Average 5.3 sec 4.3 sec
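As an aside, the Benchmark module can also run the timing loop and print a comparison itself; a rough equivalent of the above using its cmpthese function (shown here just to illustrate the idea) would be:

use Benchmark qw(cmpthese);

# Run each sub $count times and print a comparison table
cmpthese($count, {
    'for_loop' => $test1,
    'map'      => $test2,
});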

In this case, replacing the imperative programming style with a functional style (via map) yielded a small performance improvement, but the script executed each method 1,000,000 times, so the performance gain from any single call is very small. I doubt it's worth it to go on a code cleanup rampage to update and test this, but it's good to keep in mind moving forward as small bits of the code are touched. I also wonder whether the performance difference varies with the size of $things — something I didn't test here. It was nice to practice using Perl's map function and the Benchmark module. Yippee.

NOTIFY vs Prepared Transactions in Postgres (the Bucardo solution)

We recently had a client use Bucardo to migrate their app from Postgres 8.2 to Postgres 9.0 with no downtime (which went great). They were also using Bucardo to replicate from the new 9.0 master to a bunch of 9.0 slaves. This ran into problems the moment the application started, as we began seeing these messages in the logs:

ERROR:  cannot PREPARE a transaction that has 
executed LISTEN, UNLISTEN or NOTIFY

The problem is that the Postgres LISTEN/NOTIFY system cannot be used with prepared transactions. Bucardo uses a trigger on the source tables that issues a NOTIFY to let the main Bucardo daemon know that something has changed and needs to be replicated. However, their application was issuing a PREPARE TRANSACTION as an occasional part of its work. Thus, it would update the table, which would fire the trigger, which would send the NOTIFY. Then the application would issue the PREPARE TRANSACTION, which produced the error given above. Bucardo is set up to deal with this situation; rather than using notify triggers, the Bucardo daemon can be set to look for changes at a set interval. The steps to change Bucardo's behavior for a given sync are simply:

$ bucardo_ctl update sync foobar ping=false checktime=15
$ bucardo_ctl validate foobar
$ bucardo_ctl reload foobar

The first command tells the sync not to use notify triggers (these are actually statement-level triggers that simply issue a NOTIFY bucardo_kick_sync_foobar). It also sets a checktime of 15 seconds, which means that the Bucardo daemon will check for changes every 15 seconds, as if the original notify trigger were firing every 15 seconds. The second command validates the sync by checking that all supporting tables, functions, triggers, etc. are installed and up to date. It also removes triggers that are no longer needed: in this case, the statement-level notify triggers for all tables in this sync. Finally, the third command simply tells the Bucardo daemon to stop the sync, load in the new changes, and restart it.

Another solution to the problem is simply not to use prepared transactions: very few applications actually need them, but I've noticed a few that use them anyway when they should not. What exactly is a prepared transaction? It's the Postgres implementation of two-phase commit. Basically, this means that a transaction's state is stored away on disk, and can be committed or rolled back at a later time, even by a different session. This is handy if you need to ensure, for example, that work done over multiple database connections commits atomically. By atomically, I mean that either they all commit or none of them do. This is done by doing work on each database, issuing a PREPARE TRANSACTION on each, and then, once all have been prepared, issuing a COMMIT PREPARED against each one.
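As a sketch of that pattern (the table and transaction names here are made up for illustration):

-- On database A:
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
PREPARE TRANSACTION 'xfer_42';

-- On database B:
BEGIN;
UPDATE accounts SET balance = balance + 100 WHERE id = 7;
PREPARE TRANSACTION 'xfer_42';

-- Only once both PREPAREs have succeeded, finalize each one:
COMMIT PREPARED 'xfer_42';    -- run on database A
COMMIT PREPARED 'xfer_42';    -- run on database B

-- Had either PREPARE failed, the other database would instead get:
-- ROLLBACK PREPARED 'xfer_42';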

As an aside, prepared transactions are often confused with prepared statements. While the use of prepared statements is very common, use of prepared transactions is very rare. Prepared statements are simply a way of planning a query one time, then re-running it multiple times without having to run the query through the planner each time. Many interfaces, such as DBD::Pg, will do this for you automatically behind the scenes. Sometimes using prepared statements can cause issues, but it is usually a win.

As mentioned above, the use of 2PC (two-phase commit) is very rare, which is why the default for the max_prepared_transactions setting was recently changed to 0, effectively disallowing the use of prepared transactions until you explicitly turn them on in your postgresql.conf file. This helps prevent people from accidentally issuing a PREPARE TRANSACTION and then leaving the prepared transactions lying around. The mistake is easy to make, for once you issue the command, everything goes back to normal and it's easy to forget about them. However, having them around is a bad thing, as they continue to hold locks and can prevent vacuum from running. The check_postgres program even has a specific check for this situation: check_prepared_txns.

What does two-phase commit look like? There are only three basic commands: PREPARE TRANSACTION, COMMIT PREPARED, and ROLLBACK PREPARED. Each takes a name, which is an arbitrary string of 200 characters or less. Usage is to start a transaction, do some work, and then issue a PREPARE TRANSACTION instead of a COMMIT. At this point, all the work you have done is gone from your session and stored on disk. You cannot get back into this transaction: you can only commit it or roll it back. See the docs on PREPARE TRANSACTION for the full details.

Here's an example of two-phase commit in action:

testdb=# BEGIN;
BEGIN
testdb=#* CREATE TABLE preptest(a int);
CREATE TABLE
testdb=#* INSERT INTO preptest VALUES (1),(2),(3);
INSERT 0 3
testdb=#* SELECT * FROM preptest;
 a 
---
 1
 2
 3
(3 rows)

testdb=#* PREPARE TRANSACTION 'foobar';
PREPARE TRANSACTION
testdb=# SELECT * FROM preptest;
ERROR:  relation "preptest" does not exist
LINE 1: SELECT * FROM preptest;
                      ^
testdb=# COMMIT PREPARED 'foobar';
COMMIT PREPARED
testdb=# SELECT * FROM preptest;
 a 
---
 1
 2
 3
(3 rows)

A contrived example, but you can see how easy it could be to issue a PREPARE TRANSACTION and not even realize that it actually sticks around forever!