Welcome to End Point’s blog

Ongoing observations by End Point people

ImageMagick EPS bug workaround

Sometimes software is buggy, and even with the malleability of open source software, upgrading to fix a problem may not be an immediate option due to lack of time, risks to production stability, or problems caused by other incompatible changes in a newer version of the software.

ImageMagick is a widely used open source library and set of programs for manipulating images in many ways. It's very useful and I'm grateful it exists and has become so powerful. However, many longtime ImageMagick users like me can attest that it has had a fair number of bugs, and upgrades sometimes don't go very smoothly as APIs change, or new bugs creep in.

Recently my co-worker, Jeff Boes, had the misfortune, or opportunity, of encountering just such a scenario. Our friends at CityPass have several site features that use ImageMagick for resizing, rotating, and otherwise manipulating or gathering data about images.

The environment specifics (skip if you're not troubleshooting an ImageMagick problem of your own!): RHEL 5 with its standard RPM of ImageMagick- The application server is Interchange, running on our local-perl-5.10.0 nonthreaded Perl build, using the local-ImageMagick-perl- library. Those custom builds are available in the endpoint Yum repository.

CityPass reported problems with some EPS (Encapsulated PostScript) images that ImageMagick failed to process correctly. In fact, the bug prevented any subsequent image processing jobs from completing in the same OS process. Upgrading ImageMagick would fix the bug, but we can't currently do that on the production server due to other compatibility problems.

After some trial and error, Jeff determined that the ImageMagick bug only kicks in when the first image processed is an EPS file. If it's any other image type, it works fine. This explained why code that had been unchanged in a year or so suddenly stopped working: Before now, no EPS file had happened to come first.

At first Jeff hacked the system to process the non-EPS files first, then sorted the results as originally desired. Then we realized there may be some rare scenarios where no non-EPS files at all were in the batch, which would trigger the bug. Jeff then had ImageMagick always first process a trivial small JPEG file which was known to work.

That worked, but Jeff then came across the idea of processing an empty image file so we didn't have a dependency on an image that might later be deleted. He tinkered a bit and came up with something surprising but even better. This is his Perl code:

my $first_im = Image::Magick->new;
$first_im->Read('');  # initial read of an empty string filename, as described below
# (then process all images in any order as originally intended)

I wouldn't have expected an initial read of an empty string filename to solve the problem, but it did. That, accompanied by a suitable comment noting the history of the kludge for future software archaeologists, closed the case.

Software's funny, but it's nice when there's a simple -- if counterintuitive -- solution to work around a bug. And I think Jeff has mostly recovered his sanity in the meantime!

Google Summer of Code Mentors Summit

I was able to attend the Google Summer of Code (GSoC) Mentors Summit last weekend in sunny Mountain View, CA. I'd spent the previous few days working with a team to write a mentor's manual, so was full of ideas when it came time to create the actual sessions during the unconference.

The Mentors Summit is a great opportunity to mingle with the leaders in our many diverse communities. This year, the student participants were capped at 1000, and there were 150 participating open source projects mentoring them. Most of the projects were represented at the Summit.

I attended or presented at three sessions that I'll quickly summarize:

  • Casablanca: This wasn't a presentation so much as a discussion. There's one room designed to be a salon, with lots of interesting gadgets, toys and clay. A group of about 20 of us talked about what we'd learned about mentoring that year, strategies for getting the most out of students, and ways to recover from student and mentor failures. Some of the smaller project representatives were in awe of the level of discipline and organization of the larger projects. Several useful wiki templates were shared, as were best practices such as holding scheduled weekly meetings with all mentors and students, and requiring daily blog writing and clear deliverable dates for bits of code.
  • Making our communities more welcoming. We arranged for a session to talk about bringing more diversity into open source projects, both gender and cultural. The list we came up with was general, but a good starting point for organizations new to exploring diversity issues:
    1. Build a reputation of being inclusive.
    2. Appreciate and recognize non-code contributions.
    3. Be nice to newbies!
    4. START YOUNG. Start going to middle schools and teaching computer classes.
    5. Do targeted outreach to the community you are interested in attracting.
    6. Talk about what open source does for the social good.
    7. Don't be invisible! Advertise what women are doing.
    8. Have personal contact with an individual.
    9. Have pictures that reflect diversity among your users and developers (other people like me use this software!)
  • Pretty Pictures: How to create non-text based documentation. We talked about the different projects, their approach to producing pictures, diagrams, videos and audio forms of documentation. Many tools were discussed and listed in the session notes. We also talked about software we wished we had, and ways of transcribing video and audio (I suggested that we pipe through Google Voice!). I enjoyed hearing about projects like xWiki's screencasts, and efforts that GIS and video encoder projects had underway to produce non-text documentation.

Much of the rest of the time at the conference was spent discussing individual projects, new cool things that we could be doing (PL/Parrot!), and the successes each open source project had in incorporating new people into their projects.

Pentaho Reporting 3.5 for Java Developers

I was recently asked to review a copy of Will Gorman's Pentaho Reporting 3.5 for Java Developers, and came to a few important realizations. Principal among these is that my idea of what "reporting" means includes far too little. Reporting includes much more than just creating a document with some graphs on it -- any document or presentation including information from a "data source" comprises a "report". This includes the typical sheaf of boardroom presentation slides, but also includes dashboards within an application, newsletters, or even form letters. I recently discovered that a local church group uses a simple reporting application to print its membership directory. In short, it's not just the analysts and managers that can use reporting.

The book gives the reader a tour of Pentaho's newest Pentaho Reporting system, which consists of a desktop application where users define reports, and a library by which developers can integrate those reports into their own applications. So as an example, not only can Pentaho Reporting publish weekly sales printouts, but it can also produce real-time inventory information in a J2EE-based web application or even a Swing application running on a user's desktop. Gorman describes, step-by-step, the process of building a report, integrating advanced data sources, working with graphics and visualizations, and building custom reporting components such as user-defined functions and custom report controls. Although this step-by-step description combined with Java's native verbosity occasionally makes the book difficult to read cover to cover, it provides a valuable reference and starting point for users wanting to implement advanced reporting features.

Shortly after having grasped the idea that "reporting", as a concept, was broader than I'd originally assumed, I got the idea that I'd like to know how to use Pentaho to replace a reporting system I worked on a while ago, which used Jython to gather data from a JMX server. Pentaho Reporting allows users to create data sources in languages other than Java, via the Apache Bean Scripting Framework, suggesting that my old Jython code might work mostly unchanged via Bean Scripting. Disappointingly, the book doesn't give an example of this technique, though that is understandable, since the Bean Scripting integration is still considered experimental (along with, apparently, several other data source types contributed by the Pentaho community). Perhaps I'll figure this out one day, and make it the subject of a future blog post.

This fairly slight omission notwithstanding, I enjoyed the book and the ideas it suggested about places I might apply more interesting reporting. Thanks to PacktPub for the opportunity to review it.

Performance optimization of the Interchange website

Some years ago Davor Ocelić redesigned Interchange's home on the web. Since then, most of the attention paid to it has been on content such as news, documentation, release information, and so on. We haven't looked much at implementation or optimization details. Recently I decided to do just that.

Interchange optimizations

There is currently no separate logged-in user area on the site, so Interchange is primarily used here as a templating system and database interface. The automatic read/write of a server-side user session is thus unneeded overhead, as is periodic culling of the old sessions. So I turned off permanent sessions by making all visitors appear to be search bots. Adding to interchange.cfg:

RobotUA *

That would not work for most Interchange sites, which need a server-side session for storing mv_click action code, scratch variables, logged-in state, shopping cart, etc. But for a read-only content site, it works well.

By default, Interchange writes user page requests to a special tracking log as part of its UserTrack facility. It also outputs an X-Track HTTP response header with some information about the visit which can be used by a (to my knowledge) long defunct analytics package. Since we don't need either of those features, we can save a tiny bit of overhead. Adding to catalog.cfg:

UserTrack No

Very few Interchange sites have any need for UserTrack anymore, so this is commonly a safe optimization to make.

HTTP optimizations

Today I ran an excellent page-load test against the site. Even though it is a fairly simple site without much bloat, two obvious areas for improvement stood out.

First, gzip/deflate compression of textual content should be enabled. That cuts down on bandwidth used and page delivery time by a significant amount, and with modern CPUs adds no appreciable extra CPU load on either the client or the server.

We're hosting on Debian GNU/Linux with Apache 2.2, which has a reasonable default configuration of mod_deflate that does this, so it's easy to enable:

a2enmod deflate

That sets up symbolic links in /etc/apache2/mods-enabled for deflate.load and deflate.conf to enable mod_deflate. (Use a2dismod to remove them if needed.)

I added two content types for CSS & JavaScript to the default in deflate.conf:

AddOutputFilterByType DEFLATE text/html text/plain text/xml text/css application/x-javascript

That used to be riskier when very old browsers such as Netscape 3 and 4 claimed to support compressed CSS & JavaScript but actually didn't. But those browsers are long gone.

The next easy optimization is to enable proxy and browser caching of static content: images, CSS, and JavaScript files. By doing this we eliminate repeat HTTP requests for these files while the cached copies are fresh; the browser won't even check with the server to see if it has the current version once it has loaded a file into its cache, making subsequent use of those files blazingly fast.

There is, of course, a tradeoff to this. Once the browser has the file cached, you can't make it fetch a newer version before the cached copy expires unless you change the filename. So we'll set a cache lifetime of only one hour. That's long enough to easily cover most users' browsing sessions at a site like this, but short enough that if we need to publish a new version of one of these files, it will still propagate fairly quickly.

So I added to the Apache configuration file for this virtual host:

ExpiresActive On
ExpiresByType image/gif  "access plus 1 hour"
ExpiresByType image/jpeg "access plus 1 hour"
ExpiresByType image/png  "access plus 1 hour"
ExpiresByType text/css   "access plus 1 hour"
ExpiresByType application/x-javascript "access plus 1 hour"
FileETag None
Header unset ETag

This adds the HTTP response header "Cache-Control: max-age=3600" for those static files. I also have Apache remove the ETag header, which is not needed given this caching and the Last-Modified header.
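To make the effect of that header concrete, here is a minimal Ruby sketch of the freshness check a cache performs. The helper names max_age and fresh? are hypothetical and not part of Apache or any library; real caches also consider other directives and headers, which this sketch ignores.

```ruby
# Hypothetical helpers that mimic how a cache interprets
# "Cache-Control: max-age=N". Other directives are ignored here.

# Extract the max-age value in seconds, or nil if the directive is absent.
def max_age(cache_control)
  match = cache_control.to_s.match(/max-age=(\d+)/)
  match ? match[1].to_i : nil
end

# A cached copy may be reused without contacting the server
# only while its age is below max-age.
def fresh?(cache_control, age_seconds)
  limit = max_age(cache_control)
  !limit.nil? && age_seconds < limit
end

header = 'max-age=3600'
puts fresh?(header, 600)    # ten minutes after the fetch
puts fresh?(header, 4000)   # more than an hour after the fetch
```

With the one-hour lifetime configured above, the first call reports a reusable copy and the second does not, which is when the browser would go back to the server.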

There are cases where the above configuration would be too broad, for example, if you have:

  • images that differ with the same filename, such as CAPTCHAs
  • static files that vary based on logged-in state
  • dynamically-generated CSS or JavaScript files with the same name

If the website is completely static, including the HTML, or identical for all users at the same time even though dynamically generated, we could also enable caching the HTML pages themselves. But in the case of this site, that would probably cause trouble with the Gitweb repository browser, live documentation searches, etc.

After those changes, a new run of the test shows that we reduced both the bytes transferred and the delivery time. It's especially dramatic to see how much faster subsequent page views of the Hall of Fame are, since it has many screenshot thumbnail images.

Optimizing a simple non-commerce site such as this one is easy and even fun. With caution and practicing on a non-production system, complex ecommerce sites can be optimized using the same techniques, with even more dramatic benefits.

Upgrading from RHEL 5.2 to CentOS 5.4

I have a testing server that was running RHEL 5.2 (x86_64) but its RHN entitlement ran out and I wanted to upgrade it to CentOS 5.4. I found a few tips online about how to do that, but they were a little dated so here are updated instructions showing the steps I took:

yum clean all
mkdir ~/centos
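cd ~/centos
# download RPM-GPG-KEY-CentOS-5 and the CentOS 5.4 RPMs to install (at least centos-release) into this directory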
rpm --import RPM-GPG-KEY-CentOS-5 
rpm -e --nodeps redhat-release
rpm -e yum-rhn-plugin yum-updatesd
rpm -Uvh *.rpm
yum -y upgrade
# edit /etc/grub.conf to point to correct new kernel (with Xen, in my case)
shutdown -r now

It has worked well so far.

Talk slides are available! Bucardo: Replication for PostgreSQL

I'm in Seattle for the PostgreSQL Conference West today! I just finished giving a talk on Bucardo, a master-slave and multi-master replication system for Postgres.

The talk was full, and had lots of people who've used Slony in the past, so I got lots of great questions. I realized we should publish some "recommended architectures" for setting up the Bucardo control database, and provide more detailed diagrams for how replication events actually occur. I also talked to someone interested in using Bucardo to show DDL differences between development databases and suggested he post to the mailing list. Greg has created scripts to do similar things in the past, and it would be really cool to have Bucardo output runnable SQL for applying changes.

I also made a hard pitch for people to start a SEAPUG, and it sounds like some folks from the Fred Hutchinson Cancer Research Center are interested. (I'm naming names, hoping that we can actually do it this time :D). If you are from the Seattle area, go ahead and subscribe to the mailing list (pick 'seapug' from the list dropdown menu)!

Thanks to everyone who attended, and I'm looking forward to having lunch with a bunch of PostgreSQL users here in Seattle!

Rails Approach for Spree Shopping Cart Customization

Recently, I was assigned a project to develop Survival International's ecommerce component using Spree. Survival International is a non-profit organization that supports tribal groups worldwide in education, advocacy and campaigns. Spree is an open source Ruby on Rails ecommerce platform that was sponsored by End Point from its creation in early 2008 until May 2009, and that we continue to support. End Point also offers a hosting solution for Spree (SpreeCamps), which was used for this project.

Spree contains ecommerce essentials and is intended to be extended by developers. The project required significant cart customization: adding a buy 4 get 1 free promo discount, adding free giftwrap to the order if the order total exceeded a specific preset amount, adding a 10% discount, and adding a donation to the order. Some code snippets and examples of the cart customization in Rails are shown below.

An important design decision that came up was how to store the four potential cart customizations (buy 4 get 1 free promo, free giftwrap, 10% discount, and donation). The first two items (4 get 1 free and free gift wrap) are dependent on the cart contents, while the latter two items (10% discount and donation) are dependent on user input. Early on in the project, I tried using session variables to track the 10% discount application and donation amount, and I applied an after_filter to calculate the buy 4 get 1 free promo and free giftwrap for every order edit, update, or creation. However, this proved somewhat cumbersome and required that most Rails views be edited (frontend and backend) to show the correct cart contents. After discussing the requirements with a coworker, we came up with the idea of using a single product with four variants to track each of the customization components.

I created a migration, similar to the code shown below, to introduce the variants. A single product by the name of 'Special Product' contained four variants with SKUs to denote which customization component they belonged to ('supporter', 'donation', 'giftwrap', or '5cards').

p = Product.create(:name => 'Special Product', :description => "Discounts, Donations, Promotions", :master_price => 1.00)
v = Variant.create(:product => p, :price => 1.00, :sku => 'supporter') # 10% discount
v = Variant.create(:product => p, :price => 1.00, :sku => 'donation')  # donation
v = Variant.create(:product => p, :price => 1.00, :sku => 'giftwrap')  # free giftwrap
v = Variant.create(:product => p, :price => 1.00, :sku => '5cards')    # buy 4 get 1 free discount

Next, I added the accessor methods shown below to retrieve the variants. Each of these accessor methods would be used throughout the code, so this would be the only location requiring an update if a variant SKU was modified.

module VariantExtend
  def get_supporter_variant
  def get_donation_variant
  def get_giftwrap_variant
  def get_cards_promo_variant

The decision to use variants makes it much easier to display cart contents on the backend and frontend and to calculate cart totals. In Spree, the line item price is not necessarily equal to the variant price or product master price, so the prices stored in the product and variant objects introduced above are meaningless to individual orders. An after_filter was added to the Spree orders controller to add, remove, or recalculate the price for each special product variant. The order of the after_filters was important. The cards (buy 4 get 1 free) discount was added first, followed by a subtotal check for adding free giftwrap, followed by adding the supporter discount which reduces the total price by 10%, and finally a donation would be added on top of the order total:

OrdersController.class_eval do
  after_filter [:set_cards_discount, :set_free_giftwrap, :set_supporter_discount, :set_donation], :only => [:create, :edit, :update]
end

Each after filter contained specific business logic. The cards discount logic adds or removes the variant from the cart and adjusts the line item price:

def set_cards_discount
  v =  # get variant
  # calculate buy 4 get 1 free discount (cards_discount)
  # remove variant if order contains variant and cards_discount is 0
  # add variant if order does not contain variant and cards_discount is not 0
  # adjust price of discount line item to cards_discount
  # save order

The free giftwrap logic adds or removes the variant from the cart and sets the price equal to 0:

def set_free_giftwrap
  v =  # get variant
  # remove variant if cart contains variant and order subtotal < 40
  # add variant if cart does not contain variant and order subtotal >= 40
  # adjust price of giftwrap line item to 0.00
  # save order

The supporter discount logic adds or removes the discount variant depending on user input. Then, the line item price is adjusted to give a 10% discount if the cart contains the discount variant:

def set_supporter_discount
  v =  # get variant
  # remove variant if cart contains variant and user input to receive discount is 'No'
  # add variant if cart does not contain variant and user input to receive discount is 'Yes'
  # adjust price of discount line item to equal 10% of the subtotal (minus existing donation)
  # save order

Finally, the donation logic adds or removes the donation variant depending on user input:

def set_donation
  v =  # get variant
  # remove variant if cart contains variant and user donation is 0
  # add variant if cart does not contain variant and user donation is not 0
  # adjust price of donation line item
  # save order

This logic results in a simple process for all four variants to be adjusted for every recalculation or creation of the cart. Also, the code examples above used existing Spree methods where applicable (add_variant) and created a few new methods that were used throughout the examples above (Order.remove_variant(variant), Order.adjust_price(variant, price)). A few changes were made to the frontend cart view.
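The four adjustments and their ordering can be sketched in plain Ruby, independent of Spree. Everything here is hypothetical: the CartPricing module, its arguments, the assumption that all cards share one price, and the reading of "buy 4 get 1 free" as every fifth card being free. Only the 40 giftwrap threshold, the 10% rate, and the ordering come from the description above.

```ruby
# Plain-Ruby sketch of the four cart adjustments, applied in filter order.
# CartPricing and its argument names are hypothetical, not Spree API.
module CartPricing
  GIFTWRAP_THRESHOLD = 40.0   # subtotal at which giftwrap becomes free
  SUPPORTER_RATE     = 0.10   # 10% supporter discount

  # card_qty/card_price: cards in the cart (assumed to share one price)
  # other_total: total of all remaining regular line items
  # supporter: whether the user asked for the supporter discount
  # donation: donation amount entered by the user
  def self.totals(card_qty, card_price, other_total, supporter, donation)
    # 1. buy 4 get 1 free: assume every fifth card is free
    cards_discount = (card_qty / 5) * card_price
    subtotal = card_qty * card_price + other_total - cards_discount
    # 2. free giftwrap is a 0.00 line item once the subtotal reaches the threshold
    giftwrap = subtotal >= GIFTWRAP_THRESHOLD
    # 3. supporter discount: 10% of the subtotal, donation excluded
    supporter_discount = supporter ? (subtotal * SUPPORTER_RATE).round(2) : 0.0
    # 4. the donation is added on top of the discounted total
    total = (subtotal - supporter_discount + donation).round(2)
    { :cards_discount => cards_discount, :giftwrap => giftwrap,
      :supporter_discount => supporter_discount, :total => total }
  end
end

result = CartPricing.totals(5, 10.0, 5.0, true, 20.0)
puts result.inspect
```

Keeping the donation out of the discount base mirrors the "minus existing donation" note in the supporter discount pseudocode; swapping steps 3 and 4 would discount the donation as well.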

To render the desired view, line items belonging to the "Special Product" were not displayed in the default order line display. The buy 4 get 1 free promo and free giftwrap were added below the default line order items. Donations and discounts were shown below the line items in order of how they are applied to the order. The backend views were not modified and as a result the site administrators would see all special variants in an order.

An additional method was created to define the total number of line items in the order, shown at the top right of every page except for the cart and checkout page.

module OrderExtend
  def mod_num_items
    item_count = line_items.inject(0) { |kount, line_item| kount + line_item.quantity } +   
      (contains?( ? -1 : 0) +   
      (contains?( ? -1 : 0) +   
      (contains?( ? -1 : 0) + 
      (contains?( ? -1 : 0)
    item_count.to_s + (item_count != 1 ? ' items' : ' item')
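The same idea can be shown without Spree. The sketch below is a hypothetical alternative, not the code used on the project: instead of subtracting one per special variant, it filters line items by SKU before summing. LineItem, SPECIAL_SKUS and item_count_label are illustrative names; the SKUs are the four introduced earlier.

```ruby
# Hypothetical stand-in for a Spree line item: just a SKU and a quantity.
LineItem = Struct.new(:sku, :quantity)

# SKUs of the four 'Special Product' variants described in the post.
SPECIAL_SKUS = %w[supporter donation giftwrap 5cards].freeze

# Count only regular line items and build the "N items" label.
def item_count_label(line_items)
  regular = line_items.reject { |item| SPECIAL_SKUS.include?(item.sku) }
  count = regular.inject(0) { |sum, item| sum + item.quantity }
  "#{count} #{count == 1 ? 'item' : 'items'}"
end

cart = [
  LineItem.new('card-a', 2),
  LineItem.new('card-b', 3),
  LineItem.new('giftwrap', 1),
  LineItem.new('donation', 1)
]
puts item_count_label(cart)
```

Filtering by SKU stays correct even if a special line item ever has a quantity other than one, which the subtract-one approach quietly assumes never happens.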

The solution developed for this project was simple and extended the Spree core ecommerce code elegantly. The complex business logic required was easily integrated in the variant accessor methods and after_filters to add, remove, and recalculate the price of the custom variants where necessary. The project required additional customizations, such as view modifications, navigation modifications, and complex product optioning, which may be discussed in future blog posts :).

Learn more about End Point's Rails development or Spree ecommerce development.

Fun with SQL

Many programmers, I expect, have a favorite obscure language or two they'd like to see in wider use. Haskell has quite a following, though it sees relatively little use; the same can be said for most pure functional languages. Prolog seemed like a neat idea when I first read about it, but I've never heard of anyone using it for something serious (caveat: there are lots of things I've never heard of).

My own favorite underused language is SQL. Although most programmers have at least a passing familiarity with SQL, and many use it daily, few seem to achieve real SQL fluency. This is unfortunate; SQL databases are powerful and ubiquitous tools, and even the least among them can generally manage a great deal more than the fairly simple uses to which they are commonly put.

I recently had the opportunity to trot out this curmudgeonly opinion of mine at the Utah Open Source Conference. Now in its third year, this annual conference continues to surprise me. Utah is home to an extraordinarily vibrant open source community, and it shows, in the attendance (over 400 expected), the number of presentations (92 by my count of the schedule, perhaps minus a few that aren't really presentations, such as the Geek Dinner), and the overall quality and diversity of information presented.

Conference presentations were all recorded, and will eventually be available from the UTOSC website. Most speakers, in the meantime, have made their slides available. Mine are here.

New End Point Site Launched: Rails, jQuery, Flot, Blogger

This week we launched a new website for End Point. Not only did the site get a facelift, but the backend content management system was entirely redesigned.

Our old site was a Rails app with a Postgres database running on Apache and Passenger. It used a custom CMS to manage dynamic content for the bio, articles, and service pages. The old site was essentially JavaScript-less, with the exception of Google Analytics.

Although the new site is still a Rails application, it no longer uses the Postgres database. As developers, we found that it is more efficient to use Git as our "CMS" rather than developing and maintaining a custom CMS to meet our ever-changing needs. We also trimmed down the content significantly, which further justified the design; the entire site and its content now consist of Rails views and partial views. The new site also includes cross-browser jQuery and flot functionality. Some of the interesting implementation challenges are discussed below.

jQuery Flot Integration

The first interesting JavaScript component I worked on used flot to add interactivity to the site and to cut down on the excessive text that End Pointers are known for [for example, this article]. Flot is a jQuery data plotting tool with functionality for plot zooming, data interactivity, and various display configuration settings (see more flot examples). I've used flot before in several experiments but had yet to use it on a live site. For the implementation, we chose to plot our consultant locations over a map of the US to present them in an interactive and fun way. The tedious part of this implementation was actually creating the data points to align with cities. Check out the images below for examples.

Flot has built in functionality for on hover events. When a point on the plot is hovered over, correlating employees are highlighted using jQuery and their information is presented to the right of the map.

When a bio picture is hovered over, the correlating location is highlighted using jQuery and flot data point highlighting.

We also implemented a timeline using flot to map End Point's history. Check out the images below.

When a point on the plot is hovered over, the history details are revealed in the section below.

The triangle image CSS position is adjusted when a point on the plot is activated.

Dynamic Rails Partial Generation

One component of the old site that was generated dynamically sans-CMS was Blogger article integration into the site. A cron job ran daily to import new Blogger article titles, links, and content snippets into the Postgres database. We opted to remove the database dependency in the new site, so we investigated creative ways to include the dynamic Blogger content. We developed a rake task, run via cron job, that dynamically generates partial Rails views containing Blogger content. Below is an example and explanation of how the Blogger RSS feed is retrieved and a partial is generated:

Open URI and REXML are used to retrieve and parse the XML feed.

require 'open-uri'
require 'rexml/document'

The feed is retrieved and a REXML object created from the feed in the rake task:

data = open('', 'User-Agent' => 'Ruby-Wget').read
doc = REXML::Document.new(data)

The REXML object is iterated through. An array containing the Blogger links and titles is created.

results = []
doc.root.each_element('//item') do |item|
  author = item.elements['author'].text.match(/\(.+/).to_s.gsub(/\.|\(|\)/,'')
  results << '<a href="' + item.elements['link'].text + '">' + item.elements['title'].text + '</a>'
end

Finally, a Rails dynamic partial is written containing the contents of the results array:

File.open("#{RAILS_ROOT}/app/views/blog/_index.rhtml", 'w') { |f| f.write(results.inject('') { |s, v| s + '<p>' + v + '</p>' }) }
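The parse-and-generate steps can be tried without a network connection or a Rails app. This sketch runs them against an inline sample feed; SAMPLE_FEED, feed_links and partial_html are hypothetical names, and the example.com URLs are placeholders rather than the real Blogger feed.

```ruby
require 'rexml/document'

# A tiny stand-in for the RSS feed; the real feed URL is not shown here.
SAMPLE_FEED = <<XML
<rss version="2.0">
  <channel>
    <item>
      <title>First post</title>
      <link>http://example.com/first</link>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/second</link>
    </item>
  </channel>
</rss>
XML

# Build one HTML link per <item> element in the feed.
def feed_links(xml)
  doc = REXML::Document.new(xml)
  links = []
  doc.root.each_element('//item') do |item|
    links << %(<a href="#{item.elements['link'].text}">#{item.elements['title'].text}</a>)
  end
  links
end

# Wrap each link in a paragraph, as the generated partial does.
def partial_html(links)
  links.map { |link| "<p>#{link}</p>" }.join
end

html = partial_html(feed_links(SAMPLE_FEED))
puts html
```

In the rake task the resulting string would be written to the partial file instead of printed. Titles containing characters such as & or < would need HTML escaping before being embedded, which this sketch leaves out.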

A similar process was applied for bio and tag dynamic partials. The partials are included on pages such as the End Point service pages, End Point bio pages, and End Point home page.

jQuery Carousel Functionality

Another interesting JavaScript component I worked on for the new site was the carousel functionality for the home page and client page. Carousels are a common "web 2.0" JavaScript component where visible items slide one direction out of view and new items slide into view from the other direction. I initially planned on implementing a simple carousel with a jQuery plugin, such as jCarousel. Other JavaScript frameworks also include carousel functionality such as the YUI Carousel Control or the Prototype UI. I went along planning to implement the existing jQuery carousel functionality, but then was asked, "Can you make it a circular carousel where the left and right buttons are always clickable?" In many of the existing carousel plugins and widgets, the carousel is not circular, so this request required custom jQuery. After much cross-browser debugging, I implemented the following (shown in images for a better explanation):

Step 1: The page loads with visible bios surrounded by empty divs with preset width. The visibility of the bios is determined by CSS use of the overflow, position, and left attributes.

Step 2: Upon right carousel button click, new bios populate the right div via jQuery.

Step 3: To produce the carousel or slider effect, the left div uses jQuery animation functionality and shrinks to a width of 0px.

Step 4: Upon completion of the animation, the empty left div is removed, and a new empty div is created to the right of the new visible bios.

Step 5: Finally, the left div's contents are emptied and the carousel is in its default position ready for action!

Another request for functionality came from Jon. He asked that we create and use "web 2.0" URLs to load specific content on page load for the dynamic content throughout our site.

Upon page load, JavaScript is used to detect if a relative link exists:

if(document.location.href.match('#.+')) {
    var id = document.location.href.match('#.*').toString().replace('#', '');
    // id is then used to populate the dynamic page content
}

The id retrieved from the code snippet above is used to populate the dynamic page content. Then, JavaScript is used during dynamic page functionality, such as carousel navigation, to update the relative link:

document.location.href = document.location.href.split('#')[0] + '#' + anchor;

Twitter Integration

Another change in the new site was importing existing functionality previously written in Python to update End Point's Twitter feed automagically. The rake task uses the Twitter4R gem to update the Twitter feed and is run via cron job every 30 minutes. See the explanation below:

The public twitter feed is retrieved using Open URI and REXML.

    data = open('', 'User-Agent' => 'Ruby-Wget').read
    doc = REXML::Document.new(data)

An array containing all the titles of all tweets is created.

    twitter = []
    doc.each_element('statuses/status/text') do |item|
      twitter << item.text.gsub(/ http:\/\/j\.mp.*/, '')
    end

The blogger RSS feed is retrieved and parsed. An array of hashes is created to track the un-tweeted blog articles.

    data = open('', 'User-Agent' => 'Ruby-Wget').read
    doc = REXML::Document.new(data)
    blog = []
    found_recent = false
    doc.root.each_element('//item') do |item|
      found_recent = true if twitter.include?(item.elements['title'].text)
      blog << { 'title' => item.elements['title'].text, 'link' => item.elements['link'].text } if !found_recent
    end
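The selection logic is easier to see in isolation. This sketch assumes the feed lists posts newest first (which is what makes the found_recent flag work) and uses a hypothetical helper name, untweeted_posts, with made-up sample data.

```ruby
# posts: feed entries, assumed newest first.
# tweeted_titles: titles that already appear in the Twitter timeline.
# Returns the posts that are newer than the most recent tweeted one.
def untweeted_posts(posts, tweeted_titles)
  posts.take_while { |post| !tweeted_titles.include?(post['title']) }
end

posts = [
  { 'title' => 'Newest', 'link' => 'http://example.com/3' },
  { 'title' => 'Middle', 'link' => 'http://example.com/2' },
  { 'title' => 'Oldest', 'link' => 'http://example.com/1' }
]

pending = untweeted_posts(posts, ['Oldest'])
next_to_tweet = pending.last   # oldest post that has not been tweeted yet
puts next_to_tweet['title'] if next_to_tweet
```

Taking the last element means each run tweets the oldest pending article, so a backlog of several new posts is worked through in publication order, one per cron run.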

Using the URL shortener's API, a short URL is generated. A Twitter message is created from the short URL.

      data = open('' + blog.last['link'] + '&login=**&apiKey=*****&format=xml')
      twitter_msg = blog.last['title'] + ' ' + short_url

The twitter4r gem is used to login and update the Twitter status message.

      client = Twitter::Client.new(:login => **, :password => *****)
      status = client.status(:post, twitter_msg)

Google Event Tracking

Finally, since we implemented dynamic content throughout the site, we decided to use Google Event Tracking to track user interactivity. We followed the standard Google Analytics event tracking approach to record interactions such as carousel navigation and hovering over bios and history points on the team page:

//pageTracker._trackEvent(category, action, optional_label, optional_value);
pageTracker._trackEvent('Team Page Interaction', 'Map Hover', bio);

We are happy with the new site and we hope that it presents our skillz including Interchange Development, Hosting Expertise and Support, and Database Wizardry!

Learn more about End Point's Rails development.

rsync and bzip2 or gzip compressed data

A few days ago, I learned that gzip has a custom option --rsyncable on Debian (and thus also Ubuntu). This old write-up covers it well, or you can just `man gzip` on a Debian-based system and see the --rsyncable option note.

I hadn't heard of this before and think it's pretty neat. It resets the compression algorithm on block boundaries so that rsync won't view every block subsequent to a change as completely different.

Because bzip2 has such large block sizes, it forces rsync to resend even more data for each plaintext change than plain gzip does, as noted here.

Enter pbzip2. Based on how it works, I suspect that pbzip2 will be friendlier to rsync, because each thread's compressed chunk has to be independent of the others. (However, pbzip2 can only operate on real input files, not stdin streams, so you can't use it with e.g. tar cj directly.)

In the case of gzip --rsyncable and pbzip2, you trade slightly lower compression efficiency (< 1% or so worse) for reduced network usage by rsync. This is probably a good tradeoff in many cases.
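A small Ruby experiment illustrates why independently compressed chunks are friendlier to rsync. This is only a sketch of the principle: it uses fixed-size chunks and Ruby's Zlib, whereas gzip --rsyncable chooses its reset points from the data itself, and the 4096-byte chunk size and the compress_chunks helper are arbitrary choices for the illustration.

```ruby
require 'zlib'

CHUNK_SIZE = 4096  # arbitrary size chosen for the illustration

# Compress each fixed-size chunk on its own, so no chunk depends on another.
def compress_chunks(data)
  (0...data.bytesize).step(CHUNK_SIZE).map do |offset|
    Zlib::Deflate.deflate(data.byteslice(offset, CHUNK_SIZE))
  end
end

original = (0...20_000).map { |i| "line #{i}\n" }.join
modified = original.dup
modified[50_000] = 'X'  # overwrite one character; the length is unchanged

before = compress_chunks(original)
after  = compress_chunks(modified)

# Indexes of compressed chunks that differ between the two versions.
changed = before.each_index.select { |i| before[i] != after[i] }
puts "#{changed.size} of #{before.size} compressed chunks changed"
```

Only the chunk containing the edit changes, so rsync would need to resend just that piece. With fixed-size chunks an insertion or deletion would shift every later boundary, which is why data-dependent reset points are the more robust approach.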

But even more interesting for me, a couple of days ago Avery Pennarun posted an article about his experimental code to use the same principles to more efficiently store deltas of large binaries in Git repositories. It's painful to deal with large binaries in any version control system I've used, and most people simply say, "don't do that". It's too bad, because when you have everything else related to a project in version control, why not some large images or audio files too? It's much more convenient for storage, distribution, complete documentation, and backups.

Avery's experiment gives a bit of hope that someday we'll be able to store big file changes in Git much more efficiently. (Though it doesn't affect the size of the initial large object commits, which will still be bloated.)