
Integrating UPS Worldship - Pick and Pack

Using UPS WorldShip to automate a pick and pack scenario

There are many options when selecting an application to handle your shipping needs. Typically you will be bound to one of the popular shipping services: UPS, FedEx, or USPS, or a combination thereof. In my experience UPS Worldship offers a very robust shipping application that is dynamic enough to accommodate integration with just about any custom or out-of-the-box ecommerce system.

UPS Worldship offers many automation features by allowing you to integrate in many different ways. The two main automated features are batch label printing and individual label printing. I would like to cover my favorite way of using UPS Worldship, which allows you to import and export data seamlessly.

You should choose the solution that works best for you and your shipping procedure. In this blog post I would like to discuss a common warehouse scenario referred to as Pick and Pack. The basic idea of this scenario is that an order is selected for a warehouse worker to fulfill; it is then picked, packed, and shipped. UPS Worldship allows you to do this in a very automated way with a bit of customization. This is a great solution for a small to medium sized business that wants to automate their shipping process and communicate tracking information with their customers.

Overall Breakdown of Process

There are a few steps involved in integrating your system with UPS Worldship. To get started, I have listed a high-level breakdown of the process. I mention a few tables in this example that I will explain in detail in the next section.




  1. An order is placed on your website and saved into your database. For the sake of this example this will be the 'orders' table
  2. A warehouse worker can then print out a packing slip that contains a barcode
  3. Worldship then grabs the order information from the 'orders' table and inserts the shipping information into the mapped fields
  4. The Worldship operator then presses 'F7' which will process the shipment with UPS, thus retrieving a tracking number
  5. Worldship marks the order as shipped and inserts the tracking number into a placeholder tracking table
  6. Worldship prints an active shipping label for your customer's order

Setting Up Your Data Structure

In order to integrate Worldship seamlessly you will need to make a few database modifications. I have decided to use two tables, 'orders' and 'ups_order_tracking'. The 'orders' table represents a standard table that contains the shipping information for an order. The 'ups_order_tracking' table is used to hold an order number and a tracking number. The order number, of course, refers to the unique order number in the 'orders' table. Every system is different, but this is a simple approach that has worked for me in the past. You will most likely need to make a few modifications to suit your data model and environment. I have included an example showing, at a minimum, what you will need.

'orders'

This is the bare minimum of information needed by Worldship in order to fill in the shipping information for a package. I have added two other columns, 'tracking' and 'tracking_sent'. The 'tracking' column will hold the tracking number for this order, and 'tracking_sent' is a boolean that keeps track of the tracking number emails discussed later in this post.

Column                   |         Type          |
-------------------------+-----------------------+
 id                      | integer               |
 order_status_id         | integer               |
 ship_to_name            | character varying     |
 ship_to_address         | character varying     |
 ship_to_address2        | character varying     |
 ship_to_city            | character varying     |
 ship_to_state_code      | character varying     |
 ship_to_province        | character varying     |
 ship_to_zip             | character varying     |
 tracking                | character varying     |
 tracking_sent           | boolean               |
'ups_order_tracking'

This table acts as a temporary holding table for the tracking number for an individual order. I have found that it is much easier to have Worldship insert rows into a table and have a trigger copy the information to the 'orders' table (or something similar depending on your database), as opposed to updating a table. Since this is the case we simply need to create a trigger that will update the 'orders' table when a row is inserted into 'ups_order_tracking'.

Column        |         Type          |
--------------+-----------------------+
 order_id     | integer               |
 tracking     | character varying     |

Setting Up Triggers

This article uses Postgres as the database; you will need to make the appropriate adjustments for your environment. The 'ups_order_tracking' table will need a simple trigger that is responsible for the following:

  • Updating the 'orders.order_status_id' column with a shipped flag (in this example 2 means it has been shipped)
  • Updating 'orders.tracking' with the tracking number supplied by Worldship

-- The function name here is just an example; adjust to suit your conventions.
CREATE OR REPLACE FUNCTION ups_order_tracking_update() RETURNS trigger AS $$
BEGIN
  UPDATE orders
    SET order_status_id = 2 WHERE id = NEW.order_id;
  UPDATE orders
    SET tracking = NEW.tracking WHERE id = NEW.order_id;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;
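
To have this function run whenever Worldship inserts a row, attach it to the 'ups_order_tracking' table with a trigger along these lines (the trigger name, like the function name, is just an example):

CREATE TRIGGER ups_order_tracking_update_trg
  AFTER INSERT ON ups_order_tracking
  FOR EACH ROW EXECUTE PROCEDURE ups_order_tracking_update();
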
After you have configured your database, you are ready to set up and integrate Worldship.

Integrating UPS Worldship

UPS Worldship offers many ways to import shipment data. Since this article is about automating a pick and pack scenario, I am only going to cover how to use the Connection Assistant to import and export data from your database.

You will need to install the appropriate ODBC driver and set up access to your database before starting this step. RazorSQL.com has a decent explanation to get you started: ODBC Setup. Once you have this set up and connecting to your database you can continue to the 'Importing Data' section below.


Importing Data

Follow the steps below and reference the instructions starting on Page 10 of the UPS Importing Shipment Instructions.

Please note the following steps:
  • Step 4: make sure you select 'By Known ODBC Source' and choose the ODBC driver you set up previously
  • Step 8 (part 1): you want to select the 'orders' table (or whatever your table is called) and map the appropriate shipment information to the Worldship fields on the right
  • Step 8 (part 2): When mapping your data from your orders table, make sure you set the Reference ID field to the order number. This allows you to 1) use the order number later when exporting your order data and 2) search UPS by either your tracking number OR your Reference ID, which is also your order number (very convenient if a tracking number is lost!)
  • Step 12: If you have custom shipping options that are predetermined, make sure you map these as seen in Step 12
  • Name your map something meaningful like 'Shipment Import'
  • Step 20: Make sure you select your newly named import map under Keyed Import as this is how Worldship knows to use your ODBC driver and map to import your shipping data
Exporting Data

Follow the steps below and reference the instructions starting on Page 1 of the UPS Exporting Shipment Data Instructions. Please note the following steps:

  • Skip to Page 8: 'Export Shipment Data using Connection Assistant' since we want to automatically update our 'ups_order_tracking' table after a label is processed
  • Step 8: Make sure you map the tracking number and order number to the 'ups_order_tracking' table.
  • Name your map something meaningful like 'Shipment Export'
  • Step 12: You can either configure Worldship to update your 'ups_order_tracking' table at the 'End of the Day' or 'After processing Shipment'. I prefer to have Worldship update my 'ups_order_tracking' after each label is printed so the data is immediately available in the database.

Processing an Order

Now that you have set up Worldship to interface with your database by creating maps, you can use the 'Keyed Import' functionality to start processing packages. After you have selected your map under 'Keyed Import' you will see a small dialog box that is waiting for input.

Scanner

In my experience the fastest way to pull shipment data is by using a scanner that can read the barcode on your packing slip. This barcode is the encoded order number that is referenced by your 'orders' table and used by the maps you created to retrieve the appropriate data. Most scanners can be configured to supply the Enter key after a successful scan has occurred. NOTE: You must make sure the 'Keyed Import' box has focus and is waiting for input. The basic process is as follows:

  • An order is printed with a barcode (I was able to make use of PHP Barcode Generator to generate my barcodes on the packing slip). You will need to find something that suits your needs if you want to make use of barcodes.
  • The user either scans an order or enters the order number manually into the Keyed Import input.
  • Worldship then pulls the order data from the database and inserts it into the proper Worldship fields.
  • The Worldship operator then presses the 'F7' key to process the shipment.
  • Worldship then inserts the order_id and tracking number into the ups_order_tracking table.
  • The ups_order_tracking table's trigger is executed and updates the 'orders' table (or whatever is needed for your data model).
  • Worldship prints out the shipping label and runs the action you selected to run after a shipment is processed, or at the end of day.
  • Your order is now marked as shipped, updated with a tracking number, and you have a package ready to be picked up by UPS.

This might be enough for your needs, but I like to send an email to the customer letting them know their order has shipped and giving them a UPS tracking number to track their package.

Email the Customer a Tracking Number

UPS Worldship does offer a feature that will send an email with a tracking number after the label is printed. This might be enough for some people, but it does not offer anything in the way of customizing the communication. Most businesses prefer branded emails with custom information in all communication sent to their customers. As such, I integrated a small feature that sends a custom and branded email to customers. This email includes their tracking number with a link to the UPS tracking page along with a friendly message letting them know their order is on the way.

Remember the orders.tracking_sent boolean mentioned earlier? This is where that field comes in handy. I wrote a small Perl script that runs every few hours. The script queries the 'orders' table and looks for:

  • orders.order_status_id = 2 (The order has been set to shipped)
  • orders.tracking_sent IS NULL (A tracking email has not been sent)

After it pulls a list of all of the orders that have been marked as shipped AND have not had a tracking email sent, it pulls the tracking number and fires off an email to the customer with the tracking number. The script then sets 'orders.tracking_sent' to TRUE so the next time the script runs it does not resend the tracking number to the customer. This is of course a very custom feature specific to this database, and I am sure you would want to customize this to your needs. I thought it was worth mentioning as customers really like confirmation that 1) their order was placed and 2) their order has shipped (with a means of tracking its progress).
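
As a rough sketch of what such a script could look like, using the table and column names from above (the connection settings and the send_tracking_email routine are placeholders):

#!/usr/bin/env perl
# Hypothetical tracking-email script: find shipped orders that have not
# had a tracking email sent, mail the customer, and mark them as sent.
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:Pg:dbname=mystore', 'dbuser', 'dbpass', { RaiseError => 1 });

my $orders = $dbh->selectall_arrayref(
    q{SELECT id, ship_to_name, tracking
        FROM orders
       WHERE order_status_id = 2
         AND tracking_sent IS NULL},
    { Slice => {} },
);

for my $order (@$orders) {
    send_tracking_email($order);
    $dbh->do('UPDATE orders SET tracking_sent = TRUE WHERE id = ?', undef, $order->{id});
}

sub send_tracking_email {
    my ($order) = @_;
    # Placeholder: build and send the branded email here (Email::Sender,
    # MIME::Lite, etc.), including a link to the UPS tracking page for
    # $order->{tracking}.
    warn "Would email $order->{ship_to_name} tracking number $order->{tracking}\n";
}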

Final Thoughts

As mentioned initially, UPS Worldship offers many ways of integrating with your environment and many customizations. UPS also offers great support: if you contact your UPS representative, they can put you in touch with a developer who can answer any questions you have. I believe that for a Pick and Pack scenario, a packing slip with a barcode, a scanner, and a properly configured Worldship application can streamline a small to medium sized warehouse environment. Unfortunately there are not many affordable solutions for small to medium sized e-commerce businesses, but UPS Worldship does a great job of filling that need and automating your shipping and communication.


Simple Pagination with AJAX

Here's a common problem: you have a set of results you want to display (search results, or products in a category) and you want to paginate them in a way that doesn't submit and re-display your results page every time. AJAX is a clear winner in this; I'll outline a very simple, introductory approach for carrying this off.

(I'm assuming that the reader has some modest familiarity with Javascript and jQuery, but no great expertise. My solutions below will tend toward the “Cargo Cult” programming model, so that you can cut and paste, tweak, and go, but with enough “how and why” sprinkled in so you will come away knowing enough to extend the solution as needed.)

Firstly, you have to have the server-side processing in place to serve up paginated results in a way you can use. We'll assume that you can write or adapt your current results source to produce this for a given URL and parameters:

/search?param1=123&param2=ABC&sort=colA,colB&offset=0&size=24

That URL offers a state-less way to retrieve a slice of results: in this case, it corresponds to a query something like:

SELECT … FROM … WHERE param1='123' AND param2='ABC'
ORDER BY colA,colB OFFSET 0 LIMIT 24

You can see that this will generate a slice of the first 24 results; changing “offset” will get other slices, which is the foundation of our ability to “page” the results.

The code behind “/search” should return a JSON structure suitable for your needs. My usual approach is to assemble what I want in Perl, then pass it through JSON::to_json:

my $results = perform_search(...);
my $json = JSON::to_json($results);

Don't forget to include the appropriate Content-Type header:

Content-type: application/json
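
For illustration, a minimal CGI-style handler for “/search” might look like the sketch below; perform_search here is a stand-in for your real query code:

#!/usr/bin/env perl
# Hypothetical /search endpoint: return one slice of results as JSON.
use strict;
use warnings;
use CGI ();
use JSON ();

my $q      = CGI->new;
my $offset = $q->param('offset') || 0;
my $size   = $q->param('size')   || 24;

my $results = perform_search(offset => $offset, size => $size);

print $q->header('application/json');
print JSON::to_json($results);

# Stand-in for the real database query or search engine call.
sub perform_search {
    my %args = @_;
    return [
        map { { col1 => "val$_", col2 => "val$_", col3 => "val$_" } }
            ( $args{offset} + 1 .. $args{offset} + $args{size} )
    ];
}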

Now we need a Javascript function to retrieve a slice; I'll use jQuery here as it's my preferred solution (and because I'm not at all fluent in non-jQuery approaches!).

function(){
  $.ajax({ url: '/search', data: { …, offset: $offset, limit: 24 }, … });
}

You'll need to keep track of (or calculate) the offset within your page. My approach is to drop each result into a div or similar HTML construction, then I can count them on the fly:

var $offset = $('div.search_result').length;

For the “data” passed in the AJAX call above, you need to assemble your query parameters, most likely from form elements on the page. (Newbie note: you can put <input> and <select> elements in the page without a surrounding <form>, as jQuery doesn't care -- you aren't going to submit a form, but construct a URL that looks like it came from a form.) Here's one useful model to follow:

var $data = { offset: $offset, limit: 24 };
$.each(['param1', 'param2'], function(ix, val) {
  $data[val] = $('input[name=' + val + '], select[name=' + val + ']').val();
});
and then:
$.ajax({ url: '/search', data: $data, … });

Now we need something to handle the returned data. Within the ajax() call, we reference (or construct) a function that takes at least one argument:

function(results) { … }

“results” is of course the JSON structure you built up in the “/search” response. Here, we'll assume that you are just sending back an array of objects:

[ { col1: 'val1', col2: 'val2', col3: 'val3' }, { col1: 'val4', col2: 'val5', col3: 'val6' } ]
would represent two rows of three columns each. We can now process these:
$.each(results, function(ix, val){
  var new_result = $('div.search_result').first().clone();
  $(new_result).find('span.col1').html(val.col1);
  $(new_result).find('span.col2').html(val.col2);
  $(new_result).find('span.col3').html(val.col3);
  $('div.search_result').last().append(new_result);
});

The entire working example can be seen in action here.

In place of an actual database query or search engine, I have a simple PHP program that sends back a chunk of simulated rows. A few other notes and finesses:

  • In the HTML document, I have a template for the "search_result" DIV that is hidden. I can style this any way I like, then I clone it for each returned result row. Note that it's initially hidden, so after appending a new clone to the page, I have to "show()" it.
  • I do some very simple arrangement of the results by inserting a hard break after every four results. You could do much fancier arrangements: assigning CSS classes based on whether the new result is at an edge of the grid, for instance.
  • Error handling in this example is very rudimentary.

Here's a screenshot of what you might expect, with Firebug showing the returned JSON object.


Liquid Galaxy at Doodle 4 Google

Last week I went to Google’s New York Office on 8th Ave with Ben, intern Ben, and hired hand Linton. For those who have not experienced this wonderful place, Google’s building takes up an entire city block, is very colorful, and is probably one of the coolest places I have ever been to in the Big Apple. Walking through the huge building is an experience in itself, with people riding Razor Scooters by you as you pass by street signs marking different areas in the office (It was explained to me that each floor is themed after a different place in the city. For example, the 10th floor, the main floor we were working on, is based on Queens). And of course they have the best break rooms. Free food EVERYWHERE!

Also they have ball pits. You know you are awesome when you have ball pits.

Anyway, the reason we were at Google in the first place was to move the Liquid Galaxy on the 10th floor down to the 5th floor. It was great to see how many people came up to us and told how much they enjoyed using the system, and they all wanted to know when and if it would ever be back. Moving the Liquid Galaxy went smoothly, and setting it back up on the 5th floor (at the “Water Tower”) went even smoother, if possible. With everyone’s help we were able to get into the office and out in about four and a half hours with the Liquid Galaxy all set up, cleaned off, and in good working condition.

The move was made for the Doodle 4 Google event, an awards ceremony where 50 student finalists (Kindergarten to 12th grade, one from each state) are awarded for their interpretation of the Google homepage logo. The theme this year was “If I could go back in time, I would…” The national finalist won a $30,000 college scholarship, $50,000 for their school, a Chromebook, a Wacom Digital Design tablet, and their design printed on a t-shirt, though all finalists received some sort of scholarship and a t-shirt printed with their design. Also, if you checked Google's homepage last Friday (2012-05-18), you would have seen the winner’s drawing in place of the normal logo. The winner, a 2nd grader from Wisconsin, visited the golden era of pirates in his drawing. I did not know it at the time, but he was also the first kid to wander over to the Liquid Galaxy to find out what it was. He was a little intimidated at first, but after some friendly encouragement he decided to check it out. It turned out later that we would have to take him to the side and explain that he could not hit the top of the 3D Space Navigator and yell “SMASH” while other people were using the Liquid Galaxy (and completely disorienting all the surrounding parents in the process).

National winner’s drawing

Regardless, staffing the booth and talking to people was a great experience. Nearly everyone at the event (including the staff!) wanted to come over and fly around, and I heard many funny stories from children and adults alike as people went to different places that meant something to them. However, as an anonymous Internet user once said: “It’s amazing… with Google Earth you can fly anywhere in the world, but all anyone wants to see is their house,” and this was certainly true at Doodle 4 Google.

Linton and Jandro with some people checking out the LG

Doodle 4 Google lasted from 12:30 to 2:30, and after the event ended, Ben, Ben, Linton and I packed up the Liquid Galaxy once more and brought it back upstairs to the 10th floor. The second packing up and re-installation went even better than the first day, and Ben took the opportunity to show us some Linux System Administration and how to operate the Switched PDU (Power Distribution Unit) which controls the outlets that provide power to the computers on the Liquid Galaxy.

The two days at Google were very busy, but helping out at Google's office was a great experience, and one which I hope to enjoy again sometime in the future.

Website Performance Boot Camp at UTOSC 2012

I’ll keep brief my last post about this year’s Utah Open Source Conference.

I was asked to give a talk called “Website Performance Boot Camp” on both day one and day two, which carried this brief description:

What’s the difference between a snappy website and a sloth that you turn away from in frustration? A lot of little things, usually. It’s rarely worth doing 100% of the optimization you could do, but getting 75% of the way isn’t hard if you know where to look.

We’ll look at HTTP caching, compression, proxying, CDNs, CSS sprites, minification, and more, how to troubleshoot, and what’s best to leave alone when you have limited time or tolerance for risk.

Here is the video recording of the first time I presented the talk. (The technician noted its audio was “a little hot”.)

Use this Website Performance Boot Camp direct YouTube video link if the embedded video doesn’t work for you.

The slides for this Website Performance Boot Camp presentation are available.

Thanks again to the conference organizers and the other speakers and sponsors, and the nice venue Utah Valley University, for making it a great conference!

UTOSC 2012 talks of interest

It's been two weeks now since the Utah Open Source Conference for 2012. My fellow End Pointers wrote previously about it: Josh Ausborne about the mini Liquid Galaxy we set up there for everyone to play with, and Josh Tolley with a write-up of his talks on database constraints and KML for geographic mapping markup.

There were a lot of interesting talks planned, and I could only attend some of them. I really enjoyed these:

  • Rob Taylor on AngularJS
  • Brandon Johnson on Red Hat's virtualization with oVirt, Spacewalk, Katello, and Aeolus
  • Clint Savage about RPM packaging with Mock & Koji
  • Daniel Evans on testing web applications with Capybara, embedded WebKit, and Selenium (which End Pointer Mike Farmer wrote about here back in December)
  • Aaron Toponce on breaking full-disk encryption (I missed this talk, but learned about it from Aaron in the hallway track and his slides afterwards)
  • Matt Harrison's tutorial Hands-on intermediate Python, covering doctest, function parameters and introspection, closures, function and class decorators, and more.

I gave a talk on GNU Screen vs. tmux, which was fun (and ends with a live demo that predictably fell apart, and audience questions that you can't hear on the recording). Here's the video:

Follow this Direct YouTube link in case the embedded version doesn't work for you. And here are the presentation slides.

I have a bit more to cover from the conference later!

Keeping Your Apps Neat & Tidy With RequireJS

RequireJS is a very handy tool for loading files and modules in JavaScript. A short time ago I used it to add a feature to Whiskey Militia that promoted a new section of the site. By developing the feature as a RequireJS module, I was able to keep all of its JavaScript, HTML and CSS files neatly organized. Another benefit to this approach was the ability to turn the new feature "on" or "off" on the site by editing a single line of code. In this post I'll run through a similar example to demonstrate how you could use RequireJS to improve your next project.

File Structure

The following is the file structure I used for this project:
├── index.html
└── scripts
    ├── main.js
    ├── my
    │   ├── module.js
    │   ├── styles.css
    │   └── template.html
    ├── require-jquery.js
    ├── requirejs.mustache.js
    └── text.js

The dependencies included RequireJS bundled together with jQuery, mustache.js for templates and the RequireJS text plugin to include my HTML template file.

Configuration

RequireJS is included in the page with a script tag and the data-main attribute is used to specify additional files to load. In this case "scripts/main" tells RequireJS to load the main.js file that resides in the scripts directory. Require will load the specified files asynchronously. This is what index.html looks like:

<!DOCTYPE html>
<html>
<head>
<title>RequireJS Example</title>
</head>
<body>
<h1>RequireJS Example</h1>
<!-- This is a special version of jQuery with RequireJS built-in -->
<script data-main="scripts/main" src="scripts/require-jquery.js"></script>
</body>
</html>

I was a little skeptical of this approach working on older versions of Internet Explorer so I tested it quickly with IE6 and confirmed that it did indeed work just fine.

Creating a Module

With this in place, we can create our module. The module definition begins with an array of dependencies:

define([
  "require",
  "jquery",
  "requirejs.mustache",
  "text!my/template.html"
  ],

This module depends on require, jQuery, mustache, and our mustache template. Next is the function declaration where our module's code will live. The arguments specified allow us to map variable names to the dependencies listed earlier:

  function(require, $, mustache, html) { ... }

In this case we're mapping $ to jQuery, mustache to requirejs.mustache, and html to our template file.

Inside the module we're using Require's .toUrl() function to grab a URL for our stylesheet. While it is possible to load CSS files asynchronously just like the other dependencies, there are some issues that arise that are specific to CSS files. For our purposes it will be safer to just add a <link> element to the document like so:

  var cssUrl = require.toUrl("./styles.css");
  $('head').append($('<link/>',
    { rel: "stylesheet", media: "all", type: "text/css", href: cssUrl }));

Next, we define a view with some data for our Mustache template and render it.

  var view = {
    products: [
      { name: "Apples", price: 1.29, unit: 'lb' },
      { name: "Oranges", price: 1.49, unit: 'lb'},
      { name: "Kiwis", price: 0.33, unit: 'each' }
    ],
    soldByPound: function(){
      return (this['unit'] === 'lb') ? true : false;
    },
    soldByEach: function() {
      return (this['unit'] === 'each') ? true : false;
    }
  }

  // render the Mustache template
  var output = mustache.render(html, view);

  // append to the HTML document
  $('body').append(output);
});

The Template

I really like this approach because it allows me to keep my HTML, CSS and JavaScript separate and also lets me write my templates in HTML instead of long, messy JavaScript strings. This is what our template looks like:

<ul class="hot-products">
{{#products}}
<li class="product">
{{name}}: ${{price}} {{#soldByEach}}each{{/soldByEach}}{{#soldByPound}}per lb{{/soldByPound}}
</li>
{{/products}}
</ul>

Including the Module

To include our new module in the page, we simply add it to our main.js file:

require(["jquery", "my/module"], function($, module) {
    // jQuery and my/module have been loaded.
    $(function() {

    });
});

When we view our page, we see that the template was rendered and appended to the document:
Require rendered

Optimizing Your Code With The r.js Optimizer

One disadvantage of keeping everything separate and using modules in this way is that it adds to the number of HTTP requests on the page. We can combat this by using the RequireJS Optimizer. The r.js script can be used as part of a build process and runs on both node.js and Rhino. The Optimizer script can minify some or all of your dependencies with UglifyJS or Google's Closure Compiler and will concatenate everything into a single JavaScript file to improve performance. By following the documentation I was able to create a simple build script for my project and build the project with the following command:

node ../../r.js -o app.build.js
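
For reference, an app.build.js for this file layout might look roughly like the following; this is an untested sketch, and the output directory name is an assumption:

({
    appDir: "./",
    baseUrl: "scripts",
    // Output directory for the built project (an assumption)
    dir: "../webapp-build",
    modules: [
        { name: "main" }
    ]
})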

This executes the app.build.js script with Node. We can compare the development and built versions of the project with the Network tab in Chrome's excellent Developer Tools.

Development Version:
Webapp devel

Optimized with the RequireJS r.js optimizer:
Webapp built

It's great to be able to go from 8 HTTP requests and 360 KB in development mode to 4 HTTP requests and ~118 KB after optimization, just by running a simple command with Node! I hope this post has been helpful and that you'll check out RequireJS on your next project.

 

Vim - working with encryption

On occasion I have to work with encrypted files for work or personal use. I am partial to a Linux environment and I prefer Vim as my text editor, even when I am only reading a file. Vim supports quite a few different ways of interfacing with external encryption packages. I only use two of those variations as described below.

Vim comes packaged with a default encryption mechanism referred to as VimCrypt in the documentation. I typically use this functionality as a temporary solution when my GPG setup is not immediately available, such as on a remote system that is not mine.

Using Vim's default VimCrypt feature

To create a new encrypted file or open a plain text file you wish to encrypt:
vim -x <filename>

This will create a new file if it does not exist or open an existing file and then prompt you for a password. This password is then used as the key to encrypt and decrypt the specified file. Upon saving and exiting this file, it will be saved in this encrypted format using your crypt key.

You can also save and encrypt an open file you are currently working on like so. Please note this is a capital X:

:X 
This will also ask you for a password to encrypt the file.

Reasons I usually don't use this option:
  • Vim uses a weak encryption method by default: it encrypts the file with the 'zip' method, the same encryption algorithm used by Pkzip (known to be flawed). You can set the default encryption to the more secure 'blowfish' cipher. For more information see the documentation.
    Set the cryptmethod to use the blowfish cipher
    :setlocal cm=blowfish 
    
    Documentation on Vim Encryption
    :h :X
    
  • Vim typically uses swap files, which can compromise the security of the encrypted file. You can turn this off by using the 'n' flag.
    vim -xn <filename>
    

Integrating with GPG

In order to seamlessly integrate with GPG encrypted files you will need to add the following to your .vimrc file:

" Transparent editing of gpg encrypted files.
" By Wouter Hanegraaff
augroup encrypted
  au!

  " First make sure nothing is written to ~/.viminfo while editing
  " an encrypted file.
  autocmd BufReadPre,FileReadPre *.gpg set viminfo=
  " We don't want a swap file, as it writes unencrypted data to disk
  autocmd BufReadPre,FileReadPre *.gpg set noswapfile

  " Switch to binary mode to read the encrypted file
  autocmd BufReadPre,FileReadPre *.gpg set bin
  autocmd BufReadPre,FileReadPre *.gpg let ch_save = &ch|set ch=2
  " (If you use tcsh, you may need to alter this line.)
  autocmd BufReadPost,FileReadPost *.gpg '[,']!gpg --decrypt 2> /dev/null

  " Switch to normal mode for editing
  autocmd BufReadPost,FileReadPost *.gpg set nobin
  autocmd BufReadPost,FileReadPost *.gpg let &ch = ch_save|unlet ch_save
  autocmd BufReadPost,FileReadPost *.gpg execute ":doautocmd BufReadPost " . expand("%:r")

  " Convert all text to encrypted text before writing
  " (If you use tcsh, you may need to alter this line.)
  autocmd BufWritePre,FileWritePre *.gpg '[,']!gpg --default-recipient-self -ae 2>/dev/null
  " Undo the encryption so we are back in the normal text, directly
  " after the file has been written.
  autocmd BufWritePost,FileWritePost *.gpg u
augroup END
Source: Vim Wiki - Encryption

This works by detecting the extension on the files you are opening with Vim. This allows you to open, edit, and save files as if they were plain text in a seamless fashion.

Now you can create a new GPG encrypted file or edit an existing one like this:

vim <filename>.gpg

This should prompt you for your GPG password, either with a GUI window or on the command line, depending on your environment's configuration. If the file does not exist, Vim creates it, and the contents are encrypted when you write and quit.

Another thing I like to do is take a file I have already opened (and decrypted) and save it as plain text. This can be done by simply opening the encrypted GPG file as seen above and changing the extension when saving. Simply save like so:

:w <newfilename>.txt
Any extension other than .gpg will save your file as plain text.

Reasons to use GPG
  • Much stronger encryption than Vim's default 'zip' method
  • No swap files, thanks to this line in .vimrc: autocmd BufReadPre,FileReadPre *.gpg set noswapfile

Of course when using Vim there are many features and many different ways of doing this. This is simply how I use Vim to easily work with encrypted files in my daily life.

For more information on Vim and external encryption programs, see the Vim Wiki page on encryption linked above.



Points of Interest

It's been a fairly straightforward week at work, but I have stumbled upon a few interesting finds along the way.

Vim Adventures

Finally! A game-based approach to learning Vim keyboard commands. I was hoping someone would do this. It's just getting started (only two levels) and sadly, it looks like it'll be charging money to unlock higher levels. However, some things are worth paying for. I've found that just playing the first two levels a few times has helped retrain my brain to keep my fingers on the home row. It's still quite buggy and seems to only work in Chrome. I found several times I needed to close all my Chrome windows after playing. Also, incognito mode seems to help with the bugs, as it disables all extensions you may have installed.

MySQL query comments in Rails

Ever wanted to know where that slow query was being called from? Well, if you're using MySQL with your Rails 2.3.x or 3.x.x app, you can get debug information about which controller action made the call. Check out 37signals' new marginalia gem.

How to use EC2 as a web proxy

Kevin Burke provides a very detailed HOWTO article for working around restrictions you may experience in the course of an Internet based life. Pretty amazing what Amazon's free usage tier puts out there; of course it's only free for 12 months.

Include PIDs in your Logs

Many of us Rails developers get comfortable looking at development log files. Sometimes when I have to investigate a customer issue on a production server using logs, I wish I had the level of detail the development logger has. While that remains a wish, I'm finding it mandatory to include PID numbers in my production logs. In production systems with multiple requests being handled simultaneously, Rails logs start to become unusable; it's not clear which log lines are from which requests. Adding a PID in front of the timestamp can help untangle the mess. Here are some example approaches to this for Rails 3.x.x and Rails 2.3.x. Also, if you're really a log-lover and manage a lot of servers, check out Papertrail; it looks very impressive for $7/mo.

Spectrum Shortages - Why it's happening and what can be done

Telecom is not an area I have much familiarity with, but I found this article to be an interesting read. For example, did you know that the largest owners of spectrum licenses are "under-capitalized or unwilling to build out networks" to use the spectrum? So while AT&T and Verizon struggle to meet the iPhone 4S's data demands (twice as much as iPhone 4!), "there are some companies that have spectrum, but they're struggling financially. Or they aren't quite sure what to do with the spectrum. And others that have the money and business model, but need the spectrum." It seems the way out of the mess is 4G, offering to improve the efficiency of spectrum use by 700 percent.

SELinux Local Policy Modules

If you don't want to use SELinux, fair enough. But I find many system administrators would like to use it but get flustered at the first problem it causes, and disable it. That's unfortunate, because often it's simple to customize SELinux policy by creating what's known as a local policy module. That way you allow the actions you need while retaining the added security SELinux brings to the system as a whole.

A few years ago my co-worker Adam Vollrath wrote an article on this same subject for Red Hat Enterprise Linux (RHEL) 5, and went into more detail on SELinux file contexts, booleans, etc. I recently went through the process of building an SELinux local policy module on a RHEL 6 mail server and found a few differences and want to document some of the details here. This applies to RHEL 5 and RHEL 6, and near relatives CentOS, Scientific Linux, et al.

When under pressure …

If you're tempted to disable SELinux, consider leaving it on, but in "permissive" mode. That will leave it running but stop it from blocking disallowed actions until you have time to deal with them properly. It's as simple as:

setenforce 0

That will last until you reboot, unless otherwise changed manually. You can edit /etc/sysconfig/selinux and set:

SELINUX=permissive

to keep permissive mode even after a reboot. To see what mode SELinux is in, you can run either of:

getenforce
# or
cat /selinux/enforce

Prerequisites

First make sure you have installed:

yum install policycoreutils
yum install policycoreutils-python   # also needed on RHEL 6

You must have SELinux enabled, though enforcing isn't required; permissive mode is fine. If it's not enabled, edit /etc/sysconfig/selinux for permissive mode and reboot.

You'll need an up-to-date file /var/lib/sepolgen/interface_info, which is created by /usr/sbin/sepolgen-ifgen for the specific machine you're running it on. That should be done automatically, but be aware of it in case it somehow got stale. If you run into any unexpected problems, make sure the timestamp on interface_info is recent, or just regenerate it, which is harmless.
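
For example, regenerating it is just a matter of running the generator mentioned above (it normally needs no arguments):

/usr/sbin/sepolgen-ifgen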

Making the policy module

Choose a unique name for your local policy module. It's better to use something specific to your organization, or the hostname, rather than just "postfix" or "dovecot" or something similar which may conflict with existing vendor policy modules.

Run semodule -l to list the existing modules. For this example I'll use "epmail".

Create a directory for your new policy module:

mkdir -p /root/local-policy-modules/epmail
cd /root/local-policy-modules/epmail

Copy relevant error messages verbatim from /var/log/audit/audit.log to a new file. Here for example are two denials of a script called by Postfix as a transport agent, which needed to connect to PostgreSQL locally:

type=AVC msg=audit(1335581974.308:69047): avc:  denied  { write } for  pid=14649 comm=F9616121202873696E676C65206D65 name=".s.PGSQL.5432" dev=sda2 ino=79924 scontext=system_u:system_r:postfix_pipe_t:s0 tcontext=system_u:object_r:postgresql_tmp_t:s0 tclass=sock_file
type=AVC msg=audit(1335581974.308:69047): avc:  denied  { connectto } for  pid=14649 comm=F9616121202873696E676C65206D65 path="/tmp/.s.PGSQL.5432" scontext=system_u:system_r:postfix_pipe_t:s0 tcontext=system_u:system_r:postgresql_t:s0 tclass=unix_stream_socket

In the logs you want to look for "AVC", which stands for Access Vector Cache and is how SELinux logs denials. You can grab all the recent denials with:

grep ^type=AVC /var/log/audit/audit.log > epmail.log

and then filter it manually to contain just what you need.

You can see a usually more informative explanation of each error by piping it into audit2why:

audit2why < epmail.log

Now you're ready to create your policy module:

audit2allow -m epmail < epmail.log > epmail.te
checkmodule -M -m -o epmail.mod epmail.te
semodule_package -o epmail.pp -m epmail.mod
semodule -i epmail.pp

That's a somewhat longwinded way to do things, but that's how I learned it from my co-worker Kiel, and it's easy once put into a script. See the man page of each program for more details on what that step is doing, and various options.
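
For example, a small wrapper along these lines keeps the four steps together (a sketch; the script name and argument handling are my own):

#!/bin/bash
# Build and load a local SELinux policy module from a file of AVC denials.
# Usage: ./build-policy.sh epmail epmail.log
set -e

module=$1
avclog=$2

audit2allow -m "$module" < "$avclog" > "$module.te"
checkmodule -M -m -o "$module.mod" "$module.te"
semodule_package -o "$module.pp" -m "$module.mod"
semodule -i "$module.pp"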

A more streamlined way that has audit2allow performing the functions of checkmodule and semodule_package is:

audit2allow -M $module_name -R -i epmail.log
semodule -i epmail.pp

Wrap-up

You will of course need to keep an eye on the audit log to look for any more AVC denials, as you exercise all the functions of the system. For a production system it may be best to leave SELinux permissive for a few weeks, and once you're confident you've allowed all the actions needed, you can switch it to enforcing mode.

Finally, I have not normally had to do this, but if you need to force reload the SELinux policy on the server, you can do it with:

semodule -R

Have fun with the extra security SELinux offers!

Three Things: Rails, JOIN tip, and Responsiveness

Here's another entry in my Three Things series, where I share a few small tips I've picked up lately.

1. Rails and Dramas

Sometimes I think that since Rails allows you to write code efficiently, [a few] members of the Rails community have time to overdramatize incidents that otherwise would go relatively unnoticed :) Someone with a good sense of humor created this website to track these dramas. While it's probably a waste of time to get caught up on the personal aspects of the drama, some of the dramas have interesting technical aspects which are fiercely defended.

2. JOIN with concat

Recently I needed to perform a JOIN on a partial string match in MySQL. After some investigation, I found that I had to use the CONCAT function in a conditional (in an implicit inner JOIN), which looked like this:

SELECT * FROM products p, related_items ri WHERE concat(p.sku, '%') = ri.id

In modern MVC frameworks with ORMs, databases are typically not designed to include data associations in this manner. However, in this situation, data returned from a third party service in a non-MVC, ORM-less application was only a substring of the original data. There may be alternative ways to perform this type of JOIN, and perhaps my fellow database experts will comment on the post with additional techniques ;)
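
For what it's worth, one alternative for this kind of prefix-style match is an explicit JOIN with LIKE; this is just a sketch, and which column holds the shorter string depends on your data:

SELECT *
  FROM products p
  JOIN related_items ri
    ON ri.id LIKE CONCAT(p.sku, '%');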

3. Responsiveness

Responsive web design is all the rage lately, driven by the increase in mobile web browsing and tablets. Here is a fun tool that the ecommerce director at Paper Source pointed out to me recently. The website allows you to render a single URL at various widths for a quick review of the UI.

UTOSC Recap

I spent three days last week attending the Utah Open Source Conference, in company with Josh Ausborne and Jon Jensen. Since End Point is a "distributed company", I'd never met Josh Ausborne before, and was glad to spend a few days helping and learning from him as we demonstrated the Liquid Galaxy he has already written about.

This time around, the conference schedule struck me as being particularly oriented toward front-end web development. The talks were chosen based on a vote taken on the conference website, so apparently that's what everyone wanted, but front end stuff is not generally my cup of tea. That fact notwithstanding, I found plenty to appeal to my particular interests, and a number of talks I didn't make it to but wished I had.

I delivered two talks during the conference, the first on database constraints, and the second on Google Earth and the Liquid Galaxy as they apply to geospatial visualization (slides here and here, respectively). Though I couldn't get past the feeling that my constraints talk dragged quite a bit, it was well received. I kept it as database-agnostic as possible, but no talk on the subject would be complete without mentioning PostgreSQL's innovative exclusion constraints. Their applicability to scheduling applications, by easily preventing things like overlapping time ranges, seemed particularly interesting to one attendee with recent experience writing such an application. Should I have the opportunity to deliver the talk again, it will definitely include more examples of some of the more overlooked constraint types, as well as a more detailed description of the surrogate vs. natural keys debate, which generated quite a bit of discussion after I mentioned it in passing.

My mapping talk was less enthusiastically attended, which may well be due to the topic or the speaker, but it was also scheduled at 6:00 PM, in the last slot of the day, and I expect many attendees had gone home. UTOSC features an unusually high number of attendees with young families, compared to most conferences I've attended, and clears out relatively rapidly toward evening. The last day's tracks tend to be family-focused specifically because of all the parents who want to bring their children, and included hands-on labs, board game sessions, and child-friendly demonstrations.

Sparse attendance notwithstanding, I enjoyed introducing my audience to Google Earth's KML language, the Kamelopard library I've been working on to facilitate making KML, and some of the applications of Google Earth for visualization. We moved the Liquid Galaxy from our display booth to the classroom for my presentation; I expect it was one of the more involved demonstrations in any talk, and certainly deserves honorable mention for being a live demo that actually worked.

Monitoring many Postgres files at once with tail_n_mail

This post discusses version 1.25.0 of tail_n_mail, which can be downloaded at http://bucardo.org/wiki/Tail_n_mail

One of our clients recently had one of their Postgres servers crash. In technical terms, it issued a PANIC because it tried to commit a transaction that had already been committed. We are using tail_n_mail for this client, and while we got notified six ways to Sunday about the server being down (from Nagios, tail_n_mail, and other systems), I was curious as to why the actual PANIC had not gotten picked up by tail_n_mail and mailed out to us.

The tail_n_mail program at its simplest is a Perl script that greps through log files, finds items of interest, and mails them out. It does quite a bit more than that, of course, including normalizing SQL, figuring out which log files to scan, and analyzing the data on the fly. This particular client of ours consolidates all of their logs to some central logging boxes via rsyslog. For the host in question that issued the PANIC, we had two tail_n_mail config files that looked like this:

## Config file for the tail_n_mail program
## This file is automatically updated
## Last updated: Fri Apr 27 18:00:01 2012
MAILSUBJECT: Groucho fatals: NUMBER

INHERIT: tail_n_mail.fatals.global.txt

FILE: /var/log/%Y/groucho/%m/%d/%H/pgsql-err.log
LASTFILE: /var/log/2012/groucho/04/27/18/pgsql-err.log
OFFSET: 10199

## Config file for the tail_n_mail program
## This file is automatically updated
## Last updated: Fri Apr 27 18:00:01 2012
MAILSUBJECT: Groucho fatals: NUMBER

INHERIT: tail_n_mail.fatals.global.txt

FILE: /var/log/%Y/groucho/%m/%d/%H/pgsql-warning.log
LASTFILE: /var/log/2012/groucho/04/27/18/pgsql-warning.log
OFFSET: 7145

The reason for two files was that rsyslog was splitting the incoming Postgres logs into multiple files. Which is normally a very handy thing, because the main file, pgsql-info.log, is quite large, and it's nice to have the mundane things filtered out for us already. Because rsyslog also splits things based on the timestamp, we don't give it an exact file name, but use a POSIX template instead, e.g. /var/log/apps/%Y/groucho/%m/%d/%H/pgsql-warning.log. By doing this, tail_n_mail knows where to find the latest file. It also uses the LASTFILE and OFFSET to know exactly where it stopped last time, and then walks through all files from LASTFILE until the current one.

So why did we miss the PANIC? Because it was in a heretofore unseen and untracked log file known as pgsql-crit.log. (Which goes to show how rarely Postgres crashes: this was the first time in well over 700,000 log files generated that a PANIC had occurred!) At this point, the solution was to either create yet another set of config files for each host to watch for and parse any pgsql-crit.log files, or to give tail_n_mail some more brains and allow it to handle multiple FILE entries in a single config file. Obviously, I chose the latter.

After some period of coding, testing, debugging, and caffeine consumption, a new tail_n_mail was ready. This one (version 1.25.0) allows multiple values of the FILE parameter inside of a single config. Thus, for the above, I was able to combine everything into a single tail_n_mail config file like so:

MAILSUBJECT: Groucho fatals: NUMBER

INHERIT: tail_n_mail.fatals.global.txt

FILE: /var/log/%Y/groucho/%m/%d/%H/pgsql-warning.log
FILE: /var/log/%Y/groucho/%m/%d/%H/pgsql-err.log
FILE: /var/log/%Y/groucho/%m/%d/%H/pgsql-crit.log

The INHERIT file is a way of keeping common config items in a single file: in this case, groucho and a bunch of other similar hosts all use it. It contains the rules on what tail_n_mail should care about, and looks similar to this:

## Global behavior for all "fatals" configs
EMAIL: acme-alerts@endpoint.com
FROM: postgres@endpoint.com
FIND_LINE_NUMBER: 0
STATEMENT_SIZE: 3000
INCLUDE: FATAL:
INCLUDE: PANIC:
INCLUDE: ERROR:

## Client specific exceptions:
EXCLUDE: ERROR:  Anvils cannot be delivered via USPS
EXCLUDE: ERROR:  Jetpack fuel quantity missing
EXCLUDE: ERROR:  Iron Carrots and Giant Magnets must go to different addresses
EXCLUDE: ERROR:  Rocket Powered Rollerskates no longer available

## Postgres exceptions:
EXCLUDE: ERROR:  aggregates not allowed in WHERE clause
EXCLUDE: ERROR:  negative substring length not allowed
EXCLUDE: ERROR:  there is no escaped character
EXCLUDE: ERROR:  operator is not unique
EXCLUDE: ERROR:  cannot insert multiple commands into a prepared statement
EXCLUDE: ERROR:  value "\d+" is out of range for type integer
EXCLUDE: ERROR:  could not serialize access due to concurrent update

Thus, we only have one file per host to worry about, in addition to a common shared file across all hosts. So now tail_n_mail can handle multiple files over a time dimension (by walking forward from LASTFILE to the present), as well as over a vertical dimension (by forcing together the files split by rsyslog). However, there is no reason we cannot handle multiple files over a horizontal dimension as well. In other words, putting multiple hosts into a single file. In this client's case, there were other hosts very similar to "groucho" that had files we wanted to monitor. Thus, the config file was changed to look like this:

MAILSUBJECT: Acme fatals: NUMBER

INHERIT: tail_n_mail.fatals.global.txt

FILE: /var/log/%Y/groucho/%m/%d/%H/pgsql-warning.log
FILE: /var/log/%Y/groucho/%m/%d/%H/pgsql-err.log
FILE: /var/log/%Y/groucho/%m/%d/%H/pgsql-crit.log

FILE: /var/log/%Y/dawson/%m/%d/%H/pgsql-warning.log
FILE: /var/log/%Y/dawson/%m/%d/%H/pgsql-err.log
FILE: /var/log/%Y/dawson/%m/%d/%H/pgsql-crit.log

FILE: /var/log/%Y/cosby/%m/%d/%H/pgsql-warning.log
FILE: /var/log/%Y/cosby/%m/%d/%H/pgsql-err.log
FILE: /var/log/%Y/cosby/%m/%d/%H/pgsql-crit.log

We've just whittled nine config files down to a single one. Of course, the config file cannot stay like that, as the LASTFILE and OFFSET entries need to be applied to specific files. Thus, when tail_n_mail does its first rewrite of the config file, it will assign numbers to each FILE, and the file will then look something like this:

FILE1: /var/log/%Y/groucho/%m/%d/%H/pgsql-warning.log
LASTFILE1: /var/log/2012/groucho/04/27/18/pgsql-warning.log
OFFSET1: 100


FILE2: /var/log/%Y/groucho/%m/%d/%H/pgsql-err.log
LASTFILE2: /var/log/2012/groucho/04/27/18/pgsql-err.log
OFFSET2: 2531

FILE3: /var/log/%Y/groucho/%m/%d/%H/pgsql-crit.log

FILE4: /var/log/%Y/dawson/%m/%d/%H/pgsql-warning.log
LASTFILE4: /var/log/2012/dawson/04/27/18/pgsql-warning.log
OFFSET4: 42

# etc.

By using this technique, we were able to reduce a slew of config files (the actual number was around 60), and their crontab entries, into a single config file and a single cron call. We also have a daily "error" report that mails a summary of all ERROR/FATAL calls in the last 24 hours. These were consolidated into a single email, rather than the half dozen that appeared before.
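
As a rough illustration, that single cron call might look something like the line below; the schedule and paths here are made up, and tail_n_mail takes the config file as its argument:

*/15 * * * * /usr/local/bin/tail_n_mail /etc/tail_n_mail/acme.fatals.config.txt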

While tail_n_mail has a lot of built-in intelligence to handle Postgres logs, it is ultimately regex-based and can be used on any files you want to track, alerting you when certain items appear in them, so feel free to use it for more than just Postgres!

Inherit an application by rewriting the test suite

One of my first tasks at End Point was to inherit a production application from the lead developer who was no longer going to be involved. It was a fairly complex domain model and had passed through many developers' hands on a tight client budget. Adding to the challenge was the absence of any active development; it's difficult to "own" an application which you're not able to make changes to or work with users directly. Moreover, we had a short amount of time; the current developer was leaving in just 30 days. I needed to choose an effective strategy to understand and document the system on a budget.

Taking Responsibility

At the time I was reading Robert C. Martin's The Clean Coder, which makes a case for the importance of taking responsibility as a "Professional Software Developer". He defines responsibility for code in the broadest of terms.

Drawing from the Hippocratic oath may seem arrogant, but what better source is there? And, indeed, doesn't it make sense that the first responsibility, and first goal, of an aspiring professional is to use his or her powers for good?

From there he continues to expound in his declarative style about how to do no harm to the function and structure of the code. What struck me most about this was his conclusions about the necessity of testing. The only way to do no harm to function is know your code works as expected. The only way to know your code works is with automated tests. The only way to do no harm to structure is by "flexing it" regularly.

The fundamental assumption underlying all software projects is that software is easy to change. If you violate this assumption by creating inflexible structures, then you undercut the economic model that the entire industry is based on.

In short: You must be able to make changes without exorbitant costs.

The only way to prove that your software is easy to change is to make easy changes to it. Always check in a module cleaner than when you checked it out. Always make some random act of kindness to the code whenever you see it.

Why do most developers fear to make continuous changes to their code? They are afraid they'll break it! Why are they afraid to break it? Because they don't have tests.

It all comes back to the tests. If you have an automated suite of tests that covers virtually 100% of the code, and if that suite of tests can be executed quickly on a whim, then you simply will not be afraid to change the code.

Test Suite for the Triple Win

Fortunately, there was a fairly large test suite in place for the application, but as is common with budget-constrained projects, the tests hadn't kept up with the code. There were hundreds of unit tests, but they weren't even executable at first. After just a few hours of cleaning out tests for classes which no longer existed, I found about half of the 500 unit tests passed. As I worked through repairing the tests, I was learning the business rules, classes, and domain of the application, all without touching "production" code (win). These tests were the documentation that future developers could use to understand the expected behavior of the system (double win). While rebuilding the tests, I got to document bugs, deprecation warnings, performance issues, and general code quality issues (triple win).

By the end of my 30 day transition, I had 500+ passing unit tests that were more complete and flexible than before. Additionally, I added 100+ integration tests which allowed me to exercise the application at a higher level. Not only was I taking responsibility for the code, I was documenting important issues for the client and myself. This helps the client feel I had done my job transitioning responsibilities. This trust leaves the door open to further development, which means a better system over the long haul.

Problem with CISCO VPN on Ubuntu 12.04

A couple of days ago I had to change my notebook. I installed Ubuntu 12.04 on the new one, while on the previous one there was Ubuntu 11.10. There were no problems with copying all the files from the old to the new machine, including GPG and SSH keys. Everything went smoothly and I could connect to all the machines I needed.

The only problem was with the VPN. While working for one of our clients I need to connect to their VPN, and on the old machine I did that through the Network Manager. Migrating the setup seemed trivial: I chose the Export option in the Network Manager, saved all the settings to a file, copied that file to the new computer, and loaded it into the Network Manager there.

The file loaded correctly and I could switch the VPN on. Network Manager claimed the VPN was connected, and I could toggle it on and off, but I couldn't access any of the client's resources that had been reachable from my previous notebook.

The first thing I checked was the content of /etc/resolv.conf on both computers. Without the VPN connected, the file looked like this on both machines:

$ cat /etc/resolv.conf
# Generated by NetworkManager
nameserver 127.0.0.1

When I connected to the VPN, the files on the two computers were quite different. On my new computer (running Ubuntu 12.04) the content looked like this:

$ cat /etc/resolv.conf 
# Generated by NetworkManager
domain something.net
search something.net
nameserver 127.0.0.1

I changed the data a little bit of course, so the domain names and IP addresses (except for 127.0.0.1) are not real.

On my old computer the resolv.conf file had a lot more entries; however, I thought the file above should work as well. The problem remained the same: I couldn't reach the client's resources.

The client is using the CISCO VPN, so I had to install network-manager-vpnc. This is just a plugin for network-manager which uses the vpnc program internally. I thought that maybe the plugin was doing something wrong.

I checked the plugin versions on both machines, and they did differ. That got me thinking about using vpnc directly, without the Network Manager.

It turned out to be very simple to use: all it needs is a config file, and the file is really simple:

IPSec gateway   something.net
IPSec ID        something.id
IPSec secret    somethingpass
Xauth username  mylogin
Xauth password  mypass

I keep all my local scripts in ~/bin (which can also be accessed as /home/szymon/bin). That directory is added to the PATH environment variable, so I can run any script placed there from the console without typing its full path. I did that by adding the following line at the end of my ~/.bashrc file:

PATH=$PATH:$HOME/bin

To keep things together I saved the config file in the same location, as ~/bin/vpn.conf.

Now I can connect to the VPN using:

$ sudo vpnc-connect /home/szymon/bin/vpn.conf

I can also stop the VPN using:

$ sudo vpnc-disconnect

To automate it a little bit I created a simple script stored at ~/bin/vpn:

#!/usr/bin/env bash

# Simple wrapper around vpnc-connect/vpnc-disconnect for this client's VPN.

case "$1" in

start)
  sudo vpnc-connect /home/szymon/bin/vpn.conf
  ;;
stop)
  sudo vpnc-disconnect
  ;;
status)
  # Show the running vpnc-connect process, if any
  ps auxf | grep vpnc-connect | grep -v grep
  ;;
restart)
  sudo vpnc-disconnect
  sudo vpnc-connect /home/szymon/bin/vpn.conf
  ;;
*)
  echo "Usage: vpn (start|stop|status|restart)"
  exit 1
  ;;

esac

This way I can simply write:

$ vpn start
[sudo] password for szymon: 
VPNC started in background (pid: 13771)...

I noticed that now the /etc/resolv.conf file contains different entries than when using the Network Manager plugin:

$ cat /etc/resolv.conf 
#@VPNC_GENERATED@ -- this file is generated by vpnc
# and will be overwritten by vpnc
# as long as the above mark is intact
# Generated by NetworkManager
nameserver 1.2.3.4
nameserver 1.2.3.4
search something.net

I can also disconnect from the VPN with a simple command:

$ vpn stop
Terminating vpnc daemon (pid: 13771)

I've been using this script for a couple of days now and haven't had any problems with the CISCO VPN. It seems the vpnc program in Ubuntu 12.04 is fine, but something is wrong with the Network Manager plugin for vpnc.

End Point at the Utah Open Source Conference

End Point had a table at the Utah Open Source Conference at Utah Valley University this week. We set up a "mini Liquid Galaxy" system for the event, and it was a big hit. Most of the other sponsors were offering services or recruiting, but End Point had a physical system that people could touch and engage with. That let us present our product and services, and also make contact with people who may be interested in joining our team.

Numerous people were really excited about the Liquid Galaxy. The first thing most people did when they started using it was look for their own homes, but they quickly moved on to other areas with more 3D building content. They asked questions about practical applications of the system, as well as how well it would play games. Several others asked about the possibility of building video walls with the system. One of the things people found most interesting about the hardware was the 3D mouse from 3Dconnexion. It took some visitors a while to get the hang of it, but others picked it up quickly and really liked the way it interacts.

The mini LG consisted of a headless head node, three display nodes powering five 24-inch HP monitors, and one HP touch screen used as a control panel. The head node was a Dell Inspiron 531s with a 2.8GHz AMD Athlon, 4GB of RAM, and a 160GB hard drive. The display nodes were built by Puget Systems, with 2.6GHz i5 processors, 8GB of RAM, 80GB SSDs, and GeForce 210 video cards.

One of the things we had to do differently with this system, compared to our normal installs, was to flatten out the semicircle of displays. The HP displays' viewing angle wasn't very wide, which kept people from seeing the system unless they were right up near the displays. By flattening the arc out, more people were able to view the system and have their interest captured.

We also noticed during the event that a Liquid Galaxy system performs very well when it has 60 Mbps of bandwidth available to it.

There was even a kid who found a way to keep himself entertained with the Liquid Galaxy. He found his house and started using the LG to "jump" on the trampoline in his backyard.

Instance Variable Collision with ActsAsTaggableOn

As developers, a lot of what we do is essentially problem solving. Sometimes it's a problem of how to implement a specific feature. Sometimes it's a problem *with* a specific feature. Last week, I ran into a case of the latter in some relatively mature code in the Rails app I was working on.

I was getting a sporadic exception while trying to save an instance of my StoredFile model. I ran into it while implementing a pretty trivial boolean field on the model and playing around with it in the rails console. This is where it gets a little weird.

The exception message:

#<NoMethodError: undefined method 'uniq' for "":String>
#Backtrace:
... acts-as-taggable-on-2.2.2/lib/acts_as_taggable_on/acts_as_taggable_on/core.rb:264:in 'block in save_tags'
...rest of backtrace...

Note that none of my work was related to my model's use of acts_as_taggable_on. I looked briefly at line 264 and its cohorts in core.rb, but nothing jumped out as "a giant, obvious bug in acts-as-taggable-on" (which I wouldn't expect). Also, the actual error is a bit suspicious. I love duck typing (and ducks) as much as anyone, but it's pretty rare to see well-traveled code try to call uniq() on a String. An empty array, sure, no problem. I'll call uniq() on an empty array all day - but a String? Madness.

In the absence of any obvious answers, I switched to serious debug mode. I remembered that I ran into this same issue briefly about two weeks ago, but it had fixed itself. Then I remembered that this app is not Skynet and is therefore incapable of "fixing itself." OK, so there's been a bug lurking for a while. I was going to need a code debugging montage to get to the bottom of this. I fired up some techno music and the blue mood lighting in my office, and got down to debugging.

By the end of the debugging montage, I had determined that my StoredFile model was overriding the "tag_list" method supplied by ActsAsTaggableOn, in order to return the tags for this StoredFile instance regardless of which User owned the tags. Here's our entire method:

def tag_list
    @tag_list ||= self.anonymous_tag_list(:tags)
end

Can you guess where the issue might be with this code? Hint: It's the @tag_list instance variable. We're actually only using it here as a lazy way to cache the return value of self.anonymous_tag_list(:tags) during the lifespan of this single instance. I came to discover that ActsAsTaggableOn defines and uses an instance variable of the same exact name within the scope of my model. So, my tag_list method was assigning "" to the @tag_list instance variable for a StoredFile instance with zero tags, and that was trampling the [] that ActsAsTaggableOn would expect in that scenario. Hence, the ill-fated attempt to call "".uniq().

As a bonus, because it's an instance variable assigned inside a method, it would only get populated/trampled by my StoredFile model if an instance's tag_list was examined in any way. A sort of reverse Heisenbug, if you will. Renaming @tag_list to something else fixed the bug.
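
For the record, the fix was nothing fancier than that rename; a minimal sketch of the idea (the new variable name here is just an example, not necessarily what we used):

def tag_list
  # Cache under a name that ActsAsTaggableOn doesn't use internally
  @local_tag_list ||= self.anonymous_tag_list(:tags)
end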

It was a problem and I solved it, which is cool. But, is this injection of instance variables considered poor behavior on ActsAsTaggableOn's part? What do you think? I'm still not sure how I feel about it, but I do know how I felt when I figured it out and fixed it:

Reverting Git Commits

Git is great, but it's not always easy to use. Take reverting a commit: there are git commands for undoing a commit that hasn't been pushed to the main repository yet, but once it has been pushed, things are not so easy.

While I was working for one of our clients, I made about 20 commits and then pushed them to the main repository. After that I realised I had been working on the wrong branch. The branch I should have used hadn't been created yet, so I had to revert all my commits, create the new branch, and move all my changes onto it.

Creating the branch named NEW_BRANCH is as easy as:

$ git branch NEW_BRANCH

Now the harder part... how to get rid of the commits pushed to the main repo. After reading through tons of documentation it turned out there is no clean way to do it: you cannot simply delete a pushed commit without rewriting history that others may already have pulled. However, you can do something else.

As an example of this, I created a simple file, added a couple of lines there, and made four commits. The git log looks like this:

$ git log
commit dc47a884f7b303fc8b207550104f5a1de192c91c
Author: Szymon Guz 
Date:   Mon Apr 30 12:14:21 2012 +0200

    replaced b with d

commit 68f56d3321324bd14cd1e73d003b1e151c4d43b4
Author: Szymon Guz 
Date:   Mon Apr 30 12:14:05 2012 +0200

    added c

commit a77427d8151f143cacb85f00eb6c8170079dc290
Author: Szymon Guz 
Date:   Mon Apr 30 12:13:58 2012 +0200

    added b

commit 73e586bb6d401f4049cf977703f25bf47c93b227
Author: Szymon Guz 
Date:   Mon Apr 30 12:13:49 2012 +0200

    added a

Now let's move the last 3 commits to another branch. I will create one diff for reverting the changes and one for replaying them on the new branch. Let's call these the 'down' and 'up' diff files: 'down' for reverting, and 'up' for recreating the changes.

The up diff can be created with:

$ git diff 73e586bb6d401f4049cf977703f25bf47c93b227 dc47a884f7b303fc8b207550104f5a1de192c91c
diff --git a/test b/test
index 7898192..3171744 100644
--- a/test
+++ b/test
@@ -1 +1,3 @@
 a
+d
+c

The down diff can be created using exactly the same command, but with switched parameters:

$ git diff dc47a884f7b303fc8b207550104f5a1de192c91c 73e586bb6d401f4049cf977703f25bf47c93b227
diff --git a/test b/test
index 3171744..7898192 100644
--- a/test
+++ b/test
@@ -1,3 +1 @@
 a
-d
-c

I saved the diffs into files called 'up.diff' and 'down.diff'.

On the old branch I want to revert the changes; after doing that I will just commit, and the branch's files will look like they did before all the commits. All the commits stay in the branch's history, however. This is something like a revert commit.

I reverted the changes on the current branch with:

$ patch -p1 < down.diff 
patching file test
$ git commit -a -m "reverted the changes, moved to another branch"

Now let's move the changes onto the new branch. I need to create the new branch starting from the first commit:

$ git branch NEW_BRANCH 73e586bb6d401f4049cf977703f25bf47c93b227

Switch to the new branch:

$ git checkout NEW_BRANCH

Apply the up.diff patch to the new branch:

$ patch -p1 < up.diff

And commit the changes:

$ git commit -a -m "Applied changes from the other branch"

I know that all of these steps could be done in other ways, but this solution worked for me quite well and without any problems.
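
For comparison, here is a rough sketch of how the same move could be made with git alone, since both git revert and git cherry-pick accept commit ranges (using the abbreviated hashes from the log above); treat it as a sketch rather than a tested recipe:

# On the original branch: undo everything after the first commit in a single revert commit
$ git revert --no-commit 73e586b..dc47a88
$ git commit -m "reverted the changes, moved to another branch"

# Recreate the same commits on the new branch
$ git branch NEW_BRANCH 73e586b
$ git checkout NEW_BRANCH
$ git cherry-pick 73e586b..dc47a88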

Profile Ruby with ruby-prof and KCachegrind

This week I was asked to isolate some serious performance problems in a Rails application. I went down quite a few paths to determine how to best isolate the issue. In this post I want to document what tools worked most quickly to help find offending code.

Benchmarks

Before any work begins on speeding things up, we need to set a performance baseline so we know whether we are improving things, and by how much. This is done with Ruby's Benchmark class and some of Rails' benchmarking helpers.

The Rails guides would have you set up performance tests, but I found this cumbersome on the Rails 2.3.5 application I was dealing with. Initial attempts to set them up were unfruitful and took time away from the task at hand. In my case, the process of setting up the test environment to reflect the production environment was prohibitively expensive, but if you can automate the benchmarks, do it. If not, use the logs to measure your performance and keep track in a spreadsheet. Whether you benchmark manually or automatically, you'll want to keep some kind of log of the results, with notes about what changed in each iteration.

Isolating the Problem

As always, start with your logs. In Rails you get some basic performance information for free. Profiling slows down runtime a lot, so by reviewing the logs first you can make a first cut at what needs to be profiled and keep already long profile runs shorter. For example, instead of profiling an entire controller method, reading the logs might show that it's just a particular partial which is rendering slowly.

Taking a baseline benchmark

Once you've got a sense of where the pain is, it's easy to get a benchmark for that slow code as a baseline.

module SlowModule
  def slow_method
    benchmark "SlowModule#slow_method" do
      #my slow code
    end
  end
end

Look to your log files to see the results. If for some reason you're outside your Rails environment, you can use Ruby's Benchmark class directly.

require 'benchmark'

elapsed = Benchmark.realtime do
  # slow code
end
puts "#{elapsed * 1000} ms"

This will tell you the elapsed wall-clock time in milliseconds and give you a precise measurement to compare against.

Profiling with ruby-prof

First, set up ruby-prof. Once it's installed, you can add these kinds of blocks around your code.

require 'ruby-prof'

module SlowModule
  def slow_method
    results = nil
    benchmark "SlowModule#slow_method" do
      RubyProf.start
      # your slow code here
      results = RubyProf.stop
    end
    File.open "#{RAILS_ROOT}/tmp/SlowModule#slow_method_#{Time.now.to_i}", 'w' do |file|
      RubyProf::CallTreePrinter.new(results).print(file)
    end
  end
end

Keep in mind that profiling code will really slow things down. Make sure to collect your baseline both with and without profiling, so you're making an apples-to-apples comparison.

By default ruby-prof measures process time, which is the CPU time actually used by your process; it is unaffected by other processes running concurrently on the system. You can review the ruby-prof README for other types of measurements, including memory usage, object allocations, and garbage collection time.

If you choose to measure any of these options, make sure your Ruby installation has the tools a profiler needs to collect data. Please see the Rails guides for guidance on compiling and patching Ruby.
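
As a rough sketch of what switching the measurement looks like (the exact constants available depend on your ruby-prof version and how your Ruby was built):

require 'ruby-prof'

# Process time is the default; other modes may require a patched Ruby.
RubyProf.measure_mode = RubyProf::WALL_TIME
# RubyProf.measure_mode = RubyProf::MEMORY
# RubyProf.measure_mode = RubyProf::ALLOCATIONS

RubyProf.start
# slow code here
results = RubyProf.stop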

Interpreting the Data with KCachegrind

At this point, having reviewed the logs, you should have a general sense of which code is slow. You've got a benchmark log set up with baseline measurements to compare against. If you're going to benchmark while you're profiling, make sure your baseline also includes the profiling code; it will be much slower! Remember, we want an apples-to-apples comparison. Now you're ready to start profiling and identifying the root source of the performance problems.

After manually or automatically running your troubled code with the profiling block above, you can open up the output from ruby-prof and quickly find it is not human friendly. Fortunately, KCachegrind turns that mess into something very useful. I found that my Ubuntu installation already had a package for it, so installation was a breeze. Hopefully things are as easy for you. Next, simply open your result files and start reviewing the results.

The image above shows what's called a "call graph", with the percentages representing the relative amount of time each method uses for the duration of the profile run. The CacheableTree#children method calls Array#collect and takes up more than 90% of the runtime. The subsequent child calls are relatively modest in proportion. It's clear we can't modify Array#collect, so let's look at CacheableTree#children.

module CacheableTree
  def children(element = @root_element)
    full_set.collect { |node| node if node.parent_id == element.id }.compact
  end
end

Defined elsewhere, full_set is an array of Ruby objects. This is a common performance pitfall in Rails: collecting data by looping through arrays works well with a small data set, but quickly becomes painful with a large one. In this case it turned out that full_set had 4200+ elements. Worse yet, the children method was being called recursively on each of them. Yikes!

At this point I had to decide how to optimize. I could go for broke, completely break the API, and try to clean up the mess, or I could see whether I could collect the data more quickly some other way. I looked at how full_set was defined and found I could modify that query to return just a subset of elements rather easily.

module CacheableTree
  def children(element = @root_element)
    FormElement.find_by_sql(...) #the details aren't important
  end
end

By collecting the data directly via a SQL call, I was able to cut my benchmark by about 20%. Not bad for a single line change! Let's see what the next profile told us.

The above is another view of the profile KCachegrind provides. It's essentially the same information, but in table format. There were a few indicators that my optimization was helpful:

  • The total process_time cost had dropped
  • The time spent in each function was better distributed - no single method was soaking up all the process time
  • Most of the time was spent in code that wasn't mine!

Although we still saw 66% of the time spent in the children method, we could also see that 61% of the time was spent in ActiveRecord::Base. Effectively, I had pushed the 'slowness' down the stack, which tends to mean better performance. Of course, there were LOTS of database calls being made. Perhaps some caching could help reduce the number of calls.

module CacheableTree
  def children(element = @root_element)
    @children ||= {}
    @children[element] ||= FormElement.find_by_sql(...) #the details aren't important
  end
end

This is called memoization and lets us reuse this expensive method's results within a single page load. This change took another 10% off the clock against the baseline. Yay!

Knowing When to Stop

Performance optimization can be really fun, especially once all the infrastructure is in place. However, unless you have unlimited budget and time, you have to know when to stop. For a few lines of code changed, the client would see ~30% performance improvement. It was up to them to decide how much further to take it.

If allowed, my next step would be to make use of the application's existing dependence on Redis and add the Redis-Cacheable gem, which lets you marshal Ruby objects in and out of a Redis server. The application already makes extensive use of caching, and this page was no exception, but when the user modified the page in a way that expired the cache, we would hit this expensive method again, unnecessarily. Based on the call graph above, we could eliminate another ~66% of the call time, and perhaps, by pre-warming this cache, spare the user from ever experiencing the pain of slow browsing!
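
I haven't built that integration yet, but the underlying idea is simple enough to sketch with the plain redis gem and Marshal rather than the gem's own API; the key name and one-hour expiry below are purely illustrative:

require 'redis'

module CacheableTree
  def children(element = @root_element)
    redis = Redis.new
    key = "cacheable_tree:children:#{element.id}"

    if (cached = redis.get(key))
      # Cache hit: rebuild the Ruby objects from the marshaled blob
      Marshal.load(cached)
    else
      records = FormElement.find_by_sql(...) # the details aren't important
      redis.setex(key, 3600, Marshal.dump(records))
      records
    end
  end
end

The tricky part, as always with caching, would be expiring that key whenever the underlying form elements change.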