
Oceanographic Museum of Monaco Liquid Galaxy

In December End Point installed a Liquid Galaxy at the spectacular and renowned Musée Océanographique de Monaco, which is breathtakingly situated on a cliff overlooking the Mediterranean. The system, donated by Google, will be officially presented on January 21st to H.S.H. Prince Albert II of Monaco, who is the Honorary President of the Oceanographic Institute of which the museum is a major part.

https://plus.google.com/photos/116748432819762979484/albums/5824395228310553057/5824395228004170738

End Point set up and configured the system. Our expertise was also called on by Google to create and adapt Google Earth Tours focused on the world's oceans, including a tour about Ocean Acidification. In addition, End Point engineers developed a customized panoramic photo viewer for the remarkable Catlin Seaview Survey panoramas, which document the earth's coral reefs and provide a baseline dataset for them.

Many thanks are due to Jenifer Austin Foulkes, Google's Ocean Program Manager, and to Jason Holt of Google for their work in supporting this project.

It is difficult to speak highly enough about the Musée Océanographique de Monaco. Prince Albert I of Monaco was an oceanographer himself and created the museum in 1901 with a vision of bringing art and ocean science together. The royal family of Monaco has maintained its support of ocean science through the years, and Monaco today is a center for oceanography and for organizations concerned with the state of the world's oceans. The museum has a wonderful aquarium and compelling exhibits, and is actively engaged in ocean science. I wasn't aware of it before visiting the museum, but Jacques Cousteau was the Director of the Museum for over thirty years. I always liked that guy. :-)

France 3 TV covered the public opening of the Musée Océanographique de Monaco's Liquid Galaxy: http://cote-d-azur.france3.fr/2012/12/22/le-musee-oceanographique-de-monaco-accueille-le-simulateur-liquid-galaxy-de-google-169527.html

It's been a tremendous privilege to be a part of this project and I am happy to close out 2012 with this blog post reflecting on it. In 2013 End Point will continue to support this wonderful addition to the museum with remote system and content updates.

Photo provided by Olivier Dufourneaud

Piggybak: End of Year Update

Over the last few months, my coworkers and I have shared several updates on Piggybak progress (October 2012 Piggybak Roadmap, November 2012 Piggybak Roadmap Status Update). Piggybak is an open source Ruby on Rails ecommerce platform, mountable as a Rails Engine, developed and maintained by End Point. Here's a brief background on Piggybak followed by an end of year update with some recent Piggybak news.

A Brief Background

Over the many years that End Point has been around, we've amassed a large amount of experience working with various ecommerce frameworks, open source and proprietary. A large portion of End Point's recent development work (we also offer database, hosting, and Liquid Galaxy support) has been with Interchange, a Perl-based open source ecommerce framework, and Spree, a Ruby on Rails based open source ecommerce framework. Things came together for Piggybak earlier this year when a new client project prompted the need for a more flexible and customizable Ruby on Rails ecommerce solution. Piggybak also leveraged earlier work that I did on lightweight Sinatra-based cart functionality.

Jump ahead a few months, and now Piggybak is a strong base for an ecommerce framework, with several extensions providing advanced ecommerce features. Some of the features already built and reported on include real-time shipping lookup (USPS, UPS, and FedEx), an improved Piggybak installation process, and gift certificate, discount, and bundle discount support.

Recent Progress

Since the last general update, we've tackled a number of additional changes:

  • SSL support: The Piggybak core now supports SSL for the checkout, leveraging the lightweight rack-ssl-enforcer gem. A Piggybak config variable specifying that checkout should be secure must be set to true in the main Rails application, which marks a specific set of pages as secure (a minimal sketch follows this list). This configuration is not ideal if the main Rails application requires more complex management of secure pages.
  • Minor bug fixes & cleanup: The updates below include minor refactoring and/or bug fixes to the Piggybak core:
    • Moved order confirmation email delivery out of the controller, so a failure to send the confirmation email is less likely to break order processing.
    • RailsAdmin DRY cleanup
    • Abilities (CanCan) cleanup to require less manual coding in the CanCan ability model.
    • Breakdown of orders/submit.html.erb, which allows for easier override of checkout page elements.
    • Tax + coupons bug fixes.
    • RailsAdmin upgraded to recent versions.
  • Heroku tutorial: Running Piggybak on Heroku was described in this blog article.
  • Advanced taxonomy or product organization: An extension for advanced product organization (e.g. categories, subcategories) was released, but we still plan to add more documentation regarding its functionality and use.
  • Bundle discount support: Another extension, for bundle discount support, was released. Bundle discounts give customers a discount when a bundle, or set of products, has been added to the cart. Barrett shared his experiences in creating this extension in this article.
  • Fancy jQuery tour: I wrote about a new jQuery-driven demo tour that I created for checking out the features of Piggybak.
  • Advanced product optioning: Another extension for advanced product option support (e.g. size, color) was released a couple of months ago, but this recent article provides more documentation on its functionality and use.
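As a rough illustration of the SSL configuration mentioned in the list above, in the main Rails application it amounts to flipping a Piggybak config switch, along the lines of the sketch below. The setting name here is hypothetical and may differ in your Piggybak version, so check the Piggybak source for the exact attribute:

# config/initializers/piggybak.rb in the main Rails application
Piggybak.config do |config|
  # Hypothetical setting name: tells Piggybak to force HTTPS on its
  # checkout pages via the rack-ssl-enforcer gem.
  config.secure_checkout = true
end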

What's Next?

At this point, one of our big goals is to grow the Piggybak portfolio and see many of the extensions in action. We'd also like to improve the Piggybak core and extension documentation to help get folks up and running on Piggybak quickly. In addition to documentation and portfolio growth, additional features we may focus on are:

  • Product reviews & ratings support
  • Saved address/address book support
  • Wishlist, saved cart functionality

A few large features that are on our wishlist that may need client sponsorship for build-out are:

  • Multiple shipping addresses per order: This allows users to ship a single order to multiple shipping addresses. I implemented this functionality for Paper Source just over a year ago. This would likely be developed in the form of an extension that requires several non-trivial Piggybak core overrides.
  • Subscription support: The Piggybak Google Group has expressed interest in subscription support, which also is not trivial.
  • Point-based credit support
  • Multi-store architecture: End Point is very familiar with multi-store architecture, which allows multiple stores to be supported from one code base. I shared some of the options in this blog article.
  • One deal at a time support: This is another popular feature that End Point has been involved with for Backcountry.com sites Steep and Cheap, WhiskeyMilitia.com, and Chainlove.com.

Get Involved

If you are interested in helping develop Piggybak, don't hesitate to jump on the Piggybak Google group or tackle one of the Piggybak GitHub issues.

If you are interested in getting your site up and running using Piggybak, contact End Point right now!

Find your Perl in Other Shells

Often when programming, it turns out the best tools for the job are system tools, even in an excellent language like Perl. Perl makes this easy with a number of ways you can allocate work to the underlying system: backtick quotes, qx(), system(), exec(), and open(). Virtually anyone familiar with Perl is familiar with most or all of these ways of executing system commands.

What's perhaps less familiar, and a bit more subtle, is what Perl really does when handing these off to the underlying system to execute. The docs for exec() tell us the following:

       exec LIST
       exec PROGRAM LIST
[snip]
            If there is more than one argument in LIST, or if LIST is an
            array with more than one value, calls execvp(3) with the
            arguments in LIST.  If there is only one scalar argument or an
            array with one element in it, the argument is checked for shell
            metacharacters, and if there are any, the entire argument is
            passed to the system's command shell for parsing (this is
            "/bin/sh -c" on Unix platforms, but varies on other platforms).

That last parenthetical is a key element when we "shell out" and expect certain behavior. Perl is going to use /bin/sh. But I don't; I use /bin/bash. And I am very happy to ignore this divergence ... until I'm not.

Without considering any of these issues, I had leveraged a shell command to do a nifty table comparison between supposedly replicated databases to find where the replication had failed. The code in question was the following:

$ diff -u \
> <(psql -c "COPY (SELECT * FROM foo ORDER BY foo_id) TO STDOUT" -U chung chung) \
> <(psql -c "COPY (SELECT * FROM foo ORDER BY foo_id) TO STDOUT" -U chung chung2)

The above code produced exactly the results I was looking for, so I integrated it into my Perl code via qx() and ran it. Doing so produced the following surprising result:

$ ./foo.pl
sh: -c: line 0: syntax error near unexpected token `('
sh: -c: line 0: `diff -u <(psql -c "COPY (SELECT * FROM foo ORDER BY foo_id) TO STDOUT" -U chung chung) <(psql -c "COPY (SELECT * FROM foo ORDER BY foo_id) TO STDOUT" -U chung chung2)'

I worked with an End Point colleague and figured out the problem: <() is process substitution, which is supported by bash but not by the Bourne shell (sh). Further, there is no way to instruct Perl to use /bin/bash as its target shell.

In order to access a different shell and leverage the desired features, I had to use the invocation described for PROGRAM LIST, but identify the shell itself as my program. While I'm unaware of any way to accomplish this with backticks, I was certainly able to do so using Perl's open():

my $cmd = q{diff -u <(psql -c "COPY (SELECT * FROM foo ORDER BY foo_id) TO STDOUT" -U chung chung) <(psql -c "COPY (SELECT * FROM foo ORDER BY foo_id) TO STDOUT" -U chung chung2)};

open (my $pipe, '-|', '/bin/bash', '-c', $cmd)
    or die "Error opening pipe from diff: $!";
while (<$pipe>) {
    # do my stuff
}
close ($pipe);

Now results between command line and invocation from Perl are consistent. And now I understand for future reference how to control which shell I find my Perl in.

Redirect from HTTP to HTTPS before basic auth

While reviewing PCI scan results for a client, I found that the scanner flagged a private admin URL for requesting HTTP basic auth over plain HTTP. The admin portion of the site has its own authentication method and is served completely over HTTPS. Basic auth gives us a second layer of protection, but the issue is that the username and password could be snooped, since the URL can also be reached via HTTP.

My initial research and attempts at fixing the problem did not work out as intended, until I found this blog post on the subject. It laid out all of the approaches I had already tried and then presented a new solution.

I followed the recommended hack, which is to use SSLRequireSSL in a Location block matching the admin, along with a custom 403 ErrorDocument. The 403 document does a bit of URL munging and redirects from HTTP to HTTPS. The blog's instructions did have one issue for us: in our environment I could not serve the 403 document from within the admin; it had to live in an area accessible to the public over HTTP. (I'm not sure how it could work at all if served from a URL that requires SSL and is protected by basic auth.) The reason this hack works is that SSLRequireSSL is processed before any auth requirements, and the 403 ErrorDocument is served whenever SSL is not being used.
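Here is roughly what that looks like in Apache configuration. This is an illustrative sketch rather than the client's actual config, so the paths and filenames are made up:

<LocationMatch "^/admin">
    # SSLRequireSSL is evaluated before the auth phase, so a plain-HTTP
    # request is rejected with a 403 before basic auth is ever requested.
    SSLRequireSSL
    AuthType Basic
    AuthName "Admin area"
    AuthUserFile /etc/httpd/htpasswd-admin
    Require valid-user
    # The 403 document must live somewhere reachable over plain HTTP.
    ErrorDocument 403 /force-ssl.html
</LocationMatch>

The 403 document itself then just bounces the browser over to HTTPS, for example:

<!-- /force-ssl.html: served over plain HTTP, outside the protected area -->
<script type="text/javascript">
  window.location = window.location.href.replace(/^http:/, 'https:');
</script>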

Now hopefully the scanner will be happy (as happy as a scanner can be): HTTPS is always required when /admin appears in the URL, and when it isn't used, an error page is presented before basic auth is ever requested.

Announcing Ruby gem: email_verifier

How many times have you tried to provide a really nice validation solution for form fields containing user emails? Most of the time, the best we can come up with is some long and incomprehensible regex found on Stack Overflow or somewhere else on the Internet.

But that's really only a partial solution. Even if a regular expression gets email format correctness right (itself a tricky thing), it provides no assurance that the email address a user entered actually exists.

But it does a great job of catching typos and misspellings... right?

Yes, but I'd argue that it doesn't cover the full range of that kind of data entry error. A user could fill in 'whatever' and traditional regex validation would do a great job of flagging that it's not really an email address. What I'm concerned with here are all those situations when I fat-finger kaml@endpoint.com instead of kamil@endpoint.com.

Some would argue at this point that it's still recoverable, since I can find out about the error on the next page of a submission workflow, but I don't want to spend several more minutes going through the whole process again (possibly filling out tens of form fields along the way).

And look at this issue from the point of view of a web application owner: you'd like to be sure that the leads in your database point to real people, some percentage of whom will eventually pay you real money and make you a living. What if even 10% of those email addresses are invalid due to user error (well-formed addresses that point to no real mailbox)? What would that potentially mean to you in cash?

The Solution

Recently, I faced this email validation question for mobixa.com. (By the way, if you own a smartphone that you'd like to sell, there is no better place than mobixa.com to do it!)

I'd like to announce the results of my work here and now. Please give a warm welcome to a newborn citizen of the RubyGems society: email_verifier.

How does it work?

Email verifier takes a different approach to email validation. Instead of checking just the format of the address in question, it actually connects to the recipient's mail server and pretends it is about to send a real mail message. We can call it 'asking the mail server if the recipient exists'.
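Conceptually, the check looks something like the following. This is a simplified sketch using Ruby's standard library, not the gem's actual internals; multi-MX fallback, timeouts, and greylisting handling are omitted:

require 'resolv'
require 'net/smtp'

def mailbox_exists?(email, from_address = "realname@realdomain.com")
  domain = email.split('@').last

  # Find the mail server responsible for the recipient's domain.
  mx_records = Resolv::DNS.open do |dns|
    dns.getresources(domain, Resolv::DNS::Resource::IN::MX)
  end
  return false if mx_records.empty?
  mx_host = mx_records.min_by(&:preference).exchange.to_s

  # Start an SMTP conversation, but stop right after RCPT TO --
  # the server rejects that command if the mailbox is unknown.
  Net::SMTP.start(mx_host, 25) do |smtp|
    smtp.mailfrom(from_address)
    smtp.rcptto(email)
  end
  true
rescue Net::SMTPFatalError, Net::SMTPSyntaxError
  false
end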

How to use it?

Add this line to your application's Gemfile:

gem 'email_verifier'

And then execute:

$ bundle

Or install it yourself as:

$ gem install email_verifier

Some SMTP servers will not let you run the check unless you present yourself as a real user. So the first thing you need to set up is something like this, either in an initializer or in your application.rb file:

EmailVerifier.config do |config|
  config.verifier_email = "realname@realdomain.com"
end

Then add this to your model:

validates_email_realness_of :email

Or - if you'd like to use it outside of your models:

EmailVerifier.check(youremail)

This method will return true or false, or will throw an exception with nicely detailed info about what's wrong.

Read more about the gem at the email_verifier RDoc, or try to sell your smartphone back here at Mobixa.com.

Verify Addresses the Easy Way with SmartyStreets


Adding an address form is a pretty common activity in web apps, and even more so in ecommerce web apps. Validations on forms allow us to guide the user to fill out all required fields and to make sure the fields conform to basic formats. Until now, going further and verifying that an address actually exists in the real world was difficult enough that most developers wouldn't bother with it. Imagine, though, the cost to the merchant who ships something to the wrong state because the customer accidentally selected "SD" (South Dakota) when they thought they were selecting "SC" (South Carolina), a simple enough mistake to make and one that wouldn't be caught by most address forms. In today's ecommerce world customers expect deliveries to be fast and reliable, and in this case the customer would have to wait until the package is returned to the merchant marked "Address Unknown", only to wait even longer for the reshipment. Even worse for the merchant, maybe the package never gets returned.

SmartyStreets is an address verification API that I recently implemented for our client Mobixa, a web app that allows people to sell their used mobile phones. Mobixa sends shipping labels and payment checks to customers so that they can send in their phones and get paid for them. Wrong addresses can delay the whole process, and Mobixa wanted to reduce the number of bad addresses being keyed in by customers. SmartyStreets provides an easy way for developers to add address verification to their web forms, so that addresses are checked against the US Postal Service's valid addresses. SmartyStreets is CASS certified, meaning it meets the USPS accuracy requirements.

The big advantage of SmartyStreets is that adding address verification to a form can be as easy as adding a link to their jQuery-based plugin and a script tag with your SmartyStreets API key (a rough sketch of this wiring appears below). The plugin autodetects address fields, and once a minimum of three fields have been entered (address, city, state), it displays an indicator sprite to the user and sends an async request to the API for verification. The verification has three possible outcomes:

  1. The address is verified. A valid address will display a verified image next to the zip code, and all of the address fields will be modified with a more "correct" address, with correct being defined as what matches the USPS official definition for the matched address. Zip codes will be extended with the +4 add-on, so "92806" becomes "92806-3433". An address modification would, for example, change "1735 pheasant" to "1731 N Pheasant St.". Proper casing will be applied and spelling errors corrected.

  2. The address is ambiguous. An ambiguous address is one that returns multiple matches. Let's say, for example, that "1 Rosedale St." could be "1 N Rosedale Street" or "1 S Rosedale Street". In this case a popup is displayed which allows the user to select the correct address, or to override the suggestions and continue with the address they entered.

  3. The address is invalid. The user is informed that the address is invalid, and both the invalid and the ambiguous outcomes offer the user two additional choices. "Double check the address" reruns the address validation after the user has modified the address. "I certify that what I typed is correct" is the second choice, which allows the user to continue with the address they typed. This last choice is important because it gives the user the power to continue with what they want instead of forcing them to conform to the address validation.

Checks are performed when SmartyStreets senses that it has enough address information to run a search, and the form's submit button is disabled until the check completes. Once a check has been performed, it will not run again unless the user elects to "Double check" the address; this is a good design choice that prevents the user from getting stuck in an infinite loop of sorts.
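For reference, the wiring mentioned earlier looked roughly like this. Both the script path and the initialization call below are illustrative placeholders rather than anything copied from SmartyStreets' documentation, so check their docs for the current plugin location and API:

<!-- Hypothetical path and key; consult the SmartyStreets (LiveAddress) docs for the real ones. -->
<script src="/javascripts/jquery.liveaddress.min.js"></script>
<script>
  jQuery(function ($) {
    // The plugin scans the page for street/city/state/zip fields and
    // verifies them against the API before the form can be submitted.
    $.LiveAddress({ key: "YOUR_SMARTYSTREETS_KEY" });
  });
</script>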

Our implementation of SmartyStreets for Mobixa included customizing it to do things a little differently than the out-of-the-box defaults. The jQuery plugin comes with many events and hooks for customization, and if you want to go your own way you can implement everything on the frontend yourself except the API call. The documentation on the website is useful, and the developers of SmartyStreets conveniently answered my questions via an Olark chat window.

The costs of SmartyStreets are the time spent implementing it in your app, a monthly fee based on the number of API calls, and a slight change to your UI flow, in that the user needs to wait for the API call to complete before submitting the form. I don't always implement validation when I have an address form in an app, but when I do, I like to use SmartyStreets.

SFTP virtual users with ProFTPD and Rails: Part 1

I recently worked on a Rails 3.2 project that used the sweet Plupload JavaScript/Flash upload tool to upload files to the web app. To make it easier for users to upload large and/or remote files to the app, we also wanted to let them upload via SFTP. The catch was, our users didn't have SFTP accounts on our server and we didn't want to get into the business of creating and managing SFTP accounts. Enter: ProFTPD and virtual users.

ProFTPD's virtual users concept allows you to point ProFTPD at a SQL database for your user and group authentication. This means SFTP logins don't need actual system logins (although you can mix and match if you want). Naturally, this is perfect for dynamically creating and destroying SFTP accounts. Give your web app the ability to create disposable SFTP credentials and automatically clean up after the user is done with them, and you have a self-maintaining system.

Starting from the inside-out, you need to configure ProFTPD to enable virtual users. Here are the relevant parts from our proftpd.conf:

##
# Begin proftpd.conf excerpt. For explanation of individual config directives, see the 
# great ProFTPD docs at http://www.proftpd.org/docs/directives/configuration_full.html
##
DefaultServer off
Umask 002
AllowOverwrite on

# Don't reference /etc/ftpusers
UseFtpUsers off



# Enable SFTP
SFTPEngine on

# Enable SQL based authentication
SQLAuthenticate on

# From http://www.proftpd.org/docs/howto/CreateHome.html
# Note that the CreateHome params are kind of touchy and easy to break.
CreateHome on 770 dirmode 770 uid ~ gid ~

# chroot them to their home directory
DefaultRoot ~

# Defines the expected format of the passwd database field contents. Hint: An
# encrypted password will look something like: {sha1}IRYEEXBUxvtZSx3j8n7hJmYR7vg=
SQLAuthTypes OpenSSL

# That '*' makes that module authoritative and prevents proftpd from
# falling through to system logins, etc
AuthOrder mod_sql.c*

# sftp_users and sftp_groups are the database tables that must be defined with
# the following column names. You can have other columns in these tables and
# ProFTPD will leave them alone. The sftp_groups table can be empty, but it must exist.
SQLUserInfo sftp_users username passwd uid sftp_group_id homedir shell
SQLGroupInfo sftp_groups name id members

SFTPHostKey /etc/ssh/ssh_host_rsa_key
SFTPHostKey /etc/ssh/ssh_host_dsa_key

SFTPCompression delayed
SFTPAuthMethods password
RequireValidShell no

# SQLLogFile is very verbose, but helpful for debugging while you're getting this working
SQLLogFile /var/log/proftpd_sql.sql

## Customize these for production
SQLConnectInfo database@localhost:5432 dbuser dbpassword

# The UID and GID values here are set to match the user that runs our web app because our
# web app needs to read and delete files uploaded via SFTP. Naturally, that is outside
# the requirements of a basic virtual user setup. But in our case, our web app user needs
# to be able to cd into a virtual user's homedir, and run a `ls` in there. Also, note that
# setting these two IDs here (instead of in our sftp_users table) *only* makes sense if
# you are using the DefaultRoot directive to chroot virtual users.
SQLDefaultUID  510
SQLDefaultGID  500


The CreateHome piece was the trickiest to get working just right for our use case, for two reasons: we needed our web app to be able to read and delete the uploaded files, and we wanted ProFTPD to create those home directories itself. (And it only creates a home directory once a user successfully logs in via SFTP. That means you can be more liberal in your UI with generating credentials that may never get used, without having to worry about a ton of empty home directories lying about.)

That's it for the introductory "Part 1" of this article. In Part 2, I'll show how we generate credentials, the workflow behind displaying those credentials, and our SftpUser ActiveRecord model that handles it all. In Part 3, I'll finish up by running through exactly how our web app accesses these files, and how it cleans up after it's done.

Advanced Product Options (Variants) in Piggybak

About a month ago, Tim and I developed and released a Piggybak extension piggybak_variants, which provides advanced product optioning (or variant) support in Piggybak. Piggybak is an open source Ruby on Rails ecommerce platform developed and maintained by End Point. Here, I discuss the background and basics of the extension.

Motivation & Background

The motivation for this extension was the common ecommerce need for product options (e.g. size, color), where each variation shares high-level product information such as a title and description, but variants have different options, quantities available, and prices. Being intimately familiar with Spree, another open source Ruby on Rails ecommerce framework, we decided to borrow from Spree's product optioning data model after seeing its flexibility succeed over many projects. The resulting model is similar to Spree's data model, but a bit different because of Piggybak's mountability design.


Spree's data model for advanced product optioning. A product has many variants. Each variant has and belongs to many option values. A product also has many options, which define which option values can be assigned to it.

Piggybak Variants Data Model


Option configuration data model in Piggybak

The data model starts with option configurations, which are created to specify which class each option belongs to. For example, a Shirt model may have the options Size and Color, and this would be stored in the option configurations table. In this case, an option has a name (e.g. Size or Color) and a position for sorting (e.g. 1 and 2). The option configuration references an option and assigns a klass to that option (in this case Shirt). Another example might be a Picture Frame with option configurations for Dimensions and Finish.


Option value configuration in Piggybak

After option configurations are defined, option values are defined for each option. For example, the option Color might have the values Red, Blue, and Green with positions 1, 2, and 3, and the option Size might have the values Small, Medium, and Large with positions 1, 2, and 3.


After options, option configurations, and option values are defined, we are ready to create variants. Per the data model above, a variant has and belongs to many option values through the option_values_variants table, and must have one value per option. In our Shirt example, a variant must have one Color option value and one Size option value assigned to it. A variant belongs to a specific sellable item (Shirt) through a polymorphic relationship, which is consistent with Piggybak's mountability design of allowing different classes to be sellable items. Finally, a variant has one piggybak_sellable and accepts piggybak_sellable attributes in a nested form, meaning each variant has a sellable that carries quantity, pricing, and cart description information. What this gives us is a sellable item (Shirt) with many variants, where each variant has option values and its own sellable information such as quantity available, price, and description in cart. Below are a few screenshots of what this looks like in the admin and front-end interfaces.
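In ActiveRecord terms, the relationships described above boil down to something like the following. This is a paraphrased, un-namespaced sketch rather than the extension's actual source, so treat the class and association names as illustrative:

# Paraphrased sketch of the piggybak_variants data model described above.
class Variant < ActiveRecord::Base
  belongs_to :item, :polymorphic => true   # the sellable class, e.g. Shirt
  has_and_belongs_to_many :option_values,
    :join_table => "option_values_variants"
  has_one :sellable                         # quantity, price, cart description
  accepts_nested_attributes_for :sellable
end

class Option < ActiveRecord::Base
  has_many :option_values                   # e.g. Color has Red, Blue, Green
  has_many :option_configurations           # assign the option to a klass, e.g. "Shirt"
end

class OptionValue < ActiveRecord::Base
  belongs_to :option
  has_and_belongs_to_many :variants,
    :join_table => "option_values_variants"
end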

How to Use the Plugin

To install the extension, the following steps must be applied:

  1. Add the gem to the Gemfile and run bundle install
  2. Install and run the extension migrations: rake piggybak_variants:install:migrations and rake db:migrate
  3. Add acts_as_sellable_with_variants to any model that should have variants. You may need to add appropriate attr_accessible settings in your model as well, depending on your attribute accessibility settings.
  4. In the admin, define option configurations and option values for each option, then create variants for your sellable instances.
  5. Finally, add <%= variant_cart_form(@instance) %> to your sellable item's show page to render the cart form.

These steps are similar to Piggybak's core behavior for adding non-variant sellable items.

Screenshots

The Piggybak demo uses this extension to sell photography frames with several product options. The images and captions below show the variants extension for this use case.


The Frame class has two options assigned to it (Frame Size and Frame Finish). Since Frame Size has a position equal to one and Frame Finish has a position equal to two, Frame Size will show as the first option on the product page.


The Frame Finish option is assigned to the Frame class and it has four option values (Black, Cherry, Bronze, and Iron).


On the Frame edit page, 8 variants are created to represent the combinations of 2 Frame Sizes and 4 Frame Finishes.
Each variant has pricing, quantity, and cart description information, as well as additional sellable fields.


And the product page shows the options and option values for that item, displayed based on Position and Size data.
When each option value is triggered, appropriate pricing information is displayed.

Conclusion

The goal of this extension was to provide variant functionality without making it a requirement for every Piggybak install; Piggybak can still be used without this extension for simple, single-option add-to-cart functionality. The Piggybak cart only examines which elements are in the cart based on the sellable_id and the quantity, which is the driving force of the core Piggybak architecture as well as this extension.

Stay tuned for additional updates to the Piggybak Ruby on Rails Ecommerce platform or contact End Point today to start leveraging our Ruby on Rails ecommerce expertise on your next project!

Lazy AJAX

Don't do this, at least not without a good reason. It's not the way to design AJAX interfaces from scratch, but it serves well in a pinch, where you have an existing CGI-based page and you don't want to spend a lot of time rewriting it.

I was in a hurry, and the page involved was a seldom-used administration page. I was attempting to convert it into an AJAX-enabled setup, wherein the page would stand still, but various parts of it could be updated with form controls, each of which would fire off an AJAX request, and use the data returned to update the page.

However, one part of it just wasn't amenable to this approach, or at least not in a quick-and-dirty way. This part had a relatively large amount of inline interpolated (Interchange) data (if you don't know what Interchange is, substitute "PHP" in that last sentence and you'll be close enough). I wanted to run the page back through the server-side processing, but only cared about (and would discard all but) one element of the page.

My lazy-programmer's approach was to submit the page itself as an AJAX request:

$.ajax({
    url: '/@_MV_PAGE_@',
    data: {
        'order_date': order_date,
        'shipmode' : shipmode
    },
    method: 'GET',
    async: true,
    success: function(data, status){
        $('table#attraction_booklet_order').replaceWith(
            $(data).find('#attraction_booklet_order').get(0)
        );
        $('table#attraction_booklet_order').show();
    }
}); 

In this excerpt, "MV_PAGE" is a server-side macro that evaluates to the current page's path. The element I care about is a rather complex HTML table containing all sorts of interpolated data. So I'm basically reloading the page, or at least that restricted piece of it. The tricky bit, unfamiliar to jQuery newcomers, lets you parse out something from the returned document much as you would from your current document.

Again, don't do this without a reason. When I have more time, I'll revisit this and improve it, but for now it's good enough for the current needs.

tmux and SecureCRT settings

Richard gave me a call today to show me the wonders of tmux. Unfortunately, right off the bat I couldn't see color, and there were a bunch of accented a's dividing the panes. After some trial and error, and finding this post on the subject, we got it working. The key is to configure SecureCRT to use xterm + ANSI colors, set the character set to UTF-8, and enable "Use Unicode line drawing code points". Hooray! I'll be trying out tmux in day-to-day use to see if it will replace or augment screen for me.

Update Your (Gnu) Screen Config on the Fly

An Indispensable Tool

I use Screen constantly in my work at End Point. It is an indispensable tool that I would not want to operate without. It's so handy to resume where I left off after I've detached or when my connection drops unexpectedly. This is likely preaching to the choir, but if you are not already using Screen and/or tmux, start now.

The Scenario

I often find myself in the following situation:

  1. SSH into a server
  2. Fire up a new Screen session
  3. Create several windows for editing files, tailing logs, etc.
  4. Realize the default Screen configuration is inadequate or does not exist.
  5. Facepalm \O/

While my needs are fairly minimal, I do like to bump up the scrollback buffer and display the list of windows in the status line.

Screen example

There are a couple of options at this point. I could put up with the default / non-existent configuration or create a config file and manually re-create the session and all of the windows to pick up the configuration changes. Neither of these options was desirable. I wanted to be able to update the configuration and have all of the existing windows pick up the changes. After asking around a little I ended up taking a look at the manual and discovered the `source` command.

Use the source (command)

The source command can be used to load or reload a Screen configuration file. It can be invoked from inside a Screen session like so:

C-a :source /absolute/path/to/config_file

It is important to note that you must use the absolute path to the config file. There are exceptions, which can be found in the Screen man pages, but I found it easier to just use the absolute path. Once the source command has been issued, the configuration will be applied to all existing windows! This was exactly what I was looking for. Armed with this information, I copied my local .screenrc to the system clipboard, created a new config file on the server, and applied it to my session using the `source` command.
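For what it's worth, the config in question is tiny. Something along these lines covers the two tweaks mentioned earlier, a bigger scrollback buffer and a window list in the status line:

# ~/.screenrc (minimal example)
defscrollback 5000
hardstatus alwayslastline
hardstatus string '%{= kw}%-w%{= BW}%n %t%{-}%+w'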

Works with tmux too

I like to use tmux as well and was happy to find it has a similar feature. The source-file command (`source` is aliased as well) is invoked in exactly the same way:

C-prefix :source /absolute/path/to/config_file

After issuing the source-file command, all of the windows and panes in the current session will pick up the configuration changes.

Changing the Default Directory

Another related issue I often run into is wishing I had started my Screen or tmux session in a different directory. By default, when you start a Screen or tmux session, all new windows (and panes) will be opened from the same directory where Screen or tmux was invoked. However, this directory can be changed for existing sessions.

For Screen, the chdir command can be used:

C-a :chdir /some/new/directory

In tmux, the default-path command can be used:

C-prefix :default-path /some/new/directory

After issuing the chdir or default-path commands, all new windows and panes will be opened from the specified directory.

I hope this has been helpful; feel free to add your own Screen and tmux tips in the comments!


Piggybak Extensions: A Basic How-To Guide

This article outlines the steps to build an extension for Piggybak. Piggybak is an open-source Ruby on Rails ecommerce platform created and maintained by End Point. It is developed as a Rails Engine and is intended to be mounted on an existing Rails application. If you are interested in developing an extension for Piggybak, this article will help you identify the steps you need to take so that your extension leverages the Piggybak gem and integrates smoothly into your app.

Introduction

The Piggybak platform is lightweight and relies on Rails meta-programming practices to integrate new extensions. The best references to use alongside your development are the previously developed extensions found here.

It is likely that your extension will tie into the admin interface. Piggybak utilizes the RailsAdmin gem for its admin interface.

Setting up the Development Environment

A convenient way to start building out your extension is to develop against the demo app found here. The demo app utilizes the Piggybak gem and comes with sample data to populate the e-commerce store.

The Piggybak demo app sample data is exported for a PostgreSQL database. To use this data (suggested) you should be prepared to do one of the following:

  • be using PostgreSQL and understand how to work with the existing data dump
  • transform this data dump to another database format that fits your database flavor of choice
  • ignore the sample data and create your own

Creating the Extension (Gem, Engine)

In a folder outside of the project utilizing the Piggybak gem, create a mountable Rails engine:

$ rails plugin new [extension_name] --mountable

The "mountable" option makes you engine namespace-isolated.

Next, update your app's Gemfile to include the extension under development:

gem "piggybak_new_extension", :path => "/the/path/to/the/extension"

Run bundle install to install the extension in your application, and then restart your application.

Special Engine Configuration

Your extension will rely on the engine.rb file to integrate with Piggybak. A sample engine.rb for piggybak_bundle_discounts can be found here. Let's go over this file to get a sense of how bundle discounts are served as an extension in Piggybak.

Make sure you are requiring any of your classes at the top of your engine.rb file, e.g.:

require 'piggybak_bundle_discounts/order_decorator'

The code below decorates the Piggybak::Order class, a helpful pattern to use when you wish to enhance a class's capabilities across engines. In the bundle discount case, the decorator adds several ActiveRecord callbacks.

config.to_prepare do
  Piggybak::Order.send(:include, ::PiggybakBundleDiscounts::OrderDecorator)
end

An order comprises many line items, which are used to calculate the balance due. More information on the line item architecture is described here. If your extension needs to register a new line item type with the order, you may use something similar to the following code to set up the information regarding the new line item type.

config.before_initialize do
  Piggybak.config do |config|
    config.extra_secure_paths << "/apply_bundle_discount"
    config.line_item_types[:bundle_discount] = { 
      :visible => true,
      :allow_destroy => true,
      :fields => ["bundle_discount"],
      :class_name => "::PiggybakBundleDiscounts::BundleDiscount",
      :display_in_cart => "Bundle Discount",
      :sort => config.line_item_types[:payment][:sort]
    } 
    config.line_item_types[:payment][:sort] += 1
  end
end

Does your extension need client-side support? Piggybak utilizes the asset pipeline, so you will need to register your assets here to have them precompiled.

initializer "piggybak_bundle_discounts.precompile_hook" do |app|
  app.config.assets.precompile += ['piggybak_bundle_discounts/piggybak_bundle_discounts.js']
end

Finally, since Piggybak utilizes RailsAdmin for its admin system, we need to register the models, following the RailsAdmin documentation.

initializer "piggybak_bundle_discounts.rails_admin_config" do |app|
  RailsAdmin.config do |config|
    config.model PiggybakBundleDiscounts::BundleDiscount do
      navigation_label "Extensions"
      label "Bundle Discounts"

      edit do
        field :name
        field :multiply do 
          help "Optional"
        end 
        field :discount
        field :active_until
        field :bundle_discount_sellables do 
          active true
          label "Sellables"
          help "Required"
        end
      end
    end

    config.model PiggybakBundleDiscounts::BundleDiscountSellable do
      visible false
      edit do
        field :sellable do          
          label "Sellable"
          help "Required"
        end
      end
    end
  end
end

What else?

From here, extension development can follow standard Rails engine development, which allows for the extension's own models, controllers, views, and additional configuration. Any database migrations inside an extension must be copied to the main Rails application to be applied, as shown below.
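Following the standard Rails engine convention, and matching the hypothetical extension name used earlier, that copy step looks something like:

$ rake piggybak_new_extension:install:migrations
$ rake db:migrate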

You may also need to be aware of how Piggybak integrates with CanCan to ensure that CanCan permissions on your extension models are set correctly.

End Point created and maintains the Piggybak project. Much of the inspiration for Piggybak comes from our expert engineers who have ecommerce experience working on and contributing to platforms such as Spree, RoR-e, and Interchange. If you are interested in talking with us about your next ecommerce project, or have an ecommerce project that needs support, contact us today.

Is AVS for International Customers Useless?

Any ecommerce site that sells "soft goods", some digitally delivered product, has to deal with a high risk of credit card fraud, since their product is usually received instantly and relatively easily resold. Most payment processors can make use of AVS (Address Verification System). It usually works well for cards issued by United States banks with customers having a U.S. billing address, but its track record with international customers and banks has been less than stellar.

AVS compares a buyer's address information with what the bank has on file for the card's billing address. To reduce false negatives, that comparison is limited to the postal code and the numeric part of the street address. The lack of consistent AVS implementation by non-U.S. banks, and the variety of postal codes seen outside the U.S., Canada, and the U.K., mean problems creep in for most international orders.

Any time you reject an order, whether it's for a legitimately incorrect billing address, a bank/AVS problem, or any other reason, you're increasing the likelihood of losing the customer's business, having them retry and cost you more in payment processing fees, or having them call your customer service line in frustration.

Also note that the AVS result is only available after a transaction is authorized or processed (and the payment processor has thus charged a fee), not before.

So what can you do as a conscientious developer? There are a number of approaches, all with drawbacks as you'll see:

Don't use AVS for foreign cards

That's the simple approach. Skip AVS altogether, at least for international orders, and let the chips fall where they may. By "international order", I mean "one that has a non-US billing address". Merchants tend to think in terms of "foreign" and "domestic" credit cards, and it's true that it's possible to determine the country of the bank based on the card number. See the Wikipedia articles on Bank card number and List of Issuer Identification Numbers for some mostly-accurate information. However, you really need a current "BIN" (Bank Identification Number) database, and those cost money and must be massaged into the format your ecommerce system needs. Oh, and usually your ecommerce system won't know anything about BIN numbers and you'll need custom programming to consult a BIN database.

So for most merchants, it isn't practical to determine whether a credit card was issued by a U.S. or foreign bank, so they fall back on rough estimates such as assuming that a non-U.S. billing address means a non-U.S. bank, and then skipping or weakening the AVS check for those orders.

It seems like that would work, but it's wrong so often that it's not very useful. Customers with a U.S. billing address may have a card issued by a foreign bank that doesn't support AVS, and strict AVS checks for them will mean they can't complete the order. Customers with a foreign billing address may have a card issued by a U.S. bank that does support AVS. And some foreign banks do support AVS, just to keep things interesting.

In any case, disabling or weakening AVS may allow more fraudulent transactions than the merchant is willing to stomach. But this minimizes the grief you incur because of false positives (and the ensuing held funds and customer service calls).

AVS on the whole amount

A charge that fails an AVS check will result in the funds being held on the customer's card until removed, which can cause unease or downright hostility on the part of that customer, especially if you are charging something large compared to the customer's available credit.

Furthermore, many people's cards have a very low credit limit, and there are a lot more people with low remaining balances than you might first think.

In addition, customers using debit cards will be especially cranky if their funds are held because of a failed AVS check. They use that same account for writing checks, withdrawing cash from the ATM, etc., so when you tie up their real money in an account (not just some credit), it feels to them like you have stolen their money.

AVS on a "test charge"

You can submit a "test charge" (usually just $1) against the card to retrieve the the AVS information, to "fish out" the AVS response before making the full charge. This is quite useful if the full charge would be significant; it's less handy for small amounts.

This approach has a serious drawback: it means you are making an additional authorization request for each sale, which doubles transaction fees, and may bump the seller into a higher tier of transaction costs because of the total number of requests.

It is also increasingly noticed by customers in the era of online banking. Many banks will show pending transactions, including these little $1 charges, even after the full charge has come in, inducing over-anxious customers to call your customer service line and waste everyone's time.

Tragic calculus

There's really no magic bullet to correct the issues involved in AVS processing: you can sanitize the data, and cushion the shock for a customer when your website declines their card due to a failed AVS check, but in the end you have to resolve the tragic calculus of whether you lose more to fraud or to abandoned carts, and how many customer service calls you can afford to handle. It is not amusing for customer service reps to have to explain many times per day to anxious customers how the credit card processing system works, often directly contradicting the customers' banks' own explanations that may be so dumbed-down as to be simply incorrect.

Certainly the value of AVS as fraud prevention depends a lot on how you implement it. Perhaps it's time to consider whether additional customization of your order processing is in order, to best balance customer satisfaction, processing costs, and fraud prevention.

Acknowledgments

This post was extensively edited and extended by Jon Jensen, who has seen plenty of this pain first-hand as well.

Custom validation with authlogic: Password can't be repeated.

I recently worked on a small security system enhancement for one of my projects: the user must not be able to repeat his or her password for at least ten cycles of change. Here is a little recipe for all the authlogic users out there.

We will store the ten latest passwords in the users table.

def self.up
  change_table :users do |t|
    t.text :old_passwords
  end
end

The database value will be serialized and deserialized into a Ruby array.

class User
  serialize :old_passwords, Array
end

If the crypted password field has changed, the current crypted password and its salt are added to the head of the array. The array is then sliced to hold only ten passwords.

def update_old_passwords
  if self.errors.empty? and send("#{crypted_password_field}_changed?")
    self.old_passwords ||= []
    self.old_passwords.unshift({:password => send("#{crypted_password_field}"), :salt =>  send("#{password_salt_field}") })
    self.old_passwords = self.old_passwords[0, 10]
  end
end

The method will be triggered after validation, before save.

class User
  after_validation :update_old_passwords
end

Next, we need to determine if the password has changed, excluding the very first time when the password is set on the new record.

class User < ActiveRecord::Base
  def require_password_changed?
    !new_record? && password_changed?
  end
end

The validation method itself is implemented below. The idea is to iterate through the stored password salts and encrypt the current password with them using the authlogic mechanism, and then check if the resulting crypted password is already present in the array.

def password_repeated?
  return if self.password.blank?
  found = self.old_passwords.any? do |old_password|
    args = [self.password, old_password[:salt]].compact
    old_password[:password] == crypto_provider.encrypt(args)
  end
  self.errors.add_to_base "New password should be different from the password used last 10 times." if found
end

Now we can plug the validation into the configuration.

class User < ActiveRecord::Base
  acts_as_authentic do |c|
    c.validate :password_repeated?, :if => :require_password_changed?
  end
end

Done!

Interactive Piggybak Demo Tour

A new interactive tour of Piggybak and the Piggybak demo has been released at piggybak.org. Piggybak is an open source Ruby on Rails ecommerce framework built as a Rails 3 engine and intended to be mounted on existing Rails applications.

The tour leverages jTour (a jQuery plugin) and guides you through the homepage, navigation page, product page, cart and checkout pages, gift certificate page, advanced product option page, and WYSIWYG-driven page. The tour also highlights several of the Piggybak plugins available and installed in the demo, such as plugins that introduce advanced product navigation, advanced product optioning, and gift certificate functionality. Below are a few screenshots from the demo.

An interesting side note of developing this tour: while I found many nice jQuery-driven tour plugins available for free or at a small cost, jTour was the only plugin offering decent multi-page tour functionality.


Here is the starting point of Piggybak tour.

The Piggybak tour adds an item to the cart during the tour.

The Piggybak tour highlights advanced product navigation
in the demo.

The Piggybak tour highlights features and functionality
on the one-page checkout.

If you'd like to check out the interactive tour, visit the Piggybak demo page here and click on the link to begin the tour! Or contact End Point right now to get started on your next Ruby on Rails ecommerce project!


Mobixa: A Client Case Study

A few weeks ago we saw the official (and successful!) website launch for one of our clients, Mobixa. Mobixa will buy back your used iPhones and/or provide you with information about when you should upgrade your existing phone and sell it back. Right now, Mobixa is buying back iPhones and advising on iPhones and Androids. End Point has been working with Mobixa for several months now. This article outlines some of the interesting project notes and summarizes the diverse End Point skill set used for this particular website.

Initial Framework

Mobixa initially wanted a proof of concept website without significant investment in development architecture, because the long-term plan and its success were somewhat unknown at the project onset. The initial framework consisted of basic HTML combined with a bit of logic driven by PHP. After a user submitted their phone information, data was sent to Wufoo via Wufoo's PHP-based API and handled from there. Wufoo is an online form builder that has nice export capabilities and painlessly integrates with MailChimp.

This initial architecture was suitable for collecting user information, since it had minimal local database needs and allowed external systems (e.g. Wufoo, MailChimp) to handle much of the user logic. However, it became limiting when the idea of user persistence came into play: the long-term goal is to allow users to modify previous submissions and look up their order information, essentially basic user account management functionality. For that reason, we made a significant switch in the architecture, described below.

Framework #2: Rails 3

Because of the limiting nature of a database-less application with externally managed data, and as business needs around users increased, we decided to make the move to Rails. End Point has a large team of Rails developers, Rails is a suitable framework for developing applications efficiently, and we are experienced in working with Rails plugins such as RailsAdmin, Devise, and CanCan, which immediately provide a configurable admin interface, user authentication, and user authorization to the application. In the process of moving to Rails, we eliminated the middle-man Wufoo and integrated with the shipping fulfillment center and MailChimp directly.

The current Mobixa site runs on Rails 3, Nginx and Unicorn backed by PostgreSQL, leverages End Point's DevCamps to allow multiple developers to simultaneously add features and maintain the site painlessly, and uses RailsAdmin, Devise, and CanCan. It features a responsive design and uses advanced jQuery techniques. The focus of the site is still a simple HTML page that passes user-entered information to the local database, but several user management features have been added as well as the ability to sell back multiple phones at a time.

MailChimp Integration

In my search for a decent Rails MailChimp integration gem, I found gibbon. Gibbon is fairly simple - it's an API wrapper for interacting with MailChimp. Any API capabilities and methods available in MailChimp can be called via Gibbon. The integration looks something like this:

# user model 
def update_mailchimp
  gb = Gibbon.new(*api_key*, { :timeout => 30 })

  info = gb.list_member_info({ :id => *list_id*, :email_address => self.email })

  if info["success"] == 1
    gb.listUpdateMember({ :id => *list_id*,
                          :email_address => self.email,
                          :merge_vars => self.mailchimp_data })
  else
    gb.list_subscribe({ :id => *list_id*,
                        :email_address => self.email,
                        # additional new user arguments #
                        :merge_vars => self.mailchimp_data })
  end
end

The above method instantiates a connection to MailChimp and checks if the user is already subscribed to the MailChimp list. If the user is subscribed, the listUpdateMember method is called to update the user's subscription information. Otherwise, list_subscribe is called to add the user to the MailChimp list.

What's Next?

In addition to expanding the product buyback capabilities, we expect to integrate additional features such as external-API driven address verification, social media integration, referral management, and more advanced user account management features. The project will continue to involve various members of our team such as Richard, Greg D., Tim, Kamil, Josh W. and me.

Slash URL

There's always more to learn in this job. Today I learned that Apache web server is smarter than me.

A typical SEO-friendly solution to Interchange pre-defined searches (item categories, manufacturer lists, etc.) is to put together a URL that includes the search parameter, but looks like a hierarchical URL:

/accessories/Mens-Briefs.html
/manufacturer/Hanes.html

Through the magic of actionmaps, we can serve up a search results page that looks for products which match on the "accessories" or "manufacturer" field. The problem comes when a less-savvy person adds a field value that includes a slash:

accessories: "Socks/Hosiery"
or
manufacturer: "Disney/Pixar"

Within my actionmap Perl code, I wanted to redirect some URLs to the canonical actionmap page (because we were trying to short-circuit a crazy Web spider, but that's beside the point). So I ended up (after several wild goose chases) with:

my $new_path = '/accessories/' .
   Vend::Tags->filter({body => (join '%2f' => (grep { /\D/ } @path)),
       op => 'urlencode', }) .
   '.html';

By this I mean: I put together my path out of my selected elements, joined them with a URL-encoded slash character (%2f), and then further URL-encoded the result. This was counter-intuitive, but as you can see at the first link in this article, it's necessary because Apache is smarter than you. Well, than me anyway.
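To see the two encoding passes in isolation, here is a minimal standalone sketch using the CPAN URI::Escape module rather than the Vend::Tags filter. The values in the comments are what you should expect to get:

use strict;
use warnings;
use URI::Escape;

my @path = ('Socks', 'Hosiery');

# Join the pieces with an already-encoded slash...
my $joined = join '%2f', @path;        # "Socks%2fHosiery"

# ...then URL-encode the whole string, so the percent sign itself
# becomes %25 and the final URL contains no literal encoded slash:
my $segment = uri_escape($joined);     # "Socks%252fHosiery"

print "/accessories/$segment.html\n";  # /accessories/Socks%252fHosiery.html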

Rails: Devise and Email Capitalization

This week, I found a bug for one of our Rails clients that was worth a quick blog post. The client website runs on Rails 3.2.8 with ActiveRecord and PostgreSQL, uses RailsAdmin for an admin interface, Devise for user authentication, and CanCan for user authorization. Before we found the bug, our code looked something like this:

class SomeController < ApplicationController
  def some_method
    user = User.find_or_create_by_email(params[:email])
    # do some stuff with the user provided parameters
    if user.save
      render :json => {}
    else
      render :json => {}, :status => 500
    end
  end
end

It's important to note that the 500 error wasn't reported to the website visitor - there were no visible UI notes to indicate the process had failed. But besides that, this code looks sane, right? We are looking up or creating a user from the provided email, updating the user parameters, and then attempting to save. For the most part, this worked fine, until we came across a situation where the user data was not getting updated properly.

Looking through the logs, I found that the user experiencing the bug was entering mixed-case emails, for example, Steph@endpoint.com. Let's walk through the code in this scenario:

First, a new user is created because there is no user in the system with the exact email Steph@endpoint.com. However, a user does exist in the system tied to steph@endpoint.com.

user = User.find_or_create_by_email(params[:email]) # with "Steph@endpoint.com" 

No problems here:

# do some stuff with the user provided parameters

Below is where the issue comes up. Devise, our user authentication gem, automatically downcases (lowercases) all emails when they are stored in the database. There is already a user tied to steph@endpoint.com, so user.save fails and a 500 error is returned, but as an end-user, I don't see anything to indicate that my AJAX call failed.

if user.save
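For reference, this downcasing comes from Devise's case_insensitive_keys setting, which the generated initializer enables for email by default. A trimmed sketch is below; your config/initializers/devise.rb will contain much more:

# config/initializers/devise.rb (trimmed sketch)
Devise.setup do |config|
  # Devise downcases these attributes before saving and when looking
  # up users for authentication, so "Steph@endpoint.com" is stored
  # as "steph@endpoint.com".
  config.case_insensitive_keys = [:email]
end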

The moral of this story is that it's important to a) understand how plugins manipulate user data automatically (in this case, Devise automatically downcases the email) and b) test a variety of use cases (in this case, we hadn't considered testing mixed-case emails). Our updated code looks something like this; it downcases emails and, upon failure, logs additional detail for any unexpected user update failures:

class SomeController < ApplicationController
  def some_method
    user = User.find_or_create_by_email(params[:email].downcase)
    # do some stuff with the user provided parameters
    if user.save
      render :json => {}
    else
      Rails.logger.warn "USER ERROR: #{user.errors.full_messages} #{user.attributes.inspect}"
      render :json => {}, :status => 500
    end
  end
end

Detecting table rewrites with the ctid column

In a recent article, I mentioned that changing the column definition of a Postgres table will sometimes cause a full table rewrite, but sometimes it will not. The rewrite depends on both the nature of the change and the version of Postgres you are using. So how can you tell for sure if changing a large table will do a rewrite or not? I'll show one method using the internal system column ctid.

Naturally, you do not want to perform this test using your actual table. In this example, we will create a simple dummy table. As long as the column types are the same as your real table, you can determine if the change will do a table rewrite on your version of PostgreSQL.

The aforementioned ctid column represents the physical location of the table's row on disk. This is one of the rare cases in which this column can be useful. The ctid value consists of two numbers: the first is the "page" that the row resides in, and the second is the slot in that page where it resides. To make things confusing, the page numbering starts at 0, while the slot numbering starts at 1, which is why the very first row is always at ctid (0,1). However, the only important information for this example is determining whether the ctid for the rows has changed or not (which indicates that the physical on-disk data has changed, even if the data inside of it has not!).

Let's create a very simple example table and see what the ctids look like. When Postgres updates a row, it actually marks the current row as deleted and inserts a new row. Thus, there is a "dead" row that needs to be cleaned out eventually. (This is the way Postgres implements MVCC - there are other ways.) Routine cleanup of dead rows is handled by plain VACUUM or autovacuum, but VACUUM FULL goes a step further and rewrites the entire table, so we'll use that command to force the table to rewrite itself (and thus 'reset' the ctids, as you will see):

postgres=# DROP TABLE IF EXISTS babies;
DROP TABLE

postgres=# CREATE TABLE babies (gender VARCHAR(10), births INTEGER);
CREATE TABLE

postgres=# INSERT INTO babies VALUES ('Girl', 1), ('Boy', 1);
INSERT 0 2

-- Note: the ctid column is never included as part of '*'
postgres=# SELECT ctid, * FROM babies;
 ctid  | gender | births 
-------+--------+--------
 (0,1) | Girl   |      1
 (0,2) | Boy    |      1
(2 rows)

-- Here comes Ivy, another girl:
postgres=# UPDATE babies SET births = births+1 WHERE gender = 'Girl';
UPDATE 1

-- Note that we have a new ctid: slot 3 of page 0
-- The old row at (0,1) is still there, but it is deleted and not visible
postgres=# SELECT ctid, * FROM babies;
 ctid  | gender | births 
-------+--------+--------
 (0,2) | Boy    |      1
 (0,3) | Girl   |      2
(2 rows)

-- The vacuum full removes the dead rows and moves the live rows to the front:
postgres=# VACUUM FULL babies;
VACUUM

-- We are back to the original slots, although the data is reversed:
postgres=# SELECT ctid, * FROM babies;
 ctid  | gender | births 
-------+--------+--------
 (0,1) | Boy    |      1
 (0,2) | Girl   |      2
(2 rows)

That's what a table rewrite will look like - all the dead rows will be removed, and the rows will be rewritten starting at page 0, adding slots until a new page is needed. We know from the previous article and the fine documentation that Postgres version 9.1 is smarter about avoiding table rewrites. Let's try changing the column definition of the table above on version 8.4 and see what happens. Note that we do an update first so that we have at least one dead row.

postgres=# SELECT substring(version() from $$\d+\.\d+$$);
 substring 
-----------
 8.4

postgres=# DROP TABLE IF EXISTS babies;
DROP TABLE

postgres=# CREATE TABLE babies (gender VARCHAR(10), births INTEGER);
CREATE TABLE

postgres=# INSERT INTO babies VALUES ('Girl', 1), ('Boy', 1);
INSERT 0 2

-- No real data change, but does write new rows to disk:
postgres=# UPDATE babies SET gender = gender;
UPDATE 2

postgres=# SELECT ctid, * FROM babies;
 ctid  | gender | births 
-------+--------+--------
 (0,3) | Girl   |      1
 (0,4) | Boy    |      1
(2 rows)

-- Change the VARCHAR(10) to a TEXT:
postgres=# ALTER TABLE babies ALTER COLUMN gender TYPE TEXT;
ALTER TABLE

postgres=# SELECT ctid, * FROM babies;
 ctid  | gender | births 
-------+--------+--------
 (0,1) | Girl   |      1
 (0,2) | Boy    |      1
(2 rows)

We can see from the above that changing from VARCHAR to TEXT in version 8.4 of Postgres does indeed rewrite the table. Now let's see how version 9.1 performs:

postgres=# SELECT substring(version() from $$\d+\.\d+$$);
 substring 
-----------
 9.1

postgres=# DROP TABLE IF EXISTS babies;
DROP TABLE

postgres=# CREATE TABLE babies (gender VARCHAR(10), births INTEGER);
CREATE TABLE

postgres=# INSERT INTO babies VALUES ('Girl', 1), ('Boy', 1);
INSERT 0 2

-- No real data change, but does write new rows to disk:
postgres=# UPDATE babies SET gender = gender;
UPDATE 2

postgres=# SELECT ctid, * FROM babies;
 ctid  | gender | births 
-------+--------+--------
 (0,3) | Girl   |      1
 (0,4) | Boy    |      1
(2 rows)

-- Change the VARCHAR(10) to a TEXT:
postgres=# ALTER TABLE babies ALTER COLUMN gender TYPE TEXT;
ALTER TABLE

postgres=# SELECT ctid, * FROM babies;
 ctid  | gender | births 
-------+--------+--------
 (0,3) | Girl   |      1
 (0,4) | Boy    |      1
(2 rows)

We confirmed that the ALTER TABLE in this particular case does *not* perform a table rewrite when using version 9.1, as we suspected. We can tell because the ctids stayed the same. We could further verify this by doing a VACUUM FULL and showing that there were indeed dead rows that had been left untouched by the ALTER TABLE.
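For example, continuing the 9.1 session above, a VACUUM FULL should pack the two live rows back into the first two slots - something that could only happen if the dead rows from the earlier UPDATE were still present. The expected output:

postgres=# VACUUM FULL babies;
VACUUM

postgres=# SELECT ctid, * FROM babies;
 ctid  | gender | births 
-------+--------+--------
 (0,1) | Girl   |      1
 (0,2) | Boy    |      1
(2 rows)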

Note that this small example works because nothing else is vacuuming the table, as it is too small and transient for autovacuum to care about it. VACUUM FULL is one of three ways a table can get rewritten; the others are ALTER TABLE and the CLUSTER command. We go through all the trouble above because ALTER TABLE is the only one of the three that *may* rewrite the table - the other two are guaranteed to do so.

This is just one example of the things you can do by viewing the ctid column. It is always nice to know beforehand if a table rewrite is going to occur, as it can be the difference between a query that runs in milliseconds versus hours!

Job Opening: DevOps Engineer

We're looking for a full-time, salaried DevOps engineer to work with our existing hosting and system administration team and consult with our clients on their needs. If you like to figure out and solve problems, and can take responsibility for getting a job done well without intensive oversight, please read on!

What is in it for you?

  • Work from your home office
  • Flexible full-time work hours
  • Health insurance benefit
  • 401(k) retirement savings plan
  • Annual bonus opportunity
  • Ability to move without being tied to your job location

What you will be doing:

  • Remotely set up and maintain Linux servers (mostly RHEL/CentOS, Debian, and Ubuntu), daemons, and custom software written mostly in Ruby, Python, Perl, and PHP
  • Audit and improve security, reliability, backups, monitoring (with Nagios etc.)
  • Support developer use of major language ecosystems: Perl's CPAN, Python PyPI (pip/easy_install), Ruby gems, PHP PEAR/PECL, etc.
  • Automate provisioning with Chef, Puppet, etc.
  • Work with internal and customer systems and staff
  • Use open source tools and contribute back as opportunity arises
  • Use your desktop platform of choice: Linux, Mac OS X, Windows

What you will need:

  • Professional experience with Linux system administration, networking, firewalls, Apache or nginx web servers, SSL, DNS
  • A customer-centered focus
  • Strong verbal and written communication skills
  • Experience directing your own work, and working from home
  • Ability to learn new technologies
  • Willingness to shift work time to evening and weekend hours when required

Bonus points for experience:

  • Packaging software for RPM, Yum, and apt/dpkg
  • Managing Amazon Web Services, Rackspace Cloud, Heroku, or other cloud hosting services
  • Working with PostgreSQL, MySQL, Cassandra, CouchDB, or other databases
  • Complying or auditing for PCI and other security standards
  • Using load balancers, virtualization (kvm, Xen, VirtualBox, VMware), FC or iSCSI SAN storage
  • With JavaScript, HTML/CSS, Java/JVM, Node.js, etc.
  • Contributing to open source projects

About us

End Point is a 17-year-old Internet consulting company based in New York City, with 31 full-time employees working mostly remotely from home offices. We serve over 200 clients ranging from small family businesses to large corporations, using a variety of open source technologies. Our team is made up of strong ecommerce, database, and system administration talent, working together using ssh, Screen and tmux, IRC, Google+ Hangouts, Skype, and good old phones.

How to apply

Please email us an introduction to jobs@endpoint.com to apply. Include a resume and your GitHub or other URLs that would help us get to know you. We look forward to hearing from you!

PostgreSQL search_path Behaviour

PostgreSQL has a great feature: schemas. You can have one database with multiple schemas, which is a great way to separate data belonging to different applications. Each application can use its own schema, and they can also share some schemas between them.

I have noticed that some programmers tend to name their working schema after their user name. This is not a bad idea; however, I once ran into some strange behaviour with such a setup.

I'm using the user name szymon in the database szymon.

First, let's create a simple table and add some data. I will add one row containing the table's name.

# CREATE TABLE a ( t TEXT );
# INSERT INTO a(t) VALUES ('This is table a');

Let's check if the row is where it should be:

# SELECT t FROM a;

        t        
-----------------
 This is table a
(1 row)

Now let's create another schema, named after my user name.

# CREATE SCHEMA szymon;

Let's now create table a in the new schema.

# CREATE TABLE szymon.a ( t TEXT );

So there are two tables a in different schemas.

# SELECT t FROM pg_tables WHERE tablename = 'a';

 schemaname | tablename | tableowner | tablespace | hasindexes | hasrules | hastriggers 
------------+-----------+------------+------------+------------+----------+-------------
 public     | a         | szymon     | \N         | f          | f        | f
 szymon     | a         | szymon     | \N         | f          | f        | f
(2 rows)

I will just add a row similar to the previous one.

# INSERT INTO szymon.a(t) VALUES ('This is table szymon.a');

Let's check the data in the table "szymon.a".

# SELECT t FROM szymon.a;

           t            
------------------------
 This is table szymon.a
(1 row)

OK, now I have all the data prepared to show this quite interesting behaviour. As you can see in the queries above, selecting from table "a" when there is only one schema works, and selecting from "szymon.a" works as well.

What will happen when I select data from the table "a"?

# SELECT t FROM a;

           t            
------------------------
 This is table szymon.a
(1 row)

Suddenly PostgreSQL selects data from a different table than it did at the beginning. The reason for this is the schema search mechanism. PostgreSQL has a configuration variable named "search_path". If you set its value to "x, a, public", then PostgreSQL will look for tables, types, and function names in the schema "x" first. If there is no such object in that schema, it will look in the next schema on the list, which is "a" in this example.

What's the default value of the search_path variable? You can check the current value with the following query:

# show search_path;

  search_path   
----------------
 "$user",public
(1 row)

The default search path makes PostgreSQL look first in the schema named exactly like the user name you used to log into the database. If there is no schema matching the user name, or no table "a" in it, then the "public.a" table is used instead.
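If you actually want the table from the public schema in this situation, you can either schema-qualify it as public.a, or change the search path for the session, for example:

# SET search_path TO public;

# SELECT t FROM a;

        t        
-----------------
 This is table a
(1 row)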

The problem is even trickier because a simple EXPLAIN doesn't help: it shows only the table name, omitting the schema name. So the plan for this query looks exactly the same, regardless of the schema used:

# EXPLAIN SELECT * FROM a;
                      QUERY PLAN                      
------------------------------------------------------
 Seq Scan on a  (cost=0.00..1.01 rows=1 width=32)
(1 row)

For a plan with more information, use EXPLAIN VERBOSE. It includes the schema name, which makes it easier to spot that a different schema is being used:

# EXPLAIN VERBOSE SELECT * FROM a;
                         QUERY PLAN                          
-------------------------------------------------------------
 Seq Scan on szymon.a  (cost=0.00..1.01 rows=1 width=32)
   Output: t
(2 rows)

Piggybak on Heroku

Several weeks ago, we were contacted through our website with a request for Heroku support on Piggybak. Piggybak is an open source Ruby on Rails ecommerce platform developed and maintained by End Point. Piggybak is similar to many other Rails gems in that it can be installed from Rubygems in any Rails application, and Heroku understands this requirement from the application’s Gemfile. This is a brief tutorial for getting a Rails application up and running with Piggybak. For the purpose of this tutorial, I’ll be using the existing Piggybak demo for deployment, instead of creating a Rails application from scratch.

a) First, clone the existing Piggybak demo. This will be your base application. On your development machine (local or other), you must run bundle install to get all the application’s dependencies.

b) Next, add config.assets.initialize_on_precompile = false to config/application.rb to allow your assets to be compiled without requiring a local database connection (a snippet of this change appears after these steps).

c) Next, compile the assets according to this Heroku article with the command RAILS_ENV=production bundle exec rake assets:precompile. This will generate all the application assets into the public/assets/ directory.

d) Next, add the assets to the repo by removing public/assets/ from .gitignore and committing all modified files. Heroku’s disk read-only limitation prohibits you from writing public/assets/ files on the fly, so this is a necessary step for Heroku deployment. It is not necessary for standard Rails deployments.

e) Next, assuming you have a Heroku account and have installed the Heroku toolbelt, run heroku create to create a new Heroku application.

f) Next, run git push heroku master to push your application to your new Heroku application. This will push the code and install the required dependencies in Heroku.

g) Next, run heroku pg:psql, followed by \i sample.psql to load the sample data to the Heroku application.

h) Finally, run heroku restart to restart your application. You can access your application through a browser by running heroku open.
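For step (b) above, the change to config/application.rb looks something like this; the Demo module name is just a placeholder for whatever your application class is called:

# config/application.rb (excerpt)
module Demo
  class Application < Rails::Application
    # Allow rake assets:precompile to run without a database connection
    config.assets.initialize_on_precompile = false
  end
end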

That should be it. From there, you can manipulate and modify the demo to experiment with Piggybak functionality. The major difference between Heroku deployment and standard deployment is that all your compiled assets must be in the repository because Heroku cannot write them out on the fly. If you plan to deploy the application elsewhere, you will have to make modifications to the repository regarding public/assets.

A full set of commands for this tutorial includes:

# Clone and set up the demo app
git clone git://github.com/piggybak/demo.git
bundle install
# add config.assets.initialize_on_precompile = false
# to config/application.rb

# Precompile assets and add to repository
RAILS_ENV=production bundle exec rake assets:precompile
# edit .gitignore here to stop ignoring public/assets/
git add .
git commit -m "Heroku support commit."

# Deploy to Heroku
heroku create
git push heroku master
heroku pg:psql
>> \i sample.psql
heroku restart
heroku open

Postgres alter column problems and solutions


Image from Flickr user ell brown

A common situation for database-backed applications is the need to change the attributes of a column. One can change the data type, or more commonly, only the size limitation, e.g. VARCHAR(32) gets changed to VARCHAR(42). There are a few ways to accomplish this in PostgreSQL, from a straightforward ALTER COLUMN, to replacing VARCHAR with TEXT (plus a table constraint), to some advanced system catalog hacking.

The most common example of such a change is expanding a VARCHAR declaration to allow more characters. For example, your "checksum" column was based on MD5 (at 32 characters), and now needs to be based on Keccak (pronounced "catch-ack"), which is 64 characters. In other words, you need a column in your table to change from VARCHAR(32) to VARCHAR(64). The canonical approach is to do this:

ALTER TABLE foobar ALTER COLUMN checksum TYPE VARCHAR(64);

This approach works fine, but it has two huge and interrelated problems: locking and time. This approach locks the table for as long as the command takes to run. And by lock, we are talking about a heavy 'access exclusive' lock which shuts everything else out of the table. If your table is small, this is not an issue. If your table has a lot of data, however, this brings us to the second issue: the table rewrite. The above command will cause Postgres to rewrite every single row of the table, which can be a very expensive operation (both in terms of disk I/O and wall clock time). So, a simple ALTER COLUMN solution usually comes at a very high cost for large tables. Luckily, there are workarounds for this problem.
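Besides the ctid trick from the earlier article, another quick way to check whether a rewrite actually happened is to compare the table's relfilenode (the name of the file backing it on disk) before and after the ALTER, since a rewrite assigns a new one. A minimal sketch on a scratch table, with output omitted because the numbers vary per system:

postgres=# CREATE TABLE rewrite_test ( checksum VARCHAR(32) );

-- Note the current on-disk file backing the table:
postgres=# SELECT relfilenode FROM pg_class WHERE relname = 'rewrite_test';

postgres=# ALTER TABLE rewrite_test ALTER COLUMN checksum TYPE VARCHAR(64);

-- Run the same query again: a new relfilenode means the table was
-- rewritten, while an unchanged relfilenode means a catalog-only change.
postgres=# SELECT relfilenode FROM pg_class WHERE relname = 'rewrite_test';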

First, some good news: as of version 9.2, there are many operations that will no longer require a full table rewrite. Going from VARCHAR(32) to VARCHAR(64) is one of those operations! Thus, if you are lucky enough to be using version 9.2 or higher of Postgres, you can simply run the ALTER TABLE and have it return almost instantly. From the release notes:

Reduce need to rebuild tables and indexes for certain ALTER TABLE ... ALTER COLUMN TYPE operations (Noah Misch)

Increasing the length limit for a varchar or varbit column, or removing the limit altogether, no longer requires a table rewrite. Similarly, increasing the allowable precision of a numeric column, or changing a column from constrained numeric to unconstrained numeric, no longer requires a table rewrite. Table rewrites are also avoided in similar cases involving the interval, timestamp, and timestamptz types.

However, if you are not yet on version 9.2, or are making an operation not covered above (such as shrinking the size limit of a VARCHAR), your only option to avoid a full table rewrite is the system catalog change below. However, before you jump down there, consider a different option: abandoning VARCHAR altogether.

In the Postgres world, there are few differences between the VARCHAR and TEXT data types. The latter can be thought of as an unbounded VARCHAR, or if you like, a VARCHAR(999999999999). You may also add a check constraint to a table to emulate the limit of a VARCHAR. For example, to convert a VARCHAR(32) column named "checksum" to a TEXT column:

ALTER TABLE foobar ALTER COLUMN checksum TYPE text;
ALTER TABLE foobar ADD CONSTRAINT checksum_length
  CHECK (LENGTH(checksum) <= 32);

The data type change suffers from the same full table rewrite problem as before, but if you are using version 9.1 or newer of Postgres, the change from VARCHAR to TEXT does not do a table rewrite. The creation of the check constraint, however, will scan all of the existing table rows to make sure they meet the condition. While not as costly as a full table rewrite, scanning every single row in a large table will still be expensive. Luckily, version 9.2 of Postgres comes to the rescue again with the addition of the NOT VALID phrase to the check constraint clause. Thus, in newer versions you can avoid the scan entirely by writing:

ALTER TABLE foobar ADD CONSTRAINT checksum_length
  CHECK (LENGTH(checksum) <= 32) NOT VALID;

This is a one-time exception for the constraint, and only applies as the constraint is being created. In other words, despite the name, the constraint is very much valid after it is created. If you want to validate all the rows that you skipped at a later time, you can use the ALTER TABLE .. VALIDATE CONSTRAINT command. This has the double advantage of allowing the check to be delayed until a better time, and taking a much lighter lock on the table than the ALTER TABLE .. ADD CONSTRAINT does.
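When that better time arrives, validating the skipped rows is a single command:

ALTER TABLE foobar VALIDATE CONSTRAINT checksum_length;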

So why would you go through the trouble of switching from your VARCHAR(32) to a TEXT column with a CHECK constraint? There are at least three good reasons.

First, if you are running Postgres 9.2 or better, this means you can change the constraint requirements on the fly, without a table scan - even for the 'non-optimal' situations such as going from 64 characters down to 32. Just drop the old constraint, and add a new one with the NOT VALID clause thrown on it.

Second, the check constraint gives a better error message, and a clearer indication that the limitation was constructed with some thought behind it. Compare these messages:

postgres=# CREATE TABLE river( checksum VARCHAR(4) );
CREATE TABLE

postgres=# INSERT INTO river VALUES ('abcde');
ERROR:  value too long for type character varying(4)

postgres=# CREATE TABLE river( checksum TEXT,
postgres-#   CONSTRAINT checksum_length CHECK (LENGTH(checksum) <= 4) );
CREATE TABLE

postgres=# INSERT INTO river VALUES ('abcde');
ERROR:  new row for relation "river" violates check constraint "checksum_length"
DETAIL:  Failing row contains (abcde).

Third, and most important, you are no longer limited to a single column attribute (maximum length). You can use the constraint to check for many other things as well: minimum size, actual content, regex matching, you name it. As a good example, if we are truly storing checksums, we probably want the hexadecimal Keccak checksums to be *exactly* 64 characters, and not just a maximum length of 64 characters. So, to illustrate the above point about switching constraints on the fly, you could enforce a strict 64-character limit on the now-TEXT column with:

BEGIN;

ALTER TABLE foobar DROP CONSTRAINT checksum_length;

ALTER TABLE foobar ADD CONSTRAINT checksum_length
  CHECK (LENGTH(checksum) = 64) NOT VALID;

COMMIT;

We just introduced a minimum *and* a maximum, something the old VARCHAR could not do. We can constrain it further, as we should only be allowing hexadecimal characters to be stored. Thus, we can also reject any characters other than 0123456789abcdef from being added:

BEGIN;

ALTER TABLE foobar DROP CONSTRAINT checksum_length;

ALTER TABLE foobar ADD CONSTRAINT checksum_valid
  CHECK ( LENGTH(checksum) = 64 AND checksum ~ '^[a-f0-9]*$' ) NOT VALID;

COMMIT;

Since we have already added a regex check, we can reduce the size of the CHECK with a small hit in clarity like so:

BEGIN;

ALTER TABLE foobar DROP CONSTRAINT checksum_length;

ALTER TABLE foobar ADD CONSTRAINT checksum_valid
  CHECK ( checksum ~ '^[a-f0-9]{64}$' ) NOT VALID;

COMMIT;

Image from Flickr user loozrboy

Back to the other problem, however: how can we avoid a table rewrite when going from VARCHAR(64) to VARCHAR(32), or when stuck on an older version of Postgres that always insists on a table rewrite? The answer is the system catalogs. Please note that any updating to the system catalogs should be done very, very carefully. This is one of the few types of update I will publicly mention and condone. Do not apply this lesson to any other system table or column, as there may be serious unintended consequences.

So, what does it mean to have VARCHAR(32) vs. VARCHAR(64)? As it turns out, there is no difference in the way the actual table data is written. The length limit of a VARCHAR is simply an implicit check constraint, after all, and as such, it is quite easy to change.

Let's create a table and look at some of the important fields in the system table pg_attribute. In these examples we will use Postgres 8.4, but other versions should look very similar - this part of the system catalog rarely changes.

postgres=# CREATE TABLE foobar ( checksum VARCHAR(32) );
CREATE TABLE

postgres=# \x
Expanded display is on.

postgres=# SELECT attname, atttypid::regtype, atttypmod FROM pg_attribute 
postgres-#  WHERE attrelid = 'foobar'::regclass AND attname = 'checksum';
-[ RECORD 1 ]----------------
attname   | checksum
atttypid  | character varying
atttypmod | 36

The important column is atttypmod. It indicates the legal length of this varchar column (whose full legal name is 'character varying', but everyone calls it varchar). In the case of Postgres, there are also 4 bytes of overhead. So VARCHAR(32) shows up as 36 in the atttypmod column. Thus, if we want to change it to a VARCHAR(64), we add 4 to 64 and get a number of 68. Before we make this change, however, we need to make sure that nothing else will be affected. There may be other dependencies, such as views and foreign keys, that you need to keep in mind before making this change, so carefully check all the dependencies this table has:

postgres=# SELECT c.relname||':'||objid AS dependency, deptype
postgres-#   FROM pg_depend d JOIN pg_class c ON (c.oid=d.classid)
postgres-#   WHERE refobjid = 'foobar'::regclass;
  dependency   | deptype 
---------------+---------
 pg_type:16419 | i

We can see in the above that the only dependency is an entry in the pg_type table - which is a normal thing for all tables and will not cause any issues. Any other entries, however, should give you pause before doing a manual update of pg_attribute. You can use the information returned by the first column of the above query to see exactly what is referencing the table. For example, let's make that column unique, as well as adding a view that uses the table, and then see the effects on the pg_depend table:

postgres=# CREATE UNIQUE INDEX jack ON foobar(checksum);
CREATE INDEX

postgres=# CREATE VIEW martha AS SELECT * FROM foobar;
CREATE VIEW

postgres=# SELECT c.relname||':'||objid AS dependency, deptype
postgres-#   FROM pg_depend d JOIN pg_class c ON (c.oid=d.classid)
postgres-#   WHERE refobjid = 'foobar'::regclass;
   dependency     | deptype 
------------------+---------
 pg_type:16419    | i
 pg_class:16420   | a
 pg_rewrite:16424 | n

The 'i', 'a', and 'n' stand for internal, auto, and normal. They are not too important in this context, but more details can be found in the docs on the pg_depend table. The first column shows us the system table and oid of the dependency, so we can look them up and see what they are:

postgres=# SELECT typname FROM pg_type WHERE oid = 16419;
 typname 
---------
 foobar

postgres=# SELECT relname, relkind FROM pg_class WHERE oid = 16420;
 relname | relkind 
---------+---------
 jack    | i

-- Views require a little redirection as they are implemented via the rules system
postgres=# SELECT relname,relkind FROM pg_class WHERE oid = 
postgres-#   (SELECT ev_class FROM pg_rewrite WHERE oid = 16424);
 relname | relkind 
---------+---------
 martha  | v

postgres=# \d martha
              View "public.martha"
  Column  |         Type          | Modifiers 
----------+-----------------------+-----------
 checksum | character varying(32) | 
View definition:
 SELECT foobar.checksum
   FROM foobar;

So what does all that tell us? It tells us we should look carefully at the index and the view to make sure they will not be affected by the change. In this case, a simple index on the column will not be affected by changing the length, so it (along with the pg_type entry) can be ignored. The view, however, should be recreated so that it records the actual column size.

We are now ready to make the actual change. This would be an excellent time to make a backup of your database. This procedure should be done very carefully - if you are unsure about any of the entries in pg_depend, do not proceed.

First, we are going to start a transaction, drop the view, and lock the pg_attribute table. Then we are going to change the length of the varchar directly and commit. Once we verify the change, we will recreate the view. Here we go:

postgres=# SELECT c.relname||':'||objid AS dependency, deptype
postgres-#   FROM pg_depend d JOIN pg_class c ON (c.oid=d.classid)
postgres-#   WHERE refobjid = 'foobar'::regclass;
   dependency     | deptype 
------------------+---------
 pg_type:16419    | i
 pg_class:16420   | a
 pg_rewrite:16424 | n

postgres=# \d foobar
            Table "public.foobar"
  Column  |         Type          | Modifiers 
----------+-----------------------+-----------
 checksum | character varying(32) | 
Indexes:
    "jack" UNIQUE, btree (checksum)

postgres=# \d martha
            View "public.martha"
  Column  |         Type          | Modifiers 
----------+-----------------------+-----------
 checksum | character varying(32) | 
View definition:
 SELECT foobar.checksum
   FROM foobar;
postgres=# BEGIN;
BEGIN

postgres=# DROP VIEW martha;
DROP VIEW

postgres=# LOCK TABLE pg_attribute IN EXCLUSIVE MODE;
LOCK TABLE

postgres=# UPDATE pg_attribute SET atttypmod = 68
postgres-#   WHERE attrelid = 'foobar'::regclass AND attname = 'checksum';
UPDATE 1

postgres=# COMMIT;
COMMIT

Verify the changes and check out the pg_depend entries:

postgres=# \d foobar
            Table "public.foobar"
  Column  |         Type          | Modifiers 
----------+-----------------------+-----------
 checksum | character varying(64) | 
Indexes:
    "jack" UNIQUE, btree (checksum)

postgres=# CREATE VIEW martha AS SELECT * FROM foobar;
CREATE VIEW

postgres=# \d martha
            View "public.martha"
  Column  |         Type          | Modifiers 
----------+-----------------------+-----------
 checksum | character varying(64) | 
View definition:
 SELECT foobar.checksum
   FROM foobar;

postgres=# SELECT c.relname||':'||objid AS dependency, deptype
postgres-#   FROM pg_depend d JOIN pg_class c ON (c.oid=d.classid)
postgres-#   WHERE refobjid = 'foobar'::regclass;
    dependency    | deptype 
------------------+---------
 pg_type:16419    | i
 pg_class:16420   | a
 pg_rewrite:16428 | n

Success. Both the table and the view are showing the new VARCHAR size, but the data in the table was not rewritten. Note how the final row returned by the pg_depend query changed: we dropped the view and created a new one, resulting in a new row in both pg_class and pg_rewrite, and thus a new OID shown in the pg_rewrite table.

Hopefully this is not something you ever have to perform. The new features of 9.1 and 9.2 that prevent table rewrites and table scanning should go a long way towards that.