Ecommerce Solutions: What are the Options?

Lately, I've been evaluating ecommerce options for use on a side hobby/business. I'm obviously a developer, so in theory I could use one of End Point's supported ecommerce frameworks or just write my own framework. But, my bottom line is that I don't need the feature set offered by some of the ecommerce options out there and I don't necessarily have the resources to develop a custom solution.

In addition to personal interest, End Pointers constantly encounter potential clients who aim to get a better understanding of the cost of using open source, our preferred ecommerce solution. I put together two infographics on ecommerce options, ongoing cost, feature sets, and the ability to customize. Before anyone *flips out* about the infographics, note that they represent my broad generalizations regarding the ongoing cost, feature sets and ability to customize. I'm intimately familiar with some of these options and less familiar with a couple of them.


Feature Set versus Ongoing Cost of Ecommerce Solutions


Ability to Customize versus Ongoing Cost of Ecommerce Solutions

Some notes on the ecommerce solutions shown in the infographics:

  • Online payment service (PayPal): An online payment collection service like PayPal offers a minimal "ecommerce" feature set and might be suitable for someone looking to simply collect money. It also provides almost no ability to customize. The ongoing cost of PayPal is lower than many of the other options, since PayPal only takes a percentage of each sale.
  • Online catalog service (Etsy, eBay): An online catalog service such as Etsy or eBay offers very basic ecommerce listing features and little to no ability to customize, but has a relatively low ongoing cost. For example, Etsy charges $0.20 to list a single item for four months and takes a 3.5% fee on each sale.
  • Hosted ecommerce (Shopify, Big Cartel, Big Commerce, Yahoo Merchant): A hosted ecommerce solution offers the ability to customize the appearance, typically via a custom template language, and has a basic ecommerce feature set. Ongoing costs include the cost of the service, domain registration, and payment gateway (e.g. Authorize.NET) fees. Additional cost may apply if development services are required to build a custom template. Shopify's basic plan costs $29/mo. with a maximum of 100 SKUs; Shopify offers several higher-priced plans which include more features and have fewer limitations. Most other hosted ecommerce solutions are similarly priced.
  • Open source ecommerce (Interchange, Spree, Magento, Zen Cart, osCommerce, PrestaShop): An open source ecommerce solution tends to have generic ecommerce features and provides the opportunity for a large amount of customization, but is typically more expensive than hosted ecommerce solutions and online catalog services. The software itself is free, but ongoing costs apply for hosting (server, domain registration, SSL certificate), development (any customization that does not fit the generic mold), and payment gateway fees. One positive about open source ecommerce is that additional plugins or add-ons are produced by members of the community; if the business needs are satisfied by community-available extensions, the cost of additional development or customization may be reduced or eliminated. I'd also group existing open source ecommerce plugins or modules for CMS solutions like WordPress and Drupal into this category: the generic solution is free, but additional resources may be required for customization.
  • Enterprise ecommerce (ATG, Magento Enterprise): From my experience, I've observed that enterprise ecommerce solutions tend to offer a large feature set and a similar ability to customize as open source frameworks. The ongoing cost is high: for example, Magento Enterprise starts at $12,990/year. In addition to the licensing cost, hosting, development, and payment gateway fees may apply.
  • Custom ecommerce (homegrown): The cost of writing your own ecommerce framework depends on the functionality requirements. The ability to customize is unlimited since the entire solution is custom. The feature set is likely to be proportional to resources spent on the project. Ongoing costs here include hosting, development and payment gateway fees.

Which one do you choose?

It has been my experience that most of End Point's ecommerce clients need some type of customization: a custom appearance, discount functionality, shipping integration, payment gateway integration, social media integration, or other third-party integration. Choosing an option that allows easy customization tends to benefit our customers in the long run. We're pretty biased at End Point towards open source ecommerce solutions, but my recent opinion is that with the advancement of web frameworks and their tooling (e.g. gems in Ruby), custom solutions can be developed efficiently and may be a better option for a site that falls outside the realm of standard ecommerce. For businesses not able to pay development or consulting costs, hosted ecommerce solutions are affordable and provide the essentials needed as the business grows.

What's the difference?

Not long ago a software vendor we work with delivered a patch for a bug we'd been having. I was curious to know the difference between the patch, a .tgz file, and the files it was replacing. I came up with this:

( \
  for i in `( \
      find new -type f | sed "s/new\///" ; \
      find old -type f | sed "s/old\///" ) | \
      sort | uniq`; do \
    md5sum old/$i new/$i 2>&1; \
  done \
) | uniq -u  -c -w 32

Assuming the original .tgz file was unpacked into a directory called "old", and the new one into "new", this tells me which files exist in one directory and not the other, and which files exist in both but differ. Here's an example using a few random files in two directories:

josh@eddie:~/tmp/transient$ ls -l old new
new:
total 16
-rw-r--r-- 1 josh josh 15 2011-03-01 10:15 1
-rw-r--r-- 1 josh josh 12 2011-03-01 10:14 2
-rw-r--r-- 1 josh josh 13 2011-03-01 10:15 3
-rw-r--r-- 1 josh josh 12 2011-03-01 10:16 4

old:
total 16
-rw-r--r-- 1 josh josh 15 2011-03-01 10:15 1
-rw-r--r-- 1 josh josh  5 2011-03-01 10:06 2
-rw-r--r-- 1 josh josh 13 2011-03-01 10:15 3
-rw-r--r-- 1 josh josh 20 2011-03-01 10:18 5

josh@eddie:~/tmp/transient$ ( \
>   for i in `( \
>       find new -type f | sed "s/new\///" ; \
>       find old -type f | sed "s/old\///" ) | \
>       sort | uniq`; do \
>     md5sum old/$i new/$i 2>&1; \
>   done \
> ) | uniq -u  -c -w 32
      1 432c7f1e40696b4fd77f8fd242679973  old/2
      1 a533139557d6c009ff19ae85e18b1c61  new/2
      1 md5sum: old/4: No such file or directory
      1 6f84c6a88edb7c2a453f0f900348960a  new/4
      1 6f38ac81c6bad77838e38f03745e968b  old/5
      1 md5sum: new/5: No such file or directory

Note that this can run into problems when two files in one directory are identical, but that wasn't a likely issue in this case, so I didn't work to avoid that problem.
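
A plain recursive diff reports much the same information without the checksums; something along these lines (not what I used at the time) lists the files that differ and the files present in only one tree:

diff -qr old new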

YUI Extensions and Post Initialization

When using YUI3's provided extension mechanism to enhance (composite, mix in, role, whatever you like to call it, etc.) a Y.Base inherited base class, it is often helpful to have "post initialization" code run after the attributes' values have been set. An easy way to do this is to hook onto a Y.Base provided attribute change event and run any post-initialization code there.
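
A minimal sketch of the idea, assuming YUI 3.x with the 'base' module (the class and attribute names here are hypothetical):

YUI().use('base', function (Y) {

    function PostInitExtension() {}

    PostInitExtension.prototype.initializer = function () {
        // Y.Base flips its 'initialized' attribute to true at the end of
        // init(), after all attribute values have been set, so any "post
        // initialization" work can hang off that attribute change event.
        this.after('initializedChange', this._postInit, this);
    };

    PostInitExtension.prototype._postInit = function () {
        // Attribute values (including those set by the host class) are final here.
        Y.log('post init; someAttr = ' + this.get('someAttr'));
    };

    // Hypothetical host class composed with the extension via Y.Base.create.
    var MyClass = Y.Base.create('myClass', Y.Base, [PostInitExtension], {}, {
        ATTRS: {
            someAttr: { value: 'default' }
        }
    });

    new MyClass({ someAttr: 'configured' });
});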

Debugging PHP extensions with the dynamic linker

If you've ever had to track down a missing dependency or incompatible libraries, chances are good that you were assisted by the ldd command. This helpful utility reports the list of shared library dependencies required by a binary executable. Or in the typical use case, it will tell you which libraries are missing that your application needs to run.

Fortunately most Linux and BSD distributions do a decent job of enforcing dependencies with their respective package managers. But inevitably, there's the occasional proprietary, closed-binary third-party application or built-from-source utility that skirts the convenience of mainstream distributions. If you're lucky, the vendor accurately details which software is required, including specific versions theirs was built against. If you're not, well, you might have to resort to tools like ldd, or even process tracers like strace (Linux), ktrace (OpenBSD) and truss (Solaris).

I recently had the misfortune of troubleshooting a PHP application that was unable to load imagick.so, a native PHP extension to create and modify images using the ImageMagick API. The problem manifested itself innocently enough:

PHP Warning:  PHP Startup: Unable to load dynamic library '/var/www/lib/php/modules
  /imagick.so' - Cannot load specified object in Unknown on line 0

Naturally, my first step was to verify the extension was installed.

$ ls -l /var/www/lib/php/modules/
total 7336
-r--r--r--  1 root  bin       70586 Aug 10  2010 curl.so
-r--r--r--  1 root  bin      395883 Aug 10  2010 gd.so
-rwxr-xr-x  1 root  bin      489386 Aug 10  2010 imagick.so
-r--r--r--  1 root  bin     2105515 Aug 10  2010 mbstring.so
-r--r--r--  1 root  bin       45476 Aug 10  2010 mcrypt.so
-r--r--r--  1 root  bin       63643 Aug 10  2010 mysql.so

Knowing that I'd installed the pecl-imagick package using OpenBSD's pkg_add (which handles dependencies nicely), it seemed unlikely that it was missing any mainstream dependencies. Logically, this started me thinking that it might be a problem with OpenBSD's default chroot for Apache. Needless to say I was disappointed to see the same error when I ran php -m from the command line:

$ php -m 2>&1 | grep imagick
PHP Warning:  PHP Startup: Unable to load dynamic library
'/var/www/lib/php/modules/imagick.so' - Cannot load specified object in
Unknown on line 0

That was unexpected, but not altogether surprising. Let's take a quick look with ldd to see what the extension thinks is missing:

$ ldd /var/www/lib/php/modules/imagick.so
/var/www/lib/php/modules/imagick.so:
Cannot load specified object

That's disturbing... and not much help at all. By contrast, the curl.so extension gives back a full list of shared objects:

$ ldd /var/www/lib/php/modules/curl.so                                                                              
/var/www/lib/php/modules/curl.so:
        Start            End              Type Open Ref GrpRef Name
        00000002065a3000 00000002069b2000 dlib 1    0   0      /var/www/lib/php/modules/curl.so
        000000020ac30000 000000020b07e000 rlib 0    1   0      /usr/local/lib/libcurl.so.15.0
        000000020f3d4000 000000020f807000 rlib 0    2   0      /usr/local/lib/libidn.so.16.30
        0000000204b21000 0000000204f71000 rlib 0    2   0      /usr/lib/libssl.so.15.1
        0000000204150000 00000002046de000 rlib 0    2   0      /usr/lib/libcrypto.so.18.0
        00000002087f0000 0000000208c05000 rlib 0    2   0      /usr/lib/libz.so.4.1
        0000000209100000 000000020950a000 rlib 0    2   0      /usr/local/lib/libintl.so.5.0
        0000000208c05000 0000000209100000 rlib 0    2   0      /usr/local/lib/libiconv.so.6.0

At this point I determine there has to be something wrong with imagick.so; the wild goose chase begins. I reinstall the pecl-imagick package from a variety of sources, thinking the original mirror might have a corrupted package. Next I rebuild the package manually from OpenBSD's ports tree. No change.

Finally, one of the OpenBSD developers suggested the LD_DEBUG environment variable. This tells the run-time link-editor (ld.so) to increase verbosity. The advantage this has over ldd is that it will catch any attempt to load shared objects after startup. In the case of PHP, it will look at any shared objects when php tries to load dynamic extensions with dlopen().

$ sudo LD_DEBUG=1 php -m 2>&1 | more
...
dlopen: loading: /var/www/lib/php/modules/imagick.so
head /var/www/lib/php/modules/imagick.so
obj /var/www/lib/php/modules/imagick.so has /var/www/lib/php/modules/imagick.so as head
linking /var/www/lib/php/modules/imagick.so as dlopen()ed
head [/var/www/lib/php/modules/imagick.so]
examining: '/var/www/lib/php/modules/imagick.so'
loading: libjbig.so.2.0 required by /var/www/lib/php/modules/imagick.so
obj /usr/local/lib/libjbig.so.2.0 has /var/www/lib/php/modules/imagick.so as head
loading: libm.so.5.2 required by /var/www/lib/php/modules/imagick.so
loading: libX11.so.13.0 required by /var/www/lib/php/modules/imagick.so
-->> dlopen: failed to open libX11.so.13.0 <<--
unload_shlib called on /var/www/lib/php/modules/imagick.so
unload_shlib unloading on /var/www/lib/php/modules/imagick.so
dlopen: /var/www/lib/php/modules/imagick.so: done (failed).
PHP Warning:  PHP Startup: Unable to load dynamic library '/var/www/lib/php/modules/imagick.so' - 
   Cannot load specified object in Unknown on line 0

And there's our missing library (libX11.so.13.0). In this case, the dependency was installed on the system, but its path (/usr/X11R6/lib) wasn't in the shared library cache. I had remembered to install all of the X11 libraries needed for the ImageMagick and GD libraries, but had forgotten to update the cache with ldconfig. A few seconds and a couple of commands later, we were back in business.

$ sudo ldconfig -m /usr/X11R6/lib
$ php -m | grep imagick
imagick

Google Earth KML Tour Development Challenges on Liquid Galaxy

Because Liquid Galaxy runs Google Earth, it can easily visualize an organization's GIS data. End Point also develops tours within Google Earth to better present this data. A Liquid Galaxy's networked multi-system architecture presents unique technical challenges to these tours.

Many Google Earth tours incorporate animations and dynamic updates using the <gx:AnimatedUpdate> element. KML features in the Earth environment can be modified, changed, or created during a tour, including the size, style, and location of placemarks, the addition of ground overlays, geometry, and more.

However, these updates are only executed on the Liquid Galaxy master system running the tour, not sent to its slaves. Liquid Galaxy nodes communicate primarily via ViewSync UDP datagrams. These datagrams contain only the master's position in space and time. This means we cannot use <gx:AnimatedUpdate> to animate features across all Liquid Galaxy systems, sharply limiting its utility.

But tours can also use chronological elements to display, animate, and hide features. Using <gx:TimeSpan> or <gx:TimeStamp> within a tour stop enables flying to a specific location in space and time. All Google Earth features may be assigned a TimeStamp, and all Liquid Galaxy screens display the same chronological range of features. This works around the single-system limitation on animation and enables sweeping changes across all screens. The picture below shows a data animation in progress across three screens; the green dots are filling in the area previously occupied by the red dots, while the time sliders in the upper left corners advance.
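
A tour stop using this technique might look roughly like this (a simplified sketch with made-up coordinates, not one of our production tours), flying to a location and setting the Earth client's time in one step:

<gx:FlyTo>
        <gx:duration>4.0</gx:duration>
        <gx:flyToMode>smooth</gx:flyToMode>
        <LookAt>
                <gx:TimeStamp><when>2010-09-19T16:00:00Z</when></gx:TimeStamp>
                <longitude>-86.2</longitude>
                <latitude>41.7</latitude>
                <altitude>0</altitude>
                <heading>0</heading>
                <tilt>45</tilt>
                <range>10000</range>
        </LookAt>
</gx:FlyTo>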

Taking this technique a step further, we also "wrapped" some very large KML datasets in NetworkLink elements, and wrapped those NetworkLinks in Folder elements with TimeStamp elements as well. This allowed fine control over when to fetch and render the large set, just-in-time to be featured in its own tour. These large loads can cause the Earth client to stutter, so we also stopped the flight motion and displayed a balloon full of text for the audience. A KML example of this wrapper is below.

<Folder>
       <name>Region Link Two</name>
       <TimeStamp><when>2010-09-19T16:00:00Z</when></TimeStamp>
       <NetworkLink>
               <name>Region Link Two</name>
               <Region>
                       <LatLonAltBox>
                               <north>41.8</north>
                               <south>41.6</south>
                               <east>-86.1</east>
                               <west>-86.3</west>
                       </LatLonAltBox>
                       <Lod>
                               <minLodPixels>256</minLodPixels>
                               <maxLodPixels>-1</maxLodPixels>
                       </Lod>
               </Region>
               <Link>
                       <href>network_view.kml</href>
                       <viewRefreshMode>onRegion</viewRefreshMode>
               </Link>
       </NetworkLink>
</Folder>

These techniques, and the Liquid Galaxy platform, combine to coordinate a moving visual experience across the audience's entire field of view. Animations highlight GIS data and narrate its story, creating an impressive presentation.

Ecommerce Facebook Integration Tips

Over the past couple of months, I've done quite a bit of Facebook integration work for several clients. All of the clients have ecommerce sites and they share the common goal to improve site traffic and conversion with successful Facebook campaigns. I've put together a brief summary of my recent experience.

First, here are several examples of Facebook integration methods I used:

  • Link to Facebook Page: This is the easiest integration option. It just requires that a link to a Facebook fan page be added to the site. A link to Lots To Live For's Facebook page was added to the header.
  • Facebook Like Button: Implementation of the Like button is more advanced than a simple link; technical details on implementation are discussed below. This was integrated on multiple product page templates. Like buttons were added throughout Paper Source's site, and "liking" a product page shows up on the user's Facebook wall.
  • Facebook Like Box: Another Facebook integration option is adding a "Like Box", which I added to Paper Source's footer. Acting on the Like Box is equivalent to liking the actual Facebook fan page: when users click it, they are liking Paper Source's Facebook page and will begin to receive updates from that page on their wall.
  • Sharer: The final example of Facebook integration I implemented was "sharer" functionality, added for one of our Spree clients. This is accomplished by adding a link to "http://www.facebook.com/sharer.php?u=" followed by the URI-encoded page URL (a sketch of such a link follows this list). The user is directed to Facebook, where they can add a comment and select a thumbnail; the share is posted to the user's wall and appears in their friends' news feeds.
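
In a Rails (ERB) view, such a sharer link might look like this; the CGI.escape call is my own choice here so the full page URL survives as a single query parameter, and this is a sketch rather than any client's exact markup:

<a href="http://www.facebook.com/sharer.php?u=<%= CGI.escape(request.url) %>">
  Share on Facebook
</a>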

In addition to the like button and like box, there are several other types of Facebook plugins including an Activity Feed, Recommendations, Login Button, Facepile, Live Stream, and Comments. Facebook's documentation on Plugins can be found here.

Plugin Integration: iframe versus xfbml

I've used iframe integration for the majority of clients and most recently used xfbml for Paper Source. Implementation with an iframe might be easier and require fewer templates to be modified. Implementation with xfbml provides more flexibility, claims to have more advanced reporting, and allows you to add event listeners to various actions. Here's a code comparison of equivalent functionality for the two options:

iframe:
<iframe
src="http://www.facebook.com/plugins/like.php
?href=<%= URI.encode(request.url) %>
&layout=button_count"
scrolling="no" frameborder="0"
style="border:none;overflow:hidden;
width:90px;height:21px;"
allowTransparency="true">
</iframe>
xfbml:

<p><fb:like layout="button_count"></fb:like></p>
<div id="fb-root"></div>
<script>
var fb_rendered = false;
window.fbAsyncInit = function() {
    FB.init({appId: '*appid*',
      status: true,
      cookie: true,
      xfbml: true});
};
(function() {
    var e = document.createElement('script');
    e.type = 'text/javascript';
    e.src = document.location.protocol +
      '//connect.facebook.net/en_US/all.js';
    e.async = true;
    document.getElementById('fb-root').appendChild(e);
}());
</script>

Gotchas

Throughout development, I've learned a few gotchas:

  • Styling on the Facebook elements is not always easily adjustable. In some cases, the iframe or xfbml element had undesirable styling and the styling of the wrapper div may need to be adjusted to achieve desired results.
  • Facebook crawls or scrapes the "Liked" pages to retrieve page information such as a title, description, and thumbnail. Facebook follows rel=canonical URLs to retrieve this information, so it's important that these canonical URLs are correct. A couple of Paper Source's product page templates added an extra forward slash to the URL, and these were followed by Facebook's crawler, resulting in an infinite redirect loop of URLs with additional forward slashes. The result is that Facebook cannot scrape the URL and the like button flickers when clicked.
  • Facebook's crawler caches pages. If there is an error such as the canonical issue listed above, it's recommended to wait 24 hours for the page to be re-crawled before testing again.

GNU Screen + SSH_AUTH_SOCK; my new approach

Over the years, I've played around with several different methods of keeping my forwarded SSH agent authentication socket up to date in long-running screen sessions (referenced via the $SSH_AUTH_SOCK shell variable). The basic issue is that screen sees the process environment at the time it was initially launched, not the one that exists when reattaching in a subsequent login session. This means that the $SSH_AUTH_SOCK variable as screen sees it refers to a socket which no longer exists, since it was removed when the login session that originally started screen ended.

Some of my previous methods have included: a hard-coded name for the socket itself (downsides: if it's a predictable name you're potentially opening some security issues, plus if you open multiple sessions to the same account, you kill the latest socket); symlinking the latest $SSH_AUTH_SOCK to a hard-coded path on login (similar issues); and dumping $SSH_AUTH_SOCK to a file, then aliasing ssh and scp to first source said file to populate the local window's environment (doesn't work in scripts, requires too much manual setup when adapting to a new system/environment, won't work with any other subsystem not already explicitly handled, etc.).

Recently though, I've come up with a simple approach using screen's -X option to execute a screen command outside of screen and just added the following to my .bashrc:

screen -X setenv SSH_AUTH_SOCK "$SSH_AUTH_SOCK"

While not perfect, in my opinion this is a bit of an improvement for the following reasons:

  • It's dirt-simple. No complicated scripts to adjust/maintain, just a command that's almost completely self-explanatory.
  • It doesn't kill the environment for existing screen windows, just adjusts the $SSH_AUTH_SOCK variable for new screen windows. This matches my workflow almost every time: unless a connection dies, I leave existing screen windows open indefinitely.
  • If you have multiple sessions open to the same account (even if not running both in screen), you're not stomping on your existing socket.
  • Did I mention it's dirt-simple?
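
A slightly guarded variant (a sketch, not something the original one-liner requires) only updates the session's environment when there is actually an agent socket to propagate and you aren't already inside screen:

# Only update screen's environment from "outer" login shells that have a
# forwarded agent socket; $STY is set when running inside screen itself.
if [ -n "$SSH_AUTH_SOCK" ] && [ -z "$STY" ]; then
    screen -X setenv SSH_AUTH_SOCK "$SSH_AUTH_SOCK" >/dev/null 2>&1
fi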

There are presumably a number of other environment variables that would be useful to propagate in this way. Any suggestions or alternate takes on this issue?

A Simple WordPress Theme In Action

I’m a big fan of WordPress. And I’m a big fan of building WordPress themes from the ground up. Why am I a big fan of building them from the ground up? Because...

  • It's very easy to set up and build if you plan to eventually use WordPress's blog architecture for your site, but just have a set of static pages initially.
  • It allows you to incrementally add elements to your theme starting from the ground up, rather than cutting elements out of a complex theme.
  • It allows you to leave out features that you don't need (search, comments, archives, listing of articles), but still take advantage of the WordPress plugin community and core functionality.
  • The learning curve of WordPress APIs and terminology can be steep. It's nice to start simple and build up.

Here are some screenshots from my simple WordPress theme in action, a site that contains several static pages.

The theme comprises several files:

  • header.php: Includes the doctype, header HTML, and global stylesheets. wp_head() is called in the header, which runs anything tied to the header hook via WordPress's hook API. wp_list_pages(), WordPress's core method for listing pages, is also called here.
  • footer.php: Includes footer navigation elements, global JavaScript, and Google Analytics. wp_footer() is also called here, which runs anything tied to the footer hook via WordPress's hook API.
  • index.php: Calls get_header() and get_footer(), WordPress's core methods for rendering the header and footer. For now it also contains static content for the homepage (text and images).
  • page.php: Calls get_header() and get_footer(), and uses The Loop, WordPress's core mechanism for displaying individual posts or pages, to render each static page's content (a minimal sketch follows this list).
  • 404.php: Calls get_header() and get_footer(). Similar to the index page, it contains a bit of static text and an image, and is displayed for any page not found.
  • CSS, images, JS: Static CSS, images, and JavaScript files used throughout the theme.
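
A minimal page.php along these lines would cover the static pages described above; this is a generic sketch, not the theme's actual file:

<?php
// page.php: render a single static page using the shared header and footer.
get_header();

// The Loop: output the page's title and content.
if ( have_posts() ) {
    while ( have_posts() ) {
        the_post();
        the_title( '<h1>', '</h1>' );
        the_content();
    }
}

get_footer();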

Files more traditionally seen in WordPress themes that are excluded from this one are sidebar.php, archive.php, archives.php, single.php, search.php, and searchform.php. I plan to add some of these later as the website grows to include blog content, but these templates are unnecessary for now.

Below are a couple snapshots of the shared elements between pages.

The header (red) and footer (blue) are shared between the page.php and index.php templates shown here.

You can see the site in the wild here.

Update: Since this article was published, the website shown here has been updated to include a "blog" page, which is one more page that uses the exec-php plugin to list blog articles.

Managing Perl environments with perlbrew

As a Perl hobbyist, I've gotten used to the methodical evolution of Perl 5 over the years. Perl has always been a reliable language, not without its faults, but with a high level of flexibility in syntactical expression and even deployment options. Even neophytes quickly learn how to install their own Perl distribution and CPAN libraries in $HOME. But the process can become unwieldy, particularly if you want to test across a variety of Perl versions.

By contrast, Ruby core development frequently experiences ABI breakages, even between minor releases. In spite of the wide adoption of Ruby as a web development language (thanks to Ruby on Rails), Ruby developers are able to plod along unconcerned, whereas these incompatibilities would almost certainly lead to major bickering within the Perl or PHP communities. How do they do it? The Ruby Version Manager.

Ruby Version Manager (RVM) allows users to install Ruby and RubyGems within their own self-contained environment. This allows each user to install all (or only) the software that their particular application requires. Particularly for Ruby developers, this provides them with the flexibility to quickly test upgrades for regressions, ABI changes and enhancements without impacting system-wide stability. Thankfully a lot of the ideas in RVM have made their way over to the Perl landscape, in the form of perlbrew.

Perlbrew offers many of the same features found in RVM for Ruby. It's easy to install. It isolates different Perl versions and CPAN installations in your $HOME and helps you switch between them. It automates your environment setup and teardown. And most importantly, using perlbrew means not having to clutter your default system Perl with application-specific CPAN dependencies.

Getting started with perlbrew couldn't be easier. A quick one-liner is all it takes to install perlbrew in your home directory.

$ curl -L http://xrl.us/perlbrewinstall | bash

If you need to install perlbrew somewhere other than your home directory, just download the installer and pass it the PERLBREW_ROOT environment variable.

$ curl -LO http://xrl.us/perlbrew
$ chmod +x perlbrew
$ PERLBREW_ROOT=/mnt/perlbrew ./perlbrew install

Follow the instructions on screen and you'll be ready to use perlbrew in no time. The perlbrew binary will be installed in ~/perl5/perlbrew/bin, so make sure to adjust your login $PATH accordingly.

Once you're done installing perlbrew there are a couple commands you'll want to run before installing your own Perl versions or CPAN modules. The perlbrew init command is mandatory; it initializes your perlbrew directory. It can also be used later if you need to modify your PERLBREW_ROOT setting. The perlbrew mirror command is optional (but recommended) and helps you select a preferred CPAN mirror.

$ perlbrew init
$ perlbrew mirror

Next comes the fun part. Start off by verifying the Perl version(s) that perlbrew sees.

$ perlbrew list
* /usr/bin/perl (5.10.1)

Install a newer version of Perl.

$ perlbrew install 5.12.3

Now switch to the newer Perl.

$ perlbrew list
* /usr/bin/perl (5.10.1)
perl-5.12.3

$ perlbrew switch perl-5.12.3

$ perlbrew list
/usr/bin/perl (5.10.1)
* perl-5.12.3

$ perl -v

This is perl 5, version 12, subversion 3 (v5.12.3) built for x86_64-linux

Copyright 1987-2010, Larry Wall

Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5 source kit.

Complete documentation for Perl, including FAQ lists, should be found on
this system using "man perl" or "perldoc perl".  If you have access to the
Internet, point your browser at http://www.perl.org/, the Perl Home Page.

Alternatively, if you only want to test a different Perl version, try the perlbrew use command (note: this only works in bash and zsh). Unlike the switch command, use is only active for the current shell.

$ perlbrew use system

$ perlbrew list
* /usr/bin/perl (5.10.1)
perl-5.12.3

A quick peek behind the curtain reveals much of the simplicity behind perlbrew.

$ ls -l ~/perl5/perlbrew/
total 2680
-rw-r--r--  1 testy  users      408 Feb 10 23:58 Conf.pm
drwxr-xr-x  2 testy  users      512 Feb 10 23:46 bin
drwxr-xr-x  4 testy  users      512 Feb 11 09:59 build
-rw-r--r--  1 testy  users  1333196 Feb 11 10:33 build.log
drwxr-xr-x  2 testy  users      512 Feb 11 09:59 dists
drwxr-xr-x  2 testy  users      512 Feb 10 23:47 etc
drwxr-xr-x  4 testy  users      512 Feb 11 10:32 perls

$ ls -l ~/perl5/perlbrew/perls/
total 8
drwxr-xr-x  5 testy  users  512 Feb 11 00:38 perl-5.12.3
drwxr-xr-x  5 testy  users  512 Feb 11 10:32 perl-5.13.6

If you're a Perl developer, the perlbrew project may help alleviate a lot of the pain associated with team development or multi-tenant programming environments. Suddenly it becomes much easier to manage your own software requirements, resulting in faster development and testing cycles for you, and fewer headaches for your System Administrators.

Pausing Hot Standby Replay in PostgreSQL 9.0

When using a PostgreSQL Hot Standby master/replica pair, it can be useful to temporarily pause WAL replay on the replica. While future versions of Postgres will include the ability to pause recovery using administrative SQL functions, the current released version does not have this support. This article describes two options for pausing recovery for the rest of us that need this feature in the present. These two approaches are both based around the same basic idea: utilizing a "pause file", whose presence causes recovery to pause until the file has been removed.

Option 1: patched pg_standby

pg_standby is a fairly standard tool that is often used as a restore_command for WAL replay. I wrote a patch for it (available at my github repo) to support the "pause file" notion. The patch adds a -p path/to/pausefile optional argument, which if present will check for the pausefile and wait until it is removed before proceeding with recovery.

The benefit of patching pg_standby is that we're building on mature production-level code, adding functionality at its most relevant place. In particular, we know that signal handling is already sensibly handled (this was something I was less certain about when it comes to the wrapper shell script described later). The downside here is that you need to compile your own version of pg_standby in order to take advantage of it. However, it may be considered useful enough of a patch to accept into the 9.0 tree, so future releases could support it out of the box.

After patching, compiling, and installing the modified version of pg_standby the only change to an existing restore_command already using pg_standby would be the addition of the -p /path/to/pausefile argument; e.g.:

restore_command = 'pg_standby -p /tmp/pausefile /path/to/archive %f %p'

After restarting the standby, simply touching the /tmp/pausefile file will pause recovery until the file is subsequently removed.

Option 2: a shell script

The pause-while script is a simple wrapper script I wrote which can be used to gate the invocation of any command by checking if the "pause file" (a file path passed as the first argument) exists. If the pause file exists, we loop in a sleep cycle until it is removed. Once the pause file does not exist (or if it did not exist in the first place), we execute the rest of the provided command string.

Sample invocation:

[user@host<1>] $ touch /tmp/pausefile; pause-while /tmp/pausefile echo hi
... # pauses, notifying of status

[user@host<2>] $ rm /tmp/pausefile
... # shell 1 will now output "hi"

Here's the script:

pause-while:

#!/bin/bash

# we're trapping this signal
trap 'exit 1' INT;

PAUSE_FILE=$1;
shift;

while [ -f "$PAUSE_FILE" ]; do
 echo "'$PAUSE_FILE' present; pausing. remove to continue" >&2
 sleep 1;
 PAUSED=1
done

[ "$PAUSED" ] && echo "'$PAUSE_FILE' removed; " >&2

# untrap so we don't block the invoked command's expected signal handling
trap INT;

# now we know the pause file doesn't exist, proceed to execute our
# command as normal

exec "$@";

We need to trap SIGINT to prevent the wrapped command from executing if the sleep cycle is interrupted.

Putting this to use in our Hot Standby case, we will want to use pause-while as a wrapper for the existing restore_command, thus adjusting recovery.conf to something like this:

restore_command = 'pause-while /tmp/standby.pause pg_standby ... <args>'

With this configuration, when you want to pause WAL replay on the replica simply touch the /tmp/standby.pause pause file and the next invocation of restore_command will wait until that file is removed before proceeding.

The wrapper script approach has the benefit of working with any defined restore_command and is not limited to just working with pg_standby.

Limitations

  • Since this is based on WAL archive restoration, this has a very coarse granularity; recovery can only pause between WAL files, which are 16MB. It is likely that future SQL support functions will support this at arbitrary transaction boundaries and will not have this specific limitation.
  • Neither of these options will work with Streaming Replication. Streaming Replication uses a non-zero exit status of the restore_command as the "End of Archive" marker to flip from archive restoration/catchup mode to WAL streaming mode. pg_standby's default behavior (even before this patch) is to wait for the next archive file to appear before returning a zero exit status, returning a non-zero exit status only on error, signal, or because its failover trigger file now exists. This means that if you use pg_standby as the restore_command with Streaming Replication enabled, you will never actually flip over into WAL streaming mode, and will stay pointlessly in archive restoration mode. (Technically speaking, you could touch the failover trigger file; that would get you out of archive mode and into WAL streaming mode, but would not result in actually failing over.) It is likely that future SQL support functions for pausing recovery will not have this same dependency/limitation, and will be able to pause recovery when utilizing Streaming Replication.
  • While reviewed/manually tested, these programs have not been production-tested. I've done basic testing on both the shell script and pg_standby patch, however this has not been battle-tested, and likely has some corner cases that haven't been considered (I'm particularly concerned about the shell script's signal handling interactions.)
  • pg_standby has been deprecated and removed in future releases of PostgreSQL. I believe it would still be possible to compile/use pg_standby for future releases based on the version in the 9.0 source tree, but I believe it was removed because of the issues in conjunction with Streaming Replication. Presumably it (and this approach) would still be relevant if people wanted to utilize a traditional log-shipping standby with Hot Standby.

Comments/improvements welcome/appreciated!

A Performance Case Study

I'm a sucker for a good performance case study. So, when I came across a general request for performance improvement suggestions at Inspiredology, I couldn't help but experiment a bit.

The site runs on WordPress and is heavy on graphics, as it's geared towards web designers. I asked the site administrators about grabbing a static copy of their home page and using it for a case study on our blog. My tools of choice for optimization were webpagetest.org and YSlow.

Here are the results of a 4-step optimization in visual form:

Inspiredology's complete homepage.

The graph on the left shows the page load time in seconds for a first time view. Throughout optimization, page load time goes from 13.412 seconds to 9.212 seconds. Each step had a measurable impact. The graph on the right shows the page load time in seconds for a repeated view, and this goes from 7.329 seconds to 2.563 seconds throughout optimization. The first optimization step (CSS spriting and file combination) yielded a large performance improvement. I'm not sure why there's a slight performance decrease between step 3 and step 4.

And here's a summary of the changes involved in each step:

  • Step 1
    • Addition of CSS Sprites: I wrote about CSS Sprites a while back and A List Apart has an older but still relevant article on CSS Sprites here. Repeating elements like navigation components, icons, and buttons are suitable for CSS sprites. Article or page-specific images are not typically suitable for CSS sprites. For Inspiredology's site, I created two sprited images - one with a large amount of navigation components, and one with some of their large background images. You can find a great tool for building CSS rules from a sprited image here.
    • Combination of JS and CSS files, where applicable. Any JavaScript or CSS files that are included throughout the site are suitable for combination. Files that can't be combined include externally hosted JavaScript such as Google Analytics or marketing service scripts.
    • Moved JavaScript requests to the bottom of the HTML. This is recommended because JavaScript requests block parallel downloading. Moving them to the bottom allows page elements to be downloaded and rendered first, followed by JavaScript loading.
  • Step 2
  • Step 3
    • Addition of expires headers and disabling ETags: These are standard optimization suggestions. Jon wrote about using these a bit here and here.
  • Step 4
    • Serving gzipped content with mod_deflate: Also a fairly standard optimization suggestion, though I should note I had some issues gzipping a couple of the files, and since the site was in a temporary location, I didn't spend much time troubleshooting. A sketch of the relevant Apache directives for this step, along with the expires/ETag settings from step 3, follows this list.
    • A bit more cleanup of rogue HTML and CSS files. In particular, there was one HTML file requested that didn't have any content in it, and another had JavaScript that I appended to the combined JavaScript file (combined.js).
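
Roughly, the Apache configuration for steps 3 and 4 looks like this; this is a sketch assuming mod_deflate, mod_expires, and mod_headers are enabled, not Inspiredology's actual configuration:

<IfModule mod_deflate.c>
    # Compress text-based responses before sending them.
    AddOutputFilterByType DEFLATE text/html text/css application/javascript
</IfModule>

<IfModule mod_expires.c>
    # Far-future expires headers for static assets.
    ExpiresActive On
    ExpiresByType image/png "access plus 1 month"
    ExpiresByType image/jpeg "access plus 1 month"
    ExpiresByType text/css "access plus 1 week"
    ExpiresByType application/javascript "access plus 1 week"
</IfModule>

# Disable ETags so caching relies on the Expires/Cache-Control headers alone.
FileETag None
Header unset ETag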

A side-by-side comparison of webpagetest.org's original versus step 4 results highlights the reduction of requests in the waterfall and the large reduction in requests on the repeat view:

What Next?

At this point, webpagetest.org suggests the following changes:

  • Gzipping the remaining components has a potential of reducing total bytes of the first request by ~10%.
  • Additional image compression has the potential of reducing total bytes of the first request by about ~6%. This metric is based on their image compression check: "JPEG - Within 10% of a photoshop quality 50 will pass, up to 50% larger will warn and anything larger than that will fail." Quite a few of Inspiredology's jpgs did not pass this test and could be optimized further.
  • Use a CDN. This is a common optimization suggestion, but the cost of a CDN isn't always justified for smaller sites.

I would suggest:

  • Revisiting CSS spriting to further optimize. I only spent a short time spriting and didn't work out all the kinks. There were a few requests that I didn't sprite because they were repeating elements, but repeating elements can be sprited together. Another 5 requests might be eliminated with additional CSS spriting.
  • Server-optimization: Inspiredology runs on WordPress. We've used the wp-cache plugin for a couple of our clients running WordPress, which I believe helps. But note that the case study presented here is a static page with static assets, so there is obviously a huge gain to be had by optimizing serving images, CSS, and JavaScript.
  • Database optimization: Again, there's no database in play in this static page experiment, but there's always room for improvement in database optimization. Josh recently made performance improvements for one of our clients running Rails with PostgreSQL using pgsi, our open source PostgreSQL performance reporting tool, and had outrageously impressive benchmarked improvements.
  • I just read an article about CSS selectors. The combined.css file I created for this case study has 2000 lines. Although there might be only a small win with optimization here, surely optimization and cleanup of that file can be beneficial.
  • I recently wrote about several jQuery tips, including performance optimization techniques. This isn't going to improve the serving of static assets, but it would be another customer-facing enhancement that can improve the usability of the site.

I highly recommend reading Yahoo's Best Practices on Speeding Up Your Web Site. They have a great summary of performance recommendations, covering the topics described in this article and lots more.

SAS 70 becomes SSAE 16

In recent years it’s become increasingly common for hosting providers to advertise their compliance with the SAS 70 Type II audit. Interest in that audit often comes from hosting customers’ need to meet Sarbanes-Oxley (aka Sarbox) or other legal requirements in their own businesses. But what is SAS 70?

It was not clear to me at first glance that SAS 70 is actually a financial accounting audit, not one that deals primarily with privacy, information technology security, or other areas.

SAS 70 was created by the American Institute of Certified Public Accountants (AICPA) and contains guidelines for assessing organizations’ service delivery processes and controls. The audit is performed by an independent Certified Public Accountant.

Practically speaking, what does passing a SAS 70 audit tell us about an organization? Most importantly that it is financially reliable, and thus hopefully a safe partner for providing critical Internet hosting and data storage services.

On June 15, 2011, the SAS 70 audit will be effectively replaced by the new SSAE 16 attestation standard (Statement on Standards for Attestation Engagements no. 16, Reporting on Controls at a Service Organization). Thus the focus appears to shift from an external auditor investigating an organization, to the organization making claims about itself under the guidance of an auditor.

SSAE 16 was created by the AICPA to make the United States service organization reporting standard compatible with the new international service organization reporting standard, ISAE 3402, which is freely available in PDF format. The SSAE 16 document is available only for a fee.

The AICPA’s FAQ on the SAS 70 to SSAE 16 transition makes an interesting point:

Q. — Will entities now become “SSAE 16 certified”?

A. — No! A popular misconception about SAS 70 is that a service organization becomes “certified” as SAS 70 compliant after undergoing a type 1 or type 2 service auditor’s engagement. There is no such thing as being SAS 70 certified and there will be no such thing as being SSAE 16 certified. An SSAE 16 report (as with a SAS 70 report) is primarily an auditor to auditor communication, the purpose of which is to provide user auditors with information about controls at a service organization that are relevant to the user entities’ financial statements.

This is interesting because many in the industry informally state that they are “SAS 70 Type II certified”. But practically speaking for those of us involved in Internet hosting, is “certification” very different from “passing an audit”? It serves primarily as a requirement checklist item about hosting providers in either case.

Many major hosting providers have completed a SAS 70 Type II audit, including Rackspace (and Rackspace Cloud), Amazon Web Services, SoftLayer (and The Planet, which SoftLayer recently acquired), Verio, Terremark, and ServePath, to mention a few that we have worked with. Presumably these will make an SSAE 16 attestation later this year.

Note that many VPS and cloud hosting providers do not report having been SAS 70 audited. If this is a requirement for your hosting, it's important to look for it early before settling on a provider.

More details about the SAS 70 to SSAE 16 transition are available on the AICPA Service Organization Controls Reporting website.

Web Friendly Tools

Over the past few weeks, I found a few nice tools that I wanted to share:

Spritebox

The first tool I came across and wanted to share is Spritebox. Spritebox is a WYSIWYG tool for creating CSS sprite rules from an image on the web or an uploaded image. Once a sprite image is loaded, regions can be selected and assigned classes or ids, display settings, and background repeat settings. The preview region shows you which part of the sprited image will display in your DOM element. After all sprite regions are defined, CSS is automagically generated, ready to copy and paste into a stylesheet. This is a user-friendly visual tool that's likely to replace my tool of choice (Firebug) for generating CSS sprite rules.

I select the twitter region and assign several CSS properties. I select the header background region and assign several CSS properties.
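
The generated rules for regions like those end up looking roughly like this (a hand-written sketch with hypothetical selectors and offsets, not Spritebox's exact output):

/* One combined image; each rule shows a different region via background-position. */
.icon-twitter {
    display: inline-block;
    width: 32px;
    height: 32px;
    background: url(sprite.png) no-repeat 0 0;
}
#header {
    height: 80px;
    background: url(sprite.png) no-repeat 0 -40px;
}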

Typekit

Another tool / service I've come across on the design side of web development is Typekit. Typekit is a font hosting service that lets you retrieve web fonts and render text with those fonts instead of using Flash or images. I recently noticed severe lag in font rendering for one of our Spree clients, so I was curious about font hosting services, specifically regarding their accessibility, usage, and payment options. Typekit offers four different plans. The lowest plan is free, allows 2 fonts to be used on one site, and has a limited font selection. The highest price-point plan gives you full font library access and lets you use an unlimited number of fonts on an unlimited number of sites, in addition to a few other features.


A "kit" I created for use on a personal site.

I signed up for a free Typekit account and created a "kit" with 2 fonts to be used on my personal site. After publishing the kit, I implemented it by including some JavaScript (shown below) and adding my Typekit classes (tk-fertigo-script and tk-ff-enzo-web) to the regions where the kit fonts should apply.

<script type="text/javascript" src="http://use.typekit.com/kitid.js"></script>
<script type="text/javascript">try{Typekit.load();}catch(e){}</script>

I was impressed by Typekit's font rendering speed. There are several other font hosting services out there that offer similar paid plans. Regardless of which service is chosen, a hosted font service is an affordable way to use "pretty" fonts, get fast rendering speeds, and keep the site's text SEO-friendly.


An example of Typekit in action.

Awesome Screenshot

The final tool I've been using tons is Awesome Screenshot, a Chrome extension (also available for Safari). It allows you to grab a screenshot of the visible page, a selected region, or the entire page, and annotate it with rectangles, circles, arrows, lines, and text. You can download the image or upload it to get a shareable link. All the screenshots in this article were created with Awesome Screenshot. This free tool has replaced my screenshot and editing (via Gimp) workflow. I recommend trying this one out!


Awesome Screenshot in action.

Using nginx to transparently modify/debug third-party content

In tracking down a recent front-end bug for one of our client sites, I found myself needing to use the browser's JavaScript debugger for stepping through some JavaScript code that lived in a mix of domains; this included a third-party framework as well as locally-hosted code which interfaced with -- and potentially interfered with -- said third-party code. (We'll call said code foo.min.js for the purposes of this article.) The third-party code was a feature that was integrated into the client site using a custom domain name and was hosted and controlled by the third-party service with no ability for us to change directly. The custom domain name was part of a chain of CNAMEs which eventually pointed to the underlying *actual* IP of the third-party service, so their infrastructure obviously relied on getting the Host header correctly in the request to select which among many clients was being served.

It appeared as if there was a conflict between code on our site and that imported by the third party service. As part of the debugging process, I was stepping through the JavaScript in order to determine what if any conflicts there were, as well as their nature (e.g., conflicting library definitions, etc.). Stepping through our code was fine, however the third-party's JS code was (a) unfamiliar, and (b) minified, so this had the effect of putting all of the JavaScript code more-or-less on one line, which made tracing through the code in the debugger much less useful than I had hoped.

My first instinct was to use a JavaScript beautifier to reverse the minification process, but since I had no control over the code being included from the third-party service, this did not seem to be directly feasible. The third-party code was deployed only on our production site and relied on hard-coded domains which would make integrating it into one of our development instances challenging since we had no control over the contents of the returned resources. Since the relevant feature (and subsequent bugs) was only on the production site, making extensive modifications to how things were done and potentially breaking that or other features for users while I was debugging was obviously out as an option.

Enter nginx. I've been doing a lot with nginx lately, particularly using it as a reverse proxy cache, so it's been on my mind. So I came up with this technique:

  1. Look up the IP address for the third-party's domain name (used for later purposes).
  2. Install nginx on localhost, listening to port 80.
  3. Modify /etc/hosts to point the third-party's domain name to the nginx server's IP (also localhost in this case).
  4. Configure a new virtual host with the following logical constraints:
    • We want to serve specific files (the beautified JavaScript) from our local server.
    • We want any other request going through that domain to be passed-through transparently, so neither the browser nor the third-party server treat it differently.

Given these constraints, this is the minimal configuration that I came up with (the interesting parts are located in the server block):

/etc/hosts:

127.0.0.1 example.domain.com

nginx.conf:

worker_processes 1;

events {
    worker_connections 10;
}

http {
    include       mime.types;
    default_type  application/octet-stream;
    
    server {
        server_name example.domain.com;
        root /path/to/local_root;

        try_files $uri @proxied;

        location @proxied {
            proxy_set_header Host $http_host;
            proxy_pass http://1.2.3.4;
        }
    }
}

Once I had the above configured, I downloaded the foo.min.js file from the third-party service, ran it through a JS beautifier, and saved it in the local nginx's cache root so it would be served up instead of the actual file from the third-party service. Any other requests for static resources (images, other scripts, etc.) would pass through to the third-party server. So I had my nicely-formatted JavaScript code to step through, the production site worked as normal for anyone else despite potential local changes to the file on my end (e.g., adding JavaScript alert() calls to the file), and no one was the wiser.

A few notes

The try_files directive instructs nginx to first look for a file named after the current URI (foo.min.js in our example) in our local cache, and if this is not found, then fallback to the proxied location block; i.e., relay the request to the original upstream server. We explicitly set the Host header on the proxy request because we want the request to behave normally with respect to name-based hosting, and provide the saved IP address to contact the server in question.

We only needed to preserve/lookup the upstream server's IP address because we're running the nginx server on localhost, so if we used a domain name the lookup would return the same IP defined in /etc/hosts; if the nginx server was running on a different machine, you would be able to just use the domain name as both the server_name and the proxy_pass parameters and set the entry for the host in your local /etc/hosts file to the IP of the nginx server.

A possible extension would be to detect when an upstream request matched a minified URL (via a location ~ \.min\..*\.js$ block) and automatically beautify/cache the content in our local cache. This could be accomplished via the use of an external FastCGI script to retrieve, post-process, and cache the content.

This technique can also be used when dealing with testing changes to a production site on which you are unable or unwilling to make potentially disruptive changes for the purposes of testing static resources. (JavaScript seems the most obvious application here, but this could apply to serving up images or other static content which would be resolvable by the local cache.)

I always need to remind myself to undo changes to /etc/hosts as soon as I'm done testing when using tricks like these. Particularly with something like this which is more-or-less transparent, the behavior would be functionally identical as long as code/scripts on the third-party site stayed the same, but could easily introduce subtle bugs if the third-party service made changes to their codebase. Since our local copies would mask any remote changes for those non-proxied resources, this could be very confusing if you forget that things are set up this way.

Monitoring with Purpose

If you work on Internet systems all day like we do, there's a good chance you use some sort of monitoring software. Almost every business knows they need monitoring. If you're a small company or organization, you probably started out with something free like Nagios. Or maybe you're a really small company and prefer to outsource your alerts to a web service like Pingdom. Either way, you understand that it's important to know when your websites and mailservers are down. But do you monitor with purpose?

All too often I encounter installations where the Systems Administrator has spent countless hours setting up their checks, making sure their thresholds and notifications work as designed, without really considering what their response might be in the face of disaster (or an inconvenient page at 3am). Operations folk have been trained to make sure their systems are pingable, their CPU temperature is running cool and the system load is at a reasonable level. But what do you do when that alert comes in because the website load is running at 10 for the last 15 minutes? Is that bad? How can you be certain?

The art of monitoring isn't simply reactive in nature. A good SysAdmin will understand that real monitoring takes an active presence. Talk to your DBAs, software engineers and architects. Learn how the various components of your system(s) interact and relate, both in good times and bad. Review your performance trends (graphs) to see how each metric evolves over time. Without understanding the functional scope of your systems, you can't expect to set meaningful thresholds on them.

Last but not least, every alert should be actionable. Getting paged because your application server is down is useless unless you have the proper remediation path documented and tested. Know what actions are needed, who should perform them, and what parties to escalate to in case the remediation fails. Focusing your energies on purposeful monitoring results in fewer false alarms, faster recovery from failures and regression, and an acute understanding of your entire application stack.

Visit at DistribuTECH

I had the chance to attend DistribuTECH in San Diego, CA this past week. DistribuTECH is billed as the utility industry's leading smart grid conference and exposition. End Point was present at the conference on behalf of Silver Spring Networks, which contracted with us to provide a Liquid Galaxy installation for their exhibit.

The Liquid Galaxy did its job from what I could tell. The exhibit was consistently surrounded by conference goers, some interested in listening to and watching the tours being presented, others wanting to see what the Liquid Galaxy was all about. This was the first time I had seen the Liquid Galaxy, and I was quite impressed with how well it worked. I saw many people moving their bodies in sync with what was being displayed on the screen, showing how immersed they felt while within the galaxy. One gentleman knelt down while attempting to look under a graph being presented on the screen. The same person returned to the exhibit several times, bringing colleagues back each time to "show off" what he had found.

I spent some time on the conference floor, checking out what was being displayed and seeing how others were getting the attention of the attendees. I could not find anything that compared to the Liquid Galaxy in both wow factor and usability. The fine folks at Silver Spring Networks also seemed impressed with the reaction they were receiving.

One product I found interesting while walking the floor was a large unit that freezes itself at night, when power is less expensive, the draw on the grid is lower, and it is cooler outside, and then uses the ice as a coolant pumped into the AC unit during the day, reducing the cost and energy needed to cool a building.

I also took a look at the Silver Spring Networks products on display. One that interested me was their home portal, which allows customers whose utility has installed Silver Spring smart meters to visit a web portal and view, among other things, their current usage and compare it to that of their neighbors. I can see how someone concerned about the environment could use this information to lessen their power usage.

Keep an eye out here for a few blog posts from Adam on his experience with tour development for the Liquid Galaxy.

In Our Own Words

What do our words say about us?

Recently, I came across Wordle, a Java-based Google App Engine application that generates word clouds from websites and raw text. I wrote a cute little rake task to grab text from our blog to plug into Wordle. The task grabs the blog contents, uses REXML for parsing, and then lowercases the results. It also applies a bit of aliasing, since we use postgres, PostgreSQL, and pg interchangeably in our blog.

# This lives in a rake task file; open-uri and REXML may need to be
# required explicitly depending on the environment.
require 'open-uri'
require 'rexml/document'

task :wordle => :environment do
  data = open('http://blog.endpoint.com/feeds/posts/default?alt=rss&max-results=999', 'User-Agent' => 'Ruby-Wget').read
  doc = REXML::Document.new(data)
  text = ''
  doc.root.each_element('//item') do |item|
    # Strip HTML tags from each post's description and title.
    text += item.elements['description'].text.gsub(/<\/?[^>]*>/, "") + ' '
    text += item.elements['title'].text.gsub(/<\/?[^>]*>/, "") + ' '
  end
  # Lowercase everything and alias the various Postgres spellings.
  text = text.downcase \
    .gsub(/\./, ' ') \
    .gsub(/^\n/, '') \
    .gsub(/ postgres /, ' postgresql ') \
    .gsub(/ pg /, ' postgresql ')
  file = File.new(ENV['filename'], 'w')
  file.puts text
  file.close
end
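
The output filename is passed in through the environment, so the task would be run with something along the lines of rake wordle filename=endpoint_blog.txt (the filename here is just an example); the resulting text file is what gets pasted into Wordle.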

So, you tell me: Do you think we write like engineers? How well does this word cloud represent our skillset?

JSON pretty-printer

The other day Sonny and I were troubleshooting some YUI JavaScript code and looking at some fairly complex JSON. It would obviously be a lot easier to read if each nested data structure were indented, and spacing standardized.

I threw together a little Perl program based on the JSON man page:

#!/usr/bin/env perl

use JSON;

my $json = JSON->new;

# Slurp the whole input (stdin or any file arguments) in one read.
undef $/;
while (<>) {
    print $json->pretty->encode($json->decode($_));
}

It took all of 2 or 3 minutes and I even left out strictures and warnings. Living on the edge!
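
Since it reads from <>, usage is just a matter of piping JSON in on stdin or naming a file as an argument, e.g. perl json-pretty.pl ugly.json (the script and file names here are only examples).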

It turns a mess like this (sample from json.org):

{"glossary":{"title":"example glossary","GlossDiv":{"title":"S","GlossList":
{"GlossEntry":{"ID":"SGML","SortAs":"SGML","GlossTerm":"Standard Generalized Markup Language",
"Acronym":"SGML","Abbrev":"ISO 8879:1986","GlossDef":{"para":
"A meta-markup language,used to create markup languages such as DocBook.",
"GlossSeeAlso":["GML","XML"]},"GlossSee":"markup"}}}}}

into this much more readable specimen:

{
   "glossary" : {
      "GlossDiv" : {
         "GlossList" : {
            "GlossEntry" : {
               "GlossDef" : {
                  "para" : "A meta-markup language,used to create markup languages such as DocBook.",
                  "GlossSeeAlso" : [
                     "GML",
                     "XML"
                  ]
               },
               "GlossTerm" : "Standard Generalized Markup Language",
               "ID" : "SGML",
               "SortAs" : "SGML",
               "Acronym" : "SGML",
               "Abbrev" : "ISO 8879:1986",
               "GlossSee" : "markup"
            }
         },
         "title" : "S"
      },
      "title" : "example glossary"
   }
}

But today I thought back to that and figured surely something like that must already be at hand if I'd just looked for it. Sure enough, there are many easy options that work conveniently from the shell, similarly to that script:

  • json_xs (Perl JSON::XS)
  • python -mjson.tool (Python 2.6+)
  • prettify_json.rb (Ruby json gem)

And those were just the ones that were likely already on the machine I was using! Hooray for convenience.
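
For example, assuming the same messy JSON sitting in a file, something like python -mjson.tool < glossary.json or json_xs < glossary.json (the filename is just an example) produces output much like the Perl script above.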

Browser popularity

It's no secret that Internet Explorer has been steadily losing market share, while Chrome and Safari have been gaining.

But in the last couple of years I've been surprised to see how strong IE has remained among visitors to our website -- it's usually been #2 after Firefox.

Recently this has changed and IE has dropped to 4th place among our visitors, and Chrome now has more than double the users that Safari does, as reported by Google Analytics:

1. Firefox 43.61%
2. Chrome 30.64%
3. Safari 11.49%
4. Internet Explorer 11.02%
5. Opera 2.00%

That's heartening. :)