
Getting the Django Admin to sort modified columns

A lot of what I've worked on at End Point is Ovis, a program to keep track of information about our servers. This information includes current operating system, data center, which client owns or uses it, etc. Ovis is built entirely on the Django Admin, with the most important information displayed on the Servers list page.

A very important part of it is tracking server updates.

Knowing when and by whom a server was last updated is nice to see, but to put it to good use, we needed a column that would show when it was last updated and who last updated it. We also wanted this column to link to the relevant pages.

Django has a handy way of using functions to create a column based on data that is not part of the model being listed. The documentation for that is here. This is the code we are using:

def latest_update(self):
    try:
        object = self.update_set.latest()
        if object:
            return '<a href="%s">%s</a> by <a href="%s">%s</a>' % (
                '/admin/ovisapp/update/%s' % object.id,
                object.when_updated.date(),
                '/admin/auth/user/%s' % object.updated_by.id,
                "%s %s." % (object.updated_by.first_name, object.updated_by.last_name[0])
            )
    except:
        return ""

In the server list, it ends up rendering as a linked date followed by a linked user name.

While this is very useful, it has one big problem: you are unable to sort your list of objects by the new column. Other columns are sorted with a simple ORDER BY in the database, but when you are grabbing data from another table and adding links, extra spaces, words, etc., Django doesn't know what to sort by.

The main part of the solution is creating a custom Manager for your model and annotating the QuerySet with an aggregate function, like this, which gets the newest update timestamp for each server object:

from django.db.models import Manager, Max

class ServerManager(Manager):
    def get_query_set(self):
        qs = super(ServerManager, self).get_query_set().annotate(Max('update__when_updated'))
        return qs

Set it as your model's manager by adding this line to the model definition:

objects = ServerManager()

Now, to get the column to sort by the aggregate function, set the admin_order_field method attribute to point to your new annotated aggregate function, like this:

def latest_update(self):
...
latest_update.admin_order_field = 'update__when_updated__max'

And it will sort by the date!

Comparing installed RPMs on two servers

Sometimes I'm called on to deal with a problem that shows up only on one of two or more servers that are supposed to be configured identically, or nearly identically. One of the first things I do is run rpm -qa | sort on each machine and diff the output to see which RPM packages may be missing on one or the other server. I've never bothered to package this functionality up into a script because it's so simple.

To exclude minor version differences, you need to specify a custom rpm --queryformat that leaves the version number off.

To understand what you're seeing when a package shows up in the diff but appears to be installed on both servers, keep in mind that you're often looking at multiple architectures of the same package (e.g. i386 and x86_64), which RPM doesn't show in its default query format.
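For example, a query format along these lines (my own illustration; the exact format string isn't prescribed) lists each installed package by name and architecture only, leaving the version and release off:

rpm -qa --queryformat '%{NAME}.%{ARCH}\n' | sort -u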

Finally, to turn the diff output into a list of RPMs to install via yum, I usually do some combination of grep and sed to pick out the RPMs I need.

After all that the process isn't entirely simple anymore, and I recently decided it was easier to script it than explain it all to someone else. I first looked around to see what scripts others have come up with, since this is certainly not a new need. I found the blog post "Compare the RPM Packages Installed on Two Different Servers" which gives a very simple example of the manual labor version I've long done.

In that blog post's comments, people link to various scripts others have done. I checked them out and found that they are all way overcomplicated for my needs, and the simple approach I want just needed to be scripted after all. So here is my script:
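(What follows is a minimal sketch along those lines, assuming ssh access to both hosts; the exact query format and invocation details are illustrative choices rather than the only option.)

#!/bin/bash
# rpmdb-compare: compare the RPM packages installed on two hosts.
# Usage: ./rpmdb-compare host1 host2

host1=$1
host2=$2

# List installed packages by name and architecture only, so that minor
# version differences don't show up as differences between the hosts.
list_rpms() {
    ssh "$1" "rpm -qa --queryformat '%{NAME}.%{ARCH}\n'" | sort -u
}

# comm -3 suppresses the column of packages common to both hosts, leaving
# only the packages unique to host1 (column 1) and to host2 (column 2,
# indented with a tab).
comm -3 <(list_rpms "$host1") <(list_rpms "$host2")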

I noticed one of those commenters mentioned using comm instead of diff/grep/sed, so I used that too. Now the process is easier for me as well, and it helps avoid copying files around and leaving temporary files sitting around.

To run it, just do:

./rpmdb-compare host1 host2 > mylist

With the output redirected to file mylist, you can edit it to result in a list of RPMs that need to be installed on one server, then do this on that server:

< mylist xargs yum -y install

It's a good idea to test it first without the -y option, which causes yum to abort the installation and gives you a chance to see whether any unexpected dependencies will be dragged in.

Also, don't blindly install every package you don't know the purpose of. Watch out for RPMs that may not belong everywhere due to hardware differences such as Ethernet firmware, RAID controller, IPMI, etc.

A Solution to the Most Common Rails Authentication Problem

Q: What's one of the most common authentication related mistakes?

A: Forgetting to write the code that triggers authentication.

Q: What can we do about it?

A: Make it easier to test authentication.

The most common authentication problem, one that probably affects every Rails app, is forgetting or overlooking the implementation of authentication. In Rails, this generally means forgetting to add a controller before filter to verify the user is authenticated for actions that should be protected. Let me be the first to admit that I'm guilty of doing this myself, but I've noticed it occurring in all Rails apps that I've worked on.

Having seen this problem, committed it myself, and been bothered by it, I've come up with a small solution that is my humble attempt to solve the problem by making it easier to track what is being authenticated and what isn't. Before I show the solution I want to divulge that the current implementation has some shortcomings, which I will explain towards the end of the article, but I feel it's still a worthwhile solution in the sense that the good outweighs the bad.

The solution is to provide helpers that make it easy to unit test the authentication of controllers. I'm sure this has been done before, but I've not found a standard way to do it, so I'm going to propose one. Here is a code example:

The solution in a nutshell:

require 'test_helper'

class InquiriesControllerTest < ActionController::TestCase

  verify_authentication_declared
  verify_require_authenticate :edit, :update
  verify_do_not_require_authenticate :new, :create

end

class InquiriesController < ApplicationController
  before_filter :authenticate, only: [:edit, :update]
  before_filter :do_not_authenticate, only: [:new, :create]

  def new
  end

  def create
  end

  def edit
  end

  def update
  end
end
I created a test helper which allows you to add three lines of code to verify your authentication:
  1. verify_authentication_declared - Does two different things:
    1. Checks that all actions in your controller are explicitly listed in the tested controller's before_filters. In my example I have two before_filters that call :authenticate and :do_not_authenticate.
    2. Checks that no action is listed for both :authenticate and :do_not_authenticate.

  2. verify_require_authenticate - Allows the developer to specify which actions are intended to be authenticated.
  3. verify_do_not_require_authenticate - Allows the developer to specify which actions don't require authentication. Note that this is different from simply not setting the before_filter for :authenticate; declaring it explicitly communicates the intent that the action should not be authenticated.

How to implement:

  1. Grab the test helper in this gist, place it in your test folder alongside 'test_helper.rb', and name it 'auth_test_helper.rb'.
  2. In 'test_helper.rb', require the auth_test_helper and extend ActiveSupport::TestCase with it:

    require 'auth_test_helper'

    class ActiveSupport::TestCase
      extend AuthTestHelper
    end

  3. If your solution uses an authentication before_filter with a different name than authenticate, then you can either change the name in 'auth_test_helper.rb' or add an alias in your application controller and then use the alias in your controllers. For example:
    class ApplicationController
      def authenticate
        require_login
      end
    end
    
    class ProductsController < ApplicationController
      before_filter :authenticate, only: [:new]
    
      def new
      end
    end
    

    Important note: the auth tests look for the :only symbol to determine which actions are covered, so modify AuthTestHelper if you want to use :except.

    Alternatively, you can use the :authenticate method to actually perform the authentication. This would be the choice if you roll your own authentication:

      def authenticate
        redirect_to login_url, alert: "Please login, and you'll be sent to the page you tried to access." if current_user.nil?
      end
    
  4. Add a do_not_authenticate method to ApplicationController. The method doesn't actually do anything; it serves only to make it possible to declare which actions are not authenticated:

    class ApplicationController
      def authenticate
        require_login
      end
       
      def do_not_authenticate
      end
    end
    
    class ProductsController < ApplicationController
      before_filter :authenticate, only: [:new]
      before_filter :do_not_authenticate, only: [:edit]
      def new
      end
    
      def edit
      end
    end
    
  5. In your test for the controller, add the three lines of verify code:

      verify_authentication_declared
      verify_require_authenticate :edit, :update
      verify_do_not_require_authenticate :new, :create
      

    Only :verify_require_authenticate and :verify_do_not_require_authenticate accept parameters, which are the symbols of the actions they verify.

Shortcomings of this Approach:

  1. It could be argued that this isn't a solution, but merely moves the problem out into the tests. By that I mean that if the developer doesn't implement the three-line verify block, the problem of forgotten authentication can reappear. My idea for addressing this was to have :verify_authentication_declared automatically executed for each controller test, but as of this writing I haven't been able to get that working.
  2. The verify methods throw exceptions instead of making failed assertions. This is wrong from a testing standpoint: the verify methods for authentication should make assertions like any other test, and throwing an exception should be reserved for when something unusual happens.

Despite the shortcomings listed above, I still think the good outweighs the bad. If over time these tests prove themselves to be valuable, I'll fix the shortcomings; for now I'm presenting the idea and letting people play with it.

Source code

Gist for AuthTestHelper

Monitorama, Berlin, EU - Day 2 and final considerations

Toward the end of an IT conference your expectations sometimes get lower, because the speakers are tired and so are the attendees. You kind of expect things to get quieter, but that wasn't the case with Monitorama EU 2013.

On this second day I found that all of the talks were as interesting, entertaining and inspiring as the ones on the first day.

I enjoyed all of today's talks, but I was especially inspired by the one from Jeff Weinstein, who talked about how you can use data collected for metrics and monitoring to improve the whole company. I also appreciated the talk from Gareth Rushgrove, which highlighted how security is still underrated in IT companies and how and why you should try to integrate monitoring with security auditing tools.

I was asked a few times, during the day and the evening, what my opinion of the conference was, and the answer was always "absolutely positive!". I always add that while I don't expect to see any rocket science at these conferences, I do expect to come away with a lot of hints, ideas, and tips, which are a wonderful springboard for new personal or work-related projects. That is exactly what I got.

The other thing you get, perhaps unexpectedly, at this kind of technical conference is an incredibly positive social experience, with plenty of information exchanged from both a personal and a technical perspective. I always stress this aspect because sometimes people go to conferences, listen to the talks, learn something, and go home unsatisfied. That's mostly because they missed the point.

If you just want to learn new stuff, you have a marvellous thing called the Internet, with GitHub, SlideShare, and many useful books out there. What you'll always miss from all these sources is the "human factor" that you only get when you hang out with other people who share your interests, who are willing to share their knowledge with you (hint: open source communities), and who in general are there to enjoy spending a few days together with other weird humans who share the same passion: their IT work.

So in the end my impressions are very positive, and I'd like to thank everyone behind the conference organization for making it run as well as it did. I'm sure I'll join you again, hopefully soon.

eCommerce Innovation Conference 2013

The eCommerce Innovation Conference 2013 is a new conference being held in Hancock, New York, between October 8th and 11th. The conference aims to discuss everything ecommerce, with a focus on Perl-based solutions including Dancer and Interchange. Unlike most conferences, it isn't geared to any one specific type of attendee. The current speaker list includes in-house ecommerce software developers, consultants, sales managers, project managers, and marketing experts. The talk topics range from customer relationship management to template engines for Perl.

Mark Johnson and I are both going to be speaking at the conference. Also there will be Mike Heins, creator of Interchange, and Stefan Hornburg, the Interchange development group's longtime "team captain".

Mark is going to be discussing full page caching in Interchange 5. This is becoming a more frequent request from our larger customers. They want full page caching so that the web browser and a caching proxy server alone can handle most requests, leaving Interchange and the database free to handle more shopping-based requests like add to cart or checkout. This is a commonly-used architecture in many application servers, and my colleague David Christensen has several new features already in use by customers to make full-page caching easier, which are expected to go into Interchange 5.8.1 soon.

I will be doing a talk on multi-site setup in Interchange 5. This is a request we have received frequently over the years. Companies may either already have some kind of wholesale website or just want to have multiple websites use the same database and programming but allow for different website designs. They normally need to control what website a product will show up on and possibly adjust the price accordingly. I'll discuss the different methods we have used to accomplish this at End Point.

I see on the schedule that Sam Batschelet will be speaking about the camps system and some new capabilities he's added for perlbrew and Carton, among other things. We are also using camps in some places with perlbrew and plenv, so it will be interesting to compare notes. I hope we'll see some discussion and/or contributions to the open source DevCamps project soon!

It promises to be a very nice conference with lots of diverse information!

Monitorama, Berlin, EU - Day 1

If you care about the quality of your IT infrastructure and work, there are times when you really need to focus on a valuable and important aspect: community.

The thing is that most people don't realize how valuable the human factor is when working in the IT field, until they happen to be at a conference as marvellous as Monitorama has been so far.

I was lucky enough to be there, in Berlin from 2013.09.19 to 2013.09.20, to enjoy all the awesome talks and attendees present. What I'm really saying is that besides being quite technically interesting and of definitely good quality, the talks didn't revolve only around monitoring per se.

I won't mention each and every talk, though they would all deserve it, but I'll say that while I was very inspired by Danese Cooper's talk about the value and importance of Open Source, I was also very entertained by Ryan Dotsmith's talk about how you could and should learn from failures, either your own or others', and by the very specific "on the field" talk from Katherine Daniels.

On top of that, while I generally don't appreciate sponsors giving "talks" at this kind of conference, I actually appreciated how the sponsor advertising part was handled here: there were short, brief demos from the sponsors which were fairly relevant and never boring or out of context, as I have experienced at other conferences.

One thing I took away from all the talks is that the future of monitoring will definitely be all about machine learning and having computers mimic the way humans interpret data. We just have to teach computers to do these kinds of interpretive tasks as well as we do. It all revolves around this, and only then could we all stop saying "You know, computers are dumb, you just have to cope with it".

One last thing that needs to be mentioned about this awesome conference is the location, the "Golgatha Biergarten" in a park, which hosted our dinner of local grilled meat and other food, a community-bonding moment that was the perfect enriching close to such an awesome day.

So it's hard to hide how thrilled I am to see what's coming tomorrow.

Apache accidental DNS hostname lookups

Logging website visitor traffic is an interesting thing: Which details should be logged? How long and in what form should you keep log data afterward? That includes questions of log rotation frequency, file naming, and compression. And how do you analyze the data later, if at all?

Allow me to tell a little story that illustrates a few limited areas around these questions.

Reverse DNS PTR records

System administrators may want to make more sense of visitor IP addresses they see in the logs, and one way to do that is with a reverse DNS lookup on the IP address. The network administrators for the netblock that the IP address is part of have the ability to set up a PTR (pointer) record, or not. You can find out what it is, if anything.

For example, let's look at DNS for End Point's main website at www.endpoint.com using the standard Unix tool "host":

% host www.endpoint.com
www.endpoint.com has address 208.43.132.31
www.endpoint.com has IPv6 address 2607:f0d0:2001:103::31
% host 208.43.132.31
31.132.43.208.in-addr.arpa domain name pointer 208.43.132.31-static.reverse.softlayer.com.
% host 2607:f0d0:2001:103::31
1.3.0.0.0.0.0.0.0.0.0.0.0.0.0.0.3.0.1.0.1.0.0.2.0.d.0.f.7.0.6.2.ip6.arpa domain name pointer 2607.f0d0.2001.0103.0000.0000.0000.0031-static.v6reverse.softlayer.com.

The www.endpoint.com name points to both an IPv4 and an IPv6 address, so there are two answers. When each of those IP addresses is looked up, each shows a PTR record pointing to a subdomain of softlayer.com, which gives a clue about where our site is hosted.

(As an aside: Why don't we use a prettier or more specific PTR record? We could set it to almost whatever we want. Well, there are dozens of websites hosted on those IP addresses, so which one should be in the PTR record? There's no obvious choice, and it doesn't matter for normal network functioning, so we just left it the way it was.)

So, is a PTR record like these useful to know about visitors to your website? Sometimes. Let's take a look at a random sample of visitors to a different website we manage. How much can you tell about each of the visitors based on their reverse DNS PTR records? Is it a bot, someone at home or the office, in which country, and who is their Internet provider? How common is it for a visitor's IP address to have no PTR record? And keep in mind that most of the visitors have no idea what their IP address or its PTR record is.

% host 93.137.189.55
55.189.137.93.in-addr.arpa domain name pointer 93-137-189-55.adsl.net.t-com.hr.
% host 66.249.73.121
121.73.249.66.in-addr.arpa domain name pointer crawl-66-249-73-121.googlebot.com.
% host 88.134.68.31
31.68.134.88.in-addr.arpa domain name pointer 88-134-68-31-dynip.superkabel.de.
% host 67.49.156.20
20.156.49.67.in-addr.arpa domain name pointer cpe-67-49-156-20.hawaii.res.rr.com.
% host 123.211.36.234
234.36.211.123.in-addr.arpa domain name pointer CPE-123-211-36-234.lnse3.cha.bigpond.net.au.
% host 91.75.70.162 
Host 162.70.75.91.in-addr.arpa. not found: 3(NXDOMAIN)
% host 209.82.97.10
10.97.82.209.in-addr.arpa domain name pointer mail02.westjet.com.
% host 76.70.117.223
223.117.70.76.in-addr.arpa domain name pointer bas1-toronto26-1279686111.dsl.bell.ca.
% host 101.160.207.115 
Host 115.207.160.101.in-addr.arpa. not found: 3(NXDOMAIN)
% host 184.198.177.214
214.177.198.184.in-addr.arpa domain name pointer 184-198-177-214.pools.spcsdns.net.
% host 84.199.97.130
130.97.199.84.in-addr.arpa domain name pointer 84-199-97-130.iFiber.telenet-ops.be.
% host 182.19.87.24 
Host 24.87.19.182.in-addr.arpa. not found: 3(NXDOMAIN)
% host 62.34.219.216
216.219.34.62.in-addr.arpa domain name pointer i01v-62-34-219-216.d4.club-internet.fr.
216.219.34.62.in-addr.arpa domain name pointer lns-c10k01-v-62-34-219-216.dsl.sta.abo.bbox.fr.
% host 187.86.213.190
Host 190.213.86.187.in-addr.arpa. not found: 3(NXDOMAIN)
% host 15.211.201.84
84.201.211.15.in-addr.arpa domain name pointer zccy01cs104.houston.hp.com.
% host 161.69.46.150
150.46.69.161.in-addr.arpa domain name pointer miv-scan015.scanalert.com.
% host 77.182.146.99
99.146.182.77.in-addr.arpa domain name pointer essn-4db69263.pool.mediaWays.net.
% host 107.0.160.152 
152.160.0.107.in-addr.arpa domain name pointer 107-0-160-152-ip-static.hfc.comcastbusiness.net.

Did you notice that one IP address returned two different PTR records? That is allowed, though uncommon, as I mentioned in my blog post Multiple reverse DNS pointers per IP address a few years back. Many reverse DNS control panels provided by commodity hosting providers won't allow you to assign multiple PTR records, but if you get your reverse DNS delegated to a real nameserver you control, you can do it.

Finding the IP address owner: whois

The reverse DNS PTR can be set misleadingly, such that a forward lookup on the name does not point back to the same IP address. In the end the way to really know who controls that IP address (or at least a network provider who supplies the ultimately responsible person) is with a "whois" lookup. We can check that the 208.43.132.31 IP address really is hosted at SoftLayer, and for which customer, like this:

% whois 208.43.132.31
[Querying whois.arin.net]
[Redirected to rwhois.softlayer.com:4321]
[Querying rwhois.softlayer.com]
[rwhois.softlayer.com]
%rwhois V-1.5:003fff:00 rwhois.softlayer.com (by Network Solutions, Inc. V-1.5.9.5)
network:Class-Name:network
network:ID:NETBLK-SOFTLAYER.208.43.128.0/19
network:Auth-Area:208.43.128.0/19
network:Network-Name:SOFTLAYER-208.43.128.0
network:IP-Network:208.43.132.0/27
network:IP-Network-Block:208.43.132.0-208.43.132.31
network:Organization;I:End Point Corporation
network:Street-Address:920 Broadway, Suite 701
network:City:New York
network:State:NY
network:Postal-Code:10010
network:Country-Code:US
network:Tech-Contact;I:sysadmins@softlayer.com
network:Abuse-Contact;I:abuse@endpoint.com
network:Admin-Contact;I:IPADM258-ARIN
network:Created:2007-06-18 12:15:54
network:Updated:2010-11-21 18:59:43
network:Updated-By:ipadmin@softlayer.com

%referral rwhois://root.rwhois.net:4321/auth-area=.
%ok

So you can see that's really End Point's IP address, at SoftLayer.

Use your local whois tool or search for a web-based one and look up a few of the IP addresses that didn't have reverse DNS PTR records in our log cross-section above. The results are interesting.

Reverse lookups in Apache httpd

Now let's say that as a system administrator you would like to see the PTR records for visitor IP addresses on your Apache httpd website. It may be tempting to use the HostnameLookups configuration directive to do real-time lookups and put them in the log alongside the IP address. It's easy but not wise to put the PTR record instead of the IP address, because it may not point back to the IP address, and even if it does, it can change over time, and will not provide a complete picture of the connection later on.

However, if you read the HostnameLookups documentation, you'll see the authors recommend it not be enabled on busy production servers because of the extra network traffic and delay for visitors, especially for any netblocks with slow DNS servers (and there are many out there). This is important! It really should almost never be enabled for any public site.

Most web server administrators learn this early on and wouldn't dream of enabling HostnameLookups.

However, I recently came across a situation where we inadvertently were doing the equivalent without explicitly enabling HostnameLookups. How? By limiting access based on the remote hostname! Read the documentation on the Allow directive, under the section "A (partial) domain-name":

This configuration will cause Apache to perform a double reverse DNS lookup on the client IP address, regardless of the setting of the HostnameLookups directive. It will do a reverse DNS lookup on the IP address to find the associated hostname, and then do a forward lookup on the hostname to assure that it matches the original IP address. Only if the forward and reverse DNS are consistent and the hostname matches will access be allowed.

This makes perfect sense, but it is a pretty big and likely unexpected side effect of using something like:

Allow from .example.com

In our case it was an even less obvious case that didn't make us think of hostnames at all:

Allow from localhost

Here localhost was written, perhaps to save some effort or maybe to increase clarity vs. writing out 127.0.0.1 (IPv4) and ::1 (IPv6). Mentally it's so easy to view "localhost" as a direct alias for 127.0.0.1 and ::1 that we can forget that the name "localhost" is just a convention, and requires a lookup like any other name. Those familiar with the MySQL database may know that it actually assigns a special, confusing meaning to the word "localhost", making a UNIX socket connection instead of a TCP connection to 127.0.0.1 or whatever "localhost" is defined as on the system!

You may also be thinking that looking up 127.0.0.1 is fast because that is usually mapped to "localhost" in /etc/hosts. True, but every other visitor who is not in /etc/hosts gets the slow DNS PTR lookup instead! And depending on the operating system, you may see "ip6-localhost" or "ip6-loopback" (Debian 7, Ubuntu 12.04), "localhost6" (RHEL 5/6, Fedora 19) in /etc/hosts, or something else. So it's important to spell out the addresses:

Allow from 127.0.0.1
Allow from ::1

Doing so immediately stops the implicit HostnameLookups behavior and speeds up the website. In this case it wasn't a problem, since it was for a private, internal website that couldn't be visited at all by anyone not first allowed through a firewall, so traffic levels were relatively low. That access control is part of why localhost needed to be allowed in the first place. But it would have been very bad on a public production system due to the slowdown in serving traffic.

The right way

If you really want to see PTR records for every visitor IP address, you can use Apache's logresolve log post-processing program. Or you can let an analytics package do it for you.
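For example, logresolve reads an access log on standard input and writes a copy with each client IP address replaced by its hostname, caching lookups as it goes (the filenames here are just placeholders):

logresolve < access_log > access_log.resolved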

So, lesson learned: It's not just HostnameLookups you need to keep turned off. Also watch out for the Apache Allow directive and don't use it with anything other than numeric IP addresses!

Interchange Form Testing with WWW::Mechanize

Recently, I encountered a testing challenge that involved making detailed comparisons between the old and new versions of over 200 separate form-containing HTML (Interchange) pages.

Because the original developers chose to construct 200+ slightly-different pages, rather than a table-driven Interchange flypage (curses be on them forever and ever, amen), an upgrade to change how the pages prepared the shopping cart meant making over 200 similar edits. (Emacs macros, yay!) Then I had to figure out how to verify that each of the 200 new versions did something at least close to what the 200 old versions did.

Fortunately, I had easy ways to identify which pages needed testing, construct URLs to the new and old pages, and even a way to "script" how to operate on the page-under-test. And I had WWW::Mechanize, which has saved my aft end more than once.

WWW::Mechanize is a pretty mature (originally 2008) "browser-like" system for fetching and acting on web pages. You can accept and store cookies, find and follow links, handle redirection, forms, you name it -- but not Javascript. Sorry, but there are other tools in the box that can help you if you are working with more interactive pages.

In my case, lack of JS wasn't an issue. I just needed a way to fetch a page, tweak a form element or two, and submit the page's POST for server processing. Then if I could capture the server-side state of my session, I'd be golden.

#!/usr/local/bin/perl
use strict;
use warnings;
use WWW::Mechanize;
use Test::More;

our $BASE = 'http://www.example.com/';

my %common = (
    agent => 'compare-pages',
    autocheck => 1,
    cookie_jar => { },
    quiet => 1,
    redirect_ok => 1,
    timeout => 15,
);
my $old = WWW::Mechanize->new(
    %common,
);
my $new = WWW::Mechanize->new(
    %common,
);

for my $page (@ARGV ? @ARGV : <>) {
    print $page;
    chomp $page;
    $new->get( $BASE . 'newstuff/' . $page . '?mv_pc=RESET');
    my $new_form = $new->form_with_fields('last_product');
    $new->submit();
    $new->form_with_fields('mv_todo');
    $new->submit();
    $new->get( $BASE . 'show-the-dump' );
    $new->content =~ m/#+\s+SESSION\s+#+\n(.+)\n#+\s+END SESSION\s+#+/s;
    my $new_session = eval $1;
    delete $new_session->{carts}{main}[0]{$_} for qw(some fields);

    $old->get( $BASE . $page . '?mv_pc=RESET' );
    my $old_form = $old->form_with_fields('order_item', 'mv_order_deliverydate');
    $old->select('mv_order_deliverydate', {n => 2});
    $old->submit();
    $old->get( $BASE . 'show-the-dump' );
    $old->content =~ m/#+\s+SESSION\s+#+\n(.+)\n#+\s+END SESSION\s+#+/s;
    my $old_session = eval $1;
    delete $old_session->{carts}{main}[0]{$_} for qw(other fields);

    is_deeply($old_session->{carts}{main}, $new_session->{carts}{main}, "$page : carts match") or exit;
}

done_testing;
exit;
  • 2-5: Very few external modules are needed for this. WWW::Mechanize is quite complete (but it has a slew of prerequisites). Test::More is used just to make our comparisons easier.
  • 7: This will be the URL base for our requests.
  • 9-22: we set up two separate user agents so that they don't share cookies, history, or any state information that would confuse our comparisons.
  • 27, 37: retrieving the pages under test. Note that in my case, "newstuff/" distinguished the new version from the original.
  • 28, 38: specifying which form on the retrieved page is to be considered the "current" one. Note that I'm not using the returned value here (although it came in handy during debugging). "form_with_fields" lets you pick a form based on one or more fields named within it. In the event that there's more than one, you get the first (and Mechanize complains with a warning -- but we've turned that off via the "quiet" option, above).
  • 32, 41: In the interests of security, I've not shown the actual page we use to dump the session internals. However, for Interchange users it's just a page with a "[dump]" tag. You might write something that produces plain text, or CSV, or JSON. In my case, the session dump contains Data::Dumper-style output that I can feed into Perl's "eval" function.
  • 35, 44: The two data structures resulting from the "old" and "new" pages aren't exactly alike, so I remove the bits I don't care about.
  • 46: And Test::More to the rescue, saving me from having to re-invent the code that will compare a possibly-complex data structure down to the scalar members. I have it exit after a failure, since in my case one error usually meant a whole family of corrections that needed to be applied to several related pages.

And that's all! My testing now consists of:

$ grep -l "some pattern that identifies my 200" *.html | perl compare_pages.pl

I also had to adjust my Interchange configuration so my script would be accepted as a "robot":

RobotUA compare-pages

As a result of this testing, I identified at least a few pages where the "old" and "new" forms did not result in the same cart configuration, so I was able to fix that before it went live and caused untold confusion.

I hope this excursion into page-testing has proven interesting.

My Favorite Git Commands

Git is a tool that all of us End Pointers use frequently. I was recently reviewing history on a server that I work on frequently, and I took note of the various git commands I use. I put together a list of the top git commands (and/or techniques) that I use with a brief explanation.

git commit -m "****"
This is a no-brainer, as it commits a set of changes to the repository. I always use the -m to set the git commit message instead of using an editor to do so. Edit: Jon recommends that new users not use -m, and that more advanced users use this sparingly, for good reasons described in the comments!

git checkout -b branchname
This is the first step to setting up a local branch. I use this one often as I set up local branches to separate changes for the various tasks I work on. This command creates and moves you to the new branch. Of course, if your branch already exists, git checkout branchname will check out the changes for that local branch that already exists.

git push origin branchname
After I've done a bit of work on my branch, I push it to the origin to a) back it up in another location (if applicable) and b) provide the ability for others to reference the branch.

git rebase origin/master
This one is very important to me, and our blog has featured a couple of articles about it (#1 and #2). A rebase rewinds your current changes (on your local branch), applies the changes from origin/master (or whatever branch you are rebasing against), and then reapplies your changes one by one. If there are any conflicts along the way, you are asked to resolve the conflicts, skip the commit, or abort the rebase. Using a rebase allows you to avoid those pesky merge commits which are not explicit in what changes they include and helps you keep a cleaner git history.
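For reference, handling a conflict in the middle of a rebase looks roughly like this (the file path is a placeholder):

git rebase origin/master
# if a commit conflicts: edit the conflicting files, then
git add path/to/conflicted_file
git rebase --continue
# or, instead, drop that commit or give up on the rebase entirely:
git rebase --skip
git rebase --abort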

git push -f origin branchname
I use this one sparingly, and only if I'm the only one that's working on branchname. This comes up when you've rebased one of your local branches resulting in an altered history of branchname. When you attempt to push it to origin, you may see a message that origin/branchname has X commits different from your local branch. This command will forcefully push your branch to origin and overwrite its history.

git merge --squash branchname
After you've done a bit of work on branchname and you are ready to merge it into the master branch, you can use the --squash argument to squash/smush/combine all of your commits into one clump of changes. This command does not perform the commit itself, therefore it must be followed by a) review of the changes and b) git commit.
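In other words, the full sequence might look something like this (the commit message is a placeholder):

git checkout master
git merge --squash branchname
git diff --cached            # review the combined changes
git commit -m "Collapse work from branchname into one commit"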

git branch -D branchname
If you are done with all of your work on branchname and it has been merged into master, you can delete it with this command! Edit: Phunk tells me that there is a difference between -D and -d, as with the latter option, git will refuse to delete a branch with unmerged changes, so -d is a safer option.

git push origin :branchname
Want to delete branchname from the origin? Run this command. You can leave branchname on the origin repository if you want, but I like to keep things clean with this command.

git checkout -t origin/someone_elses_branch
Use this command to set up a local branch to track another developer's branch. As the acting technical project manager for one of my clients, I use this command to track Kamil's branch, in combination with the next command (cherry-pick), to get his work cleanly merged into master.

git cherry-pick hashhashhash
Git cherry-pick applies changes from a single commit (identified by hash) to your current working branch. As noted above, I typically use this after I've set up a local tracking branch from another developer to cherry-pick his or her commits onto the master branch in preparation for a deploy.

git stash, git stash apply
I only learned about git stash in the last year, however, it's become a go-to tool of mine. If I have some working changes that I don't want to commit, but a client asks me to commit another quick change, I will often stash the current changes (save them but not commit them), run a rebase to get my branch up to date, then push out the commit, then run git stash apply to restore my uncommitted changes.
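For example, that workflow might look roughly like this (the branch name and commit message are placeholders):

git stash                        # set aside uncommitted work in progress
# ...make and commit the quick change the client asked for...
git rebase origin/master         # bring the branch up to date
git push origin branchname
git stash apply                  # restore the uncommitted work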

Admittedly, several of my coworkers are git experts and have many more git tools in their toolboxes; I should ask one of them to follow up on this article with additional advanced git commands I should be using! Also take note that for us End Pointers, DevCamps may influence our git toolbox because it allows us to have multiple instances (and copies of the production database) running at a time, which may require less management of git branches.

PostgreSQL 9.3 Released

Yesterday PostgreSQL 9.3 was released. It contains many great new features; below is a simple description of those I think are most important. There are many more than this short list; all of them can be found in the PostgreSQL 9.3 Release Notes.

One of the most important features of the new release is the long list of bug fixes and improvements that make 9.3 faster. I think that is the main reason to upgrade. There are also many new features which your current application may never use, but a faster database is always better.

The new background workers mechanism gives us entirely new possibilities for running custom processes in the background. I've got a couple of ideas for background tasks to implement, such as a custom message queue, a Postgres log analyzer, or a tool for accessing PostgreSQL over HTTP (and JSON, just to have an API like the NoSQL databases have).

Another nice feature, which I haven't tried yet, is data checksums: something really useful for checking data consistency at the data file level. It should make data updates slower, but I haven't checked how much slower; there will be another blog post about that.

There is also parallel pg_dump, which will lead to faster backups.
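For example, a parallel dump uses the directory output format plus a job count (the path and database name here are placeholders):

pg_dump -Fd -j 4 -f /backups/mydb.dir mydb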

The new Postgres version has also switched from the SysV to the POSIX shared memory model. In short: you won't need to set SHMMAX and SHMALL any more.

There are also many new JSON functions; I used some of them in one of my previous posts.

Another really great feature is the ability to create event triggers. Until now you could only create triggers on data changes; since PostgreSQL 9.3 you can create a trigger on dropping or creating a table, or even on dropping another trigger.

Views also changed a lot. Simple views are updatable now, and there are materialized views as well.

The Foreign Data Wrapper mechanism, which allows you to map an external data source to a local view, has been enhanced. There is also the great postgres_fdw shipped with Postgres 9.3. This extension lets you easily map a table from another PostgreSQL database, so you can access many different Postgres databases through one local database. And with materialized views you can even cache the data.

Another feature worth mentioning is faster failover of a replicated database: when your master database fails, the switch to a slave replica is much faster. If you use Postgres for your website, this simply means less time that your website is offline when your master database server fails.

You can find more information in the release announcement.

Fixed Navigation Bar: HTML, CSS, and JavaScript Breakdown

Something I've seen frequently these days on content-rich sites is the fixed navigation bar: a small abbreviated header in the form of a horizontal bar at the top of the screen that shows after a user has scrolled below the main header. Here's an example:

A live example of an abbreviated fixed navigation bar at the top of the articles at ABCNews.com.
The background has a grey opaque layer for demonstration purposes only.

I recently implemented this functionality for H2O, and I'll go through the tools needed to do this.

HTML Markup

First things first: you need the HTML markup. The only tricky thing here is that the horizontal bar must be outside of any wrapping divs on the page that confine the content to a set width. For example, if your content is limited to 900 pixels in width, the horizontal bar markup must be outside that constraint. Here's what the HTML might look like:

<div id="fixed_bar">
  <div class="wrapper">
    Links & content here.
  </div>
</div>

Note that in the above HTML, the "wrapper" div may constrain the content width to match the rest of the content, as in the example above. This HTML may go at the beginning or end of the page HTML; I prefer to see it at the top. Another note is that other HTML elements may be used in place of the div, but I chose the div above because it defaults to a block element (an element whose CSS display value defaults to block). Finally, HTML5 elements can be used in place of the div as well (section or nav might make sense) if the site is HTML5 compliant.

CSS Settings

The secret to this interactive feature lies in the CSS settings. Here's what the CSS for my example code above might look like:

body {
  margin: 0px;
  padding: 0px;
}
#fixed_bar {
  width: 100%;
  position: fixed;
  z-index: 100; /* exceed z-index of other elements on the page */
  display: none;
  background: transparent url(/images/fixed_bar.png) bottom repeat-x;
  top: 0px;
}
.wrapper {
  width: 900px;
  margin: 0px auto;
}

Here are the important bits of the CSS above:

  • The fixed positioning setting is what keeps the bar in one place as the user scrolls up and down (line 7).
  • The body must have margin and padding settings at 0px to ensure that the fixed bar is flush against the top of the screen (line 2 & 3).
  • The fixed bar spans the width of the browser, but in this case, the .wrapper element is constrained to 900 pixels wide (line 14).
  • The default display value of the #fixed_bar element is none, so it is hidden upon page load (line 9).
  • The background of the #fixed_bar can be a small image with a gradient to transparency, such as in the example above (line 10).

Scroll Event Listener

Finally, after the HTML and CSS markup is good to go, here's what the interactive JavaScript (via jQuery) might look like:

var offset = 100; // scroll position at which the main header is out of view and the fixed bar should appear
jQuery(window).scroll(function() { //also an option: jQuery .on('scroll') method
  if(jQuery('#fixed_bar').is(':visible') && jQuery(window).scrollTop() < offset) {
    jQuery('#fixed_bar').fadeOut(200);
  } else if(!jQuery('#fixed_bar').is(':visible') && jQuery(window).scrollTop() > offset) {
    jQuery('#fixed_bar').fadeIn(200);
  }    
});

The jQuery above checks for two scenarios:

  • a) If the #fixed_bar div is visible and the scroll position is less than the offset, fade the #fixed_bar div to a hidden state.
  • b) If the #fixed_bar div is not visible and the scroll position is greater than the offset, fade the #fixed_bar div to a visible state.

These two cases toggle the fixed bar between its hidden and visible states. With these combined elements of HTML, CSS, and JavaScript (via jQuery), a nice interactive feature adds to the usability of the site by providing valuable links and content as the user scrolls down the page.