Better Git it in Your Soul

The article title "Better Git it in Your Soul" refers to the raucous, foot-stomping first track on the classic album Mingus Ah Um by Charles Mingus. The title is appropriate for the subject of version control with Git, as distributed version control as envisioned through Git represents a paradigm shift that must be embraced and understood in a fundamental way for its benefits to really shine.

The free software ecosystem abounds with good choices for version control software. Of particular interest is the emergence of distributed version control systems, which offer a fundamentally different approach to change management than do the popular Subversion and its venerable predecessor, CVS.

Choice is a wonderful thing, yet it brings a near-inevitable wringing of hands in its wake: how does an engineering team/company/guru choose between so many options? How can you be sure of choosing the right one? Perhaps you only recently moved from CVS to Subversion; does the prospect of moving again provoke groans, and force consideration of the choice between what is good and what is easy?

At End Point, we had only relatively recently begun using Subversion when, for a variety of reasons, we looked into the choices more deeply. It became clear that, for all of Subversion's improvements over CVS, it was not the choice for the future. After careful consideration of the offerings out there (carried out primarily by our CTO, Jon Jensen), we determined that distributed systems were the best choice, and that Git is the winner in that category.

Git is not the only choice for distributed version control, by any means. Mercurial (abbreviated, wisely, as "Hg") offers a similar feature set and approach; the two are really quite close to each other. We determined that Git is simply the more mature project. There are other options still. However, rather than consider them all here, we'll focus instead on what we like about Git and the benefits it offers over Subversion.

Distributed Development

Distributed development is at the heart of Git. In 2005, licensing concerns surrounding the use of BitKeeper forced Linus Torvalds to move Linux kernel development to a new version control system. Git is the result, started by Torvalds himself and later growing beyond his and the kernel development community's immediate needs into a full-fledged project in its own right. The Linux kernel development project, having proceeded in a distributed fashion already, needed a system that would continue this path, allowing for easy branching and merging across local and remote sources. Git's fundamental design reflects this, and the result is an incredibly fast, flexible, powerful system that opens up a staggering set of workflow options quite beyond the possibilities of any centralized version control system (such as Subversion or CVS).

How Distributed Development is Achieved

This section gets a bit technical; if your interest in this subject is more at the level of workflow, management, etc., you can safely skip down to the "Distributed Development in Practice" section.

Git "tracks content, not files". All files and directories managed in Git are reduced to basic Git "objects", the identity of each being determined by a SHA1 hash of its content. A file "X" added to Git, therefore, is not stored as "file X" within the Git repository; the file itself is stored as a "file" object identified by a SHA1 hash of the file "X"'s contents along with some Git headers about the data (identifying the Git object as a "file" object, including the file permissions, etc.). The name "X" is not found anywhere within this blob that represents "X". However, the Git object representing the directory in which file "X" lives will contain a reference to the "X" blob (via the SHA1 hash ID) and the name "X" for that reference. This is rather like Unix filesystems: a file on disk is just a file on disk, with no outer-world identity other than the local names given to references to that file on disk from directories that choose to reference it. The names are "local" because they are only meaningful within their respective directory.

Consider what this means: when file X in directory Y is changed, Git does not represent the event as "file Y/X changed thus: ...". Instead, a Git object representing the new state of file X is added to the Git repository (unless this is a state we've already seen for file X, in which case no new object is necessary thanks to the ID-by-content scheme described above). Additionally, a Git object representing the new state of directory Y is added, in which directory Y now references the new object created for X; this means that a change in file Y/X necessitates a change in Y. A change in Y would necessitate a change in its parent directory, to reference the new version of Y, and so on, all the way up to the top of the repository.

The end result is a great set of trees. Any change to the repository contents at any level results in a change to the entire tree. A "commit" in Git, then, is (at its simplest) an object that refers to the top-most tree object (by its SHA1 ID) as it was at the time of the commit. A commit object also may reference 0 to n parent commit objects, which when traced backwards reveal the commit history of the overall project. For workflow purposes, Git requires each commit object and its parent references in order to provide historical consistency. However, any given commit is a full snapshot of the project, and the previous commits are not needed in order to determine the actual contents of the repository at a given time. The commit object is, per usual, identified by the SHA1 hash of its contents, meaning that the name of the commit object is dependent upon its parent references, the state of the tree referenced by the commit, and some other details (the commit message, the timestamp, etc.).
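
This dependence is easy to see in the raw text of a commit object. The sketch below (in Perl, with invented SHA1s, author details, and message) builds such a body by hand and hashes it the same way Git hashes any object:

# A commit object's body: its tree, zero or more parents, author and
# committer lines, then the message. The object ID is the SHA1 of a
# "commit <byte count>\0" header plus this body, so changing any part of it
# (parents, tree, message, timestamps) changes the commit's name.
use strict;
use warnings;
use Digest::SHA qw(sha1_hex);

my $body = join "\n",
    'tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904',
    'parent 3f7890a0d5a3f7a7b2c1c9d8e6f5a4b3c2d1e0f9',
    'author A U Thor <author@example.com> 1196467200 +0000',
    'committer A U Thor <author@example.com> 1196467200 +0000',
    '',
    "Describe the change here\n";

print sha1_hex('commit ' . length($body) . "\0" . $body), "\n";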

Git's references bring this all together to enable distributed development. Refs may be simple, local refs, like the traditional HEAD reference that points to the "current" revision. Refs may also be remote, pointing to a branch within an entirely different repository that may be on a different machine. While the "commit" command is always local to the repository, commits can be pushed and pulled to and from remote references. Because objects are always named via SHA1 hashes of their contents, determining merge paths and the like between remote repositories is not difficult; either the Git repository has a certain Git object or it doesn't. Determining what changed in a given commit is a matter of comparing the tree object of that revision with that of the previous commit and finding objects that appear in one tree but not the other, or with different reference names. Branches in Git are merely references to a line of commits, and merges can be represented in Git as a commit with multiple parent commits (one commit per merged branch).
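
One consequence is easy to see on disk: in an ordinary repository (before refs get packed), a local branch is nothing more than a tiny file containing a commit ID. A quick illustrative Perl sketch, assuming HEAD currently points at a branch:

use strict;
use warnings;

# Read a whole file into a string.
sub slurp { open my $fh, '<', $_[0] or die "$_[0]: $!"; local $/; <$fh> }

chomp( my $head = slurp('.git/HEAD') );     # e.g. "ref: refs/heads/master"
( my $ref = $head ) =~ s/^ref: //;
chomp( my $commit = slurp(".git/$ref") );   # a 40-character commit ID
print "$ref -> $commit\n";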

The simple genius of the design is quite a delight to behold, if one's leanings are sufficiently nerdy (as is apparently the case for the author). The internals are presented visually to great effect in the excellent article "Git for Computer Scientists".

Distributed Development in Practice

Fear not! For users new to Git, the workflow can closely match that of Subversion or CVS, with the primary difference being that "working copies" are replaced with clone repositories (and their own working copies). A single central Git repository can act as the "master" repository, the canonical representation of your project and its history. Individual users clone that repository (via git clone), meaning that each user gets a new, standalone Git repository containing a remote reference to the central repository. Each user's commits are local to the user's clone repository until the user pushes changes out to the central repository (via git push), which is the point at which some conflict resolution comes into play and the user is forced to get the latest updates from the master repository (a "pull" via git pull). Therefore, a commit could be considered a two-stage process: commit, then push. While it isn't necessary to push immediately after a commit (you can go as long as you want without pushing/pulling), this is rather akin to accumulating large sets of changes in a Subversion working copy without doing an update and commit; with each unpushed change, you're increasing the pain of the eventual conflict resolution.

A Subversion or CVS user at this point would wonder why this is beneficial, given that all we've done is introduce more steps into the regular workflow. However, you do not need to be committing to multiple remote repositories in order to see distributed development in action. A clone repository is a full repository in its own right. You can pursue branched development within that repository. Branching is cheap in Git, so it's easy to use branches for different lines of development. One branch could be used for some longer-running work that you might not expect to push out for a few days, while another branch could be used for quick fixes and adjustments that get pushed out quickly. Experimental work can take place in yet another branch. And so on.

This hopefully illustrates some of the benefits of the Git model. However, the opportunities for collaboration are vastly expanded. Consider: in the above example, there's nothing stopping you from cloning one of your "working copy" repositories, rather than cloning the central master repository. For larger teams of engineers working on many concurrent projects, this aspect of Git can provide huge advantages. However, upon getting comfortable with Git, smaller teams (even just a pair of engineers) can use this method of collaboration for projects of any size, accumulating a set of changes in one repository without touching the "master" repository at all, then pushing the changes out to the master once everything is in place. There is no need to create and manage branches for this kind of work within the master repository; the master can remain entirely clean from such things.

Linus Torvalds illustrates some of the differences inherent in Git's approach within this email discussing practices in Linux kernel development. As he points out, Git places a greater focus on social organization for control and project management; for many projects, particularly smaller ones, a single master repository may suffice, but for larger projects, there may be several centralized, specialized repositories, all of which provide source material for a single public-facing canonical repository. Such is the case with kernel development, in which networking development works with one repository while cryptography works with a different one, with Torvalds himself responsible for merging changes from both into his master repository. The point is: the version control system can be applied in any fashion appropriate to the social needs of the project, and is infinitely more flexible than a centralized version control system as a result. Git grows and changes with your project; Subversion and CVS force a particular workflow and give you few options otherwise.

Other Advantages of Git

Git is blazingly fast compared with Subversion for large sets of files. Let's be honest: Subversion is painfully slow with even modestly-sized directories. Git is not. (Though it still may bog down on big sets of large binaries.)

Despite using a snapshot model for representing data, Git is quite efficient in its use of disk space. When running in a Unix environment, Git will try to use hard links wherever it can to minimize space consumed. The snapshot model in Git is also normalized to a significant degree, though it does fall short of perfect normalization (perfect normalization would require abstracting out the Git object data from the Git object headers); while Git creates a lot of references within commits and trees, the big chunks of data that come from userland files are minimized to a reasonable extent (all Git objects are zlib-compressed to save space).
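
Those loose objects are simple to inspect, too. The following sketch (with a placeholder object path; real repositories also keep many objects in packfiles, which this ignores) inflates one object and prints the type-and-size header described earlier:

use strict;
use warnings;
use Compress::Zlib;   # provides uncompress()

# The first two hex digits of an object's SHA1 name its directory, the
# remaining 38 name the file. This particular path is only a placeholder.
my $path = '.git/objects/ab/cdef0123456789abcdef0123456789abcdef01';

open my $fh, '<:raw', $path or die "$path: $!";
my $deflated = do { local $/; <$fh> };

my $object = uncompress($deflated);             # loose objects are zlib-deflated
my ($header, $data) = split /\0/, $object, 2;
print "$header\n";                              # e.g. "blob 1234" or "commit 215"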

Branching and merging are a fundamental aspect of Git's design; they have to be in order to support distributed development. Branching in CVS is a painful exercise; while CVS advanced the state of the art in its own time, its design was simply not up to the task of real branching and merging. While claims to have solved the branching problem have been made in reference to Subversion, such claims are simply false: the branching tools may be improved in comparison with CVS, but Subversion still operates in the same revision-modeling paradigm as CVS and is therefore fundamentally ill-suited for the task. That paradigm breaks down when branching and merging are considered; in Subversion and CVS, branching and merging remain an exceptional activity. In Git, branching is an everyday activity. To refer to this difference as a paradigm shift is not an exaggeration.

Any one of these advantages taken alone may not seem all that exciting, particularly if you don't actually use version control software, or if you work on projects of small scope, minimal complexity, and little collaboration. However, they all matter. Performance matters: on a project involving multiple engineers over the course of several months, full checkouts can be a regular occurrence, and the speed of those checkouts affects productivity (not to mention job satisfaction). Branching and merging may be thought of as things that "this project won't need", but such thinking simply reflects the old paradigm in which branching is a specialized activity to be reserved for subprojects of significant scope/complexity or for marking milestones within a given project. Upon consideration of how human beings work and collaborate, and the trial-and-error nature of much of software engineering, it becomes obvious that branching and merging can be an invaluable tool in one's toolkit.

In Conclusion

Subversion offers some real, hard-won improvements over CVS. Many people have worked very hard over many years to achieve this. This contribution to the free software ecosystem should not be overlooked or denigrated. That said, there is simply no technological reason to use Subversion for new projects, period.

Git offers a better way forward. Guides such as Git for SVN Users can get Subversion users started gracefully. Even if you and your team don't immediately use the full power of Git, chances are high that you will use it over time as people gain familiarity with the tools and the project evolves. The power Git offers is so fundamental to the Git design itself that it is practically impossible to not use it upon working with Git for any significant length of time. In any case, any meaningful thing you might do in Subversion or CVS can be done in Git. The reverse cannot be said: Git is far, far more flexible than any centralized system.

When first adopting Git, it is critical to recognize these fundamental differences in design and ability. While Git can be productively used in a workflow similar to that of centralized version control systems, its scope goes far beyond that, and the basic design of Git really needs to be absorbed and understood by the engineers involved in order to add the most value.

Better use Git. Better Git it in your soul.

Triangle Research Joins End Point

End Point is pleased to announce the merging of the web development and hosting firm Triangle Research into our company as of December 1, 2007. Triangle brings with it much experience in the Interchange development sphere, as well as a vibrant hosting and support business. With this addition to our team, End Point increases its diversity of skills and knowledge while gaining many exciting clients. Carl Bailey, owner of Triangle, will continue his leadership role as he manages his former Triangle employees Richard Templet and Chris Kershaw, along with Dan Collis-Puro, a veteran End Point engineer.

At the same time, End Point's ranks are growing further as we welcome two other excellent developers. With sharp programming abilities in the diverse areas of low-level systems and web application development, Charles Curley and Jordan Adler will be a good fit, and we're excited to have them on board.

Bucardo: Replication for PostgreSQL

Overview

Bucardo, an asynchronous multi-master replication system for PostgreSQL, was recently released by Greg Sabino Mullane. First previewed at this year's PostgreSQL Conference in Ottawa, this program was developed for Backcountry.com to help with their complex database needs.

Bucardo allows a Postgres database to be replicated to another Postgres database by grouping tables together in a transaction-safe manner. Each group of tables can be set up in one of three modes:

  1. The table can be set as master-master to the other database, so that any changes to either side are then propagated to the other one.
  2. The table can be set up as master-slave, so that all changes made to one database are replicated to the second one.
  3. It can be set up in "fullcopy" mode, which simply makes a full copy of the table from the master to the slave, removing any data already on the slave.

Master-master replication is facilitated by standard conflict resolution routines, as well as the ability to drop in your own by writing small Perl routines. This custom code can also be written to handle exceptions that often occur in master-master replication situations, such as a violation of a unique constraint on a non-primary-key column.

History

Backcountry.com, an online retailer of high-end outdoor gear, needed a way to keep their complex, high-volume, Postgres databases in sync with each other in near real-time, and turned to End Point for a solution. In 2002, the first version of Bucardo was rolled out live, and reliably replicated billions of rows. In 2006, Bucardo was rewritten to employ new features, including a robust daemon model, more flexible configuration and logging, custom conflict and exception handling routines, much faster replication times, and a higher level of self-maintenance. This new version has been in production at Backcountry.com since November 2006.

In September 2007, the source code for Bucardo version 3.0.6 was released under the same license as Postgres itself, the flexible BSD license. A website and mailing lists were created to help foster Bucardo's development. The website can be found at bucardo.org.

RailsConf 2007 Conference Report

From Wednesday, May 17th to Saturday, May 20th, 2007, around 1600 people attended the Rails Conference (RailsConf) in beautiful Portland, Oregon, at the Oregon Convention Center. I was among them, and this is my report.

Conferences offer a varied mix of experiences to the attendee. There is the experience of a new city for many attendees, for example. While everyone refers to the technical sessions as their main draw, the social aspects of a conference are equally important and valuable. RailsConf offered plenty of socializing opportunities by providing continental breakfast, bagged salad or sandwich lunches, and coffee and soda breaks twice per day. There were also many evening parties sponsored by conference expo floor vendors and by some of Portland's Rails development companies and the local Ruby brigade, PDX.rb.

From my informal prodding and questioning of attendees I met during these social times, it seemed almost half the attendees had come to RailsConf to get their first training in Rails. Between 45 and 50 percent of attendees I talked to had no previous real-world experience with Rails or Ruby, having worked for a number of years on other platforms, mostly Java, though a few had worked with MS .Net as well. A few of the attendees who were new to Rails had PHP, Perl, Python, ColdFusion, or ASP experience. Those who did have some previous Rails experience had learned it after the fall of 2005, which is when I first started learning Rails. Their average experience was one year. Except for the few Rails core members, conference organizers, and speakers I met in the hallways, I met few other attendees who had been using Rails for much longer than a year.

The majority of the conference technical sessions were very valuable and informative. The most memorable one for me was "Clean Code" by Robert Martin, one of the object oriented and component programming gurus of the early '90s, and author of "Agile Software Development - Principles, Patterns and Practices." In this talk, Mr. Martin proceeded to write a command-line parser in Ruby, with tests in RSpec. I really enjoyed his live coding of the project with narration. It was great.

There were many talks on strategies for improving the performance of Rails applications, but I think the one most worth attending was the Cache-Fu talk by Chris Wanstrath. In this talk he first explained how he had worked on Gamespot.com and had helped develop a PHP system that used memcached to withstand 50 million page hits in a 24-hour period. He then gave an overview of the several Rails plugins and gems that implement memcached support in Ruby. Finally, he showed the cache-fu plugin, the latest iteration on everything learned from those other memcached Ruby implementations. He also provided best practices on session, page, and action caching with memcached, particularly when used in a clustered web and app server environment.

There were also many other talks about improving one's effectiveness as a Rails developer from a best-practices and tools point of view. The presentation titled "Fixtures: Friend or Foe" by Tom Preston-Werner combined both: he gave a best-practices survey of how to deal with fixtures for unit and functional testing, and presented a plugin that implements a distillation of the best of those practices in a single tool.

One interesting aspect was the popularity of talks dedicated to discussing the performance of alternative web servers, such as nginx and the evented Mongrel fork; these were among the most popular sessions. I found this interesting, given that nearly half of the attendees were likely new to Rails as a development platform. While web server scalability is important for large-scale deployments, many of the attendees were not likely to be working on the next million-user web hit, though it was obvious from the buzz in the hallways that many wished they were.

The one disappointing aspect of RailsConf is one I have noticed as part of O'Reilly Media's conference planning modus operandi since at least 2005: the giving of keynote time to diamond conference sponsors. I understand that diamond sponsors want something more in return for their sponsorship than a few hanging banners and logos on conference materials. On the second day of the conference, Sun Microsystems and Thoughtworks spoke to attendees for almost an hour. While the speakers and their speeches had their merits, the subject was mainly product marketing, not the Rails community, Rails technical knowledge, or even development skill building. These keynotes, compared to the other four keynotes of the conference, were not very well received. Many people I spoke to felt they took valuable time from what we really wanted to get at as quickly as possible: Rails technical know-how. I don't think O'Reilly, Ruby Central, Sun Microsystems, and Thoughtworks realize how little respect they earned by filling a keynote with sales talk.

One interesting activity of this year's RailsConf was the unofficial "RejectConf," where people were invited to give presentations that had not been accepted by the RailsConf organizers. It was held at 10 p.m. on Friday, May 19th, after the last RailsConf Birds-of-a-Feather session closed. It was a crowded event at fewer than 90 people, mostly because of the size of the small warehouse where it was held, three blocks from the Oregon Convention Center. There were three RailsConf sponsor parties with open bars that night, which might explain the low attendance. The warehouse could not have fit more people at any rate; attendees were spilling out the door as the event started.

RejectConf was a Gong Show type of event, where presenters had up to 5 minutes to present their plugins, gems, Rails or Ruby applications, or just their latest idea. The only rule was that it had to be Ruby related, and they could get booed off the stage at any time within their 5 minutes. The most amazing thing is that nobody got booed; everyone was very respectful of the presenters, and many of the presentations graciously took less than their allotted time.

The upside of this was that the hour was taken up with presentations about lots of amazing little projects. Nearly 20 people presented their pet projects, little Ruby things that they were proud to show off and which had solved a development problem for them in recent months. The downside was that it was impossible to take notes! You had to be really attentive, or you could miss half of someone's important message. There were so many people interested in presenting that there was no time for Q & A at the event.

If I could change one thing about RejectConf, I would hold it halfway through next year's RailsConf, and not on the last night of the conference. That way, people have a better chance of meeting up with that person from that super interesting presentation from RejectConf. The best one could hope for this year was recognizing a face and talking to them in the main conference hallways, during the very last six hours of the conference.

Speaking of the last hours of the event... To me, the first hour of the last day of the conference was the highlight of the event. The Saturday morning keynote was titled The Rails Way. According to the site, The Rails Way is all about teaching best practices in Rails application design. During this keynote, Jamis Buck and Michael Koziarski dissected a Rails application's source code and proceeded to explain how they would rewrite it in a way more typical of an experienced Ruby and Rails developer.

All in all, this last talk epitomized why so many had signed up for RailsConf 2007. We were not interested in enterprise sales pitches and marketing talk. Many attendees were not interested in the latest web server re-implementation, no matter how many theoretical dummy HTTP requests it can serve per second. Many were not after how to monetize the world's N-thousandth social network application.

We came to learn best practices and tools to make us more effective Rails developers. Many talks provided this in spades, but none better than "The Rails Way" keynote and website.

RailsConf showed me a Rails community that is very dynamic, diverse, energetic, and enthusiastic. There were developers from all walks of life among the attendees.

I enjoyed the conference very much, because of the many Rails tricks learned and the opportunities to connect with the varied attendees. I made some new friends, and the city of Portland is always a pleasure to visit.

(By PJ Cabrera, reposted here.)

Get Out of Technical Debt Now!

It has now been a year since I attended YAPC::NA 2006 in Chicago with Brian Miller, and the talk that we've spoken of most frequently, and cited most usefully in our work, was "Get Out of Technical Debt Now!" by Andy Lester.

Most of the concepts and examples he gave were not new to us, nor to most of the attendees, I think. But his debt analogy made a coherent story out of ideas, maxims, and experiences that were previously too disjointed and differently labeled to pull together into a single comprehensible motivational package.

Are we out of technical debt? Not yet. I suppose we never will be entirely as long as we continue to work, since it's impossible to avoid accruing some new debt. But prioritizing debts and paying off those with the highest "interest rate" and benefit means we don't need to despair.

I highly recommend everyone read Andy's slides, watch the video (one version has the slides incorporated), and read the Portland Pattern Repository wiki page on technical debt. Do all three, spread out over a few months, to re-motivate and remind yourself in the busy haze of daily work.

(Addendum:)

Other reading on technical debt:

Red Hat Enterprise Linux 4 Security Report

Mark Cox, director of the Red Hat Security Response Team, has published a security report covering the first two years of Red Hat Enterprise Linux 4, which was released in February 2005. He discusses the vulnerabilities, threats, time to release of updates, and mitigation techniques the operating system uses.

It is interesting to note that the vast majority of security vulnerabilities affected software not used on servers: The Mozilla browser/email suite, Gaim instant messenger, xpdf, etc. Some of the server vulnerabilities would require certain user input to be exploited, such as running Links or Lynx, calling libtiff, or running a malicious binary. Others require less common setups such as Perl's suidperl or Bluetooth drivers, or local shell access.

Nothing is completely secure, but Red Hat Enterprise Linux, configured well and kept updated, has a very good track record so far.

Evangelizing Test-Driven Development

I read Practices of an Agile Developer shortly after it was published, and I got pretty fired up about many ideas in it, with particular interest in test-driven development. From that point I did progressively more with testing in my day-to-day work, but everything changed for me once I went all-out and literally employed "test-driven development" for a minor project where I once wouldn't have worried about testing at all.

If you're not familiar with the principle, it basically boils down to this: When you are developing something, write the tests first.

I originally greeted this idea with skepticism, viewing it as unrealistic. It also struck me as overkill for small projects. However, as I've written more tests, and finally come around to writing tests first, the practice has really demonstrated its value to me. I'll list an abstract set of benefits, and then provide a hopefully-not-too-tedious example.

Benefits

1. Cleaner interfaces

In order to test something, you test its interfaces. Which means you think through how the interface would really need to work from the user's perspective. Of course, one should always plan a clean interface, but writing a test helps you think more concretely about what the exact inputs and output should be. Furthermore, it serves to effectively document those interfaces by demonstrating, in code, the expected behaviors. It isn't exactly a substitute for documentation, but it's worlds better than code with no tests and no docs. In fact, I would argue that code with working tests and no docs is better than code with docs but no tests.

2. Separation of concerns

When trying to write test cases for some new widget interface, it can become clear early on when you're trying to put too much into your magical widget, and really need to be building separate, less magical, widgets.

3. Coverage

Manual testing of something in the web-app space tends to mean banging on the website front-end. This does not guarantee coverage of all the functional possibilities in your component. On the other hand, if you commit yourself to writing the tests for a method before you implement a method, you've got coverage on all your methods.

Of course, as your interface/component increases in complexity, you need to test for corner cases, complex interactions, etc. Just having one test per component method would only suffice in very simple cases, and probably not even then.

You can use Devel::Cover (in Perl) to check the percentage of code covered by your tests, meaning you can empirically measure how rigorous your testing is.

4. Reliability

Manual testing depends on the whims, time, patience, and attention to detail of a human test user. Users are people, and not to be trusted, no matter how good their intentions. Users do not do the same thing the same way every time. Users forget. They get impatient. They make bad assumptions. Etc. Users are people, people are animals, and animals are shifty. Would you trust a dog to test your app? A raccoon? I would not. Mammals are silly things. Birds, reptiles, fish, etc. are also silly.

5. Reusability

Manual tests take time, and that time cannot be recovered. Furthermore, the manual test does not produce anything lasting; your testing procedures and results are excreted away into the ether, never to benefit anyone or anything ever again. They might as well have never happened, at least once you change anything at all in your code, data, or the environment it all runs in, when everything needs to be tested again.

Tests written alongside the test target component are concrete scripts that can be used and used again. They can be run any time you change the target component as a basic sanity check. They can be used any time you feel like it, even if it isn't strictly necessary. They can be incorporated into a larger framework to run periodic tests of a full staging environment. Thus, the time spent developing a test is an investment in the future. The time spent manually testing a component that could be tested in automated fashion simply cannot add any kind of long-term value.

6. Precision

Manual testing works in big chunks; you're testing the overall observable behavior of a system. This is valuable and has its place. But when something goes wrong, you then need to peel back the layers of your system onion to spot the piece(s) that cause your problem.

With tests written for each method/interface/etc. as you go, you exercise each portion of the interface in isolation (or as close to isolation as can be realistically achieved). You will find your little problem spots long before you ever get to manual testing. You can enter the manual testing portion with much greater confidence knowing that all the little pieces in your system have been empirically demonstrated to "work". Your manual testing can focus only on higher-level concerns, and will usually take much less time.

Objections refuted

"But writing tests takes time, and I don't want to pay extra for it!"

Nobody really wants to pay for anything. However, you expect the software you pay for to work, which means you expect the developers to test it. Therefore, given that the developers (and you) both are going to test the software, it follows that you do have time to write the tests.

In my experience, writing the test before -- or at least alongside -- the new component does not negatively impact the overall delivery time of a new component. It can actually positively impact delivery time compared with the tedium of manual testing: it is very inexpensive to run the same test script over and over again while debugging, whereas it is more time-consuming to click "reload" or resubmit a form in your web browser and then manually monitor the behaviors. Humans get tired of scanning lots of output looking for trouble and can miss problems; automated tests always work the same way, so the effort put into them is an investment that pays off as long as the code exists.

It may vary for each individual, I suppose. For me, writing the test first made a lot of sense.

"You don't really mean 'write the test first', do you? That doesn't make sense."

It seems overly structured at first, until you get into the swing of it.

For my latest little project, I needed to create a newsletter recipient list. That meant adding a table to the database and creating a few actions that would manipulate the table's contents. I used Moose to build a helper object module; this object should handle all the details of manipulating the table, all done through a simple interface with methods that approximately matched the language of the problem domain. In this case: "add_recipients", "remove_recipients", and "recipients", all working on top of a "newsletter_recipients" table; can you guess what each method does?

So, I fired up vim, with my new Perl module in place. Then, I used vsplit to vertically split my editor screen with an empty test script in the other window. I have my test script use Test::More's require_ok() function to make sure my new module can be loaded:

use lib '/path/to/my/custom/lib';
use Test::More tests => 1;

require_ok('My::Newsletter::RecipientList');
I save the script and run it. Lo and behold, my test script failed! Oh no! So I go back to vim, go to the window with my new module, and create the package skeleton:

package My::Newsletter::RecipientList;

use strict;
use warnings;
use Moose;

1;
Save the file. Get out of vim, run the script. It passes.

Then I decide that my new object needs a database handle to be passed in, and also needs the newsletter name to be specified. Both would be object instance attributes. And given that it's a Moose-derived object, it'll use "new" as the constructor.

Back to vim, adding to the test:

use Test::More tests => 4;
use DBI;

my $module = 'My::Newsletter::RecipientList';
my $newsletter = 'some_newsletter';

my $dbh = DBI->connect(...);
my $lister = $module->new(
    dbh => $dbh,
    newsletter_name => $newsletter,
);

# Test object blessing
isa_ok(
    $lister,
    $module,
);

# Test the dbh attribute
cmp_ok(
    $lister->dbh,
    '==',
    $dbh,
    'Database handle attribute',
);

# Test the newsletter name
cmp_ok(
    $lister->newsletter_name,
    'eq',
    $newsletter,
    'Newsletter name attribute',
);
Save the script, run it, and I now have three failing tests. Go back to vim, go to my module, and add the code that provides the dbh and newsletter_name attributes (which is trivial in Moose). Save the module, run the script, and now those three tests pass.
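
For the curious, those attribute declarations might look roughly like this (a minimal sketch; the real module presumably adds type constraints and, later, the add_recipients/remove_recipients/recipients methods):

package My::Newsletter::RecipientList;

use strict;
use warnings;
use Moose;

# Both attributes are read-only and required by the constructor,
# matching what the tests above expect.
has dbh => (
    is       => 'ro',
    required => 1,
);

has newsletter_name => (
    is       => 'ro',
    isa      => 'Str',
    required => 1,
);

1;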

Repeat. Until you're done.

My example above is fairly tedious; I'm effectively testing that my test script is using the correct Perl search path, and that Moose is setting up attributes correctly; I haven't tested anything specific to the problem domain. However, I found that as I sat down to write the next test for my module's interface, the tests started flowing, and writing the tests became a way of sorting out mentally how the interface should behave. At one point, I wrote something like 6 or 7 tests all in a row before returning to the module to implement the stuff that would actually get tested.
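
To give a flavor of those later tests, here's a hypothetical pair in the same style; the method names come from the interface described above, but the exact return value of recipients() is my assumption, not the module's actual behavior:

# Hypothetical domain-level tests; assumes recipients() returns an array
# reference of email addresses.
$lister->add_recipients('alice@example.com', 'bob@example.com');
is_deeply(
    [ sort @{ $lister->recipients } ],
    [ 'alice@example.com', 'bob@example.com' ],
    'recipients() reports what add_recipients() stored',
);

$lister->remove_recipients('bob@example.com');
is_deeply(
    $lister->recipients,
    [ 'alice@example.com' ],
    'remove_recipients() drops an address',
);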

Final thoughts

Object-oriented programming books and classes talk about documenting your preconditions and postconditions, and documenting the "object contract". It doesn't always work that well to figure these things out on paper or in your head ahead of time, though, and documenting the stuff doesn't always help much either. However, writing the tests effectively codifies the object contract, the expectations, everything. And it gives you something tangible you'll be able to use for the rest of the project's life.

These methods obviously don't lend themselves easily to every kind of development project. It would be entirely different trying to write tests for behaviors that are implemented in an ITL/PHP/JSP/ASP code-heavy page, for instance. However, any time you're getting into significant business logic, you really shouldn't have that stuff in your page anyway. It's much better off living separately in a module that you can run and test completely outside the context of your application server.

Furthermore, you don't need to have some big test suite framework in place to deal with this. Just put your tests in a reasonable place (I like using a t/ subdirectory alongside the module being tested) and have them use Test::More. If they use Test::More, then it's really easy to have them integrate into something larger under Test::Harness later on. A bunch of tests sitting in one directory is vastly preferable to no tests at all. But using something like Test::More is much better than writing one-off Perl scripts that use your own custom routines and whatnot, for the Test::Harness integration and because it doesn't require you to parse the output at all to determine success or failure. Test::More is really, really easy to use, yet does most of what you'd ever need, at least what's possible to generalize.
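
Wiring a directory of Test::More scripts into something larger under Test::Harness can be as small as this (a sketch assuming the tests live in t/; the prove utility that ships with Test::Harness does the same thing from the command line):

#!/usr/bin/perl
# Run every test script under t/ and print a pass/fail summary.
use strict;
use warnings;
use Test::Harness qw(runtests);

runtests(sort glob 't/*.t');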

Try it, and I think you'll like it.

(This was originally an internal email Ethan sent on November 3, 2006, and has been lightly edited by Jon Jensen.)

USPS changes the Web Tools Rate Calculator API

End Point offers integration with online shipping APIs to provide "live lookups" of rates.

Advantages of "live lookups":

  • Current rates
  • Includes additional costs such as fuel surcharges
  • No manual maintenance of rate tables

Disadvantages of "live lookups":

  • Dependent on the availability and performance of the rate service
  • Planning, programming and rolling out API changes

CH CH CH CH CHANGES!

Speaking of changes, the USPS has changed shipping rates as of May 14, 2007 (non-tech-friendly details here). The changes include updates to rates, package attributes and shipping methods. These changes impact the XML-based Web Tools Rate Calculator, in some cases breaking lookups altogether. As of press time, the USPS hasn't documented the changes to the API. Broken lookups appear to be confined mostly to international shipping.

Many of the changes represent a simplification and restructuring of international shipping methods, detailed here. This tweaking of international shipping methods is definitely an improvement — there were too many confusing options before. Unfortunately, these tweaks aren't backwards compatible — meaning nearly all old-style international API lookups are broken. The USPS still (as of press time) hasn't documented technical changes to the API.

TECHIE BACKGROUND - OR "HOW THINGS GOT BROKEN"

When you send an international rate request, the USPS API returns a "response" XML packet that includes the available shipping methods, e.g. "EXPRESS MAIL INTERNATIONAL (EMS)". Most of these shipping method names have changed in the response, leaving rate calculation code developed against the last released API in a broken state — you can't match a requested rate to a specific response.

Fortunately, the USPS has provided an undocumented staging API for customer testing, so we've been able to deduce most of the changes to the shipping method names through a battery of tests. Unfortunately, until the USPS releases new documentation we're left with an educated guess as to how to fix broken lookups.

CONCLUSION

The changes to international shipping methods are refreshing. However, the USPS release of an undocumented API into production for all customers has left developers guessing, especially since those changes aren't backwards-compatible.

SELECTED TIMELINE OF API CHANGES

  • March 21, 2007. USPS notifies users that the Rate Calculator API will be changing
  • April 26, 2007. USPS notifies users that they can use the staging API for testing ... on April 19.
  • May 14, 2007. USPS switches to the still undocumented production API, breaking most international rate calculation lookups in the process.

(By Dan Collis-Puro, reposted here.)

New edition of The Book of JavaScript reviewed

The Book of JavaScript (2nd edition) is a new and comprehensive introduction to the JavaScript language presented in an entertaining, practical format. I have significant practical experience with JavaScript, so I do not consider myself in the target audience for this book; however, I still found much of it useful so it will remain as a valuable reference on my bookshelf.

My full review of the book was just published at OS News.

Interchange 5.4.2 released

Today a new version in the Interchange 5.4 stable branch was released. It is primarily a bugfix release, as documented in the release notes summary.

Greg Sabino Mullane's PostgreSQL tips and how-to articles

End Point engineer Greg Sabino Mullane is a prolific writer of PostgreSQL tips, suggestions, and how-to articles in his Planet PostgreSQL blog. Some posts involve emergency procedures to diagnose and fix an ailing database, while others are helpful recipes developers can use. The strengths and weaknesses of various approaches are compared, and there are lots of neat things to learn along the way.

Here's my list of favorites from those Greg has posted since last March, in reverse chronological order:

Enjoy!

Hardware Monitoring with Nagios on OpenBSD

At End Point we use Nagios and its remote client, NRPE, to monitor servers we manage and alert us to any problems. Aside from the usual monitoring of remote accessibility of services such as a website, database, SSH, etc., it's very helpful to have monitoring of memory usage, disk space, number of processes, and CPU load.

In this detailed article Dan Collis-Puro shows how to go even further and monitor CPU and case temperatures and fan speeds, alerting administrators to hardware failures so they can be remedied before they become catastrophic.