Camps presentation at UTOSC 2008

Friday evening I did a presentation on and demonstration of our "development camps" at the 2008 Utah Open Source Conference in Salt Lake City. Attendees seemed to get what camps are all about, asked some good questions, and we had some good conversations afterwards. You can read my presentation abstract and my slides and notes, and more will be coming soon at the camps website.

I'll post more later on some talks I attended and enjoyed at the conference.

nginx and lighttpd deployments growing

Apache httpd is great. But it's good to see Netcraft report that nginx and lighttpd continue to grow in popularity as well. Having active competition in the free software web server space is really beneficial to everyone, and these very lightweight and fast servers fill an important niche for dedicated static file serving, homegrown CDNs, etc. Thanks to all the developers involved!

Moose roles

Perl programmers,

Moose roles give a really nice way of maximizing code reuse within an object system, while favoring composition over inheritance. This makes for shallower inheritance trees, reduced method dispatch spaghettification, and a more comprehensible, maintainable, extensible codebase overall. Revel in the glory.
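
For example, here is a minimal sketch (hypothetical package names, not from any real codebase) of pulling comparison behavior into a role and composing it into a class:

package Comparable;
use Moose::Role;

requires 'compare_to';   # any consuming class must implement this

sub equals {
    my ($self, $other) = @_;
    return $self->compare_to($other) == 0;
}

package Version;
use Moose;

has number => (is => 'ro', isa => 'Int');

sub compare_to {
    my ($self, $other) = @_;
    return $self->number <=> $other->number;
}

with 'Comparable';   # compose the role; no parent class required

package main;
print "equal\n" if Version->new(number => 2)->equals(Version->new(number => 2));

Version gets equals() without inheriting from anything, and any other class that knows how to compare_to() can compose the same role.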

That is all.

Acts As Xapian - It Just Works

I just recently started listening to the podcast done by the guys at RailsEnvy. It's an excellent resource for keeping up on what's new in the Rails world, and it's how I found out about the new acts_as_xapian search plugin for Rails. The podcast mentioned this blog post, which contains a very thorough rundown of all the different full-text search options currently available for Rails. The timing of this article couldn't have been better since I was in the market for a new solution.

I was approaching a deadline on a client project here at End Point and I was having lots of trouble with my existing search solution, which was acts_as_ferret. Setting up ferret was relatively easy and I was very impressed with the Lucene syntax that it supported. It seemed like a perfect solution at first, but then came "the troubles."

Ferret is extremely fragile. The slightest problem and your server will just crash. What was causing the crash? Unfortunately the server logs won't give you much help there. You will receive some cryptic message coming from the C++ library if you're lucky. Note that I skipped the suggested DRb server setup since this was a development box.

After a while I would notice something wrong in my model code that might have caused an error while updating the search index. Unfortunately this was impossible to verify since I could not predictably reproduce the error. So in the end, I think there may have been issues with my model fields but ferret was of no help in tracking these problems down. The final straw came when the client started testing and almost immediately crashed the server after doing a search.

Enter acts_as_xapian. Jim Mulholland's excellent tutorial was pretty much all I needed to get it up and running on my Mac. Documentation for acts_as_xapian is a bit thin. It consists primarily of the aforementioned tutorial and a very detailed README. The mailing list is starting to become more active, however, and you are likely to get a response there to any thoughtful questions you might have.

One major difference with xapian (vs. ferret) is that it does not rebuild your index automatically with each model update. When you modify an ActiveRecord instance it will update the acts_as_xapian_jobs table with the id and model type of your record so that the index can be updated later. The index is then updated via a rake command that you can easily schedule via cron. You can also rebuild the entire index using a different rake command but that shouldn't really be necessary.
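
For example, a hypothetical crontab entry (the application path here is made up, and the task name is the one documented in the plugin's README, so verify it against your version):

# update the Xapian index every ten minutes
*/10 * * * *  cd /path/to/railsapp && rake xapian:update_index RAILS_ENV=production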

I was a bit concerned about the lack of a continuously updated index but I came to realize that it has some significant advantages. The biggest advantage is that it's much faster to update your model records since you are not waiting for the re-indexing to complete on the same thread. It also means you can skip the step of setting up a separate DRb server for ferret in your production environment.

With xapian you can index "related fields" in other models by constructing a pseudo-attribute in your model that returns the value of the associated model as a text string. Ferret allows you to do this as well, but unlike ferret, xapian gives excellent feedback about any mistakes you might have made while constructing them. If you have a nil exception somewhere in one of these related fields, xapian will complain and tell you exactly which line it's bombing out on.

I was also able to set up paging for my search results with paginating_find, which I prefer to will_paginate (just a personal preference -- nothing wrong with will_paginate). There is also a cool feature that will suggest other possible terms ("Did you mean?") if your search returns no results. So far the only disappointment has been the lack of an obvious way to do searches on specific fields.

If you are in the market for a new full-text search solution for Rails, you should really give xapian a try.

Review of The Book of IMAP

Former End Point employee Ryan Masters had his review of the No Starch Press book published over at OSNews.com. Sounds like it was a decent book.

On excellence

The interwebular collective cup runneth over with blog articles addressing the subject of what makes good software engineers good. It is a topic about which many opinions are expressed. What is less commonly addressed, however, is the possibility that the very qualities that make for good software engineers may also make for good technical leaders, good managers, and just good coworkers, period.

At End Point, we toss the term "ownership" around quite a lot, in a variety of contexts. When a particular task or responsibility goes from one person to another, we mention "ownership" when we need to communicate the significance and scope of the responsibility in question. The term may apply to a software engineering task, in which "owning" the problem means taking responsibility for all aspects of the engineering work, across the software stack, from prototyping to full development to deployment. It may also apply to a managerial or leadership role, for which "ownership" implies responsibility for all parties involved on a given project, task, or team, with the "soft" issues of human beings mattering at least as much as -- and probably more than -- the "hard" issues of machines, software, etc.

Ultimately, ownership comes down to this: taking complete personal responsibility for all aspects of the problem at hand.

The owner does not wait helplessly for guidance from others; rather, the owner pushes things to a conclusion, and if guidance is truly necessary the owner pushes for it repeatedly until guidance is received. The owner does not put hard or potentially unpleasant things off, but instead embraces them and addresses them immediately. The owner does not limit him- or herself to a narrow comfort zone: he is not afraid to go to different parts of the software stack, she does not hesitate to call the business person to have difficult discussions about scope creep or design flaws or difficult tradeoff decisions. The owner never allows for the actions of others to serve as an excuse, and never regards any aspect of the task to be only somebody else's problem. The owner treats the deliverable as a reflection of him- or herself. The owner looks objectively at the deliverable and does not shy away from criticism, instead embracing it for the learning opportunities that can come from reflection and analysis.

In other words, to act with ownership is to act as if all aspects of the project or duty are important and are the owner's responsibility. That does not mean acting with ownership requires personally doing everything; owners can and should delegate as appropriate. But delegation does not mean that the responsibility is now somebody else's problem: the owner engages in regular communication with the involved parties, and keeps the task or project in question on track. The owner understands that it still ultimately belongs to him or her. Furthermore, the owner recognizes ownership in others, appreciates it for what it is, expresses gratitude and respect for it, and does not feel threatened by it.

In our work, this kind of ownership is the root of excellence. The domain of ownership may vary wildly between individuals, but the degree of ownership exhibited directly correlates to the excellence the individual demonstrates within that domain. Ownership applies to everything, whether it means writing the most reliable, maintainable code possible, designing the most scalable systems, or ensuring that everybody on your team is productive and happy. If you are an owner, you understand this: there are no excuses, and it's up to you. Furthermore, to make it happen, the owner also understands: at some point, the talking ends and the doing begins.

Ownership for a given problem may be granted to a particular individual. With a team of people, formal identification of a problem's owner is necessary for operational clarity. However, such ownership is ultimately little more than a formality: true ownership can only be taken, and never granted. This model is at the heart of any meritocracy, and is true to the general principles of free software development.

Critically, ownership is part of an oral tradition, a cultural phenomenon. Ownership breeds excellence across teams. It spreads as people naturally admire it and emulate it when they encounter such excellence first-hand. It is never complete; no one individual with a full schedule will successfully display ownership across the board on all issues. We fall down sometimes. We overextend ourselves. We have to pick our battles. But part of ownership is recognizing mistakes so that we can do better.

It is my distinct pleasure to have witnessed such ownership, on numerous occasions. I've seen a coworker leap into system-wide problems -- on systems this coworker ordinarily has little or nothing to do with -- and not rest until every aspect of the system is in order and all members of the team involved with the support of that system understand exactly what it is that needed to be done. I've seen another tackle fiendishly difficult technical problems involving complex data modeling or hairy lock management in Postgres, pushing through all the roadblocks until success was achieved, no matter how challenging the problem and no matter how much investigation and mental gymnastics were required. Yet I've seen the same people show the same dedication and caring for work done on much smaller systems, on less interesting problems, in less dramatic scenarios. This is a small sampling from countless examples.

In my experience, the only proper response to such excellence is admiration, along with a desire to emulate these fantastic coworkers, to be similarly excellent, and to earn their respect.

Again: it's a cultural phenomenon. People do not learn this by reading books. Reading about it, and talking about it, are essential for understanding. Ultimately, though, doing is all that matters.

In the liner notes to his Live at the Blue Note box set, the great pianist Keith Jarrett thanks his illustrious colleagues (Jack DeJohnette and Gary Peacock) for reminding him that "the only standards worth having are the highest ones." What more need be said?

Subversion or CVS metadata exposure

At the talk "Rails Security" by Jonathan Weiss at LinuxTag 2008, he mentioned (among other things) a possible security problem for sites being run out of a Subversion (or CVS or even RCS) working copy, where the metadata inside the .svn/ or CVS/ directories may be exposed to the world. This post by someone else explains it nicely.

Interchange appears not to be vulnerable to this by default, as it will only serve files that end in .html, and the files inside .svn/ and CVS/ either have no suffix or end in .svn-base, so Interchange will not serve them.

But if the docroot is served from a Subversion or CVS checkout, its metadata files are likely served to the world as well -- relatively harmless in itself, but it can reveal internal file paths, hostnames, and OS account names.

For PHP or SSI, on the other hand, this could be a disaster, as the complete source of all files could be revealed: the .svn-base suffix causes Apache not to parse the code as PHP, so it passes the source through verbatim.

If you use Subversion, CVS, or RCS on any project, I recommend you look into how your files are being served and see if there's anything being exposed. Checkouts from Git, Mercurial, or Bazaar are not likely to be a problem, since they only have metadata directories (.git, .hg, .bzr) and associated files at the root of the checkout, which would often be outside the docroot.
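
For sites served by Apache, one common safeguard (a sketch only -- adapt it to your Apache version and configuration layout) is to deny access to version control metadata directories outright:

# block requests for version control metadata anywhere under the docroot
<DirectoryMatch "/(\.svn|\.git|\.hg|\.bzr|CVS)(/|$)">
    Order allow,deny
    Deny from all
</DirectoryMatch>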

(This is based on my earlier mailing list post to the interchange-users list.)

Some handy cryptography/networking tools

Here's a list of some nifty cryptography/networking tools Kiel's pointed out lately:

  • socat - multipurpose relay; think netcat gone wild -- we used this recently to tunnel UDP DNS queries over ssh (see the sketch just after this list)
  • cryptcat - netcat with twofish encryption (the Debian package adds a man page)
  • rsyncrypto - partial transfer-friendly encryption (modified CBC for smaller change windows similar to gzip; less secure than regular CBC)
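
To give a flavor of that socat trick, here is a sketch of tunneling UDP DNS queries over ssh; the ports, hostnames, and nameserver are all made up:

# on the local machine: accept UDP queries and relay each one over TCP
socat UDP4-LISTEN:5353,reuseaddr,fork TCP4:127.0.0.1:5300

# forward that TCP port to the remote machine over ssh
ssh -L 5300:127.0.0.1:5300 user@remotehost

# on the remote machine: accept the TCP stream and relay to the real nameserver via UDP
socat TCP4-LISTEN:5300,reuseaddr,fork UDP4:ns.example.com:53

# then point a resolver at it locally
dig @127.0.0.1 -p 5353 example.com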

And a pretty unrelated but useful Red Hat Magazine article on the new yum-security plugin.

Alaska Basin

From Thursday to Saturday I backpacked with a friend and some of our kids into Alaska Basin (in the Tetons, in Wyoming), saw some beautiful scenery, and became reacquainted with the other kind of bugs for a while.

The lake is Sunset Lake, where we went Friday night. I frolicked in the snowmelt water and lost my new glasses in the silt, but came back the next morning and found them after wading out 20 feet or so. It was a great trip.

On "valid" Unix usernames and one's sanity

Today poor Kiel ran into an agonizing bug. A few weeks ago, building a custom RPM of perl-5.10.0 (that is, the Perl distribution itself) wasn't a problem. The unit tests passed with nary a care.

But today it no longer worked. I'll omit details of the many false paths Kiel had to go down in trying to figure out why an obscure test in the Module::Build package was failing. Eventually I took a look and noted that he'd tried all the logical troubleshooting. Time to look at the ridiculous. What if the test was failing because the last time he built it successfully it was under the user "rpmbuild", while he was now trying with user "rpmbuild-local"?

That was exactly the problem. Module::Build's tilde directory (~username) parser was of the (false) opinion that usernames consist only of \w, that is, alphanumerics and underscores. The reality is that pretty much anything is valid in a username, though some characters will cause trouble in various contexts (think of / : . for example).
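
A tiny illustration -- not Module::Build's actual code, just the shape of the bug -- of why \w is too strict:

use strict;
use warnings;

for my $name ('rpmbuild', 'rpmbuild-local', 'RIA\\dillman') {
    # what the failing code effectively assumed:
    my ($strict)  = "~$name/rpm" =~ m{^~(\w+)};
    # a more forgiving pattern that stops only at the path separator:
    my ($relaxed) = "~$name/rpm" =~ m{^~([^/]+)};
    printf "%-15s  \\w-only sees: %-15s  full name: %s\n", $name, $strict, $relaxed;
}

The \w-only pattern doesn't fail outright; it silently truncates "rpmbuild-local" to "rpmbuild" and goes looking for the wrong user's home directory, which is exactly the sort of thing that makes a test failure so hard to trace.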

I explained in more detail in CPAN bug #33492 which reports someone else's experience with the test failing when the username had a backslash in it, such as the Active Directory name "RIA\dillman".

Fun times.

OpenAFS Workshop 2008

This year's Kerberos and OpenAFS Workshop was very exciting. It was the first I've attended since the workshop grew large enough to be held separately from USENIX LISA, and it was encouraging to see that this year's workshop was the largest ever, with well over 100 in attendance and over 10 countries represented. Jeff Altman of Secure Endpoints did a great job coordinating the workshop. Kevin Walsh and others at the New Jersey Institute of Technology did a fantastic job hosting, providing the workshop with a good venue and great service.

My summary of the workshop is "energy and enthusiasm" as several projects that have been in the development pipeline are starting to bear fruit.

On the technical side, the keynote kicked off the week with a presentation by Alistair Ferguson of Morgan Stanley, who noted that the work on demand-attach file servers has reduced their server restart times from hours down to seconds, greatly easing their administrative overhead while making AFS even more highly available.

Of particular technical note, Jeff Altman reported that the Windows client has had lots of performance and stability changes, with major strategic changes being delivered later this year. Specifically, support for Unicode objects is coming in June, support for disconnected operation is coming in the Fall, and a long-awaited native file system driver will be delivered in December. This work will combine to make the Windows client not just a full-featured AFS client, but also a more solid Windows application.

Hartmut Reuter presented another exciting development work: Object Storage for AFS. This extension to both the AFS client and file server allows for AFS data to be striped across multiple servers (thus allowing for higher network utilization) as well as mirrored (giving higher availability). While this work is not yet in OpenAFS, it is in production at CERN and KTH, and work is underway to integrate it into an OpenAFS release.

A major organizational boost was discussed during the workshop: OpenAFS was accepted as a sponsoring organization in the Google Summer of Code and received support for 6 students. Among other projects, these students will be working on support for disconnected operations, enhancements to the Windows client, and improving the kafs implementation of the AFS client sponsored by Red Hat.

The most significant announcement at the workshop is that work is underway to create an organizational entity to support OpenAFS. The OpenAFS Elders have announced the intention to have a 501(c)(3) corporation started in July that will serve as the legal entity behind OpenAFS. From a code standpoint, the licensing of OpenAFS will not change, but from an operational standpoint, people will be able to donate goods, services, and intellectual property to OpenAFS, something that is not currently possible. The foundation will not offer support services as there are currently several companies doing so, but it will be focused on the non-profit components of AFS.

There were several other very interesting talks at the workshop, but the overall message was clear: users and developers are extending OpenAFS and keeping it fresh and viable as the distributed filesystem of choice.

RPM --nodeps really disables all dependency logic

I was surprised about something non-obvious in RPM's dependency handling for the second time today, the first time having been so many years ago that I had completely forgotten.

When testing out an RPM install without having all the required dependencies installed on the system, it's natural to do:

rpm -ivh $package --nodeps

The --nodeps option allows RPM to continue installing despite the fact that I'm missing a handful of packages that $package depends on. This shouldn't be done as a matter of course, but for a quick test it's fine. So far so good.

However, I found out by confusing experience that --nodeps not only allows otherwise fatal dependency errors to be skipped, but it also disables RPM's entire dependency tracking system!

I was working with 3 RPMs, a base interchange package and 2 ancillary interchange-* packages which depend on the base package, such as here:

interchange-5.6.0-1.x86_64.rpm
interchange-standard-5.6.0-1.x86_64.rpm
interchange-standard-demo-5.6.0-1.x86_64.rpm

Then when I installed them all at once:

rpm -ivh interchange-*.rpm --nodeps

I expected interchange to be installed first, followed by either of the interchange-standard-* packages that depend on it.

However, --nodeps disables RPM's tracking of those dependencies, causing them to be installed in what happened to be a pessimistic order that breaks many things. Since the interch user and group that the interchange package creates doesn't exist yet, files can't be owned by the correct user/group. And since the configuration file /etc/interchange.cfg doesn't exist yet, the interchange-standard-demo package can't register itself there.

I wasn't able to see this till I had Kiel join me in a shared screen and watch as I typed my install command. As I spoke aloud --nodeps to Kiel, I suddenly remembered my past experience with this and felt appropriately stupid.

What I really want is not to have no dependency checking at all, but rather something like a hypothetical --ignore-deps-errors option. Changing the behavior of --nodeps to do just that would probably be friendlier overall, but perhaps there's a reason for its current behavior ...
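
In the meantime, the practical workaround when --nodeps really is needed is simply to control the ordering yourself, along these lines:

# install the base package first so the interch user/group and /etc/interchange.cfg exist
rpm -ivh --nodeps interchange-5.6.0-1.x86_64.rpm
# then the ancillary packages that depend on it
rpm -ivh --nodeps interchange-standard-5.6.0-1.x86_64.rpm interchange-standard-demo-5.6.0-1.x86_64.rpm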

As an aside, I will note that the RPM specfile PreReq tag has been deprecated and is now a synonym for Requires.

Listing installed RPMs by vendor

The other day I wanted to see a list of all RPMs that came from a source other than Red Hat, which were installed on a Red Hat Enterprise Linux (RHEL) 5 server. This is straightforward with the rpm --queryformat (short form --qf) option:

rpm -qa --qf '%{NAME} %{VENDOR}\n' | grep -v 'Red Hat, Inc\.' | sort

That instructs rpm to output each package's name and vendor, then we exclude those from "Red Hat, Inc." (which is the exact string Red Hat conveniently uses in the "vendor" field of all RPMs they package).

By default, rpm -qa uses the format '%{NAME}-%{VERSION}-%{RELEASE}', and it's nice to see version and release, and on 64-bit systems, it's also nice to see the architecture since both 32- and 64-bit packages are often installed. Here's how I did that:

rpm -qa --qf '%{NAME}-%{VERSION}-%{RELEASE}.%{ARCH} %{VENDOR}\n' | grep -v 'Red Hat, Inc\.' | sort

With that I'll see output such as:

fping-2.4-1.b2.2.el5.rf.x86_64 Dag Apt Repository, http://dag.wieers.com/apt/
git-1.5.6.5-1.x86_64 End Point Corporation
iftop-0.17-1.el5.x86_64 (none)

There we see the fping package from the excellent DAG RPM repository, along with a few others.

To see a list of all the tags that can be used in a query format:

rpm --querytags

End Point's Spanish website

We've had a Spanish version of our website for about a year now, and we keep the content there current with our main English website. We haven't promoted it much, so I figured I'd mention it here and see if any English speakers feel like checking it out. :) We currently have only a few Spanish speakers at End Point, and if a non-English-speaker calls our main office, it may take a bit of shuffling to route the caller to the right person.

But more to the point, we've done a few interesting multilingual projects. One of them is a private business-to-business website localized in US English, UK English, French, Canadian French, German, Italian, Japanese, Simplified Chinese, Traditional Chinese, Portuguese, Brazilian Portuguese, and Spanish. We're experienced with popular character set encodings and Unicode in web protocols, Postgres and MySQL databases, Perl, and Ruby. We're always interested in taking on more such projects as they tend to be challenging and fun.

The how and why of Code Reviews

Everyone believes that code reviews are highly beneficial to software and web site quality. Yet many of those who agree in principle don’t follow through with them in practice, at least not consistently or thoroughly. To find ways to improve real-world practice, I attended Code Reviews for Fun and Profit, given by Alex Martelli, Über Tech Lead at Google, during OSCON 2008.

One barrier to good reviews is when developers are reluctant to point out flaws in the code of more experienced programmers, perhaps due to culture or personal dynamics. In Open Source projects, and at End Point, the reverse is often true: corrections earn Nerd Cred. But if it is an issue, one good workaround is to ask questions. Instead of “If you use a value of zero, it crashes,” say “What happens if you use a value of zero?”

There are several prerequisites that should be taken care of before code reviews are started. First, a version control system is required (we prefer Git at End Point). Second, a minimal amount of process should be in place to ensure reviews occur, so that no commits fall through the cracks. Third, automatable checks, such as coding style, test coverage, and smoke tests, should be left to the computer.

When reviewing code, there are many things to check for. The code should be clear and readable, with consistent, well-named variables and functions. Re-use is important, because the most readable code is the kind that isn’t there: it’s in some other library that’s already tested, reviewed, and proven. Error logs and debug files should have clear and consistent messages, exceptions should be thrown and caught, and returned error values should be checked. Also look for memory leaks, security issues, race conditions, premature optimization, and portability issues.

Check for well-written tests: high-level integration tests should cover corner cases as well as expected paths, and dependency injection should be used correctly where needed. Documentation should be in sync with the source code, use consistent terms, and place ancillary information in external files or links. Code comments should be terse, explaining why rather than what, without repeating the code. The UI should be clean.

To make code reviews easier and more likely to happen, ensure that they are small. Just 200–400 lines, comments included, depending on the language. Linus rejects large patches out of hand for this reason. Code reviewers should not spend more than 60–90 minutes at a time doing code review, once in the morning and once in the afternoon, because effectiveness drops off quickly. It is not like coding, which is the act of creation; you can spend many hours in that mode without ill effect.

Pair programming doesn't reduce the need for code review. There is a propensity for good pairs to groupthink, so that they will not see problems that an outsider could. Furthermore, code review must still be done for others to have familiarity with the code.

If you have a mountain of legacy code that needs review, it may be best to start by just changing one small piece at a time, and reviewing that as you go.

The most important tool to assist with code review is e-mail. While one person should always have the responsibility of completing the review, the use of e-mail allows all team members to become familiar with every part of the codebase, and it encourages them to perform additional review because it takes little time. A variety of other software tools exist specifically to support code review as well.

Code reviews have a host of positive benefits: they find bugs, correct inadequate documentation, repair flawed tests, and ensure readable code. They connect team members to each other through code, each becoming better for it. Given enough eyeballs, all bugs are shallow.

Testing Concurrency Control

When dealing with complex systems, unit testing sometimes poses a bigger implementation challenge than does the system itself.

For most of us working in the web application development space, we tend to deal with classic serial programming scenarios: everything happens in a certain order, within a single logical process, be it a Unix(-like) process or a thread. While testing serial programs/modules can certainly get tricky, particularly if the interface of your test target is especially rich or involves interaction with lots of other widgets, it at least does not involve multiple logical lines of execution. Once concurrency is brought into the mix, testing can become inordinately complex. If you want to test the independent units that will operate in parallel, you can of course test each in isolation and presumably have simple "standard" serial-minded tests that are limited to the basic behaviors of the independent units in question. If you need to test the interaction of these units when run in parallel, however, you will do well to expect Pain.

One simple pattern that has helped me a few times in the past:

  • identify what it is that you need to verify
  • in your test script, fork at the relevant time (assuming a Unix-like OS, which is what you're using, isn't it?)
  • in the child process(es), perform the logic that ought to bring about the conditions you want to verify, then exit
  • in the parent process, verify the conditions

A more concrete example: I designed a little agent containing the logic to navigate a simple RESTful interface for rebuilding certain resources. The agent would be invoked (thus invoking the relevant resource rebuild) in response to certain events. In order to keep demand on the server made by the agent throttled to a reasonable extent, I wanted some local concurrency control: the agent would not attempt to rebuild the same resource in parallel, meaning that while one agent requests a rebuild and subsequently waits for the rebuild to complete, a parallel agent request would potentially block. Furthermore, while one agent is rebuilding, any number of agents could potentially launch in parallel, all but one of which would immediately return having done nothing, with the remaining agent blocking on the completion of the original. Upon the original agent's completion, the waiting agent issues its own rebuild. Therefore, on one machine running the agent (and we only ran the agent from one machine, conveniently), no more than one agent rebuild should ever occur at any given time. Furthermore, for any n agents launched at or around the same time for some n >= 2, the actual rebuild in question should happen exactly twice.

This is actually easier to test than it is to explain. An agent that performs the rebuild issues an HTTP request to a configurable URL (recall that the agent navigates a RESTful interface). So our test can create a temporary HTTP server, point the agent at it, and validate the number of requests received by the server. In pseudocode, it ends up being something like this:

  • get a free port on the loopback interface from the operating system
  • in a loop, have the test script fork n times for some n > 2
  • in each child process, sleep for a second or two (to give the parent process time to complete its preparations), then instantiate the agent, pointed at the loopback interface and appropriate port, and invoke the agent. The child process should exit normally after the agent returns, or die if an exception occurs
  • in the parent process, after all the forking is done, launch a local web server on the loopback interface with the OS-provided port (I used HTTP::Server since I was doing all of this in Perl)
  • as requests come in to that server, push them onto a stack
  • deactivate the server after some modest period of time (longer than the interval the children slept for)
  • reap the child processes and gather up their exit values
  • the test passes if: the server received exactly two requests to the correct URL (which you check in the stack) and all n child processes exited normally

This example is hardly a common case, but it illustrates a way of approaching these kinds of scenarios that I've found helpful on several occasions.
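
For the curious, here is a condensed Perl sketch of that pseudocode. HTTP::Daemon stands in for the throwaway web server, My::RebuildAgent is a hypothetical placeholder for the real agent class, and the /rebuild path is likewise made up:

use strict;
use warnings;
use HTTP::Daemon;
use HTTP::Response;
use HTTP::Status qw(RC_OK);
use Test::More tests => 2;

my $n = 5;    # number of agents to launch in parallel

# Create the listening socket first, so the OS assigns a free port and the
# children inherit a server that is already accepting connections.
my $daemon = HTTP::Daemon->new(LocalAddr => '127.0.0.1', LocalPort => 0)
    or die "cannot create test server: $!";
my $url = $daemon->url;

my @kids;
for (1 .. $n) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        sleep 2;    # give the parent a moment to finish its preparations
        require My::RebuildAgent;                     # hypothetical agent class
        My::RebuildAgent->new(base_url => $url)->rebuild;
        exit 0;     # a die() anywhere above yields a non-zero exit instead
    }
    push @kids, $pid;
}

# Parent: record every request received until a timeout fires.
my @requests;
$SIG{ALRM} = sub { die "done collecting\n" };
alarm 10;           # comfortably longer than the children's sleep
eval {
    while (my $conn = $daemon->accept) {
        while (my $req = $conn->get_request) {
            push @requests, $req->uri->path;
            $conn->send_response(HTTP::Response->new(RC_OK));
        }
        $conn->close;
    }
};
alarm 0;

# Reap the children and tally their exit values.
my $failures = 0;
for my $pid (@kids) {
    waitpid $pid, 0;
    $failures++ if $? != 0;
}

is($failures, 0, 'all agent processes exited normally');
is(scalar(grep { $_ eq '/rebuild' } @requests), 2,
   'the rebuild was requested exactly twice');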

Perl on Google App Engine

People are working on getting Perl support for Google App Engine, led by Brad Fitzpatrick (of Livejournal, memcached, etc. fame) at Google.

They've created a new module, Sys::Protect, to simulate the restricted Perl interpreter that would have to exist for Google App Engine. There's some discussion of why they didn't use Safe, but it sounds like it's based only on rumors of Safe problems, not anything concrete.

Safe is built on Opcode, and Sys::Protect appears to work the same way Safe + Opcode do, by blocking certain Perl opcodes. All the problems I've heard of and personally experienced with Safe were because it was working just fine -- but being terribly annoying because many common Perl modules do things a typical Safe compartment disallows. That's because most Perl module writers don't use Safe and thus never encounter such problems. It seems likely that Sys::Protect and a hardened Perl Google App Engine environment will have the same problem and will have to modify many common modules if they're to be used.
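
To illustrate the kind of annoyance I mean, here's a minimal sketch: a default Safe compartment evaluates simple expressions happily, but it traps ordinary opcodes such as open, which plenty of common modules use internally:

use strict;
use warnings;
use Safe;

my $cpt = Safe->new;

# Harmless arithmetic is fine inside the compartment:
my $sum = $cpt->reval('2 + 2');
print "2 + 2 = $sum\n";

# But opening a file -- bread and butter for many CPAN modules -- is trapped:
$cpt->reval('open my $fh, "<", "/etc/hostname"; 1');
print "rejected by the compartment: $@" if $@;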

Moving on, posters are talking about having support for Moose, Catalyst, CGI::Application, POE, Template::Toolkit, HTML::Template ... well, a lot. I guess that makes sense but it will be a lot of work and complicates the picture compared to the simple Python and custom Django-only initial unveiling of Google App Engine.

If you're interested in Perl support for Google App Engine, log into your Google account, visit the "issue" page, and click on the star by the title to vote in favor of Perl support.

Switching from Sendmail to Postfix on OpenBSD

It's easy to pick on Sendmail, and with good reason. A poor security record, baroque configuration, slowness, painful configuration, monolithic design, and arcane configuration. Once you know Sendmail it's bearable, and long-time experts aren't always eager to give it up, but I wouldn't recommend anyone deploy it for a serious mail server these days. But for a send-only mail daemon or a private, internal mail server, it works fine. Since it's the default mailer for OpenBSD, and I haven't been using OpenBSD as a heavy-traffic mail server, I've usually just left Sendmail in place.

A few years ago some of our clients' internal mail servers running Sendmail were getting heavy amounts of automated output from cron jobs, batch job output, transaction notifications, etc., and they bogged down and sometimes even stopped working entirely under the load. It wasn't that much email, though -- the machines should've been able to handle it.

After trying to tune Sendmail to be more tolerant of heavy load and having little success, I finally switched to Postfix (which we had long used elsewhere) and the CPU load immediately dropped from 30+ down to below 1, and mail delivery worked without interruption during busy times.

If I'd known how easy it is to switch OpenBSD from Sendmail to Postfix, I would've done it long ago. I wrongly figured it'd be hard since Sendmail is part of the base system, and none of that seemed very pluggable without hacking on things. I found out it was easy only by finally just trying it myself, following the very simple instructions, and having no trouble. I did this first on OpenBSD 3.9 and now again on OpenBSD 4.3, and the process was the same.

First, pick an OpenBSD mirror, and navigate to the appropriate packages directory. Then set up your environment for easy pkg_add usage. For example:

export PKG_PATH=ftp://ftp.openbsd.org/pub/OpenBSD/4.3/packages/i386

There are several varying OpenBSD Postfix packages, offering support for lookups in LDAP, MySQL, Postgres, or SASL, or a simple build without any of those dependencies:

# pkg_add postfix
Ambiguous: postfix could be postfix-2.5.1p0 postfix-2.5.1p0-ldap postfix-2.5.1p0-mysql postfix-2.5.1p0-pgsql postfix-2.5.1p0-sasl2 postfix-2.6.20080216p1 postfix-2.6.20080216p1-ldap postfix-2.6.20080216p1-mysql postfix-2.6.20080216p1-pgsql postfix-2.6.20080216p1-sasl2

We'll use the simple build:

pkg_add postfix-2.6.20080216p1

The output from the package installation tells you most of what you need to know, but I'll break it down here with a little more detail.

Run crontab -e as root and comment out this Sendmail job:

# sendmail clientmqueue runner
#*/30    *       *       *       *       /usr/sbin/sendmail -L sm-msp-queue -Ac -q

The sendmail compatibility is implemented by a wrapper, similar to how Debian's alternatives system works (an approach Red Hat borrowed as well). In OpenBSD, the wrapper is a binary that uses the configuration in /etc/mailer.conf to decide what to actually run, as opposed to using symlinks as the alternatives system does. You can see this here:

# ls -lFa /usr/sbin/sendmail /usr/bin/newaliases /usr/bin/mailq
lrwxr-xr-x  1 root  wheel  21 Aug  1 14:50 /usr/bin/mailq@ -> /usr/sbin/mailwrapper
lrwxr-xr-x  1 root  wheel  21 Aug  1 14:50 /usr/bin/newaliases@ -> /usr/sbin/mailwrapper
lrwxr-xr-x  1 root  wheel  21 Aug  1 14:51 /usr/sbin/sendmail@ -> /usr/sbin/mailwrapper

To make the switch to Postfix, run:

/usr/local/sbin/postfix-enable

Now you're ready to configure /etc/postfix/main.cf as needed. The defaults should be fine for a server sending outgoing mail only, though if you followed the OpenBSD installer's instructions to use only the short name for the hostname, you need to either set the mydomain parameter manually in main.cf, or else edit /etc/myname to use a fully-qualified domain name instead of the hostname only (and update immediately with the hostname command as well). I do the latter and haven't had any trouble with it before.
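
For reference, the relevant bits of /etc/postfix/main.cf might look something like this (example.com is a placeholder, of course):

# /etc/postfix/main.cf -- minimal settings for a send-only box
myhostname = myhost.example.com
mydomain   = example.com
myorigin   = $mydomain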

Stop Sendmail and start Postfix the same way the boot script will do it:

pkill sendmail
/usr/sbin/sendmail -bd

Send a test message and make sure you receive it:

echo "A special test message" | mail -s testing your_account@the.domain

Note that if you send your message to somewhere offsite, spam filters may reject it if your sending server doesn't have a real hostname, a reverse DNS pointer for the IP address, etc. You can just send locally to avoid that, but of course you won't be able to send mail offsite until you deal with those problems.

Add these settings to /etc/rc.conf.local so Postfix will start on boot:

sendmail_flags="-bd"
syslogd_flags="-a /var/spool/postfix/dev/log"

Now reboot to make sure everything comes up correctly on its own and to get syslogd going right. Send yourself another test message, and you can move on!

Many thanks to the Postfix developers for the excellent mail server software and to the OpenBSD developers for a nice easy way to switch the system mail daemon.