64-bit Windows naming fun

At OSNews.com the article Windows x64 Watch List describes some of the key differences between 64-bit and 32-bit Windows. It's pretty interesting, and mostly pretty reasonable. But this one caught my eye:

There are now separate system file sections for both 32-bit and 64-bit code

Windows x64's architecture keeps all 32-bit system files in a directory named "C:\WINDOWS\SysWOW64", and 64-bit system files are placed in the oddly-named "C:\WINDOWS\system32" directory. For most applications, this doesn't matter, as Windows will redirect all 32-bit files to use "SysWOW64" automatically to avoid conflicts.

However, anyone (like us system admins) who depends on VBScript to accomplish tasks may have to directly reference "SysWOW64" files when needed, since redirection doesn't apply as smoothly.

I've been using 64-bit Linux since 2005 and found some learning curve there, with distributors taking different approaches to supporting 32-bit libraries and applications on a 64-bit operating system.

The Debian Etch approach is to treat the 64-bit architecture as "normal", for lack of a better word, with 64-bit libraries residing in /lib and /usr/lib as always. It's recommended to run a 32-bit chroot with important libraries in the ia32-libs package going into /emul/ia32-linux. Ubuntu is similar, but its ia32-libs package puts its files into /usr/lib32.

The Red Hat approach called "multilib" keeps 32-bit libraries in /lib and /usr/lib with new 64-bit libraries living in /lib64 and /usr/lib64. (I mentioned this a while back while discussing building a custom Perl on 64-bit Red Hat OSes.)
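
As a rough illustration of the multilib layout, here's how one might poke around on a Red Hat-style system (assuming the 32-bit glibc package is installed; exact paths vary by distribution):

file /lib/libc.so.6      # reports an ELF 32-bit shared object
file /lib64/libc.so.6    # reports an ELF 64-bit shared object
ldd /usr/bin/perl        # shows which libraries a given binary actually loads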

Each way has its tradeoffs, and causes a bit of trouble. That's just the cost of dealing with multiple architectures in a single running OS, where no such support was previously needed.

But the Windows way? Putting your 32-bit libraries in C:\WINDOWS\SysWOW64 and your 64-bit libraries in C:\WINDOWS\system32? It hurts to see the names be exactly backwards. That's really tops for confusion.

Filesystem I/O: what we presented

As mentioned last week, Gabrielle Roth and I presented results from tests run in the new Postgres Performance Lab. Our slides are available on Slideshare.

We tested eight core assumptions about filesystem I/O performance and presented the results to a room of filesystem hackers and a few database specialists. Some important things to remember about our tests: we were testing I/O only - no tuning had been done on the hardware, the filesystem defaults, or Postgres - and we did not take reliability into account at all. Tuning the database and filesystem defaults will be done for our next round of tests.

The filesystems we tested were ext2, ext3 (with and without data journaling), xfs, jfs, and reiserfs.

Briefly, here are our assumptions, and the results we presented:

  1. RAID5 is the worst choice for a database. Our tests confirmed this, as expected.
  2. LVM incurs too much overhead to use. Our test showed that for sequential or random reads on RAID0, LVM doesn't incur much more overhead than hardware or software RAID.
  3. Software RAID is slower. Same result as LVM for sequential or random reads.
  4. Turning off 'atime' is a big performance gain. We didn't see a big improvement, but you do generally get 2-3% improvement "for free" by turning atime off on a filesystem.
  5. Partition alignment is a big deal. Our tests weren't able to prove this, but we still think it's a big problem. Here's one set of tests demonstrating the problem on Windows-based servers.
  6. Journaling filesystems will have worse performance than non-journaling filesystems. Turn the data journaling off on ext3, and you will see better performance than ext2. We polled the audience, and nearly all thought ext2 would have performed better than ext3. People in the room suggested that the difference was because of seek-bundling that's done in ext3, but not ext2.
  7. Striping doubles performance. Doubling performance is a best-case scenario, and not what we observed. Throughput increased about 35%.
  8. Your read-ahead buffer is big enough. The default read-ahead buffer size is 128K. Our tests, and an independent set of tests by another author, confirm that increasing read-ahead buffers can provide a performance boost of about 75%. We saw improvement leveling out when the buffer is sized at 8MB, with the bulk of the improvement occurring up to 1MB. We plan to test this further in the future. (A quick sketch of adjusting both atime and the read-ahead buffer follows this list.)
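
For reference, here's a minimal sketch of how one might check and adjust those two settings on Linux; the device and mount point below are placeholders, so adjust them for your own system:

# remount a filesystem without atime updates (add noatime to /etc/fstab to make it permanent)
mount -o remount,noatime /var/lib/pgsql

# read-ahead is set per block device, in 512-byte sectors; 256 sectors = the default 128K
blockdev --getra /dev/sda
blockdev --setra 2048 /dev/sda    # 2048 sectors = 1MB of read-ahead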

All the data from these tests is available on the Postgres Developers wiki.

Our hope is that someone in the Linux filesystem community takes up these tests and runs them on other hardware, and on a more regular basis. We did have three people from the talk interested in running their own tests on our hardware! In the future, we plan to focus our testing mostly on Postgres performance.

Mark Wong and Gabrielle will be presenting this talk again, with a few new results, at the PostgreSQL Conference West.

Postfix, ~/.forward, and SELinux on RHEL 5

For the record, and maybe to save confusion for someone else who runs into this:

On Red Hat Enterprise Linux 5 with SELinux in enforcing mode, Postfix cannot read ~/.forward files by default. It's probably not hard to fix -- perhaps the .forward files just need to have the right SELinux context set -- but we decided to just use /etc/aliases in this case.
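
If you do want to chase it down, the usual SELinux troubleshooting steps apply; this is a sketch of where I'd start, not a tested fix:

ls -Z ~/.forward               # inspect the file's current SELinux context
ausearch -m avc -ts recent     # look for AVC denials involving Postfix
restorecon -v ~/.forward       # reset the context to the policy default
# if denials persist, audit2allow -a can suggest a local policy module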

Red Hat acquires Qumranet

I missed the news a week and a half ago that Red Hat has acquired Qumranet, makers of the Linux KVM virtualization software. They say they'll be focusing on KVM for their virtualization offerings in future versions of Red Hat Enterprise Linux, though still supporting Xen for the lifespan of RHEL 5 at least. (KVM is already in Fedora.)

Given that Ubuntu also chose KVM as their primary virtualization technology a while back, this should mean even easier use of KVM all around, perhaps making it the default choice on Linux. (Ubuntu supports other virtualization as well.)

Also, something helpful to note for RHEL virtualization users: Red Hat Network entitlements for up to 4 Xen guests carry no extra charge if entitled the right way.

In even older Red Hat news, Dag Wieers wrote about Red Hat lengthening its support lifespan for RHEL by one year for RHEL 4 and 5.

That means RHEL 5 (and thus also CentOS 5) will have full support until March 2011, new media releases until March 2012, and security updates until March 2014. And RHEL 4, despite its aging software stack, will receive security updates until February 2012!

That's very helpful in making it easier to choose the time of migration without being pushed too soon due to lack of support.

Competence, Change Agents, Software, and Music

Seth Godin wrote an interesting article on the subject of competence; it resonated with me personally for a variety of reasons.

The article uses musicians, and Bob Dylan in particular, as an example of how "competence" can pale in comparison to "incompetence" in terms of the quality of the results. In particular, it asserts that competent musicians consistently play the music in question the same way, and suggests that the lack of such consistency could be thought of as incompetence. Bob Dylan thus becomes an incompetent musician who is nevertheless really great due to the emotional content of his performances; beyond that, he is a "change agent" because of his brilliance. And that's the crux of the article: the "incompetent" people are the change agents who advance the state of the art, while the "competent" people resist change and thus hold things back.

As a fairly serious practicing musician myself, I'll assert in response: this is not an accurate representation of musicianship, and the issue extends to the core of the article's argument.

Playing music the same way every time is not an indication of competence. It's an indicator of insufficient imagination and demonstrates a lack of mastery. The different musical traditions of the world vary considerably in the precision of their musical project specs (e.g. scores with full orchestrated notation versus "charts" with melody over chord symbols versus no notation at all), but the musician always has ample room to interpret. The jazz musician gives the appearance of spontaneity in wild improvisational flights of fancy, while the classical pianist playing Bach may seem to be playing things the same way twice. But there's plenty of interpretative, improvisational nuance going on in the Bach performance; it's just subtler and doesn't necessarily have to do with the order and combination of pitches played. The jazz musician's appearance of spontaneity is a studied spontaneity that is practiced and accumulated over time just like instrumental technique; the improvised material consists largely of material the musician has already mastered and played, in various combinations, many times over.

A musician who aspires to play something the same way every time is a musician who is trying to learn a piece, but who is not trying to master the piece so it can become an expressive vehicle. The musician may ultimately be able to execute the piece, but will probably never give a particularly compelling performance of it. Furthermore, truly dedicated listeners are often better equipped than casual listeners to separate the ho-hum performances from the truly exceptional.

So, how does this relate to the business world?

The person the article categorizes as "competent" may truly be simply competent, solving problems in nearly the same way every time, delivering consistent results. It may also be that the "competent" person is truly a master, with that mastery expressed in the small details that aren't necessarily so obvious to people unfamiliar with the craft. In software engineering, it takes all kinds to make things run. There are some engineers who consistently display a great creative impulse, who think outside the cliché, who can be counted on to approach difficult problems with ingenuity and boldness. There are some engineers who stay more focused on a particular toolset, methodology, etc., who will not necessarily display "outside the box" thinking, but will demonstrate complete command of the tools of their craft, delivering rock-solid, maintainable solutions to problems large and small, easy and difficult.

In the world of software engineering, or any other craft that involves building stuff to spec, what does it mean to solve problems the same way every time? If we're literally talking about writing similar code over and over, then the problem needs to be re-positioned: we should be talking about building a generic solution so humans don't need to waste their time with redundant custom solutions. Furthermore, if a proven method, design pattern, etc. has worked effectively for a problem in the past, and a similar problem comes up now, disregarding the "competent" solution of relying on past success to inform today's plans would be deeply unwise. Perhaps an "incompetent" person could come up with something even better, so the important thing is to have the flexibility to embrace change when appropriate.

The real win, I think, is to have the full spectrum of possibilities adequately represented. Hire people who are really smart, take pride in their work, and who show humility. The last point is critical: the extremes of "competent" versus "incompetent" as laid out in the article arguably represent archetypal factions that cannot appreciate the value that the other faction adds. A dose of humility ensures that all parties can appreciate the others' contributions.

Now, the original article is talking about "change agents", and I'm not really addressing that. But, to go back to music for a second, let's consider a rather important figure in western art music (that tradition most people call "classical music"): Johann Sebastian Bach.

Bach was working during the phase when the "baroque" period ended and the "classical" period began. He was known and respected for his skill in composition and keyboard performance, but he was regarded as something of a relic. He was composing at the very extreme edges of the "baroque" tradition, while the simplified "classical" tradition was coming into vogue. In that light, he was not regarded as an innovator.

Yet an innovator he was. Beyond his innovative experimentation with equal-tempered instruments, Bach achieved a level of sophistication and mastery of counterpoint (the weaving together of multiple concurrent melodies so that each melody stands on its own while all working together to achieve coherent harmonic progressions) that remains unparalleled. Everything he did was a logical extension of the tradition in which he operated; his prolific body of work and the stunning mastery it demonstrates would not be possible without a complete dedication to his musical tradition. He did not set out to be different; he mastered the compositional techniques of the day to such a degree that he had complete freedom in how he exercised those techniques. Yet he approached problems using the same techniques time and time again; his music unfolds in a clear, logical manner that to the well-versed can often be quite predictable.

JS Bach was a "change agent". The western art music tradition would never be the same after him. Countless major composers that followed were heavily influenced by Bach's work. Yet his work is that of the supremely competent craftsman. A staggering, brilliant, unerring competence.

UTOSC 2008 wrap-up

Using Vyatta to Replace Cisco Gear

At the 2008 Utah Open Source Conference I attended an interesting presentation by Tristan Rhodes about the Vyatta open source networking software. Vyatta's software is designed to replace Cisco appliances of many sorts: WAN routers, firewalls, IDSes, VPNs, and load balancers. It runs on Debian GNU/Linux, on commodity hardware or virtualized.

Key selling points are the price/performance benefit vs. Cisco (prominently noted in Vyatta's marketing materials) and the IOS-style command-line management interface for experienced Cisco network administrators. Regular Linux interfaces are available too, though Tristan wasn't positive that writes made through them would stick in all cases, as he's mostly used the native Linux tools for monitoring and reading, not writing.

Pretty cool stuff, and Vyatta sells pre-built appliances and support too. The Vyatta reps were handing out live CDs, but I haven't had a chance to try it out yet. Presentation details are here.

Google App Engine 101

Jonathan Ellis did a presentation and then hands-on workshop on Google App Engine, which I found especially useful because he's a longtime Python and Postgres user. His talk on SQLAlchemy last year made me think he wouldn't gloss over the huge differences in the runtime environment of GAE vs. regular Django, for example having GQL and BigTable instead of SQL and a relational database. And he didn't. They're quite different, and one is very primitive to use. I'll let you guess which one. :)

In fact, the day of the conference he wrote a blog post, App Engine Conclusions, where he says: "I've reluctantly concluded that I don't like it." His reasoning makes sense to me, and maybe it will improve enough later to be really nice. We'll see. Of course that's all ignoring the hosting lock-in too.

His presentation details are here.

Writing Documentation with Open Source Tools

Paul Frields (of the Fedora Project) and Jared Smith (of Asterisk fame) showed how to use DocBook XML to write documentation. It was a practical talk: we asked questions, they tag-teamed the answers and live demonstrations, and they showed us the Red Hat tool "Publican" and GNOME's yelp documentation viewer, which can present DocBook XML natively. Good stuff, though XML sure hasn't gotten any less verbose.

The presentation details include a link to the slides.

Automated System Management with Puppet

Andrew Shafer did a presentation on Puppet, and I was sad to miss the beginning of it. But what I heard was quite enjoyable.

The message I took away is this: Without some overlap of the traditionally separate domains or disciplines of system administrator and programmer, no software tool is going to be able to magically manage all your systems for you. Puppet provides a domain-specific language for specifying what resources should be available. (Resources include packages, files, and services.) You still have to say what you want, but there's a nice way to do that in a cross-platform way, once. Paraphrasing Einstein, it's as simple as it can be, but no simpler.
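
As a tiny, hypothetical illustration of that DSL (the package, paths, and fileserver URL here are made up, not from Andrew's talk):

package { "ntp": ensure => installed }

file { "/etc/ntp.conf":
    owner   => "root",
    mode    => "644",
    source  => "puppet:///files/ntp.conf",   # made-up fileserver path
    require => Package["ntp"],
}

service { "ntpd":
    ensure    => running,
    enable    => true,
    subscribe => File["/etc/ntp.conf"],
}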

The questions were good, but I had the feeling from a few of them that people wanted things to be simpler than possible. :)

Andrew's presentation slides tell the story pretty well even on their own.

LOLCATS

A nice bonus was the UTOSC crew giving out fortune cookies with LOLCATS fortunes. Mine read:

i'm in ur cookie
given ur fortune

That was a delight. And I happened to meet up right about then with Josh Tolley, author of PL/LOLCODE.

Machine virtualization on the Linux desktop

In the past I've used virtualization mostly in server environments: Xen as a sysadmin, and VMware and Virtuozzo as a user. They have worked well enough. When there've been problems, they've mostly been traceable to network configuration trouble.

Lately I've been playing with virtualization on the desktop, specifically on Ubuntu desktops, using Xen, kvm, and VirtualBox. Here are a few notes.

Xen: Requires hardware virtualization support for full virtualization, and paravirtualization is of course only for certain types of guests. It feels a little heavier on resource usage, but I haven't tried to move beyond lame anecdote to confirm that.

kvm: Rumored not to be ready for prime time, but used via libvirt with virt-manager it has been very nice for me. It requires hardware virtualization support. One major problem with kvm on Ubuntu 8.04 is with the CD/DVD driver when using RHEL/CentOS guests. To work around that, I used the net install and it worked fine.

VirtualBox: This was for me the simplest of all for desktop stuff. I've used both the OSE (Open Source Edition) in Ubuntu and Sun's cost-free but proprietary package on Windows Vista. The current release of VirtualBox only emulates 32-bit i386 machines, though! (No 64-bit guests, though a 64-bit host is fine.) It's also been a little buggy at times -- I've had a few machine crashes when running both an OpenBSD 4.3 and a RHEL 5 guest, though I wasn't able to reproduce the problem and it's possible it wasn't a VirtualBox issue.

I should note that some manufacturers have a BIOS option to disable hardware virtualization, and that it is sometimes disabled by default. When booting a new machine, check for that, especially in servers you won't necessarily want to take down later.
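
A quick first-pass check from a running Linux system (note that on many machines the CPU flag still shows up even when the feature is disabled in the BIOS, so check the kernel messages too):

egrep -c '(vmx|svm)' /proc/cpuinfo    # non-zero means the CPU advertises VT-x or AMD-V
dmesg | grep -i kvm                   # a "disabled by bios" message points at a BIOS setting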

A final note about RHEL 5's net install: Why, oh why, does the installer ask for an HTTP install location as separate web site and directory entries, instead of a universally used and easy URL? And further, when the install source I'm using goes down (as download mirrors occasionally do), why are my only options to reboot or retry? Would it have been so hard to allow me the option of entering a new download URL? Yes, I know, I need to send in a patch.

Know your tools under the hood

Git supports many workflows; one common model that we use here at End Point is having a shared central bare repository that all developers clone from. When changes are made, the developer pushes the commit to the central repository, and other developers see the relevant changes on subsequent pulls.

We ran into an issue today where, after a commit/push cycle, pulls from the shared repository were suddenly broken for downstream developers. It turns out that one of the commits had been created by root and pushed to the shared repository. The push itself worked fine, since root had read-write privileges to the filesystem; however, it meant that the loose objects which the commit created were in turn owned by root as well. Filesystem permissions on those loose objects and on the updated refs/heads/branch prevented other users from reading the appropriate files, and hence broke the pull behavior downstream.
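
In a case like this, a few minutes at the shell in the shared repository tells the story; the user and group names below are placeholders:

# inside the shared bare repository
find objects refs -user root      # loose objects and refs left behind by root's push
chown -R git:git objects refs     # hand ownership back to the repository user/group
git fsck                          # verify the object store is intact afterwards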

Trying to debug this based purely on the messages reported by the tool itself would have resulted in more downtime at a critical point in the client's release cycle.

There are a couple of morals here:

  • Don't do anything as root that doesn't need root privileges. :-)
  • Understanding how git works at a low level enabled a speedy detection of the (*ahem*) root cause of the problem and led to quick correction of the underlying permissions/ownership issues.

Fun with 72GB disks: Filesystem performance testing

If you haven't heard, the Linux Plumbers Conference is happening September 17-19, 2008 in Portland, OR. It's a gathering designed to attract Linux developers - kernel hackers, tool developers and problem solvers.

I knew a couple of people from the Portland PostgreSQL User Group (PDXPUG) interested in pitching an idea for a talk on filesystem performance. We wanted to take conventional wisdom about filesystem performance and put it to the test on some sweet new hardware, recently donated for Postgres performance testing.

Our talk was accepted, so the three of us have been furiously gathering data, and drawing interesting conclusions, ever since. We'll be sharing 6 different assumptions about filesystem performance, tested on five different filesystems, under five types of loads generated by fio, a benchmarking tool designed by kernel hacker Jens Axboe to test I/O.
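
To give a feel for the tool, a simple sequential-read job can be run straight from the command line; the parameters here are illustrative, not our actual test configuration:

fio --name=seq-read --directory=/mnt/test --rw=read --bs=8k --size=1g --ioengine=sync --direct=1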

Look forward to seeing you there!

Small changes can lead to significant improvements

Case in point: We've been investigating various system management tools for both internal use and possibly for some of our clients. One of these, Puppet from Reductive Labs, has a lot of features I like and good references (Google uses it to maintain hundreds of Mac OS X laptop workstations).

I was asked to see if I could identify any performance bottlenecks and perhaps fix them. With the aid of dtrace (on my own Mac OS X workstation) and the Ruby dtrace library it was easy to spot that a lot of time was being eaten up in the "checksumming" routines.

As with all system management tools, security is really important and part of that security is making sure the files you are looking at and using are exactly the files you think they are. Thus as part of surveying a system for modified files, they are each checksummed using an MD5 hash.

To speed things up, at a small reduction in security, the Puppet checksumming routines have a "lite" option which only feeds the first 512 bytes of a file into the MD5 algorithm instead of the entire file, which can be quite large.

As with most security packages these days, the way you implement an MD5 hash is to get a "digest" object, initialized to use the MD5 algorithm. When Puppet checksums a file, it opens it and reads it in 512 byte chunks, handing each chunk to the digest to ... digest. If the "lite" option is set, it stops after the first chunk.

Hard to see how we can improve on that, but it can be done. All of the digest methods, anticipating how they're going to be used most of the time, have a "file" option. You create the digest then hand it the file path.

The digest does the rest, and since it's part of a package which is part of the Ruby distribution, it's probably coded in compiled C and not interpreted.
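
In Ruby terms, the change boils down to something like the following; this is a sketch of the idea, not the actual Puppet code:

require 'digest/md5'

path = ARGV[0]    # file to checksum

# chunked approach: read the file in 512-byte pieces and feed each one to the digest
md5 = Digest::MD5.new
File.open(path, 'rb') do |f|
  while (chunk = f.read(512))
    md5.update(chunk)
  end
end
puts md5.hexdigest

# "file" approach: create the digest and hand it the file path in one step
puts Digest::MD5.new.file(path).hexdigest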

Based on a number of benchmark tests using files both large and small, we've found this small change yields about a 10% increase in performance, and since this operation may be done hundreds of times for a single update run, that can add up.

(If you're interested in some of the raw numbers, please take a look at http://stevemac.endpoint.com/puppet.html for a summary of the original analysis and links or visit the Puppet developers forum at http://groups.google.com/group/puppet-dev and scan the archives).

Stepping into version control

It's no little secret that we here at End Point love and encourage the use of version control systems to make life easier both for ourselves and for our clients. While a full-fledged development environment is ideal for maintaining/developing new client code, not everyone has the time to set one up quickly.

One situation we've sometimes found is clients editing/updating production data directly. This can happen through a variety of means: direct server access, scp/sftp, or web-based editing tools which save directly to the filesystem.

I recently implemented a script to provide transparent version control for a client who uses a web-based tool for managing their content. While they are still making changes to their site directly, we now have the ability to roll back any changes on a file-by-file basis as files are created, modified, or deleted.

I wanted something that was: 1) fast, 2) useful, and 3) stayed out of the user's way.  I turned naturally to git.

In the user's account, I executed git init to create a new git repository in their home directory.  I then git added the relevant parts that we definitely wanted under version control.  This included all of the relevant static content, the app server files, and associated configuration: basically anything we might want to track changes to.

Finally, I determined the list of directories in which we wanted to automatically detect any newly created files. These corresponded to the usual places where new content was apt to show up. I codified the automatic update of the git repo in a script called git_heartbeat, which is called periodically from cron.
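
The one-time setup amounted to something like this (the directory names are taken from the heartbeat script below; the full list also included the app server files and configuration):

cd /home/acme
git init
git add catalogs/acme/pages htdocs
git commit -m "Initial import of site content"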

The basic listing for git_heartbeat:

#!/bin/bash
# automatically add any new files in these space-separated directories
AUTO_ADD_DIRS="catalogs/acme/pages htdocs"

# make sure we're in the proper git root directory
cd /home/acme

# actually add any newly created files in $AUTO_ADD_DIRS
find $AUTO_ADD_DIRS -print0 | xargs -0 git add

DATE=`date`

git commit -q -a -m "Acme Co git heartbeat - $DATE" > /dev/null

A couple notes:

  1. git commit -a takes care of the modification/deletion of any already tracked files.  The git add ensures that any newly created files are currently in the index and will be included with the commit.
  2. If no files have been added, modified, or deleted, no checkpoint is created. This ensures that every commit in the log is meaningful and corresponds to an actual change to the site itself.
  3. Compared to other VCSs which keep metadata in each versioned subdirectory (such as Subversion), this approach stays out of the user's way; we don't have to worry about the user accidentally overwriting/deleting data in their upload directories and thus corrupting the repository.
  4. This approach is fast; it runs near instantaneously for thousands of files, so we could even push the cron interval to every minute if desired.  For our purposes, this system works great as is.
  5. Once the git tools are installed, there is no need to set up a central repository; git repos are very cheap to create/use and for a use case such as this, require little to no maintenance beyond the initial setup.

Areas of improvement/known issues:

  1. This script could definitely be improved insofar as providing more detail about which files were added/modified/deleted. However, git's own tools can come in quite useful; for instance, git log --stat will show the files which each heartbeat commit affected.
  2. Since this is set up as a general cron job running every hour (the period is configurable, obviously), it does preclude extended stagings for non-heartbeat commits; basically, anything which takes longer than the heartbeat interval will be inadvertently committed.

Standardized image locations for external linkage

Here's an interesting thought: http://www.boingboing.net/2008/09/01/publishers-should-al.html

Nutshell summary: publishers should put cover images of books into a standard, predictable location (like http://www.acmebooks.com/covers/{ISBN}.jpg).

This could be extended for almost any e-commerce site where the product image might be useful for reviews, links, etc.

At the very least, with Interchange action maps, a site could capture external references to such image requests for further study. (E.g., internally you might reference a product image as [image src="images/products/current{SKU}"], but externally as "/products/{SKU}.jpg"; the actionmap wouldn't be used for the site itself, but only for other sites linking to your images.)

Authorize.Net Transaction IDs to increase in size

A sign of their success: Authorize.Net is about to issue transaction ID numbers greater than 2,147,483,647 (2^31 - 1), which happens to be the maximum value of a signed MySQL int() column and of the default Postgres "integer" type.

It probably makes sense to proactively ensure that your transaction ID columns are large enough - this would not be a fun bug to run into ex post facto.
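
For example, widening a hypothetical transaction ID column to a 64-bit integer might look like this; the database, table, and column names are made up, and you'll want to re-specify any NOT NULL or default attributes as needed:

# PostgreSQL
psql -d store -c "ALTER TABLE transactions ALTER COLUMN authnet_txn_id TYPE bigint;"

# MySQL
mysql store -e "ALTER TABLE transactions MODIFY authnet_txn_id BIGINT;"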

Major rumblings in the browser world

Wow. There's a lot going on in the browser world again all of a sudden.

I recently came across a new open source browser, Midori, still in alpha status. It's based on Apple's WebKit (used in Safari) and is very fast. Surprisingly fast. Of course, it's not done, and it shows. It crashes, many features aren't yet implemented, etc. But it's promising and worth keeping an eye on. It's nice to have another KHTML/WebKit-based browser on free operating systems, too.

Now, today, news has come out about Google's foray into the browser arena, with a browser, also based on WebKit, called Chrome. It'll be open source, include a new fast JavaScript engine, and feature compartmentalized JavaScript for each page, so memory and processor usage will be easy to monitor per application, and individual pages can be killed without bringing the whole browser down. Code's supposed to become available tomorrow.

A new generation of JavaScript engine for Mozilla is now in testing, called TraceMonkey. It has a just-in-time (JIT) compiler, and it looks like it makes many complex JavaScript sites very fast. It sounds like this will appear formally in Firefox 3.1. Information on how to test it now is at John Resig's blog.

And finally, Microsoft is adding a new "InPrivate" browsing mode to Internet Explorer 8, which now has a public beta. Unlike all of the above, it will ... not be open source. :)

Nice to see so much movement.