Open Source Bridge: the aftermath

I've been planning the Open Source Bridge conference in my spare time over the past 9 months here in Portland, OR. We finally made it happen June 17-19, 2009 at the Oregon Convention Center. I had the pleasure of co-chairing the event with Audrey Eschright, and was extremely happy that End Point decided to also sponsor the event.

The conference was organized around the idea of "open source citizenship", and what things we as individual contributors, companies and users of free and open source software do to participate. We came together to share how we do things, what we've already done, and what we might be doing in the future.

The conference featured 76 sessions from over 100 speakers and panelists, and drew 475 total participants over three days. There were 8 rooms full of talks for about 9 hours every day. We had a 24-hour hacker lounge at the conference hotel, and it was packed every evening -- including our closing night, when we wrapped up right at midnight.

Above are my slides for the opening remarks, and there's even a video of the opening session. The keynotes and all the sessions that happened in the Fremont room will be available at osbridge.blip.tv.

Some really great reviews of the conference are coming in. If you'd like to relive it in text through a few other people's eyes, have a look at the writeups below. Forgive me for quoting the exuberant titles. I am so pleased that people had a great experience at the conference.

We're definitely holding the conference again next year in June!

LinuxTag 2009 day 1

Today was the first day of LinuxTag 2009. Representing the Interchange project are Stefan Hornburg of LinuXia Systems in Germany, Davor Ocelić of Spinlock Solutions in Croatia, and I, Jon Jensen, of End Point in the U.S.

Tuesday afternoon we set up the booth (here still underway):

That was a fairly quiet affair since many exhibitors showed up later that afternoon or early Wednesday morning. But it was nice to get it all done early. The setup involves the network and power wiring behind the scenes, hanging the signs, unloading the marketing materials, and getting all the equipment tested (and then put away again for the night). At night back in our apartment we made some updates to the slide presentation to include many more examples of the busy and interesting sites we have current data on, which appear in the Interchange Hall of Fame.

We're sharing a booth with the YaCy distributed search project, and have had a few good discussions with their people.

Booth traffic was probably about the same this year as it was the first day last year -- a little slow. We talked with several people who were interested in hosted e-commerce solutions such as Interchange.

In two cases, very interested visitors were not at all clear about how the open source partnership between the project, individuals, and businesses works, and we were able to explain it. (Hopefully clearly enough that it still made sense after we were done talking!)

Specifically, one visitor represented a hosting company that wanted to pay to include Interchange in their hosting offerings. We are of course happy to take his money, but have no set price to offer because, as he later succinctly put it, "Sie leben von Ihren Dienstleistungen." That is, "You live from your [custom] services." Exactly.

We explained the service model: he can download the software for free not just to evaluate, but to use permanently and to resell as a hosted service to others. At the end it was clear it would've been simpler for him to hear that it costs something like €500 or €5000 per year to be part of our hosting partner program. Yet that wouldn't have answered the question of what support we provide, how his programmers can contribute back to the Interchange community as they customize the software, etc. So the elaboration is necessary. And we explained that each of us three Interchange developers represents our three different consulting companies, and that's who you actually do business with, not "Interchange" per se.

The other visitor, who didn't have a background with open source software, wanted something similar: a fixed price to be allowed to deploy the software and customize it for his own consulting customers. We had a similar discussion. The bottom line is, the software's free, but work you hire us to do specifically for you is not. That's not too complicated once you get used to it!

In between talking with visitors, we talked about some work Stefan's been doing on the WellWell catalog, the need for new experimental Interchange branches which is much easier now with our new Git repository, and some custom work that's been done before that needs to be genericized and committed to mainline Interchange. We also dropped in on the Freenode #interchange IRC channel and worked a bit with Gert van der Spoel, René Hertell, David Christensen, and others.

Otherwise, I had time to attend one talk, Die Mathematik hinter RAID ("The Math Behind RAID") by Michael Schwartzkopff. He worked through the math to show the probability of various kinds of failures when using RAID 1 and RAID 5, and discussed RAID 6 and Sun's ZFS RAID-Z2. It was quite interesting and a good reminder that as hard disk capacity grows, what once seemed like incredibly small chances of failure (a one-bit read error, or a failure of a disk's mechanism) become both more likely and more catastrophic when they do occur.
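
As a rough back-of-the-envelope example of that effect (my numbers, not from the talk): suppose a disk has an unrecoverable read error rate of 1 in 10^14 bits, a common manufacturer's specification. Rebuilding a failed disk in a 4 x 1 TB RAID 5 array means reading the roughly 2.4 x 10^13 bits on the three surviving disks, so the chance of hitting at least one unrecoverable read error during the rebuild is about 1 - (1 - 10^-14)^(2.4 x 10^13), or around 21%. The same arithmetic on the much smaller disks of a decade ago gave only a fraction of a percent.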

So we're off to a good start, with three more days to go.

nofollow in PageRank Sculpting

Last week the SEO world reacted to Matt Cutts' article about the use of nofollow in PageRank sculpting.

Google uses the PageRank algorithm to calculate the popularity of pages on the web. Popularity is only one factor in determining which pages are returned in search results (relevance to the search terms is the other major factor). Other major search engines use similar popularity algorithms. Without describing the algorithm in detail, the important takeaways are:

  • PageRank of a single page is influenced by all of its inbound links (links from external pages)
  • PageRank of a single page is passed on to all of its outgoing links after being normalized and divided by the total number of outgoing links

So, given page C with inbound links from pages A and B, where pages A and B have equal PageRank X, page A has 3 total outgoing links, and page B has 5 total outgoing links, page C receives more PageRank from page A than from page B.
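
To put rough numbers on that (a deliberately simplified model that ignores the damping factor and every other ranking signal):

PageRank passed from page A to page C = X / 3
PageRank passed from page B to page C = X / 5

So with equal starting PageRank, page A passes page C about 1.67 times as much as page B does.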


From an external link perspective, it's great to get as many links as possible from a variety of sources that rank high and have a low number of outgoing links. From an internal site perspective, it's important to examine how PageRank is passed throughout a site in order to apply the best site architecture. In addition to designing a site architecture that pleases users and passes link juice throughout the site effectively, the rel="nofollow" attribute was adopted by several major search engines as an additional tool to stop the flow of link juice from one page to another. The nofollow attribute can also be used to identify paid links (its early implementation) or to avoid passing link juice to external sites entirely.
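
For reference, the attribute is added to the anchor tag itself, something like:

<a href="http://www.example.com/" rel="nofollow">a link that passes no link juice</a>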

In the example above, rel="nofollow" could be added to 2 of the links on page B, which would result in the same PageRank being passed from page B to page C as from page A to page C.


Then, at a recent SEO conference, Matt Cutts (head of Google's webspam team) commented on how the PageRank algorithm had changed its handling of nofollow, and just last week it was announced that the nofollow attribute would no longer work for PageRank sculpting. A link with the nofollow attribute no longer reduces the count of outgoing page links used to divide up the link juice passed on to other pages, but link juice is still not passed through a link with the nofollow attribute.

In the ongoing example, the link juice passed from page B to page C will be less than that passed from page A to page C, because page B has more outgoing links, even if some of them are nofollow links.
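
Continuing the same simplified numbers, with nofollow added to 2 of page B's 5 outgoing links:

Old behavior: B passes X / (5 - 2) = X / 3 to page C (nofollow links were dropped from the count)
New behavior: B passes X / 5 to page C (nofollow links still count toward the total, but receive no link juice themselves)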


One SEOmoz article I read suggests that SEO best practice will now be to recommend that blog owners disallow comments that may contain external links, to prevent the dilution of link juice. Other potential solutions would be to filter links out of user generated content (comments or Q&A specifically), to display user generated content in iframes, or to embed external links in Flash or Java. The nofollow attribute may still be used to stop the flow of link juice to external pages; however, it can no longer be used for internal PageRank sculpting.

Learn more about End Point's technical SEO services.

Getting Started with Demand Attach

As OpenAFS moves towards a 1.6 release that has Demand Attach Fileservers (DAFS), there is a need to thoroughly test Demand Attach. Getting started can be tricky, so this article highlights the important steps to configuring a Demand Attach fileserver.

OpenAFS CVS HEAD does not come with Demand Attach enabled by default, so you'll need to build your own binaries. You should consult the official documentation, but the major requirement is to pass the --enable-demand-attach-fs option to configure. You should also note that DAFS is only supported on namei fileservers, not inode.
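
For example, a minimal build might look something like this (prefixes, sysname, and other configure options will vary by site; the essential piece is the --enable-demand-attach-fs flag, and if you are building from a CVS checkout you will likely need to generate the configure script first, e.g. with regen.sh):

$ ./configure --enable-demand-attach-fs
$ make

Then install the resulting fileserver, volserver, salvageserver, and salvager binaries as you normally would.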

Once you've built and installed the binaries, you need to be careful to remove your existing fileserver's bos configuration (i.e., fs) and put a dafs one in place; e.g.,

$ bos stop localhost fs -localauth
$ bos delete localhost fs -localauth

Once the fs bnode is deleted, you need to install the new binaries and create the dafs entry. You should pass your normal command line arguments to the fileserver and volserver processes:

$ bos create localhost dafs dafs "/usr/afs/bin/fileserver -my-usual-options" \
    /usr/afs/bin/volserver \
    /usr/afs/bin/salvageserver /usr/afs/bin/salvager

Once the entry is created, the bosserver will automatically bring up the processes, so you should check the log files to make sure everything is OK. Note that vos listvol will show volumes as online even if they are only pre-attached (pre-attached means that the fileserver was able to read the volume header, but has not yet brought the volume fully online). You can watch the FileLog to see when the fileserver requests that a salvage be done.
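
For example, assuming the standard /usr/afs paths, something like the following will show the state of the dafs bnode and let you watch the fileserver's progress:

$ bos status localhost dafs -long -localauth
$ tail -f /usr/afs/logs/FileLog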

After the initial configure, build, and bos configuration, your Demand Attach fileserver is not significantly different from your normal fileserver. You create, move, back up, and restore volumes just as with a traditional fileserver.

For more details about DAFS, take a look at the OpenAFS wiki entry. Be sure to give feedback to the mailing list.

Packaging Ruby Enterprise Edition into RPM

It's unfortunate that past versions of Ruby have gained a reputation for performing poorly, consuming too much memory, or otherwise being "unfit for the enterprise." According to the fine folks at Phusion, this is partly due to the way Ruby does memory management, so they've created an alternative branch of Ruby 1.8 called "Ruby Enterprise Edition." This code base includes many significant patches to the stock Ruby code that dramatically improve performance.

Phusion advertises an average memory savings of 33% when combined with Passenger, their Apache module for serving Rails apps. We did some testing of our own, using virtualized Xen servers from our SpreeCamps.com offering. These servers use the DevCamps system to run several separate instances of httpd for each developer, so reducing Passenger's memory usage was crucial to fitting into less than a gigabyte of memory. Our findings were dramatic: one instance dropped from 100MB down to 40MB. (The status tools included with Passenger were very helpful in confirming this.)
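
If you want to repeat this kind of measurement, the passenger-memory-stats tool that ships with Passenger reports memory usage for each httpd and Rails process; run it as root so it can see all of them:

$ sudo passenger-memory-stats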

There has been some discussion on the Phusion Passenger and other mailing lists about packaging Ruby Enterprise Edition for Red Hat Enterprise Linux and its derivatives (CentOS and Fedora). Packages are available from Phusion for Ubuntu Linux, but many of our clients prefer RHEL's reputation as a stable platform for e-commerce hosting. So we've packaged ruby-enterprise into RPMs and made them available to give back to the Rails community.

We want our SpreeCamps systems to be easy to maintain, following the "Principle of Least Astonishment." By default, Phusion's script installs ruby-enterprise into /opt, so invoking it requires the full path to the executable. That would be unsettling for a developer who mistakenly installs gems into Red Hat's rubygems path while intending to install gems usable by REE and Passenger. It is important that the ruby and gem executables be on all users' $PATH.

We took a cue from our customized local-perl packages. These packages install themselves into /usr/local. This means that all executables reside in /usr/local/bin; no $PATH modifications are necessary to utilize them via the command-line. Our ruby-enterprise packages are configured the same way. (If another /usr/local/bin/ruby exists, package installation will fail before clobbering another ruby installation.) Applications which specify #!/usr/bin/ruby will continue to use Red Hat's packaged ruby.
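
One quick way to confirm the resulting precedence on an installed system (the output below is only illustrative):

$ which -a ruby
/usr/local/bin/ruby
/usr/bin/ruby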

Similar to a source-based installation, once these packages are installed you may do gem install passenger and any other gems your application needs. Phusion's REE installer also installs several "useful gems"; however, we elected not to include these in the main ruby-enterprise RPM package. More, smaller packages, each limited to a particular module or piece of software, are better than one or two big fat RPMs with a bunch of stuff you may or may not need. We will likely package individual gems in the near future.

These packages are publicly available from our repository. We've just begun using them, but are finding them reliable and very helpful so far. Any of you who would like to try them out are welcome to, either via direct download or, much easier, by adding our Yum repository to your system as described here:

https://packages.endpoint.com/

Once you've done that, a simple command should get you most of the way there:

yum install ruby-enterprise ruby-enterprise-rubygems

If you prefer to download them directly, the .rpm packages are available on that site as well; just browse through the repo.

The .spec file is available for review and forking on GitHub: http://gist.github.com/108940

Many thanks to list member Tim Charper for providing an example .spec, and my colleagues at End Point for reviewing this work.

We appreciate any comments or questions you may have. This package repo is for us and our clients primarily, but if there's a package you need that isn't in there, let us know and maybe we'll add it.

Using the new-style Google Analytics pageTracker functions in Interchange

For a while now there have been two different ways to set up the JavaScript calls that report traffic back to Google Analytics. The older method uses function names that mention "urchin," while the newer method uses a function named "pageTracker". This post describes an approach for using the new method in a standard Interchange store.

You can see an example of the new method of reporting a page view here. Nothing Interchange-related is required for normal page tracking, but you may want to use a variable for the Google account number; more on that below.

If you have your Google Analytics account set up to treat the website as an e-commerce site, then you can also add the order tracking tags to your receipt page, so that it sends order data over to Google Analytics at the time of conversion. The order tracking tags can be viewed here. This gist shows the typical Interchange tags you might want to use to transmit the order specifics. Of course you might need to change the field used for the products' category, since not everyone uses the prod_group field from the products table to hold this information.

As you can see, both the normal and the order-conversion scripts need to be modified to contain the individual Google Analytics account number for the website. I tend to set up an Interchange variable such as GOOGLE_ANALYTICS_ID in the variable.txt file or in catalog.cfg.
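
As a sketch of what that can look like (the account number is a made-up placeholder), the variable might be defined in catalog.cfg and then interpolated into the tracking snippet in your page templates:

Variable GOOGLE_ANALYTICS_ID UA-XXXXXXX-1

var pageTracker = _gat._getTracker("__GOOGLE_ANALYTICS_ID__");
pageTracker._trackPageview();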

Learn more about End Point's analytics expertise.

Inside PostgreSQL - Clause selectivity

One of the more valuable features of any conference is the so-called "hall track", or in other words, the opportunity to talk to all sorts of people about all sorts of things. PGCon was no exception, and I found the hall track particularly interesting because of suggestions I was able to gather regarding multi-column statistics, not all of which boiled down to "You're dreaming -- give it a rest". One of the problems I'd been trying to solve was where, precisely, to put the code that actually applies the statistics to a useful problem. There are several candidate locations, and certainly quite a few places where we could make use of such statistics. The lowest-hanging fruit, however, seems to be finding groups of query clauses that aren't as independent as we would normally assume. Between PGCon sessions one day, Tom Lane pointed me to a place where we already do something very similar: clausesel.c

"Clause selectivity" means much the same thing as any other selectivity: it's the proportion of rows from a relation that an operation will return. A "clause", in this case, is a filter on a relation, such as the "X = 1" and the "Y < 10" in "WHERE X = 1 AND Y < 10". PostgreSQL uses functions in clausesel.c to find clauses whose combined selectivity differs from the product of their individual selectivities. For instance, in "WHERE X < 4 AND X < 5", the "X < 5" is redundant; the clauses' combined selectivity is simply that of "X < 4". With "WHERE Y > 4 AND Y < 10", clausesel.c can determine that we really want the selectivity of the clause "4 < Y < 10". It's also smart enough to recognize "pseudo-constants": values from non-volatile functions, or from the outer relation of a nested loop. Although these values aren't truly constants, they remain constant at the level of the query where the clause will be applied, and can be treated as constants.

With any luck, one day clausesel.c will also know enough to notice cases where, for instance, although "foo.x = 3" and "foo.y > 10" are individually true for much of table "foo", there are very few rows where both conditions are true.

The importance of offline community

Today, the June issue of the Open Source Business Review debuts. It features nine women who are active in open source development - as developers, organizers and business leaders.

I wrote about offline community, how techies in Portland, OR manage to get connected to each other, and how they've encouraged participation from women. You don't often find women at a user group or even a tech-focused meetup at a restaurant or bar. Portland, somehow, has managed to encourage the women in the community to participate and lead. I discuss the factors I think have led to the higher percentage of women involved in the community today.

Let me know what you think!