
PostgreSQL with SystemTap

Those familiar with PostgreSQL know it has supported DTrace since version 8.2. The 8.4beta2 release includes support for several new DTrace probes. But for those of us on platforms without DTrace, this support hasn't necessarily meant much. SystemTap is a relatively new Linux-based package with a similar purpose to DTrace, and it is under heavy development. As luck would have it, PostgreSQL's DTrace probes work with SystemTap as well.

A few caveats: it helps to run a very new SystemTap version (I used one I pulled from SystemTap's git repository today), and in order for SystemTap to have access to userspace software, your kernel must support utrace. I don't know precisely what kernel versions include the proper patches; my Ubuntu 8.04 laptop didn't have the right kernel, but the Fedora 10 virtual machine I just set up does.

Step 1 was to build SystemTap. This was a straightforward ./configure, make, make install, once I got the correct packages in place. Step 2 was to build PostgreSQL, including the --enable-dtrace option. This also was straightforward. Note that PostgreSQL won't build with the --enable-dtrace option unless you've already installed SystemTap. Finally, I initialized a PostgreSQL database cluster and started the database.

Here's where the fun starts. SystemTap's syntax differs from DTrace syntax. Here's an example probe SystemTap would accept:

probe process("/usr/local/pgsql/bin/postgres").function("eqjoinsel")
{
        printf ("%d\n", pid())
}

This tells SystemTap to print out the process ID (from SystemTap's pid() function) each time PostgreSQL's eqjoinsel function is called. That's the function PostgreSQL uses to estimate join selectivity for most equality operators, and it gets called a lot, so it's a decently useful test. It also shows that SystemTap can probe inside programs without an explicitly defined probe point. I saved this file as test.d, and ran it like this:

[josh@localhost ~]$ sudo stap -v test.d
Pass 1: parsed user script and 52 library script(s) in 160usr/220sys/641real ms.
Pass 2: analyzed script: 1 probe(s), 1 function(s), 1 embed(s), 0 global(s) in 40usr/60sys/331real ms.
Pass 3: translated to C into "/tmp/stapDD5a4p/stap_c0b737cdffdb48cec3fd55b631bb0656_1057.c" in 30usr/160sys/211real ms.
Pass 4, preamble: (re)building SystemTap's version of uprobes.
Pass 4: compiled C into "stap_c0b737cdffdb48cec3fd55b631bb0656_1057.ko" in 1510usr/3430sys/8052real ms.
Pass 5: starting run.
4521
4521
4521
4521

4521 is the process ID of the PostgreSQL backend I'm connected to, and it gets printed every time I type "\dt" in my psql session.

Now for something more interesting. Although SystemTap lets me probe whatever function I want, it's nice to be able to use the defined DTrace probes, because that way I don't have to find the function name I'm interested in, in order to trace something. Here are some examples I added to my test.d script, pulled more or less at random from the list of available DTrace probes in the PostgreSQL documentation. Note that whereas the documentation lists the probe names with dashes (or are these hyphens?), to make it work with SystemTap, I needed to use double-underscores, so "transaction-start" in the docs becomes "transaction__start" in my script.

probe process("/usr/local/pgsql/bin/postgres").mark("transaction__start")
{      
        printf("Transaction start: %d\n", pid())
}

probe process("/usr/local/pgsql/bin/postgres").mark("lwlock__condacquire") {
        printf("lock wait start at %d for process %d on cpu %d\n", gettimeofday_s(), pid(), cpu())
}

probe process("/usr/local/pgsql/bin/postgres").mark("sort__start") {
        printf("sort start at %d for process %d on cpu %d\n", gettimeofday_s(), pid(), cpu())
}

probe process("/usr/local/pgsql/bin/postgres").mark("smgr__md__write__done") {
        printf("smgr-md-write-done at %d for process %d on cpu %d\n", gettimeofday_s(), pid(), cpu())
}

...which resulted in something like this when I ran pgbench:

[josh@localhost ~]$ sudo stap -v test.d
Pass 1: parsed user script and 52 library script(s) in 130usr/150sys/286real ms.
Pass 2: analyzed script: 7 probe(s), 4 function(s), 2 embed(s), 0 global(s) in 30usr/30sys/120real ms.
Pass 3: translated to C into "/tmp/stapW9yfAQ/stap_f6f3ffd834ef5b249edcf7d1ca19dce2_3025.c" in 10usr/150sys/163real ms.
Pass 4, preamble: (re)building SystemTap's version of uprobes.
Pass 4: compiled C into "stap_f6f3ffd834ef5b249edcf7d1ca19dce2_3025.ko" in 1380usr/2690sys/4155real ms.
Pass 5: starting run.
Transaction start: 4894
Transaction start: 4894
lock wait start at 1243552147 for process 4907 on cpu 0
Transaction start: 4907
Transaction start: 4907
lock wait start at 1243552147 for process 4907 on cpu 0
Transaction start: 4907
lock wait start at 1243552174 for process 2770 on cpu 0
smgr-md-write-done at 1243552174 for process 2770 on cpu 0
smgr-md-write-done at 1243552174 for process 2770 on cpu 0
smgr-md-write-done at 1243552174 for process 2770 on cpu 0

This could be a very interesting way of profiling, performance testing, debugging, troubleshooting, and who knows what else. I'm interested to see SystemTap become more ubiquitous. I should note that I have no idea how SystemTap compares to DTrace or whether it will manage to do for Linux what DTrace can do on other operating systems. Time will tell, I guess.

UPDATE: As has been pointed out in the comments, compiling PostgreSQL with --enable-dtrace is only necessary if I want to use the built-in "taps" (the SystemTap word, apparently, for its equivalent of DTrace probes). Probing by function call, or any of the other probe methods SystemTap supports, works without --enable-dtrace.

UPDATE 2: It's important to note that the defined DTrace probes include sets of useful variables that DTrace and SystemTap scripts might be interested in. For instance, it's possible to get the transaction ID within the transaction__start probe. In SystemTap, these variables are referenced as $arg1, $arg2, etc. So in a transaction__start probe, you could say:

printf("Transaction with ID %d started\n", $arg1)

Git rebase: Just-Workingness Baked Right In (If you're cool enough)

Reading about rebase makes it seem somewhat abstract and frightening, but it's really pretty intuitive when you use it a bit. In terms of how you deal with merging work and addressing conflicts, rebase and merge are very similar.

Given branch "foo" with a sequence of commits:

foo: D --> C --> B --> A

I can make a branch "bar" off of foo: (git branch bar foo)

foo: D --> C --> B --> A
bar: D --> C --> B --> A

Then I do some development on bar and commit. Meanwhile, somebody else develops on foo and commits, introducing new, unrelated commits on each branch.

foo: E --> D --> C --> B --> A
bar: X --> D --> C --> B --> A

Now I want to take my "bar" work (in commit X) and put it back upstream in "foo".

  • I can't push from local bar to upstream foo directly because it is not a fast-forward operation; foo has a commit (E) that bar does not.
  • I therefore have to either merge local bar into local foo and then push local foo upstream, or rebase bar to foo and then push.

A merge will show up as a separate commit. Meaning, merging bar into foo will result in commit history:

foo: M --> X --> D --> C --> B --> A
      \
       E --> D --> C --> B --> A
(The particulars may depend on conflicts in E versus X).

Whereas, from branch "bar", I could "git rebase foo". Rebase would look and see that "foo" and "bar" have commits in common starting from D. Therefore, the commits in "bar" more recent than D would be pulled out and applied on top of the full commit history of "foo". Meaning, you get the history:

bar: X' --> E --> D --> C --> B --> A

This can be pushed directly to "foo" upstream because it contains the full "foo" history and is therefore a fast-forward operation.

Why does X become X' after the rebase? Because it's based on the original commit X, but it's not the same commit; part of a commit's definition is its parent commit, and while X originally referred to commit D, this derivative X' refers instead to E. The important thing to remember is that the content of the X' commit is taken initially from the original X commit. The "diff" you would see from this commit is the same as from X.

If there's a conflict such that E and X changed the same lines in some file, you would need to resolve it as part of rebasing, just like in a regular merge. But those changes for resolution would be part of X', instead of being part of some merge-specific commit.
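
In practice, resolving such a conflict during a rebase looks much like resolving a merge conflict; a typical sequence (the path below is a placeholder) is:

# rebase stops and reports the conflicting files; fix them, then:
git add path/to/conflicted/file
git rebase --continue
# or, to give up and return to the pre-rebase state:
git rebase --abort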

Considerations for choosing rebase versus merge

Rebasing should generally be the default choice when you're pulling from a remote into your repo.

git pull --rebase

Note that it's possible to make --rebase the default option for pulling for a given branch. From Git's pull docs:

To make this the default for a given branch, set the configuration option branch.<name>.rebase to true.
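
For example, to make pulls into a branch named address_book (an example name) rebase by default:

git config branch.address_book.rebase true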

However, as usual with Git, saying "do this by default" only gets you so far. If you assume rebase is always the right choice, you're going to mess something up.

Probably the most important rule for rebasing is: do not rebase a branch that has been pushed upstream, unless you are positive nobody else is using it.

Consider:

  • Steph has a Spree fork on Github. So on her laptop, she has a repo that has her Github fork as its "origin" remote.
  • She also wants to easily pull in changes from the canonical Spree Github repo, so she has that repo set up as the "canonical" remote in her local repo.
  • Steph does work on a branch called "address_book", unique to her Github fork (not in the canonical repo).
  • She pushes her stuff up to "address_book" in origin.
  • She decides she needs the latest and greatest from canonical. So she fetches canonical. She can then either rebase address_book onto canonical/master, or merge canonical/master into address_book.

The merge makes for an ugly commit history.

The rebase, on the other hand, would make her local address_book branch incompatible with the upstream one she pushed to in her Github repo: whatever commits she pushed to origin/address_book that are specific to that branch (i.e. not on canonical/master) would get rebased on top of the latest from canonical/master, meaning they are now different commits with a different commit history. Pushing is now not really an option.

In this case, making a different branch would probably be the best choice.

Ultimately, the changes Steph accumulates in address_book should indeed get rebased with the stuff in canonical/master, as the final step towards making a clean history that could get pulled seamlessly onto canonical/master.

So, in this workflow, a final step for publishing a set of changes intended for upstream consumption and potential merge into the main project would be, from Steph's local address_book branch:

# get the latest from canonical repo
git fetch canonical
# rebase the address book branch onto canonical/master
git rebase canonical/master
# work through any conflicts that may come up, and naturally test
# your conflict fixes before completing
...
git push origin address_book:refs/heads/address_book_release_candidate

That would create a branch named "address_book_release_candidate" on Steph's Github fork, structured to have a nice commit history relative to canonical/master, meaning that the Spree core folks could easily pull it into the canonical repo if it passes muster.

What you would not ever do is:

git fetch canonical
# make a branch based off of canonical/master and switch to it
git branch canonical_master canonical/master
git checkout canonical_master
# rebase the local copy of the canonical master onto address_book
git rebase address_book

As that implies messing with the commit history of the canonical master branch, which we all know to be published and therefore must not be subject to history-twiddling.

Google I/O 2009 day 1

I'm at Google I/O at the Moscone Center in downtown San Francisco, and today was the first day. Everything was bustling:

The opening keynote started with Google CEO Eric Schmidt, and I wondered how he would keep over an hour interesting. He only took a few minutes, though; then Vic Gundotra, VP of Engineering, led the rest of the keynote, which had many presenters showing off various projects, starting with 5 major HTML 5 features already supported in Chrome, Firefox, Safari, and Opera:

Matt Waddell talked about Canvas, the very nice drawing & animation API with pixel-level control. Brendan Gibson of Backcountry.com used this at SteepandCheap.com and sister sites for the cool People on Site graphs (with a workaround for Internet Explorer which doesn't support Canvas yet). Also a quick demo of Bespin, an IDE in the browser.

Matt Papakipos showed off o3d, 3-D in the browser with just HTML 5, JavaScript, and CSS. Also the new <video> tag that makes video as easy as <img> is. Geolocation has come a long way with cell tower and wi-fi ID coverage over much of the globe.

Jay Sullivan, VP of Mozilla, showed off Firefox 3.5's upcoming features. Basically all of the above plus app cache & database (using SQLite) and web workers (background JavaScript that won't freeze the browser).

Michael Abbott, SVP of Palm, showed off their webOS 1.0 which uses HTML 5.

A good summary of the 5 big features of HTML 5 is in Tim O'Reilly's blog post about it.

Kevin Gibbs & Andrew Bowers of Google gave some numbers about Google App Engine: 200K+ developers, 80K+ apps. Coming in App Engine: background processing, large object storage, database export, XMPP, incoming email. He also showed off Google Web Toolkit a bit, with code written in Java that compiles down to JavaScript with per-browser tweaks automatically handled.

DeWitt Clinton, Tech Lead at Google, showed Google Web Elements, embeddable Google apps similar to the way YouTube & AdSense have always worked. Currently conversations, maps, search. A blog post by Tim O'Reilly gives more details about Web Elements.

Romain Guy, Software Engineer at Google, showed off Android's coming text to speech functionality. Then all attendees were told we'll be receiving a new Google Ion (aka HTC Magic) phone, the unlocked developer edition, with a SIM card for T-Mobile giving 30 days of unlimited 3G data & domestic voice so we can play with it. That was enthusiastically received. Certain attendees such as myself were hoping there'd be a discounted way to buy one at the conference, so this surprise worked out nicely. :) Various people wrote this up in more detail. Here's mine getting unpacked:

The rest of the conference was split into various tracks, and I stuck mostly with the Google App Engine talks, which were good. Most useful was Brett Slatkin's talk on using the Datastore's list properties with separate entities just for the lists, so that their indexes can be used in queries without serializing/deserializing the lists; that avoids a lot of CPU overhead but is a little tricky to set up.

The after-hours party (dinner, music, silly video games, etc.) is now winding up, and a semi-drunk guy is walking around with a garbage can asking for laptops we want to throw away. I still need this one for a while longer, so I declined his helpful offer.

Writing Procedural Languages - slides

Although I'll be working to change this, the slides for my "Writing a PostgreSQL Procedural Language" tutorial available from the PGCon website are from an earlier iteration of the talk. The current ones, which I used in the presentation, are available here, on Scribd.

PGCon thus far

Though it might flood the End Point blog with PGCon content, I'm compelled to scribble something of my own to report on the last couple of days. Wednesday's Developers' Meeting was an interesting experience and I felt privileged to be invited. Although I could only stay for the first half, as my own presentation was scheduled for the afternoon, I enjoyed the opportunity to meet many PostgreSQL "luminaries", and participate in some of the decisions behind the project.

Attendance at my "How to write a PostgreSQL Procedural Language" tutorial exceeded my expectations, no doubt in part, at least, because aside from the Developers' Meeting it was the only thing going on. Many people seem interested in being able to write code for the PostgreSQL backend, and the lessons learned from PL/LOLCODE have broad application. It was suggested, even, that since PL/pgSQL converts most of its statements to SQL and passes the result to the SQL parser, PL/LOLCODE would have less parsing overhead than PL/pgSQL. Ensuing discussions of high performance LOLCODE were cancelled due to involuntary giggling.

Between talks I've had the opportunity to meet a wide variety of PostgreSQL users and contributors, and been interested to see various people's ideas for future development. Perhaps it will result in a blog post one day, but suffice it to say there's lots of activity under way. Most surprising to me has been the interest in my (still embryonic) work with multi-column statistics. On a number of different occasions people have unexpectedly asked me about it. Thanks to a hallway conversation with Tom Lane, another of the hard problems involved has a possible solution, which will probably be the subject of yet another blog post.

Thanks to the organizers, sponsors, speakers, helpers, etc. who have made the conference possible so far; I'm looking forward to today.

PgCon: the developer's meeting and the 2009 keynote

Yesterday, I spent the entire day at a Postgres Developers meeting, discussing what happened over the last year, and how we're going to tackle a series of critical problems in the next year. We talked about how to get the Synchronous Replication and Hot Standby patches completed, important adoption issues, our continued participation in the SQL Standards committee (a surprising number of people were interested!), moving forward with alpha releases after commitfests (woo!), and creating a better infrastructure for managing modules and addons to Postgres.

That evening, a few of us were treated by Paul Vallee of Pythian Group to dinner and a trip to another of Ottawa's great local pubs. We discussed the future of open source databases and the relative quality of beer in Ottawa, Portland and the UK. Of course, I think Portland has the best beer ;)

This morning, Dan introduced everyone to the start of the sessions, and then Dave, Magnus and I managed to get through the keynote. It was mostly an opportunity to announce 8.4 Beta2, plug a few of the talks and mention all the different individuals involved in development. And have a laugh about our conference T-shirts.

I have an hour and a half until I give the Power psql talk and then tonight is the big EnterpriseDB party. And one more talk tomorrow. And lightning talks. What a full conference :)

PgCon: Preparing the keynote, more talks and today is Developer Meeting day

I spent most of Tuesday polishing up slides for my VACUUM strategy talk, reviewing the Power psql talk slides, working a little bit and then meeting up with all the new arrivals.

Dave Page and Greg Stark rescued Magnus and me from the coffee shop and we settled in at the Royal Oak for the evening. Dave, Magnus and I decided on the theme "Why people are choosing Postgres" for our keynote, and we managed to produce a few slides to guide us!

Peter Eisentraut was there and I chatted briefly about his fun FUSE project for Postgres that he'll be giving a Lightning Talk about on Friday. (There is still time to give a lightning talk, by the way! Find me, or just update the wiki and I'll add you to the agenda.)

I also saw CB (one of the database gurus) from Etsy there, and I'm hoping to meet up with him and a few more people this evening. Tom Lane and I chatted a little bit about my experience at MySQL Conference, and how things seem to be going with Drizzle.

All in all, had a great evening and I even survived Dave's frequent refilling of my beer glass. I'm looking forward to today's Developer Meeting.

PGCon: First day in Ottawa

I arrived in Ottawa late Sunday night a little in advance of the conference. I'm spending a couple days working on the final bits of my slides, and spending a little time with friends in the Postgres community that I only get to see once a year!

I started the morning with Dan Langille, the PGCon organizer, Magnus Hagander, and Josh Berkus. During that conversation, I managed to avoid being assigned to give the keynote on Thursday by myself, but instead enlisted Magnus and Dave Page to come up with something together with me. They gave a keynote together at PgDay EU, so I figured I would be in good company.

One project that I've helped with in the past is the code that runs planet.postgresql.org. Magnus Hagander and I spent most of yesterday renaming the project, identifying the next few features we'd like to add, and getting the source tree moved over to git.postgresql.org.

I'm hoping we have a little more time between tweaking slides to get our new features finished and deployed to the production server today.

Competitors to Bucardo version 1

Last time I described the design and major functions of Bucardo version 1 in detail. A natural question to ask about Bucardo 1 is, why didn't I use something else already out there? And that's a very good question.

I had no desire to create a new replication system and work out the inevitable kinks that would come with it. However, nothing then available met our needs, and even today nothing I'm familiar with quite would. So writing something new was necessary. Writing an asynchronous multimaster replication system for Postgres was not trivial, but turned out to be easier than I had expected thanks to Postgres itself -- with the caveats noted in the last post.

But, back to the landscape. What follows is a survey of the Postgres replication landscape as it looked in mid-2002 when I first needed multimaster replication for PostgreSQL 7.2.

pgreplicator

PostgreSQL Replicator is probably the most similar project to Bucardo 1. It was released in 2001 and does not appear to have had any updates since October 2001. I don't recall why I didn't use this, but from reviewing the documentation I suspect it was because it hadn't been updated for PostgreSQL 7.2, it used PL/Tcl, and required a daemon to run on every node. But the asynchronous store-and-forward approach, the use of triggers and data storage tables is similar to Bucardo 1.

dbmirror

I don't remember whether this was around in 2002, but it's part of PostgreSQL contrib now. It is master/slave replication only.

Slony-I

I don't think Slony-I existed in 2002 -- version 1.0 was released in 2004. But in any case, it only does master/slave replication.

Slony2

There has been no code released from this project and the website is now gone.

erserver

Master/slave replication, abandoned in favor of Slony-I. Website is now gone.

Postgres-R

This was a research project that worked with PostgreSQL 6.4. Some Postgres-R design documents were published. An effort to port it to PostgreSQL 7.2 (the pgreplication project) did not appear to have gotten very far. In 2008 it seems to have been partially revived. I don't know what the current status is.

PGCluster

This didn't exist in 2002. I'm not sure where it's at now. I believe it uses synchronous replication.

pgpool

This isn't the kind of "replication" I wanted; it's database load balancing and multiplexing. The pgpool listener is a single point of failure, and all databases must be accessible or data will be lost on a database server that is down.

Usogres

Master/slave replication for backup purposes.

Mammoth PostgreSQL + Replication

This didn't exist in 2002. It is only master/slave replication. It began as proprietary software but I believe is open source now.

EnterpriseDB Replication Server

A proprietary offering that came out in 2005 or 2006, for master/slave replication only. It has apparently been replaced by Slony, or perhaps was always a rebranded Slony.

pgComparator

An rsync-like tool for comparing databases. Didn't exist in 2002. Probably much better than Bucardo 1's compare operation.

DBBalancer

Kind of like pgpool, more of a connection pooler. Hasn't been updated since 2002.

DRAGON

"Database Replication based on Group Communication." Links to this project were defunct.

DBI-Link

DBI-Link isn't about replication.

(Summary)

I assembled this list some time back and have made some updates to it. I'm sure there are more to consider today. Please comment if you have any corrections or additions.

The design of Bucardo version 1

Since PGCon 2009 begins next week, I thought it would be a good time to start publishing some history of the Bucardo replication system for PostgreSQL. Here I will cover only Bucardo version 1 and leave Bucardo versions 2 and 3 for a later post.

Bucardo 1 is an asynchronous multi-master and master/slave database replication system. I designed it in August-September 2002, to run in Perl 5.6 using PostgreSQL 7.2. It was later updated to support PostgreSQL 7.4 and 8.1, and changes in DBD::Pg's COPY functionality. It was built for and funded by Backcountry.com, and various versions of Bucardo have been used in production as a core piece of their infrastructure from September 2002 to the present.

Bucardo's design is simple, relying on the consistently correct behavior of the underlying PostgreSQL database software. It made some compromises on ideal behavior in order to have a working system in a reasonable amount of time, but the compromises are few and are mentioned below.

General design

Bucardo 1 needed to:

  • Support asynchronous multimaster replication.
  • Support asynchronous master/slave replication of full tables and changes to tables.
  • Leave frequency of replication up to the administrator, which came by default since each replication event is a separate run of the program.
  • Preserve transaction atomicity and isolation across databases.
  • Continue collecting change information even when no replication process is running.
  • Be fairly efficient in storing changes and in bandwidth usage sending them to the other database.
  • Have a default "winner" in collision situations, with special handling possible for certain tables where more intelligent collision merges could be done.
  • Not require any database downtime for maintenance, upgrades, etc.
  • Be fairly simple to understand and support.
  • Support a data flow arrangement such that the replicator is behind a firewall and reaches out to an external database, but doesn't require inbound access to the internal database.

Operations

There are four types of database operations Bucardo 1 can perform:

  • peer - synchronize changes in one or more tables between two peer databases (multi-master)
  • pushdelta - copy only changed rows from a table or set of tables from a master database to a slave database
  • push - copy an entire table or set of tables from a master database to a slave database
  • compare - compare all rows of one or more tables between two databases

I will discuss each of these operations in turn.

Peer sync

The peer sync operation is the most groundbreaking feature of Bucardo 1. The much smaller Backcountry.com of 2002 wanted to have an internal master database in their office, which housed their customer service and warehouse employees, buyers, and management. Their office had a low-bandwidth and not entirely reliable Internet connection. Their e-commerce web, application, and database servers were at a colocation facility with a fast Internet connection, and they wanted an identical master database to reside there, so that in the case of any disruption in connectivity between their office and colocation facility, both locations could continue to function independently, and their databases would automatically synchronize after connectivity was restored.

To summarize, what they needed is multi-master replication, and their needs would be satisfied with asynchronous multi-master replication. That meant it was acceptable for the databases to be current with each other within 1-2 minutes of lag time. (Synchronous multi-master replication requires a continuous connection between the two master databases, and transactions are not allowed to commit until the transaction is completed on both databases.)

I want to review some of the features that are required for multi-master replication to work. First, it needs to have ACID properties just as the underlying database itself. The most relevant properties for our multi-master replication system are atomicity and isolation. A transaction must be entirely visible on a given database, or not visible at all.

For example, let us imagine that a customer ecommerce order consists of exactly 1 row in the "orders" table, which references 1 row in the "users" table, and the following tables may have 0 or more rows pointing to the "orders" table:

  • order_lines
  • order_notes
  • credit_cards
  • payments
  • gift_certificates
  • coupon_uses
  • affiliate_commissions
  • inventory

To add an order to the source database, a transaction is started, rows are added to relevant tables, the transaction is committed, and then those rows will all appear to other database users at once. Until the transaction is committed, no changes are visible. If an error occurs, the entire transaction rolls back, and it will never have been seen by any other database user.

This ensures that warehouse employees, customer service representatives, etc. will never see a partial order. This is especially important since we don't want to ship an order that is missing some of its line items, or double-charge a credit card because we didn't have a payment record yet. And an order without its associated inventory records would have trouble shipping at the warehouse.

This is all standard ACID stuff. But since I was writing a multi-master replication system from scratch, I had to assure the same properties across two database clusters, for which PostgreSQL had no facilities.

Changes are tracked by having a "delta table" paired with every table that's part of the multi-master replication system. The table has three columns: the primary key in the table being tracked, the wallclock timestamp, and an indicator of whether the change was due to an insert, update, or delete. Every change in the table being tracked is recorded by rules and triggers that insert a corresponding row in the delta table.

This is what the delta table for "orders" looks like (simplified a bit for readability):

                      Table "public.orders_delta"
    Column     |     Type    |                Modifiers 
---------------+-------------+-----------------------------------------
 delta_key     | varchar(14) | not null
 delta_action  | char(1)     | not null
 last_modified | timestamp   | not null default timeofday()::timestamp
Check constraints:
    "delta_action_valid" CHECK (delta_action IN ('I','U','D'))
Triggers:
    orders_delta_last_modified BEFORE INSERT OR UPDATE ON orders_delta
        FOR EACH ROW EXECUTE PROCEDURE update_last_modified()

The new row data itself in the tracked table is not copied, because the data is right there for the taking. It is enough to note that a change was made. If multiple changes are made, only the most recent version of the row is available, but that is fine because that's the only one we need to replicate.

Because nothing outside of the database is required to track changes, the tracking continues even when Bucardo 1 is not running. As long as the delta table exists and can be written to, and the tracking rules and triggers are in place on the tracked table, the changes will be recorded.
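
To make the mechanism concrete, here is a rough sketch -- not Bucardo's actual code, and written in modern PostgreSQL syntax -- of how such tracking could be wired up for the "orders" table, assuming a hypothetical primary key column named order_number; the last_modified column is filled in by its default and the BEFORE trigger shown above:

CREATE FUNCTION orders_delta_insert() RETURNS trigger AS $$
BEGIN
    INSERT INTO orders_delta (delta_key, delta_action)
        VALUES (NEW.order_number, 'I');
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER orders_track_insert AFTER INSERT ON orders
    FOR EACH ROW EXECUTE PROCEDURE orders_delta_insert();

-- Updates and deletes are tracked by rules rather than triggers:
CREATE RULE orders_track_update AS ON UPDATE TO orders DO ALSO
    INSERT INTO orders_delta (delta_key, delta_action) VALUES (NEW.order_number, 'U');

CREATE RULE orders_track_delete AS ON DELETE TO orders DO ALSO
    INSERT INTO orders_delta (delta_key, delta_action) VALUES (OLD.order_number, 'D');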

Bucardo 1 achieves atomicity and isolation of the replication transaction with this process:

  1. Open a connection to the first database, set transaction isolation to serializable, and disable triggers and rules.
  2. Open a connection to the second database, set transaction isolation to serializable, and disable triggers and rules.
  3. For each table to be synchronized in this group:
    1. Verify that the table's column names and order match in the two databases.
    2. Walk through the delta table on the first database, making identical changes to the second database. Empty the delta table when done.
    3. Walk through the delta table on the second database, making identical changes to the first database. Empty the delta table when done.
    4. Make a note of any changes that were made to the same rows on both databases ("conflicts"). By default, we resolve the conflicts silently by allowing the designated "winner" database's change to be the one that remains. For certain tables such as "inventory", appropriate table-specific conflict resolution code was added that merged the changes instead of designating a winner and loser version of the row.
  4. Once all changes have succeeded, commit transactions on both databases.

This last step of the process does not satisfy the ACID durability requirement. Since Bucardo 1 was designed on PostgreSQL 7.2, with no 2-phase commit possible, there is a chance that one database will fail to commit its transaction after the other database already did, and the changes will be lost on one side only. This has never happened in practice, mostly due to the fact that committing a transaction in PostgreSQL is a nearly instantaneous operation, since the data is already in place and no separate rollback or log tables need to be modified. But it is certainly possible that it could happen, and it is an undesirable risk. With real 2-phase commit now available in PostgreSQL, complete durability could be achieved.

All of a sudden, the changes on each side are now available to the other side, all at once. Only entire orders are visible, never partial orders.

ACID consistency is achieved by assuming that due to PostgreSQL's integrity checks on the source database, the data was already consistent there, and it is copied verbatim to the destination database where it will still be consistent. Thus, CHECK constraints, referential integrity constraints, etc. are expected to be identical between the two databases. Bucardo 1 does not propagate database schema changes.

Thus the main principles to provide fairly reliable replication are:

  1. All related tables must be synchronized within the same transaction.
  2. Synchronization must always be done in both directions in the same transaction, so that the code can detect simultaneous change conflicts.
  3. The most recent change to a given row must of course be the last change, so changes should be replayed in order. (We optimize this by not copying over row changes that we know will be deleted later in the same transaction.)

Things to consider with multi-master replication:

  1. Conflicts are less likely the more often the synchronization is performed. But conflicts can still happen, and must be resolved somehow. Creating a generic conflict resolution mechanism is difficult, but declaring a "winning" database is easy and special conflict resolution logic can be added for certain tables where lost changes would be troublesome.
  2. Very large change sets can take a long time to synchronize. For example, consider an unintentionally large update like this:

    UPDATE inventory SET quantity = quantity + 5

    That may change hundreds of thousands of rows, all in a single transaction. Our replication system needs to make all those changes in a single transaction on the other database, but it must do so over a comparatively slow Internet connection. As transactions run longer, they often encounter locks from other concurrent database activity, and roll back. Then the process must start over, but now there are even more changes to copy over, so it takes even longer. In the worst situations, the synchronization simply cannot complete until other concurrent database activity is temporarily stopped, so that no locks will conflict. And that means downtime of applications, and manual intervention by the system administrator.

    Perhaps you could ship over all the data to the other database server ahead of time, then begin transactions on both databases and make the changes based on the local copy of the data, and expect the changes to be accepted more quickly since the network is no longer a bottleneck. But the destination database won't have been idle during that copying, which needs to be accounted for.

    Statement replication does not have this same weakness, but it has many weaknesses of its own.

  3. Sequences need to be set up to operate independently without collisions on the two servers in a peer sync. Two easy ways to do this (sketched in SQL after this list) are:
    1. Set up sequences to cover separate ranges on each server. For example, MAXVALUE 999999 on the first server, and MINVALUE 1000000 on the second server. Make sure to spread the ranges far enough apart that they'll never likely collide.
    2. Set up sequences to supply odd numbers on one server, and even on the other. For example, START 1 INCREMENT 2 on the first server, and START 2 INCREMENT 2 on the second server.
  4. A primary key is required. Currently, it must be a single column, and must be the first column in the table.
  5. Because each table's primary key may be of a different datatype, and to keep queries on delta tables as simple as possible, Bucardo 1 uses a separate delta table for each table being tracked.
  6. A more pluggable system for adding table-specific collision handling would be nice.
  7. The delta table column "delta_action" isn't actually necessary -- inserts and updates are already handled identically, and deletes can be inferred from the join on the tracked table. The "delta_action" is perhaps a nice bit of diagnostic information, and not burdensome as a CHAR(1), but otherwise could be removed.
  8. It's important that the delta table's "last_modified" column be based on wallclock time, not transaction start time, because we only keep the most recent change, and if all changes within a transaction are tagged by transaction start time, we'd end up with an arbitrary row as the "most recent" one, resulting in inconsistent data between the databases.
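
A minimal SQL sketch of the two sequence approaches from item 3 above, using a hypothetical sequence name:

-- Approach 1: separate ranges
-- on the first server:
CREATE SEQUENCE orders_order_number_seq MINVALUE 1 MAXVALUE 999999;
-- on the second server:
CREATE SEQUENCE orders_order_number_seq MINVALUE 1000000;

-- Approach 2: odd numbers on one server, even on the other
-- on the first server:
CREATE SEQUENCE orders_order_number_seq START 1 INCREMENT 2;
-- on the second server:
CREATE SEQUENCE orders_order_number_seq START 2 INCREMENT 2;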

Pushdelta

The pushdelta operation uses the same kind of delta tables and associated triggers and rules that the peer sync uses, but is a one-way push of the changed rows from master to slave. It is useful for large tables that don't have a high percentage of changed rows.

The pushdelta operation currently only supports a single target database. The ability to use pushdelta from a master to multiple slaves would be useful.

Push

The push operation very simply copies entire tables from the master to one or more slaves, for each table in a group. It requires no delta tables, triggers, or rules.

Table pushes can optionally supply a query that will be used instead of a bare "SELECT *" on the source table. Any query is allowed that will result in matching columns for the target table. We've used this to push out only in-stock inventory, rather than the whole inventory table, for example.
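
For example (with made-up column names), pushing only in-stock inventory might use a query like:

SELECT sku, description, quantity
  FROM inventory
 WHERE quantity > 0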

No primary key is required on tables that are pushed out in full.

The push operation uses DELETE to empty the target table. It would be good to optionally specify that TRUNCATE be used instead, and to take advantage of the PostgreSQL 8.1 multi-table truncate feature on tables with foreign key references.

Compare

The compare operation compares every row of the tables in its group, and displays any differences. It is a read-only operation. It can be used to make sure that tables to be used in multi-master replication start out identical, and later, to verify correct functioning of peer, pushdelta, and push operations.

The compare operation is fairly slow. It reads in all primary keys from both tables first, then fetches each row in turn. It could be made much more efficient.

Options

Optionally, tables can be vacuumed and/or analyzed after each operation.

In earlier versions of Bucardo 1, there was also an option to drop and rebuild all indexes automatically, to reduce index bloat, but beginning with PostgreSQL 7.3, primary key indexes could not be dropped when foreign keys required them, and the index bloat problem was dramatically reduced in PostgreSQL 7.4, mostly eliminating the need for the feature.

Limitations

Some of these are limitations that could easily be lifted, but no need had arisen. Some are minor annoyances, and others are major feature requests.

  1. For peer, pushdelta, and compare operations, a primary key is required. There are currently limitations on that key:
    1. Only single-part primary keys are supported.
    2. The primary key is assumed to be the first column. It would be easy to allow specifying another column as the primary key, or to interrogate the database schema directly to determine the key column, but we've never needed it.
  2. If an operation of one type is already underway, other operations of the same type will be rejected. It would be much more convenient for the users to add the newly requested operation to a queue and perform it when the current operation has finished.
  3. The program stands alone, performing a single operation and exiting. It was designed to run from cron. A persistent daemon that accepts requests in a queue or by message passing could better handle the many operations needed on a busy server.
  4. The program could use PostgreSQL's LISTEN and NOTIFY feature to learn of changes in a table and run a peer sync based on that notification, instead of being run on a timed schedule or on demand.
  5. Delta tables and triggers must be created or removed manually, though our helper script makes that fairly easy. It would be nice to have Bucardo automatically create delta tables and triggers as needed, or remove them when no longer needed (so that the overhead of tracking changes isn't incurred).
  6. Delta tables clutter the schema of the tables they are connected to. PostgreSQL didn't yet have the schema (namespace) feature when Bucardo 1 was created, but it would be nice to centralize the delta tables and functions in a separate schema.
  7. The datatypes of the fields in tables being replicated are not compared; only the names and order are compared.
  8. The configuration file syntax is fairly unpleasant.
  9. Only tables can be synchronized. It would be good to add support for views, sequences, and functions as first-class objects that could be pushed from master to slave or synchronized between two masters.
  10. It would be more convenient, and could reduce the chance of trouble due to misconfiguration, if Bucardo would interrogate the database to learn of all foreign key relationships between tables so that it could automatically create groups of tables that need to be processed together. Trigger functions and rules can cause changes to one table's row to modify rows in other table(s), in an opaque way that is resistant to introspection, but Bucardo could offer a location for users to declare what other tables a function can affect, and use that in building its dependency tree.
  11. There is no unit test suite.
  12. The insert trigger and update_last_modified function are written in PL/pgSQL, and are the only dependency on PL/pgSQL. They are both simple functions and should work fine as plain SQL functions, but it seems like there was a reason I had to use PL/pgSQL -- I just can't remember why anymore.
  13. In Bucardo 1, permission to insert to the various delta tables must be granted to any user that would change the base tables, or changes will be prevented by PostgreSQL. For a database with many users of varying access levels, this is a pain. It would be better to define the function to run as SECURITY DEFINER, and create the function as the superuser. Then no explicit permission would need to be granted on any delta table, and the delta tables would be inaccessible except through the Bucardo 1 API (except to the superuser). That would necessitate a change to using functions for updates and deletes, which currently are tracked by rules.

Future

Bucardo 1 performed admirably for Backcountry.com for over 4 years. The most serious problems, already mentioned above, have been the lack of a queue for push and pushdelta requests, the limitations of running one-off processes from cron, limited row collision resolution, and bogging down under a large insert or update that happens inside a single transaction.

Greg Sabino Mullane then created Bucardo 2, a rearchitected system built around all new code. It has all the important features of Bucardo 1, addresses most of Bucardo 1's deficiencies, and adds many of the desired features listed above. We hope to publish some design notes about Bucardo 2 in the near future.

The Name

I originally gave Bucardo 1 the fairly descriptive but uninspiring name "sync-tables". Greg Sabino Mullane came up with the name Bucardo, a reference to the logo of this program's patron, Backcountry.com. You can read about attempts to clone the extinct bucardo in the Wikipedia articles Bucardo and Cloning.

Rails Conf 2009 - Company Report

RailsConf 2009 concluded last week, so it's time for me to talk about some of the highlights for my fellow teammates who could not make it. I think one of the more interesting talks was given by Yehuda Katz. The talk was on the "Russian Doll Pattern" and dealt with mountable apps in the upcoming Rails 3.0 release (slides available here). Even though he felt it wasn't his best talk, I thought it was quite interesting. Personally, I found it refreshing to see something that was not yet complete. The Rails core team should do more of this kind of thing, as it gives the community a chance to give feedback on features before they're set in stone.

The Rails Envy guys were there and gave a very interesting presentation about innovations in Rails this past year. I was pleasantly surprised to see that Spree made this list and they even included the new interface in their screenshots. Gregg made some excellent videos from the conference as well which capture some of the spirit of the conference.

This year I had a chance to meet Fabio Akita in person. Fabio has a great blog called Akita on Rails which has a huge following in Brazil. He also does a lot of interesting in-depth interviews with people. This year I had the honor of being the subject of one of those interviews. I basically talked about the origins of the Spree project and what makes it special. You can listen to this and his other interviews here.

One of the more whimsical talks was about Ruby performance. The Phusion guys (who brought us Passenger) created a hilarious version of Wolfenstein written entirely in Ruby. Gregg Pollack put up a fun video showing how it looked. Supposedly this was to show off how Ruby can in fact scale but it seemed more like an excuse to make a lot of inside jokes. Even so, one or two fun presentations like this tend to liven up what would otherwise be a pretty dull conference.

The Spree BOF talk went well. It was very informal and we used it as a chance to meet some of our users and to find out what kinds of features they would like to see in future versions Spree. Steph already gave an excellent summary of this so I won't rehash it here.

Overall this was a great conference. Las Vegas was a pretty good location due to its cheap hotels and convenient flights. The hotel itself was not the best and the cigarette smoking was completely out of hand. I have it on good information that the conference will be in a different location next year. Rumor has it that it may even be on the east coast. Wherever it is, I definitely look forward to going back next year!

Learn more about End Point's rails development and spree development.

Operating system upgrades

This won't be earth-shattering news to anyone, I hope, but I'm pleased to report that two recent operating system upgrades went very well.

I upgraded a laptop from Ubuntu 8.10 to 9.04, and it's the smoothest I've ever had the process go. The only problem of any kind was that the package download process stalled on the last of the 1700+ files to download, and I had to restart the upgrade. But all the cached files were still there, and on reboot everything worked, including my two-monitor setup, goofy laptop audio chipset, wireless networking, crypto filesystem, and everything else.

I also upgraded an OpenBSD 4.3 server that is a firewall, NAT router, DHCP server, and DNS server, to OpenBSD 4.5. It was the first time I used the in-place upgrade with no special boot media and fetching packages over the network, as per the bsd.rd instructions, and it went fine. Then the extra packages that were there before had to be upgraded separately as per the FAQ on pkg updates. I initially scripted some munging of pkg_info's output, not realizing I could simply run pkg_add -u and it updates all packages.

There was one hangup upgrading zsh, which I just removed and reinstalled. Everything else went fine, and all services worked fine after reboot.

How pleasant.

Spree at RailsConf

Last week at RailsConf 2009, the Spree folks from End Point conducted a Birds of a Feather session to discuss Spree, an End Point sponsored open source rails ecommerce platform. Below is some of the dialog from the discussion (paraphrased).

Crowd: "How difficult is it to get Spree up and running from start to finish?"
Spree Crew: "This depends on the level of customization. If a customer simple needs to reskin the site, this shouldn't take more than a week (hopefully much less than a full week). If the customer needs specific functionality that is not included in core functionality or extensions, you may need to spend some time developing an extension."

Crowd: "How difficult is it to develop extensions in Spree?"
Spree Crew: "Spree extension work is based on the work of the Radiant community. Extensions are mini-applications: they allow you to drop a pre-built application into spree to override or insert new functionality. Documentation for extensions is available at the spree github wiki. We also plan to release more extensive Spree Guides documentation based on Rails Guides soon."

Spree Crew: "How did you hear about Spree?"
Crowd: "My client and I found it via search engines. My client thought that Spree looked like a good choice."

Spree Crew: "What other platforms did you consider before you found spree?"
Crowd: "Magento", "Substruct", "My client considered Magento, but I know several people that have developed with Magento and have found it difficult to override core functionality."

Spree Crew: "What types of functionality were missing from Spree that you'd like to see developed in the future?"
Crowd: "My client wanted checkout split into multiple steps instead of the new single page checkout. I was able to implement this by overriding the Spree checkout library and checkout views.", "My client needed complex inventory management.", "My client needed split shipping functionality."

Crowd: "What is the plan for Spree with regards to CMS development?"
Spree Crew: "There has been some discussion on the integration of a CMS into Spree. No one in the Spree community appears to be currently working on this. Contributions in this area are welcome. Also, Yehuda Katz is giving a talk on mountable apps - the Spree community would like to investigate the implications this has for Spree."

Crowd: "What are the next steps for localization, especially multilingual product descriptions?"
Spree Crew: "This is on the radar for future Spree development. It is not currently in development, and again, contributions in this area are welcome."

From the discussion, I took away that some of the desired features for Spree are inventory management, split shipping functionality, CMS integration, and improved localization. I hope that the application of Spree continues to contribute to its progress. The Spree Crew also hopes to showcase some of the sites referenced above at the Spree site.

Learn more about End Point's rails development and rails shopping cart development.

TLS Server Name Indication

I came across a few discussions of the TLS extension "Server Name Indication", which would allow hosting more than one "secure" (https) website per IP address/TCP port combination. The best summary of the state of things is (surprise, surprise) the Wikipedia Server Name Indication article. There are more details about client and server software support for SNI in Zachary Schneider's blog post and Daniel Lange's blog post.

I don't recall hearing about this before, but if I did I probably dismissed it as irrelevant at the time, because there would've been almost no support in either clients or servers. But now that all major browsers on all operating systems support SNI, except some on Windows XP, it may be worth keeping an eye on this.

Yes, IE on Windows XP is still a huge contingent and thus a huge hurdle. But maybe Microsoft will backport SNI support to XP. Even if just for IE 7 and later. Or maybe we'll have to wait a few more years till the next Windows operating system (hopefully) displaces XP. Here's a case where the low popularity of Vista (which supports SNI) is hurting the rest of us.

I'm really looking forward to the flexibility of name-based virtual hosting for https that we've had for 10+ years with plain http. It could really change the setup and ongoing infrastructure costs for secure websites, such as ecommerce sites.
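
As an aside, one quick way to check whether a given server honors SNI is OpenSSL's s_client, which can send the extension with its -servername option (the address and hostname below are placeholders):

openssl s_client -connect 192.0.2.10:443 -servername store.example.com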

Rails Optimization @RailsConf

On the second day of RailsConf 2009, I attended a talk on Advanced Performance Optimization of Rails Applications. Although it was reminiscent of college, as I felt compelled to write down and memorize lots of trivial information, I appreciate that this time I can actually apply the information. Below is a performance checklist of the advanced optimization techniques covered in the talk.

Rails optimization:

Ruby optimization:

  • Date is 16x slower than Time
  • Use Date::Performance
  • Avoid the String += method; use String << instead (see the sketch after this list)
  • Compare like objects - comparing objects of different types is expensive.
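
A tiny illustration of the string tip above (plain Ruby, illustrative only):

slow = ""
10_000.times { slow += "x" }   # += builds a brand new string on every iteration
fast = ""
10_000.times { fast << "x" }   # << appends in place, much faster in a loop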

Database optimization:

  • Use EXPLAIN ANALYZE
  • Use ANY(ARRAY(...)) instead of IN() (see the sketch after this list)
  • Push conditions into subselects and joins - PostgreSQL doesn't do that for you.
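
A sketch of the ANY(ARRAY(...)) tip as presented (the table and column names here are made up, and whether it helps depends on the PostgreSQL version and query plan):

-- Instead of:
SELECT * FROM orders WHERE user_id IN (SELECT id FROM users WHERE admin);
-- the suggestion was:
SELECT * FROM orders WHERE user_id = ANY(ARRAY(SELECT id FROM users WHERE admin));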

Environment Optimization:

  • Buy more memory, optimize memory, set memory limits for mongrel (with monit)
  • Competing for the memory cache is expensive on a shared server (avoid hitting the database in a cold state)
  • Use live debugging tools such as strace, oprofile, dtrace, monit, nagios
  • Pay attention to load balancing

User Environment Optimization:

  • Listen to YSlow
  • Inherently slow JavaScript operations include eval, DOM selectors, CSS selectors, element.style changes, getElementById, getElementsByName, and style switching.

Other Topics:

  • Upgrade to Ruby 1.9
  • Investigate using JRuby
  • Use Rack, which has been a hot topic at this RailsConf

Some final tips from the presentation: get benchmarks, use a profiling tool like ruby-prof, optimize memory, pay attention to the language's garbage collection behavior, profile memory, and measure, measure, measure!

Probably more important than the optimization details covered, the presentation served to remind me of the following:

Pay attention to all potential areas for optimization. As I've grown as a developer I've continued to add to my "optimization checklist".

When learning a new language, don't forget to pay attention to the little details of the language. I should appreciate the specific points that make a language unique, including its inherently expensive functions.

Like other developers, I sometimes produce code that meets the performance criteria but don't have the luxury of spending time examining every area for optimization. I'd like to spend more time throughout a project paying attention to each of the points on my optimization checklist - and always work on doing it better the second time around.

Learn more about End Point's rails consulting and development.

Stuff you can do with the PageRank algorithm

I've attended several interesting talks so far on my first day of RailsConf, but the one that got me the most excited to go out and start trying to shoehorn it into my projects was Building Mini Google in Ruby by Ilya Grigorik.

In terms of doing Google-like stuff (which I'm not especially interested in doing), there are three steps, which occur in order of increasing level of interestingness. They are:

  • Crawling (mundane)
  • Indexing (sort of interesting)
  • Rank (neato)

Passing over crawling, indexing is sort of interesting. You can do it yourself if you care about the problem, or you can hand it over to something like Ferret or Sphinx. I expect it's probably time for me to invest some time investigating one or more of these, since I've already gone up and down the do-it-yourself road.

The interesting bit, and the fascinating focus of Ilya's presentation, was the explanation of the PageRank algorithm and its implementation details, along with some application ideas. Hopefully I don't mess this up too badly, but as I understand it, it simplifies down to something like this.

A page is ranked to some degree by how many other pages link to it. This is a bit too simple, though, and trivially gamed. So you make it a little more complex by modeling the following behavior: a random surfer moves from one page to another by doing one of two things. They will either follow a link or randomly jump to a non-linked page (sort of how I surf Wikipedia). There is a much higher probability (0.85) that they will follow a link than that they will teleport (0.15). If you model this (hand waving here) you come up with a nice formula (more hand waving) that can be used to calculate the PageRank for a page in a given data set. A data set in this case is a collection of crawled pages.
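
To make the hand waving a little more concrete, here is a tiny, hand-rolled power-iteration sketch over a toy directed graph, using the 0.85 follow / 0.15 teleport split described above. It's purely illustrative - not Ilya's implementation, and dangling pages simply leak rank here:

def pagerank(links, damping = 0.85, iterations = 50)
  pages = links.keys
  n     = pages.size
  rank  = {}
  pages.each { |p| rank[p] = 1.0 / n }                     # start from a uniform distribution

  iterations.times do
    new_rank = {}
    pages.each { |p| new_rank[p] = (1.0 - damping) / n }   # the "teleport" share
    links.each do |page, outlinks|
      next if outlinks.empty?                              # dangling pages leak rank in this sketch
      share = damping * rank[page] / outlinks.size
      outlinks.each { |t| new_rank[t] += share if new_rank.key?(t) }
    end
    rank = new_rank
  end
  rank
end

graph = { 'a' => ['b', 'c'], 'b' => ['c'], 'c' => ['a'], 'd' => ['c'] }
pagerank(graph).sort_by { |_, r| -r }.each { |page, r| puts "#{page}: #{'%.3f' % r}" }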

For large data sets these calculations can be somewhat intensive, so we are recommended to the good graces of the GNU Scientific Library, the appropriate Ruby wrappers, and the NArray gem to handle the calculations and array management.

One suggested practical application of this technology is to apply it to sets of products purchased together in a shopping cart, to provide recommendations of the "people who bought that also bought this" sort. I'm pretty excited to try to implement this in Spree. But...

...what really piqued my interest was the idea that this could be applied to any graph. The Taxonomies/Taxons/ProductGroups with products could give me a nice big (depending on the size of the data set of course) directed graph to play with. The question, I suppose, is what the PageRank applied against such a graph means.

Cinco de Rails

Today was my first day of sessions at @railsconf. In fact, it was my first technical conference as well as my first RailsConf. And because I am relatively new to Rails development, I took away a plethora of information. I wrote down some notes on things I want to read, investigate, and research further after the conference.

1) History of Rails Critics
David Heinemeier Hansson's keynote touched on how it's interesting to look back at some of the initial and ongoing Rails critiques, such as "Ruby/Rails isn't scalable", "Rails isn't enterprise-ready", etc., and how the arguments in support of Rails have grown stronger over time with the maturity of the platform. I'd like to spend some more time looking into these criticisms to be more aware of the issues.

2) Rails 3 Release
Anticipation builds in the Rails community for the announcement of Rails 3. I just recently joined the Rails development community in January and hadn't heard of the Rails vs. Merb debate until recently. I am interested in learning more about Merb and the background of the Rails/Merb merge.

3) PostRank
Appealing to my search engine optimization background, the "social media measuring" offered by PostRank *essentially* applies the PageRank algorithm to the social web: it surfaces the articles it measures to be most credible, based on user engagement. The Google: High Performance talk was presented by the PostRank founder. As social engagement becomes an increasingly important part of the web, this is an interesting company and business model I'd like to watch.

4) Yehuda Katz
I've heard many mentions of Yehuda Katz. He's very involved in the Rails space and someone to keep up to date with.

5) railstips.org
One of the recipients of this year's Ruby Heroes awards runs railstips.org, a great site for tutorials and development information. Adaptability and learning new things are necessities as a consultant, and I'm always interested in finding new ways to learn more Rails.

6) Compass / Sass
With the release of Spree 0.8.0 (yesterday) came the integration of Compass and Sass. I went to the Birds of a Feather session on Compass and Sass integration and came away wanting to learn more about what distinguishes Sass from plain CSS and how we can make use of the great functionality Sass offers to benefit the Spree project. I'm going to check out this Sass screencast and hope to spend some time improving Spree's implementation of Sass.

7) Google PageRank
I heard great things about the Google: High Performance Computing in Rails talk, but did not attend. A coworker summarized it as "a few lines of Ruby to implement the Google PageRank algorithm". Even though I missed it, I'm excited to check out the slides here.

8) Active Scaffold
I had a short discussion with an employee of PostRank who works in blog development. We briefly discussed the functionality and troubles around integrating a CMS into Spree. He recommended looking into Active Scaffold.

9) Advanced Git Techniques
One of my coworkers attended a presentation on advanced Git techniques and walked away happy. He mentioned it was a lot of information, and the notes are posted here. End Point has several open source projects on GitHub, and I'm always open to learning more tips to help me keep my Git repositories clean.

10) Rails Envy
Gregg Pollack is putting up videos from RailsConf at RailsEnvy. I want to make sure I catch these, whether it's during the conference or later on.

Announcing SpreeCamps.com hosting

On day two of RailsConf 2009, we are pleased to announce our new SpreeCamps.com hosting service. SpreeCamps is the quickest way to get started developing your new e-commerce website with Ruby on Rails and Spree and easily deploy it into production.

You get the latest Spree 0.8.0 that was just released yesterday, as part of a fully configured environment built on the best industry-standard open-source software: CentOS, Ruby on Rails, your choice of PostgreSQL or MySQL, Apache, Passenger, Git, and DevCamps. Your system is harmonized and pre-installed on high-performance hardware, so you can simply sign up and start coding today.

SpreeCamps gives you a 64-bit virtual private server and includes backups, your own preconfigured iptables firewall, ping and ssh monitoring, and DNS. We also include a benefit unheard of in the virtual private server space: out of the box we enforce an SELinux security policy that protects you against many types of unforeseen security vulnerabilities and is configured to work with Passenger, Rails, and Spree.

SpreeCamps' built-in DevCamps system gives you development and staging environments that make it easy to work together in teams and show others your work in progress, and deploying your changes from development to production is as easy as "git pull".

And the best part of all? You can sign up now for just $95 per month, with no setup charge. We also offer hands-on training and support for Spree, DevCamps, and any other part of your application. Visit SpreeCamps.com for more information or to sign up for your own SpreeCamp.

(By the way, it works fine for hosting any other kind of Rails application, or whatever else you want too. We've already used it for one non-Rails website because it made getting going with DevCamps so easy.)

Learn more about End Point's rails development and rails shopping cart development.

Rails Conf Kicking Off in Less Than 24 Hours

Less than 24 hours until RailsConf kicks off. End Point is going to be there in force this year with three of its Spree contributors attending (myself included). Looking forward to seeing everyone out there. We'll also be making a few big Spree-related announcements, so stay tuned! If you're a Twitter user, follow me @railsdog. Follow Spree @spreecommerce. I'll also be blogging the conference on railsdog as well as here at the End Point blog. Finally, the SpreeCommerce site will be updated with various Spree-related announcements.