End Point

Welcome to End Point's blog

Ongoing observations by End Point people.

State of the Postgres project

It's been interesting watching the MySQL drama unfold, but I have to take issue when people start trying to drag Postgres into it again by spreading FUD (Fear, Uncertainty, and Doubt). Rather than simply rebut the FUD, I thought this was a good opportunity to examine the strength of the Postgres project.

Monty Widenius recently wrote the following in a blog comment:

"...This case is about ensuring that Oracle doesn't gain money and market share by killing an Open Source competitor. Today MySQL, tomorrow PostgreSQL. Yes, PostgreSQL can also be killed; To prove the case, think what would happen if someone managed to ensure that the top 20 core PostgreSQL developers could not develop PostgreSQL anymore or if each of these developers would fork their own PostgreSQL project."

Later, on his own blog, he raises the same theme with slightly more detail:

"Note that not even PostgreSQL is safe from this threat! For example, Oracle could buy some companies developing PostgreSQL and target the core developers. Without the core developers working actively on PostgreSQL, the PostgreSQL project will be weakened tremendously and it could even die as a result."

Is this a valid concern? It's easy enough to overlook it considering the Sturm und Drang in Monty's recent posts, but I think this is something worth seriously looking into. Specifically, is the Postgres project capable of withstanding a direct threat from a large company with deep pockets (e.g. Oracle)?

To get to the answer, let's run some numbers first. Monty mentions the "top 20" Postgres developers. The community contributors page lists 25 major developers as well as 7 core members, so 20 would indeed be a significant chunk of that page. To dig deeper, I looked at the Postgres project's CVS logs for 2009 and ran some scripts against them. The 9185 commits were spread across 16 different committers, and about 16 other people were mentioned in the commit messages as having contributed in some way (e.g. a patch from a non-committer). So again, Monty's number of 20 looks like a pretty good approximation.

However (and you knew there was a however), the catch is actually stopping 20 of those people from working on Postgres. There are basically two ways to do this: Oracle could buy out a company, or it could hire (buy out) a person. The first problem is that the Postgres community is very widely distributed. The 32 people on the community contributors page work for 24 different companies. Further, no one company holds sway: the median is one developer per company, and the high-water mark is a mere three developers. Both the total number of developers and their distribution across companies are much better than they were years ago.
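The "median" and "high-water mark" figures are simply statistics over the number of developers each company employs. The sketch below shows that calculation in Python; the developer-to-employer mapping is hypothetical toy data, not the real contributors page.

```python
from collections import Counter
from statistics import median


def employer_spread(employer_of):
    """Given developer -> employer, return (companies, median per company, max per company)."""
    per_company = Counter(employer_of.values())
    sizes = list(per_company.values())
    return len(per_company), median(sizes), max(sizes)


# Hypothetical toy data only; NOT taken from the community contributors page.
example = {
    "dev_a": "Company X",
    "dev_b": "Company X",
    "dev_c": "Company X",
    "dev_d": "Company Y",
    "dev_e": "Company Z",
}
print(employer_spread(example))  # (3, 1, 3)
```

A median of one means that most companies employ a single major developer, so losing any one company removes very little of the pool.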

The next fly in the ointment is that buying out a company is not always easy, however deep your pockets. Many companies on that list are privately held and will not sell. Even if you did buy out the company, there is no way to prevent the people working there from then moving to a different company. Finally, buying out some companies just isn't possible, even if you are Oracle, because there are some big names on the list of companies employing major Postgres developers: Google, Red Hat, Skype, and SRA. Then of course there is NTT, which is a really, really big company (larger than Oracle). NTT's Postgres developers are not always as visible as some of the English-speaking ones, but NTT employs a lot of people to work on Postgres (which is extremely popular in Japan).

The second way is hiring people directly. However, people cannot always be bought off. Sure, some of the developers might choose to leave if Oracle offered them $20 million, but not all of them (Larry, I might go for $19 million, call me :). Even if they did leave, the depth of the Postgres community should not be underestimated. For every "major developer" on that page, there are many others who read the lists and know the code well, but for one reason or another haven't made it onto that list. At a rough guess, I'd say there are a couple hundred people in the world who would be able to make commits to the Postgres source code. Would all of them be as fast or effective as some of the existing people? Perhaps not, but the point is that it would be nigh impossible to thin the pool fast enough to make a dent.

The project's mailing lists are as active as ever, to the point that I find it hard to keep up with the traffic, a problem I did not have a few years ago. The number of conferences, and the number of people attending each, is growing rapidly, and there is great demand for people with Postgres skills. The number of projects using Postgres, or offering it as an alternative database backend, is constantly growing. It's no longer difficult to find a hosting provider that offers Postgres in addition to MySQL. Most important of all, the project continues to regularly release stable new versions. Version 8.5 will probably be released in 2010.

In conclusion, the Postgres project is in great shape, thanks to the depth and breadth of the community (and of its developer subset). There is no danger of Postgres going the MySQL route: the PG developers are spread across a number of businesses, the code (and documentation!) is BSD-licensed, and no one firm holds sway in the project.

16 comments:

Stefan Kaltenbrunner said...

Very good summary of the state of the project, but I think you somehow undercounted the number of non-committer contributors. The commitfest database alone lists more than 70 different contributors (including committers) during the current development cycle.
If you also consider people providing smaller patches (say as part of a bug report) you will end up with an even larger number than that.

Greg Sabino Mullane said...

Stefan:

Yes, I need a better metric. I've come up with higher numbers in the past when scanning the cvs commit logs - maybe people are not attributing others as much as they used to? Good point about the commitfest as well, that's another good indication of the strength of the project. (Maybe someone with more free time than me can manually comb through the cvs logs and get a better number - I used a simple name search algorithm)

David Fetter said...

Greg,

I think buying off enough individual contributors could dent the project for a year or two, but not permanently. The cat's been out of the bag for way too long.

linuxpoet said...

I don't think it would even be possible to buy off all the contributors, considering the number of corporations that have a vested interest in the success of PostgreSQL but aren't even PostgreSQL companies.

One would just step in, in place of the person who was bought by Oracle. Also Oracle is known to be an extremely shrewd company. Attacking PostgreSQL in this manner doesn't make sense for them.

Anonymous said...

I am sick and tired of hearing FUD from MySQL zealots. They are literally saying that if Oracle buys MySQL, "the INTERNET will no longer be free."

They are not even aware of alternatives, they just go around screaming what Monty said.

I love PostgreSQL and have been using it for many years, since mysql was unable to handle my large datasets.

Thanks PostgreSQL!

Gary Chambers said...

I have always been surprised by the popularity of MySQL, and have always considered it a toy database for educational purposes, at best (regardless of the fact that it was used by some very large web sites). I can't recall a single person making even a weak argument for MySQL's superiority to PostgreSQL, much less a compelling one.

leifbk said...

In reply to Gary Chambers: It doesn't make sense to use MySQL for «educational purposes», as you have to unlearn a lot of MySQLisms later as you start working with real databases.

Robert Hodges said...

Nice post Greg. It seems to me that the real problem PostgreSQL faces is not getting taken over by evildoers but scaling processes to deal with an increasingly large community while maintaining software quality. As one example, it is now quite hard to figure out the actual status of key features in new releases like 8.5. Progress is still tracked manually by dedicated people on the project. This works for small projects (like the PG of yore) but not for big ones.

Robert Hodges said...

@Gary:
Argument #1 for MySQL: built-in logical replication. Projects like SLONY, Londiste, and Bucardo simply don't cut the mustard in comparison. Hot standby and log streaming will begin to alter the balance more in favor of PostgreSQL but they are still unreleased.

Jay said...

Great post, Greg!

Greg Sabino Mullane said...

Robert Hodges:

built-in logical replication. Projects like SLONY, Londiste, and Bucardo simply don't cut the mustard in comparison

Which parts do you see as lagging? I think those three are excellent for many common use cases, but I'm always interested to hear specific areas for improvement.

Anonymous said...

Greg,

I don't think your post makes clear that you're talking about committers and major code contributors to the core PostgreSQL code. We have, of course, several score minor code contributors, plus contributors to drivers and accessory projects, whom you're not including in that list of 32 or 70.

--Josh Berkus

Anonymous said...

http://www.dbdebunk.com/page/page/1934913.htm

Anonymous said...

@Robert Hodges

Check out www.commandprompt.com and the mammoth replicator. It's getting much better. They have a small team working on it, but it is actively being worked on. It seems to do a pretty decent job and I'm sure they'd welcome any help they can get. (and it has been open sourced)

mrx said...

I am a complete Postgres fan / user / evangelist. However, its replication is simply not close to MySQL's. I'm talking about Slony; I haven't investigated more exotic solutions.

1) Changes in database structure don't propagate to slaves (this is a complete showstopper for some).

2) It doesn't scale well with the number of slaves.

I cannot think of any area where PG is behind MySQL except replication, which is unfortunately bad.

Anonymous said...

Those who don't really need advanced features won't even think about spending $$ on a commercial license for MySQL, MS SQL, or Oracle, since Postgres could easily handle a large portion of their work.