Welcome to End Point’s blog

Ongoing observations by End Point people

PostgreSQL Conference - PGCon 2010 - Day One

The first day of talks for PGCon 2010 is now over, here's a recap of the parts that I attended.

On Wednesday, the developer's meeting took place. It was basically 20 of us gathered around a long conference table, with Dave Page keeping us to a strict schedule. While there were a few side conversations and contentious issues, overall we covered an amazing amount of things in a short period of time, and actually made action items out of almost all of them. My favorite *decision* we made was to finally move to git, something myself and others have been championing for years. The other most interesting parts for me were the discussion of what features we will try to focus on for 9.1 (it's an ambitious list, no doubt), and DDL triggers! It sounds like Jan Wieck has already given this a lot of thought, so I'm looking forward to working with him in implementing these triggers (or at least nagging him about it if he slows down). These triggers will be immensely useful to replication systems like Bucardo and Slony, which implement DDL replication in a very manual and unsatisfactory way. These triggers will not be like the current triggers, in that they will not be directly attached to system tables. Instead, they will be associated with certain DDL events, such that you could have a trigger on any CREATE events (or perhaps also allowing something finer grained such as a trigger on a CREATE TABLE event). Whenever it comes in, I'll make sure that Bucardo supports it, of course!

The first day of talks kicked off the the plenary by Gavin Roy called "Perspectives on NoSQL" (description and slides are available). Gavin actually took the time to *gasp* research the topic, and gave a quick rundown of some of the more popular "NoSQL" solutions, including CouchDB, MongoDB, Cassandra, Project Voldemort, Redis, and Tokyo Tyrant. He then benchmarked all of them against Postgres for various tasks - and did it against both "regular safe" Postgres and "running with scissors" fsync-off Postgres. The results? Postgres scales, very well, and more than holds it own against the NoSQL newcomers. MongoDB did surprisingly well: see the slides for the details. His slides also had the unfortunate portmanteau of "YeSQL", which only helps to empahsize how silly our "PostgreSQL" name is. :)

The next talk was Postgres (for non-Postgres people) by Greg Sabino Mullane (me!). Unlike previous years, my slides are already online. Yes, at first blush, it seems a strange talk to give at a conference like this, but we always have a good number of people from other database systems that are considering Postgres, are in the process of migrating to Postgres, or are just new to Postgres. The talk was in three parts: the first was about the mechanics of migrating your application to Postgres: the data types that Postgres uses, how we implement indexes, the best way to migrate your data, and many other things, with an eye towards common migration problems (especially when coming from MySQL). The second part of the talk discussed some of the quirks of Postgres people coming from DB2, Oracle, etc. should be aware of. Some things discussed: how Postgres does MVCC and need for vacuum, our really smart planner and lack of hints, the automatic (and against the spec) lowercasing, and our concept of schemas. I also touched on what I see as some of our drawbacks: tuned for a toaster, no true in place upgrade, the unpronounceable name, the lack of marketing. and what some of our perceived-but-not-real drawbacks are: lack of replication, poor speed. What would a list of drawbacks be without a list of strengths?: transactional DDL, very friendly and helpful community, PostGIS, authentication options, awesome query planner, the ability to create your own custom database objects, and our distributed nature that ensures the project cannot be bought out or destroyed. The last part of the talk went over the Postgres project itself: the community, the developers, the philosophy, and how it all fits together. I ran out of time so did not get to tell my "longest patch process ever" story for \dfS (six years!) but I don't think I missed anything important and gave time for some questions.

The next talk was Hypothetical Indexes towards self-tuning in PostgreSQL by Sergio Lifschitz. In the words of Sergio:

Hypothetical indexes are simulated index structures created solely in the database catalog. This type of index has no physical extension and, therefore, cannot be used to answer actual queries. The main benefit is to provide a means for simulating how query execution plans would change if the hypothetical indexes were actually created in the database. This feature is quite useful for database tuners and DBAs.

It was a very interesting talk. Robert Haas asked him to put it in the PostgreSQL license so we can easily put it into the project as needed. Sergio promised to make the change immediately after the talk!

After lunch, the next talk was pg_statsinfo - More useful statistics information for DBAs by Tatsuhito Kasahara. This talk was a little hard to follow along, but had some interesting ideas about monitoring Postgres, a lot of which overlapped with some of my projects such as tail_n_mail and check_postgres.

The next talk was Forensic Analysis of Corrupted Databases by Greg Stark. This was a neat little talk; many of the error messages he displayed were all too familiar to me. It was nice overview of how to track down the exact location of a problem in a corrupted database, and some strategies for fixing it, including the old "using dd to write things from /dev/zero directly into your Postgres files" trick. There was even a discussion about the possibility of zeroing out specific parts of a page header, with the consensus that it would not work as one would hope.

After a quick hacky sack break with Robert Treat and some Canadian locals, I went to the final real talk of the day: The PostgreSQL Query Planner by Robert Haas. I had seen this talk recently, but wanted to see it again as I missed some of the beginning of the talk when I saw it at Pg East 2010 in Philly. Robert gave a good talk, and was very good at repeating the audience's questions. I didn't learn all that much, but it was a very good overview of the planner, including some of the new planner tricks (such as join removal) in 9.0 and 9.1.

After that, the lightning talks started. I really like lightning talks, and thankfully they weren't held on the last day of the conference this time (a common mistake). The MC was Selena Deckelmann, who did a great job of making sure all the slides were gathered up beforehand, and strictly enforced the five minute time limit. The list of slides is on the Postgres wiki. I talked on my latest favorite project, tail_n_mail - the slides are available on the wiki. I didn't make it through all my slides, so if you were at the talks, check out the PDF for the final two that were not shown. There seemed to be good interest in the project, and I had several people tell me afterwards they would try it out.

The night ended with the EnterpriseDB sponsored party. I spoke to a lot of people there, about replication, PITR scripts, log monitoring, the problem with a large number of inherited objects, and many other topics. Note to EDB: I don't think that venue is going to scale, as the conference gets bigger each year! The total number of people at the conference this year was 184, a new record.

A very good first day: I learned a lot, met new people, saw old friends, and hopefully sold Postgres to some non-Postgres people :). I also managed to git push some changes to tail_n_mail, check_postgres, and Bucardo. It's hard to say no to feature requests when someone asks you in person. :)

Learn more about End Point's Postgres Support, Development, and Consulting.


Ethan Rowe said...


Thanks for the write-up. It sounds pretty interesting.

The comparison of NoSQL and Postgres is nice, but some of the conclusions strike me as a little cavalier. For instance: what happens in the Postgres scenario when a backend server dies? In Cassandra, Voldemort, Riak, HBase, etc., the data is inherently redundant, there's no such thing as standby or failover, and operations continue as normal.

Additionally, equating sharding as a means of horizontal scalability with the kind of distribution you get with the Dynamo-inspired architectures (again, like Cassandra) oversimplifies things immensely. Sharding does not give you availability, so you're stuck propping up each shard. Additionally, you have to manage the distribution of the data somehow, which is a pretty significant annoyance. It's not that the technique is bad or invalid, but it's simply not equivalent. Furthermore, there's a lot more to the CAP balancing act in the Dynamo family of systems than is made clear here; the balancing of CAP is configurable by use case in those systems, and is not as simple as "no consistency" (in contrast to the slide to the contrary).

I don't have a particular horse in the race here, mind you.

Jon Jensen said...

Thanks for the detailed report, Greg. I feel closer to having actually been there than any other Postgres conference I've read about, which is nice since I really would've liked to attend this one. :)

Michael said...

Greg: Your Postgres (for non-PG) People talk sounds really interesting to us: we're two weeks into a project to migrate our Oracle-based applications to Postgres. We have 25 years Oracle experience, but 14 days Postgres. We're mired in all the details, looking out for the gotchas, and trying to do things right the first (or the second) time. Is there a recording of your talk? The slides are good, but really just hit the bullet points. Thanks!

Greg Sabino Mullane said...

Ethan: You really should have a long talk with Gavin Roy at some point. :)

Jon: I expect to see you at PgCon 2011!

Michael: Congratulations on the upgrade. :) All the talks were taped, but I am not sure when they will be available. You can join the PgCon announce mailing list; I assume that updates will be posted there.

If it wasn't clear from the slides, my talk is heavy on MySQL->Postgres migration (versus databaseX->Postgres). The good news is that one of the major reasons for this is that Oracle is much closer to Postgres than MySQL is, thus an Oracle migration generally has less gotchas than a MySQL one.