End Point

News

Welcome to End Point's blog

Ongoing observations by End Point people.

10,000 databases on a PostgreSQL cluster

One of our clients wanted confirmation that PostgreSQL will have no problem handling 1000 or 2000 databases on a single database cluster. I remember testing some years ago, probably on Postgres 7.2 or 7.3, creating 1000 or so databases and finding that it worked fine. But that was a long time ago, the software has changed, and I thought I should make sure my old experiment results still stand.

There's a PostgreSQL FAQ question, "What is the maximum size for a row, a table, and a database?" but no mention of the maximum number (or more importantly, maximum practical number) of databases per cluster. So I threw together a test script to create 10,000 databases, each with between (randomly) 1-5 tables with 2 columns each (INTEGER and TEXT), each getting randomly between 1-10 inserts with random data up to 100 or so characters in the TEXT field.

I ran the test on PostgreSQL 8.1, the default that ships with Red Hat Enterprise Linux 5 (x86_64). The hardware was a desktop-class HP with an Intel Core 2 @ 1.86 GHz that wasn't always idle.

The short answer: Postgres 8.1 handles 10,000 databases just fine. \l in psql generates a long list of databases, of course, but returns quickly enough. Ad-hoc concurrency testing was fine. Running queries, inserts, etc. on a handpicked group of the various play databases worked fine, including while new databases were being created. During the creation process, the last database creates seemed about as fast the first. It took 2.75 hours to run.

This all is hardly a big surprise, but maybe by documenting it I'll save someone the bother of running your own test in the future.

Addendum: The actual limit on this platform is probably 31995 databases, because each database occupies a subdirectory in data/base/ and the ext3 filesystem has a limit of 31998 sub-directories per one directory, stemming from its limit of 32000 links per inode. The other 5 would be ., .., template0, template1, and postgres. (Thanks, Wikipedia.)

7 comments:

Anonymous said...

Two thoughts on the maximum number of databases limited by ext3:

1) Use another file system. ZFS, for example, support (at least theoretically) 2^48 entries in a single directory.

2) If ext3 must be used, create at least some of the directories using symbolic links.

James Parker

Magnus Hagander said...

One thing I've noticed that can bite you when you have too many databases is the stats subsystem (the stats file getting large and consuming your I/O - this happens mainly because of lots of relations though, but if you have 10k databases you likely have more than 1 or 2 tables in each)

Also, autovacuum that doesn't quite like the situation...

But it works :-)

Jon Jensen said...

James, using symlinks for some of the directories is a clever workaround to "outsource" or compartmentalize the link limitation to a different place or places.

Magnus, good point on overall number of relations. Sometime I'll have to try another test with more like 100 tables per database.

Michael said...

How about 20k tables in one database? We have that situation on our servers. And no, we can't easily change it ... ;-) To make it worse, the tables have about 25 columns on average, and there are 2-3 indexes per table.

Of course vacuuming such a database takes some time ...

wstrzalka said...

I've similiar installation. Instead of large number of databases there is large number of schemas - and of course relations.
As mentioned by Magnus the problem is with stats subsystem which takes a lot of CPU/IO all the time, even when the cluster has no user activity.

Vikram Anand said...

Does this still holds good with version 9.0?

Jon Jensen said...

I don't know of anything that's changed in the way PostgreSQL creates new databases, so this all should apply to PostgreSQL 9.0 as well.

ext4 doubles the number of subdirectories allowed, to 64K, or optionally, no limit at all:

http://en.wikipedia.org/wiki/Ext4#Features

I believe XFS has a very high (millions) limit on the number of subdirectories.

So either of those is going to be much better than ext3 if you're hosting a large number of databases on the same cluster.