OFFSET 0, FTW

A query I worked with the other day gave me a nice example of a useful PostgreSQL query planner trick. This query originally selected a few fields from a set of inner-joined tables, sorting by one particular field in descending order and limiting the results, like this:

SELECT <some stuff> FROM table_a INNER JOIN table_b ON (...)
INNER JOIN table_c ON (...) WHERE table_a.field1 = 'value'
ORDER BY table_a.field2 DESC LIMIT 20

The resulting query plan involved a bunch of index scans on the various tables, joined with nested loops, all driven by a backward index scan of an index on the table_a.field2 column, looking for rows that matched the condition in the WHERE clause. PostgreSQL likes to choose backward index scans when there's a LIMIT clause and it needs results sorted in reverse order, because although backward index scans can be fairly slow, they're easy to interrupt once enough rows have been found to satisfy the LIMIT. In this case, it figured it could search backward through the index on table_a.field2 and quickly find 20 rows where table_a.field1 = 'value' is true. The problem was that it didn't find enough rows as quickly as it thought it would.
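
For that plan shape to be available at all, there needs to be an index ordered by field2. A minimal sketch, assuming the table and column names above (the index name is hypothetical; any ordinary b-tree can be scanned in either direction, so no special descending index is required):

-- hypothetical index; a b-tree on field2 can be walked backward
-- to yield rows in descending field2 order
CREATE INDEX table_a_field2_idx ON table_a (field2)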

One way of dealing with this is to improve your statistics, which are what PostgreSQL uses to estimate how long the backward index scan will take in the first place. But sometimes that method still doesn't pan out, and it takes a lot of experimentation to be sure it works.
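
For reference, a minimal sketch of the statistics approach, assuming the names above -- the target of 500 is an arbitrary example, and a fresh ANALYZE is needed for the new target to take effect:

-- sample field1 more thoroughly so the row-count estimate improves
ALTER TABLE table_a ALTER COLUMN field1 SET STATISTICS 500;
ANALYZE table_a;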

That level of experimenting didn't seem appropriate in this case, so I used another trick. I guessed that if I could get PostgreSQL to first pull out all the rows matching the WHERE clause, it could join them to the other tables involved, do a separate sorting step, and come out faster than the plan it was currently using. Step one is to separate out the part that filters table_a:

SELECT <some stuff> FROM
(SELECT * FROM table_a WHERE field1 = 'value') a
INNER JOIN table_b ON (...) INNER JOIN table_c ON (...)
ORDER BY a.field2 DESC LIMIT 20

The problem is that this doesn't change the query plan at all. PostgreSQL tries to "flatten" nested subqueries -- that is, it rearranges join order and query structure to avoid executing subqueries as separate steps. To convince it not to flatten the new subquery, I added "OFFSET 0" to it. This new query gives me the plan I want:

SELECT <some stuff> FROM
(SELECT * FROM table_a WHERE field1 = 'value' OFFSET 0) a
INNER JOIN table_b ON (...) INNER JOIN table_c ON (...)
ORDER BY a.field2 DESC LIMIT 20

This selects all rows from table_a where field1 = 'value' and treats them as a distinct relation for the rest of the query. The result was a separate sorting step, and a query that ran much faster than before.
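
To confirm the change, you can run EXPLAIN (or EXPLAIN ANALYZE) on both forms; with the OFFSET 0 version, the plan should typically show a Subquery Scan feeding a separate Sort node, rather than a backward index scan driving the joins:

EXPLAIN ANALYZE
SELECT <some stuff> FROM
(SELECT * FROM table_a WHERE field1 = 'value' OFFSET 0) a
INNER JOIN table_b ON (...) INNER JOIN table_c ON (...)
ORDER BY a.field2 DESC LIMIT 20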

CAVEAT: The query planner is pretty much always smarter than whoever is sending it queries. This trick just happened to work here, but it can be a really bad idea in some cases. It tells PostgreSQL to pull all matching rows out of the table and keep them all in memory (or worse, in temporary disk files), and it prevents the rest of the query from making use of any indexes on the original table. If there were lots of rows matching the condition, this would be Very Bad. If one day my table changes and suddenly has lots of rows matching that condition, it will be Very Bad. It's because of potential problems like this that PostgreSQL doesn't support planner hints -- such things are a potent footgun. Use with great care.
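
One way to hedge against that risk is to check how many rows the fenced-off subquery will actually produce, both before adopting the trick and periodically afterward:

-- if this count is large, the OFFSET 0 fence forces all of those rows
-- to be fetched and held before the joins and the sort
SELECT count(*) FROM table_a WHERE field1 = 'value'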

4 comments:

Neil said...

Actually, I believe the "LIMIT 20" means that even if PG can't flatten the subquery, it still doesn't need to materialize the entire subquery result -- it need only keep the top 20 results sorted by a.field2, which is what Greg's "Top k" ORDER BY patch for 8.3 does.

Joshua Tolley said...

If it were to push the limit into the subquery, it would end up with incorrect results, because the query uses inner joins. The top N rows in table_a aren't necessarily the same as the top N rows that survive a series of inner joins with some other tables. Perhaps with a left or full join, though, where it can guarantee that every row from the subquery will appear in the result set at least once, it could move the limit into the subquery and avoid materializing the entire result set. I just changed the join types on the query in question, and it apparently didn't push the limit into the subquery. Explicitly specifying a limit in the subquery -- which, again, seems valid only in the case of left or full joins -- gives another dramatic speed improvement. Thanks for bringing it up.
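
A sketch of that limited-subquery variant, reusing the placeholder names from the post and assuming left joins that preserve every subquery row (the inner LIMIT should also prevent flattening on its own):

SELECT <some stuff> FROM
(SELECT * FROM table_a WHERE field1 = 'value'
ORDER BY field2 DESC LIMIT 20) a
LEFT JOIN table_b ON (...) LEFT JOIN table_c ON (...)
ORDER BY a.field2 DESC LIMIT 20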

Neil said...

I was saying something slightly different: simply that there is no need to materialize the entire subquery result set. For example, PG can just compute the subquery one tuple at a time, and join that tuple against the other two tables in the parent query; and then keep the resulting join tuple if it is among the top 20 results seen so far, ordered by a.field2. The planner might choose to materialize the subquery result anyway (e.g. depending on the join method used for table_b and table_c), but it's not necessary.

As for pushing limits down to subqueries, I don't think PG does that automatically right now. But it's an interesting idea -- there's a bunch of work in the literature on optimizing "top K" queries that is related to this.

Joshua Tolley said...

Ah, understood. I'd figured it would have to completely finish dealing with the subquery before feeding the results into the joins with the other tables. That does make sense -- thanks.