End Point

News

Welcome to End Point's blog

Ongoing observations by End Point people.

Announcing Release of PostgreSQL System Impact (PGSI) Log Analyzer

The PostgreSQL System Impact (PGSI) log analyzer is now available at http://bucardo.org/wiki/Pgsi.

System Impact (SI) is a measure of the overall load a given query imposes on a server. It is expressed as a percentage of a query's average duration over the its average interval between successive calls.

Queries are collected into canonical form with respect to literals and bind params; further, IN lists of varying cardinality are collapsed. Thus, queries that differ only in argument composition will be collected together in the evaluation. However, logically equivalent queries that differ in any other manner of structure (say two comparisons between AND that are transposed) will be seen as distinct.

The goal of SI is to identify those queries most likely to cause performance degradation on the database during heaviest traffic periods. Focusing exclusively on the least efficient queries can hide relatively fast-running queries that saturate the system more because they are called far more frequently. By contrast, focusing only on the most-frequently called queries will tend to emphasize small, highly optimized queries at the expense of slightly less popular queries that spend much more of their time between successive calls in an active state. These are often smaller queries that have failed to be optimized and punish a system severely under heavy load.

PGSI requires full PostgreSQL logging through syslog with a prescribed format. Specifically, log_statement must be 'all' and log_duration must be 'on'. Given a continuous log interval of any duration, PGSI will calculate reports in wiki-ready format with the following data over that interval:

  • First line defines suggested wiki page name for the given report
  • Log interval over which the report applies
  • SI, sorted from worst to best
  • Average duration of execution for the canonical query
  • Total count of times canonical query was executed
  • Average interval between successive executions
  • Standard deviation of the duration
  • Display of the canonical query
  • List of log entries for best- and worst-duration instances of the canonical query (only if report was generated using the --offenders option).

PGSI can be downloaded in tar.gz format or can be accessed from Git, its version-control system. To obtain it from git, run:

git clone http://bucardo.org/pgsi.git/

Contributions are welcome. Send patches (unified output format, please) to mark@endpoint.com.

7 comments:

Ethan Rowe said...

Yay!

This is a terrific tool. The balance of query runtime versus execution frequency is something that can easily elude the engineer when trying to identify low-hanging optimization fruit. While the actual system impact number does not (to my knowledge) have any kind of absolute meaning, the relative values of the queries analyzed makes the offenders stick out clearly.

Thanks for getting it out there, Mark.

Mark Johnson said...

While the actual system impact number does not (to my knowledge) have any kind of absolute meaning, the relative values of the queries analyzed makes the offenders stick out clearly.

It does have a specific meaning. SI indicates the number of instances of the given query that, on average, are running over the interval at any given moment. The simplest example is a SI of 100. This means that exactly 1 instance of this query is running over the interval at all times, on average. SI=200 -> 2 instances at all times, on average.

Thinking of it as a percentage tends to be more useful, I think, for other values. That is, SI of 80 means 0.8 instance of the query running at all times (which never would actually be true since you can't have non-integer instances of a query running), but you can think of it instead as 1 instance of the query running 80% of the time over the interval.

Devrim Gündüz said...

Are there any other settings that we need to do in postgresql.conf other than setting log_duration, log_statement and log_destination?

Also, anything else we need to do on (r)syslog.conf?

Mark Johnson said...

Are there any other settings that we need to do in postgresql.conf other than setting log_duration, log_statement and log_destination?

Also, anything else we need to do on (r)syslog.conf?


The code does require a specific format for the log entries, which is specified in the docs.

%Y-%m-%dT%H:%M:%S(%:z)? server postgres[pid]:

Denis said...

Any chance you could post the relevant logging configuration bits from a server that does output the correct format? I have tried PGSI on a couple of servers now and none matched properly..

This does sound like a terriffic tool!

Greg Sabino Mullane said...

This was designed for use with [r]syslog, and the log files look like this (all one line):

2009-02-24T08:09:29-07:00 hostname postgres[11902]: [130-1] LOG: statement: DEALLOCATE dbdpg_13

As far as I know, there is no special syslog.conf or rsyslog.conf setting as to the output format.

The test suite that comes with it has a log_line_prefix that is used to emulate syslog's output, as doing unit tests with syslog is a whole lot tricker than just using a log file you can control.

Ezekiel said...

In at least one place that we are using PGSI, we have commented out one of rsyslog's default options:

"RSYSLOG_TraditionalFileFormat"

This option makes rsyslog use "traditional" timestamps. We wanted the more verbose timestamp that rsyslog uses when it is not told otherwise. This may be your issue, Denis.