End Point

News

Welcome to End Point's blog

Ongoing observations by End Point people.

Changing Postgres pg_dump warnings into errors with sed

Turning off warnings when loading a PostgreSQL data dump can be a trickier task than it may first appear. One of the goals of a program I was working on was to automatically load the schema of a production database into a development system. The program needed to stay as quiet as possible - the users running it did not need to be distracted by annoying messages generated by Postgres as the schema was loaded.

The problem occurred when a text-based dump of the database (created via pg_dump with the --schema-only flag) was loaded into an existing database with the help of the the psql program. To load the file "test.sql" into the database wilber, one usually does:

$ psql wilber -U postgres -f test.sql

There are two general classes of "noise" that are generated by such an action. I'll show how to make both kinds quiet.

The first problem is that psql by default is very verbose and every single command gives an echoing confirmation of what just occurred. Thus, a typical schema load outputs a lot of things like this:

$ psql wilber -U postgres -f test.sql
SET
SET
SET
SET
SET
CREATE EXTENSION
COMMENT
SET
SET
SET
CREATE TABLE
ALTER TABLE
ALTER TABLE
CREATE SEQUENCE

These types of messages are easy to turn off. Simply add the --quiet flag when calling psql, usually abbreviated to just -q, like so:

$ psql wilber -q -U postgres -f test.sql

The other type of unwanted output that can appear are the various messages from Postgres itself. These have different levels of severity: which ones are shown to you are determined by the client_min_messages setting. From least important to most important, the levels are:

  • DEBUG
  • LOG
  • NOTICE
  • WARNING
  • ERROR
  • FATAL
  • PANIC

The output I was seeing looked like this:

psql:test.sql:22716: WARNING:  => is deprecated as an operator name
DETAIL:  This name may be disallowed altogether in future versions of PostgreSQL.

Not something I wanted users to see every time they ran the script! The solution was to set client_min_messages to something higher than WARNING. In this case, ERROR is a good choice. All warning-level notices will be suppressed. The SQL would look like this:

SET client_min_messages = ERROR;

However, because we are loading a file, there is no easy way to enter in SQL commands before the file is loaded. Luckily, there is a very-poorly-documented trick that can apply some settings before a program (such as psql) is run. Just set the PGOPTIONS environment variable. It allows you to pass certain options to psql and other programs. For the case at hand, you would do this:

$ PGOPTIONS='--client-min-messages=error' psql wilber -q -U postgres -f test.sql

This will ensure that client_min_messages is set to ERROR before psql starts to load the file. If you are using a Perl script (as I was), you can set the environment variable before making a system call:

$ENV{PGOPTIONS} = '--client-min-messages=error';
$COM = 'psql wilber -q -U postgres -f test.sql';
system $COM;

However, if your file is from a pg_dump, you will find that this trick does not work! Why? When pg_dump creates a file, it adds some SET commands to the top of it. One of those is client_min_messages, which will clobber whatever you have set it to. D'oh! Here's what pg_dump currently sticks at the top of its output:

$ pg_dump --schema-only | head -20 | grep -n '^SET'
5:SET statement_timeout = 0;
6:SET client_encoding = 'UTF8';
7:SET standard_conforming_strings = on;
8:SET check_function_bodies = false;
9:SET client_min_messages = warning;

What to do? The schema file was very large, so editing it was not an option - and certainly not for this automated task. The answer was to modify the large schema text file before loading it. While perl is my goto tool for most things, this looked the perfect job for the sed program. So, in my script, I simply added this right before psql is invoked:

  $COM = "sed --in-place '1,/client_min_messages/ {s/warning/ERROR/}' test.sql";
  system $COM;

The short explanation is that it replaces the 'warning' on line 9 of the file with the word 'error'. This prevented any of the annoying warnings from showing up, and made the script nice and quietly happy again.

The long explanation about how it works is the rest of this post. :)

The sed program is a "stream editor" designed to quickly transform text. The man page is not very useful: I recommend learning more about sed via an online tutorial. This one is my favorite, not least because of the "modem noise" joke (kids, look that one up).

First we instruct sed, via the --in-place flag to make the changes directly to the file itself, rather than sending output to stdout. This is usually abbreviated to simply -i, but I consider the longer form more polite, as it's a little less obtuse to someone not familiar with the flags. Consider it a cheap way to comment your code.

After that, in single quotes, is the actual instructions we pass to sed. Finally, we give it the file to perform the work on. Again, the instructions to sed were:

'1,/client_min_messages/ {s/warning/ERROR/}'

This has two sections: what to do, and where to do it. By default, sed operates on every line of the file. As we only want sed to change the first occurrence of client_min_messages, we tell it to limit its actions:

1,/client_min_messages/

You can instruct sed to operate on a range of lines by using the X,Y format, aka start,stop. A lone number represents a line number in the file; in this case we want to start at the first line of the file. You can also provide a regular expression. In this case, we want to stop processing when we find a line that matches the regular expression /client_min_messages/. Note that these limits are inclusive, so that we still process the matched line.

The next section tells sed what to do:

' {s/warning/ERROR/}'

Note that both the extra space and the braces are not needed, but are there simply to make things easier to read. You can also change the delimiter, which can arguably help as well; here is an equivalent version:

'1,/client_min_messages/ {s:warning:ERROR:}'

We are telling sed to make a simple substitution of the word warning with the word ERROR. I used uppercase here to make it more obvious to anyone viewing the file that a change had been made (Postgres does not care what the case is).

Normally, this might be considered a fairly sloppy regex, as it doesn't match very strictly - any lines up to and including the client_min_messages will have the word warning replaced. However, because we can be confident that the pg_dump output is going to remain very standard, we can let it slide. A more proper solution might be:

'1,/client_min_messages/ {s:client_min_messages = warning:client_min_messages = ERROR:}'

Why didn't I simply replace the regex and be done, like this?:

's/client_min_messages = warning/client_min_messages = ERROR/'

Two reasons. First, by setting a range, we ensure that we only make the change at the top of the file, and not anywhere else where it may be set (not by pg_dump itself, but perhaps as the text inside a function. You never know). This is also more efficient, as sed doesn't need to scan the lines for a potential replacement after it hits the first occurrence at line 9.

The second reason is that we can now safely run this against the file more than once, and not worry about a bad replacement or about losing efficiency. If we search for the 'warning' version of the regex, then the second time through we may end up scanning the entire file (and any substitutions we do make will be in the wrong place).

To recap, the winning lines in Perl were:

$COM = "sed --in-place '1,/client_min_messages/ {s/warning/error/}' test.sql";
system $COM;

Once we have this, we no longer need the PGOPTIONS setting. Which was a little dangerous as a global $ENV inside the perl script anyways, as the setting persists even after the first system call! So sed has saved the day, or at least made the users of this program happier!

1 comment:

Jon Jensen said...

Nice write-up, and good to see the cautious wisdom in only substituting up until that first match.

For the record, using the long --in-place had the opposite effect for me of what you intended: I am well familiar with -i but had never seen --in-place and had to look it up to see if it was some new, fancier upgrade to -i.

It turns out to be specific to GNU sed, and isn't in BSD and Mac OS X sed, nor in perl which has a -i option modeled on sed's. So I would argue that --in-place is more confusing than -i.

cp -f vs. cp --force or cat -v vs. cat --show-nonprinting, anyone? :)