News

Welcome to End Point’s blog

Ongoing observations by End Point people

Enforcing Transaction Compartments with Foreign Keys and SECURITY DEFINER

In support of End Point's evolving offering for multi-master database replication, from the precursor to Bucardo through several versions of Bucardo itself, our code solutions depended on the ability to suppress the actions of triggers and rules through direct manipulation of the pg_class table. Most PostgreSQL database developers are probably familiar with the construct we used from the DDL scripts generated by pg_dump at one time.

Disable triggers and rules on table "public"."foo":

UPDATE pg_class SET
        relhasrules = false,
        reltriggers = 0
    FROM pg_namespace
    WHERE pg_namespace.oid = pg_class.relnamespace
        AND pg_namespace.nspname = 'public'
        AND pg_class.relname = 'foo';

Re-enable all triggers and rules on "public"."foo" when finished with DML that must not fire triggers and rules:

UPDATE pg_class SET
        reltriggers = (
            SELECT COUNT(*) FROM pg_trigger
            WHERE pg_class.oid = pg_trigger.tgrelid
        ),
        relhasrules = (
            SELECT COUNT(*) > 0
            FROM pg_rules
            WHERE schemaname = 'public'
                AND tablename = 'foo'
        )
    FROM pg_namespace
    WHERE pg_namespace.oid = pg_class.relnamespace
        AND pg_namespace.nspname = 'public'
        AND pg_class.relname = 'foo';

In practice, the simple usage described for trigger and rule suppression worked reasonably well. However, it didn't always work well. In particular, there is a somewhat concerning state that exists between the two previously described events. The actions of disabling triggers/rules, then manipulating those relations affected, and then re-enforcing triggers/rules, must happen within the confines of a single transaction, and they must happen, period. The risk is that, at some point between the "open and shut" on pg_class, the transaction is committed and the "shut" never fires. If that happens, all database activity against the relations with disabled triggers and rules continues. I don't recall that we ever isolated the reasons why, on rare occasion, this happened; I only know that it did happen, and it was never welcome news.

In an effort to curb the worst aspect of this issue, I started with a simple question: how can I limit the transaction to a "safe compartment", thinking in terms of perl's Safe.pm? In this case, the "unsafe" action is "commit the transaction with triggers and rules disabled". But in reality, the unsafe list can be any conditions the developer needs to have exposed, but cannot make visible to the rest of the database.

An ancillary issue we faced, too, was the fact that any app code needing to suppress triggers and rules (beyond syncing, there were any number of DML requirements where it was undesirable for syncing to occur, and the pg_class manipulations were quite common) had to operate as the super user. While we had not had an incident where the postgres user for mundane operations had burned down the database, there was certainly concern about that potential.

The resolution I settled on was to construct a pair of functions that made use of the following features:

  • PostgreSQL's SECURITY DEFINER function attribute
  • Deferred foreign keys
  • The ON COMMIT DROP option for CREATE TEMP TABLE

The first function, safe_disable_trigrules(schema_name text, table_name text), is called after beginning a transaction and makes the necessary modifications to pg_class on behalf of schema_name.table_name. After the work within the transaction is finished, the second function--safe_reenable_trigrules(schema_name text, table_name text)--is called before issuing the commit. It, of course, puts pg_class back to the proper state.

Under the hood, the two functions create a dependency that only each other can satisfy when used as a non super user. Before safe_disable_trigrules() will manipulate pg_class, it creates a temp table with a self-referencing, deferred foreign key. Then, based on the schema and table args, it will insert a record for the relation defined by the args that violates the FK. Once the transaction's work is finished, but before committing, safe_reenable_trigrules() is called for every relation that safe_disable_trigrules() was called against and it will delete out the offending record for that relation alone. If the two functions are used properly, by the time of commit, the temp table is empty, thus having no foreign key violations, at which point the transaction can be safely committed. In the process of ensuring the temp table has no foreign key violations, pg_class has been fully restored to its pre-transaction state.

How each of the identified features is used:

  • Creating the functions with the SECURITY DEFINER attribute, we have now opened an access point for non-privileged users specifically for the purpose of the proscribed interaction with pg_class--and nothing more.
  • The temp table is created and owned by user postgres. There is no chance of the non-privileged user manipulating this table directly, accidentally or otherwise. Thus, the only positive escape for the transaction is through the use of the reenabling function.
  • Deferring the foreign key on the temp table allows us to ensure the transaction is in an invalid state at all times while pg_class is in its vulnerable condition without aborting the transaction.
  • ON COMMIT DROP allows the function to clean up after itself without having to make an explicit decision on the right time to drop the table. It allows a single temp table to be utilized per transaction, regardless of how may different relations will be passed through the trigger and rule disabling process.
  • Before the temp table actually drops, its deferred foreign keys are evaluated. If any rows are left in the table, it means for at least one relation we failed to call the reenable function and the entire transction is aborted rather than risk committing pg_class in the disabled state.

Converting to this system of pg_class manipulation completely eliminated the instances of finding pg_class in a committed state with triggers and rules disabled for various relations. It also allowed us to convert a number database-dependent scripts and applications from using the postgres user down to the appropriate application users.

CREATE FUNCTION safe_disable_trigrules (
        schema_name TEXT,
        table_name TEXT
    )
RETURNS void
LANGUAGE plpgsql
STRICT
SECURITY DEFINER
AS $$
DECLARE

    text_table_pk TEXT NOT NULL := '';
    text_fk_table TEXT NOT NULL := '';
    text_cur_min_msg TEXT;

BEGIN

    -- Stop any malicious shenanigans by user overloading
    -- relations or operators in a different schema and
    -- manipulating search_path to use them.

    PERFORM
        pg_catalog.set_config(
            'search_path',
            'pg_catalog, '
            OPERATOR(pg_catalog.||)
            (SELECT
                pg_catalog.current_setting(
                    'search_path'
                )
            ),
            TRUE
        );


    -- Shared PK for table to hold FK in violated state.
    -- This naming convention must not change without also
    -- reflecting the convention in safe_reenable_trigrules()
    -- so that both can immutably create the same name given
    -- the same arguments.

    text_table_pk :=
        schema_name || '_' ||
        table_name || '_' ||
        TO_CHAR(
            NOW(),
            'DHH24MISSMS'
        );

    -- Allowing for the same relation to have triggers and rules
    -- disabled and reenabled multiple times within the same
    -- transaction. On subsequent calls, the temp table will
    -- already exist.

    SELECT setting
    INTO text_cur_min_msg
    FROM pg_settings
    WHERE name = 'client_min_messages';

    UPDATE pg_settings
    SET setting = 'error'
    WHERE name = 'client_min_messages'
        AND text_cur_min_msg IS DISTINCT FROM 'error';

    -- Attempt to create the temp table. If first function call for
    -- transaction, it succeeds; otherwise, it fails silently unless
    -- error is something other than re-creating extant table.

    BEGIN
        -- Temp table for this transaction, with same shared
        -- convention as the PK above.
        text_fk_table :=
            'trigrules_' ||
            TO_CHAR(
                NOW(),
                'DHH24MISSMS'
            );

        -- Use ON COMMIT DROP so PG will garbage collect
        -- all such temp tables created within the transaction.

        EXECUTE
            'CREATE TEMP TABLE ' ||
                quote_ident(text_fk_table) || ' (
                id TEXT PRIMARY KEY NOT NULL,
                fk_id TEXT NOT NULL
                    CONSTRAINT "Must Call safe_reenable_trigrules() Before Commit"
                    REFERENCES ' ||
                    quote_ident(text_fk_table) || '
                    DEFERRABLE
                    INITIALLY DEFERRED
            )
            ON COMMIT DROP';

    EXCEPTION
        WHEN DUPLICATE_TABLE THEN
            -- Ignore
    END;

    UPDATE pg_settings
    SET setting = text_cur_min_msg
    WHERE name = 'client_min_messages'
        AND text_cur_min_msg IS DISTINCT FROM 'error';

    -- Insert new record that violates FK. Allowing for
    -- the function to be gracefully recalled on the same
    -- relation between calls to re-enable triggers and rules.

    EXECUTE '
        INSERT INTO ' ||
        quote_ident(text_fk_table) || '
        SELECT ' ||
            quote_literal(text_table_pk) ||
            ', ' ||
            quote_literal(text_table_pk || 'X') || '
        WHERE NOT EXISTS (
            SELECT 1
            FROM ' ||
            quote_ident(text_fk_table) || '
            WHERE id = ' ||
            quote_literal(text_table_pk) || '
        )';

    -- Disable all rules and triggers on target relation
    UPDATE pg_class SET
        relhasrules = false,
        reltriggers = 0
    FROM pg_namespace
    WHERE pg_namespace.oid = pg_class.relnamespace
        AND pg_namespace.nspname = schema_name
        AND pg_class.relname = table_name;

    -- Abort transaction if relation doesn't exist
    IF NOT FOUND THEN

        RAISE EXCEPTION
            'Table %.% does not exist',
            schema_name,
            table_name;

    END IF;

    -- reset search_path for users legitimately overloading
    -- operators or relations.

    PERFORM
        set_config(
            'search_path',
            (SELECT
                SUBSTRING(
                    current_setting('search_path')
                    FROM
                    '^pg_catalog, (.*)'
                )
            ),
            TRUE
        );
END;
$$
;


CREATE FUNCTION safe_reenable_trigrules (
        schema_name TEXT,
        table_name TEXT
    )
RETURNS void
LANGUAGE plpgsql
STRICT
SECURITY DEFINER
AS $$
DECLARE

    text_fk_table TEXT NOT NULL := '';
    text_table_pk TEXT NOT NULL := '';
    int_num_del INTEGER;

BEGIN

    -- Stop any malicious shenanigans by user overloading
    -- relations or operators in a different schema and
    -- manipulating search_path to use them.

    PERFORM
        pg_catalog.set_config(
            'search_path',
            'pg_catalog, '
            OPERATOR(pg_catalog.||)
            (SELECT
                pg_catalog.current_setting(
                    'search_path'
                )
            ),
            TRUE
        );

    -- Re-enable rules and triggers on target
    UPDATE pg_class SET
        reltriggers = (
            SELECT COUNT(*) FROM pg_trigger
            WHERE pg_class.oid = pg_trigger.tgrelid
        ),
        relhasrules = (
            SELECT COUNT(*) > 0
            FROM pg_rules
            WHERE schemaname = schema_name
                AND tablename = table_name
        )
    FROM pg_namespace
    WHERE pg_namespace.oid = pg_class.relnamespace
        AND pg_namespace.nspname = schema_name
        AND pg_class.relname = table_name;

    -- Shared PK for table to hold FK in violated state.
    -- This naming convention must not change without also
    -- reflecting the convention in safe_disable_trigrules()
    -- so that both can immutably create the same name given
    -- the same arguments.

    text_table_pk :=
        schema_name || '_' ||
        table_name || '_' ||
        TO_CHAR(
            NOW(),
            'DHH24MISSMS'
        );

    -- Temp table for this transaction, with same shared convention
    -- as the PK above.
    text_fk_table :=
        'trigrules_' ||
        TO_CHAR(
            NOW(),
            'DHH24MISSMS'
        );

    -- Remove pertinent row so FK is no longer in violated state
    EXECUTE
        'DELETE FROM ' ||
        quote_ident(text_fk_table) || '
        WHERE id = ' ||
        quote_literal(text_table_pk);

    GET DIAGNOSTICS int_num_del = ROW_COUNT;

    IF (int_num_del > 0) IS NOT TRUE THEN
        RAISE EXCEPTION
            'No entry for %.% set by safe_disable_trigrules()',
            schema_name,
            table_name;
    END IF;

    -- reset search_path for users legitimately overloading
    -- operators or relations.

    PERFORM
        set_config(
            'search_path',
            (SELECT
                SUBSTRING(
                    current_setting('search_path')
                    FROM
                    '^pg_catalog, (.*)'
                )
            ),
            TRUE
        );

END;
$$
;

6 comments:

Jon Jensen said...

Nice write-up, Mark.

You only mention direct pg_class manipulation in this article, which has known but rare non-MVCC behavior, as noted in this Bucardo source code that uses pg_class manipulation only on older Postgres versions where there was no better way.

From the look of things in your PL/pgSQL functions, the newer session_replication_role GUC designed for this use would work fine too. Have you used that in your functions on newer versions of Postgres?

Also, way to sneak the word "shenanigans" into your code comments as well. :)

Greg Sabino Mullane said...

Mark can correct me if I am wrong, but I think the main reason we did not decide to use session_replication_role once Postgres introduced it is because it is "all or nothing", and the implementation used relies heavily on being able to turn off triggers and rules on a specific table only. (The other possible reason we never switched to SRR is "if it ain't broke, don't fix it' :)

Mark Johnson said...

Jon,

There are two main reasons we've left the function pair operating directly against pg_class.

One is that our primary client using them has certain features within Bucardo's customcode that were not functioning properly while in session_replication_role.

The other is as Greg mentions. We use these functions within application code where the desire to suppress triggers and rules is surgical--some tables have it, and others don't, within the same transaction.

It would be fairly simple to produce non-arg versions of the functions that could leverage session_replication_role to make a transaction-wide disabled state. That would in fact be quite handy if that's what you really wanted and you were manipulating a large number of tables, rather than having to call the functions once each for the entire list of tables.

Jon Jensen said...

Regarding "surgical" disabling of triggers:

I think the current way to do that without non-MVCC-safe system table munging is with ALTER TABLE ... DISABLE TRIGGER. Unless I'm missing some fundamental difference between that an pg_class munging.

Mark Johnson said...

The main difference is the access exclusive lock required for ALTER TABLE vs. pg_class manipulation.

Greg Sabino Mullane said...

To expand, the exclusive lock was a total show-stopper (quite literally) on the production system. Messing with pg_class has disadvantages as well, but well worth the tradeoff. :)