I was recently caught up in a Twitter thread where someone suggested using a materialized view to calculate some value from another table. I suggested using a TRIGGER instead, since it happens in the same transaction. Someone then DM'd me saying this solved a problem he had been having at work with poor performance: they were updating another table via code, fetching all the data, then iterating to calculate the difference. Aside from that, it meant the table was out of sync whenever they got concurrent requests.

I then decided to list some of my favorite and useful PostgreSQL features. Might find them useful. Might not :) but here is a tiny list:

Make psql stop displaying null as an empty string

Please don't do this in production, lmao, so people don't start thinking silly data is actually stored.

I really hope you hate nulls as much as I do. But sometimes we just need to keep them around sadly enough. In development, I like to replace null with something else:

\pset null '🤮'
\pset null 'billion dollar mistake'
/**
Now when you run queries via the psql command line, this new value (whichever
of the two you set last) will be shown instead of an empty string
**/
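
As a quick illustration, the output then looks roughly like the sketch below (the row shown here is made up, against a table with a nullable deleted_at column):

SELECT full_name, deleted_at FROM users LIMIT 1;
/**
 full_name | deleted_at
-----------+------------
 Jane Doe  | 🤮
(1 row)
**/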

By the way, note how I used \pset this time. That just means I am running this via the psql console. I usually don't have access to psql in prod, just my local machine, so please don't do this in prod :))

The billion dollar mistake talk is a great watch, by the way: https://youtu.be/ybrQvs4x0Ps

Where column = true is pretty redundant

It is not uncommon for tables to have a column called is_active or similar that stores a boolean. When we want to query for this, most people write SELECT FROM table WHERE is_active = true.

This can be changed to:

/*
same thing as `is_active = true`
*/
SELECT FROM table WHERE is_active
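
The flip side works too, and since a boolean column can also be NULL, IS NOT TRUE is worth knowing about (using the same placeholder table as above):

/*
same thing as `is_active = false`
*/
SELECT FROM table WHERE NOT is_active

/*
matches rows where is_active is false OR NULL
*/
SELECT FROM table WHERE is_active IS NOT TRUE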

Trigger

This is probably one of my favorite features when I use PostgreSQL. Often you need to derive a value and store it in one table after a CRUD operation on another table.

A concrete example would be a SaaS system with credits for each user. This can be easily modelled as below:

CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

CREATE TABLE IF NOT EXISTS users(
    id uuid PRIMARY KEY DEFAULT uuid_generate_v4(),
    email VARCHAR (100) UNIQUE NOT NULL,
    full_name VARCHAR (100) NOT NULL,
    metadata jsonb NOT NULL DEFAULT '{}'::jsonb,

    created_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP,
    deleted_at TIMESTAMP WITH TIME ZONE
);

CREATE TABLE IF NOT EXISTS credits(
    id uuid PRIMARY KEY DEFAULT uuid_generate_v4(),
    user_id uuid NOT NULL REFERENCES users(id),
    /*
    the current amount of units left for this user
    */
    amount NUMERIC NOT NULL,

    created_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP,
    deleted_at TIMESTAMP WITH TIME ZONE
);

CREATE TABLE IF NOT EXISTS credit_usage_histories(
    id uuid PRIMARY KEY DEFAULT uuid_generate_v4(),
    counter SERIAL,
    reference VARCHAR(200) NOT NULL,
    description VARCHAR(200) NOT NULL DEFAULT 'Used credits',
    user_id uuid NOT NULL REFERENCES users(id),
    /**
    assuming some transactions can use more than 1 unit, so we store the exact amount here
    **/
    amount NUMERIC NOT NULL,
    metadata jsonb NOT NULL DEFAULT '{}'::jsonb,
    created_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP,
    /**
    this will be one of two types (0 and 1):
    0 means debit
    1 means credit

    This might make sense if you wanted to refund a user part of the credits they have used previously.

    This can also be useful for resetting a user's credit count at the start of their next billing cycle. User X has 1M units per month. You can create two records:
    - One that drains the current balance at 12:00 am on the first day of the month
    - A second that credits the monthly units based on the plan they are subscribed to.

    so it makes sense to have this here
    **/
    usage_type SMALLINT NOT NULL
);

We have created 3 tables:

  • users: the base level table after a user signs up
  • credits: this stores how many credits a user has
  • credit_usage_histories: this keeps track of every depletion of the user's credits and every time we credit them extra units.

Usually, you'd be tempted to add business logic in your code to calculate the user's credit balance and update the credits table every time a record is added to credit_usage_histories, but you can delegate that to the DB, which would be faster than your codebase anyway. This is exactly where PostgreSQL triggers come in. An example of how this would work can be seen below:

DROP TRIGGER IF EXISTS transaction_added on credit_usage_histories;

CREATE FUNCTION calculate_credit_balance()
RETURNS trigger AS $calculate_credit_balance$
BEGIN
    UPDATE credits
    SET amount =
        /* total credits */
        COALESCE((SELECT SUM(amount) FROM credit_usage_histories WHERE user_id = NEW.user_id AND usage_type = 1), 0) -
        /* minus total debits */
        COALESCE((SELECT SUM(amount) FROM credit_usage_histories WHERE user_id = NEW.user_id AND usage_type = 0), 0)
    WHERE user_id = NEW.user_id;

    RETURN NEW;
END;
$calculate_credit_balance$ LANGUAGE plpgsql;

CREATE TRIGGER transaction_added
  AFTER INSERT OR UPDATE
  ON credit_usage_histories
  FOR EACH ROW
  EXECUTE PROCEDURE calculate_credit_balance();

What we have done is create a function that does the following: compute total_credit_transactions - total_debit_transactions, then update the credits table with the result.

With this, you delegate the whole work to the DB to always keep your credits table in sync with the correct balance for a user at all times.

Ideally, at the end of the user's billing cycle, you will want to drain all their remaining credits by creating a debit credit_usage_histories transaction and adding a credit for the new billing cycle.
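
To see the trigger in action, an insert along these lines does it (a sketch: the email and reference values are made up, and it assumes a credits row already exists for the user, since the trigger only UPDATEs credits and never INSERTs one):

/**
debit 5 units from a user; the trigger then recalculates credits.amount
in the same transaction
**/
INSERT INTO credit_usage_histories (reference, user_id, amount, usage_type)
VALUES (
    'api_call_abc123',
    (SELECT id FROM users WHERE email = 'jane@example.com'),
    5,
    0
);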

Row level validation

There is a problem with what we have done above: the amount on credits can run into the negative if someone adds a debit transaction on credit_usage_histories that exceeds the total credits the user has. A simple way to fix this is to delegate that to the database too. You absolutely don't have to do it in your codebase.

This can be done easily as:

ALTER TABLE credits ADD CONSTRAINT credit_amount_check_key CHECK (amount >= 0);
ALTER TABLE credit_usage_histories ADD CONSTRAINT usage_history_amount_check_key CHECK (amount >= 0);

We have used another feature of Postgresql called CONSTRAINTS. This is essentially validation that can be done by the database on your behalf.

CHECK (amount >= 0) makes sure the amount column in the table will always be greater than or equal to 0. So if someone adds a debit unit that would push the balance of the user's credits below 0, the INSERT or UPDATE query will fail.
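
For example (the user lookup here is hypothetical), trying to set a negative balance directly now fails with something along these lines:

UPDATE credits SET amount = -5
WHERE user_id = (SELECT id FROM users WHERE email = 'jane@example.com');
/**
ERROR:  new row for relation "credits" violates check constraint "credit_amount_check_key"
**/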

Let's assume you were building yet another PoS store and you want every registered product to have a quantity of at least 10 at all times. You just need to tweak the same logic at the DB level, as sketched below.
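
Something like this would do it (the products table and quantity column are hypothetical names):

ALTER TABLE products ADD CONSTRAINT products_quantity_check_key CHECK (quantity >= 10);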

It makes sense to also validate this on the frontend or backend so more detailed errors can be shown to the user, but in cases where there is an admin backdoor, a bug in your code logic, or someone manually adds records to the DB (noooooo), this helps make sure your data stays consistent.

You can also validate almost anything using CHECKs, not just numbers. A nice thing I like doing is validating that user-facing references are formatted a certain way. Let's take Stripe as an example: all customer IDs are prefixed with cus_, subscriptions are prefixed with sub_. So a valid customer ID would be cus_jfbiu4664hjf, as an example. If we wanted to validate on the DB level that all references must match this format, we could use regular expressions in our CHECK logic:

ALTER TABLE customers ADD CONSTRAINT customers_reference_check CHECK (reference ~ '^cus_.+');
/**
OR a stricter version that also limits the allowed characters
**/
ALTER TABLE customers ADD CONSTRAINT customers_reference_check CHECK (reference ~ '^cus_[a-zA-Z0-9._]+$');

Quick CSV exports

For some reason, some other department at work might need some data in a CSV file to perform some of their duties. A quick way to do that, instead of downloading plugins or another GUI app, is to just use the built-in COPY command. Let us take our users table from the top:

COPY users TO '/Users/lanreadelowo/Downloads/users.csv' WITH DELIMITER ',' CSV HEADER

You can even be more creative by exporting only select records or the result of a much more complex query. Let's say I wanted to export only the users created in the last week, as an example:

COPY (SELECT * FROM users WHERE created_at > NOW() - INTERVAL '1 week') TO '/Users/lanreadelowo/Downloads/users.csv' WITH DELIMITER ',' CSV HEADER

This works for quick imports too. Just change TO to FROM.

Something you might want to note is that COPY reads and writes files on the database server itself, so the server process needs the necessary filesystem permissions (and writing to a server-side file generally requires superuser or the pg_write_server_files role).
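
If you are not on the database server itself, psql's \copy meta-command is the client-side equivalent: same syntax, but the file is read and written on your machine instead:

\copy (SELECT * FROM users WHERE created_at > NOW() - INTERVAL '1 week') TO 'users.csv' WITH DELIMITER ',' CSV HEADER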

Forcefully kill all running queries

We once had an issue at a previous place I worked where a dev decided to query a table that had almost 50 million rows. To make things worse, this table didn't have any index at all. It wasn't his fault at all: the DB DSN was hardcoded in the codebase and ignored the env variable the README mentioned. The data eng guy had just airdropped a folder with a .py and a README.md.

This was a legacy project we were brought in to help with. There is no way on earth anyone should be doing this in this modern age without clustering the table and putting an index on it.

Because of the way things were set up, the data guys would usually run queries directly on prod data, but only in periods with known minimal usage of the system/product - usually weekends. But a mid-level dev on our side didn't know the data was that large and just wanted to run the scripts he was given to check the output and understand what needed to be done, since we wanted to start warehousing that data for the data guys so they could run whatever they wanted at any time.

The script had been running for quite a while and started affecting prod and slowing things down. The script from the data guys didn't even properly handle SIGINT, SIGTERM or even SIGHUP. None of the numerous signal types that could have closed the database connection were handled.

The only way we had to bring things back to order was to cancel the long-running queries:

WITH pids AS (
  SELECT pid
  FROM pg_stat_activity
  WHERE usename = 'postgres' AND (now() - pg_stat_activity.query_start) > interval '1 hour'
)

/**
We did not want to cancel every query running in prod,
only the ones that had been running for over an hour
**/

SELECT pg_cancel_backend(pid)
FROM pids;
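
pg_cancel_backend only cancels the backend's current query. If a session still won't let go, pg_terminate_backend is the more forceful option and drops the connection entirely. A variant of the same query (be even more careful with this one):

/**
same filter as above, but this kills the whole connection
instead of just cancelling the running query
**/
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE usename = 'postgres'
  AND (now() - pg_stat_activity.query_start) > interval '1 hour';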