Lesser Known PostgreSQL Features

In 2006 Microsoft conducted a customer survey to find what new features users want in new versions of Microsoft Office. To their surprise, more than 90% of what users asked for already existed, they just didn't know about it. To address the "discoverability" issue, they came up with the "Ribbon UI" that we know from Microsoft Office products today.

Office is not unique in this sense. Most of us are not aware of all the features in tools we use on a daily basis, especially if it's big and extensive like PostgreSQL. With PostgreSQL 14 released just a few weeks ago, what a better opportunity to shed a light on some lesser known features that already exist in PostgreSQL, but you may not know.

In this article I present lesser known features of PostgreSQL.

<small>Illustration by <a href="https://www.instagram.com/_wrightdesign/">Eleanor Wright</a></small> — Illustration by Eleanor Wright

Table of Contents

Get the Number of Updated and Inserted Rows in an Upsert
Grant Permissions on Specific Columns
Match Against Multiple Patterns
Find the Current Value of a Sequence Without Advancing It
Use \copy With Multi-line SQL
Prevent Setting the Value of an Auto Generated Key
Two More Ways to Produce a Pivot Table
Dollar Quoting
Comment on Database Objects
Keep a Separate History File Per Database
Autocomplete Reserved Words in Uppercase
Sleep for Interval
Get the First or Last Row in a Group Without Sub-Queries
Generate UUID Without Extensions
Generate Reproducible Random Data
Add Constraints Without Validating Immediately
Synonyms in PostgreSQL
Find Overlapping Ranges

Get the Number of Updated and Inserted Rows in an Upsert

INSERT ON CONFLICT, also known as "merge" (in Oracle) or "upsert" (a mashup of UPDATE and INSERT), is a very useful command, especially in ETL processes. Using the ON CONFLICT clause of an INSERT statement, you can tell the database what to do when a collision is detected in one or more key columns.

For example, here is a query to sync data in an employees table:

db=# WITH new_employees AS (
    SELECT * FROM (VALUES
        ('George', 'Sales',    'Manager',   1000),
        ('Jane',   'R&D',      'Developer', 1200)
    ) AS t(
         name,      department, role,       salary
    )
)
INSERT INTO employees (name, department, role, salary)
SELECT name, department, role, salary
FROM new_employees
ON CONFLICT (name) DO UPDATE SET
    department = EXCLUDED.department,
    role = EXCLUDED.role,
    salary = EXCLUDED.salary
RETURNING *;

  name  │ department │   role    │ salary
────────┼────────────┼───────────┼────────
 George │ Sales      │ Manager   │   1000
 Jane   │ R&D        │ Developer │   1200
INSERT 0 2

The query inserts new employee data to the table. If there is an attempt to add an employee with a name that already exists, the query will update that row instead.

RETURNING *

Check out how to implement complete processes using WITH and RETURNING.

You can see from the output of the command above, INSERT 0 2, that two employees were affected. But how many were inserted, and how many were updated? The output is not giving us any clue!

While I was looking for a way to improve the logging of some ETL process that used such query, I stumbled upon this Stack Overflow answer that suggested a pretty clever solution to this exact problem:

db=# WITH new_employees AS (
    SELECT * FROM (VALUES
        ('George', 'Sales',    'Manager',   1000),
        ('Jane',   'R&D',      'Developer', 1200)
    ) AS t(
         name,      department, role,       salary
    )
)
INSERT INTO employees (name, department, role, salary)
SELECT name, department, role, salary
FROM new_employees
ON CONFLICT (name) DO UPDATE SET
    department = EXCLUDED.department,
    role = EXCLUDED.role,
    salary = EXCLUDED.salary
RETURNING *, (xmax = 0) AS inserted;

  name  │ department │   role    │ salary │ inserted
────────┼────────────┼───────────┼────────┼──────────
 Jane   │ R&D        │ Developer │   1200 │ t
 George │ Sales      │ Manager   │   1000 │ f
INSERT 0 2

Notice the difference in the RETUNING clause. It includes the calculated field inserted that uses the special column xmax to determine how many rows were inserted. From the data returned by the command, you can spot that a new row was inserted for "Jane", but "George" was already in the table, so the row was updated.

The xmax column is a special system column:

The identity (transaction ID) of the deleting transaction, or zero for an undeleted row version.

In PostgreSQL, when a row is updated, the previous version is deleted, and xmax holds the ID of the deleting transaction. When the row is inserted, no previous row is deleted, so xmax is zero. This "trick" is cleverly using this behavior to distinguish between updated and inserted rows.

Grant Permissions on Specific Columns

Say you have a users table that contain sensitive information such as credentials, passwords or PII:

db=# CREATE TABLE users (
    id INT,
    username VARCHAR(20),
    personal_id VARCHAR(10),
    password_hash VARCHAR(256)
);
CREATE TABLE

db=# INSERT INTO users VALUES (1, 'haki', '12222227', 'super-secret-hash');
INSERT 1 0

The table is used by different people in your organization, such as analysts, to access data and produce ad-hoc reports. To allow access to analysts, you add a special user in the database:

db=# CREATE USER analyst;
CREATE USER

db=# GRANT SELECT ON users TO analyst;
GRANT

The user analyst can now access the users table:

db=# \connect db analyst
You are now connected to database "db" as user "analyst".

db=> SELECT * FROM users;
 id │ username │ personal_id │   password_hash
────┼──────────┼─────────────┼───────────────────
  1 │ haki     │ 12222227    │ super-secret-hash

As mentioned previously, analysts access users data to produce reports and conduct analysis, but they should not have access to sensitive information or PII.

To provide granular control over which data a user can access in a table, PostgreSQL allows you to grant permissions only on specific columns of a table:

db=# \connect db postgres
You are now connected to database "db" as user "postgres".

db=# REVOKE SELECT ON users FROM analyst;
REVOKE

db=# GRANT SELECT (id, username) ON users TO analyst;
GRANT

After revoking the existing select permission on the table, you granted analyst select permission only on the id and username columns. Now, analyst can no longer access these columns:

db=# \connect db analyst
You are now connected to database "db" as user "analyst".

db=> SELECT * FROM users;
ERROR:  permission denied for table users

db=> SELECT id, username, personal_id FROM users;
ERROR:  permission denied for table users

db=> SELECT id, username FROM users;
 id │ username
────┼──────────
  1 │ haki

Notice that when the user analyst attempts to access any of the restricted columns, either explicitly or implicitly using *, they get a "permission denied" error.

Match Against Multiple Patterns

It's not uncommon to use pattern matching in SQL. For example, here is a query to find users with a "gmail.com" email account:

SELECT *
FROM users
WHERE email LIKE '%@gmail.com';

This query uses the wildcard '%' to find users with emails that end with "@gmail.com". What if, for example, in the same query you also want to find users with a "yahoo.com" email account?

SELECT *
FROM users
WHERE
    email LIKE '%@gmail.com'
    OR email LIKE '%@yahoo.com'

To match against either one of these patterns, you can construct an OR condition. In PostgreSQL however, there is another way to match against multiple patterns:

SELECT *
FROM users
WHERE email SIMILAR TO '%@gmail.com|%@yahoo.com'

Using SIMILAR TO you can match against multiple patterns and keep the query simple.

Another way to match against multiple patterns is using regexp:

SELECT *
FROM users
WHERE email ~ '@gmail\.com$|@yahoo\.com$'

When using regexp you need to take be a bit more cautious. A period "." will match anything, so to match the period "." in gmail.com or yahoo.com, you need to add the escape character "\.".

When I posted this on twitter I got some interesting responses. One comment from the official account of psycopg, a PostgreSQL driver for Python, suggested another way:

SELECT *
FROM users
WHERE email ~ ANY(ARRAY['@gmail\.com$', '@yahoo\.com$'])

This query uses the ANY operator to match against an array of patterns. If an email matches any of the patterns, the condition will be true. This approach is easier to work with from a host language such as Python:

with connection.cursor() as cursor:
    cursor.execute('''
        SELECT *
        FROM users
        WHERE email ~ ANY(ARRAY%(patterns)s)
    ''' % {
        'patterns': [
            '@gmail\.com$',
            '@yahoo\.com$',
        ],
    })

Unlike the previous approach that used SIMILAR TO, using ANY you can bind a list of patterns to the variable.

Find the Current Value of a Sequence Without Advancing It

If you ever needed to find the current value of a sequence, your first attempt was most likely using currval:

db=# SELECT currval('sale_id_seq');
ERROR:  currval of sequence "sale_id_seq" is not yet defined in this session

Just like me, you probably found that currval only works if the sequence was defined or used in the current session. Advancing a sequence for no good reason is usually not something you want to do, so this is not an acceptable solution.

In PostgreSQL 10 the view pg_sequences was added to provide easy access to information about sequences:

db=# SELECT * FROM pg_sequences WHERE sequencename = 'sale_id_seq';
─[ RECORD 1 ]─┬────────────
schemaname    │ public
sequencename  │ sale_id_seq
sequenceowner │ db
data_type     │ integer
start_value   │ 1
min_value     │ 1
max_value     │ 2147483647
increment_by  │ 1
cycle         │ f
cache_size    │ 1
last_value    │ 155

This table can answer your question, but it's not really a "lesser known feature", it's just another table in the information schema.

Another way to get the current value of a sequence is using the undocumented function pg_sequence_last_value:

db=# SELECT pg_sequence_last_value('sale_id_seq');
 pg_sequence_last_value
────────────────────────
                   155

It's not clear why this function is not documented, but I couldn't find any mention of it in the official documentation. Take that under consideration if you decide to use it.

Another interesting thing I found while I was researching this, is that you can query a sequence, just like you would a table:

db=# SELECT * FROM sale_id_seq;

 last_value │ log_cnt │ is_called
────────────┼─────────┼───────────
        155 │      10 │ t

This really makes you wonder what other types of objects you can query in PostgreSQL, and what you'll get in return.

It's important to note that this feature should not be used for anything except getting a cursory look at a sequence. You should not try to update ID's based on values from this output, for that you should use nextval.

Use `\copy` With Multi-line SQL

If you work with psql a lot you probably use \COPY very often to export data from the database. I know I do. One of the most annoying things about \COPY is that it does not allow multi-line queries:

db=# \COPY (
\copy: parse error at end of line

When you try to add a new line to a \copy command you get this error message.

To overcome this restriction, my first idea was to use a view:

db=# CREATE VIEW v_department_dbas AS
    SELECT department, count(*) AS employees
    FROM emp
    WHERE role = 'dba'
    GROUP BY department
    ORDER BY employees;
CREATE VIEW

db=# \COPY (SELECT * FROM v_department_dbas) TO department_dbas.csv WITH CSV HEADER;
COPY 5

db=# DROP VIEW v_department_dbas;
DROP VIEW;

This works, but if something fails in the middle it can leave views laying around. I like to keep my schema tidy, so I looked for a way to automatically cleanup after me. A quick search brought up temporary views:

db=# CREATE TEMPORARY VIEW v_department_dbas AS # ...
CREATE VIEW

db=# \COPY (SELECT * FROM v_department_dbas) TO department_dbas.csv WITH CSV HEADER;
COPY 5

Using temporary views I no longer had to cleanup after myself, because temporary views are automatically dropped when the session terminates.

I used temporary views for a while, until I struck this little gem in the psql documentation:

db=# COPY (
    SELECT department, count(*) AS employees
    FROM emp
    WHERE role = 'dba'
    GROUP BY department
    ORDER BY employees
) TO STDOUT WITH CSV HEADER \g department_dbas.csv
COPY 5

Nice, right? Let's break it down:

Use COPY instead of \COPY: the COPY command is a server command executed in the server, and \COPY is a psql command with the same interface. So while \COPY does not support multi-line queries, COPY does!
Write results to STDOUT: Using COPY we can write results to a directory on the server, or write results to the standard output, using TO STDOUT.
Use \g to write STDOUT to local file: Finally, psql provides a command to write the output from standard output to a file.

Combining these three features did exactly what I wanted.

Copy expert

If you move a lot of data around, don't miss the fastest way to load data into PostgreSQL using Python.

Prevent Setting the Value of an Auto Generated Key

If you are using auto generated primary keys in PostgreSQL, it's possible you are still using the SERIAL datatype:

CREATE TABLE sale (
    id SERIAL PRIMARY KEY,
    sold_at TIMESTAMPTZ,
    amount INT
);

Behind the scenes, PostgreSQL creates a sequence to use when rows are added:

db=# INSERT INTO sale (sold_at, amount) VALUES (now(), 1000);
INSERT 0 1

db=# SELECT * FROM sale;
 id │           sold_at             │ amount
────┼───────────────────────────────┼────────
  1 │ 2021-09-25 10:06:56.646298+03 │   1000

The SERIAL data type is unique to PostgreSQL and has some known problems, so starting at version 10, the SERIAL datatype was softly deprecated in favor of identity columns:

CREATE TABLE sale (
    id INT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
    sold_at TIMESTAMPTZ,
    amount INT
);

Identity columns work very similar to the SERIAL data type:

db=# INSERT INTO sale (sold_at, amount) VALUES (now(), 1000);
INSERT 0 1

db=# SELECT * FROM sale;
 id │           sold_at             │ amount
────┼───────────────────────────────┼────────
  1 │ 2021-09-25 10:11:57.771121+03 │   1000

But, consider this scenario:

db=# INSERT INTO sale (id, sold_at, amount) VALUES (2, now(), 1000);
INSERT 0 1

db=# INSERT INTO sale (sold_at, amount) VALUES (now(), 1000);
ERROR:  duplicate key value violates unique constraint "sale_pkey"
DETAIL:  Key (id)=(2) already exists.

Why did it fail?

The first INSERT command explicitly provides the value 2 of the id column, so the sequence was not used.
The second INSERT command does not provide a value for id, so the sequence is used. The next value of the sequence happened to be 2, so the command failed with a unique constraint violation.

Auto-incrementing IDs rarely need to be set manually, and doing so can cause a mess. So how can you prevent users from setting them?

CREATE TABLE sale (
    id INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    sold_at TIMESTAMPTZ,
    amount INT
);

Instead of using GENERATED BY DEFAULT, use GENERATED ALWAYS. To understand the difference, try the same scenario again:

db=# INSERT INTO sale (sold_at, amount) VALUES (now(), 1000);
INSERT 0 1

db=# INSERT INTO sale (id, sold_at, amount) VALUES (2, now(), 1000);
ERROR:  cannot insert into column "id"
DETAIL:  Column "id" is an identity column defined as GENERATED ALWAYS.
HINT:  Use OVERRIDING SYSTEM VALUE to override.

What changed?

The first INSERT does not provide a value for id and completes successfully.
The second INSERT command however, attempts to set the value 2 for id and fails!

In the error message, PostgreSQL is kind enough to offer a solution for when you actually do want to set the value for an identity column explicitly:

db=# INSERT INTO sale (id, sold_at, amount)
OVERRIDING SYSTEM VALUE VALUES (2, now(), 1000);

INSERT 0 1

By adding the OVERRIDING SYSTEM VALUE to the INSERT command you explicitly instruct PostgreSQL to allow you to set the value of an identity column. You still have to handle a possible unique constraint violation, but you can no longer blame PostgreSQL for it!

Two More Ways to Produce a Pivot Table

In one of my previous articles I demonstrated how to produce pivot tables using conditional aggregates. After writing the article, I found two more ways to generate pivot tables in PostgreSQL.

Say you want to get the number of employees, at each role, in each department:

db=# WITH employees AS (
    SELECT * FROM (VALUES
        ('Haki',    'R&D',      'Manager'),
        ('Dan',     'R&D',      'Developer'),
        ('Jax',     'R&D',      'Developer'),
        ('George',  'Sales',    'Manager'),
        ('Bill',    'Sales',    'Developer'),
        ('David',   'Sales',    'Developer')
    ) AS t(
        name,       department,  role
    )
)
SELECT role, department, count(*)
FROM employees
GROUP BY role, department;

   role    │ department │ count
───────────┼────────────┼───────
 Developer │ Sales      │     2
 Manager   │ Sales      │     1
 Manager   │ R&D        │     1
 Developer │ R&D        │     2

A better way of viewing this would be as a pivot table. In psql you can use the \crosstabview command to transform the results of the last query to a pivot table:

db=# \crosstabview

   role    │ Sales │ R&D
───────────┼───────┼─────
 Developer │     2 │   2
 Manager   │     1 │   1

Magic!

By default, the command will produce the pivot table from the first two columns, but you can control that with arguments:

db=# \crosstabview department role

 department │ Developer │ Manager
────────────┼───────────┼─────────
 Sales      │         2 │       1
 R&D        │         2 │       1

Another, slightly less magical way to produce a pivot table is using the built-in tablefunc extension:

db=# CREATE EXTENSION tablefunc;
CREATE EXTENSION

db=# SELECT * FROM crosstab('
    SELECT role, department, count(*) AS employees
    FROM employees
    GROUP BY 1, 2
    ORDER BY role
', '
    SELECT DISTINCT department
    FROM employees
    ORDER BY 1
') AS t(role text, sales int, rnd int);

   role    │ sales │ rnd
───────────┼───────┼─────
 Developer │     2 │   2
 Manager   │     1 │   1

Using the function crosstab you can produce a pivot table. The downside of this method is that you need to define the output columns in advance. The advantage however, is that the crosstab function produces a table, which you can use as a sub-query for further processing.

Dollar Quoting

If you store text fields in your database, especially entire paragraphs, you are probably familiar with escape characters. For example, to include a single quote ' in a text literal you need to escape it using another single quote '':

db=# SELECT 'John''s Pizza';
   ?column?
──────────────
 John's Pizza

When text starts to get bigger, and include characters like backslashes and new lines, it can get pretty annoying to add escape characters. To address this, PostgreSQL provides another way to write string constants:

db=# SELECT $$a long
string with new lines
and 'single quotes'
and "double quotes

PostgreSQL doesn't mind ;)$$ AS text;
           text
───────────────────────────
 a long                   ↵
 string with new lines    ↵
 and 'single quotes'      ↵
 and "double quotes       ↵
                          ↵
 PostgreSQL doesn't mind ;)

Notice the dollar signs $$ at the beginning and end of the string. Anything in between $$ is treated as a string. PostgreSQL calls this "Dollar Quoting".

But there is more, if you happen to need to use the sign $$ in the text, you can add a tag, which makes this even more useful. For example:

db=# SELECT $JSON${
    "name": "John's Pizza",
    "tagline": "Best value for your $$"
}$JSON$ AS json;

                  json
─────────────────────────────────────────
 {                                      ↵
     "name": "John's Pizza",            ↵
     "tagline": "Best value for your $$"↵
 }

Notice that we choose to tag this block with $JSON$ , so the sign "$$" was included as a whole in the output.

You can also use this to quickly generate jsonb objects that include special characters:

db=# SELECT $JSON${
    "name": "John's Pizza",
    "tagline": "Best value for your $$"
}$JSON$::jsonb AS json;
                          json
─────────────────────────────────────────────────────────────
 {"name": "John's Pizza", "tagline": "Best value for your $$"}

The value is now a jsonb object which you can manipulate as you wish!

Comment on Database Objects

PostgreSQL has this nice little feature where you can add a comments on just about every database object. For example, adding a comment on a table:

db=# COMMENT ON TABLE sale IS 'Sales made in the system';
COMMENT

You can now view this comment in psql (and probably other IDEs):

db=# \dt+ sale
                                  List of relations
 Schema │ Name │ Type  │ Owner │ Persistence │    Size    │       Description
────────┼──────┼───────┼───────┼─────────────┼────────────┼──────────────────────────
 public │ sale │ table │ haki  │ permanent   │ 8192 bytes │ Sales made in the system

You can also add comments on table columns, and view them when using extended describe:

db=# COMMENT ON COLUMN sale.sold_at IS 'When was the sale finalized';
COMMENT

db=# \d+ sale
  Column  │           Type           │         Description
──────────┼──────────────────────────┼─────────────────────────────
 id       │ integer                  │
 sold_at │ timestamp with time zone │ When was the sale finalized
 amount   │ integer                  │

You can also combine the COMMENT command with dollar quoting to include longer and more meaningful descriptions of, for example, functions:

COMMENT ON FUNCTION generate_random_string IS $docstring$
Generate a random string at a given length from a list of possible characters.

Parameters:

    - length (int): length of the output string
    - characters (text): possible characters to choose from

Example:

    db=# SELECT generate_random_string(10);
     generate_random_string
    ────────────────────────
     o0QsrMYRvp

    db=# SELECT generate_random_string(3, 'AB');
     generate_random_string
    ────────────────────────
     ABB
$docstring$;

This is a function I used in the past to demonstrate the performance impact of medium sized texts on performance. Now I no longer have to go back to the article to remember how to use the function, I have the docstring right there in the comments:

db=# \df+ generate_random_string
List of functions
────────────┬────────────────────────────────────────────────────────────────────────────────
Schema      │ public
Name        │ generate_random_string
/* ... */
Description │ Generate a random string at a given length from a list of possible characters.↵
            │                                                                               ↵
            │ Parameters:                                                                   ↵
            │                                                                               ↵
            │     - length (int): length of the output string                               ↵
            │     - characters (text): possible characters to choose from                   ↵
            │                                                                               ↵
            │ Example:                                                                      ↵
            │                                                                               ↵
            │     db=# SELECT generate_random_string(10);                                   ↵
            │      generate_random_string                                                   ↵
            │     ────────────────────────                                                  ↵
            │      o0QsrMYRvp                                                               ↵
            │                                                                               ↵
            │     db=# SELECT generate_random_string(3, 'AB');                              ↵
            │      generate_random_string                                                   ↵
            │     ────────────────────────                                                  ↵
            │      ABB                                                                      ↵
            │

Keep a Separate History File Per Database

If you are working with CLI tools you probably use the ability to search past commands very often. In bash and psql, a reverse search is usually available by hitting CTRL + R.

If in addition to working with the terminal, you also work with multiple databases, you might find it useful to keep a separate history file per database:

db=# \set HISTFILE ~/.psql_history- :DBNAME

This way, you are more likely to find a relevant match for the database you are currently connected to. You can drop this in your ~/.psqlrc file to make it persistent.

Autocomplete Reserved Words in Uppercase

There is always a lot of debate (and jokes!) on whether keywords in SQL should be in lower or upper case. I think my opinion on this subject is pretty clear.

If like me, you like using uppercase keywords in SQL, there is an option in psql to autocomplete keywords in uppercase:

db=# selec <tab>
db=# select

db=# \set COMP_KEYWORD_CASE upper
db=# selec <tab>
db=# SELECT

After setting COMP_KEYWORD_CASE to upper, when you hit TAB for autocomplete, keywords will be autocompleted in uppercase.

Sleep for Interval

Delaying the execution of a program can be pretty useful for things like testing or throttling. To delay the execution of a program in PostgreSQL, the go-to function is usually pg_sleep:

db=# \timing
Timing is on.

db=# SELECT pg_sleep(3);
 pg_sleep
──────────

(1 row)

Time: 3014.913 ms (00:03.015)

The function sleeps for the given number of seconds. However, when you need to sleep for longer than just a few seconds, calculating the number of seconds can be annoying, for example:

db=# SELECT pg_sleep(255);

How long will this function sleep for? Don't take out the calculator, the function will sleep for 4 minutes and 15 seconds.

To make it more convenient to sleep for longer periods of time, PostgreSQL offers another function:

db=# SELECT pg_sleep_for('4 minutes 15 seconds');

Unlike its sibling pg_sleep, the function pg_sleep_for accepts an interval, which is much more natural to read and understand than the number of seconds.

Get the First or Last Row in a Group Without Sub-Queries

When I initially compiled this list I did not think about this feature as a lesser known one, mostly because I use it all the time. But to my surprise, I keep running into weird solutions to this problem, that can be easily solved with what I'm about to show you, so I figured it deserves a place on the list!

Say you have the this table of students:

db=# SELECT * FROM students;

  name  │ class │ height
────────┼───────┼────────
 Haki   │ A     │    186
 Dan    │ A     │    175
 Jax    │ A     │    182
 George │ B     │    178
 Bill   │ B     │    167
 David  │ B     │    178

⚙ Table data

You can use the following CTE to reproduce queries in this section

WITH students AS (
    SELECT * FROM (VALUES
        ('Haki',    'A',    186),
        ('Dan',     'A',    175),
        ('Jax',     'A',    182),
        ('George',  'B',    178),
        ('Bill',    'B',    167),
        ('David',   'B',    178)
    ) AS t(
        name,       class,  height
    )
)
SELECT * FROM students;

How would you get the entire row of the tallest student in each class?

On first thought you might try something like this:

SELECT class, max(height) as tallest
FROM students
GROUP BY class;

 class │ tallest
───────┼─────────
 A     │     186
 B     │     178

This gets you the height, but it doesn't get you the name of the student. As a second attempt you might try to find the tallest student based on its height, using a sub-query:

SELECT *
FROM students
WHERE (class, height) IN (
    SELECT class, max(height) as tallest
    FROM students
    GROUP BY class
);

  name  │ class │ height
────────┼───────┼────────
 Haki   │ A     │    186
 George │ B     │    178
 David  │ B     │    178

Now you have all the information about the tallest students in each class, but there is another problem.

side note

The ability to match a set of records like in the previous query ((class, height) IN (...)), is another lesser known, but a very powerful feature of PostgreSQL.

In class "B", there are two students with the same height, which also happen to be the tallest. Using the aggregate function MAX you only get the height, so you may encounter this type of situation.

The challenge with using MAX is that you choose the height based only on the height, which makes perfect sense in this case, but you still need to pick just one student. A different approach that lets you "rank" rows based on more than one column, is using a window function:

SELECT
    students.*,
    ROW_NUMBER() OVER (
        PARTITION BY class
        ORDER BY height DESC, name
    ) AS rn
FROM
    students;

  name  │ class │ height │ rn
────────┼───────┼────────┼────
 Haki   │ A     │    186 │  1
 Jax    │ A     │    182 │  2
 Dan    │ A     │    175 │  3
 David  │ B     │    178 │  1
 George │ B     │    178 │  2
 Bill   │ B     │    167 │  3

To "rank" students bases on their height you can attach a row number for each row. The row number is determined for each class (PARTITION BY class) and ranked first by height in descending order, and then by the students' name (ORDER BY height DESC, name). Adding the student name in addition to the height makes the results deterministic (assuming the name is unique).

To get the rows of only the tallest student in each class you can use a sub-query:

SELECT
    name, class, height
FROM (
    SELECT
        students.*,
        ROW_NUMBER() OVER (
            PARTITION BY class
            ORDER BY height DESC, name
        ) AS rn
    FROM
        students
) as inner
WHERE
    rn = 1;

 name  │ class │ height
───────┼───────┼────────
 Haki  │ A     │    186
 David │ B     │    178

You made it! This is the entire row for the tallest student in each class.

Using DISTINCT ON

Now that you went through all of this trouble, let me show you an easier way:

SELECT DISTINCT ON (class)
    *
FROM
    students
ORDER BY
    class, height DESC, name;

 name  │ class │ height
───────┼───────┼────────
 Haki  │ A     │    186
 David │ B     │    178

Pretty nice, right? I was blown away when I first discovered DISTINCT ON. Coming from Oracle, there was nothing like that, and as far as I know, no other database other than PostgreSQL does.

Intuitively understand DISTINCT ON

To understand how DISTINCT ON works, let's go over what it does step by step. This is the raw data in the table:

SELECT *
FROM students;

  name  │ class │ height
────────┼───────┼────────
 Haki   │ A     │    186
 Dan    │ A     │    175
 Jax    │ A     │    182
 George │ B     │    178
 Bill   │ B     │    167
 David  │ B     │    178

Next, sort the data:

SELECT *
FROM students
ORDER BY class, height DESC, name;

  name  │ class │ height
────────┼───────┼────────
 Haki   │ A     │    186
 Jax    │ A     │    182
 Dan    │ A     │    175
 David  │ B     │    178
 George │ B     │    178
 Bill   │ B     │    167

Then, add the DISTINCT ON clause:

SELECT DISTINCT ON (class) *
FROM students
ORDER BY class, height DESC, name;

To understand what DISTINCT ON does at this point, we need to take two steps.

First, split the data to groups based on the columns in the DISTINCT ON clause, in this case by class:

  name  │ class │ height
─────────────────────────
 Haki   │ A     │    186  ┓
 Jax    │ A     │    182  ┣━━ class=A
 Dan    │ A     │    175  ┛

 David  │ B     │    178  ┓
 George │ B     │    178  ┣━━ class=B
 Bill   │ B     │    167  ┛

Next, keep only the first row in each group:

  name  │ class │ height
─────────────────────────
 Haki   │ A     │    186  ┣━━ class=A
 David  │ B     │    178  ┣━━ class=B

And there you have it! The tallest student in each class.

The only requirement DISTINCT ON has, is that the leading columns in the ORDER BY clause will match the columns in the DISTINCT ON clause. The remaining columns in the ORDER BY clause are used to determine which row is selected for each group.

To illustrate how the ORDER BY affect the results, consider this query to find the shortest student in each class:

SELECT DISTINCT ON (class)
    *
FROM
    students
ORDER BY
    class, height, name;

 name │ class │ height
──────┼───────┼────────
 Dan  │ A     │    175
 Bill │ B     │    167

To pick the shortest student in each class, you only have to change the sort order, so that the first row of each group is the shortest student.

Generate UUID Without Extensions

To generate UUIDs in PostgreSQL prior to version 13 you probably used the uuid-ossp extension:

db=# CREATE EXTENSION "uuid-ossp";
CREATE EXTENSION

db=# SELECT uuid_generate_v4() AS uuid;
                 uuid
──────────────────────────────────────
 8e55146d-0ce5-40ab-a346-5dbd466ff5f2

Starting at version 13 there is a built-in function to generate random (version 4) UUIDs:

db=# SELECT gen_random_uuid() AS uuid;
                 uuid
──────────────────────────────────────
 ba1ac0f5-5d4d-4d80-974d-521dbdcca2b2

The uuid-ossp extension is still needed if you want to generate UUIDs other than version 4.

Generate Reproducible Random Data

Generating radom data is very useful for many things such for demonstrations or testing. In both cases, it's also useful to be able to reproduce the "random" data.

Using PostgreSQL random function you can produce different types of random data. For example:

db=# SELECT
    random() AS random_float,
    ceil(random() * 10) AS random_int_0_10,
    '2022-01-01'::date + interval '1 days' * ceil(random() * 365) AS random_day_in_2022;

─[ RECORD 1 ]──────┬────────────────────
random_float       │ 0.6031888056092001
random_int_0_10    │ 3
random_day_in_2022 │ 2022-11-10 00:00:00

If you execute this query again, you will get different results:

db=# SELECT
    random() AS random_float,
    ceil(random() * 10) AS random_int_0_10,
    '2022-01-01'::date + interval '1 days' * ceil(random() * 365) AS random_day_in_2022;

─[ RECORD 1 ]──────┬────────────────────
random_float       │ 0.7363406030115378
random_int_0_10    │ 2
random_day_in_2022 │ 2022-02-23 00:00:00

To generate reproducible random data, you can use setseed:

db=# SELECT setseed(0.4050);
 setseed
─────────

(1 row)

db=# SELECT
    random() AS random_float,
    ceil(random() * 10) AS random_int_0_10,
    '2022-01-01'::date + interval '1 days' * ceil(random() * 365) AS random_day_in_2022
FROM
    generate_series(1, 2);

    random_float    │ random_int_0_10 │ random_day_in_2022
────────────────────┼─────────────────┼─────────────────────
 0.1924247516794324 │               9 │ 2022-12-17 00:00:00
 0.9720620908236377 │               5 │ 2022-06-13 00:00:00

If you execute the same block again in a new session, even in a different database, it will produce the exact same results:

otherdb=# SELECT setseed(0.4050);
 setseed
─────────

(1 row)

otherdb=# SELECT
    random() AS random_float,
    ceil(random() * 10) AS random_int_0_10,
    '2022-01-01'::date + interval '1 days' * ceil(random() * 365) AS random_day_in_2022
FROM
    generate_series(1, 2);

    random_float    │ random_int_0_10 │ random_day_in_2022
────────────────────┼─────────────────┼─────────────────────
 0.1924247516794324 │               9 │ 2022-12-17 00:00:00
 0.9720620908236377 │               5 │ 2022-06-13 00:00:00

Notice how the results are random, but still exactly the same. The next time you do a demonstration or share a script, make sure to include setseed so your results could be easily reproduced.

Add Constraints Without Validating Immediately

Constraint are an integral part of any RDBMS. They keep data clean and reliable, and should be used whenever possible. In living breathing systems, you often need to add new constraints, and adding certain types of constraints may require very restrictive locks that interfere with the operation of the live system.

To illustrate, add a simple check constraint on a large table:

db=# ALTER TABLE orders ADD CONSTRAINT check_price_gt_zero CHECK (price >= 0);
ALTER TABLE
Time: 10745.662 ms (00:10.746)

This statement adds a check constraint on the price of an order, to make sure it's greater than or equal to zero. In the process of adding the constraint, the database scanned the entire table to make sure the constraint is valid for all the existing rows. The process took ~10s, and during that time, the table was locked.

In PostgreSQL, you can split the process of adding a constraint into two steps.

First, add the constraint and only validate new data, but don't check that existing data is valid:

db=# ALTER TABLE orders ADD CONSTRAINT check_price_gt_zero CHECK (price >= 0) NOT VALID;
ALTER TABLE
Time: 13.590 ms

The NOT VALID in the end tells PostgreSQL to not validate the new constraint for existing rows. This means the database does not have to scan the entire table. Notice how this statement took significantly less time compared to the previous, it was almost instantaneous.

Next, validate the constraint for the existing data with a much more permissive lock that allows other operations on the table:

db=# ALTER TABLE orders VALIDATE CONSTRAINT check_price_gt_zero;
ALTER TABLE
Time: 11231.189 ms (00:11.231)

Notice how validating the constraint took roughly the same time as the first example, which added and validated the constraint. This reaffirms that when adding a constraint to an existing table, most time is spent validating existing rows. Splitting the process into two steps allows you to reduce the time the table is locked.

The documentation also mentions another use case for NOT VALID - enforcing a constraint only on future updates, even if there are some existing bad values. That is, you would add NOT VALID and never do the VALIDATE.

Check out this great article from the engineering team at Paypal about making schema changes without downtime, and my own tip to disable constraints and indexes during bulk loads.

Synonyms in PostgreSQL

Synonyms are a way to reference objects by another name, similar to symlinks in Linux. If you're coming from Oracle you are probably familiar with synonyms, but otherwise you may have never heard about it. PostgreSQL does not have a feature called "synonyms", but it doesn't mean it's not possible.

To have a name reference a different database object, you first need to understand how PostgreSQL resolves unqualified names. For example, if you are connected to the database with the user haki, and you reference a table foo, PostgreSQL will search for the following objects, in this order:

haki.foo
public.foo

This order is determined by the search_path parameter:

db=# SHOW search_path;
   search_path
─────────────────
 "$user", public

The first value, "$user" is a special value that resolves to the name of the currently connected user. The second value, public, is the name of the default schema.

To demonstrate some of the things you can do with search path, create a table foo in database db:

db=# CREATE TABLE foo (value TEXT);
CREATE TABLE

db=# INSERT INTO foo VALUES ('A');
INSERT 0 1

db=# SELECT * FROM foo;
 value
───────
 A
(1 row)

If for some reason you want the user haki to view a different object when they reference the name foo, you have two options:

1. Create an object named foo in a schema called haki:

db=# CREATE SCHEMA haki;
CREATE SCHEMA

db=# CREATE TABLE haki.foo (value text);
CREATE TABLE

db=# INSERT INTO haki.foo VALUES ('B');
INSERT 0 1

db=# \conninfo
You are connected to database "db" as user "haki"

db=# SELECT * FROM foo;
value
───────
B

Notice how when the user haki referenced the name foo, PostgreSQL resolved the name to haki.foo and not public.foo. This is because the schema haki comes before public in the search path.

2. Update the search path:

db=# CREATE SCHEMA synonyms;
CREATE SCHEMA

db=# CREATE TABLE synonyms.foo (value text);
CREATE TABLE

db=# INSERT INTO synonyms.foo VALUES ('C');
INSERT 0 1

db=# SHOW search_path;
   search_path
─────────────────
 "$user", public

db=# SELECT * FROM foo;
 value
───────
 A

db=# SET search_path TO synonyms, "$user", public;
SET

db=# SELECT * FROM foo;
 value
───────
 C

Notice how after changing the search path to include the schema synonyms, PostgreSQL resolved the name foo to synonyms.foo.

When synonyms are useful?

I used to think that synonyms are a code smell that should be avoided, but over time I found a few valid use cases for when they are useful. One of those use cases are zero downtime migrations.

When you are making changes to a table on a live system, you often need to support both the new and the old version of the application at the same time. This poses a challenge, because each version of the application expects the table to have a different structure.

Take for example a migration to remove a column from a table. While the migration is running, the old version of the application is active, and it expects the column to exist in the table, so you can't simply remove it. One way to deal with this is to release the new version in two stages - the first ignores the field, and the second removes it.

If however, you need to make the change in a single release, you can provide the old version with a view of the table that includes the column, and only then remove it. For that, you can use a "synonym":

db=# \conninfo
You are now connected to database "db" as user "app".

db=# SELECT * FROM users;
 username │ active
──────────┼────────
 haki     │ t

The application is connected to database db with the user app. You want to remove the column active, but the application is using this column. To safely apply the migration you need to "fool" the user app into thinking the column is still there while the old version is active:

db=# \conninfo
You are now connected to database "db" as user "admin".

db=# CREATE SCHEMA app;
CREATE SCHEMA

db=# GRANT USAGE ON SCHEMA app TO app;
GRANT

db=# CREATE VIEW app.users AS SELECT username, true AS active FROM public.users;
CREATE VIEW

db=# GRANT SELECT ON app.users TO app;
GRANT

To "fool" the user app, you created a schema by the name of the user, and a view with a calculated field active. Now, when the application is connected with user app, it will see the view and not the table, so it's safe to remove the column:

db=# \conninfo
You are now connected to database "db" as user "admin".

db=# ALTER TABLE users DROP COLUMN active;
ALTER TABLE

db=# \connect db app
You are now connected to database "db" as user "app".

db=# SELECT * FROM users;
 username │ active
──────────┼────────
 haki     │ t

You dropped the column and the application sees the calculated field instead! All is left is some cleanup and you are done.

Find Overlapping Ranges

Say you have a table of meetings:

db=# SELECT * FROM meetings;
       starts_at     │        ends_at
─────────────────────┼─────────────────────
 2021-10-01 10:00:00 │ 2021-10-01 10:30:00
 2021-10-01 11:15:00 │ 2021-10-01 12:00:00
 2021-10-01 12:30:00 │ 2021-10-01 12:45:00

⚙ Table data

You can use the following CTE to reproduce the queries in this section:

WITH meetings AS (
    SELECT
        starts_at::timestamptz AS starts_at,
        ends_at::timestamptz AS ends_at
    FROM (VALUES
        ('2021-10-01 10:00 UTC', '2021-10-01 10:30 UTC'),
        ('2021-10-01 11:15 UTC', '2021-10-01 12:00 UTC'),
        ('2021-10-01 12:30 UTC', '2021-10-01 12:45 UTC')
    ) AS t(
        starts_at,               ends_at)
)
SELECT * FROM meetings;

You want to schedule a new meeting, but before you do that, you want to make sure it does not overlap with another meeting. There are several scenarios you need to consider:

[A] New meeting ends after an existing meeting starts

|-------NEW MEETING--------|
    |*******EXISTING MEETING*******|

[B] New meeting starts before an existing meetings ends

        |-------NEW MEETING--------|
|*******EXISTING MEETING*******|

[C] New meeting takes place during an existing meeting

    |----NEW MEETING----|
|*******EXISTING MEETING*******|

[D] Existing meeting takes place while the new meeting is scheduled

|--------NEW MEETING--------|
    |**EXISTING MEETING**|

[E] New meeting is scheduled at exactly the same time as an existing meeting

|--------NEW MEETING--------|
|*****EXISTING MEETING******|

To test a query that check for overlaps, you can prepare a table with all the scenarios above, and try a simple condition:

WITH new_meetings AS (
    SELECT
        id,
        starts_at::timestamptz as starts_at,
        ends_at::timestamptz as ends_at
    FROM (VALUES
        ('A', '2021-10-01 11:10 UTC', '2021-10-01 11:55 UTC'),
        ('B', '2021-10-01 11:20 UTC', '2021-10-01 12:05 UTC'),
        ('C', '2021-10-01 11:20 UTC', '2021-10-01 11:55 UTC'),
        ('D', '2021-10-01 11:10 UTC', '2021-10-01 12:05 UTC'),
        ('E', '2021-10-01 11:15 UTC', '2021-10-01 12:00 UTC')
    ) as t(
        id,   starts_at,               ends_at
    )
)
SELECT
    *
FROM
    meetings, new_meetings
WHERE
    new_meetings.starts_at BETWEEN meetings.starts_at and meetings.ends_at
    OR new_meetings.ends_at BETWEEN meetings.starts_at and meetings.ends_at;

       starts_at     │        ends_at      │ id │       starts_at     │        ends_at
─────────────────────┼─────────────────────┼────┼─────────────────────┼────────────────────
 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ A  │ 2021-10-01 11:10:00 │ 2021-10-01 11:55:00
 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ B  │ 2021-10-01 11:20:00 │ 2021-10-01 12:05:00
 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ C  │ 2021-10-01 11:20:00 │ 2021-10-01 11:55:00
 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ E  │ 2021-10-01 11:15:00 │ 2021-10-01 12:00:00

The first attempt found an overlap with 4 out of 5 scenarios. It did not detect the overlap for scenario D, where the new meetings starts before and ends after an existing meeting. To handle this scenario as well, you need to make the condition a bit longer:

WITH new_meetings AS (/* ... */)
SELECT
    *
FROM
    meetings, new_meetings
WHERE
    new_meetings.starts_at BETWEEN meetings.starts_at and meetings.ends_at
    OR new_meetings.ends_at BETWEEN meetings.starts_at and meetings.ends_at
    OR meetings.starts_at BETWEEN new_meetings.starts_at and new_meetings.ends_at
    OR meetings.ends_at BETWEEN new_meetings.starts_at and new_meetings.ends_at;


       starts_at     │        ends_at      │ id │       starts_at     │        ends_at
─────────────────────┼─────────────────────┼────┼─────────────────────┼────────────────────
 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ A  │ 2021-10-01 11:10:00 │ 2021-10-01 11:55:00
 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ B  │ 2021-10-01 11:20:00 │ 2021-10-01 12:05:00
 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ C  │ 2021-10-01 11:20:00 │ 2021-10-01 11:55:00
 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ D  │ 2021-10-01 11:10:00 │ 2021-10-01 12:05:00
 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ E  │ 2021-10-01 11:15:00 │ 2021-10-01 12:00:00

The query now detects an overlap in all 5 scenarios, but, consider these additional scenarios:

[F] New meeting is scheduled immediately after an existing meetings

                            |--------NEW MEETING--------|
|*****EXISTING MEETING******|

[G] New meeting is scheduled to end immediately when an existing meeting starts

|--------NEW MEETING--------|
                            |*****EXISTING MEETING******|

Back-to-back meetings are very common, and they should not be detected as an overlap. Adding the two scenarios to the test, and trying the query:

WITH new_meetings AS (
    SELECT
        id,
        starts_at::timestamptz as starts_at,
        ends_at::timestamptz as ends_at
    FROM (VALUES
        ('A', '2021-10-01 11:10 UTC', '2021-10-01 11:55 UTC'),
        ('B', '2021-10-01 11:20 UTC', '2021-10-01 12:05 UTC'),
        ('C', '2021-10-01 11:20 UTC', '2021-10-01 11:55 UTC'),
        ('D', '2021-10-01 11:10 UTC', '2021-10-01 12:05 UTC'),
        ('E', '2021-10-01 11:15 UTC', '2021-10-01 12:00 UTC'),
        ('F', '2021-10-01 12:00 UTC', '2021-10-01 12:10 UTC'),
        ('G', '2021-10-01 11:00 UTC', '2021-10-01 11:15 UTC')
    ) as t(
        id,   starts_at,               ends_at
    )
)
SELECT
    *
FROM
    meetings, new_meetings
WHERE
    new_meetings.starts_at BETWEEN meetings.starts_at and meetings.ends_at
    OR new_meetings.ends_at BETWEEN meetings.starts_at and meetings.ends_at
    OR meetings.starts_at BETWEEN new_meetings.starts_at and new_meetings.ends_at
    OR meetings.ends_at BETWEEN new_meetings.starts_at and new_meetings.ends_at;

       starts_at     │        ends_at      │ id │       starts_at     │        ends_at
─────────────────────┼─────────────────────┼────┼─────────────────────┼────────────────────
 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ A  │ 2021-10-01 11:10:00 │ 2021-10-01 11:55:00
 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ B  │ 2021-10-01 11:20:00 │ 2021-10-01 12:05:00
 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ C  │ 2021-10-01 11:20:00 │ 2021-10-01 11:55:00
 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ D  │ 2021-10-01 11:10:00 │ 2021-10-01 12:05:00
 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ E  │ 2021-10-01 11:15:00 │ 2021-10-01 12:00:00
 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ F  │ 2021-10-01 12:00:00 │ 2021-10-01 12:10:00
 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ G  │ 2021-10-01 11:00:00 │ 2021-10-01 11:15:00

The two back-to-back meetings, scenarios F and G, are incorrectly classified as overlaps. This is caused because the operator BETWEEN in inclusive. To implement this condition without using BETWEEN you would have to do something like this:

WITH new_meetings AS (/* ... */)
SELECT
    *
FROM
    meetings, new_meetings
WHERE
    (new_meetings.starts_at > meetings.starts_at AND new_meetings.starts_at < meetings.ends_at)
    OR
    (new_meetings.ends_at > meetings.starts_at AND new_meetings.ends_at < meetings.ends_at)
    OR
    (meetings.starts_at > new_meetings.starts_at AND meetings.starts_at < new_meetings.ends_at)
    OR
    (meetings.ends_at > new_meetings.starts_at AND meetings.ends_at < new_meetings.ends_at)
    OR
    (meetings.starts_at = new_meetings.starts_at AND meetings.ends_at = new_meetings.ends_at);

       starts_at     │        ends_at      │ id │       starts_at     │        ends_at
─────────────────────┼─────────────────────┼────┼─────────────────────┼────────────────────
 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ A  │ 2021-10-01 11:10:00 │ 2021-10-01 11:55:00
 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ B  │ 2021-10-01 11:20:00 │ 2021-10-01 12:05:00
 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ C  │ 2021-10-01 11:20:00 │ 2021-10-01 11:55:00
 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ D  │ 2021-10-01 11:10:00 │ 2021-10-01 12:05:00
 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ E  │ 2021-10-01 11:15:00 │ 2021-10-01 12:00:00

The query correctly identifies scenarios A - E as overlaps, and does not identify the back-to-back scenarios F and G as overlaps. This is what you wanted. However, this condition is pretty crazy! It can easily get out of control.

This is where the following operator in PostgreSQL proves itself as extremely valuable:

WITH new_meetings AS (
    SELECT
        id,
        starts_at::timestamptz as starts_at,
        ends_at::timestamptz as ends_at
    FROM (VALUES
        ('A', '2021-10-01 11:10 UTC', '2021-10-01 11:55 UTC'),
        ('B', '2021-10-01 11:20 UTC', '2021-10-01 12:05 UTC'),
        ('C', '2021-10-01 11:20 UTC', '2021-10-01 11:55 UTC'),
        ('D', '2021-10-01 11:10 UTC', '2021-10-01 12:05 UTC'),
        ('E', '2021-10-01 11:15 UTC', '2021-10-01 12:00 UTC'),
        ('F', '2021-10-01 12:00 UTC', '2021-10-01 12:10 UTC'),
        ('G', '2021-10-01 11:00 UTC', '2021-10-01 11:15 UTC')
    ) as t(
        id,   starts_at,               ends_at
    )
)
SELECT
    *
FROM
    meetings, new_meetings
WHERE
    (new_meetings.starts_at, new_meetings.ends_at)
        OVERLAPS (meetings.starts_at, meetings.ends_at);

       starts_at     │        ends_at      │ id │       starts_at     │        ends_at
─────────────────────┼─────────────────────┼────┼─────────────────────┼────────────────────
 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ A  │ 2021-10-01 11:10:00 │ 2021-10-01 11:55:00
 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ B  │ 2021-10-01 11:20:00 │ 2021-10-01 12:05:00
 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ C  │ 2021-10-01 11:20:00 │ 2021-10-01 11:55:00
 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ D  │ 2021-10-01 11:10:00 │ 2021-10-01 12:05:00
 2021-10-01 11:15:00 │ 2021-10-01 12:00:00 │ E  │ 2021-10-01 11:15:00 │ 2021-10-01 12:00:00

This is it! Using the OVERLAPS operator you can replace those 5 complicated conditions, and keep the query short and simple to read and understand.

UPDATES

2021-11-09: A commenter on Reddit spotted a mistake in the name of the psql parameter in the "Autocomplete Reserved Words in Uppercase" section. Fixed COMP_KEYWORD_UPPER to COMP_KEYWORD_CASE.
2021-11-09: The example for pg_sleep was sleeping for 4 hours (14400 seconds) and not 4 minutes as was previously mentioned in the article. The example was changed to better illustrate the benefit of using an interval with pg_sleep_for.

Lesser Known PostgreSQL Features

Features you already have but may not know about!

Get the Number of Updated and Inserted Rows in an Upsert

Grant Permissions on Specific Columns

Match Against Multiple Patterns

Find the Current Value of a Sequence Without Advancing It

Use `\copy` With Multi-line SQL

Prevent Setting the Value of an Auto Generated Key

Two More Ways to Produce a Pivot Table

Dollar Quoting

Comment on Database Objects

Keep a Separate History File Per Database

Autocomplete Reserved Words in Uppercase

Sleep for Interval

Get the First or Last Row in a Group Without Sub-Queries

Generate UUID Without Extensions

Generate Reproducible Random Data

Add Constraints Without Validating Immediately

Synonyms in PostgreSQL

Find Overlapping Ranges

Similar articles

Practical SQL for Data Analysis

Future Proofing SQL with Carefully Placed Errors

How to Get the First or Last Value in a Group Using Group By in SQL

How to Get or Create in PostgreSQL

The Many Faces of DISTINCT in PostgreSQL

Want me to send you an email when I publish something new?

Share to show you care

Similar articles

Practical SQL for Data Analysis

Future Proofing SQL with Carefully Placed Errors

How to Get the First or Last Value in a Group Using Group By in SQL

How to Get or Create in PostgreSQL

The Many Faces of DISTINCT in PostgreSQL

Want me to send you an email
when I publish something new?