Picture this: you have a table that started small and innocent, maybe a few thousand rows. Fast-forward a couple of years, and it has grown to hundreds of millions of rows. Your queries are getting slower, vacuuming takes forever, and you start wondering if there is a better way to organize all the data.

Let me introduce Partition Tables. PostgreSQL has supported native table partitioning since version 10, and it is one of those features that can genuinely transform the way your database handles large datasets. But like most powerful tools, it comes with its own set of tradeoffs that you need to understand before jumping in.

Let’s dig in.

What Are Partition Tables?

At its core, table partitioning is a technique that splits one logical table into multiple physical tables. From your application’s perspective, you are still querying a single table, but under the hood, PostgreSQL spreads the data across several smaller tables, or partitions.

Each row lives in exactly one partition, determined by the value of a partition key. When you query the parent table, PostgreSQL’s query planner figures out which partitions are relevant and only scans those, a process called partition pruning. The result: smaller indexes, faster scans, and more efficient maintenance operations.

Partitioning Strategies

PostgreSQL gives you several ways to decide how data gets distributed across partitions. The right choice depends on your data and your query patterns.

Range Partitioning

Range partitioning is ideal for time-series data or incrementing sequences. You define partitions based on a range of values, like dates or numeric ranges, and PostgreSQL routes each row to the correct partition.

A classic example: you have a sales table and want to partition it by month.

CREATE TABLE sales (
    sale_id serial,
    sale_date date,
    product_id int,
    quantity int,
    amount numeric
)
PARTITION BY RANGE (sale_date);

CREATE TABLE sales_january PARTITION OF sales
FOR VALUES FROM ('2023-01-01') TO ('2023-02-01');

CREATE TABLE sales_february PARTITION OF sales
FOR VALUES FROM ('2023-02-01') TO ('2023-03-01');

CREATE TABLE sales_march PARTITION OF sales
FOR VALUES FROM ('2023-03-01') TO ('2023-04-01');

Important: The upper bound is not inclusive. The sales_january partition contains rows where sale_date >= '2023-01-01' and sale_date < '2023-02-01'. The date 2023-02-01 itself goes into the February partition.
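A quick way to verify the boundary behaviour is to insert rows on either side of it and ask PostgreSQL which physical partition each one landed in (the sample rows here are hypothetical):

```sql
-- Hypothetical rows on either side of the January/February boundary
INSERT INTO sales (sale_date, product_id, quantity, amount)
VALUES ('2023-01-31', 1, 2, 19.90),
       ('2023-02-01', 1, 1, 9.95);

-- tableoid::regclass reveals the physical partition behind each row
SELECT tableoid::regclass AS partition, sale_date FROM sales;
-- '2023-01-31' lands in sales_january, '2023-02-01' in sales_february
```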

List Partitioning

List partitioning works well when you want to split data based on specific discrete values in a column, like country codes, payment methods, or any categorical data.

For example, partitioning companies by country:

CREATE TABLE company (
    id character varying(32) NOT NULL,
    name text NOT NULL,
    country text NOT NULL,
    created_at timestamp(6) without time zone NOT NULL,
    updated_at timestamp(6) without time zone NOT NULL
)
PARTITION BY LIST (country);

CREATE TABLE company_uk PARTITION OF company
FOR VALUES IN ('uk');

CREATE TABLE company_fr PARTITION OF company
FOR VALUES IN ('fr');

You can create a catch-all default partition for values that don’t match any specific partition:

CREATE TABLE company_default PARTITION OF company DEFAULT;

Be careful with default partitions though. Once you have one, adding a new specific partition for a value that is already routed to the default requires you to first move those rows out of the default partition. With large tables, this can be expensive.
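The usual workaround is to detach the default, create the new partition, and move the matching rows over, all in one transaction. A minimal sketch, assuming rows for a hypothetical 'de' country have accumulated in the default:

```sql
BEGIN;
-- Detach the default so the new partition doesn't overlap its contents
ALTER TABLE company DETACH PARTITION company_default;

-- Create the dedicated partition for the new value
CREATE TABLE company_de PARTITION OF company FOR VALUES IN ('de');

-- Move the matching rows out of the old default, then re-attach it
INSERT INTO company SELECT * FROM company_default WHERE country = 'de';
DELETE FROM company_default WHERE country = 'de';
ALTER TABLE company ATTACH PARTITION company_default DEFAULT;
COMMIT;
```

Keep in mind that DETACH and ATTACH take strong locks, so on a busy table this needs a maintenance window.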

You can also group multiple values into a single partition to reduce the total number of partitions:

CREATE TABLE company_europe PARTITION OF company
FOR VALUES IN ('it', 'es', 'pl');

CREATE TABLE company_americas PARTITION OF company
FOR VALUES IN ('us', 'ca', 'br');

Hash Partitioning

Sometimes there is no natural partition key. You just want to spread data evenly across partitions so that indexes don’t grow unbalanced. That is where hash partitioning comes in. PostgreSQL applies a hash function to the column value and distributes rows based on the result.

Using our company table from earlier:

CREATE TABLE company (...)
PARTITION BY HASH (id);

CREATE TABLE company_part_0 PARTITION OF company
FOR VALUES WITH (MODULUS 2, REMAINDER 0);

CREATE TABLE company_part_1 PARTITION OF company
FOR VALUES WITH (MODULUS 2, REMAINDER 1);

The modulus is the total number of partitions, and the remainder determines which partition a given row goes to. You must create all the partitions to cover every possible remainder value, or inserts will fail.

Multi-Level Partitioning

You can combine strategies by creating partitions that are themselves partitioned. For example, you might first partition by country using a list, and then sub-partition each country by a hash on the id to keep the size of each manageable.

-- Top-level: partition by country
CREATE TABLE company (...)
PARTITION BY LIST (country);

-- Sub-partition: partition by hash on id
CREATE TABLE company_uk PARTITION OF company
FOR VALUES IN ('uk')
PARTITION BY HASH (id);

-- Create the actual leaf partitions
CREATE TABLE company_uk_part_0 PARTITION OF company_uk
FOR VALUES WITH (MODULUS 2, REMAINDER 0);

CREATE TABLE company_uk_part_1 PARTITION OF company_uk
FOR VALUES WITH (MODULUS 2, REMAINDER 1);

The Good, The Bad, and The Ugly

Partition tables can be really powerful for scalability and performance, but they can also hurt you if you don’t plan ahead. Let’s break it down.

The Good

  • Smaller indexes, faster queries: Each partition has its own indexes. Smaller indexes mean faster lookups and less memory pressure.
  • Better maintenance: Database operations, like vacuuming, happen per partition, reducing the impact on the rest of the database. No more waiting for hours.
  • Easy data retention: Need to drop data older than a year? If partitions are time-bound, dropping the old partition is a near-instant metadata operation instead of a slow bulk DELETE. Alternatively, you can partition on a flag column, such as a soft-delete marker, and drop that partition instead.

The Bad

  • Queries without partition keys are slow: If your WHERE clause doesn’t include the partition key, PostgreSQL has to scan across all partitions. This is the single most important thing to get right.
  • No concurrent index operations: CREATE INDEX CONCURRENTLY doesn’t work on a partitioned table. You can work around this by creating the index concurrently on each partition individually, then attaching them to an index created ON ONLY the parent table.
  • More disk usage: Partitioned tables use more disk space due to duplicated indexes and metadata across partitions.
  • Missing partitions break inserts: PostgreSQL will refuse to insert data if the target partition doesn’t exist. Create partitions in advance or use a default (catch-all) partition.
  • Over-partitioning hurts performance: Too many partitions leads to terrible query plans. The query planner has to consider each partition, and with thousands of them, this overhead adds up fast.
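Creating range partitions in advance is easy to script. Here is a minimal sketch using the sales table from earlier; how you schedule it (cron, pg_cron, a migration) is up to you:

```sql
-- Create next month's sales partition ahead of time (idempotent)
DO $$
DECLARE
    start_date date := date_trunc('month', now() + interval '1 month');
    end_date   date := start_date + interval '1 month';
    part_name  text := format('sales_%s', to_char(start_date, 'YYYY_MM'));
BEGIN
    EXECUTE format(
        'CREATE TABLE IF NOT EXISTS %I PARTITION OF sales FOR VALUES FROM (%L) TO (%L)',
        part_name, start_date, end_date
    );
END $$;
```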

The Ugly

Here is the one that catches most people off guard: unique indexes (including primary keys) must include all the partitioning columns. This means you can’t enforce global uniqueness across all partitions on a column that isn’t part of your partition key.

Consider this scenario:

 table_name        | partition_key | field
-------------------+---------------+-------
 table_partition_1 | 1             | a
 table_partition_2 | 2             | b
 table_partition_3 | 3             | b
 table_partition_4 | 4             | c

If you wanted a unique constraint on field so that no two rows across the entire table share the same value, you simply can’t. Those four rows live on four different physical tables, and PostgreSQL has no mechanism to enforce uniqueness across them. The value b appears in both table_partition_2 and table_partition_3, and there is nothing stopping that.

This has real implications for your data model. If you rely on globally unique identifiers that are not your partition key, you’ll need to rethink your approach.

A common workaround is to keep the values that must be globally unique in a separate, non-partitioned table with a regular unique constraint on them.
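A minimal sketch of that side-table approach, assuming company names must be globally unique (company_names is a hypothetical table, invented for illustration):

```sql
-- Hypothetical non-partitioned side table enforcing global uniqueness
CREATE TABLE company_names (
    name text PRIMARY KEY
);

-- Insert into the side table in the same transaction as the main insert;
-- a duplicate name fails here before it ever reaches a partition
BEGIN;
INSERT INTO company_names (name) VALUES ('Acme Ltd');
INSERT INTO company (id, name, country, created_at, updated_at)
VALUES ('c-123', 'Acme Ltd', 'uk', now(), now());
COMMIT;
```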

Query Performance

To really understand how partitioning affects query performance, let’s look at a multi-level partition setup.

Imagine we have an employee table partitioned by country (list), with each country sub-partitioned by id (hash, 10 partitions):

  • employee (partitioned by country)
    • employee_uk (partitioned by id, hash with 10 parts)
      • employee_uk_part_0 through employee_uk_part_9
    • employee_fr (partitioned by id, hash with 10 parts)
      • employee_fr_part_0 through employee_fr_part_9

Querying with only the sub-partition key

When you query by id alone (the sub-partition key), PostgreSQL knows which hash partition to look at within each country, but it still has to check all country partitions:

EXPLAIN SELECT * FROM employee WHERE id = 'EE-000131357PB1';

                                        QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Append  (cost=0.14..1452.59 rows=2 width=844)
   ->  Index Scan using employee_fr_part_6_pkey on employee_fr_part_6  (cost=0.14..2.88 rows=1 width=1007)
         Index Cond: ((id)::text = 'EE-000131357PB1'::text)
   ->  Index Scan using employee_uk_part_6_pkey on employee_uk_part_6  (cost=0.42..1449.71 rows=1 width=685)
         Index Cond: ((id)::text = 'EE-000131357PB1'::text)

Notice the Append node: PostgreSQL scans the matching hash partition in both employee_fr and employee_uk. It can prune within each country (only part_6 is scanned), but it can’t prune the countries themselves because we didn’t provide the country.

Querying with all partition keys

Now let’s include both the country and the id:

EXPLAIN SELECT * FROM employee WHERE id = 'EE-000131357PB1' AND country = 'uk';

                                        QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Index Scan using employee_uk_part_6_pkey on employee_uk_part_6  (cost=0.42..2.64 rows=1 width=685)
       Index Cond: ((country = 'uk'::text) AND ((id)::text = 'EE-000131357PB1'::text))
(2 rows)

A single index scan on exactly one partition. That is the goal. The takeaway is clear: always include your partition keys in your WHERE clauses. The difference between scanning 2 partitions and scanning 1 might not seem huge in this example, but imagine 50 countries with 100 hash partitions each. That is 50 index scans vs 1.

Constraints on Partitioned Tables

Primary Keys

Primary keys must include all partitioning columns. This is a limitation of PostgreSQL’s unique indexes, not the PK itself.

ALTER TABLE ONLY company
    ADD CONSTRAINT company_pkey PRIMARY KEY (country, id);

Alternatively, since each partition is just a regular table, you can add a PK per partition:

ALTER TABLE ONLY company_uk
    ADD CONSTRAINT company_uk_pkey PRIMARY KEY (id);

This gives you uniqueness within each partition but not across the entire table.

Unique Indexes

Same rule as primary keys: unique indexes must include all partitioning columns.

-- Unique across the whole table (must include partition key)
CREATE UNIQUE INDEX idx_company_name ON company(country, name);

If you want uniqueness scoped to a single partition, you can create the index on the partition directly:

-- Unique only within the UK partition
CREATE UNIQUE INDEX idx_company_uk_name ON company_uk(name);

Foreign Keys

When referencing a partitioned table, the foreign key must include the partition columns as well:

CREATE TABLE employee (
    id character varying(32) NOT NULL,
    name text NOT NULL,
    company_id text NOT NULL,
    company_country text NOT NULL,
    created_at timestamp(6) without time zone NOT NULL,
    updated_at timestamp(6) without time zone NOT NULL
);

ALTER TABLE employee
    ADD CONSTRAINT fk_company FOREIGN KEY (company_country, company_id)
    REFERENCES company (country, id);

Notice the employee table needs a company_country column to include the partition key in the FK reference.

Conclusions

Partition tables are a powerful tool in PostgreSQL’s arsenal, but they are not a silver bullet. Here is what I’d keep in mind:

  • Plan your partitioning strategy ahead of time. Changing it later on a live table with millions of rows is painful.
  • Always include partition keys in your queries. This is the single most important rule. Without the partition key in your WHERE clause, you lose all the benefits of partitioning.
  • Automate partition creation. Whether it is range partitions for new months or hash partitions for new categories, don’t rely on someone remembering to create them manually.
  • Be mindful of the unique constraint limitation. If your data model relies on globally unique columns that are not part of your partition key, you will need to find creative workarounds.
  • Don’t over-partition. More partitions doesn’t mean better performance. Find the right balance for your query patterns and data volume.

Thanks

As always, I would like to thank my wife Marie Dziubich for her help on every post I write, so that you all get an easier-to-read article!

Thanks for reading! If you have any questions or feedback, feel free to reach out.