Ever wonder why your queries are slow? You write a simple SELECT ... WHERE customer_id = 42 and it takes seconds instead of milliseconds. The table isn't even that big.
The answer, almost every time, is a missing index. Without one, PostgreSQL reads every row in the table to find the few you need. With one, it jumps directly to the matching rows.
Indexes are the single biggest performance lever you have in a relational database. Let's understand exactly what they do and how to verify they're working with EXPLAIN ANALYZE.
We'll use a realistic orders table. The schema is small but the concepts apply to tables of any size.
DROP TABLE IF EXISTS orders;
DROP TABLE IF EXISTS customers;

CREATE TABLE customers (
    customer_id SERIAL PRIMARY KEY,
    name        TEXT NOT NULL,
    email       TEXT NOT NULL,
    city        TEXT NOT NULL,
    created_at  TIMESTAMPTZ DEFAULT now()
);

CREATE TABLE orders (
    order_id    SERIAL PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(customer_id),
    order_date  DATE NOT NULL,
    status      TEXT NOT NULL,
    total       NUMERIC(10,2) NOT NULL
);
Now let's load enough data to make indexes matter. Small tables don't show the difference because PostgreSQL will just scan the whole thing regardless.
INSERT INTO customers (name, email, city)
SELECT
    'Customer ' || i,
    'customer' || i || '@example.com',
    (ARRAY['London','Paris','Berlin','Madrid','Rome'])[1 + (i % 5)]
FROM generate_series(1, 1000) AS i;

INSERT INTO orders (customer_id, order_date, status, total)
SELECT
    (random() * 999 + 1)::INTEGER,
    DATE '2020-01-01' + (random() * 1500)::INTEGER,
    (ARRAY['pending','shipped','delivered','cancelled'])[1 + (i % 4)],
    (random() * 500 + 10)::NUMERIC(10,2)
FROM generate_series(1, 50000) AS i;
That gives us 1,000 customers and 50,000 orders. Enough to see real differences in query plans.
An index is a separate data structure that PostgreSQL maintains alongside your table. Think of it like the index in a textbook — instead of reading every page to find "B-tree", you flip to the back, find the entry, and jump directly to page 247.
Without an index, PostgreSQL does a Sequential Scan (Seq Scan) — it reads every row in the table and checks each one against your WHERE clause. With an index, PostgreSQL has two main strategies:
- Index Scan: walk the index to each matching row and fetch it from the table directly
- Bitmap Index Scan: first build a bitmap of all pages that contain matches, then fetch those pages in physical order

Both are dramatically faster than a Seq Scan. PostgreSQL picks the strategy automatically based on how many rows it expects to find.
The trade-off: indexes speed up reads but slow down writes (because the index must be updated on every INSERT, UPDATE, or DELETE). They also use disk space.
Let's find all orders for a specific customer:
EXPLAIN ANALYZE
SELECT order_id, order_date, status, total
FROM orders
WHERE customer_id = 42;
Output:
Seq Scan on orders (cost=0.00..1135.00 rows=50 width=27)
(actual time=0.031..4.257 rows=46 loops=1)
Filter: (customer_id = 42)
Rows Removed by Filter: 49954
Planning Time: 0.085 ms
Execution Time: 4.312 ms
What this tells us:

- Seq Scan: PostgreSQL read the entire table
- rows=50 estimated vs rows=46 actual: the planner's guess was close
- Rows Removed by Filter: 49954 rows were read and then thrown away
- Execution Time: 4.312 ms, small here, but it grows linearly with table size
That Rows Removed by Filter number is the red flag. PostgreSQL did the work of reading 50,000 rows to return 46. That's a 99.9% waste ratio.
The fix is to create an index on the column we're filtering by:

CREATE INDEX idx_orders_customer_id ON orders(customer_id);
Now run the same query:
EXPLAIN ANALYZE
SELECT order_id, order_date, status, total
FROM orders
WHERE customer_id = 42;
Output:
Bitmap Heap Scan on orders (cost=6.23..357.44 rows=50 width=27)
(actual time=0.049..0.156 rows=46 loops=1)
Recheck Cond: (customer_id = 42)
Heap Blocks: exact=43
-> Bitmap Index Scan on idx_orders_customer_id
(cost=0.00..6.17 rows=50 width=0)
Index Cond: (customer_id = 42)
Planning Time: 0.290 ms
Execution Time: 0.218 ms
What changed:

- The Seq Scan became a two-step Bitmap Index Scan plus Bitmap Heap Scan
- Filter became Index Cond, so the condition is resolved inside the index
- Rows Removed by Filter is gone, meaning no wasted reads
- Execution Time dropped from 4.312 ms to 0.218 ms, roughly 20x faster
The plan has two steps: first, the Bitmap Index Scan reads the index and builds a bitmap of which pages contain matching rows (43 pages, shown in Heap Blocks: exact=43). Then the Bitmap Heap Scan fetches those pages in physical order. This is faster than bouncing randomly across the table for each row.
PostgreSQL chose a Bitmap scan here because the 46 matching rows are scattered across 43 different pages. If only 1-2 rows matched, you'd see a plain Index Scan instead. The planner picks the best strategy automatically.
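To see the plain Index Scan variant, query by the primary key, which matches at most one row. This is a sketch; exact costs and timings will differ on your machine:

```sql
-- A unique lookup: at most one matching row, so the planner
-- uses a plain Index Scan on the primary key index (orders_pkey).
EXPLAIN ANALYZE
SELECT order_id, order_date, status, total
FROM orders
WHERE order_id = 42;
-- Expect a node like: Index Scan using orders_pkey on orders
```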
Let's break down what EXPLAIN tells you.
EXPLAIN vs EXPLAIN ANALYZE:
-- Shows the plan without running the query
EXPLAIN SELECT ...;
-- Runs the query and shows actual timing
EXPLAIN ANALYZE SELECT ...;
Use EXPLAIN when you just want to see the plan. Use EXPLAIN ANALYZE when you need real timing numbers. Be careful with EXPLAIN ANALYZE on INSERT/UPDATE/DELETE — it actually executes the statement.
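If you do need real timings for a write, a common trick is to wrap the statement in a transaction and roll it back. The plan and timing are still reported, but nothing is permanently changed:

```sql
BEGIN;
-- Executes the DELETE and reports its plan and timing...
EXPLAIN ANALYZE
DELETE FROM orders WHERE status = 'cancelled';
-- ...then undoes it.
ROLLBACK;
```

Note the rows stay locked until the ROLLBACK, so keep such transactions short on busy systems.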
Key fields to watch:
| Field | What It Means |
|---|---|
| Seq Scan | Reading every row — no useful index |
| Index Scan | Using an index to find rows directly |
| Index Only Scan | Answering the query entirely from the index |
| Bitmap Index Scan | Using index to build a map of matching pages |
| Filter | Checking rows after retrieval (post-filter) |
| Index Cond | Condition pushed into the index (efficient) |
| Rows Removed by Filter | Rows read but not returned (waste) |
| actual time | Wall clock time in milliseconds |
| rows | Number of rows processed |
The goal is to see Index Cond instead of Filter, and any kind of index scan instead of Seq Scan.
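Both can appear in the same plan. With only idx_orders_customer_id in place, a query that also filters on an unindexed column pushes one condition into the index and post-filters the other. The comments below describe what you can expect to see; exact numbers will vary:

```sql
-- customer_id is indexed, status is not.
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE customer_id = 42 AND status = 'pending';
-- Index Cond: (customer_id = 42)  <- resolved by the index
-- Filter: (status = 'pending')    <- checked row by row afterwards
```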
If the index contains all the columns your query needs, PostgreSQL can skip the table entirely:
EXPLAIN ANALYZE
SELECT customer_id
FROM orders
WHERE customer_id = 42;
Output:
Index Only Scan using idx_orders_customer_id on orders
(cost=0.42..53.88 rows=50 width=4)
(actual time=0.021..0.048 rows=46 loops=1)
Index Cond: (customer_id = 42)
Heap Fetches: 0
Planning Time: 0.078 ms
Execution Time: 0.067 ms
Index Only Scan — PostgreSQL answered the entire query from the index without touching the table at all. Notice Heap Fetches: 0 — zero trips to the actual table data.
This is the fastest possible scan type. It happens when:

- Every column the query reads is stored in the index itself
- The table's visibility map is up to date (run VACUUM if Heap Fetches is high)

Indexes aren't always used. PostgreSQL's query planner decides based on cost estimates.
High selectivity — index used:
EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 42;
-- Bitmap Index Scan (finding ~50 rows out of 50,000)
Low selectivity — index ignored:
EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id < 500;
-- Seq Scan (returning ~25,000 rows out of 50,000)
Output for the low-selectivity query:
Seq Scan on orders (cost=0.00..1285.00 rows=24975 width=27)
(actual time=0.013..3.471 rows=25013 loops=1)
Filter: (customer_id < 500)
Rows Removed by Filter: 24987
Planning Time: 0.127 ms
Execution Time: 4.384 ms
PostgreSQL chose a Seq Scan even though the index exists. Why? Because when you're returning half the table, it's faster to just read the whole thing sequentially than to bounce between the index and the table for each row. The planner is smart about this.
Rule of thumb: Indexes help most when your query returns a small fraction of the table (typically under 10-15%).
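You can watch this cost comparison yourself by temporarily discouraging sequential scans in your session. This is a debugging aid, not something for production code; the setting doesn't forbid seq scans, it just makes the planner price them prohibitively high:

```sql
SET enable_seqscan = off;
-- The planner now reluctantly picks an index-based plan.
-- Compare its estimated cost with the Seq Scan cost above.
EXPLAIN SELECT * FROM orders WHERE customer_id < 500;
RESET enable_seqscan;
```

If the index plan's estimated cost is higher than the Seq Scan's, you've confirmed why the planner ignored the index.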
A common trap — wrapping an indexed column in a function:
-- This WON'T use an index on order_date
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE EXTRACT(YEAR FROM order_date) = 2023;
Output:
Seq Scan on orders (cost=0.00..1460.00 rows=250 width=27)
(actual time=0.023..5.258 rows=8870 loops=1)
Filter: (EXTRACT(year FROM order_date) = '2023'::numeric)
Rows Removed by Filter: 41130
Seq Scan. The index is on order_date, but the query is filtering on EXTRACT(YEAR FROM order_date) — a function applied to the column. PostgreSQL can't use the index because the index stores dates, not extracted years.
Fix — rewrite to use the column directly:
CREATE INDEX idx_orders_date ON orders(order_date);
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE order_date >= '2023-01-01' AND order_date < '2024-01-01';
Output:
Bitmap Heap Scan on orders (cost=168.42..952.33 rows=8870 width=27)
(actual time=0.812..3.472 rows=8870 loops=1)
Recheck Cond: ((order_date >= '2023-01-01') AND (order_date < '2024-01-01'))
Heap Blocks: exact=259
-> Bitmap Index Scan on idx_orders_date
(cost=0.00..166.20 rows=8870 width=0)
Index Cond: ((order_date >= '2023-01-01') AND (order_date < '2024-01-01'))
Planning Time: 0.112 ms
Execution Time: 4.091 ms
Same result, but now PostgreSQL uses the index. The trick is expressing your condition as a range on the raw column instead of applying a function.
Alternative — expression index:
CREATE INDEX idx_orders_year ON orders((EXTRACT(YEAR FROM order_date)));
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE EXTRACT(YEAR FROM order_date) = 2023;
Now it works because the index matches the expression exactly. Use expression indexes when you can't rewrite the query.
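One caveat: PostgreSQL keeps separate statistics for indexed expressions, and those are only gathered when the table is analyzed. After creating an expression index, run ANALYZE so the planner's row estimates for the expression are accurate:

```sql
-- Collect statistics for EXTRACT(YEAR FROM order_date)
-- so the planner can estimate how many rows match each year.
ANALYZE orders;
```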
Before creating indexes, check what's already there:
SELECT
    indexname,
    indexdef
FROM pg_indexes
WHERE tablename = 'orders'
ORDER BY indexname;
Output:
indexname | indexdef
-------------------------|--------------------------------------------
idx_orders_customer_id | CREATE INDEX idx_orders_customer_id ON ...
idx_orders_date | CREATE INDEX idx_orders_date ON ...
orders_pkey | CREATE UNIQUE INDEX orders_pkey ON ...
orders_pkey is the primary key index — PostgreSQL creates this automatically.
Check index size:
SELECT
    indexname,
    pg_size_pretty(pg_relation_size(indexname::regclass)) AS size
FROM pg_indexes
WHERE tablename = 'orders'
ORDER BY pg_relation_size(indexname::regclass) DESC;
Output:
indexname | size
-------------------------|--------
orders_pkey | 1112 kB
idx_orders_customer_id | 1112 kB
idx_orders_date | 1112 kB
Each index takes disk space. For a 50,000-row table, that's roughly 1 MB per index. Scale that up — on a table with 100 million rows, each index could be gigabytes. Don't create indexes you don't need.
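To spot indexes that aren't earning their keep, check how often each one has actually been used. The pg_stat_user_indexes view tracks a scan counter per index (counters accumulate since the last statistics reset, so judge over a representative period):

```sql
SELECT
    indexrelname,
    idx_scan  -- number of scans of this index since stats were last reset
FROM pg_stat_user_indexes
WHERE relname = 'orders'
ORDER BY idx_scan;
-- Indexes showing idx_scan = 0 over a long period are removal candidates.
```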
PostgreSQL's default index type is a B-tree (balanced tree). Conceptually:
[500]
/ \
[200] [800]
/ \ / \
[100] [300] [600] [900]
/ \ / \ / \ / \
... ... ... ... ... ... ...
(leaf nodes point to table rows)
When you search for customer_id = 42:

1. Start at the root: 42 < 500, so go left
2. At the next level: 42 < 200, so go left again
3. At the next level: 42 < 100, so go left to the leaf
4. Follow the leaf entry's pointer to the matching row in the table

This takes O(log n) steps. For 50,000 rows, that's about 16 comparisons instead of 50,000. For 100 million rows, it's about 27 comparisons.
Goal: Write a query on the orders table that does a Seq Scan, then add an index to make it use an Index Scan. Verify with EXPLAIN ANALYZE.
Setup: Drop extra indexes to start clean:
DROP INDEX IF EXISTS idx_orders_customer_id;
DROP INDEX IF EXISTS idx_orders_date;
Hints:
- Use the status column — there's no index on it yet
- Run EXPLAIN ANALYZE to confirm the Seq Scan
- After creating the index, run EXPLAIN ANALYZE again and compare

Pseudocode:
-- Step 1: Query that does a Seq Scan
EXPLAIN ANALYZE
SELECT * FROM orders WHERE status = 'pending';
-- Step 2: Add the index
CREATE INDEX ... ON orders(...);
-- Step 3: Same query now does Index Scan
EXPLAIN ANALYZE
SELECT * FROM orders WHERE status = 'pending';
-- Compare the execution times
Goal: Find all orders placed on a Monday. First try it without an index, then create an expression index that makes it fast.
Expected output from EXPLAIN:
-- Before: Seq Scan with Filter
-- After: Index Scan (or Bitmap Index Scan) with Index Cond
Hints:
- EXTRACT(DOW FROM order_date) returns 1 for Monday in PostgreSQL
- A plain index on order_date won't help
- You need an expression index on EXTRACT(DOW FROM order_date)

Pseudocode:
-- The expression index
CREATE INDEX idx_orders_dow ON orders((EXTRACT(DOW FROM order_date)));
-- Now this uses the index
EXPLAIN ANALYZE
SELECT * FROM orders WHERE EXTRACT(DOW FROM order_date) = 1;
Goal: Write a query that produces an Index Only Scan (no heap fetches). Then modify the query to add a column that forces it back to a regular Index Scan.
Hints:
- Create an index on (status)
- SELECT only status with a WHERE on status
- Run VACUUM orders first to update the visibility map
- Look for Index Only Scan and Heap Fetches: 0 in the output
- Then add total to the SELECT list — watch it switch to Index Scan

This same pattern — "query is slow, add an index, verify with EXPLAIN" — is the first thing you do in any performance investigation:
- orders.customer_id for "show my orders" pages
- users.email for login lookups
- logs.created_at for time-range queries
- products.category for filtered browsing

Indexes aren't free. Skip them when:
- The table is small: PostgreSQL will scan the whole thing regardless
- The table is write-heavy: every index must be updated on each write
- The column has low selectivity: an is_active column with 50/50 split won't benefit much

When a query is slow:
1. Run EXPLAIN ANALYZE on it
2. Check what indexes already exist in pg_indexes
3. Create the missing index and run EXPLAIN ANALYZE again

That's the workflow. Learn it once, use it every day.
Tags: #SQL #PostgreSQL #Indexes #Performance #EXPLAIN #QueryPlanning
Difficulty: Beginner
Topics: Indexes, EXPLAIN ANALYZE, Seq Scan, Index Scan, B-tree, Query Planning