Years ago, I worked on a cable network monitoring system. We had to map out the entire physical topology: fiber nodes connecting to amplifiers, amplifiers to splitters, splitters to taps, and taps to homes. Thousands of devices in a tree structure.
The interesting part? Writing SQL queries to answer questions like "what's the path from the fiber node to this specific home?" or "if this amplifier fails, which homes lose service?"
Turns out, this same pattern shows up everywhere. Corporate org charts. Git commit graphs. File systems. And the one I'll show you today: supply chain networks.
You're tracking products as they move through your supply chain. A widget starts at a manufacturer, goes through regional distributors, hits local warehouses, arrives at retail stores, and finally reaches customers.
Acme Factory (Manufacturer)
|
├──> West Coast Distributor
| ├──> Bay Area Warehouse
| | ├──> San Francisco Store
| | └──> Oakland Store
| └──> LA Warehouse
| └──> LA Store
|
└──> East Coast Distributor
├──> NYC Warehouse
| ├──> Manhattan Store
| └──> Brooklyn Store
└──> Boston Warehouse
└──> Boston Store
Questions you need to answer:
Let's build this in SQL.
Much simpler than you'd think. One table with a self-referential foreign key:
DROP TABLE IF EXISTS supply_chain;
CREATE TABLE supply_chain (
id SERIAL PRIMARY KEY,
node_name TEXT NOT NULL,
node_type TEXT NOT NULL, -- 'manufacturer', 'distributor', 'warehouse', 'store', 'customer'
parent_id INTEGER REFERENCES supply_chain(id), -- NULL for root (manufacturer)
location TEXT,
inventory INTEGER DEFAULT 0,
distance_from_parent_km NUMERIC(8,2),
avg_transit_days INTEGER,
active BOOLEAN DEFAULT true
);
-- Sample data: A widget supply chain
INSERT INTO supply_chain (id, node_name, node_type, parent_id, location, inventory, distance_from_parent_km, avg_transit_days) VALUES
-- Manufacturer (root - no parent)
(1, 'Acme Widget Factory', 'manufacturer', NULL, 'Shenzhen, China', 50000, NULL, NULL),
-- Regional distributors (parent = manufacturer)
(2, 'West Coast Distribution', 'distributor', 1, 'Los Angeles, CA', 8000, 11000, 14),
(3, 'East Coast Distribution', 'distributor', 1, 'Newark, NJ', 7500, 12500, 16),
-- Warehouses (parent = distributor)
(4, 'Bay Area Warehouse', 'warehouse', 2, 'Oakland, CA', 2000, 560, 1),
(5, 'LA Warehouse', 'warehouse', 2, 'Los Angeles, CA', 2500, 45, 0),
(6, 'NYC Warehouse', 'warehouse', 3, 'Queens, NY', 1800, 15, 0),
(7, 'Boston Warehouse', 'warehouse', 3, 'Boston, MA', 1500, 340, 1),
-- Retail stores (parent = warehouse)
(8, 'SF Downtown Store', 'store', 4, 'San Francisco, CA', 150, 25, 0),
(9, 'Oakland Store', 'store', 4, 'Oakland, CA', 120, 8, 0),
(10, 'LA Beach Store', 'store', 5, 'Santa Monica, CA', 180, 30, 0),
(11, 'Manhattan Flagship', 'store', 6, 'New York, NY', 200, 12, 0),
(12, 'Brooklyn Store', 'store', 6, 'Brooklyn, NY', 100, 18, 0),
(13, 'Boston Commons Store', 'store', 7, 'Boston, MA', 130, 5, 0),
-- Customers (parent = store)
(14, 'Customer: Alice', 'customer', 8, 'San Francisco, CA', 0, 2, 0),
(15, 'Customer: Bob', 'customer', 9, 'Oakland, CA', 0, 3, 0),
(16, 'Customer: Carol', 'customer', 12, 'Brooklyn, NY', 0, 5, 0);
-- Index for performance
CREATE INDEX idx_supply_chain_parent ON supply_chain(parent_id);
The classic recursive CTE. Find the complete path from manufacturer to a specific customer.
WITH RECURSIVE supply_path AS (
-- Base case: start at the manufacturer (no parent)
SELECT
id,
node_name,
node_type,
location,
parent_id,
ARRAY[id] AS path_ids,
ARRAY[node_name] AS path_names,
0 AS depth
FROM supply_chain
WHERE parent_id IS NULL -- root node
UNION ALL
-- Recursive case: follow parent-child relationships
SELECT
c.id,
c.node_name,
c.node_type,
c.location,
c.parent_id,
sp.path_ids || c.id,
sp.path_names || c.node_name,
sp.depth + 1
FROM supply_chain c
JOIN supply_path sp ON c.parent_id = sp.id
)
SELECT
node_name AS destination,
depth AS hops_from_factory,
array_to_string(path_names, ' → ') AS full_path
FROM supply_path
WHERE node_name = 'Customer: Alice' -- Find path to specific customer
ORDER BY depth;
Output:
destination | hops_from_factory | full_path
-----------------|-------------------|----------------------------------------------------------
Customer: Alice | 4 | Acme Widget Factory → West Coast Distribution → Bay Area Warehouse → SF Downtown Store → Customer: Alice
Try it yourself: Change 'Customer: Alice' to 'Brooklyn Store' to see the path to that store.
If a warehouse goes offline, which stores lose supply?
WITH RECURSIVE downstream AS (
-- Base case: start at the node we're analyzing
SELECT
id,
node_name,
node_type,
0 AS depth
FROM supply_chain
WHERE node_name = 'Bay Area Warehouse' -- Starting point
UNION ALL
-- Recursive case: find all children (and their children, etc.)
SELECT
c.id,
c.node_name,
c.node_type,
d.depth + 1
FROM supply_chain c
JOIN downstream d ON c.parent_id = d.id
)
SELECT
depth,
node_type,
node_name,
CASE
WHEN depth = 0 THEN 'FAILED NODE'
ELSE 'Impacted'
END AS status
FROM downstream
ORDER BY depth, node_type, node_name;
Output:
depth | node_type | node_name | status
------|-----------|------------------------|---------------
0 | warehouse | Bay Area Warehouse | FAILED NODE
1 | store | Oakland Store | Impacted
1 | store | SF Downtown Store | Impacted
2 | customer | Customer: Alice | Impacted
2 | customer | Customer: Bob | Impacted
Business value: Instantly see the blast radius of a warehouse failure. 2 stores and 2 customers affected.
Going the other direction. For a store, trace back to the manufacturer.
WITH RECURSIVE upstream AS (
-- Base case: start at the node we're tracing
SELECT
id,
node_name,
node_type,
parent_id,
ARRAY[node_name] AS supply_path,
0 AS depth
FROM supply_chain
WHERE node_name = 'Manhattan Flagship'
UNION ALL
-- Recursive case: follow parent relationships up the tree
SELECT
p.id,
p.node_name,
p.node_type,
p.parent_id,
p.node_name || u.supply_path,
u.depth + 1
FROM supply_chain p
JOIN upstream u ON p.id = u.parent_id
)
SELECT
depth,
node_name,
node_type,
array_to_string(supply_path, ' ← ') AS reverse_path
FROM upstream
ORDER BY depth DESC;
Output:
depth | node_name | node_type | reverse_path
------|---------------------------|--------------|----------------------------------------------------------
3 | Acme Widget Factory | manufacturer | Manhattan Flagship ← NYC Warehouse ← East Coast Distribution ← Acme Widget Factory
2 | East Coast Distribution | distributor | Manhattan Flagship ← NYC Warehouse ← East Coast Distribution
1 | NYC Warehouse | warehouse | Manhattan Flagship ← NYC Warehouse
0 | Manhattan Flagship | store | Manhattan Flagship
How much inventory is at a node plus everything downstream from it?
WITH RECURSIVE downstream_inventory AS (
-- Base case: start at the node we want to analyze
SELECT
id,
node_name,
inventory,
0 AS depth
FROM supply_chain
WHERE node_name = 'West Coast Distribution'
UNION ALL
-- Recursive case: accumulate all children's inventory
SELECT
c.id,
c.node_name,
c.inventory,
di.depth + 1
FROM supply_chain c
JOIN downstream_inventory di ON c.parent_id = di.id
)
SELECT
(SELECT node_name FROM supply_chain WHERE node_name = 'West Coast Distribution') AS starting_node,
SUM(inventory) AS total_downstream_inventory,
COUNT(*) AS total_downstream_nodes,
SUM(inventory) FILTER (WHERE depth = 0) AS inventory_at_node,
SUM(inventory) FILTER (WHERE depth > 0) AS inventory_downstream
FROM downstream_inventory;
Output:
starting_node | total_downstream_inventory | total_downstream_nodes | inventory_at_node | inventory_downstream
-------------------------|----------------------------|------------------------|-------------------|---------------------
West Coast Distribution | 13050 | 7 | 8000 | 5050
What this tells you: West Coast Distribution has 8,000 units itself, but controls 13,050 units total when you include all warehouses and stores downstream.
How many hops is each node from the manufacturer?
WITH RECURSIVE node_depth AS (
-- Base case: manufacturer is depth 0 (no parent)
SELECT
id,
node_name,
node_type,
parent_id,
0 AS depth,
node_name AS path
FROM supply_chain
WHERE parent_id IS NULL
UNION ALL
-- Recursive case: increment depth for each child
SELECT
c.id,
c.node_name,
c.node_type,
c.parent_id,
nd.depth + 1,
nd.path || ' → ' || c.node_name
FROM supply_chain c
JOIN node_depth nd ON c.parent_id = nd.id
)
SELECT
depth AS tier,
node_type,
node_name,
path
FROM node_depth
ORDER BY depth, node_type, node_name;
Output:
tier | node_type | node_name | path
-----|--------------|---------------------------|----------------------------------------------------------
0 | manufacturer | Acme Widget Factory | Acme Widget Factory
1 | distributor | East Coast Distribution | Acme Widget Factory → East Coast Distribution
1 | distributor | West Coast Distribution | Acme Widget Factory → West Coast Distribution
2 | warehouse | Bay Area Warehouse | Acme Widget Factory → West Coast Distribution → Bay Area Warehouse
2 | warehouse | Boston Warehouse | Acme Widget Factory → East Coast Distribution → Boston Warehouse
2 | warehouse | LA Warehouse | Acme Widget Factory → West Coast Distribution → LA Warehouse
2 | warehouse | NYC Warehouse | Acme Widget Factory → East Coast Distribution → NYC Warehouse
3 | store | Boston Commons Store | Acme Widget Factory → East Coast Distribution → Boston Warehouse → Boston Commons Store
3 | store | Brooklyn Store | Acme Widget Factory → East Coast Distribution → NYC Warehouse → Brooklyn Store
3 | store | LA Beach Store | Acme Widget Factory → West Coast Distribution → LA Warehouse → LA Beach Store
3 | store | Manhattan Flagship | Acme Widget Factory → East Coast Distribution → NYC Warehouse → Manhattan Flagship
3 | store | Oakland Store | Acme Widget Factory → West Coast Distribution → Bay Area Warehouse → Oakland Store
3 | store | SF Downtown Store | Acme Widget Factory → West Coast Distribution → Bay Area Warehouse → SF Downtown Store
4 | customer | Customer: Alice | Acme Widget Factory → West Coast Distribution → Bay Area Warehouse → SF Downtown Store → Customer: Alice
4 | customer | Customer: Bob | Acme Widget Factory → West Coast Distribution → Bay Area Warehouse → Oakland Store → Customer: Bob
4 | customer | Customer: Carol | Acme Widget Factory → East Coast Distribution → NYC Warehouse → Brooklyn Store → Customer: Carol
Business insight: Your supply chain is 4 tiers deep. Customers are 4 hops from the manufacturer.
Which nodes have the most downstream dependents?
WITH RECURSIVE downstream_count AS (
-- Base case: every node starts with itself
SELECT
id,
node_name,
node_type,
id AS downstream_id
FROM supply_chain
UNION ALL
-- Recursive case: add all children as downstream nodes
SELECT
dc.id,
dc.node_name,
dc.node_type,
c.id AS downstream_id
FROM supply_chain c
JOIN downstream_count dc ON c.parent_id = dc.downstream_id
)
SELECT
node_name,
node_type,
COUNT(DISTINCT downstream_id) - 1 AS downstream_nodes_count,
CASE
WHEN COUNT(DISTINCT downstream_id) - 1 >= 6 THEN 'Critical bottleneck'
WHEN COUNT(DISTINCT downstream_id) - 1 >= 3 THEN 'Important node'
ELSE 'Leaf/minor node'
END AS criticality
FROM downstream_count
GROUP BY id, node_name, node_type
ORDER BY downstream_nodes_count DESC
LIMIT 10;
Output:
node_name | node_type | downstream_nodes_count | criticality
--------------------------|--------------|------------------------|--------------------
Acme Widget Factory | manufacturer | 15 | Critical bottleneck
West Coast Distribution | distributor | 6 | Critical bottleneck
East Coast Distribution | distributor | 7 | Critical bottleneck
Bay Area Warehouse | warehouse | 4 | Important node
NYC Warehouse | warehouse | 3 | Important node
LA Warehouse | warehouse | 1 | Leaf/minor node
Boston Warehouse | warehouse | 1 | Leaf/minor node
SF Downtown Store | store | 1 | Leaf/minor node
Oakland Store | store | 1 | Leaf/minor node
LA Beach Store | store | 0 | Leaf/minor node
Risk management: If Bay Area Warehouse fails, 4 downstream nodes lose supply. Need redundancy.
Notice the pattern? Whether it's:
The SQL is always the same.
That's it. One table, one self-referential foreign key, one recursive CTE. Master this pattern once, use it everywhere.
I've used this pattern for:
The queries change slightly, but the recursive CTE structure stays the same.
For large graphs (10,000+ nodes):
Add depth limits to prevent runaway recursion:
WHERE depth < 20 -- max depth
Index the parent_id column (we already did this):
CREATE INDEX idx_supply_chain_parent ON supply_chain(parent_id);
Materialize common paths if you query them frequently:
CREATE MATERIALIZED VIEW common_paths AS WITH RECURSIVE ... ;
For pure trees (no cycles), you don't need cycle detection. But if your graph can have cycles (like if stores can transfer to each other), add:
WHERE NOT c.id = ANY(path_array)
Head over to SQLBook.io, spin up a PostgreSQL sandbox, and paste in the schema and queries above.
Challenges:
Advanced:
Share your solutions!
Tags: #SQL #RecursiveCTE #GraphTraversal #SupplyChain #PostgreSQL #TreeStructures
Difficulty: Intermediate-Advanced
Topics: Recursive CTEs, Graph Traversal, Tree Structures, Path Finding
SQL Output