Tracing Supply Chains with SQL: From Factory to Your Doorstep

The Backstory

Years ago, I worked on a cable network monitoring system. We had to map out the entire physical topology: fiber nodes connecting to amplifiers, amplifiers to splitters, splitters to taps, and taps to homes. Thousands of devices in a tree structure.

The interesting part? Writing SQL queries to answer questions like "what's the path from the fiber node to this specific home?" or "if this amplifier fails, which homes lose service?"

Turns out, this same pattern shows up everywhere. Corporate org charts. Git commit graphs. File systems. And the one I'll show you today: supply chain networks.

The Problem

You're tracking products as they move through your supply chain. A widget starts at a manufacturer, goes through regional distributors, hits local warehouses, arrives at retail stores, and finally reaches customers.

Acme Factory (Manufacturer)
    |
    ├──> West Coast Distributor
    |       ├──> Bay Area Warehouse
    |       |      ├──> San Francisco Store
    |       |      └──> Oakland Store
    |       └──> LA Warehouse
    |              └──> LA Store
    |
    └──> East Coast Distributor
            ├──> NYC Warehouse
            |      ├──> Manhattan Store
            |      └──> Brooklyn Store
            └──> Boston Warehouse
                   └──> Boston Store

Questions you need to answer:

What's the full path from factory to a specific store?
If the Bay Area Warehouse goes down, which stores are affected?
How many hops does it take to get product to customers?
What's the total inventory at a node and everything downstream?

Let's build this in SQL.

The Schema

Much simpler than you'd think. One table with a self-referential foreign key:

DROP TABLE IF EXISTS supply_chain;
CREATE TABLE supply_chain (
  id SERIAL PRIMARY KEY,
  node_name TEXT NOT NULL,
  node_type TEXT NOT NULL, -- 'manufacturer', 'distributor', 'warehouse', 'store', 'customer'
  parent_id INTEGER REFERENCES supply_chain(id), -- NULL for root (manufacturer)
  location TEXT,
  inventory INTEGER DEFAULT 0,
  distance_from_parent_km NUMERIC(8,2),
  avg_transit_days INTEGER,
  active BOOLEAN DEFAULT true
);

-- Sample data: A widget supply chain
INSERT INTO supply_chain (id, node_name, node_type, parent_id, location, inventory, distance_from_parent_km, avg_transit_days) VALUES
  -- Manufacturer (root - no parent)
  (1, 'Acme Widget Factory', 'manufacturer', NULL, 'Shenzhen, China', 50000, NULL, NULL),
  
  -- Regional distributors (parent = manufacturer)
  (2, 'West Coast Distribution', 'distributor', 1, 'Los Angeles, CA', 8000, 11000, 14),
  (3, 'East Coast Distribution', 'distributor', 1, 'Newark, NJ', 7500, 12500, 16),
  
  -- Warehouses (parent = distributor)
  (4, 'Bay Area Warehouse', 'warehouse', 2, 'Oakland, CA', 2000, 560, 1),
  (5, 'LA Warehouse', 'warehouse', 2, 'Los Angeles, CA', 2500, 45, 0),
  (6, 'NYC Warehouse', 'warehouse', 3, 'Queens, NY', 1800, 15, 0),
  (7, 'Boston Warehouse', 'warehouse', 3, 'Boston, MA', 1500, 340, 1),
  
  -- Retail stores (parent = warehouse)
  (8, 'SF Downtown Store', 'store', 4, 'San Francisco, CA', 150, 25, 0),
  (9, 'Oakland Store', 'store', 4, 'Oakland, CA', 120, 8, 0),
  (10, 'LA Beach Store', 'store', 5, 'Santa Monica, CA', 180, 30, 0),
  (11, 'Manhattan Flagship', 'store', 6, 'New York, NY', 200, 12, 0),
  (12, 'Brooklyn Store', 'store', 6, 'Brooklyn, NY', 100, 18, 0),
  (13, 'Boston Commons Store', 'store', 7, 'Boston, MA', 130, 5, 0),
  
  -- Customers (parent = store)
  (14, 'Customer: Alice', 'customer', 8, 'San Francisco, CA', 0, 2, 0),
  (15, 'Customer: Bob', 'customer', 9, 'Oakland, CA', 0, 3, 0),
  (16, 'Customer: Carol', 'customer', 12, 'Brooklyn, NY', 0, 5, 0);

-- Index for performance
CREATE INDEX idx_supply_chain_parent ON supply_chain(parent_id);

Query 1: Find the Path from Factory to Customer

The classic recursive CTE. Find the complete path from manufacturer to a specific customer.

WITH RECURSIVE supply_path AS (
  -- Base case: start at the manufacturer (no parent)
  SELECT 
    id,
    node_name,
    node_type,
    location,
    parent_id,
    ARRAY[id] AS path_ids,
    ARRAY[node_name] AS path_names,
    0 AS depth
  FROM supply_chain
  WHERE parent_id IS NULL  -- root node
  
  UNION ALL
  
  -- Recursive case: follow parent-child relationships
  SELECT 
    c.id,
    c.node_name,
    c.node_type,
    c.location,
    c.parent_id,
    sp.path_ids || c.id,
    sp.path_names || c.node_name,
    sp.depth + 1
  FROM supply_chain c
  JOIN supply_path sp ON c.parent_id = sp.id
)
SELECT 
  node_name AS destination,
  depth AS hops_from_factory,
  array_to_string(path_names, ' → ') AS full_path
FROM supply_path
WHERE node_name = 'Customer: Alice'  -- Find path to specific customer
ORDER BY depth;

Output:

destination      | hops_from_factory | full_path
-----------------|-------------------|----------------------------------------------------------
Customer: Alice  | 4                 | Acme Widget Factory → West Coast Distribution → Bay Area Warehouse → SF Downtown Store → Customer: Alice

Try it yourself: Change 'Customer: Alice' to 'Brooklyn Store' to see the path to that store.

Query 2: Find All Downstream Nodes (Who Gets Affected?)

If a warehouse goes offline, which stores lose supply?

WITH RECURSIVE downstream AS (
  -- Base case: start at the node we're analyzing
  SELECT 
    id,
    node_name,
    node_type,
    0 AS depth
  FROM supply_chain
  WHERE node_name = 'Bay Area Warehouse'  -- Starting point
  
  UNION ALL
  
  -- Recursive case: find all children (and their children, etc.)
  SELECT 
    c.id,
    c.node_name,
    c.node_type,
    d.depth + 1
  FROM supply_chain c
  JOIN downstream d ON c.parent_id = d.id
)
SELECT 
  depth,
  node_type,
  node_name,
  CASE 
    WHEN depth = 0 THEN 'FAILED NODE'
    ELSE 'Impacted'
  END AS status
FROM downstream
ORDER BY depth, node_type, node_name;

Output:

depth | node_type | node_name              | status
------|-----------|------------------------|---------------
0     | warehouse | Bay Area Warehouse     | FAILED NODE
1     | store     | Oakland Store          | Impacted
1     | store     | SF Downtown Store      | Impacted
2     | customer  | Customer: Alice        | Impacted
2     | customer  | Customer: Bob          | Impacted

Business value: Instantly see the blast radius of a warehouse failure. 2 stores and 2 customers affected.

Query 3: Find All Upstream Suppliers (Where Does This Come From?)

Going the other direction. For a store, trace back to the manufacturer.

WITH RECURSIVE upstream AS (
  -- Base case: start at the node we're tracing
  SELECT 
    id,
    node_name,
    node_type,
    parent_id,
    ARRAY[node_name] AS supply_path,
    0 AS depth
  FROM supply_chain
  WHERE node_name = 'Manhattan Flagship'
  
  UNION ALL
  
  -- Recursive case: follow parent relationships up the tree
  SELECT 
    p.id,
    p.node_name,
    p.node_type,
    p.parent_id,
    p.node_name || u.supply_path,
    u.depth + 1
  FROM supply_chain p
  JOIN upstream u ON p.id = u.parent_id
)
SELECT 
  depth,
  node_name,
  node_type,
  array_to_string(supply_path, ' ← ') AS reverse_path
FROM upstream
ORDER BY depth DESC;

Output:

depth | node_name                 | node_type    | reverse_path
------|---------------------------|--------------|----------------------------------------------------------
3     | Acme Widget Factory       | manufacturer | Manhattan Flagship ← NYC Warehouse ← East Coast Distribution ← Acme Widget Factory
2     | East Coast Distribution   | distributor  | Manhattan Flagship ← NYC Warehouse ← East Coast Distribution
1     | NYC Warehouse             | warehouse    | Manhattan Flagship ← NYC Warehouse
0     | Manhattan Flagship        | store        | Manhattan Flagship

Query 4: Calculate Total Downstream Inventory

How much inventory is at a node plus everything downstream from it?

WITH RECURSIVE downstream_inventory AS (
  -- Base case: start at the node we want to analyze
  SELECT 
    id,
    node_name,
    inventory,
    0 AS depth
  FROM supply_chain
  WHERE node_name = 'West Coast Distribution'
  
  UNION ALL
  
  -- Recursive case: accumulate all children's inventory
  SELECT 
    c.id,
    c.node_name,
    c.inventory,
    di.depth + 1
  FROM supply_chain c
  JOIN downstream_inventory di ON c.parent_id = di.id
)
SELECT 
  (SELECT node_name FROM supply_chain WHERE node_name = 'West Coast Distribution') AS starting_node,
  SUM(inventory) AS total_downstream_inventory,
  COUNT(*) AS total_downstream_nodes,
  SUM(inventory) FILTER (WHERE depth = 0) AS inventory_at_node,
  SUM(inventory) FILTER (WHERE depth > 0) AS inventory_downstream
FROM downstream_inventory;

Output:

starting_node            | total_downstream_inventory | total_downstream_nodes | inventory_at_node | inventory_downstream
-------------------------|----------------------------|------------------------|-------------------|---------------------
West Coast Distribution  | 13050                      | 7                      | 8000              | 5050

What this tells you: West Coast Distribution has 8,000 units itself, but controls 13,050 units total when you include all warehouses and stores downstream.

Query 5: Find Depth of Each Node (Supply Chain Tiers)

How many hops is each node from the manufacturer?

WITH RECURSIVE node_depth AS (
  -- Base case: manufacturer is depth 0 (no parent)
  SELECT 
    id,
    node_name,
    node_type,
    parent_id,
    0 AS depth,
    node_name AS path
  FROM supply_chain
  WHERE parent_id IS NULL
  
  UNION ALL
  
  -- Recursive case: increment depth for each child
  SELECT 
    c.id,
    c.node_name,
    c.node_type,
    c.parent_id,
    nd.depth + 1,
    nd.path || ' → ' || c.node_name
  FROM supply_chain c
  JOIN node_depth nd ON c.parent_id = nd.id
)
SELECT 
  depth AS tier,
  node_type,
  node_name,
  path
FROM node_depth
ORDER BY depth, node_type, node_name;

Output:

tier | node_type    | node_name                 | path
-----|--------------|---------------------------|----------------------------------------------------------
0    | manufacturer | Acme Widget Factory       | Acme Widget Factory
1    | distributor  | East Coast Distribution   | Acme Widget Factory → East Coast Distribution
1    | distributor  | West Coast Distribution   | Acme Widget Factory → West Coast Distribution
2    | warehouse    | Bay Area Warehouse        | Acme Widget Factory → West Coast Distribution → Bay Area Warehouse
2    | warehouse    | Boston Warehouse          | Acme Widget Factory → East Coast Distribution → Boston Warehouse
2    | warehouse    | LA Warehouse              | Acme Widget Factory → West Coast Distribution → LA Warehouse
2    | warehouse    | NYC Warehouse             | Acme Widget Factory → East Coast Distribution → NYC Warehouse
3    | store        | Boston Commons Store      | Acme Widget Factory → East Coast Distribution → Boston Warehouse → Boston Commons Store
3    | store        | Brooklyn Store            | Acme Widget Factory → East Coast Distribution → NYC Warehouse → Brooklyn Store
3    | store        | LA Beach Store            | Acme Widget Factory → West Coast Distribution → LA Warehouse → LA Beach Store
3    | store        | Manhattan Flagship        | Acme Widget Factory → East Coast Distribution → NYC Warehouse → Manhattan Flagship
3    | store        | Oakland Store             | Acme Widget Factory → West Coast Distribution → Bay Area Warehouse → Oakland Store
3    | store        | SF Downtown Store         | Acme Widget Factory → West Coast Distribution → Bay Area Warehouse → SF Downtown Store
4    | customer     | Customer: Alice           | Acme Widget Factory → West Coast Distribution → Bay Area Warehouse → SF Downtown Store → Customer: Alice
4    | customer     | Customer: Bob             | Acme Widget Factory → West Coast Distribution → Bay Area Warehouse → Oakland Store → Customer: Bob
4    | customer     | Customer: Carol           | Acme Widget Factory → East Coast Distribution → NYC Warehouse → Brooklyn Store → Customer: Carol

Business insight: Your supply chain is 4 tiers deep. Customers are 4 hops from the manufacturer.

Query 6: Find Bottleneck Nodes (Most Traffic Passes Through)

Which nodes have the most downstream dependents?

WITH RECURSIVE downstream_count AS (
  -- Base case: every node starts with itself
  SELECT 
    id,
    node_name,
    node_type,
    id AS downstream_id
  FROM supply_chain
  
  UNION ALL
  
  -- Recursive case: add all children as downstream nodes
  SELECT 
    dc.id,
    dc.node_name,
    dc.node_type,
    c.id AS downstream_id
  FROM supply_chain c
  JOIN downstream_count dc ON c.parent_id = dc.downstream_id
)
SELECT 
  node_name,
  node_type,
  COUNT(DISTINCT downstream_id) - 1 AS downstream_nodes_count,
  CASE 
    WHEN COUNT(DISTINCT downstream_id) - 1 >= 6 THEN 'Critical bottleneck'
    WHEN COUNT(DISTINCT downstream_id) - 1 >= 3 THEN 'Important node'
    ELSE 'Leaf/minor node'
  END AS criticality
FROM downstream_count
GROUP BY id, node_name, node_type
ORDER BY downstream_nodes_count DESC
LIMIT 10;

Output:

node_name                 | node_type    | downstream_nodes_count | criticality
--------------------------|--------------|------------------------|--------------------
Acme Widget Factory       | manufacturer | 15                     | Critical bottleneck
West Coast Distribution   | distributor  | 6                      | Critical bottleneck
East Coast Distribution   | distributor  | 7                      | Critical bottleneck
Bay Area Warehouse        | warehouse    | 4                      | Important node
NYC Warehouse             | warehouse    | 3                      | Important node
LA Warehouse              | warehouse    | 1                      | Leaf/minor node
Boston Warehouse          | warehouse    | 1                      | Leaf/minor node
SF Downtown Store         | store        | 1                      | Leaf/minor node
Oakland Store             | store        | 1                      | Leaf/minor node
LA Beach Store            | store        | 0                      | Leaf/minor node

Risk management: If Bay Area Warehouse fails, 4 downstream nodes lose supply. Need redundancy.

The Pattern

Notice the pattern? Whether it's:

Cable networks: fiber node → amplifier → splitter → tap → home
Supply chains: manufacturer → distributor → warehouse → store → customer
Corporate org: CEO → VP → director → manager → employee
Git repos: root commit → feature branch → merge commit

The SQL is always the same.

That's it. One table, one self-referential foreign key, one recursive CTE. Master this pattern once, use it everywhere.

Real-World Applications

I've used this pattern for:

Network topology (the original cable plant monitoring)
Incident impact analysis (if this server fails, what breaks?)
Bill of materials (what components make up this product?)
Dependency graphs (which packages depend on this library?)
Access control (who reports to whom? who can approve what?)
Content hierarchies (website navigation, folder structures)

The queries change slightly, but the recursive CTE structure stays the same.

Performance Tips

For large graphs (10,000+ nodes):

Add depth limits to prevent runaway recursion:
```
WHERE depth < 20  -- max depth
```

Index the parent_id column (we already did this):

CREATE INDEX idx_supply_chain_parent ON supply_chain(parent_id);

Materialize common paths if you query them frequently:

CREATE MATERIALIZED VIEW common_paths AS
WITH RECURSIVE ... ;

For pure trees (no cycles), you don't need cycle detection. But if your graph can have cycles (like if stores can transfer to each other), add:
```
WHERE NOT c.id = ANY(path_array)
```

Try It Yourself

Head over to SQLBook.io, spin up a PostgreSQL sandbox, and paste in the schema and queries above.

Challenges:

Add a new warehouse between West Coast Distribution and Bay Area Warehouse. How do the paths change?
Find the average number of hops from manufacturer to all stores
Identify which distributor serves the most customers (count downstream customers)
Calculate total transit time from factory to each store (sum up avg_transit_days)
Find stores that can be supplied by multiple paths (redundancy check)

Advanced:

What if a product can take multiple paths? (Add alternate routes and find shortest path)
How would you handle cycles in the graph? (What if stores can ship to each other?)
Can you generate a Graphviz DOT file instead of Mermaid?

Share your solutions!

Tags: #SQL #RecursiveCTE #GraphTraversal #SupplyChain #PostgreSQL #TreeStructures

Difficulty: Intermediate-Advanced
Topics: Recursive CTEs, Graph Traversal, Tree Structures, Path Finding