A few years back, I was analyzing network traffic trying to figure out which servers were talking to each other the most. Millions of flow records. The boss wanted a visual.
I ended up building a flow diagram, you know, those charts that show connections between nodes. The hard part wasn't the visualization, it was the SQL to aggregate source→target→value tuples.
Here's what the end result looked like:
The pattern? Simple:
SELECT source, target, SUM(value) GROUP BY source, targetROW_NUMBER() OVER (ORDER BY value DESC)That same pattern works for any source→target→value data. Network traffic, supply chains, email threads, API calls...
But the most useful application I've found? Tracking where my money actually goes.
You have multiple bank accounts, credit cards, and investment accounts. Money flows between them all the time:
You want to visualize: Where is my money actually going?
DROP TABLE IF EXISTS transactions;
CREATE TABLE transactions (
id SERIAL PRIMARY KEY,
from_account TEXT NOT NULL,
to_account TEXT NOT NULL,
amount NUMERIC(10,2) NOT NULL,
category TEXT,
transaction_date DATE NOT NULL,
description TEXT
);
-- Sample data
INSERT INTO transactions (from_account, to_account, amount, category, transaction_date, description) VALUES
('Paycheck', 'Checking', 3500.00, 'Income', '2025-03-01', 'Salary deposit'),
('Checking', 'Savings', 500.00, 'Transfer', '2025-03-02', 'Monthly savings'),
('Checking', 'Credit Card', 1200.00, 'Payment', '2025-03-05', 'CC payment'),
('Credit Card', 'Groceries', 450.00, 'Expense', '2025-03-01', 'Whole Foods'),
('Credit Card', 'Restaurants', 280.00, 'Expense', '2025-03-03', 'Various dining'),
('Credit Card', 'Gas', 120.00, 'Expense', '2025-03-04', 'Shell station'),
('Checking', 'Rent', 1500.00, 'Expense', '2025-03-01', 'Monthly rent'),
('Checking', 'Utilities', 150.00, 'Expense', '2025-03-10', 'Electric + Internet'),
('Checking', 'Entertainment', 200.00, 'Expense', '2025-03-15', 'Movies + Games'),
('Investment', 'Checking', 45.00, 'Income', '2025-03-20', 'Dividend payment'),
('Paycheck', 'Checking', 3500.00, 'Income', '2025-03-15', 'Salary deposit'),
('Checking', 'Savings', 500.00, 'Transfer', '2025-03-16', 'Monthly savings'),
('Credit Card', 'Shopping', 340.00, 'Expense', '2025-03-12', 'Amazon orders'),
('Checking', 'Healthcare', 75.00, 'Expense', '2025-03-08', 'Doctor copay');
Same pattern, different domain:
WITH money_flows AS (
-- Step 1: Aggregate flows between accounts
SELECT
from_account AS source,
to_account AS target,
SUM(amount) AS total_amount,
COUNT(*) AS num_transactions
FROM transactions
WHERE transaction_date >= '2025-03-01'
AND transaction_date < '2025-04-01'
GROUP BY from_account, to_account
),
ranked_flows AS (
-- Step 2: Rank by dollar amount
SELECT
source,
target,
total_amount,
num_transactions,
ROW_NUMBER() OVER (ORDER BY total_amount DESC) AS rank
FROM money_flows
),
top_flows AS (
-- Step 3: Get top 15 flows
SELECT
source,
target,
total_amount AS value,
num_transactions
FROM ranked_flows
WHERE rank <= 15
)
-- Step 4: Format for visualization
SELECT
source,
target,
value,
'$' || ROUND(value, 2) AS formatted_value,
num_transactions
FROM top_flows
ORDER BY value DESC;
Output:
source | target | value | formatted_value | num_transactions
--------------|----------------|---------|-----------------|------------------
Paycheck | Checking | 7000.00 | $7000.00 | 2
Rent | Checking | 1500.00 | $1500.00 | 1
Credit Card | Checking | 1200.00 | $1200.00 | 1
Checking | Savings | 1000.00 | $1000.00 | 2
Groceries | Credit Card | 450.00 | $450.00 | 1
Shopping | Credit Card | 340.00 | $340.00 | 1
Restaurants | Credit Card | 280.00 | $280.00 | 1
Entertainment | Checking | 200.00 | $200.00 | 1
Utilities | Checking | 150.00 | $150.00 | 1
Gas | Credit Card | 120.00 | $120.00 | 1
Healthcare | Checking | 75.00 | $75.00 | 1
Dividend | Investment | 45.00 | $45.00 | 1
Visualization:
And just like that, you can see where your money is flowing!
Notice the pattern here? Whether it's:
src_ip → dst_ip with bytesfrom_account → to_account with amountwarehouse → store with unitssender → recipient with message_countservice_a → service_b with request_countFrom my experience building these:
Always limit to top N flows
Consider bidirectional flows
Add human-readable labels
Filter by time window
Watch for self-loops
WHERE source != target to exclude same-account transfersWant to get fancy? Show flows at multiple levels:
-- Budget example: Income → Checking → Categories → Specific Expenses
WITH level1_flows AS (
-- Income sources → Checking
SELECT 'Income' AS source, 'Checking' AS target, SUM(amount) AS value
FROM transactions
WHERE category = 'Income'
),
level2_flows AS (
-- Checking → Categories
SELECT 'Checking' AS source, category AS target, SUM(amount) AS value
FROM transactions
WHERE from_account = 'Checking' AND category != 'Transfer'
GROUP BY category
),
level3_flows AS (
-- Categories → Specific accounts
SELECT category AS source, to_account AS target, SUM(amount) AS value
FROM transactions
WHERE from_account = 'Checking' AND category NOT IN ('Income', 'Transfer')
GROUP BY category, to_account
)
SELECT * FROM level1_flows
UNION ALL
SELECT * FROM level2_flows
UNION ALL
SELECT * FROM level3_flows
ORDER BY value DESC;
This creates a multi-tier flow:
Income → Checking → [Rent, Groceries, Entertainment] → [Landlord, Whole Foods, Netflix]
Want to build your own Sankey flow query? Here's the template:
Identify your source and target
Define your metric
Aggregate the flows
SELECT source, target, SUM(metric) AS value
FROM table
GROUP BY source, target
Rank and filter
WHERE rank <= 10
ORDER BY value DESC
Format for your visualization tool
This pattern saved my ass multiple times:
The SQL is straightforward once you see the pattern. The hard part is recognizing when you have a flow problem in the first place.
Next time someone asks "where's all this [traffic/money/data] going?" — you'll know exactly what query to write.
Try it on SQLBook: Head over to sqlbook.io, or fork this book to spin up a PostgreSQL sandbox, and paste in the budget tracking example. Modify it for your own use case. See if you can generate valid Mermaid syntax that renders a flow diagram.
Challenge: Can you modify the query to:
Drop your solutions in the comments!
Tags: #SQL #DataVisualization #FlowDiagrams #NetworkAnalysis #PostgreSQL #Budgeting #FlowAnalysis
Difficulty: Intermediate
Topics: Aggregation, Window Functions, Data Visualization, CTEs
SQL Output