SQL for Flow Diagrams: Visualizing Money Flows Between Accounts

The Backstory

A few years back, I was analyzing network traffic trying to figure out which servers were talking to each other the most. Millions of flow records. The boss wanted a visual.

I ended up building a flow diagram, you know, those charts that show connections between nodes. The hard part wasn't the visualization, it was the SQL to aggregate source→target→value tuples.

Here's what the end result looked like:

graph LR Server_A[Server A] -->|25.3GB| Database[Database Server] Server_B[Server B] -->|18.7GB| Database[Database Server] Server_C[Server C] -->|12.4GB| API_Gateway[API Gateway] API_Gateway[API Gateway] -->|8.9GB| Server_A[Server A]

The pattern? Simple:

Aggregate flows: SELECT source, target, SUM(value) GROUP BY source, target
Rank by volume: ROW_NUMBER() OVER (ORDER BY value DESC)
Take top N flows
Generate visualization syntax

That same pattern works for any source→target→value data. Network traffic, supply chains, email threads, API calls...

But the most useful application I've found? Tracking where my money actually goes.

The Real Problem: Budget Tracking

The Budgeting Problem

You have multiple bank accounts, credit cards, and investment accounts. Money flows between them all the time:

Paycheck → Checking
Checking → Savings
Checking → Credit Card (payment)
Credit Card → Expenses
Investment Account → Checking (dividend)

You want to visualize: Where is my money actually going?

The Schema

DROP TABLE IF EXISTS transactions;
CREATE TABLE transactions (
  id SERIAL PRIMARY KEY,
  from_account TEXT NOT NULL,
  to_account TEXT NOT NULL,
  amount NUMERIC(10,2) NOT NULL,
  category TEXT,
  transaction_date DATE NOT NULL,
  description TEXT
);

-- Sample data
INSERT INTO transactions (from_account, to_account, amount, category, transaction_date, description) VALUES
  ('Paycheck', 'Checking', 3500.00, 'Income', '2025-03-01', 'Salary deposit'),
  ('Checking', 'Savings', 500.00, 'Transfer', '2025-03-02', 'Monthly savings'),
  ('Checking', 'Credit Card', 1200.00, 'Payment', '2025-03-05', 'CC payment'),
  ('Credit Card', 'Groceries', 450.00, 'Expense', '2025-03-01', 'Whole Foods'),
  ('Credit Card', 'Restaurants', 280.00, 'Expense', '2025-03-03', 'Various dining'),
  ('Credit Card', 'Gas', 120.00, 'Expense', '2025-03-04', 'Shell station'),
  ('Checking', 'Rent', 1500.00, 'Expense', '2025-03-01', 'Monthly rent'),
  ('Checking', 'Utilities', 150.00, 'Expense', '2025-03-10', 'Electric + Internet'),
  ('Checking', 'Entertainment', 200.00, 'Expense', '2025-03-15', 'Movies + Games'),
  ('Investment', 'Checking', 45.00, 'Income', '2025-03-20', 'Dividend payment'),
  ('Paycheck', 'Checking', 3500.00, 'Income', '2025-03-15', 'Salary deposit'),
  ('Checking', 'Savings', 500.00, 'Transfer', '2025-03-16', 'Monthly savings'),
  ('Credit Card', 'Shopping', 340.00, 'Expense', '2025-03-12', 'Amazon orders'),
  ('Checking', 'Healthcare', 75.00, 'Expense', '2025-03-08', 'Doctor copay');

The Money Flow Query

Same pattern, different domain:

WITH money_flows AS (
  -- Step 1: Aggregate flows between accounts
  SELECT 
    from_account AS source,
    to_account AS target,
    SUM(amount) AS total_amount,
    COUNT(*) AS num_transactions
  FROM transactions
  WHERE transaction_date >= '2025-03-01' 
    AND transaction_date < '2025-04-01'
  GROUP BY from_account, to_account
),
ranked_flows AS (
  -- Step 2: Rank by dollar amount
  SELECT 
    source,
    target,
    total_amount,
    num_transactions,
    ROW_NUMBER() OVER (ORDER BY total_amount DESC) AS rank
  FROM money_flows
),
top_flows AS (
  -- Step 3: Get top 15 flows
  SELECT 
    source,
    target,
    total_amount AS value,
    num_transactions
  FROM ranked_flows
  WHERE rank <= 15
)
-- Step 4: Format for visualization
SELECT 
  source,
  target,
  value,
  '$' || ROUND(value, 2) AS formatted_value,
  num_transactions
FROM top_flows
ORDER BY value DESC;

Output:

source        | target         | value   | formatted_value | num_transactions
--------------|----------------|---------|-----------------|------------------
Paycheck      | Checking       | 7000.00 | $7000.00        | 2
Rent          | Checking       | 1500.00 | $1500.00        | 1
Credit Card   | Checking       | 1200.00 | $1200.00        | 1
Checking      | Savings        | 1000.00 | $1000.00        | 2
Groceries     | Credit Card    | 450.00  | $450.00         | 1
Shopping      | Credit Card    | 340.00  | $340.00         | 1
Restaurants   | Credit Card    | 280.00  | $280.00         | 1
Entertainment | Checking       | 200.00  | $200.00         | 1
Utilities     | Checking       | 150.00  | $150.00         | 1
Gas           | Credit Card    | 120.00  | $120.00         | 1
Healthcare    | Checking       | 75.00   | $75.00          | 1
Dividend      | Investment     | 45.00   | $45.00          | 1

Visualization:

graph LR Paycheck[Paycheck] -->|$7000| Checking[Checking] Checking[Checking] -->|$1500| Rent[Rent] Checking[Checking] -->|$1200| Credit_Card[Credit Card] Checking[Checking] -->|$1000| Savings[Savings] Credit_Card[Credit Card] -->|$450| Groceries[Groceries] Credit_Card[Credit Card] -->|$340| Shopping[Shopping] Credit_Card[Credit Card] -->|$280| Restaurants[Restaurants] Checking[Checking] -->|$200| Entertainment[Entertainment] Checking[Checking] -->|$150| Utilities[Utilities] Credit_Card[Credit Card] -->|$120| Gas[Gas] Checking[Checking] -->|$75| Healthcare[Healthcare] Investment[Investment] -->|$45| Checking[Checking]

And just like that, you can see where your money is flowing!

The Pattern

Notice the pattern here? Whether it's:

Network traffic: src_ip → dst_ip with bytes
Money flows: from_account → to_account with amount
Supply chain: warehouse → store with units
Email threads: sender → recipient with message_count
API calls: service_a → service_b with request_count

Real-World Tips

From my experience building these:

Always limit to top N flows
- Sankey diagrams get messy with 100+ nodes
- Top 10-20 flows usually tell the story
Consider bidirectional flows
- Networks: A→B and B→A might be the same conversation
- Money: They're definitely different (income vs. expense)
Add human-readable labels
- Convert IPs to hostnames
- Add port numbers → service names (443 → HTTPS)
- Format large numbers (7523234 bytes → 7.2 MB)
Filter by time window
- "Last 24 hours" or "This month"
- Rolling windows are more useful than all-time
Watch for self-loops
- WHERE source != target to exclude same-account transfers
- Unless that's interesting data (servers talking to themselves)

Advanced: Multi-Level Flow Diagrams

Want to get fancy? Show flows at multiple levels:

-- Budget example: Income → Checking → Categories → Specific Expenses
WITH level1_flows AS (
  -- Income sources → Checking
  SELECT 'Income' AS source, 'Checking' AS target, SUM(amount) AS value
  FROM transactions
  WHERE category = 'Income'
),
level2_flows AS (
  -- Checking → Categories
  SELECT 'Checking' AS source, category AS target, SUM(amount) AS value
  FROM transactions
  WHERE from_account = 'Checking' AND category != 'Transfer'
  GROUP BY category
),
level3_flows AS (
  -- Categories → Specific accounts
  SELECT category AS source, to_account AS target, SUM(amount) AS value
  FROM transactions
  WHERE from_account = 'Checking' AND category NOT IN ('Income', 'Transfer')
  GROUP BY category, to_account
)
SELECT * FROM level1_flows
UNION ALL
SELECT * FROM level2_flows
UNION ALL
SELECT * FROM level3_flows
ORDER BY value DESC;

This creates a multi-tier flow:

Income → Checking → [Rent, Groceries, Entertainment] → [Landlord, Whole Foods, Netflix]

Try It Yourself

Want to build your own Sankey flow query? Here's the template:

Identify your source and target
- What entities are connected?
- What direction does the flow go?
Define your metric
- What are you measuring? (bytes, dollars, count, etc.)

Aggregate the flows

SELECT source, target, SUM(metric) AS value
FROM table
GROUP BY source, target

Rank and filter
```
WHERE rank <= 10
ORDER BY value DESC
```
Format for your visualization tool
- Mermaid syntax
- D3.js JSON
- Plotly format
- Whatever your frontend expects

Wrapping Up

This pattern saved my ass multiple times:

Network troubleshooting (identifying top talkers)
Budget analysis (personal finance)
Application performance (microservice calls)
Customer journey mapping (clickstream data)
Supply chain visibility (logistics data)

The SQL is straightforward once you see the pattern. The hard part is recognizing when you have a flow problem in the first place.

Next time someone asks "where's all this [traffic/money/data] going?" — you'll know exactly what query to write.

Try it on SQLBook: Head over to sqlbook.io, or fork this book to spin up a PostgreSQL sandbox, and paste in the budget tracking example. Modify it for your own use case. See if you can generate valid Mermaid syntax that renders a flow diagram.

Challenge: Can you modify the query to:

Show only flows above a certain threshold?
Include percentage of total for each flow?
Detect circular flows (A→B→C→A)?
Calculate the "efficiency" of money flows (how many hops from income to final expense)?

Drop your solutions in the comments!

Tags: #SQL #DataVisualization #FlowDiagrams #NetworkAnalysis #PostgreSQL #Budgeting #FlowAnalysis

Difficulty: Intermediate
Topics: Aggregation, Window Functions, Data Visualization, CTEs