Use desktop for interactive SQL sandbox

Querying Nobel Prize Data: JOINs, Aggregations, and Historical Analysis

The Backstory

I stumbled across the Nobel Prize API and couldn't resist pulling the data. It's free, well-structured, and spans over a century of achievements across science, literature, and peace.

The Nobel data has quirks that make it perfect for learning SQL: prizes can be shared. Marie Curie won twice. Some years have no prize in certain categories. The data is messy in exactly the ways real-world data is messy.

Let's dig in.

The Schema

The data lives in three tables in SQL tab. Switch to that tab and run the SQL to setup the tables and data.

nobel_prizes
├── nobel_prizes_pk (int, primary key)
├── year (text)
├── category (text)
└── overallMotivation (text)

nobel_prizes_laureates
├── nobel_prizes_laureates_pk (int, primary key)
├── nobel_prizes_fk (int, foreign key → nobel_prizes)
├── id (text) - laureate ID from the API
├── firstname (text)
├── surname (text)
├── motivation (text)
└── share (text) - "1", "2", or "3"

Key relationships:

  • Each nobel_prizes row is one prize in one category for one year
  • Each nobel_prizes_laureates row is one winner
  • The share column tells you how the prize money was split (1 = sole, 2 = halved, 3 = thirds)

Query 1: Find All Winners for a Specific Year

Start simple. Who won in 1977?

SELECT
  p.year,
  p.category,
  l.firstname,
  l.surname,
  l.motivation
FROM nobel_prizes p
INNER JOIN nobel_prizes_laureates l ON p.nobel_prizes_pk = l.nobel_prizes_fk
WHERE p.year = '1977'
ORDER BY p.category, l.surname;

Output:

year | category   | firstname    | surname      | motivation
-----|------------|--------------|--------------|------------------------------------------
1977 | chemistry  | Ilya         | Prigogine    | for contributions to non-equilibrium...
1977 | economics  | Bertil       | Ohlin        | for pioneering contribution to the...
1977 | economics  | James        | Meade        | for pioneering contribution to the...
1977 | literature | Vicente      | Aleixandre   | for a creative poetic writing which...
1977 | medicine   | Roger        | Guillemin    | for discoveries concerning peptide...
1977 | medicine   | Andrew       | Schally      | for discoveries concerning peptide...
1977 | medicine   | Rosalyn      | Yalow        | for development of radioimmunoassays...
1977 | peace      | Amnesty      | International| (organization)
1977 | physics    | Philip       | Anderson     | for fundamental theoretical...
1977 | physics    | John         | Van Vleck    | for fundamental theoretical...
1977 | physics    | Nevill       | Mott         | for fundamental theoretical...

What this shows: 1977 had shared prizes in economics (2 winners), medicine (3 winners), and physics (3 winners). Literature and peace had sole winners.

Query 2: Count Prizes by Category

Which category has awarded the most prizes?

SELECT
  category,
  COUNT(*) AS total_prizes
FROM nobel_prizes
GROUP BY category
ORDER BY total_prizes DESC;

Output:

category total_prizes
peace 124
medicine 124
physics 124
chemistry 124
literature 124
economics 56

Economics has fewer prizes because it started in 1969 (the others started in 1901).

Query 3: Find Shared vs. Solo Prizes

How often are prizes shared? Let's count by share type:

SELECT
  p.category,
  l.share,
  COUNT(*) AS laureate_count
FROM nobel_prizes p
INNER JOIN nobel_prizes_laureates l ON p.nobel_prizes_pk = l.nobel_prizes_fk
GROUP BY p.category, l.share
ORDER BY p.category, l.share;

Output:

category   | share | laureate_count
-----------|-------|---------------
chemistry  | 1     | 63
chemistry  | 2     | 59
chemistry  | 3     | 57
...

Literature is almost always a solo prize. Physics and chemistry are frequently shared three ways.

Query 4: Which Years Had No Peace Prize?

The peace prize is sometimes not awarded (during wars, or when no suitable candidate exists):

SELECT DISTINCT year
FROM nobel_prizes
WHERE year NOT IN (
  SELECT DISTINCT year
  FROM nobel_prizes
  WHERE category = 'peace'
)
ORDER BY year;

Or using a LEFT JOIN approach:

SELECT DISTINCT p1.year
FROM nobel_prizes p1
LEFT JOIN nobel_prizes p2 ON p1.year = p2.year AND p2.category = 'peace'
WHERE p2.nobel_prizes_pk IS NULL
ORDER BY p1.year;

Output includes: 1914, 1915, 1916, 1918, 1939, 1940, 1941, 1942, 1943 (war years, as expected).

Query 5: Multiple-Time Winners

Who won more than once? This is rare — only a handful of people have done it:

SELECT
  l.firstname,
  l.surname,
  COUNT(*) AS win_count,
  STRING_AGG(p.category || ' (' || p.year || ')', ', ' ORDER BY p.year) AS prizes
FROM nobel_prizes_laureates l
INNER JOIN nobel_prizes p ON l.nobel_prizes_fk = p.nobel_prizes_pk
WHERE l.surname IS NOT NULL  -- Exclude organizations
GROUP BY l.firstname, l.surname
HAVING COUNT(*) > 1
ORDER BY win_count DESC, l.surname;

Output:

firstname  | surname           | win_count | prizes
-----------|-------------------|-----------|----------------------------------
Marie      | Curie             | 2         | physics (1903), chemistry (1911)
Linus      | Pauling           | 2         | chemistry (1954), peace (1962)
John       | Bardeen           | 2         | physics (1956), physics (1972)
Frederick  | Sanger            | 2         | chemistry (1958), chemistry (1980)

Marie Curie won in two different categories. John Bardeen and Frederick Sanger each won twice in the same category.

Query 6: Prize Trends Over Decades

Are prizes getting more shared over time? Let's look at average share by decade:

SELECT
  (p.year::INTEGER / 10) * 10 AS decade,
  ROUND(AVG(l.share::INTEGER), 2) AS avg_share,
  COUNT(*) AS total_laureates
FROM nobel_prizes p
INNER JOIN nobel_prizes_laureates l ON p.nobel_prizes_pk = l.nobel_prizes_fk
WHERE p.category IN ('physics', 'chemistry', 'medicine')  -- Science prizes only
GROUP BY (p.year::INTEGER / 10) * 10
ORDER BY decade;

Output:

decade | avg_share | total_laureates
-------|-----------|----------------
1900   | 1.23      | 26
1910   | 1.31      | 23
1920   | 1.35      | 29
...
1990   | 2.21      | 67
2000   | 2.45      | 74
2010   | 2.51      | 78
2020   | 2.67      | 32

Science prizes are increasingly shared. In the early 1900s, most were solo. Now, three-way splits are common.

Breaking Down the JOINs

The key to this dataset is understanding the one-to-many relationship:

nobel_prizes (1) ──────< nobel_prizes_laureates (many)
     │                            │
     │ nobel_prizes_pk            │ nobel_prizes_fk
     └────────────────────────────┘

INNER JOIN returns only prizes that have laureates:

FROM nobel_prizes p
INNER JOIN nobel_prizes_laureates l ON p.nobel_prizes_pk = l.nobel_prizes_fk

LEFT JOIN would include prizes with no laureates (useful for finding gaps):

FROM nobel_prizes p
LEFT JOIN nobel_prizes_laureates l ON p.nobel_prizes_pk = l.nobel_prizes_fk
WHERE l.nobel_prizes_laureates_pk IS NULL  -- Prizes with no laureate records

Challenge 1: Most Shared Category

Goal: Find which category has the highest percentage of shared prizes (share > '1').

Expected output:

category  | total_prizes | shared_prizes | shared_pct
----------|--------------|---------------|----------
physics   | 117          | 98            | 83.8%
medicine  | 114          | 89            | 78.1%
...

Hints:

  1. Count total prizes per category
  2. Count prizes where at least one laureate has share::INTEGER > 1
  3. Calculate the percentage
  4. Use a subquery or CTE to get distinct prize counts

Pseudocode:

WITH prize_stats AS (
  -- For each prize, determine if it was shared
  SELECT prize_id, category, MAX(share::INTEGER) > 1 AS is_shared
  FROM ...
)
SELECT
  category,
  COUNT(*) AS total,
  SUM(is_shared::INTEGER) AS shared,
  percentage calculation
FROM prize_stats
GROUP BY category

Challenge 2: Country Analysis

Goal: The laureate data includes country information. Find which countries have won the most prizes by category.

Expected output:

category   | country        | prize_count
-----------|----------------|------------
physics    | United States  | 94
physics    | United Kingdom | 25
physics    | Germany        | 24
chemistry  | United States  | 72
...

Hints:

  1. Join prizes to laureates
  2. Group by category and country
  3. Use ROW_NUMBER() to rank countries within each category
  4. Filter to top 3 per category

Pseudocode:

WITH country_counts AS (
  -- Count prizes per category per country
),
ranked AS (
  -- ROW_NUMBER() OVER (PARTITION BY category ORDER BY count DESC)
)
SELECT * FROM ranked WHERE rank <= 3

Challenge 3: Prize Gaps

Goal: Find years where a category had no prize awarded (not just peace — any category).

Expected output:

year | missing_categories
-----|-------------------
1940 | physics, chemistry, medicine, literature, peace
1941 | physics, chemistry, medicine, literature, peace
1942 | physics, chemistry, medicine, literature, peace
...

Hints:

  1. Generate all year-category combinations that should exist
  2. LEFT JOIN to actual prizes
  3. Filter for NULLs
  4. Aggregate missing categories per year

Real-World Applications

This same pattern — parent table with child records, aggregation across relationships — appears everywhere:

  1. E-commerce: Orders → Order Items (who bought the most items?)
  2. Education: Courses → Enrollments (which course has the most students?)
  3. Healthcare: Patients → Visits (who has the most appointments?)
  4. HR: Departments → Employees (which department has highest headcount?)
  5. Publishing: Authors → Books (who's the most prolific?)

The Nobel dataset is small enough to understand but complex enough to demonstrate real patterns.

Performance Tips

For large historical datasets:

  1. Index the foreign key:
CREATE INDEX idx_laureates_fk ON nobel_prizes_laureates(nobel_prizes_fk);
  1. Index commonly filtered columns:
CREATE INDEX idx_prizes_year_cat ON nobel_prizes(year, category);
  1. Use covering indexes for frequent queries:
CREATE INDEX idx_laureates_covering ON nobel_prizes_laureates(nobel_prizes_fk, firstname, surname, share);

Now the JOIN can be satisfied entirely from the index without hitting the table.

Try It Yourself

Load the Nobel Prize data and try these:

Challenges:

  1. Find the youngest and oldest laureate in each category (requires birth date data)
  2. Which decade had the most peace prizes withheld?
  3. Find laureates who won in the same year as someone with the same surname (family members?)
  4. Calculate the running total of prizes per category over time

Advanced:

  • What's the average time between a discovery and the prize? (requires additional research data)
  • Build a query that finds "prize streaks" — consecutive years where a country won in a category
  • Create a pivot table showing prizes per category per decade

Tags: #SQL #JOINs #Aggregation #NobelPrize #HistoricalData

Difficulty: Beginner to Intermediate
Topics: INNER JOIN, GROUP BY, Aggregation Functions, Subqueries, Window Functions

SQL Output

Getting Started
1Switch to a SQL tab and select a database
2Write or paste your SQL in the editor
3Press Alt+Enter to run
Tip:Select text in the editor to run only that portion
Nobel Prizes Updated 2/3/26, 1:04 PM
Notes SQL Tab-3
Alt+Enter to run