I stumbled across the Nobel Prize API and couldn't resist pulling the data. It's free, well-structured, and spans over a century of achievements across science, literature, and peace.
The Nobel data has quirks that make it perfect for learning SQL: prizes can be shared. Marie Curie won twice. Some years have no prize in certain categories. The data is messy in exactly the ways real-world data is messy.
Let's dig in.
The data lives in three tables in SQL tab. Switch to that tab and run the SQL to setup the tables and data.
nobel_prizes
├── nobel_prizes_pk (int, primary key)
├── year (text)
├── category (text)
└── overallMotivation (text)
nobel_prizes_laureates
├── nobel_prizes_laureates_pk (int, primary key)
├── nobel_prizes_fk (int, foreign key → nobel_prizes)
├── id (text) - laureate ID from the API
├── firstname (text)
├── surname (text)
├── motivation (text)
└── share (text) - "1", "2", or "3"
Key relationships:
nobel_prizes row is one prize in one category for one yearnobel_prizes_laureates row is one winnershare column tells you how the prize money was split (1 = sole, 2 = halved, 3 = thirds)Start simple. Who won in 1977?
SELECT
p.year,
p.category,
l.firstname,
l.surname,
l.motivation
FROM nobel_prizes p
INNER JOIN nobel_prizes_laureates l ON p.nobel_prizes_pk = l.nobel_prizes_fk
WHERE p.year = '1977'
ORDER BY p.category, l.surname;
Output:
year | category | firstname | surname | motivation
-----|------------|--------------|--------------|------------------------------------------
1977 | chemistry | Ilya | Prigogine | for contributions to non-equilibrium...
1977 | economics | Bertil | Ohlin | for pioneering contribution to the...
1977 | economics | James | Meade | for pioneering contribution to the...
1977 | literature | Vicente | Aleixandre | for a creative poetic writing which...
1977 | medicine | Roger | Guillemin | for discoveries concerning peptide...
1977 | medicine | Andrew | Schally | for discoveries concerning peptide...
1977 | medicine | Rosalyn | Yalow | for development of radioimmunoassays...
1977 | peace | Amnesty | International| (organization)
1977 | physics | Philip | Anderson | for fundamental theoretical...
1977 | physics | John | Van Vleck | for fundamental theoretical...
1977 | physics | Nevill | Mott | for fundamental theoretical...
What this shows: 1977 had shared prizes in economics (2 winners), medicine (3 winners), and physics (3 winners). Literature and peace had sole winners.
Which category has awarded the most prizes?
SELECT
category,
COUNT(*) AS total_prizes
FROM nobel_prizes
GROUP BY category
ORDER BY total_prizes DESC;
Output:
| category | total_prizes |
|---|---|
| peace | 124 |
| medicine | 124 |
| physics | 124 |
| chemistry | 124 |
| literature | 124 |
| economics | 56 |
Economics has fewer prizes because it started in 1969 (the others started in 1901).
How often are prizes shared? Let's count by share type:
SELECT
p.category,
l.share,
COUNT(*) AS laureate_count
FROM nobel_prizes p
INNER JOIN nobel_prizes_laureates l ON p.nobel_prizes_pk = l.nobel_prizes_fk
GROUP BY p.category, l.share
ORDER BY p.category, l.share;
Output:
category | share | laureate_count
-----------|-------|---------------
chemistry | 1 | 63
chemistry | 2 | 59
chemistry | 3 | 57
...
Literature is almost always a solo prize. Physics and chemistry are frequently shared three ways.
The peace prize is sometimes not awarded (during wars, or when no suitable candidate exists):
SELECT DISTINCT year
FROM nobel_prizes
WHERE year NOT IN (
SELECT DISTINCT year
FROM nobel_prizes
WHERE category = 'peace'
)
ORDER BY year;
Or using a LEFT JOIN approach:
SELECT DISTINCT p1.year
FROM nobel_prizes p1
LEFT JOIN nobel_prizes p2 ON p1.year = p2.year AND p2.category = 'peace'
WHERE p2.nobel_prizes_pk IS NULL
ORDER BY p1.year;
Output includes: 1914, 1915, 1916, 1918, 1939, 1940, 1941, 1942, 1943 (war years, as expected).
Who won more than once? This is rare — only a handful of people have done it:
SELECT
l.firstname,
l.surname,
COUNT(*) AS win_count,
STRING_AGG(p.category || ' (' || p.year || ')', ', ' ORDER BY p.year) AS prizes
FROM nobel_prizes_laureates l
INNER JOIN nobel_prizes p ON l.nobel_prizes_fk = p.nobel_prizes_pk
WHERE l.surname IS NOT NULL -- Exclude organizations
GROUP BY l.firstname, l.surname
HAVING COUNT(*) > 1
ORDER BY win_count DESC, l.surname;
Output:
firstname | surname | win_count | prizes
-----------|-------------------|-----------|----------------------------------
Marie | Curie | 2 | physics (1903), chemistry (1911)
Linus | Pauling | 2 | chemistry (1954), peace (1962)
John | Bardeen | 2 | physics (1956), physics (1972)
Frederick | Sanger | 2 | chemistry (1958), chemistry (1980)
Marie Curie won in two different categories. John Bardeen and Frederick Sanger each won twice in the same category.
Are prizes getting more shared over time? Let's look at average share by decade:
SELECT
(p.year::INTEGER / 10) * 10 AS decade,
ROUND(AVG(l.share::INTEGER), 2) AS avg_share,
COUNT(*) AS total_laureates
FROM nobel_prizes p
INNER JOIN nobel_prizes_laureates l ON p.nobel_prizes_pk = l.nobel_prizes_fk
WHERE p.category IN ('physics', 'chemistry', 'medicine') -- Science prizes only
GROUP BY (p.year::INTEGER / 10) * 10
ORDER BY decade;
Output:
decade | avg_share | total_laureates
-------|-----------|----------------
1900 | 1.23 | 26
1910 | 1.31 | 23
1920 | 1.35 | 29
...
1990 | 2.21 | 67
2000 | 2.45 | 74
2010 | 2.51 | 78
2020 | 2.67 | 32
Science prizes are increasingly shared. In the early 1900s, most were solo. Now, three-way splits are common.
The key to this dataset is understanding the one-to-many relationship:
nobel_prizes (1) ──────< nobel_prizes_laureates (many)
│ │
│ nobel_prizes_pk │ nobel_prizes_fk
└────────────────────────────┘
INNER JOIN returns only prizes that have laureates:
FROM nobel_prizes p
INNER JOIN nobel_prizes_laureates l ON p.nobel_prizes_pk = l.nobel_prizes_fk
LEFT JOIN would include prizes with no laureates (useful for finding gaps):
FROM nobel_prizes p
LEFT JOIN nobel_prizes_laureates l ON p.nobel_prizes_pk = l.nobel_prizes_fk
WHERE l.nobel_prizes_laureates_pk IS NULL -- Prizes with no laureate records
Goal: Find which category has the highest percentage of shared prizes (share > '1').
Expected output:
category | total_prizes | shared_prizes | shared_pct
----------|--------------|---------------|----------
physics | 117 | 98 | 83.8%
medicine | 114 | 89 | 78.1%
...
Hints:
share::INTEGER > 1Pseudocode:
WITH prize_stats AS (
-- For each prize, determine if it was shared
SELECT prize_id, category, MAX(share::INTEGER) > 1 AS is_shared
FROM ...
)
SELECT
category,
COUNT(*) AS total,
SUM(is_shared::INTEGER) AS shared,
percentage calculation
FROM prize_stats
GROUP BY category
Goal: The laureate data includes country information. Find which countries have won the most prizes by category.
Expected output:
category | country | prize_count
-----------|----------------|------------
physics | United States | 94
physics | United Kingdom | 25
physics | Germany | 24
chemistry | United States | 72
...
Hints:
ROW_NUMBER() to rank countries within each categoryPseudocode:
WITH country_counts AS (
-- Count prizes per category per country
),
ranked AS (
-- ROW_NUMBER() OVER (PARTITION BY category ORDER BY count DESC)
)
SELECT * FROM ranked WHERE rank <= 3
Goal: Find years where a category had no prize awarded (not just peace — any category).
Expected output:
year | missing_categories
-----|-------------------
1940 | physics, chemistry, medicine, literature, peace
1941 | physics, chemistry, medicine, literature, peace
1942 | physics, chemistry, medicine, literature, peace
...
Hints:
This same pattern — parent table with child records, aggregation across relationships — appears everywhere:
The Nobel dataset is small enough to understand but complex enough to demonstrate real patterns.
For large historical datasets:
CREATE INDEX idx_laureates_fk ON nobel_prizes_laureates(nobel_prizes_fk);
CREATE INDEX idx_prizes_year_cat ON nobel_prizes(year, category);
CREATE INDEX idx_laureates_covering ON nobel_prizes_laureates(nobel_prizes_fk, firstname, surname, share);
Now the JOIN can be satisfied entirely from the index without hitting the table.
Load the Nobel Prize data and try these:
Challenges:
Advanced:
Tags: #SQL #JOINs #Aggregation #NobelPrize #HistoricalData
Difficulty: Beginner to Intermediate
Topics: INNER JOIN, GROUP BY, Aggregation Functions, Subqueries, Window Functions
SQL Output