A small business has been tracking orders in a single spreadsheet-style table. Every row duplicates customer details, product details, and shipping information. The table has grown to thousands of rows and the team is running into problems: a customer's address was updated in some rows but not others, a product's price was changed but old orders now show the wrong historical price, and deleting the only order for a product loses the product information entirely.
These are classic symptoms of a denormalized schema -- update anomalies, inconsistencies, and data loss on deletion.
Normalization is the process of organizing a database to reduce redundancy and prevent anomalies. The key concept is functional dependency -- if knowing column A always tells you column B, then B is functionally dependent on A.
The normal forms build on each other:
A useful mnemonic: every non-key column must provide a fact about "the key, the whole key, and nothing but the key."
This pattern applies in almost every database design scenario:
Knowing when to normalize (OLTP, transactional systems) and when to intentionally denormalize (reporting, read-heavy analytics) is a fundamental design skill.
Here is the denormalized table the business is currently using:
CREATE TABLE flat_orders (
order_id INT,
order_date DATE,
customer_name VARCHAR(100),
customer_email VARCHAR(100),
customer_city VARCHAR(50),
customer_state VARCHAR(2),
product_name VARCHAR(100),
product_category VARCHAR(50),
product_unit_price DECIMAL(10,2),
quantity INT,
shipping_method VARCHAR(20),
shipping_cost DECIMAL(10,2)
);
INSERT INTO flat_orders VALUES
(1001, '2024-01-15', 'Alice Smith', 'alice@email.com', 'Portland', 'OR', 'Mechanical Keyboard', 'Electronics', 89.99, 1, 'Standard', 5.99),
(1001, '2024-01-15', 'Alice Smith', 'alice@email.com', 'Portland', 'OR', 'USB-C Cable', 'Electronics', 12.99, 2, 'Standard', 5.99),
(1002, '2024-01-18', 'Bob Jones', 'bob@email.com', 'Seattle', 'WA', 'Mechanical Keyboard', 'Electronics', 89.99, 1, 'Express', 12.99),
(1003, '2024-01-20', 'Alice Smith', 'alice@email.com', 'Portland', 'OR', 'Desk Lamp', 'Office', 45.00, 1, 'Standard', 5.99),
(1004, '2024-01-22', 'Charlie Davis', 'charlie@email.com', 'Denver', 'CO', 'USB-C Cable', 'Electronics', 12.99, 3, 'Express', 12.99),
(1004, '2024-01-22', 'Charlie Davis', 'charlie@email.com', 'Denver', 'CO', 'Desk Lamp', 'Office', 45.00, 2, 'Express', 12.99);
Notice the problems:
Redesign this into a properly normalized schema (3NF). Your solution should:
Try your solution here
Click here to see answer
SQL Output