Every database you have ever used — Gmail, Instagram, your bank — stores data in tables. This topic explains the rules behind those tables: how rows and columns are defined, how tables link to each other, and the mathematical operations that make SQL work.
Think of a relation as a spreadsheet tab. Each row is one record — in database language, a row is called a tuple. Each column is one piece of data about that record — a column is called an attribute. All rows in the same table share the same columns. There is no "first row" or "last row" — order is not guaranteed, just like a deck of cards you have shuffled.
Every row needs a way to be uniquely identified — like a student ID number. Any column (or combination of columns) that can identify a row is called a superkey. If you strip that down to the minimum needed (no extra columns), it becomes a candidate key. The one you officially designate as the row's identifier is the primary key. A foreign key is a column that points to the primary key of a different table — it is how tables link to each other.
If every employee must belong to a department, the database enforces that you cannot delete a department that still has employees. This enforcement is called referential integrity. When the parent row is deleted, you decide: should child rows be deleted too (CASCADE)? Should their link be cleared (SET NULL)? Or should the deletion be blocked until the children are gone first (RESTRICT)? Each option encodes a real business decision.
Before SQL existed, mathematicians described database operations using symbols. Pick only certain rows: sigma. Pick only certain columns: pi. Combine two tables: join. Combine two result sets: union. Everything you write in SQL compiles down to these six basic operations. You do not need to memorize the symbols — but understanding them explains why SQL works the way it does.
Every column has a declared type and set of legal values — that is its domain. For example, an age column should not allow -5 or the word "banana." Domain constraints let you declare rules like "age must be a non-negative integer" or "status can only be 'active' or 'inactive'." The database rejects any insert or update that violates these rules before the data is ever saved.
Every row must be uniquely identifiable — you cannot have a row with no identity. This means the primary key column can never be empty (NULL) and can never be duplicated. In practice, many tables use a generated ID number (like BIGSERIAL or UUID) as the primary key rather than something like an email address — email addresses can change, be shared, or be left blank, which would break this rule.
When one table has a column pointing to another table's primary key (a foreign key), the database checks that the reference is valid. You cannot have an order row pointing to a customer that does not exist. When you delete the parent (the customer), you choose what happens to the children (the orders): delete them too (CASCADE), blank out the link (SET NULL), or refuse the deletion until you manually remove the children first (RESTRICT).
Sometimes you need business rules that go beyond basic types and keys. For example, "end date must be after start date," or "each user can only have one review per product." These are user-defined constraints. You can declare them directly in the table definition, and the database will enforce them automatically — no application code needed. For more complex rules (like checking data across multiple tables at once), you can use triggers.
This operation picks certain rows and throws away the rest. The symbol sigma (looks like a capital E rotated) represents "filter rows where this condition is true." In SQL, this is your WHERE clause. For example, "give me only employees earning over $80,000." The result has the same columns as before — you have only reduced the number of rows.
This operation keeps only certain columns and hides the rest. The symbol pi represents "project columns." In SQL, this is the list of columns after SELECT — for example, "I only want the name and salary columns, not the full employee record." In strict math, projection also removes duplicate rows. In SQL, duplicates are kept by default for speed — add DISTINCT if you want to remove them.
A join takes two tables and stitches them together based on a matching column. Think of it like looking up a person's department name by matching the department ID on the employee row with the department ID on the department row. In SQL, you use JOIN ... ON to say exactly which columns should match. Always use an explicit ON condition in real code — letting the database guess (natural join) is risky and error-prone.
Sometimes you want to combine two query results rather than two tables. UNION stacks them together (like combining two lists). INTERSECT gives you only rows that appear in both lists. EXCEPT gives you rows from the first list that do not appear in the second. Both queries must return the same number of columns. UNION removes duplicates by default; use UNION ALL to keep them (which is faster when you know there are no duplicates).
Relational algebra: pi(name,dept_name)(sigma(salary>80000)(employees JOIN departments)) SELECT e.name, d.dept_name FROM employees e JOIN departments d ON e.dept_id = d.dept_id WHERE e.salary > 80000; Key constraints in DDL CREATE TABLE orders ( order_id BIGSERIAL PRIMARY KEY, customer_id INT NOT NULL REFERENCES customers(customer_id) ON DELETE RESTRICT, status VARCHAR(20) NOT NULL CHECK (status IN ('pending','paid','shipped')), total NUMERIC(10,2) CHECK (total > 0), created_at TIMESTAMPTZ DEFAULT now() ); UNION: customers who ordered OR registered this year SELECT customer_id FROM orders UNION SELECT id FROM customers WHERE created_at >= '2024-01-01'; EXCEPT: customers who registered but never ordered SELECT id FROM customers EXCEPT SELECT DISTINCT customer_id FROM orders;
A foreign key in employees references the primary key in departments. The JOIN operation walks that reference to stitch the two relations together into a flat result set.
Every relational algebra operation takes one or more relations as input and produces a relation as output — this is closure. It enables arbitrary composition: the output of a SELECT is itself a relation that can be used as input to another SELECT. This is why subqueries, CTEs, and derived tables all work naturally in SQL. Every query result is a "virtual table" with the same interface as a stored table — that is why you can write SELECT * FROM (SELECT ...) AS sub.
Using INNER JOIN when you need LEFT JOIN silently drops rows. Always verify expected result cardinality with a COUNT(*) sanity check. A missing row in the result is often worse than an error because it is invisible.
Storing FK values without declaring FOREIGN KEY lets orphaned rows accumulate — child rows pointing to deleted parents. This silent data corruption is expensive to find and fix later, especially across millions of rows.
NATURAL JOIN matches on ALL common column names. Adding a new column to a table can silently change join semantics. Always use explicit ON conditions in production code.
NULLs in PK columns violate entity integrity. NULL means "unknown" — two NULLs are not equal in SQL's three-valued logic. Always make PKs NOT NULL. Use surrogate keys (BIGSERIAL, UUID) for stability.
SELECT * FROM (SELECT ...) AS sub works.SELECT e.name, m.name AS manager FROM employees e JOIN employees m ON e.manager_id = m.id. This retrieves each employee alongside their manager's name from the same table. Other uses: finding pairs of employees in the same department, comparing rows within the same table, or finding records that share a value. The DBMS treats the two aliases as two logical relations — same physical table, two logical names.