Topic 05

Transactions & ACID

Imagine transferring $100 from your bank account to a friend's. If the system crashes right after taking the money from your account but before adding it to theirs, that $100 disappears. Transactions are the database's way of preventing that. This topic explains the four guarantees — ACID — that make databases trustworthy.

InnoDB (MySQL)PostgreSQL CockroachDBSpannerOracle

At a Glance

The Four Guarantees

What ACID Actually Means

ACID is not a monolithic feature — it is four separate, independently enforceable guarantees. A database engine may implement them using completely different mechanisms. Understanding each one individually is key to reasoning about what can go wrong when you weaken or bypass them.

Atomicity

Think of the bank transfer: debit one account, credit another. Atomicity means both happen, or neither happens — there is no middle state. If anything goes wrong between the debit and the credit, the debit is automatically reversed. The database keeps an "undo log" — a record of how to reverse each change — so that even a mid-transaction crash can be cleaned up on restart.

Consistency

The database has rules — like "every order must reference a real customer" or "account balance cannot go below zero." Consistency means the database is never left in a state that breaks those rules. A transaction starts with a valid database and must end with a valid database. If a transaction would break a rule, the database rejects it entirely. Some rules are enforced by the database automatically (foreign keys, NOT NULL). Others are up to the application to enforce.

Isolation

Imagine two people both booking the last seat on a flight at the same moment. Isolation ensures they cannot both succeed. Each transaction behaves as if it is the only one running — they cannot see each other's half-finished work. The database achieves this by either placing locks (like a "do not disturb" sign — called Two-Phase Locking or 2PL) or by keeping multiple versions of data so readers see a consistent snapshot (called MVCC). Higher isolation means fewer problems but can slow things down.

Durability

Once the database says "transaction complete," that data is permanent — even if the power cuts out a second later. The database achieves this by writing a record to a journal on disk (the Write-Ahead Log, or WAL) before confirming the commit. If the server crashes, it replays the journal on restart and recovers every committed transaction. Data you committed before the crash is never lost.

Atomicity

How Undo Logging Works

Atomicity is implemented through an undo log — a record of how to reverse every change made during a transaction. Before any page is modified, the old value is written to the undo log. If the transaction aborts or the server crashes, the recovery process reads the undo log and reverses the changes in reverse order.

1

Write undo record

Before the database changes anything, it first writes a note describing how to undo the change. Think of it as saving a "before" photo. For a row update, this is the original row value before the change. For a new row being inserted, the undo record simply says "delete this row if we need to roll back." This note goes into the undo log, which is written to disk first — before any actual data changes.

2

Modify the page

Now the actual data is updated — but only in memory (the buffer pool), not on disk yet. This is intentional: writing to RAM is fast, writing to disk is slow. These in-memory changes are called "dirty pages." If a crash happens here, it is fine — the undo log on disk still has the "before" photos, so the database can revert everything on restart.

3

On COMMIT: flush redo log

When you run COMMIT, the database writes a "this transaction is complete" record to its journal (the redo log, part of the WAL) and forces that journal to physical storage. Only after this is safely on disk does the database confirm success to you. The data pages in memory might still be dirty (not yet written to disk) — that is fine, because the redo log has everything needed to reconstruct those changes if a crash occurs.

4

On ABORT or CRASH: apply undo log

If something goes wrong — the transaction is cancelled, or the server crashes — the database reads the undo log backwards (most recent change first) and reverses each change. Each step re-applies the "before" photo, restoring the data to exactly what it was before the transaction started. No partial changes survive. Old undo log entries are cleaned up periodically once no running transaction could need them anymore.

SQL — Transaction Control
-- Basic transfer: atomic debit + credit
BEGIN;
  UPDATE accounts SET balance = balance - 100 WHERE id = 1;
  UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;

-- Pessimistic lock: read-then-update without lost update
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
  SELECT balance FROM accounts WHERE id = 1 FOR UPDATE; -- exclusive lock
  UPDATE accounts SET balance = balance - 50 WHERE id = 1;
COMMIT;

-- Savepoints: partial rollback within a transaction
BEGIN;
  INSERT INTO orders (user_id, total) VALUES (42, 199.99);
  SAVEPOINT after_order;
  INSERT INTO payments (order_id, amount) VALUES (1001, 199.99);
  -- payment insert failed validation
  ROLLBACK TO SAVEPOINT after_order; -- undo payment only, keep order
  INSERT INTO payments (order_id, amount, method) VALUES (1001, 199.99, 'card');
COMMIT;

-- Optimistic locking with version column (no DB-level lock)
BEGIN;
  SELECT balance, version FROM accounts WHERE id = 1;
  -- application reads version=7, computes new balance
  UPDATE accounts
    SET balance = 850, version = 8
  WHERE id = 1 AND version = 7; -- fails if another tx already updated
  -- check affected rows; if 0, someone else won, retry
COMMIT;
MVCC

Multi-Version Concurrency Control

MVCC is the key insight behind modern databases: instead of making readers wait for writers (or vice versa), keep multiple versions of each row and give each transaction a consistent snapshot of the data at the moment it began.

1

Transaction IDs and row versions

Every transaction gets a unique ID number (like a timestamp). Every row in the database secretly has two hidden fields: when it was created (xmin — the ID of the transaction that created it) and when it was deleted or replaced (xmax — the ID of the transaction that replaced it, or empty if it is still the current version). When a row is updated, the database does not overwrite it — it marks the old version as "replaced" and creates a new version. Both versions exist on disk simultaneously.

2

Snapshot at transaction start

When your transaction begins, the database takes a "snapshot" — it records which transactions were completed at that moment. From that point on, your reads only see row versions that were committed before your snapshot. If another transaction updates a row while you are running, you continue seeing the old version — not the new one. It is like viewing a photo of the database from the moment you started, even as others are still making changes. No locks needed for reads.

3

Readers never block writers

Because every reader sees their own consistent snapshot, a writer updating a row at the same time does not interfere. The reader just sees the old version from before the write started. Writers only block other writers — two people cannot update the same row simultaneously. This is what makes MVCC powerful for high-traffic applications: reads and writes can happen in parallel on the same rows without either having to wait for the other.

4

VACUUM cleans up dead versions

Over time, the old row versions pile up — every update creates a new version and leaves the old one behind. This is called table bloat. PostgreSQL periodically runs a process called VACUUM that scans for old versions no active transaction could possibly be reading anymore, and frees that space for reuse. Without regular VACUUM, the table grows indefinitely on disk. Most production databases are configured to run VACUUM automatically in the background, but monitoring it is important on busy tables.

Isolation Levels

Anomalies vs. Isolation

The SQL standard defines four isolation levels in order of increasing strictness. Higher isolation prevents more anomalies but typically reduces concurrency (more blocking, more aborts). The right choice depends on what anomalies your application can tolerate.

Isolation LevelDirty ReadNon-Repeatable ReadPhantom ReadLost UpdateTypical Use
READ UNCOMMITTEDpossiblepossiblepossiblepossibleApproximate analytics
READ COMMITTEDpreventedpossiblepossiblepossiblePostgreSQL default; most OLTP apps
REPEATABLE READpreventedpreventedpossible*preventedMySQL default; financial reads
SERIALIZABLEpreventedpreventedpreventedpreventedAccounting, inventory, compliance

*PostgreSQL's REPEATABLE READ also prevents phantom reads via snapshot isolation. MySQL InnoDB uses next-key locks to prevent phantoms at REPEATABLE READ.

Anomalies

Read Anomalies Explained

1

Dirty Read

T1 reads a row modified by T2 before T2 has committed. If T2 rolls back, T1 has read data that never officially existed. Example: T2 updates a user's balance from 100 to 50, T1 reads 50, T2 aborts — T1 acted on a false balance. Only occurs at READ UNCOMMITTED.

2

Non-Repeatable Read

T1 reads a row, T2 commits an UPDATE to that row, T1 reads the same row again and gets a different value. The same query produced two different results within one transaction. Prevented at REPEATABLE READ and above, where T1 sees a snapshot frozen at its start time.

3

Phantom Read

T1 executes SELECT * FROM orders WHERE amount > 100 and gets 5 rows. T2 commits an INSERT of an order with amount=200. T1 re-runs the same query and now gets 6 rows — the "phantom" row appeared. Prevented only at SERIALIZABLE, which either uses predicate locks or detects the dependency (SSI in PostgreSQL).

4

Lost Update

T1 and T2 both read balance=100. T1 writes 150 (100+50). T2 writes 150 (100+50). T2's write overwrites T1's — one update is lost; the balance should be 200. Fixed with SELECT FOR UPDATE, atomic updates (SET balance = balance + 50), or optimistic locking with a version column.

5

Write Skew

Two transactions each read overlapping data and each write non-overlapping data, but their combined effect violates an invariant. Example: hospital on-call system requires at least 1 doctor on duty. T1 and T2 each see 2 doctors on duty, each assumes the other doctor stays, each removes themselves — both leave, violating the constraint. Only SERIALIZABLE prevents write skew.

Durability Mechanism

Write-Ahead Logging (WAL)

WAL is the mechanism behind both atomicity and durability. The rule: before any data page is written to disk, the log record describing that change must already be on disk. This "write-ahead" constraint means the log is always ahead of the data — so recovery after a crash can always reconstruct the correct state.

1

Log record types

The WAL contains two types of records. Redo records describe the change that was made — enough information to replay the change if needed. Undo records describe how to reverse a change — used when a transaction is cancelled or the database is recovering from a crash and needs to clean up incomplete work. There is also a special type of log entry (called a Compensation Log Record) that records when an undo step has already been applied, preventing any step from being applied twice.

2

Group commit

Writing to disk is slow — it can take several milliseconds per write. If the database had to write to disk for every single transaction before confirming it, performance would collapse under load. Instead, databases wait just a moment and batch up multiple transactions' journal entries into one disk write. This "group commit" trick means one disk write confirms dozens of transactions at once. It is a major reason high-traffic databases can handle thousands of transactions per second.

3

Checkpoints

Periodically, the database takes a "checkpoint" — it writes all in-memory changes to disk and marks a safe point in the journal. Think of it like saving your progress in a video game. If a crash happens, recovery does not need to replay the entire journal from the very beginning — only from the last checkpoint. More frequent checkpoints mean faster recovery after a crash, but they also mean more work during normal operation. The interval is configurable.

4

ARIES recovery algorithm

When the database crashes, it replays a journal (called the WAL) to rebuild exactly what happened, then undoes any incomplete work. This process is called ARIES recovery. It happens in three steps: first, scan the journal to figure out which transactions were in-progress when the crash happened (Analysis). Second, replay all the journal entries from the last checkpoint to reconstruct the exact state at the moment of the crash (Redo). Third, reverse any transactions that were still in-progress — they were never committed, so their changes should not survive (Undo). After these three steps, the database is in a clean, consistent state.

Locking Protocol

Two-Phase Locking (2PL)

2PL is the classic locking protocol for achieving serializability. It has two strict rules: (1) before reading, acquire a shared lock; before writing, acquire an exclusive lock. (2) Once a lock is released, no new locks can be acquired. This creates a "growing phase" and a "shrinking phase."

Lock TypeCompatible With Shared?Compatible With Exclusive?Use Case
Shared (S)Yes — multiple readersNo — blocks writersSELECT, SELECT FOR SHARE
Exclusive (X)No — blocks readersNo — only one writerUPDATE, DELETE, SELECT FOR UPDATE
Intent Shared (IS)YesNoSignals intent to lock rows; allows table-level checks
Intent Exclusive (IX)YesNoTable-level marker for row-level exclusive locks
1

Strict 2PL

Two-Phase Locking (2PL) is a classic strategy for preventing transactions from interfering. It has two rules: first, acquire all locks you need before making any changes. Second, release all locks at the end (when you commit or rollback). In the strict version used in practice, all locks are held until the very end. This prevents a chain reaction where one transaction's rollback forces another's — called cascading aborts. The downside is that it can cause contention when many transactions compete for the same rows.

2

Deadlock detection

Imagine two people each holding one key and waiting for the other's key. Neither can move forward — this is a deadlock. In databases, it happens when Transaction A is waiting for a row locked by Transaction B, and Transaction B is waiting for a row locked by Transaction A. Both wait forever. The database detects this by periodically checking whether any transactions are caught in a circular wait. When a deadlock is found, the database picks one transaction as the "victim" and rolls it back, letting the other proceed. The application must catch this error and retry.

Interactive Demo

Transaction Timeline

Drag the progress slider to advance time through two concurrent transactions. Change the isolation level to see which anomalies appear or disappear.

Atomicity All-or-nothing. Rollback on failure. Consistency Constraints always hold before & after. Isolation Concurrent txns appear sequential. Durability Committed writes survive any crash. T1 BEGIN READ/WRITE COMMIT T2 BEGIN READ/WRITE ROLLBACK WAL flush before COMMIT

ACID is a contract between the application and the DB engine. Isolation is the expensive property — weaker levels trade anomaly risk for throughput. Durability is free after the WAL write; the actual page flush can happen lazily.

Concurrent Transaction Simulator

Drag the blue progress dot left/right to travel through time. Anomalies appear in red.
Database Comparison

How Major Databases Implement Transactions

DatabaseDefault IsolationConcurrency StrategyWAL FormatNotable Behavior
PostgreSQLREAD COMMITTEDMVCC (heap + xmin/xmax)WAL (redo only)SSI at SERIALIZABLE; VACUUM needed for dead tuple cleanup
MySQL InnoDBREPEATABLE READMVCC + 2PLRedo log + undo log (separate)Gap locks prevent phantoms; next-key locking
OracleREAD COMMITTEDMVCC (undo segments)Redo log (online + archived)Undo tablespace holds old versions; autonomous transactions
SQL ServerREAD COMMITTED2PL (default) or RCSI/SITransaction logRCSI (Read Committed Snapshot Isolation) opt-in; TempDB for versions
CockroachDBSERIALIZABLEMVCC + SSIRaft log per rangeDistributed transactions via 2PC over Raft; automatic retries on serialization failures
Google SpannerSERIALIZABLEMVCC + True TimeDistributed Paxos logExternal consistency via TrueTime atomic clocks; linearizable reads globally
Application Patterns

Safe Transaction Patterns

SQL — Patterns for Correctness
-- Pattern 1: Atomic increment (safe under any isolation level)
UPDATE counters SET value = value + 1 WHERE id = 1;
-- The DB evaluates value+1 atomically — no read-then-write race

-- Pattern 2: Upsert (INSERT ... ON CONFLICT)
INSERT INTO user_sessions (user_id, token, created_at)
  VALUES (42, 'abc123', now())
  ON CONFLICT (user_id) DO UPDATE SET
    token = EXCLUDED.token,
    created_at = EXCLUDED.created_at;
-- Atomic: no TOCTOU race between checking and inserting

-- Pattern 3: Conditional update (optimistic concurrency)
UPDATE inventory
  SET quantity = quantity - 1, version = version + 1
WHERE product_id = 99
  AND version = 14     -- stale version check
  AND quantity >= 1;   -- prevent negative stock
-- If 0 rows affected, someone else won — retry at application level

-- Pattern 4: SELECT FOR UPDATE SKIP LOCKED (work queue)
BEGIN;
SELECT id, payload FROM job_queue
WHERE status = 'pending'
ORDER BY priority DESC
LIMIT 1
FOR UPDATE SKIP LOCKED; -- skip rows locked by other workers
UPDATE job_queue SET status = 'processing' WHERE id = :id;
COMMIT;
-- Each worker atomically claims a different job — no double-processing
Anti-patterns

Long-running transactions

Transactions held open for seconds or minutes accumulate MVCC versions (bloating the database and preventing VACUUM from reclaiming space), block DDL operations, and increase the blast radius of failures. Keep transactions short. Never hold a transaction open waiting for user input. Prepare data, then open the transaction and commit immediately.

HTTP calls inside a transaction

If the HTTP call hangs (30s timeout), your transaction holds locks for 30 seconds. Every concurrent transaction touching those rows is blocked. Never call external services inside BEGIN...COMMIT. Prepare and validate all data, commit, then make external calls. Use the outbox pattern for reliable event publishing.

Using READ UNCOMMITTED in application code

READ UNCOMMITTED is almost never the right choice. The only defensible use is approximate analytics dashboards where you genuinely need the fastest possible read and can tolerate seeing in-progress writes. PostgreSQL doesn't even support true dirty reads at READ UNCOMMITTED — it silently uses READ COMMITTED behavior. In MySQL/SQL Server, the risk is real.

Read-modify-write outside a transaction

Fetching a value in application code, modifying it, then writing it back is a race condition under any isolation level. Two requests reading the same value concurrently will both write based on the stale read. Use atomic SQL updates (SET x = x + 1), SELECT FOR UPDATE, or optimistic locking with a version column instead.

Ignoring serialization failures

At SERIALIZABLE isolation (especially PostgreSQL's SSI), the DB can abort a transaction with a 40001: could not serialize access error to prevent a write skew anomaly. Applications must retry on this error. Silently propagating it to the user as a 500 error is wrong — wrap transaction code in a retry loop.

Quiz
Question 1 of 5
Which isolation level prevents dirty reads but still allows non-repeatable reads?
AREAD UNCOMMITTED
BREAD COMMITTED
CREPEATABLE READ
DSERIALIZABLE
Question 2 of 5
In MVCC, a row is visible to your transaction if:
AYou hold a shared lock on the row
BIts xmin committed before your snapshot and its xmax is 0 or not yet committed
CIt is the most recently written version of the row
DNo other transaction has a pending write to that row
Question 3 of 5
A transaction that aborts must:
ACommit any completed partial changes
BRetain all acquired locks indefinitely
CRoll back all changes, leaving no visible effect
DRetry automatically until it succeeds
Question 4 of 5
Which anomaly can occur even at REPEATABLE READ isolation level (according to the SQL standard)?
ADirty read
BNon-repeatable read
CPhantom read
DLost update
Question 5 of 5
What does WAL's "write-ahead" rule mean?
AData pages must be flushed to disk before the log record is written
BThe log record must be on durable storage before the data page is written to disk
CEach transaction must wait for the previous transaction's log record to flush
DA checkpoint must complete before each commit
Interview Q&A
How does MVCC work in PostgreSQL? Walk me through the xmin/xmax mechanism.
PostgreSQL stores multiple versions of each row in the heap. Every row version has two hidden system columns: xmin (XID of the transaction that created this version) and xmax (XID of the transaction that deleted/updated it, or 0 if still live). When a transaction begins, it acquires a snapshot: the current XID high-water mark and the set of all in-progress XIDs. A row version is visible if its xmin is committed and below the snapshot's XID (and not in the in-progress set), and its xmax is either 0, uncommitted, or above the snapshot's XID. Writes create a new row version with a new xmin; they set the old version's xmax to the writing XID but do not physically delete it. VACUUM later reclaims dead versions. This means readers never block writers and writers never block readers — only writer-writer conflicts cause blocking.
What is a dirty read, and when can it actually be useful?
A dirty read is when transaction T1 reads data written by T2 that has not committed yet. If T2 rolls back, T1 read data that never officially existed. In most application code, this is never acceptable. Legitimate uses: (1) approximate analytics dashboards — you want the freshest in-progress data for a live view counter and a slightly stale read is fine; (2) monitoring queries where seeing in-progress state is the intent; (3) debugging — seeing what an in-progress transaction is doing. Important caveat: PostgreSQL does not actually implement dirty reads even at READ UNCOMMITTED — it silently uses READ COMMITTED semantics. True dirty reads require SQL Server, MySQL, or Oracle with READ UNCOMMITTED specified.
Explain the lost update problem. What are the different solutions?
Lost update: T1 reads balance=100, T2 reads balance=100, T1 writes 150 (100+50), T2 writes 150 (100+50) — T1's update is overwritten by T2. The correct result is 200. Three solutions: (1) Atomic SQL update: UPDATE SET balance = balance + 50 — the DB evaluates this atomically, serializing concurrent increments correctly. Best when you just need to increment/decrement. (2) SELECT FOR UPDATE: acquire an exclusive row lock at read time. T2 blocks until T1 commits, then reads the updated value 150 and writes 200. Correct but creates contention. (3) Optimistic locking with version column: read version=5, write WHERE version=5, if 0 rows affected someone else won, retry. Best for low-contention workloads — no lock held between read and write.
What is write skew and why does it require SERIALIZABLE to prevent?
Write skew is a subtle anomaly where two transactions each read an overlapping set of rows and each write to non-overlapping rows, but their combined effect violates an application-level invariant that neither transaction individually violated. Classic example: a hospital requires at least 1 doctor on duty. Two doctors each check: "are there 2+ doctors on duty?" — yes. Each assumes the other stays, each marks themselves off duty. Result: 0 doctors on duty. Both transactions were individually valid; together they broke the invariant. REPEATABLE READ cannot prevent this because the writes don't touch overlapping rows — there is no write-write conflict to detect. SERIALIZABLE (specifically SSI in PostgreSQL) detects the anti-dependency: T1's decision depended on the rows T2 modified, creating a cycle, and aborts one of them.
How does PostgreSQL's Serializable Snapshot Isolation (SSI) work?
SSI (introduced in PostgreSQL 9.1) achieves true SERIALIZABLE isolation without the blocking of traditional 2PL. It works by detecting dangerous anti-dependency cycles in the transaction dependency graph. Transactions are tracked in a "SIREAD lock" structure: every read acquires a predicate lock recording what was read. When a write happens, the engine checks whether any concurrent transaction read data that this write affects — recording an "rw-anti-dependency." If a cycle of anti-dependencies is detected (T1 depends on T2 which depends on T1), one transaction is aborted with error code 40001. Applications must retry. SSI has lower throughput than READ COMMITTED and can cause false positives (aborting transactions that actually would have been fine), but it is the gold standard for correctness under concurrency.
What are savepoints and when should you use them?
A savepoint marks a point within a transaction to which you can roll back without aborting the entire transaction. Syntax: SAVEPOINT name, ROLLBACK TO SAVEPOINT name, RELEASE SAVEPOINT name. Use cases: (1) Bulk imports with partial error tolerance — insert a row, SAVEPOINT, if it fails roll back to savepoint and skip that row, continue with the next. (2) Multi-step workflows where early steps must survive even if later steps fail. (3) Testing — wrap the test body in a transaction, use savepoints to reset between sub-tests, roll back the outer transaction at the end to leave no test data. Important: ROLLBACK TO SAVEPOINT does not release the savepoint — you must RELEASE it explicitly or it persists until end of transaction. Do not use savepoints as a substitute for proper error handling design.
When would you choose optimistic locking over SELECT FOR UPDATE?
Choose optimistic locking when contention is low and the cost of occasional retries is acceptable. SELECT FOR UPDATE holds an exclusive lock from read to commit — under high contention this creates a queue and throughput collapses. Optimistic locking with a version column holds no DB lock between read and write; retries only happen on actual conflicts. It scales better horizontally and works across microservices (you can pass the version to a different service that does the write). Use SELECT FOR UPDATE when: (1) conflicts are frequent and you need guaranteed forward progress rather than retry-based progress; (2) you need to read-then-write atomically and cannot use an atomic SQL expression; (3) you are implementing a queue (FOR UPDATE SKIP LOCKED is excellent for job queues). The key question: is the read-write window measured in milliseconds (optimistic fine) or seconds (use FOR UPDATE or redesign the flow)?
What happens when a transaction deadlocks? How does the database handle it?
A deadlock occurs when T1 holds lock A and waits for lock B, while T2 holds lock B and waits for lock A — both block forever. Detection: the DBMS maintains a waits-for graph. Each lock wait is an edge: "T1 waits for T2". A cycle in this graph is a deadlock. Most databases run deadlock detection periodically (every 1 second in PostgreSQL) or on every lock acquisition attempt. When a deadlock is found, one transaction is chosen as the "victim" and rolled back (typically the cheapest to abort — fewest rows modified). The victim receives an error (PostgreSQL: 40P01: deadlock detected). The application must catch this error and retry. Prevention strategies: always acquire locks in a consistent order across transactions (e.g., always lock lower account ID before higher); use lock timeouts (SET lock_timeout = '2s'); keep transactions short to reduce the window for deadlocks.
Further Reading
Previous
Normalization