Imagine transferring $100 from your bank account to a friend's. If the system crashes right after taking the money from your account but before adding it to theirs, that $100 disappears. Transactions are the database's way of preventing that. This topic explains the four guarantees — ACID — that make databases trustworthy.
ACID is not a monolithic feature — it is four separate, independently enforceable guarantees. A database engine may implement them using completely different mechanisms. Understanding each one individually is key to reasoning about what can go wrong when you weaken or bypass them.
Think of the bank transfer: debit one account, credit another. Atomicity means both happen, or neither happens — there is no middle state. If anything goes wrong between the debit and the credit, the debit is automatically reversed. The database keeps an "undo log" — a record of how to reverse each change — so that even a mid-transaction crash can be cleaned up on restart.
The database has rules — like "every order must reference a real customer" or "account balance cannot go below zero." Consistency means the database is never left in a state that breaks those rules. A transaction starts with a valid database and must end with a valid database. If a transaction would break a rule, the database rejects it entirely. Some rules are enforced by the database automatically (foreign keys, NOT NULL). Others are up to the application to enforce.
Imagine two people both booking the last seat on a flight at the same moment. Isolation ensures they cannot both succeed. Each transaction behaves as if it is the only one running — they cannot see each other's half-finished work. The database achieves this by either placing locks (like a "do not disturb" sign — called Two-Phase Locking or 2PL) or by keeping multiple versions of data so readers see a consistent snapshot (called MVCC). Higher isolation means fewer problems but can slow things down.
Once the database says "transaction complete," that data is permanent — even if the power cuts out a second later. The database achieves this by writing a record to a journal on disk (the Write-Ahead Log, or WAL) before confirming the commit. If the server crashes, it replays the journal on restart and recovers every committed transaction. Data you committed before the crash is never lost.
Atomicity is implemented through an undo log — a record of how to reverse every change made during a transaction. Before any page is modified, the old value is written to the undo log. If the transaction aborts or the server crashes, the recovery process reads the undo log and reverses the changes in reverse order.
Before the database changes anything, it first writes a note describing how to undo the change. Think of it as saving a "before" photo. For a row update, this is the original row value before the change. For a new row being inserted, the undo record simply says "delete this row if we need to roll back." This note goes into the undo log, which is written to disk first — before any actual data changes.
Now the actual data is updated — but only in memory (the buffer pool), not on disk yet. This is intentional: writing to RAM is fast, writing to disk is slow. These in-memory changes are called "dirty pages." If a crash happens here, it is fine — the undo log on disk still has the "before" photos, so the database can revert everything on restart.
When you run COMMIT, the database writes a "this transaction is complete" record to its journal (the redo log, part of the WAL) and forces that journal to physical storage. Only after this is safely on disk does the database confirm success to you. The data pages in memory might still be dirty (not yet written to disk) — that is fine, because the redo log has everything needed to reconstruct those changes if a crash occurs.
If something goes wrong — the transaction is cancelled, or the server crashes — the database reads the undo log backwards (most recent change first) and reverses each change. Each step re-applies the "before" photo, restoring the data to exactly what it was before the transaction started. No partial changes survive. Old undo log entries are cleaned up periodically once no running transaction could need them anymore.
-- Basic transfer: atomic debit + credit BEGIN; UPDATE accounts SET balance = balance - 100 WHERE id = 1; UPDATE accounts SET balance = balance + 100 WHERE id = 2; COMMIT; -- Pessimistic lock: read-then-update without lost update BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ; SELECT balance FROM accounts WHERE id = 1 FOR UPDATE; -- exclusive lock UPDATE accounts SET balance = balance - 50 WHERE id = 1; COMMIT; -- Savepoints: partial rollback within a transaction BEGIN; INSERT INTO orders (user_id, total) VALUES (42, 199.99); SAVEPOINT after_order; INSERT INTO payments (order_id, amount) VALUES (1001, 199.99); -- payment insert failed validation ROLLBACK TO SAVEPOINT after_order; -- undo payment only, keep order INSERT INTO payments (order_id, amount, method) VALUES (1001, 199.99, 'card'); COMMIT; -- Optimistic locking with version column (no DB-level lock) BEGIN; SELECT balance, version FROM accounts WHERE id = 1; -- application reads version=7, computes new balance UPDATE accounts SET balance = 850, version = 8 WHERE id = 1 AND version = 7; -- fails if another tx already updated -- check affected rows; if 0, someone else won, retry COMMIT;
MVCC is the key insight behind modern databases: instead of making readers wait for writers (or vice versa), keep multiple versions of each row and give each transaction a consistent snapshot of the data at the moment it began.
Every transaction gets a unique ID number (like a timestamp). Every row in the database secretly has two hidden fields: when it was created (xmin — the ID of the transaction that created it) and when it was deleted or replaced (xmax — the ID of the transaction that replaced it, or empty if it is still the current version). When a row is updated, the database does not overwrite it — it marks the old version as "replaced" and creates a new version. Both versions exist on disk simultaneously.
When your transaction begins, the database takes a "snapshot" — it records which transactions were completed at that moment. From that point on, your reads only see row versions that were committed before your snapshot. If another transaction updates a row while you are running, you continue seeing the old version — not the new one. It is like viewing a photo of the database from the moment you started, even as others are still making changes. No locks needed for reads.
Because every reader sees their own consistent snapshot, a writer updating a row at the same time does not interfere. The reader just sees the old version from before the write started. Writers only block other writers — two people cannot update the same row simultaneously. This is what makes MVCC powerful for high-traffic applications: reads and writes can happen in parallel on the same rows without either having to wait for the other.
Over time, the old row versions pile up — every update creates a new version and leaves the old one behind. This is called table bloat. PostgreSQL periodically runs a process called VACUUM that scans for old versions no active transaction could possibly be reading anymore, and frees that space for reuse. Without regular VACUUM, the table grows indefinitely on disk. Most production databases are configured to run VACUUM automatically in the background, but monitoring it is important on busy tables.
The SQL standard defines four isolation levels in order of increasing strictness. Higher isolation prevents more anomalies but typically reduces concurrency (more blocking, more aborts). The right choice depends on what anomalies your application can tolerate.
| Isolation Level | Dirty Read | Non-Repeatable Read | Phantom Read | Lost Update | Typical Use |
|---|---|---|---|---|---|
| READ UNCOMMITTED | possible | possible | possible | possible | Approximate analytics |
| READ COMMITTED | prevented | possible | possible | possible | PostgreSQL default; most OLTP apps |
| REPEATABLE READ | prevented | prevented | possible* | prevented | MySQL default; financial reads |
| SERIALIZABLE | prevented | prevented | prevented | prevented | Accounting, inventory, compliance |
*PostgreSQL's REPEATABLE READ also prevents phantom reads via snapshot isolation. MySQL InnoDB uses next-key locks to prevent phantoms at REPEATABLE READ.
T1 reads a row modified by T2 before T2 has committed. If T2 rolls back, T1 has read data that never officially existed. Example: T2 updates a user's balance from 100 to 50, T1 reads 50, T2 aborts — T1 acted on a false balance. Only occurs at READ UNCOMMITTED.
T1 reads a row, T2 commits an UPDATE to that row, T1 reads the same row again and gets a different value. The same query produced two different results within one transaction. Prevented at REPEATABLE READ and above, where T1 sees a snapshot frozen at its start time.
T1 executes SELECT * FROM orders WHERE amount > 100 and gets 5 rows. T2 commits an INSERT of an order with amount=200. T1 re-runs the same query and now gets 6 rows — the "phantom" row appeared. Prevented only at SERIALIZABLE, which either uses predicate locks or detects the dependency (SSI in PostgreSQL).
T1 and T2 both read balance=100. T1 writes 150 (100+50). T2 writes 150 (100+50). T2's write overwrites T1's — one update is lost; the balance should be 200. Fixed with SELECT FOR UPDATE, atomic updates (SET balance = balance + 50), or optimistic locking with a version column.
Two transactions each read overlapping data and each write non-overlapping data, but their combined effect violates an invariant. Example: hospital on-call system requires at least 1 doctor on duty. T1 and T2 each see 2 doctors on duty, each assumes the other doctor stays, each removes themselves — both leave, violating the constraint. Only SERIALIZABLE prevents write skew.
WAL is the mechanism behind both atomicity and durability. The rule: before any data page is written to disk, the log record describing that change must already be on disk. This "write-ahead" constraint means the log is always ahead of the data — so recovery after a crash can always reconstruct the correct state.
The WAL contains two types of records. Redo records describe the change that was made — enough information to replay the change if needed. Undo records describe how to reverse a change — used when a transaction is cancelled or the database is recovering from a crash and needs to clean up incomplete work. There is also a special type of log entry (called a Compensation Log Record) that records when an undo step has already been applied, preventing any step from being applied twice.
Writing to disk is slow — it can take several milliseconds per write. If the database had to write to disk for every single transaction before confirming it, performance would collapse under load. Instead, databases wait just a moment and batch up multiple transactions' journal entries into one disk write. This "group commit" trick means one disk write confirms dozens of transactions at once. It is a major reason high-traffic databases can handle thousands of transactions per second.
Periodically, the database takes a "checkpoint" — it writes all in-memory changes to disk and marks a safe point in the journal. Think of it like saving your progress in a video game. If a crash happens, recovery does not need to replay the entire journal from the very beginning — only from the last checkpoint. More frequent checkpoints mean faster recovery after a crash, but they also mean more work during normal operation. The interval is configurable.
When the database crashes, it replays a journal (called the WAL) to rebuild exactly what happened, then undoes any incomplete work. This process is called ARIES recovery. It happens in three steps: first, scan the journal to figure out which transactions were in-progress when the crash happened (Analysis). Second, replay all the journal entries from the last checkpoint to reconstruct the exact state at the moment of the crash (Redo). Third, reverse any transactions that were still in-progress — they were never committed, so their changes should not survive (Undo). After these three steps, the database is in a clean, consistent state.
2PL is the classic locking protocol for achieving serializability. It has two strict rules: (1) before reading, acquire a shared lock; before writing, acquire an exclusive lock. (2) Once a lock is released, no new locks can be acquired. This creates a "growing phase" and a "shrinking phase."
| Lock Type | Compatible With Shared? | Compatible With Exclusive? | Use Case |
|---|---|---|---|
| Shared (S) | Yes — multiple readers | No — blocks writers | SELECT, SELECT FOR SHARE |
| Exclusive (X) | No — blocks readers | No — only one writer | UPDATE, DELETE, SELECT FOR UPDATE |
| Intent Shared (IS) | Yes | No | Signals intent to lock rows; allows table-level checks |
| Intent Exclusive (IX) | Yes | No | Table-level marker for row-level exclusive locks |
Two-Phase Locking (2PL) is a classic strategy for preventing transactions from interfering. It has two rules: first, acquire all locks you need before making any changes. Second, release all locks at the end (when you commit or rollback). In the strict version used in practice, all locks are held until the very end. This prevents a chain reaction where one transaction's rollback forces another's — called cascading aborts. The downside is that it can cause contention when many transactions compete for the same rows.
Imagine two people each holding one key and waiting for the other's key. Neither can move forward — this is a deadlock. In databases, it happens when Transaction A is waiting for a row locked by Transaction B, and Transaction B is waiting for a row locked by Transaction A. Both wait forever. The database detects this by periodically checking whether any transactions are caught in a circular wait. When a deadlock is found, the database picks one transaction as the "victim" and rolls it back, letting the other proceed. The application must catch this error and retry.
Drag the progress slider to advance time through two concurrent transactions. Change the isolation level to see which anomalies appear or disappear.
ACID is a contract between the application and the DB engine. Isolation is the expensive property — weaker levels trade anomaly risk for throughput. Durability is free after the WAL write; the actual page flush can happen lazily.
| Database | Default Isolation | Concurrency Strategy | WAL Format | Notable Behavior |
|---|---|---|---|---|
| PostgreSQL | READ COMMITTED | MVCC (heap + xmin/xmax) | WAL (redo only) | SSI at SERIALIZABLE; VACUUM needed for dead tuple cleanup |
| MySQL InnoDB | REPEATABLE READ | MVCC + 2PL | Redo log + undo log (separate) | Gap locks prevent phantoms; next-key locking |
| Oracle | READ COMMITTED | MVCC (undo segments) | Redo log (online + archived) | Undo tablespace holds old versions; autonomous transactions |
| SQL Server | READ COMMITTED | 2PL (default) or RCSI/SI | Transaction log | RCSI (Read Committed Snapshot Isolation) opt-in; TempDB for versions |
| CockroachDB | SERIALIZABLE | MVCC + SSI | Raft log per range | Distributed transactions via 2PC over Raft; automatic retries on serialization failures |
| Google Spanner | SERIALIZABLE | MVCC + True Time | Distributed Paxos log | External consistency via TrueTime atomic clocks; linearizable reads globally |
-- Pattern 1: Atomic increment (safe under any isolation level) UPDATE counters SET value = value + 1 WHERE id = 1; -- The DB evaluates value+1 atomically — no read-then-write race -- Pattern 2: Upsert (INSERT ... ON CONFLICT) INSERT INTO user_sessions (user_id, token, created_at) VALUES (42, 'abc123', now()) ON CONFLICT (user_id) DO UPDATE SET token = EXCLUDED.token, created_at = EXCLUDED.created_at; -- Atomic: no TOCTOU race between checking and inserting -- Pattern 3: Conditional update (optimistic concurrency) UPDATE inventory SET quantity = quantity - 1, version = version + 1 WHERE product_id = 99 AND version = 14 -- stale version check AND quantity >= 1; -- prevent negative stock -- If 0 rows affected, someone else won — retry at application level -- Pattern 4: SELECT FOR UPDATE SKIP LOCKED (work queue) BEGIN; SELECT id, payload FROM job_queue WHERE status = 'pending' ORDER BY priority DESC LIMIT 1 FOR UPDATE SKIP LOCKED; -- skip rows locked by other workers UPDATE job_queue SET status = 'processing' WHERE id = :id; COMMIT; -- Each worker atomically claims a different job — no double-processing
Transactions held open for seconds or minutes accumulate MVCC versions (bloating the database and preventing VACUUM from reclaiming space), block DDL operations, and increase the blast radius of failures. Keep transactions short. Never hold a transaction open waiting for user input. Prepare data, then open the transaction and commit immediately.
If the HTTP call hangs (30s timeout), your transaction holds locks for 30 seconds. Every concurrent transaction touching those rows is blocked. Never call external services inside BEGIN...COMMIT. Prepare and validate all data, commit, then make external calls. Use the outbox pattern for reliable event publishing.
READ UNCOMMITTED is almost never the right choice. The only defensible use is approximate analytics dashboards where you genuinely need the fastest possible read and can tolerate seeing in-progress writes. PostgreSQL doesn't even support true dirty reads at READ UNCOMMITTED — it silently uses READ COMMITTED behavior. In MySQL/SQL Server, the risk is real.
Fetching a value in application code, modifying it, then writing it back is a race condition under any isolation level. Two requests reading the same value concurrently will both write based on the stale read. Use atomic SQL updates (SET x = x + 1), SELECT FOR UPDATE, or optimistic locking with a version column instead.
At SERIALIZABLE isolation (especially PostgreSQL's SSI), the DB can abort a transaction with a 40001: could not serialize access error to prevent a write skew anomaly. Applications must retry on this error. Silently propagating it to the user as a 500 error is wrong — wrap transaction code in a retry loop.
xmin (XID of the transaction that created this version) and xmax (XID of the transaction that deleted/updated it, or 0 if still live). When a transaction begins, it acquires a snapshot: the current XID high-water mark and the set of all in-progress XIDs. A row version is visible if its xmin is committed and below the snapshot's XID (and not in the in-progress set), and its xmax is either 0, uncommitted, or above the snapshot's XID. Writes create a new row version with a new xmin; they set the old version's xmax to the writing XID but do not physically delete it. VACUUM later reclaims dead versions. This means readers never block writers and writers never block readers — only writer-writer conflicts cause blocking.UPDATE SET balance = balance + 50 — the DB evaluates this atomically, serializing concurrent increments correctly. Best when you just need to increment/decrement. (2) SELECT FOR UPDATE: acquire an exclusive row lock at read time. T2 blocks until T1 commits, then reads the updated value 150 and writes 200. Correct but creates contention. (3) Optimistic locking with version column: read version=5, write WHERE version=5, if 0 rows affected someone else won, retry. Best for low-contention workloads — no lock held between read and write.SAVEPOINT name, ROLLBACK TO SAVEPOINT name, RELEASE SAVEPOINT name. Use cases: (1) Bulk imports with partial error tolerance — insert a row, SAVEPOINT, if it fails roll back to savepoint and skip that row, continue with the next. (2) Multi-step workflows where early steps must survive even if later steps fail. (3) Testing — wrap the test body in a transaction, use savepoints to reset between sub-tests, roll back the outer transaction at the end to leave no test data. Important: ROLLBACK TO SAVEPOINT does not release the savepoint — you must RELEASE it explicitly or it persists until end of transaction. Do not use savepoints as a substitute for proper error handling design.40P01: deadlock detected). The application must catch this error and retry. Prevention strategies: always acquire locks in a consistent order across transactions (e.g., always lock lower account ID before higher); use lock timeouts (SET lock_timeout = '2s'); keep transactions short to reduce the window for deadlocks.