Learn everything you wanted to know about ACID transactions.
The acronym ACID stands for atomicity, consistency, isolation, and durability. These describe the set of properties of database transactions that guarantee data integrity despite errors, system failures, power failures, or other issues.
A database transaction is a sequence of operations on a database that satisfies the ACID properties. Let’s take a look at an example: Imagine Alice is sending Bob $100 via your SuperFastTransaction app. You are the administrator in charge of the database management systems (DBMS) where the money transfers are recorded.
In SQL code you would write something along the lines of:
As you can see, the entire process consists of multiple operations, but they are regarded as a single transaction.
Why a single transaction?
Imagine the following. As you withdrew the money from Alice’s account, the system crashes before you put it into Bob’s. Now you would have a weird state, where Alice would be missing $100 from her account, but Bob would not have received them.
Luckily, modern relational databases like PostgreSQL or MySQL implement transactions with ACID properties that prevent such things from happening.
ACID characteristics can be broken down into four properties: atomicity, consistency, isolation, and durability.
Atomicity refers to the fact that a transaction succeeds or it fails. It is an all-or-nothing operation. Despite being composed of multiple steps, those steps are treated as a single operation or a unit. In the example above, where a system crash stopped the database mid-transaction, the transaction fails, rolling the database back to the previous state and re-instating Alice’s money.
Consistency refers to the characteristic that requires data updated via transactions to respect the other constraints or rules within the database systems to keep data in a consistent state
For example, you set in place SQL triggers or integrity constraints that check personal balances and prevent an account from withdrawing more money than they have - your app offers no credit. So if Alice started with $50, she would not be allowed to send 100 dollars to Bob.
Modern DBMSs allow users to access data concurrently and in parallel. Isolation is the characteristic that allows concurrency control so modifications from one transaction are not affecting operations in another transaction. Two parallel transactions are in reality isolated and seem to be performed sequentially.
The last ACID property, durability, refers to the persistence of committed transactions. Transactions and database modifications are not kept in volatile memory but are saved to permanent storage, such as disks.
This prevents data loss during system failure, such as a power outage.
The most common implementation of ACID transactions is done via locks. Data is locked (not accessible by another transaction) until a transaction completes or fails, to guarantee atomicity, isolation, and consistency.
To guarantee durability, databases often implement write-ahead logs. Transactions are first stored into transaction logs, and only once they are saved to this separate repository, they are implemented in the actual database. In case of system failure mid-transaction, the transaction is either rolled back or continued from the transaction log left off.
ACID compliance offers multiple benefits:
ACID transactions also carry negative consequences that need to be weighed against the advantages. Database systems that rely on ACID transactions are usually slower at read and write operations, because of the locking mechanism.
For a high throughput system - such as Netflix, Facebook, or other big data applications, distributed systems perform better at ingesting large quantities of data in parallel.
Distributed databases, such as NoSQL databases like Casandra or MongoDB, replicate data across several nodes or servers. Each node carries a copy of the overall data but does not necessarily update the data at the same time across all nodes. This design allows for faster data ingestion and reads.
In other words, has higher availability. But availability comes at the cost of consistency - data can be stale or inconsistent across nodes at any point in time. Distributed databases guarantee eventual consistency, aka data will be consistent, but not necessarily when you retrieve it from your closest node.
Depending on what you value more - strong consistency or availability, you might have to choose between slower and ACID-compliant databases or faster but not ACID-compliant ones.
Put logic behind ACID, eventual consistency, and other paradigms to use by building better and more informed data pipelines.
Learn more about modern data engineering, improve your skills, and get certified with Keboola’s Data Engineer Certificate.