A CRDT (Conflict-Free Replicated Data Type) is a special type of data structure designed for distributed systems that guarantees multiple replicas of data will eventually converge to the same state without requiring coordination between nodes. Even when different users simultaneously modify the same data in different locations, CRDTs automatically resolve conflicts in a mathematically consistent way that ensures all replicas eventually agree.
The main insight behind CRDTs is that certain operations can be designed to be commutative, meaning the order in which you apply them doesn’t matter. If operation A followed by operation B produces the same result as operation B followed by operation A, you can apply updates in any order and still reach the same final state. This property eliminates the need for complex conflict resolution logic.
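A minimal sketch of that idea, using made-up operations rather than any real CRDT library: increments commute, so every application order gives the same result, while plain overwrites do not.

```python
from itertools import permutations


def apply_all(ops, state=0):
    """Apply each operation to the state, in the given order."""
    for op in ops:
        state = op(state)
    return state


def distinct_results(ops):
    """Return the set of final states over every possible application order."""
    return {apply_all(order) for order in permutations(ops)}


# Hypothetical operations: increments commute with each other.
increments = [lambda s: s + 1, lambda s: s + 5, lambda s: s + 10]

# Overwrites do not commute: whichever is applied last wins.
overwrites = [lambda s: 1, lambda s: 5, lambda s: 10]

print(distinct_results(increments))  # a single result: every order agrees
print(distinct_results(overwrites))  # several results: order matters
```

A CRDT restricts itself to operations of the first kind, or adds metadata (such as timestamps or unique tags) so that operations of the second kind resolve the same way everywhere.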
What’s the Point of CRDTs?
Distributed systems face a fundamental challenge: how do you let multiple users or systems modify data simultaneously without requiring constant coordination? Traditional approaches either lock data while someone edits it (slow and limiting) or require complex conflict resolution when simultaneous edits collide (error-prone and complicated).
CRDTs offer an elegant solution by designing data structures where conflicts mathematically cannot occur. Instead of detecting and resolving conflicts after they happen, CRDTs prevent conflicting states from ever arising. All replicas can accept updates independently, and those updates will automatically merge correctly when replicas synchronize.
This enables true offline-first applications. Users can work without network connectivity, making changes locally. When they reconnect, their changes merge seamlessly with others’ changes, no matter how long they were offline or how many conflicting edits occurred. Applications like collaborative document editors, distributed databases, and mobile apps with local-first architectures rely heavily on CRDT properties.
Types of CRDTs
CRDTs come in two main categories with different tradeoffs:
- State-based CRDTs (CvRDTs) – Replicas send their entire current state to each other during synchronization. When a replica receives another replica’s state, it merges the two states using a merge function that’s commutative, associative, and idempotent. The merge always produces a consistent result regardless of order or duplication. State-based CRDTs are conceptually simpler but require sending complete state, which can be bandwidth-intensive for large data structures.
- Operation-based CRDTs (CmRDTs) – Replicas send individual operations (like “increment counter” or “add element to set”) rather than full state. Concurrent operations must commute, and operations must be delivered in causal order so that applying them at all replicas produces the same result. Operation-based CRDTs are more bandwidth-efficient since they only send changes, but they depend on a reliable delivery layer: every operation must reach every replica, in causal order, and be applied exactly once (or be deduplicated).
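The operation-based style can be sketched as follows. This is an illustrative toy, not a real library: the `Increment` and `OpCounter` names are assumptions, and the network is simulated by passing operation objects around by hand. Each operation carries a unique ID so that a replica can ignore duplicates. Increments commute with one another, so this particular example does not need causal ordering, which richer types would.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Increment:
    """A single operation to broadcast (hypothetical message format)."""
    op_id: str
    amount: int


@dataclass
class OpCounter:
    value: int = 0
    seen: set = field(default_factory=set)  # IDs of operations already applied

    def increment(self, amount=1):
        """Apply a local increment and return the operation to broadcast."""
        op = Increment(uuid.uuid4().hex, amount)
        self.apply(op)
        return op

    def apply(self, op):
        """Apply a local or remote operation, ignoring redelivered duplicates."""
        if op.op_id in self.seen:
            return
        self.seen.add(op.op_id)
        self.value += op.amount


a, b = OpCounter(), OpCounter()
op1 = a.increment(2)
op2 = b.increment(3)

# Simulated delivery: op1 reaches b twice, op2 reaches a once.
b.apply(op1)
b.apply(op1)
a.apply(op2)
print(a.value, b.value)
```

Only the small `Increment` messages cross the network, at the cost of tracking which operations each replica has already seen.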
In practice, many systems use hybrid approaches that combine aspects of both types. Delta-based CRDTs (or Delta-state CRDTs) are particularly popular. They send only the changes (deltas) in state rather than the full state, combining the conceptual simplicity of state-based CRDTs with the bandwidth efficiency of operation-based CRDTs. Other hybrid implementations choose the most efficient mechanism for each specific use case.
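As a rough sketch of the delta idea, assume a grow-only counter stored as a dictionary of per-replica counts. The `merge` and `increment` helpers below are illustrative names, not a library API. An update returns a delta that contains only the changed entry, and the receiver merges that delta with the same function it would use for a full state.

```python
def merge(a, b):
    """Join two counter states (replica_id -> count) by taking the per-key max."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}


def increment(state, replica_id, amount=1):
    """Return (new_state, delta); the delta holds only the entry that changed."""
    delta = {replica_id: state.get(replica_id, 0) + amount}
    return merge(state, delta), delta


replica_a = {"a": 3, "b": 7, "c": 2}
replica_b = {"a": 3, "b": 7, "c": 2}

replica_a, delta = increment(replica_a, "a")
replica_b = merge(replica_b, delta)      # ship one entry instead of three

print(delta)
print(replica_a == replica_b)
```

Because the delta is merged with the ordinary state merge, receiving it twice or out of order is harmless.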
Common CRDT Data Structures
Different CRDT implementations exist for various data types, each designed to handle specific use cases:
- Counters – One of the simplest CRDTs. A grow-only counter can only increase, which naturally avoids conflicts. More sophisticated counters support both increments and decrements by tracking increases and decreases separately, then computing the final value as increases minus decreases. Each replica maintains its own increment and decrement counts, which merge trivially.
- Registers – Store single values with strategies for handling concurrent writes. A last-write-wins register uses timestamps (often with replica IDs as tie-breakers) to keep the most recent value, though this can be sensitive to clock skew. Multi-value registers keep all concurrent values and let the application choose or merge them. These are useful for simple key-value stores.
- Sets – Grow-only sets (G-Sets) allow adding elements but never removing them, making merging simple (just take the union). Two-phase sets (2P-Sets) track both added and removed elements separately, allowing deletions but preventing re-adding previously deleted elements. Observed-remove sets (OR-Sets) are more sophisticated, allowing elements to be added and removed multiple times by tagging each addition with a unique identifier.
- Lists and Sequences – Among the most complex CRDTs because maintaining order is tricky. When two users insert elements at the same position simultaneously, the CRDT must decide on a consistent order. RGA (Replicated Growable Array) and WOOT are examples of list CRDTs that assign identifiers to elements in ways that remain consistently ordered across replicas. These enable collaborative text editing where multiple people type simultaneously.
- Maps and Documents – Combine multiple CRDTs to create structured data. Each field in a document might be a different CRDT type (counters for numeric values, registers for strings, sets for collections). These enable complex collaborative applications like shared spreadsheets or documents.
- Graphs – Often modeled using CRDTs for nodes and edges (typically sets), allowing concurrent additions and removals. These are useful for social networks, organizational structures, or any relationship-based data that multiple users modify concurrently.
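Two of the simpler structures above can be sketched in a few lines. This is a minimal, state-based illustration with assumed class names (`PNCounter`, `LWWRegister`). The register takes its timestamp as an argument so the example stays deterministic, where a real system would read a clock.

```python
from dataclasses import dataclass, field


def _merge_max(a, b):
    """Per-replica maximum of two count dictionaries."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}


@dataclass
class PNCounter:
    replica_id: str
    incs: dict = field(default_factory=dict)  # replica_id -> total increments
    decs: dict = field(default_factory=dict)  # replica_id -> total decrements

    def increment(self, n=1):
        self.incs[self.replica_id] = self.incs.get(self.replica_id, 0) + n

    def decrement(self, n=1):
        self.decs[self.replica_id] = self.decs.get(self.replica_id, 0) + n

    def value(self):
        return sum(self.incs.values()) - sum(self.decs.values())

    def merge(self, other):
        self.incs = _merge_max(self.incs, other.incs)
        self.decs = _merge_max(self.decs, other.decs)


@dataclass
class LWWRegister:
    value: object = None
    stamp: tuple = (0, "")  # (timestamp, replica_id); the ID breaks ties

    def write(self, value, timestamp, replica_id):
        self.merge(LWWRegister(value, (timestamp, replica_id)))

    def merge(self, other):
        if other.stamp > self.stamp:
            self.value, self.stamp = other.value, other.stamp


x, y = PNCounter("x"), PNCounter("y")
x.increment(5)
y.decrement(2)
x.merge(y)
y.merge(x)
print(x.value(), y.value())

r1, r2 = LWWRegister(), LWWRegister()
r1.write("red", 10, "a")
r2.write("blue", 10, "b")   # same timestamp: the replica ID decides
r1.merge(r2)
r2.merge(r1)
print(r1.value, r2.value)
```

The tie-break on replica ID is what keeps the register deterministic when two writes carry the same timestamp. It does nothing about clock skew, which can still cause a newer write to lose.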
How CRDTs Work in Practice
Consider a simple grow-only counter used to track likes on a social media post. Multiple users can simultaneously like the post from different locations. Each device maintains its own counter and increments it when the user likes the post. Periodically, devices synchronize by sending each other the per-replica counts they know about. Since each replica tracks its own contribution separately from others’ contributions, merging is straightforward – take the maximum count seen for each replica. The total number of likes is the sum of all replicas’ contributions.
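The likes example might look like this in code. It is a sketch only: the `GCounter` class and the device names are invented for illustration, and synchronization is simulated with direct method calls.

```python
class GCounter:
    """Grow-only counter: one slot per replica, merged by per-slot maximum."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}  # replica_id -> likes recorded by that replica

    def increment(self):
        # A replica only ever increases its own slot.
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + 1

    def merge(self, other):
        # Keep the highest count seen for each replica.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)

    def value(self):
        return sum(self.counts.values())


phone, laptop = GCounter("phone"), GCounter("laptop")
phone.increment()
phone.increment()
laptop.increment()

# Sync in both directions; repeating a merge changes nothing.
phone.merge(laptop)
laptop.merge(phone)
laptop.merge(phone)
print(phone.value(), laptop.value())
```

Taking the maximum rather than adding the incoming counts is what makes the merge safe to repeat.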
For a more complex example, imagine a collaborative shopping list. Two people simultaneously add items while offline. Person A adds “milk” and “bread.” Person B adds “eggs” and “milk.” When they sync, both should see all three items: milk, bread, and eggs. An OR-Set CRDT handles this by tagging each addition with a unique identifier. When merging, it takes the union of all tagged additions (minus any that have been removed). Even though both added milk, the CRDT records these as two distinct add operations, each with its own tag, and the merged list still shows the item only once.
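Here is a small sketch of the shopping list, assuming a simple state-based OR-Set that stores (element, tag) pairs plus a set of removed tags. The `ORSet` class is illustrative and skips the metadata compaction that production implementations usually add.

```python
import uuid


class ORSet:
    def __init__(self):
        self.adds = set()     # (element, unique_tag) pairs
        self.removed = set()  # tags whose additions have been removed

    def add(self, element):
        self.adds.add((element, uuid.uuid4().hex))

    def remove(self, element):
        # Remove only the additions this replica has actually observed.
        self.removed |= {tag for (e, tag) in self.adds if e == element}

    def merge(self, other):
        self.adds |= other.adds
        self.removed |= other.removed

    def elements(self):
        return {e for (e, tag) in self.adds if tag not in self.removed}


person_a, person_b = ORSet(), ORSet()
person_a.add("milk")
person_a.add("bread")
person_b.add("eggs")
person_b.add("milk")

person_a.merge(person_b)
person_b.merge(person_a)
print(sorted(person_a.elements()))

# Concurrent remove and re-add: the unseen addition survives the merge.
person_a.remove("milk")
person_b.add("milk")
person_a.merge(person_b)
person_b.merge(person_a)
print("milk" in person_a.elements())
```

The second half shows why the tags matter: a removal only cancels the additions it has seen, so an addition made concurrently on another replica is kept.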
List CRDTs for collaborative text editing are more intricate. When two users type in the same position, the CRDT must consistently order their insertions. It does this by assigning each character a globally unique position identifier that establishes order even when insertions happen concurrently. These position identifiers ensure that once both users see both edits, they see the same character sequence.
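To make the position-identifier idea concrete, here is a deliberately simplified sketch. It is not RGA or WOOT. It assumes each character gets a key made of a fraction between its neighbours plus the inserting replica's ID as a tie-breaker, and the `TextReplica` class is a hypothetical name. The scheme cannot insert between two characters that share the same fraction, so it raises an error in that case. Real list CRDTs use richer identifiers to avoid this.

```python
from fractions import Fraction


class TextReplica:
    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.chars = {}  # (position, replica_id) -> character

    def insert(self, index, char):
        """Insert locally at a visible index; return the operation to send."""
        keys = sorted(self.chars)
        left = keys[index - 1][0] if index > 0 else Fraction(0)
        right = keys[index][0] if index < len(keys) else Fraction(1)
        if left == right:
            # Limitation of this toy scheme, not of list CRDTs in general.
            raise ValueError("no room between concurrent characters")
        key = ((left + right) / 2, self.replica_id)
        self.chars[key] = char
        return key, char

    def apply(self, key, char):
        """Apply a remote insertion; keys are unique, so order is irrelevant."""
        self.chars[key] = char

    def text(self):
        return "".join(self.chars[k] for k in sorted(self.chars))


alice, bob = TextReplica("alice"), TextReplica("bob")
for ch in "AC":
    bob.apply(*alice.insert(len(alice.chars), ch))

# Both users type at the same position before seeing each other's edit.
op_x = alice.insert(1, "x")
op_y = bob.insert(1, "y")
alice.apply(*op_y)
bob.apply(*op_x)
print(alice.text(), bob.text())
```

Both replicas sort by the same keys, so they agree on the final order even though each applied the two insertions in a different sequence.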
Benefits of CRDTs
Here are some of the main benefits to be gained from using CRDTs:
- CRDTs enable strong eventual consistency without coordination. Replicas can operate independently, accepting local updates immediately without waiting for network communication or locks. This provides excellent availability and responsiveness, especially important for mobile and web applications where network connectivity is unpredictable.
- Offline operation becomes natural with CRDTs. Applications can function fully while disconnected, with changes merging automatically when connectivity returns. Users don’t face conflicts requiring manual resolution or lost data when working offline.
- Scalability improves because replicas don’t need to coordinate on every operation. There’s no central coordinator bottleneck or distributed consensus overhead. Each replica can handle operations independently, allowing the system to scale by adding more replicas.
- The mathematical guarantees CRDTs provide are powerful. Developers don’t need to write custom conflict resolution logic for each data type. The CRDT’s design ensures correctness automatically, reducing bugs and simplifying application code.
Limitations and Challenges
CRDTs aren’t a universal solution and come with tradeoffs. Some of the main ones include:
- They require more memory than simple data structures because they must track metadata about operations – unique identifiers, timestamps, tombstones for deleted elements, or operation history. This overhead can be substantial for large datasets.
- Performance characteristics differ from traditional data structures. Some operations that are simple in non-replicated structures become more complex in CRDTs. List CRDTs, for example, have higher computational costs than simple arrays because they must maintain ordering metadata.
- Not all data types have obvious CRDT representations. While counters, sets, and registers work well, some data structures are difficult to model as CRDTs. Complex business rules or constraints that depend on global state are particularly challenging. For instance, ensuring a bank account never goes negative requires coordination that CRDTs can’t provide by themselves.
- Semantic conflicts can still occur even though technical conflicts don’t. A CRDT might correctly merge two users’ simultaneous edits to a document, but the result might be nonsensical. For example, one user writes a sentence while another deletes the paragraph containing it. The CRDT produces a consistent result, but it might not match either user’s intent. Some level of application logic or user awareness remains necessary.
- Garbage collection is tricky for some CRDTs. Tombstones marking deleted elements must be retained to prevent those elements from reappearing when old operations are replayed. Over time, tombstones accumulate, consuming memory. Safely removing old tombstones requires knowing all replicas have seen the deletion, which requires coordination.
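The tombstone problem from the last point can be shown with a two-phase set. This is a minimal sketch with an assumed `TwoPhaseSet` class. The unsafe garbage collection is simulated by hand to show what goes wrong when a tombstone is dropped before every replica has seen the deletion.

```python
class TwoPhaseSet:
    def __init__(self):
        self.added = set()
        self.tombstones = set()  # removed elements, kept as metadata

    def add(self, element):
        self.added.add(element)

    def remove(self, element):
        if element in self.added:
            self.tombstones.add(element)

    def merge(self, other):
        self.added |= other.added
        self.tombstones |= other.tombstones

    def elements(self):
        return self.added - self.tombstones


fresh, stale = TwoPhaseSet(), TwoPhaseSet()
fresh.add("draft")
stale.merge(fresh)       # the stale replica learns about "draft"
fresh.remove("draft")    # ...but goes offline before seeing the removal

# With the tombstone retained, merging old state cannot resurrect the item.
fresh.merge(stale)
print(fresh.elements())

# Simulated unsafe GC: discard removed elements and their tombstones early.
collected = TwoPhaseSet()
collected.added = fresh.added - fresh.tombstones
collected.merge(stale)
print(collected.elements())  # the deleted element comes back
```

The second merge is the failure mode that forces replicas to coordinate, or at least to track what others have seen, before discarding tombstones.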
CRDTs in Real Systems
Several production systems have successfully deployed CRDTs to manage high availability and seamless data synchronization.
- Riak and Redis: Riak, a distributed NoSQL database, pioneered the use of CRDTs for counters, sets, and maps to enable coordination-free updates. Similarly, Redis Enterprise utilizes CRDT-based data structures to power multi-region, “Active-Active” deployments.
- Apple: For its iCloud service, Apple reportedly leverages CRDTs to sync Notes and Reminders across devices. This ensures that concurrent edits merge predictably without data loss, even when devices are offline.
- Figma: The collaborative design platform employs CRDT-inspired techniques to facilitate real-time co-editing. While not a “pure” decentralized CRDT, their architecture borrows these principles to maintain a consistent state across global clients.
- Microsoft and Open Source: Microsoft’s Fluid Framework combines CRDT concepts with specialized ordering logic to scale collaborative Office experiences to millions of users. Meanwhile, libraries like Automerge provide a standardized foundation for developers to build similar conflict-free collaborative tools.
Combining CRDTs with Other Approaches
CRDTs work well alongside other distributed system techniques. Many systems use CRDTs for data that benefits from their conflict-free properties while using traditional approaches for data requiring stronger consistency guarantees.
You might use CRDTs for user-facing collaborative features where eventual consistency is acceptable, while using strongly consistent databases for financial transactions or inventory management where immediate accuracy is essential. This hybrid approach provides the best of both worlds.
Event sourcing combines naturally with CRDTs. Storing operations as events and using CRDTs to compute current state from those events provides both auditability and conflict-free replication. The event log becomes the authoritative history, while CRDT state is a derived view.
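One way to picture this pairing, as a sketch with invented names (`Event`, `derive_items`, and the `"item_added"` event kind are all assumptions): treat the event log itself as a grow-only set keyed by event ID, merge logs by union, and recompute the current state from whatever events a replica holds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    event_id: str  # globally unique, so duplicate deliveries collapse
    kind: str
    item: str


def merge_logs(a, b):
    """Logs are grow-only sets of events, so merging is a plain union."""
    return a | b


def derive_items(log):
    """Derived view: the set of items added by any event in the log."""
    return {e.item for e in log if e.kind == "item_added"}


log_a = {
    Event("e1", "item_added", "milk"),
    Event("e2", "item_added", "bread"),
}
log_b = {
    Event("e1", "item_added", "milk"),  # already replicated from log_a
    Event("e3", "item_added", "eggs"),
}

merged = merge_logs(log_a, log_b)
print(len(merged), sorted(derive_items(merged)))
```

The log keeps the full history for auditing, while the derived view can be discarded and rebuilt at any time.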
Operational transformation, an older technique for collaborative editing, can be combined with CRDTs. Some systems use CRDTs for the underlying data model while borrowing OT ideas for presentation or user interface concerns.
When to Use CRDTs
CRDTs are ideal for collaborative applications where multiple users or devices modify shared data simultaneously. Document editors, spreadsheets, whiteboards, and design tools benefit immensely from CRDT properties. The ability to merge concurrent edits automatically enables seamless collaboration.
Offline-first applications need CRDTs or similar techniques. Mobile apps that must function without connectivity, allowing users to work offline and sync later, rely on conflict-free merging to provide good user experiences.
Distributed databases aiming for high availability use CRDTs to allow replicas to accept writes independently without coordination. This improves responsiveness and resilience but requires accepting eventual consistency rather than immediate consistency.
Conversely, don’t use CRDTs when you need immediate consistency or have complex constraints requiring global coordination. Banking systems preventing overdrafts, inventory systems preventing overselling, or reservation systems preventing double-booking need traditional coordination mechanisms that CRDTs can’t provide.
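The overdraft case is easy to reproduce. The sketch below assumes an account modelled as a counter with per-replica deposit and withdrawal totals. The `Account` class and the amounts are made up for illustration. Each replica checks the balance locally before withdrawing, yet the merged balance still goes negative.

```python
def merge_max(a, b):
    """Per-replica maximum of two totals dictionaries."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}


class Account:
    def __init__(self, replica_id, deposits):
        self.replica_id = replica_id
        self.deposits = dict(deposits)  # replica_id -> total deposited
        self.withdrawals = {}           # replica_id -> total withdrawn

    def balance(self):
        return sum(self.deposits.values()) - sum(self.withdrawals.values())

    def withdraw(self, amount):
        # The check only sees this replica's view of the balance.
        if amount > self.balance():
            return False
        mine = self.withdrawals.get(self.replica_id, 0)
        self.withdrawals[self.replica_id] = mine + amount
        return True

    def merge(self, other):
        self.deposits = merge_max(self.deposits, other.deposits)
        self.withdrawals = merge_max(self.withdrawals, other.withdrawals)


branch_a = Account("a", {"bank": 100})
branch_b = Account("b", {"bank": 100})

# Both withdrawals pass the local check while the replicas are out of sync.
ok_a = branch_a.withdraw(80)
ok_b = branch_b.withdraw(80)

branch_a.merge(branch_b)
branch_b.merge(branch_a)
print(ok_a, ok_b, branch_a.balance())
```

The merge is perfectly consistent, since both replicas agree on the result. The invariant is what breaks, which is why constraints like this need coordination before the write is accepted.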
If your data doesn’t actually need distribution or collaboration, simpler data structures are better. The overhead and complexity of CRDTs only make sense when you genuinely need their conflict-free properties.
The Future of CRDTs
Research has moved past basic proofs and is now focused on making CRDTs as efficient as traditional data structures. Modern designs, such as Delta-state CRDTs, have significantly reduced network overhead by transmitting only incremental changes rather than full states. Meanwhile, new techniques in causal stability tracking allow systems to “garbage collect” old metadata, easing the long-standing problem of ever-growing metadata.
As the industry shifts toward Local-first software, CRDTs have become a foundational building block for the next generation of apps. This philosophy moves the primary “source of truth” to the user’s device, treating the cloud as a secondary synchronization point. Because CRDTs ensure that data can be merged from any device at any time, they are one of the most practical ways to meet modern user expectations for instant, offline-capable, and multi-player experiences.
We are also seeing a “democratization” of the technology. Major cloud providers and edge platforms (like Cloudflare and Ably) are now offering managed CRDT services, abstracting away the complex mathematics. This allows developers to implement sophisticated collaborative features without needing a PhD in distributed systems.
Ultimately, CRDTs represent a fundamental shift in distributed computing. By building conflict resolution directly into the data’s mathematical DNA (using properties like commutativity and associativity), we no longer have to treat concurrent edits as “errors” to be fixed. Instead, they are simply states to be merged.