A Conflict-free Replicated Data Type is a data structure that can be replicated across multiple peers, edited concurrently without coordination, and merged deterministically — no consensus protocol required. The key insight is that if your merge function forms a join-semilattice (i.e. it’s associative, commutative, and idempotent), then replicas converge regardless of message ordering or delivery.
CRDTs are one of the core enabling technologies for local-first software — they let you keep working offline and sync up later without conflict resolution dialogs.
Two Flavours
| State-based (CvRDT) | Operation-based (CmRDT) | |
|---|---|---|
| What’s sent | Full state | Individual operations |
| Merge | Join on lattice | Apply op (must be commutative) |
| Delivery | At least once | Exactly once, causal order |
| Bandwidth | Higher (sends whole state) | Lower (sends deltas) |
| Typical use | Simpler protocols, gossip | Real-time collaboration |
In practice the line blurs — Automerge is technically op-based but ships a columnar binary format that looks a lot like state, and delta-state CRDTs split the difference.
Common CRDT Types
| Type | What it models | Merge strategy |
|---|---|---|
| G-Counter | Grow-only counter | Per-replica max |
| PN-Counter | Increment/decrement counter | Two G-Counters |
| G-Set | Grow-only set | Union |
| OR-Set (Observed-Remove) | Add/remove set | Tag elements with unique IDs; union of adds minus observed removes |
| LWW-Register | Last-writer-wins single value | Timestamp comparison |
| MV-Register | Multi-value register | Keep all concurrent values (let the app decide) |
| RGA / LSEQ / Fugue | Ordered sequences (text) | Interleave with unique position IDs |
| Map / JSON | Nested documents | Recursive merge per key |
Why CRDTs Are Hard
The theory is elegant; the engineering is not. A few recurring pain points:
- Tombstones — deleting something from a set means you have to remember that you deleted it (forever, or at least until you can prove all replicas saw the delete). This is the “metadata grows without bound” problem.
- Large documents — naive CRDTs store the full operation history. For a text document, that means every keystroke. Compaction and garbage collection are non-trivial — see Automerge Binary Document Format for how Automerge handles this with columnar compression.
- Access control — CRDTs assume all replicas are trusted. If you want to revoke someone’s access or enforce permissions, you need something on top — this is exactly the problem Keyhive solves.
- Encryption — syncing encrypted CRDTs means reconciling state you can’t inspect. Subduction addresses this by operating entirely on content hashes and fingerprints. WNFS took a different approach — an encrypted file system built on CRDTs with a Skip Ratchet for key management.
- Rich text — merging concurrent edits to formatted text (bold, headings, lists) is significantly harder than plain text. See PeriText for the state of the art.
The CALM Connection
The CALM theorem (Consistency And Logical Monotonicity) gives a formal foundation: any monotone program is eventually consistent without coordination. CRDTs are the data structure instantiation of this — their merge functions are monotone by construction. Keep CALM and CRDT On extends this further with a monotone query model that tells you which reads are safe without coordination.
My Work in This Space
Most of my current research at Ink & Switch is about making CRDTs practical for real-world applications:
| Project | What it addresses |
|---|---|
| Keyhive | Authorisation for local-first CRDTs (who can read/write?) |
| Subduction | Encrypted peer-to-peer sync (how do you reconcile what you can’t decrypt?) |
| Authomerge | The earlier design exploration behind Keyhive |
| UCAN | Decentralised auth that doesn’t need a server to check permissions |
Further Reading
Papers
- Keep CALM and CRDT On — extends CRDTs with a monotone query model
- Mergeable Replicated Data Types — auto-derived three-way merge from causal graphs
- Convergence — Kleppmann and Alvaro’s survey on CRDTs and eventual consistency
Talks
- The Dark Forest — private CRDTs as a global database
- Feeling the Local-First Elephant — state of the union on local-first (PLF ‘23 keynote)
Notes
- Automerge — the CRDT I work with most
- Automerge Sync — how Automerge’s sync protocol works under the hood
- Local-first — the broader movement CRDTs enable
- CALM — the theoretical foundation