CRDT

641 words ~4 min read

A Conflict-free Replicated Data Type is a data structure that can be replicated across multiple peers, edited concurrently without coordination, and merged deterministically -- no consensus protocol required. The key insight is that if your merge function forms a join-semilattice (i.e. it's associative, commutative, and idempotent), then replicas converge regardless of message ordering or delivery.

CRDTs are one of the core enabling technologies for local-first software -- they let you keep working offline and sync up later without conflict resolution dialogs.

Two Flavours

State-based (CvRDT) Operation-based (CmRDT)
What's sent Full state Individual operations
Merge Join on lattice Apply op (must be commutative)
Delivery At least once Exactly once, causal order
Bandwidth Higher (sends whole state) Lower (sends deltas)
Typical use Simpler protocols, gossip Real-time collaboration

In practice the line blurs -- Automerge is technically op-based but ships a columnar binary format that looks a lot like state, and delta-state CRDTs split the difference.

Common CRDT Types

Type What it models Merge strategy
G-Counter Grow-only counter Per-replica max
PN-Counter Increment/decrement counter Two G-Counters
G-Set Grow-only set Union
OR-Set (Observed-Remove) Add/remove set Tag elements with unique IDs; union of adds minus observed removes
LWW-Register Last-writer-wins single value Timestamp comparison
MV-Register Multi-value register Keep all concurrent values (let the app decide)
RGA / LSEQ / Fugue Ordered sequences (text) Interleave with unique position IDs
Map / JSON Nested documents Recursive merge per key

Why CRDTs Are Hard

The theory is elegant; the engineering is not. A few recurring pain points:

  • Tombstones -- deleting something from a set means you have to remember that you deleted it (forever, or at least until you can prove all replicas saw the delete). This is the "metadata grows without bound" problem.
  • Large documents -- naive CRDTs store the full operation history. For a text document, that means every keystroke. Compaction and garbage collection are non-trivial -- see Automerge Binary Document Format for how Automerge handles this with columnar compression.
  • Access control -- CRDTs assume all replicas are trusted. If you want to revoke someone's access or enforce permissions, you need something on top -- this is exactly the problem Keyhive solves.
  • Encryption -- syncing encrypted CRDTs means reconciling state you can't inspect. Subduction addresses this by operating entirely on content hashes and fingerprints. WNFS took a different approach -- an encrypted file system built on CRDTs with a Skip Ratchet for key management.
  • Rich text -- merging concurrent edits to formatted text (bold, headings, lists) is significantly harder than plain text. See PeriText for the state of the art.

The CALM Connection

The CALM theorem (Consistency And Logical Monotonicity) gives a formal foundation: any monotone program is eventually consistent without coordination. CRDTs are the data structure instantiation of this -- their merge functions are monotone by construction. Keep CALM and CRDT On extends this further with a monotone query model that tells you which reads are safe without coordination.

My Work in This Space

Most of my current research at Ink & Switch is about making CRDTs practical for real-world applications:

Project What it addresses
Keyhive Authorisation for local-first CRDTs (who can read/write?)
Subduction Encrypted peer-to-peer sync (how do you reconcile what you can't decrypt?)
Authomerge The earlier design exploration behind Keyhive
UCAN Decentralised auth that doesn't need a server to check permissions

Further Reading

Papers

Talks

Notes

  • Automerge -- the CRDT I work with most
  • Automerge Sync -- how Automerge's sync protocol works under the hood
  • Local-first -- the broader movement CRDTs enable
  • CALM -- the theoretical foundation
Graph