Abstract

Despite decades of research and practical experience, developers have few tools for programming reliable distributed applications without resorting to expensive coordination techniques. Conflict-free replicated datatypes (CRDTs) are a promising line of work that enable coordination-free replication and offer certain eventual consistency guarantees in a relatively simple object-oriented API. Yet CRDT guarantees extend only to data updates; observations of CRDT state are unconstrained and unsafe. We propose an agenda that embraces the simplicity of CRDTs, but provides richer, more uniform guarantees. We extend CRDTs with a query model that reasons about which queries are safe without coordination by applying monotonicity results from the CALM Theorem, and lay out a larger agenda for developing CRDT data stores that let developers safely and efficiently interact with replicated application state.

Commentary

This paper is like reading the inside of my head. It’s pointing at some stuff that I think most people who engage with CRDTs on the regular have an intuition for. I like the connection to CALM, which we talked about continuously at Fission. We had already been applying these principles β€” especially with Rhizome and IPVM β€” for some time, so it was exciting to see confirmation from a research group that we held in very high regard. The equally excellent paper New Directions in Cloud Programming takes these ideas even further.

Notes

Quote

Safety: Queries should be sequentially consistent, regardless of the replica at which they are evaluated.

Efficiency: Queries should be evaluated locally without coordination whenever possible.

Simplicity: The query model should be easy for developers to reason about.

Quote

each replica of a CRDT effectively represents an under-approximation of some true global state

Quote

Monotonic queries are the natural query model for CRDTs; their resilience to update re-ordering mirrors the convergent nature of updates within a CRDT. We acknowledge, however, that there are large classes of queries performed on CRDTs which are not monotonic