Notes
During the Q&A following my PLF 2023 keyote, someone asked about βMRDTsβ. I was unfamiliar and followed up after.
There are some similarities to things we explored as part of the Rhizome Database.
The basic idea is as follows: what if state-based RDTs could automatically derive a merge function? The most general case on any causal graph is a three-way merge. The downside is that you must keep the entire history around, since you need a least common ancestor (LCA) between the nodes that are merging. If youβre using a hash-linked causal history (such as a BFT-CRDT), then you have this already. In fact, the paper calls out content addressed storage as being a great way to implement these ideas. The βhash hardnessβ skip list from WNFS could easily be adapted here for performance, as well.
General Strategy
For example, if thereβs graph along there linesβ¦
flowchart TD
Root([Root]):::irrel --> History([More History]):::irrel --> LCA(Least Common Ancestor)
LCA --> MoreHistoryA([More History]):::irrel --> MoreHistoryB([More History]):::irrel --> Left --> Merged(Merged)
LCA --> MoreHistoryC([More History]):::irrel --> Right --> Merged
linkStyle 0,1,2,3,4,6,7 stroke:grey,stroke-width:0.5px,color:red;
classDef irrel stroke-width:1px,color:grey;
β¦then you ignore all but the causal heads (Left
and Right
) and the LCAβ¦
flowchart TD
LCA --> Left -.-> Merged(Merged)
LCA --> Right -.-> Merged
β¦and perform a merge on the states with the following strategy:
Counter Example
A concrete example form the paper is merging two counters by their state:
flowchart TD
lca(5) -->|"x2 = +5"| left(10) -.-> merged(9)
lca(5) -->|"-1"| right(4) -.-> merged(9)
Note that this is not op-based. We didnβt record two streams ([x2], [-1])
, since doing so would require inspecting the entire history (all of those intermediate nodes that we dropped for the 3-way merge).
IMO this is a much better result than a LWW register, and itβs derived from a simple mechanism.
Extending to ADTs
They go further for arbitrary ADTs. Their example is how to extend the above to a pair of counters:
flowchart TD
lca("(1, 2)") --> left("(3, 4)") -.-> merged("(?, ?)")
lca --> right("(5, 6)") -.-> merged
In a typical LWW, weβd drop one side and get either (3,4)
or (5,6)
. This paper proposes essentially mapping the merge function over both sides. This is a very familiar exercise for anyone with experience in ML-family languages (they use OCaml, Iβm using Haskell). And product type case be reduced to pairs so IMO this gives a great intuition (coproducts are less satisfying because youβre forced to pick a branch).
The above example would reduce as follows:
Or pictorially:
flowchart TD
lca("(1, 2)") -->|"(+2, +2)"| left("(3, 4)") -.-> merged("(1 + 2 + 4, 2 + 2 + 4)") -.-> final("(7, 8)")
lca -->|"(+4, +4)"| right("(5, 6)") -.-> merged
Linearlization
The other way of thinking about this (from the paper) is to treat the causal history as something that can be linearlized. This is great (and again: not unlike Rhizome Database), because you only have to define ordering and reducing functions. Take this causal history:
flowchart LR
a --> b
a --> c
We can think of this as the following merge graph:
flowchart TD
a(a) --> left
a --> right
subgraph left
direction LR
al(a) --> bl(b)
end
subgraph right
direction LR
ar(a) --> cl(c)
end
subgraph linearize
subgraph connect
direction LR
am(a) --> bm(b)
am --> cm(c)
end
subgraph compare
direction LR
ac(a) --> bc(b)
ac --> cc(c)
bc -.-> cc
end
subgraph flatten
direction LR
at(a) --> bt(b) --> ct(c)
end
end
left --> connect
right --> connect --> compare --> flatten
This process produces a linear stream. To perform the generalised merge, you write a standard reducer from the LCA (again, something that we had played with in Rhizome).