- Spec Website
- The Automerge compressed binary document format spec
- Authors
Notes
Automerge is an operation-based CRDT that models concurrently editing a JSON document. This means that there are lot of little changes. In their uncompressed form, this often leads to a lot of change metadata overhead. The goals of this spec is to compress those changes to as small of a representation as possible by taking advantage of how people actually write JSON in practice (i.e. in a linear sequence of changes).
At its core, this is a columnar format with larger data lke hashes extracted into indices. I donβt think that anything prevents these from describing broken or circular documents, but this will be discovered during decompression.
Types
Action
Actions are a sum of op types supported by Automerge. These are rougly the AST leaves.
Layout
βββββββββββββββββββββββ¬βββββββββββββββββββ¬βββββββββββ¬βββββββββββ
β Magic Bytes β Checksum βChunk Typeβ Length β
β β β β β β β β β β ββββ β β β β β β β β βΌ β β β β βββ β β β β β€
β 0x85 0x6f 0x4a 0x83 β Truncated SHA256 β Enum β ULEB-128 β
βββββββββββββββββββββββ΄βββββββββββββββββββ΄βββββββββββ΄βββββββββββ€
β β
β β
β Content β
β β
β β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
NOTE
I think that the spec uses the terms βchunkβ and βblockβ interchangably. Iβve used βchunkβ everywhere in these notes for consistency, but there may be some distinction that Iβm just not picking up on.
Chunk Type
The chunk type
field contains the following enum:
Tag | Meaning | Description |
---|---|---|
0x00 | Document Chunk | Graph of related changes |
0x01 | Change Chunk | Single βchangeβ (many ops) |
0x02 | Compressed Change Chunk | Same as 0x01 , but DEFLATE compressed |
Content
Based on the Chunk Type, the Content (payload) section is laid out differently.
Document Chunk
Documents represent the entire(?) causal history
Change Chunk
A βchangeβ is perhaps better described as a βcommitβ. The distinction is that it may contain arbitrarily many ops, but theyβre considered to be related. Also like a commit, a βchangeβ does not include the entire document history.
Compressed Change Chunk
The same as Change Chunk, except compressed with DEFLATE.
Questions
Documenting both for my own understanding, and to provide (hopefully) helpful feedback to Alex for the next update of the spec.
- Why these three specific chunk types?
- Purely out of curiosity: how were the magic bytes derived?
- I assume that the Heads Index only contains the minimal set of hashes, not hashes for every change
- Per the name, this would only need to be the latest (concurrent) heads
- Document Chunks: the name may be confusing
- Do these always contain the entire document, or can they be the entire document broken into e.g. TCP packets?
- Was it called a βchangeβ instead of a βcommitβ because thereβs no required commit message?
- FWIW, I find this terminology confusing (Iβll get over it, but itβs not immedietly clear)
- each op changes the document; I expected βopβ and βchangeβ to be synonyms
- FWIW, I find this terminology confusing (Iβll get over it, but itβs not immedietly clear)
- Actions donβt include things like appending to text
- Does Automerge not use something like RGA?
- I can look this up myself, but Iβd expect it to
- Same w.r.t. the
del
action: are strings expressed as[Char]
, maybe?
- ANSWER
- Automerge uses rich text, which is likely Peritext
- Based on the description in the Automerge docs, this is done with spans into existing strings, which means no need to track each keystroke
- Does Automerge not use something like RGA?