Notes

Automerge is an operation-based CRDT that models concurrently editing a JSON document. This means that there are lot of little changes. In their uncompressed form, this often leads to a lot of change metadata overhead. The goals of this spec is to compress those changes to as small of a representation as possible by taking advantage of how people actually write JSON in practice (i.e. in a linear sequence of changes).

At its core, this is a columnar format with larger data lke hashes extracted into indices. I don’t think that anything prevents these from describing broken or circular documents, but this will be discovered during decompression.

Types

Action

Actions are a sum of op types supported by Automerge. These are rougly the AST leaves.

Layout

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚      Magic Bytes    β”‚     Checksum     β”‚Chunk Typeβ”‚  Length  β”‚
β”œ ─ ─ ─ ─ ─ ─ ─ ─ ─ ──│─ ─ ─ ─ ─ ─ ─ ─ ─ β”Ό ─ ─ ─ ─ ─│─ ─ ─ ─ ─ ─
β”‚ 0x85 0x6f 0x4a 0x83 β”‚ Truncated SHA256 β”‚   Enum   β”‚ ULEB-128 β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                              β”‚
β”‚                                                              β”‚
β”‚                           Content                            β”‚
β”‚                                                              β”‚
β”‚                                                              β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

NOTE

I think that the spec uses the terms β€œchunk” and β€œblock” interchangably. I’ve used β€œchunk” everywhere in these notes for consistency, but there may be some distinction that I’m just not picking up on.

Chunk Type

The chunk type field contains the following enum:

TagMeaningDescription
0x00Document ChunkGraph of related changes
0x01Change ChunkSingle β€œchange” (many ops)
0x02Compressed Change ChunkSame as 0x01, but DEFLATE compressed

Content

Based on the Chunk Type, the Content (payload) section is laid out differently.

Document Chunk

Documents represent the entire(?) causal history

Change Chunk

A β€œchange” is perhaps better described as a β€œcommit”. The distinction is that it may contain arbitrarily many ops, but they’re considered to be related. Also like a commit, a β€œchange” does not include the entire document history.

Compressed Change Chunk

The same as Change Chunk, except compressed with DEFLATE.

Questions

Documenting both for my own understanding, and to provide (hopefully) helpful feedback to Alex for the next update of the spec.

  • Why these three specific chunk types?
  • Purely out of curiosity: how were the magic bytes derived?
  • I assume that the Heads Index only contains the minimal set of hashes, not hashes for every change
    • Per the name, this would only need to be the latest (concurrent) heads
  • Document Chunks: the name may be confusing
    • Do these always contain the entire document, or can they be the entire document broken into e.g. TCP packets?
  • Was it called a β€œchange” instead of a β€œcommit” because there’s no required commit message?
    • FWIW, I find this terminology confusing (I’ll get over it, but it’s not immedietly clear)
      • each op changes the document; I expected β€œop” and β€œchange” to be synonyms
  • Actions don’t include things like appending to text
    • Does Automerge not use something like RGA?
      • I can look this up myself, but I’d expect it to
      • Same w.r.t. the del action: are strings expressed as [Char], maybe?
    • ANSWER