"Authomerge" (WIP title) is a project that I'm working on with Ink & Switch to bring authorization to Automerge.
Motivation
Automerge is a CRDT that models concurrent editing of JSON documents. It has been used successfully in trusted production environments, but matters of access control have been left to each application. Having an in-built ability to restrict who can read or write to a document (and policies that govern such behaviour) would aid the adoption of Automerge in such contexts.
Goals
- Coarse read control
- Granular write control
- Partition tolerant revocation
- Post-compromise security (PCS)
- (Unclear if forward secrecy is desirable due to how Automerge docs work?)
- It is not possible to retract data that has already been seen by a later-revoked party, so backward secrecy is only in the threat model for users that have never been authorized
- Trust-minimized sync servers
- An opinionated, minimal API
- Nice to haves:
- Align write control mechanism with parts of DCGKA
- Transitive access (if you have authority over `/foo`, you also have authority over `/foo/bar`)
Size:
- Millions of readers
- Tens-of-thousands of documents
- Thousands of writers (e.g. a medium-sized biz like the NYT or Snyk)
- Dozens-to-hundreds of admins (or equivalent)
Antigoals
- Constraining all downstream applications to use a small predefined set of policies or roles
- Dependence on interactive protocols
- Reliance on central authority (we will if we find that we have to, but I'd much prefer to avoid this)
Out of Scope
> Crypto is a tool for turning a whole swathe of problems into key management problems. Key management problems are way harder than (virtually all) cryptographers think.
>
> — Lea Kissner on X
This project aims to provide the minimal set of functionality required for a solid access control foundation, and appropriate abstractions to provide to higher levels. As such, all of the following are out of scope of this project, but should be possible to build on top of it:
- User identity (the binding of a human agent to one or more public keys)
- PKI discovery
- Secrets management
- Fully untrusted sync servers. To break down what's possible, but not in scope:
- Zero knowledge write control
- Oblivious transfer
- Hiding number of ops
- Hiding which writes are concurrent
- Cryptographic agility (though may be added in a later version)
- FIPS (or similar) compliance
Interaction with Other Subsystems
Static authorisation typically impacts the design of all other layers of a project. As a helpful intuition, the storage layer will need to support data that is encrypted at rest, and so its design has a dependency on the auth layer. Because an authorisation mechanism can impose downstream constraints like this, its design should account for its potential impact on the rest of the stack. As much as possible, this project attempts to minimise imposing such constraints on other layers.
The layering is not this clean (there are far more bidirectional arrows in reality), but as a simplification it can be helpful to think of a picture similar to the below:
```
┌────────────────────────────────┐      │
│ Synchronisation ││ Compression │      │
└────────────────────────────────┘      │
┌────────────────────────────────┐      │
│           Data Store           │      │
└────────────────────────────────┘  Constrained
┌────────────────────────────────┐      By
│  Encryption  ││  Write Access  │      │
└────────────────────────────────┘      │
┌────────────────────────────────┐      │
│      Core Data Structures      │      │
└────────────────────────────────┘      ▼
```
Terminology
| Term | Definition |
|---|---|
| Agent | Something (or someone) with a public key |
| Group | A collection of agents |
| Capability | The authority to perform certain actions on a resource |
| Attenuation | A reduction of authority |
| Read Control | The ability to decrypt some data |
| Write Control | The ability to update a document or group |
| Effect Control | Controls for things like using a storage quota, pushing to specific replicas, etc. |
| Wire Compression | The RLE scheme used for moving ops over the wire. This is the main encoding referenced in this doc |
| Disk Compression | Another (RLE?) scheme used for the in-memory representation(?) |
NOTE
A side note on terminology:
Capabilities are a fairly broad category, but their common philosophy is that the graph of access should describe the flow of authority. For instance, this is how transitive access is described in the read section.
"Objects" in full-on object capabilities are more powerful because they have internal state and arbitrary executable code. UCAN simulates this by statically describing a graph, and rigidly following the delegation hierarchy. This can be weakened if we lean on convergent state that replicas are already pulling (as in a CRDT setting).
Traditionally, object capabilities (as in E-lang) explicitly use fail-stop semantics (PC/EC in PACELC), which puts them decidedly outside of our use case here. From the whiteboarding in Berlin, we can almost certainly do something between certificate capabilities and ocap, since we already want causal+ consistency in features like revocation.
I'm adopting the term "agent" here instead of "object" to indicate the difference, and to bring it in line with how many folks talk about principals in non-ocap systems.
Threat Model
Sensitive use cases that are in scope include:
- Surprise party planning (the classic low-stakes example)
- Company secrets (i.e. corporate espionage)
- Vulnerable populations (e.g. domestic abuse survivors)
- Journalists, whistleblowers, political activists
The system should be resistant to "normal" attacks, but not against specialist state actors. As a very rough intuition, this means resistant to e.g. commercial security or the FBI, but not the NSA or Mossad.
STRIDE
(Because we're keepin' it oldschool… for now)
There's a LOT more to analyse via STRIDE, but this is a (very) minimal start to get us thinking about these factors.
| Property | EdDSA | BLAKE3 | ChaCha20 | DCGKA | Caps | Auth'd Sync | Notes |
|---|---|---|---|---|---|---|---|
| Proofing / AuthN | ✓ | | | | | | |
| Tamper-Proof / Integrity | | ✓ | | | | | |
| (Non-)Repudiation | ✓ | | | | | | |
| (Non-)Disclosure (confidentiality) | | | ✓ | ✓ | | ✓ | Anonymous or ZK mode is probably out of scope; hosts will probably know which keys are allowed to write. Also need to use a message and/or key committing variant (ChaCha20-Blake3?) |
| DoS / censorship resistance | | | | | | ✓ | |
| Escalation (of authority) / AuthZ | | | | | ✓ | ✓ | |
Additional factors include how to handle (or not) history poisoning, such as discovering that CSAM had been written into the document history but now has lots of dependent writes on top of it.
Primitives
Crypto Suite
| Feature | Algorithm |
|---|---|
| Hash | BLAKE3 (possibly Bao) |
| Symmetric Crypto | (X)ChaCha20-Poly1305 (possibly (X)ChaCha20-Blake3) |
| Asymmetric Crypto | Curve25519, EdDSA, X25519 |
Sub-Protocols
| Feature | Algorithm |
|---|---|
| Read Group Key Agreement | DCGKA ("Duckling") |
| Read Access Revocation | DCGKA's in-built PCS mechanism |
| Transitive Read Access | DCGKA + Caps1 |
| Granular Write Access | Modified (partition-tolerant) OCap, predicate attenuation |
| Write Revocation | Causality locking, backdating detection, default to "whiteout" (skip materialisation) |
High Level Overview
All of the below can be described from a capabilities worldview. This is a flexible technique that stays very close to the ground truth, and can describe other patterns such as RBAC without assuming them as a conceptual "ground".
Read Access
Static read access implies encryption at rest. The two main components are thus:
- Encryption-at-rest scheme
- Level of granularity of encryption envelope (for space efficiency)
- Snapshot access via sharing derived key rather than state for the KDF
- Key agreement groups
- Adding users is still an authorised action (interacts with write access)
- Revocation is managed via the DCGKA group management and the automatic PCS
NOTE
It's been a hot minute since I've read the DCGKA paper, and I need to spend some time bringing all of the details back into working memory. I'm about 60% through brushing up on the details. Expect the below to contain inaccuracies, misunderstandings, and incorrect terms in the meantime.
Encrypted-at-Rest Storage
As we saw with Serenity, encrypting every Automerge operation would be far too space inefficient. Instead, encrypting at wire compression boundaries likely makes sense. This means that runs stay together, and are encrypted with the same key after compression (not before).
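To make the space tradeoff concrete, here is a toy sketch (not the real Automerge wire format, and with mock rather than real encryption) showing why sealing whole wire-compressed runs pays the per-envelope overhead once per run instead of once per op. The overhead figure assumes an XChaCha20 nonce plus a Poly1305 tag.

```python
import secrets

# Assumed per-envelope cost: 24-byte XChaCha20 nonce + 16-byte Poly1305 tag.
ENVELOPE_OVERHEAD = 24 + 16

def seal_runs(ops: list[bytes], run_size: int) -> list[dict]:
    """Group ops into runs and wrap each run in ONE (mock) envelope."""
    envelopes = []
    for i in range(0, len(ops), run_size):
        run = b"".join(ops[i : i + run_size])  # stand-in for RLE compression
        envelopes.append({
            "nonce": secrets.token_bytes(24),
            "ciphertext": run,  # a real implementation would AEAD-encrypt here
        })
    return envelopes

def total_overhead(n_ops: int, run_size: int) -> int:
    n_envelopes = -(-n_ops // run_size)  # ceiling division
    return n_envelopes * ENVELOPE_OVERHEAD

# Encrypting per op pays the overhead 1000 times; per run, only 10 times.
assert total_overhead(1000, 1) == 1000 * 40
assert total_overhead(1000, 100) == 10 * 40
```

The run boundary also means ops in the same run necessarily share a key, which is what ties key epochs to compression boundaries in the diagrams below.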
Causal Read Access
It's probably fair to assume that if you have access to op 3, you automatically get access to all transitively dependent ops. To put it another way: backwards security is only a nice-to-have, as it only guards against edge cases and rules out many snapshot read access schemes.
```mermaid
flowchart RL
  subgraph A[Key Epoch A]
    op0
    op1
    op2
  end
  subgraph B[Key Epoch B]
    op3
  end
  subgraph C[Key Epoch C]
    op4
    op5
  end
  subgraph D[Key Epoch D]
    op6
  end
  op0
  op1 --> op0
  op2 --> op1
  op3 --> op2
  op4 --> op1
  op5 --> op3
  op5 --> op4
  op6 --> op5
```
ASCII version
```
Key Epoch A:   Op 0 ◀── Op 1 ◀── Op 2
                         ▲         ▲
                         │         │
Key Epoch B:             │       Op 3 ◀──┐
                         │               │
Key Epoch C:   Op 4 ─────┘               │
                ▲                        │
                └─────── Op 5 ───────────┘
                          ▲
                          │
Key Epoch D:            Op 6
```
In the above diagram, having the key material for Epoch B should grant you access to Epoch A. Access to Epoch D should grant you access to the entire graph.
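A minimal sketch of this rule, using the op dependency graph from the diagram above (epoch assignments and edge data are transcribed from it): the epochs you can read are exactly those reachable by walking op dependencies backwards from the ops you hold keys for.

```python
# Edges point from an op to the ops it depends on, as in the flowchart.
DEPS = {
    "op0": [], "op1": ["op0"], "op2": ["op1"], "op3": ["op2"],
    "op4": ["op1"], "op5": ["op3", "op4"], "op6": ["op5"],
}
EPOCH = {"op0": "A", "op1": "A", "op2": "A", "op3": "B",
         "op4": "C", "op5": "C", "op6": "D"}

def readable_epochs(start_ops: list[str]) -> set[str]:
    """Epochs reachable by walking op dependencies backwards."""
    seen, stack = set(), list(start_ops)
    while stack:
        op = stack.pop()
        if op not in seen:
            seen.add(op)
            stack.extend(DEPS[op])
    return {EPOCH[op] for op in seen}

# Epoch B reaches all of Epoch A; Epoch D reaches the entire graph.
assert readable_epochs(["op3"]) == {"A", "B"}
assert readable_epochs(["op6"]) == {"A", "B", "C", "D"}
```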
To avoid
```
┌───────────────────┐   ┌───────────────────┐   ┌───────────────────┐   ┌───────────────────┐
│    Encryption     │   │    Encryption     │   │    Encryption     │   │    Encryption     │
│     Envelope      │   │     Envelope      │   │     Envelope      │   │     Envelope      │
│     (Key A)       │   │     (Key B)       │   │     (Key C)       │   │     (Key D)       │
│                   │   │  ┌──────────┐     │   │  ┌──────────┐     │   │  ┌──────────┐     │
│                 ◀─┼───┼──│  Key A   │   ◀─┼───┼──│  Key B   │   ◀─┼───┼──│  Key C   │     │
│                   │   │  └──────────┘     │   │  └──────────┘     │   │  └──────────┘     │
│ ┌───────────────┐ │   │ ┌───────────────┐ │   │ ┌───────────────┐ │   │ ┌───────────────┐ │
│ │     Wire      │ │   │ │     Wire      │ │   │ │     Wire      │ │   │ │     Wire      │ │
│ │  Compressed   │ │   │ │  Compressed   │ │   │ │  Compressed   │ │   │ │  Compressed   │ │
│ │               │ │   │ │               │ │   │ │               │ │   │ │               │ │
│ │  Op 0, Op 1,  │ │   │ │     Op 3      │ │   │ │  Op 4, Op 5   │ │   │ │     Op 6      │ │
│ │     Op 2      │ │   │ │               │ │   │ │               │ │   │ │               │ │
│ └───────────────┘ │   │ └───────────────┘ │   │ └───────────────┘ │   │ └───────────────┘ │
└───────────────────┘   └───────────────────┘   └───────────────────┘   └───────────────────┘
```
While it is possible to derive the state for each point in the history if you're a member of the group, publishing use cases require granting access without being a member of the group. As such, it likely makes sense to include the symmetric key itself instead of the inputs to the KDF. Each key rotation incurs 32 bytes of symmetric key material, plus 12 or 24 bytes of nonce (depending on our security margin), so a worst case of 44-56 bytes of encryption overhead per envelope. This can be further reduced by using hashing to derive next keys instead of a full DCGKA key rotation (and thus omitting the intermediate keys), but this loses PCS until the next full (randomised) DCGKA rotation.
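A sketch of the two chaining options just described. The envelope layout and field names are invented for illustration (in reality the previous key would be decrypted out of the envelope, not stored in the clear), and BLAKE2b stands in for BLAKE3 since it's in the Python standard library.

```python
import hashlib

def walk_history(envelopes: list[dict], newest_key: bytes) -> list[bytes]:
    """Each envelope embeds the previous epoch's key, so holding the newest
    key unlocks the whole history by walking backwards."""
    keys, key = [], newest_key
    for env in reversed(envelopes):
        keys.append(key)
        key = env["prev_key"]  # stand-in: really decrypted from the envelope
        if key is None:
            break
    return keys

def ratchet_next(key: bytes) -> bytes:
    """Cheaper variant: derive the next epoch key by hashing (BLAKE2b as a
    stdlib stand-in for BLAKE3). No fresh randomness is mixed in, so there
    is no PCS until the next full DCGKA rotation."""
    return hashlib.blake2b(key, digest_size=32).digest()

kA, kB, kC = b"A" * 32, b"B" * 32, b"C" * 32
envelopes = [{"prev_key": None}, {"prev_key": kA}, {"prev_key": kB}]

# The newest key recovers every earlier epoch key, newest first.
assert walk_history(envelopes, kC) == [kC, kB, kA]
# The ratchet is deterministic and one-way: kA yields the next key,
# but nothing lets you go from a later key back to kA.
assert ratchet_next(kA) == ratchet_next(kA)
```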
Canonical Hash
An open question is whether the "canonical" hash of a particular op (e.g. the first in a wire-compressed run) should be of the ciphertext or plaintext. I see advantages to both. Ultimately it doesn't really change the core system semantics, but it does have a few knock-on effects. TODO swing back and expand.
Just something I'm playing with:
```mermaid
flowchart RL
  subgraph enc1[Encryption Envelope 1]
    Key1((Key 1))
    subgraph wc1[Wire Compressed]
      opB[Op B] -->|after| opA
    end
    opA[Op A] -->|after| Key1
  end
  subgraph enc2[Encryption Envelope 2]
    Key2((Key 2))
    subgraph wc2[Wire Compressed]
      Key2_1((Key 1))
      opC[Op C] -->|after| opB
      opD[Op D] -->|after| opC
      opE[Op E] -->|after| opD
    end
    opC[Op C] -->|after| Key2
  end
  Key2_1((Key 1)) -.->|to read history| Key1
```
Transitive Read Access
This section immediately starts to cross into write/mutation control. There will be (much) more to say about inter-document authority in the mutation section, but here I wanted to note a few ideas for hierarchical read access.
Several projects in the space have had need for hierarchical document access. At minimum, WNFS, Peergos, and DXOS all have some variation on this. It also closely matches the access control semantics for Dropbox and Google Drive (I double checked after Berlin).
Due to PCS, we are unable to directly derive keys for linked documents. Snapshot encryption is still possible by pointing at a document and including its symmetric key (and then iteratively walking the history backwards as above), but this does not grant access to future updates. I believe that the two best options (there are several) in this setting are to either use lockboxes1, or to make full use of the transitive groups to run DCGKA on the entire graph of readers. I like that the latter option is more passive / has fewer moving parts, but in theory both will work. I'm increasingly convinced that DCGKA+Caps would be especially clean & secure (fewer points of secret leakage) if we want transitive access like in DXOS, WNFS, etc.
In all cases, we run the risk of over-exposing a key in the case that a revocation hasn't been communicated to a writer. There's always an incentive for an attacker to try and eclipse messages about their revocation, but unlike write access (which can be handled later), once something is read, it's been read and there's no way to take that back. Just like in WNFS, I think that we just need to be very clear that read revocation is very much "eventual". The other option is to add some synchronous step to the protocol, but I'd like to avoid that as much as possible.
```
┌────────┐   ┌────────┐   ┌────────┐   ┌────────┐   ┌────────┐
│Alice PK│   │ Bob PK │   │Carol PK│   │ Dan PK │   │Fran PK │
└────────┘   └────────┘   └────────┘   └────────┘   └────────┘
     ▲            ▲            ▲            ▲            ▲
     │            │            │            │            │
     │       ┌────┴───┐        │            │            │
     └───────│Object A│────────┘            │            │
             └────────┘                     │  Delegate  │
                  ▲                         │  Access    │
                  │                         │            │
             ┌────┴───┐                     │            │
             │ Doc 1  │─────────────────────┘            │
             └────────┘                                  │
                  ▲                                      │
                  │                                      │
             ┌────┴───┐                                  │
             │ Doc 2  │──────────────────────────────────┘
             └────────┘
```
NOTE
The exact details of this mechanism are subject to change, but these are the rough mechanics
In the diagram above, updates to `Doc 1` would find all of the PKs that are transitively reachable from `Doc 1`. This would mean then running DCGKA on the following public keys: `Doc 1`, `Object A` (i.e. some group), `Alice PK`, `Bob PK`, `Carol PK`, `Dan PK`. Note that this excludes `Doc 2` and `Fran PK`.
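A minimal sketch of collecting that DCGKA member set: take every PK transitively reachable over delegation edges from the updated document. The edge list below is an assumption reconstructed from the membership list in the text (Doc 1 delegates to Object A and Dan; Object A to Alice, Bob, and Carol; Doc 2 to Doc 1 and Fran).

```python
# Hypothetical delegation edges, resource -> delegatees.
DELEGATIONS = {
    "Doc 2": ["Doc 1", "Fran PK"],
    "Doc 1": ["Object A", "Dan PK"],
    "Object A": ["Alice PK", "Bob PK", "Carol PK"],
}

def dcgka_members(root: str) -> set[str]:
    """Everything transitively reachable from the updated resource."""
    members, stack = set(), [root]
    while stack:
        node = stack.pop()
        if node not in members:
            members.add(node)
            stack.extend(DELEGATIONS.get(node, []))
    return members

# Updates to Doc 1 run DCGKA over exactly the set named in the text...
assert dcgka_members("Doc 1") == {
    "Doc 1", "Object A", "Alice PK", "Bob PK", "Carol PK", "Dan PK",
}
# ...and exclude Doc 2 and Fran PK, since edges only point "upward".
assert "Fran PK" not in dcgka_members("Doc 1")
```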
NOTE
Esp. since DCGKA's cost grows with the size of the group, it may make sense to omit intermediate object PKs, but I have an intuition that not treating them any differently simplifies the protocol. There are possibly some interesting things that we can do with the object PKs hashed with their latest state to force a ratcheting effect on transitive dependencies, but further thought is still needed at this time.
Reader Revocation
As mentioned somewhere above, this should be handled straightforwardly by DCGKA by "simply" removing the revoked member from the group, and rotating the group key on next write. Note that this is a distinct question from them being able to retrieve bytes from the network in the future. At minimum they could have a copy (which is also the case in a centralised setting: screenshots are a thing).
Write Access
My current mental image broadly breaks write access into two topics: eventually consistent "agents" (to use a modified capabilities term), and granular attenuation.
Agents
Agents are an abstraction over users, roles, groups, and policies. They provide a consistent interface for modelling authorisation. Here is a simple example of agents modelling the equivalent of users, devices, roles, etc:
```mermaid
flowchart TD
  PvH[[PvH]] -->|caveat: only reads| Laptop
  PvH --> Phone
  Laptop --> Firefox
  Laptop --> CLI
  Phone --> MobileSafari
  Tablet --> TabletApp
  Tablet --> MobileChrome
  IS[[Ink & Switch]] --> PvH
  IS -->|caveat: only write| Alex[[Alex]]
  Alex --> Tablet
  LoFiConfDoc --> AdamW[[AdamW]] --> dotAW[...]
  LoFiConfDoc>LoFi Conf Doc] --> IS
  GrantDoc>Grant Doc] --> IS
  GrantDoc -->|caveat: only comment| NLNet[[NLNet]] --> OfficeDesktop
  NLNet --> dot[...]
```
Following this graph includes attenuation. Since any agent may have more than one path to a resource, we can either calculate that implicitly, or request that a path be provided at write time. The best practice is to avoid ambient authority at all costs, and referencing the Merkle path may have acceptable overhead (and be compressible).
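A sketch of the "provide a path at write time" option: instead of the validator searching for some path (ambient authority), the writer names the delegation hops, and the validator only checks that each named hop exists and that the chain starts at the resource and ends at the writer. The edge data is made up for illustration.

```python
# Hypothetical delegation edges as (delegator, delegatee) pairs.
DELEGATIONS = {
    ("Doc 1", "Object A"),
    ("Object A", "Alice PK"),
    ("Object A", "Bob PK"),
}

def path_is_valid(resource: str, writer: str, path: list[str]) -> bool:
    """Check a writer-supplied chain resource -> ... -> writer, hop by hop."""
    if not path or path[0] != resource or path[-1] != writer:
        return False
    return all((a, b) in DELEGATIONS for a, b in zip(path, path[1:]))

# Bob proves authority over Doc 1 via Object A; Fran has no such chain.
assert path_is_valid("Doc 1", "Bob PK", ["Doc 1", "Object A", "Bob PK"])
assert not path_is_valid("Doc 1", "Fran PK", ["Doc 1", "Object A", "Fran PK"])
```

Validation is then a cheap per-hop lookup rather than a graph search, which is part of why referencing the path explicitly may have acceptable overhead.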
Granular Attenuation
We didn't firmly settle on this in Berlin, but to achieve the project goals of granular auth that's flexible enough for applications to define their own policies (and from experience on UCAN), I'd recommend an attenuatable policy DSL. In UCAN this is syntactically driven (more on this in a moment). It is attached as caveats to a delegation chain ("path of authority"), and constrains writes by adding predicates over the JSON included in the update, validated on write.
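A toy sketch of predicate-based caveats, with the caveat shapes invented for illustration: each hop in the path of authority may attach a predicate over the proposed write, and attenuation means a write must satisfy every caveat along the chain (predicates can only narrow authority, never widen it).

```python
from typing import Callable

# A caveat is a predicate over the proposed (JSON-shaped) write.
Caveat = Callable[[dict], bool]

def attenuate(caveats: list[Caveat]) -> Caveat:
    """Combine caveats along a delegation chain: all must hold (AND)."""
    return lambda write: all(c(write) for c in caveats)

# Two hypothetical caveats attached at different hops in the chain.
only_comments: Caveat = lambda w: w.get("path", "").startswith("/comments")
no_deletes: Caveat = lambda w: w.get("action") != "delete"

policy = attenuate([only_comments, no_deletes])

assert policy({"path": "/comments/7", "action": "insert"})
assert not policy({"path": "/title", "action": "insert"})       # wrong subtree
assert not policy({"path": "/comments/7", "action": "delete"})  # delete caveat
```

Because the combinator is a plain AND, adding a hop to the chain can only ever shrink what is writable, which is the core attenuation property.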
The exact details of whether we want a single policy per agent, or to attach policies to each delegation, are still open. This is a major distinction between certificate- and object-capabilities, but ultimately the semantics are the same. The tradeoff is essentially that in one we need to track more internal state in an agent, and in the other we have more agents.
I have no concerns about this working for peers with read access, but trust-minimised sync servers are unable to validate such updates. Whether capabilities, RBAC, or some other mechanism, we're going to face the same challenge. It is technically possible to validate these in zero knowledge. Were we to go down that path, we could gain quite a lot in terms of privacy and verifiable computation (e.g. producing materialised views that are known to be "correct") while having untrusted validators, but techniques like SNARKs are not generally in the category of "simple".
Instead, I (currently) propose one of three options, though the first two are the main ones:
Option A: Public Write Groups
A three layer system:
- Layer 1: Write group membership (only PKs) is public in general. Note that these PKs don't need to represent "the user", but can be one of many delegated public keys under their control. This is still correlatable to the same agent, but does not have to expose which human is behind those keys. This lets a validator accept signatures from any key in the public group.
- Layer 2: Encrypted policies that are validated by other users with read access.
- Layer 3: Encrypted-at-rest data (per the read access section)
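A sketch of how the validation work splits across these layers. An untrusted sync server can only check Layer 1 (is the signing key in the public write group, and does the signature verify?), while peers holding the read key additionally evaluate the Layer 2 policies. The signature and policy checks are injected stubs here, not real crypto, and the field names are invented.

```python
def server_accepts(update: dict, public_group: set[str], sig_ok) -> bool:
    """Layer 1 only: membership of the world-readable write group."""
    return update["signer"] in public_group and sig_ok(update)

def reader_accepts(update: dict, public_group: set[str], sig_ok,
                   policy_ok) -> bool:
    """Layers 1 + 2: readers can also decrypt and evaluate policies."""
    return server_accepts(update, public_group, sig_ok) and policy_ok(update)

group = {"pk_alice_device_3", "pk_bob_laptop"}
update = {"signer": "pk_alice_device_3", "payload": b"..."}

assert server_accepts(update, group, sig_ok=lambda u: True)
# A policy violation passes the server but is rejected by readers:
assert not reader_accepts(update, group, sig_ok=lambda u: True,
                          policy_ok=lambda u: False)
```

The gap between the two functions is exactly the trust we extend to the sync server: it can bound *who* may write, but not *what* they write.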
```
┌───────────────────────────┐
│                           │
│  World-Readable Group PKs │
│                           │
└─────────────▲─────────────┘
              │
              │
┌─────────────┴─────────────┐
│                           │
│     Encrypted At Rest     │
│                           │
│  ┌─────────────────────┐  │
│  │                     │  │
│  │  Document Policies  │  │
│  │                     │  │
│  └─────────────────────┘  │
│  ┌─────────────────────┐  │
│  │                     │  │
│  │    Document Ops     │  │
│  │                     │  │
│  └─────────────────────┘  │
└───────────────────────────┘
```
Option B: Authorised Sync
Make this a problem of the sync protocol. If you're allowed to push to another replica, then you're provably allowed to write to that replica's store. This can still lead to cases where a malicious node causes a lot of bandwidth to be used, but it's more contained than "anyone can use as much storage as they want, and the replica has no way to check".
Option C: Zero Knowledge
See my earlier comments wrt SNARKs or similar ZK protocols, but in short: they would solve this challenge, but at a complexity cost (and overall they have poor tooling). It may be worth considering, but my intuition is that this is a last resort. We may be able to explore other homomorphic schemes or blinding protocols as well, but I don't know if we have a strong enough case to go this deep. I'm open to hearing disagreement here, though!
Writer Revocation
Revocation in this view is "just" an action like any other. Instead of updating the data in an Automerge document log, you're writing an event to the auth log. Attaching a special auth log per application document (in addition to "free" auth agents) seems to make sense at first glance. These can then reference each other in their causal histories, like so:
```mermaid
flowchart TD
  subgraph CRDT Agent
    direction RL
    addPk3[Alice adds Carol] --> addPk2[Alice adds Bob] --> addPk1[Agent adds Alice]
    addPk4[Alice adds Dani] --> addPk2
    revokePk3[Bob removes Carol] --> addPk4
    revokePk3 --> addPk3
    addPk6[Erin adds Fran] --> revokePk3
    addPk5 --> addPk2
    addPk6 --> addPk5[Bob adds Erin]
  end
  subgraph CRDT Document
    direction RL
    opF --> opC[Op C\nBy Dani] --> opB[Op B\nBy Alice] --> opA[Op A\nBy Alice]
    opF --> opD[Op D\nBy Carol] --> opB
    opF[Op F\nBy Bob] --> opE[Op E\nBy Bob] --> opB
    opF -.-> opG[Maybe Bad Op?\nBy Concurrently\nRevoked Carol] -.->|concurrent with revocation| opE
  end
  opA -.-> addPk1
  opB -.-> addPk5
  opF -.-> revokePk3
  revokePk3 -.-> opC
  revokePk3 -.-> opD
```
This causal history property is especially important for "locking" the history and detecting backdated (or even legitimately concurrent) writes from the revoked PK.
We have established that there are broadly two classes for dealing with revocation: monotone and non-monotone. In the monotone scenario, Automerge detects a concurrent write from an unauthorised user which has causal dependents; it treats this as if it had been backspaced over in the final materialised view. In various non-monotone variants, this scenario is handled by dropping those writes and asking any users that have written on top of them to effectively "rebase". At present, we know that we want to support detection of this scenario, and likely the default behaviour of not materialising those updates, but eventually possibly also the option to plug in arbitrary application behaviour to actually drop those writes (or use another mechanism, like resealing at a higher authority).
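The monotone ("whiteout") default above can be sketched as a causal-order check: an op from a revoked writer is materialised only if the revocation causally depends on it; ops concurrent with (or after) the revocation are detected and skipped, while other writers' dependents are kept. The op names, authors, and dependency data below are illustrative, loosely following the Carol scenario in the diagram.

```python
# Edges point from an event to its causal dependencies.
DEPS = {
    "opA": [], "opB": ["opA"], "opE": ["opB"],
    "opG": ["opE"],            # Carol's op, concurrent with her revocation
    "revoke_carol": ["opE"],   # the revocation event's causal frontier
}
AUTHOR = {"opA": "alice", "opB": "alice", "opE": "bob", "opG": "carol"}

def ancestors(event: str) -> set[str]:
    """Everything the given event causally depends on (transitively)."""
    seen, stack = set(), list(DEPS[event])
    while stack:
        x = stack.pop()
        if x not in seen:
            seen.add(x)
            stack.extend(DEPS[x])
    return seen

def materialise(ops: list[str], revoked: str, revocation: str) -> list[str]:
    """Keep a revoked writer's op only if the revocation saw it."""
    before_revocation = ancestors(revocation)
    return [op for op in ops
            if AUTHOR[op] != revoked or op in before_revocation]

# opG is whited out: Carol authored it and it is not in the revocation's past.
assert materialise(["opA", "opB", "opE", "opG"],
                   "carol", "revoke_carol") == ["opA", "opB", "opE"]
```

Note that this only skips materialisation; the op stays in the log, which is what leaves room for the non-monotone "actually drop and rebase" behaviours to be plugged in later.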