Sparse Merkle Trees for Maximum Privacy DID Updates

Introduction

Beacon aggregation in did:btcr2 reduces on-chain costs by batching multiple DID updates into a single Bitcoin transaction. The simplest aggregation approach (collecting updates into a published map) reveals which DIDs are being updated to anyone who reads it. The SMT (Sparse Merkle Tree) beacon minimizes that leakage: nothing goes on-chain but a single root, the update-to-DID map is never published, and a per-leaf nonce keeps even the aggregation service from telling which cohort members actually updated. It's the most privacy-preserving tier in btcr2's beacon system, and it's built on a data structure with remarkable properties.

What's a Sparse Merkle Tree?

A Sparse Merkle Tree is a Merkle tree where most leaves are empty. The tree has a fixed depth: in btcr2's case, 256 levels, corresponding to the 256-bit output of SHA-256. This means the tree has 2^256 possible leaf positions, far more than could ever be populated.

In a regular Merkle tree, you'd need to store all leaves. In a sparse Merkle tree, empty subtrees are represented by a single default hash. A subtree where every leaf is empty always produces the same hash, so it can be computed once and reused. This makes the tree practical despite its astronomical theoretical size: you only store non-empty leaves and the path from each to the root.

How the SMT Beacon Works

Position Derivation

Each DID has a deterministic position in the tree, derived from the DID identifier:

position = int(SHA-256(did_string))

This maps every possible DID to a unique 256-bit position. Note that the derivation is not what hides identities: the aggregation service must know each participant's DID anyway, since it needs the DIDs to compute their leaf positions and generate the correct proof paths (did:btcr2 privacy considerations). The privacy comes from elsewhere, as the next steps show.

Update Insertion

When a controller wants to anchor an update:

Generate a fresh 256-bit nonce, unique per DID and per signal
Compute the leaf value: hash(hash(nonce) + hash(signedUpdate))
Compute the tree position from the DID (index = int(SHA-256(did)))
Submit the leaf value for that position to the aggregation service

The aggregation service knows the cohort's DIDs (it needs them to build the tree and generate proof paths), but the nonce-mixed leaf value tells it nothing about the update's content. And because a non-updating participant contributes a nonce-only leaf (hash(hash(nonce))) that is indistinguishable from an update leaf, the service can't even tell which positions actually carry a real update (did:btcr2 Optimized SMT appendix).

Root Computation

The aggregation service inserts all submitted leaf values into the SMT and computes the Merkle root. This single 32-byte hash commits to the entire tree state: every leaf at every position. The root is anchored to Bitcoin in an OP_RETURN output.

Inclusion Proofs

To prove their update was included, a controller generates a Merkle inclusion proof: the sibling hashes along the path from their leaf to the root. The proof is logarithmic in the tree depth: at most 256 hashes, though in practice far fewer (empty subtree hashes are constant and don't need to be transmitted).

A verifier can:

Take the controller's claimed leaf value and position
Walk the proof path from leaf to root
Confirm the computed root matches the on-chain root

Non-Update Proofs

The SMT also lets a controller prove a non-update: that a given position carries no new update. In btcr2's design a participant's leaf is never simply "empty": a non-updating participant stores value = hash(hash(nonce)), and proves the non-update by revealing the nonce and supplying the collapsed bitmap plus the non-empty sibling hashes, so the verifier can recompute the root (did:btcr2 Optimized SMT appendix). Because index = int(SHA-256(did)) binds exactly one leaf to each DID, this is what lets a resolver confirm there is no update for a DID at a specific anchor point.

Privacy Analysis

What the aggregation service learns

The cohort's DIDs and their leaf positions (it needs them to build the tree and generate proofs)
A nonce-mixed leaf value for each position
Not whether any given participant actually updated: the per-leaf nonce makes updates and non-updates indistinguishable
Nothing about the content of any update (delivered separately, e.g. via sidecar)

What an on-chain observer learns

A single Merkle root hash per beacon transaction
Nothing about the number of updates
Nothing about which positions are populated
Nothing about the DIDs or update content

What a verifier with a proof learns

That a specific update was included at a specific position in a specific root
The relationship between the DID and the position (since they know the DID)
Nothing about other updates in the same tree

Comparison with CAS beacon

The CAS beacon uses a Beacon Announcement Map that links each DID to its update hash; because that map is published to content-addressable storage, anyone who fetches it (not just the aggregation service) can see which DIDs were updated. The SMT beacon publishes no such map: only the root goes on-chain, and the per-leaf nonce hides which cohort members actually updated even from the aggregation service (which still must know the cohort's DIDs to build the tree). The tradeoff is added complexity in proof generation and verification.

Implementation Details

btcr2's reference SMT implementation (@did-btcr2/smt) uses:

Plain node types: The tree is built from BaseNode/LeafNode/ParentNode classes as an in-memory structure; leaves are indexed by index = int(SHA-256(did))
A single null hash for empty subtrees: Empty leaves and subtrees use one fixed NULL_HASH (32 zero bytes) rather than a precomputed per-level array; empty siblings are skipped rather than transmitted, with a converge bitmap (serialized as collapsed) recording which proof levels actually carry a sibling
A build-then-finalize lifecycle: Entries are added with addEntries(), the tree is computed with finalize(), and reset() clears computed hashes so the structure can be reused; there is no per-leaf delete operation
Proof generation: Inclusion and non-update proofs are generated with the minimum set of sibling hashes needed for verification

The tree depth of 256 is deliberately chosen to match SHA-256's output space. This means:

Every DID maps to a unique position (collision resistance inherits from SHA-256)
The tree doesn't need to be resized as more DIDs are created
Position derivation is a single hash operation

When to Use the SMT Beacon

The SMT beacon is the right choice when:

The DID controller wants the strongest privacy btcr2 offers against the beacon operator: the operator learns the cohort's DIDs but, thanks to the per-leaf nonce, can't tell whether any given controller actually updated, and never sees update content
The controller doesn't want on-chain observers or co-participants to learn anything about their updates
The use case involves sensitive identities (whistleblowers, dissidents, medical credentialing)

For less sensitive use cases, the CAS beacon provides the same cost savings with simpler proof handling. And for individual, infrequent updates, the Singleton beacon's simplicity is hard to beat.

Conclusion

The Sparse Merkle Tree beacon pushes did:btcr2's privacy guarantees to their practical maximum. By keeping update content and update-vs-non-update status private from the aggregation service, and revealing nothing but a single root to on-chain observers, it creates an aggregation system where participation is cryptographically shielded. The 256-bit tree provides collision-free position mapping, and the proof system enables efficient verification without exposing co-participants. It's the most complex tier in btcr2's beacon system, and the one that matters most when privacy is non-negotiable.