Xet Protocol Specification
This specification defines the end-to-end Xet protocol for content-addressed data: chunking and hashing rules, deduplication strategy, xorb and shard object formats, file reconstruction semantics, authentication, and the CAS APIs for upload and download.
Its goal is interoperability and determinism: independent implementations must produce the same hashes, objects, and API behavior so data written by one client can be read by another with integrity and performance.
Implementors can create their own clients, SDKs, and tools that speak the Xet protocol and interface with the CAS service, as long as they adhere to the requirements defined here.
Building a client library for Xet storage
Overall Xet architecture
- Content-Defined Chunking: Gearhash-based CDC with parameters, boundary rules, and performance optimizations.
- Hashing Methods: Definitions of the hash functions used for chunks, xorbs, and term verification entries.
- File Reconstruction: The term-based representation of files, where each term is a xorb hash plus a chunk range.
- Xorb Format: Grouping of chunks into xorbs, the 64 MiB limit, binary layout, and compression schemes.
- Shard Format: Binary shard structure (header, file info, CAS info, footer), offsets, HMAC key usage, and bookends.
- Deduplication: Chunk-level deduplication, including global (system-wide) chunk-level deduplication.
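
To make the term-based reconstruction idea from the list above concrete, here is a minimal sketch in Python. It is illustrative only and not the wire format: the `Term` class, the `reconstruct` function, the in-memory `xorbs` dictionary, the short string hashes, and the choice of an end-exclusive chunk range are all assumptions made for this example. The normative definitions live in the File Reconstruction and Xorb Format documents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """Hypothetical term: a xorb hash plus a range of chunk indices."""
    xorb_hash: str  # placeholder identifier, not a real Xet hash
    start: int      # first chunk index, inclusive (assumed convention)
    end: int        # last chunk index, exclusive (assumed convention)


def reconstruct(terms, xorbs):
    """Rebuild file bytes by concatenating the chunk ranges named by each term.

    `xorbs` maps a xorb hash to its already-decoded list of chunks.
    """
    out = bytearray()
    for term in terms:
        chunks = xorbs[term.xorb_hash]
        # Reject empty or out-of-bounds ranges instead of silently truncating.
        if not (0 <= term.start < term.end <= len(chunks)):
            raise ValueError(f"invalid chunk range in term {term}")
        for chunk in chunks[term.start:term.end]:
            out += chunk
    return bytes(out)


# Toy data: two xorbs holding a handful of small chunks.
xorbs = {
    "aaaa": [b"Hello, ", b"Xet ", b"world"],
    "bbbb": [b"!", b"\n"],
}
terms = [Term("aaaa", 0, 2), Term("aaaa", 2, 3), Term("bbbb", 0, 1)]
data = reconstruct(terms, xorbs)
```

In this sketch a file is an ordered list of terms, and the same xorb can be referenced by more than one term, which is what allows chunks to be shared across files.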