Reproduction artifact for Practical Limits of Lossless Compression for bf16 Transformer LLM Weights

CPU-only benchmark: the smoke test passes 3/3 checks in seconds, and the full benchmark reproduces in about 6.5 h on a c3-highcpu-88 (Google Cloud) instance. The artifact includes the evaluation scripts, manifests, trained-codec artifacts, smoke-test fixtures, and figure-generation pipeline used to produce the results reported in the paper.

Paper: paper.md. Code: github.com/NimoRotem/llm-compression-limits (Apache-2.0 for code; CC-BY-4.0 for data, results, profiles, figures). Reproduction notes: NOTES.md. Software Heritage: swh:1:rel:08d597be136278838e8cc2fef2a68303d990208d.

Headline empirical findings

| Domain | Best lossless ratio | Atlas-derived ceiling |
| --- | --- | --- |
| bf16 (290-tensor test corpus) | 1.499× byte-weighted (trained OpenZL le-u16, 617 B profile; 1.4986 geomean per-tensor) | 1.495× byte-weighted (R_marginal; median per-tensor 1.498) |
| Q4_K tensor stream (530-tensor test corpus) | 1.052× byte-weighted (qb_mixture_k4; 1.0517 geomean) | 1.076× |
| GGUF whole file (3 held-out models) | 1.0422× geomean (decomp_perstream_zstd19_bg); deployable artifact-level 1.041–1.045× | |
| Cross-layer bf16 redundancy (250 pairs, Qwen2.5 family) | net null (median Pearson +0.0004) | |
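The table reports two aggregations of per-tensor results, byte-weighted and geometric-mean, and they can diverge when tensor sizes vary. The following is a minimal Python sketch, assuming "byte-weighted" means total original bytes over total compressed bytes and "geomean" is the geometric mean of per-tensor ratios. It also shows one common way to derive an order-0 (marginal-entropy) ceiling for a stream of 16-bit words. The helper names are hypothetical and not the repository's API, and the ceiling function is not necessarily how the paper defines R_marginal.

```python
import math
from collections import Counter


def byte_weighted_ratio(sizes):
    """Corpus ratio: total original bytes / total compressed bytes.

    `sizes` holds one (original_bytes, compressed_bytes) pair per tensor,
    so large tensors dominate the result.
    """
    total_in = sum(orig for orig, _ in sizes)
    total_out = sum(comp for _, comp in sizes)
    return total_in / total_out


def geomean_ratio(sizes):
    """Geometric mean of per-tensor ratios; every tensor counts equally."""
    logs = [math.log(orig / comp) for orig, comp in sizes]
    return math.exp(sum(logs) / len(logs))


def marginal_entropy_ceiling(words):
    """Order-0 ratio bound 16 / H0 for a sequence of 16-bit words.

    H0 is the empirical marginal entropy in bits per word (e.g. over raw bf16
    bit patterns). Coders that treat words as independent draws from a single
    distribution cannot average below H0 bits per word; context models can.
    (Hypothetical helper, not the paper's R_marginal definition.)
    """
    n = len(words)
    h0 = -sum((c / n) * math.log2(c / n) for c in Counter(words).values())
    return math.inf if h0 == 0 else 16 / h0


# Hypothetical per-tensor sizes: one 2x-compressible tensor, one incompressible.
sizes = [(1000, 500), (100, 100)]
print(round(byte_weighted_ratio(sizes), 4))    # 1.8333
print(round(geomean_ratio(sizes), 4))          # 1.4142
print(marginal_entropy_ceiling([0, 1, 2, 3]))  # 8.0
```

In this example the incompressible small tensor pulls the geomean down to about 1.414×, while the byte-weighted figure stays near 1.833× because it is dominated by the larger tensor.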


How to reproduce

git clone https://github.com/NimoRotem/llm-compression-limits.git
cd llm-compression-limits
git checkout v1.0.0
./reproducibility_smoke_test.sh    # 3/3 sha256[:16] checks pass

Expected output (sha256[:16] of compressed bytes):

The script auto-fetches the 379 MiB GGUF from this mirror if absent. End-to-end roundtrip is byte-exact for all three methods.
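The smoke test's two guarantees (a sha256[:16] fingerprint of the compressed bytes and a byte-exact roundtrip) can be pictured with the following minimal Python sketch. It assumes nothing about the repository's internals: `fingerprint` and `roundtrip_check` are hypothetical names, and the standard-library `lzma` module stands in for the paper's codecs.

```python
import hashlib
import lzma


def fingerprint(data: bytes) -> str:
    """sha256[:16]: the first 16 hex characters of the SHA-256 digest."""
    return hashlib.sha256(data).hexdigest()[:16]


def roundtrip_check(original: bytes, compress, decompress) -> str:
    """Compress `original`, verify a byte-exact roundtrip, and return the
    sha256[:16] fingerprint of the compressed bytes (hypothetical helper)."""
    compressed = compress(original)
    if decompress(compressed) != original:
        raise ValueError("roundtrip is not byte-exact")
    return fingerprint(compressed)


# lzma is only a stand-in codec; the smoke test exercises the paper's own methods.
payload = bytes(range(256)) * 64
print(roundtrip_check(payload, lzma.compress, lzma.decompress))
```

Because the fingerprint is taken over compressed rather than decompressed bytes, it also pins the codec build and settings: a different compressor version can emit different (still valid) bytes and therefore a different fingerprint.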

License

Code: Apache-2.0 (LICENSE). Data / results / profiles / figures: CC-BY-4.0 (LICENSE-DATA).
