Reproduction artifact for "Practical Limits of Lossless Compression for bf16 Transformer LLM Weights"
CPU-only benchmark: the smoke test passes 3/3 in seconds, and the full benchmark reproduces in ~6.5 h on a c3-highcpu-88 instance. The artifact includes the evaluation scripts, manifests, trained-codec artifacts, smoke-test fixtures, and the figure-generation pipeline used for the results reported in the paper.
Paper: paper.md. Code: github.com/NimoRotem/llm-compression-limits (Apache-2.0 for code; CC-BY-4.0 for data, results, profiles, figures). Reproduction notes: NOTES.md. Software Heritage: swh:1:rel:08d597be136278838e8cc2fef2a68303d990208d.
Headline empirical findings
| Domain | Best lossless ratio | Atlas-derived ceiling |
| --- | --- | --- |
| bf16 (290-tensor test corpus) | 1.499× byte-weighted (trained OpenZL le-u16, 617 B profile; 1.4986 geomean per-tensor) | 1.495× byte-weighted (R_marginal; median per-tensor 1.498) |
| Q4_K tensor stream (530-tensor test corpus) | 1.052× byte-weighted (qb_mixture_k4; 1.0517 geomean) | 1.076× |
| GGUF whole file (3 held-out models) | 1.0422× geomean (decomp_perstream_zstd19_bg); deployable artifact-level 1.041–1.045× | — |
| Cross-layer bf16 redundancy (250 pairs, Qwen2.5 family) | net null (median Pearson +0.0004) | — |
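The table mixes two conventions: a byte-weighted ratio divides total raw bytes by total compressed bytes, so large tensors dominate, while the per-tensor geomean weights every tensor equally. A minimal sketch of both, with illustrative function and variable names that are not taken from the harness:

```python
import math

def byte_weighted_ratio(raw_sizes, compressed_sizes):
    """Total raw bytes over total compressed bytes (large tensors dominate)."""
    return sum(raw_sizes) / sum(compressed_sizes)

def geomean_ratio(raw_sizes, compressed_sizes):
    """Geometric mean of per-tensor ratios (each tensor counts equally)."""
    logs = [math.log(r / c) for r, c in zip(raw_sizes, compressed_sizes)]
    return math.exp(sum(logs) / len(logs))

# Two tensors with different sizes: the two conventions disagree.
print(byte_weighted_ratio([1000, 10], [660, 7]))  # ≈1.514
print(geomean_ratio([1000, 10], [660, 7]))        # ≈1.471
```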
Browse
- paper.md — the paper
- README.md — repo entry point: layout, citation, reproduce recipe
- NOTES.md — what is and is not in this artifact set
- code/ — Apache-2.0 core harness sources (browse via GitHub; per-file URLs at /code/<file> on this mirror, e.g. code/atlas.py)
- methods/bf16-split/ — bf16 byte-split predictor + mixture-CDF (a byte-split sketch follows this list)
- methods/qk-mixture-cdf/ — Q4_K mixture-CDF coder + 277 B qb_k4.profile
- reproductions/dfloat11/ — pointer to the official DFloat11 kernels (not measured here)
- reproductions/unweight/ — pointer to the official Cloudflare Unweight kernels (not measured here)
- reproductions/zipnn/ — pointer to the official ZipNN repository (not measured here)
- results/results.jsonl.zst — 7,960 rows / 11,942 verified method-evals (a reader sketch follows this list)
- manifest/model-manifest.json — 13 source models actually measured
- manifest/expected-checksums.json — three measured prefixes; smoke test passes 3/3
- manifest/dependencies.txt — pinned versions + official reproduction URLs
- prompts/inference-sanity-prompts.json — 8 prompts; gpt2 verified 8/8 (sanity_check_gpt2.jsonl)
- profiles/bf16.zl — 617 B trained OpenZL le-u16 bf16 profile (fp16 profile not vendored; see the lead of paper §6)
- profiles/bench_bf16zl.jsonl — 290-tensor benchmark, 1.4986 geomean
- fixtures/ — TinyLlama bf16 (8 MiB) + Llama-3.2-3B Q4_K (13.5 MiB) + Qwen2.5-0.5B GGUF (379 MiB)
- figures/ — 5 figures × {pdf, png}
- notebooks/figures.ipynb — populated; render_figures.py CLI alongside
- notebooks/lomo-tables.ipynb — LOMO aggregation cells
- cross-layer-pairs.csv — 250 (model, role, layer-pair) rows from the cross-layer correlation atlas (paper §8)
- provenance/iter4/FINAL_VERDICT.md — bf16 supplementary stage verdict
- provenance/iter5/FINAL_VERDICT.md — Q4_K + cross-layer stage verdict
- provenance/iter6/FINAL_VERDICT.md — GGUF artifact stage verdict
- smoke-test/ — fixtures + checksum gate; ./reproducibility_smoke_test.sh exits 0
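To inspect results/results.jsonl.zst without unpacking it, here is a minimal reader using the zstandard package; the per-row schema (field names such as method or ratio) is an assumption, not something documented on this page:

```python
import io
import json

import zstandard  # pip install zstandard

rows = []
dctx = zstandard.ZstdDecompressor()
with open("results/results.jsonl.zst", "rb") as fh, dctx.stream_reader(fh) as reader:
    # Decompress on the fly and parse one JSON object per line.
    for line in io.TextIOWrapper(reader, encoding="utf-8"):
        rows.append(json.loads(line))

print(len(rows))  # expected: 7,960 rows
```

The byte-split in methods/bf16-split/ rests on a standard observation: in little-endian bf16, the high byte of each value carries the sign and the top exponent bits (highly skewed, very compressible), while the low byte is mantissa-dominated (near-uniform), so splitting the stream lets each half be modeled separately. A minimal sketch of the split and its exact inverse, assuming NumPy; this is not the trained predictor or the mixture-CDF coder itself:

```python
import numpy as np

def split_bf16_bytes(buf: bytes) -> tuple[bytes, bytes]:
    """Split little-endian bf16 bytes into (low-byte, high-byte) streams."""
    pairs = np.frombuffer(buf, dtype=np.uint8).reshape(-1, 2)
    lo = pairs[:, 0].tobytes()  # low byte: exponent LSB + mantissa, near-uniform
    hi = pairs[:, 1].tobytes()  # high byte: sign + top exponent bits, highly skewed
    return lo, hi

def merge_bf16_bytes(lo: bytes, hi: bytes) -> bytes:
    """Exact inverse: re-interleave the two streams, byte for byte."""
    pairs = np.empty((len(lo), 2), dtype=np.uint8)
    pairs[:, 0] = np.frombuffer(lo, dtype=np.uint8)
    pairs[:, 1] = np.frombuffer(hi, dtype=np.uint8)
    return pairs.tobytes()
```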
How to reproduce
```sh
git clone https://github.com/NimoRotem/llm-compression-limits.git
cd llm-compression-limits
git checkout v1.0.0
./reproducibility_smoke_test.sh   # 3/3 sha256[:16] checks pass
```

Expected output (sha256[:16] of the compressed bytes):

```
bf16_split(TinyLlama_layer3_q_proj.bin)              → fc3544c35489eb94
qb_mixture_k4(Llama32_3B_layer14_gate.q4k.bin)       → c79972a64d11b717
decomp_perstream_zstd19_bg(Qwen2.5-0.5B-Q4_K_M.gguf) → 1c6f077f8c228cf2
```
The script auto-fetches the 379 MiB GGUF from this mirror if absent. End-to-end roundtrip is byte-exact for all three methods.
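To re-verify a produced artifact against manifest/expected-checksums.json by hand, the gate is just a sha256 prefix comparison. A minimal sketch, assuming the manifest maps output filenames to 16-hex-character prefixes; the authoritative schema is whatever reproducibility_smoke_test.sh actually reads:

```python
import hashlib
import json

def sha256_prefix(path: str, n: int = 16) -> str:
    """sha256[:n] of a file's bytes, streamed in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()[:n]

# Hypothetical manifest layout: {"<output file>": "<16-hex prefix>"}
with open("manifest/expected-checksums.json") as fh:
    expected = json.load(fh)
for name, prefix in expected.items():
    status = "OK" if sha256_prefix(name) == prefix else "MISMATCH"
    print(f"{name}: {status}")
```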
License
Code: Apache-2.0 (LICENSE). Data / results / profiles / figures: CC-BY-4.0 (LICENSE-DATA).