Reproduction artifact for the v4 paper: Practical Limits of Lossless Compression for bf16 Transformer LLM Weights.
Status: Draft v4.0.6 (final). The artifact set is operational end to end: the smoke test passes 3/3 locally, and the figures, fixtures, prompts, the regenerated bf16.zl profile (617 B), and the eight inference-sanity prompts are all vendored. Canonical archival snapshot: GitHub release v4.0.3 (immutable git tag; the release ZIP is byte-identical to this mirror; the current head, v4.0.6, contains only post-archive housekeeping). Supplementary persistent identifier: Software Heritage SWHID swh:1:rel:55d910f5af170c22719cc9346f4d8a5029f09164, which covers the v4.0.3 commit (Software Heritage save task 2330392, archived 2026-05-14). No Zenodo DOI was minted because the Zenodo↔GitHub webhook installation did not complete on this repository.
| Domain | Best lossless ratio | Atlas-derived ceiling |
|---|---|---|
| bf16 (290-tensor test corpus) | 1.4986 geomean (trained OpenZL le-u16, 617 B profile, regenerated v4.0.3) | 1.4979 |
| Q4_K tensor stream (530-tensor test corpus) | 1.0517 geomean (qb_mixture_k4) | 1.076 |
| GGUF whole file (3 held-out models) | 1.0422 geomean (decomp_perstream_zstd19_bgscale) | ≈ 1.05 practical |
| Cross-layer bf16 redundancy (250 pairs) | L_RED — median corr +0.0004 | — |
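The ratios in the table are geometric means over each test corpus. As a minimal sketch, assuming each per-tensor ratio is original bytes divided by compressed bytes (the repository's exact accounting, for example whether profile bytes are charged, may differ), the aggregate can be computed from a hypothetical two-column input with one `orig_bytes comp_bytes` line per tensor. The helper name `geomean_ratio` is illustrative and not part of the repository.

```shell
#!/usr/bin/env bash
# geomean_ratio: geometric mean of per-tensor compression ratios.
# Input (hypothetical format): one line per tensor, "orig_bytes comp_bytes".
# Averages log(orig/comp) and exponentiates, instead of multiplying
# hundreds of ratios directly, to avoid overflow/underflow.
geomean_ratio() {
  awk '$2 > 0 { s += log($1 / $2); n++ }
       END { if (n > 0) printf "%.4f\n", exp(s / n) }'
}
```

For example, `printf '4 1\n1 1\n' | geomean_ratio` prints `2.0000`. The geometric mean is the natural average for multiplicative quantities such as ratios, so a few highly compressible tensors do not dominate the summary the way they would under an arithmetic mean.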
Source & results live at github.com/NimoRotem/llm-compression-limits (tag v4.0.3, release page: github.com/NimoRotem/llm-compression-limits/releases/tag/v4.0.3).
Quick start (`./reproducibility_smoke_test.sh` exits 0):

```shell
git clone https://github.com/NimoRotem/llm-compression-limits.git
cd llm-compression-limits
git checkout v4.0.3
./reproducibility_smoke_test.sh   # 3/3 sha256[:16] checks pass
```
Expected output (sha256[:16] of compressed bytes):
```
bf16_split(TinyLlama_layer3_q_proj.bin)               → fc3544c35489eb94
qb_mixture_k4(Llama32_3B_layer14_gate.q4k.bin)        → c79972a64d11b717
decomp_perstream_zstd19_bg(Qwen2.5-0.5B-Q4_K_M.gguf)  → 1c6f077f8c228cf2
```

The script automatically fetches the 379 MiB GGUF from this mirror if it is absent. The end-to-end roundtrip is byte-exact for all three methods.
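To check an individual digest by hand: the sha256[:16] convention is simply the first 16 hex characters of a SHA-256 digest. The sketch below assumes GNU coreutils `sha256sum` (falling back to `shasum -a 256` on macOS/BSD) and that the compressed bytes have been written to a file. The helper names `sha16` and `check16` are hypothetical and are not part of the repository or the smoke-test script.

```shell
#!/usr/bin/env bash
# sha16 FILE: print the first 16 hex characters of FILE's SHA-256 digest.
sha16() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | cut -c1-16
  else
    shasum -a 256 "$1" | cut -c1-16   # macOS / BSD fallback
  fi
}

# check16 FILE EXPECTED: compare the digest prefix; returns non-zero on mismatch.
check16() {
  local got
  got="$(sha16 "$1")"
  if [ "$got" = "$2" ]; then
    echo "OK   $1"
  else
    echo "FAIL $1 (got $got, expected $2)"
    return 1
  fi
}
```

Usage, with a hypothetical output path: `check16 out/TinyLlama_layer3_q_proj.bf16_split.bin fc3544c35489eb94`.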
Code: Apache-2.0 (LICENSE). Data / results / profiles / figures: CC-BY-4.0 (LICENSE-DATA).
v4.0.6, last updated 2026-05-14. The paper draft is final (v4.0.6); the SWHID is the persistent identifier.