1 AlphaBell Inc., Nimo@AlphaBell.com · 2 Independent contributor
Manuscript under peer review at IJACSA. The full text is not posted publicly during the review period; this page indexes the open reproduction artifact (code, data, results, profiles, and figures) only.
In brief, the study characterizes how far bf16 transformer weights and Q4_K-typed GGUF tensors can be compressed without loss, under one benchmark protocol with byte-exact roundtrip checks and a model-level train/test split. Measured ceilings: about 1.495× for bf16 and about 1.076× for the Q4_K tensor stream (1.041–1.045× at whole-GGUF level); adjacent same-role layers show no usable linear redundancy at bf16 precision. Full numbers, methods, and caveats appear in the manuscript once published.
| Domain | Best lossless ratio | Atlas ceiling | Sample |
|---|---|---|---|
| bf16 weights (Prop. 1 byte-marginal) | 1.488–1.499× | 1.495× | 290 test tensors, ≤7B |
| Q4_K tensor stream | 1.052× | 1.076× | 530 Q4_K-typed test tensors |
| GGUF Q4_K_M whole file | 1.041–1.045× | ≈1.05× | 3 held-out GGUF files (test set of a 5-model corpus) |
| Adjacent-layer linear residuals (bf16) | net loss | median Pearson +0.0004 | 250 layer pairs, 2 Qwen2.5 models |
Headline numbers are byte-weighted corpus ratios. Per-tensor distributions and 95% bootstrap CIs appear in §5–§7 of the paper.
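
To make "byte-weighted corpus ratio" concrete, here is a minimal sketch of how such a ratio differs from a plain average of per-tensor ratios. The function names and the example sizes are hypothetical and chosen only for illustration; they are not taken from the artifact or its results.

```python
def byte_weighted_ratio(sizes: list[tuple[int, int]]) -> float:
    """Total original bytes divided by total compressed bytes.

    `sizes` holds (original_bytes, compressed_bytes) pairs, one per tensor.
    Large tensors dominate this figure in proportion to their size.
    """
    total_original = sum(orig for orig, _ in sizes)
    total_compressed = sum(comp for _, comp in sizes)
    return total_original / total_compressed


def mean_per_tensor_ratio(sizes: list[tuple[int, int]]) -> float:
    """Unweighted mean of per-tensor ratios, shown only for contrast."""
    return sum(orig / comp for orig, comp in sizes) / len(sizes)


# Hypothetical sizes, NOT measurements from the study.
example = [(1000, 500), (100, 100)]
weighted = byte_weighted_ratio(example)      # 1100 / 600
unweighted = mean_per_tensor_ratio(example)  # (2.0 + 1.0) / 2
print(f"byte-weighted: {weighted:.4f}, per-tensor mean: {unweighted:.4f}")
```

The two figures can diverge noticeably when tensor sizes vary, which is why the weighting scheme is worth stating next to any headline number.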
The smoke test runs three lossless compressors against bundled fixtures and verifies that their sha256[:16] prefixes match the values measured at publication time. Every method asserts a byte-exact roundtrip before the hash is taken. All three checks are expected to pass (3/3):

    git clone https://github.com/NimoRotem/llm-compression-limits.git
    cd llm-compression-limits
    ./reproducibility_smoke_test.sh

| Method | Fixture | Ratio | sha256[:16] |
|---|---|---|---|
| bf16_split | TinyLlama_layer3_q_proj.bin (8 MiB) | 1.4820× | fc3544c35489eb94 |
| qb_mixture_k4 + 277 B profile | Llama32_3B_layer14_gate.q4k.bin (13.5 MiB) | 1.0507× | c79972a64d11b717 |
| decomp_perstream_zstd19_bg | Qwen2.5-0.5B-Q4_K_M.gguf (379 MiB) | 1.0367× | 1c6f077f8c228cf2 |
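
The check pattern described above (assert a byte-exact roundtrip, then compare a truncated sha256) can be sketched as follows. This is not the artifact's implementation: `zlib` stands in for the actual compressors, the function names are hypothetical, and hashing the compressed output (rather than some other byte stream) is an assumption, since the page does not say which bytes are hashed.

```python
import hashlib
import zlib


def compress_and_fingerprint(raw: bytes) -> tuple[float, str]:
    """Compress, assert a byte-exact roundtrip, then fingerprint the output."""
    packed = zlib.compress(raw, 9)  # stand-in for the real compressor
    # The roundtrip is asserted before any hash is taken.
    assert zlib.decompress(packed) == raw, "roundtrip is not byte-exact"
    ratio = len(raw) / len(packed)
    # Assumption: the fingerprint covers the compressed bytes.
    prefix = hashlib.sha256(packed).hexdigest()[:16]
    return ratio, prefix


def matches_expected(raw: bytes, expected_prefix: str) -> bool:
    """True if the freshly computed prefix equals the recorded one."""
    _, prefix = compress_and_fingerprint(raw)
    return prefix == expected_prefix


# Synthetic, highly repetitive input; real fixtures would be read from disk.
raw = bytes(range(256)) * 64
ratio, prefix = compress_and_fingerprint(raw)
print(f"ratio {ratio:.4f}x, sha256[:16] {prefix}")
print("match:", matches_expected(raw, prefix))
```

A 16-hex-character prefix carries 64 bits, which is enough to catch accidental drift in a compressor's output while keeping the table readable.
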

| Resource | Details |
|---|---|
| Manuscript | Under peer review at IJACSA; full text not posted during review. |
| Reproduction artifact | knowva.ai/llm-compression-limits |
| Code (Apache-2.0) | github.com/NimoRotem/llm-compression-limits |
| Results table | results.jsonl.zst, 7,960 rows / 11,942 verified evals |
| Trained profile | bf16.zl (617 B, OpenZL le-u16) |
| Fixtures | TinyLlama bf16 · Llama-3.2-3B Q4_K · Qwen2.5-0.5B GGUF |
| Smoke checksums | expected-checksums.json |
| Cross-layer pairs | cross-layer-pairs.csv, 250 rows |
| Reproduction notes | NOTES.md |
@misc{rotem_llm_compression_limits_2026,
title = {Practical Limits of Lossless Compression for bf16 Transformer LLM Weights},
author = {Rotem, Nimo and Rotem, Ariel},
year = {2026},
url = {https://knowva.ai/llm-compression-limits/},
note = {Reproduction artifact at github.com/NimoRotem/llm-compression-limits, tag v1.0.0; Software Heritage swh:1:rel:08d597be136278838e8cc2fef2a68303d990208d.}
}
License: Apache-2.0 (code) and CC-BY-4.0 (data, profiles, figures). Smoke test passes 3/3.