Practical Limits of Lossless Compression for bf16 Transformer LLM Weights
With companion measurements on Q4_K tensors in GGUF Q4_K_M files

Nimo Rotem (1)  ·  Ariel Rotem (2)

(1) AlphaBell Inc., Nimo@AlphaBell.com  ·  (2) Independent contributor

Manuscript under peer review at IJACSA. The full text is not posted publicly during the review period; this page indexes the open reproduction artifact (code, data, results, profiles, and figures) only.

In brief, the study characterizes how far bf16 transformer weights and Q4_K-typed GGUF tensors can be compressed without loss, under one benchmark protocol with byte-exact roundtrip checks and a model-level train/test split. Measured ceilings: about 1.495× for bf16 and about 1.076× for the Q4_K tensor stream; the best whole-GGUF ratios reach 1.041–1.045×. Adjacent same-role layers show no usable linear redundancy at bf16 precision. Full numbers, methods, and caveats will appear in the manuscript once it is published.
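As an illustration of what a byte-marginal bound can look like, the sketch below splits a little-endian bf16 buffer into its low and high bytes, measures the empirical Shannon entropy of each, and reports 16 / (H(low) + H(high)) as a ratio bound. This is an assumed reading of "byte-marginal"; the manuscript's Prop. 1 may define the quantity differently, and the function names here are hypothetical rather than taken from the artifact.

```python
import math
import struct
from collections import Counter


def shannon_entropy_bits(data: bytes) -> float:
    """Empirical Shannon entropy of a byte sequence, in bits per byte."""
    if not data:
        return 0.0
    n = len(data)
    counts = Counter(data)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def bf16_byte_marginal_ratio(raw: bytes) -> float:
    """Ratio bound 16 / (H(low byte) + H(high byte)) for little-endian bf16.

    Assumption: each byte position is coded independently with an ideal
    order-0 entropy coder. This is an illustration, not the paper's Prop. 1.
    """
    if len(raw) % 2:
        raise ValueError("bf16 buffer must have an even number of bytes")
    low, high = raw[0::2], raw[1::2]  # little-endian: low byte comes first
    bits = shannon_entropy_bits(low) + shannon_entropy_bits(high)
    return math.inf if bits == 0 else 16.0 / bits


def float_to_bf16_bytes(x: float) -> bytes:
    """Truncate a float32 to bf16 (keep the top 16 bits), little-endian."""
    (u32,) = struct.unpack("<I", struct.pack("<f", x))
    return struct.pack("<H", u32 >> 16)
```

On real weights the high byte (sign and most of the exponent) tends to be far more predictable than the low byte, which is why a split of this kind is a natural baseline; the exact figures depend on the tensor.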

Headline Results

Domain                                 | Best lossless ratio | Atlas ceiling          | Sample
bf16 weights (Prop. 1 byte-marginal)   | 1.488–1.499×        | 1.495×                 | 290 test tensors, ≤7B
Q4_K tensor stream                     | 1.052×              | 1.076×                 | 530 Q4_K-typed test tensors
GGUF Q4_K_M whole file                 | 1.041–1.045×        | ≈1.05×                 | 3 held-out GGUF files (5-model corpus, 3 in test set)
Adjacent-layer linear residuals (bf16) | net loss            | median Pearson +0.0004 | 250 layer pairs, 2 Qwen2.5 models

Headline numbers are byte-weighted corpus ratios. Per-tensor distributions and 95% bootstrap CIs are reported in §5–§7 of the paper.
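To make the distinction concrete, the sketch below contrasts a byte-weighted corpus ratio (total original bytes over total compressed bytes) with a plain mean of per-tensor ratios. That reading of "byte-weighted" is an assumption, and the tensor sizes are made-up illustrative values, not data from the study.

```python
def byte_weighted_ratio(sizes):
    """sizes: list of (original_bytes, compressed_bytes), one pair per tensor."""
    total_original = sum(orig for orig, _ in sizes)
    total_compressed = sum(comp for _, comp in sizes)
    return total_original / total_compressed


def mean_per_tensor_ratio(sizes):
    """Unweighted mean of the individual per-tensor ratios."""
    return sum(orig / comp for orig, comp in sizes) / len(sizes)


# Hypothetical example: one small, highly compressible tensor
# and one large tensor that barely compresses.
example = [(1_000, 500), (9_000, 8_000)]
print(f"{byte_weighted_ratio(example):.4f}")    # 1.1765
print(f"{mean_per_tensor_ratio(example):.4f}")  # 1.5625
```

The byte-weighted figure tracks the bytes actually saved on disk, so a few small tensors with unusually high ratios cannot inflate it.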

Figures

Figure 1. bf16 byte-marginal entropy decomposition: per-tensor stacked bar of bf16 byte-marginal entropy. (PDF)
Figure 2. Per-tensor ratio vs R_marginal scatter: per-tensor best-method ratio vs R_marginal. (PDF)
Figure 3. Q4_K nibble entropy histogram: per-tensor H(nibble) across 530 held-out Q4_K test tensors. (PDF)
Figure 4. Adjacent-layer Pearson correlation histogram: Pearson correlation across 250 (model, role, K) pairs. (PDF)
Figure 5. Throughput / ratio Pareto frontier: decompress MB/s (log scale) vs byte-weighted geomean ratio across all benchmark methods. (PDF)
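For readers who want to relate the Pearson values in Figure 4 to linear prediction, the sketch below computes a Pearson correlation in plain Python and the fraction of variance left after the best least-squares linear fit, which is 1 - r². The function names are hypothetical; this is not the artifact's analysis code.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def residual_variance_fraction(r):
    """Share of variance remaining after the best linear fit y ~ a*x + b."""
    return 1.0 - r * r


# With a correlation of +0.0004, a linear predictor removes a
# fraction r^2 = 1.6e-7 of the variance, i.e. essentially nothing.
print(residual_variance_fraction(0.0004))
```

At a correlation this close to zero, the residual carries practically the same variance as the original layer, so predicting one layer linearly from its neighbor leaves nothing for an entropy coder to exploit.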

Reproducibility

The smoke test runs three lossless compressors against bundled fixtures and verifies that each sha256[:16] prefix matches the value measured at publication time. Every method asserts a byte-exact roundtrip before the hash is taken. The test is expected to pass 3/3:

git clone https://github.com/NimoRotem/llm-compression-limits.git
cd llm-compression-limits
./reproducibility_smoke_test.sh

Method                        | Fixture                                    | Ratio   | sha256[:16]
bf16_split                    | TinyLlama_layer3_q_proj.bin (8 MiB)        | 1.4820× | fc3544c35489eb94
qb_mixture_k4 + 277 B profile | Llama32_3B_layer14_gate.q4k.bin (13.5 MiB) | 1.0507× | c79972a64d11b717
decomp_perstream_zstd19_bg    | Qwen2.5-0.5B-Q4_K_M.gguf (379 MiB)         | 1.0367× | 1c6f077f8c228cf2
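The general shape of such a check can be sketched as follows. This is a minimal illustration, not the artifact's script: `zlib` stands in for the three benchmark methods, `smoke_check` is a hypothetical name, and hashing the compressed output (rather than some other byte stream) is an assumption.

```python
import hashlib
import zlib
from typing import Optional, Tuple


def smoke_check(raw: bytes, expected_prefix: Optional[str] = None) -> Tuple[float, str]:
    """Compress, assert a byte-exact roundtrip, then hash the compressed bytes.

    Returns (compression ratio, first 16 hex characters of the sha256 digest).
    """
    compressed = zlib.compress(raw, 9)  # stand-in compressor, not a benchmark method
    restored = zlib.decompress(compressed)
    if restored != raw:
        raise AssertionError("roundtrip is not byte-exact")

    # Assumption: the checksum covers the compressed output.
    prefix = hashlib.sha256(compressed).hexdigest()[:16]
    if expected_prefix is not None and prefix != expected_prefix:
        raise AssertionError(f"checksum mismatch: {prefix} != {expected_prefix}")
    return len(raw) / len(compressed), prefix
```

Calling `smoke_check(data)` once yields the prefix to record; later runs pass it back as `expected_prefix`, so any change in the compressed bytes fails the check. Asserting the roundtrip first means a matching hash is never reported for output that cannot be decoded back to the input.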

Resources

Manuscript: Under peer review at IJACSA; full text not posted during review.
Reproduction artifact: knowva.ai/llm-compression-limits
Code (Apache-2.0): github.com/NimoRotem/llm-compression-limits
Results table: results.jsonl.zst, 7,960 rows / 11,942 verified evals
Trained profile: bf16.zl (617 B, OpenZL le-u16)
Fixtures: TinyLlama bf16 · Llama-3.2-3B Q4_K · Qwen2.5-0.5B GGUF
Smoke checksums: expected-checksums.json
Cross-layer pairs: cross-layer-pairs.csv, 250 rows
Reproduction notes: NOTES.md

Citation

@misc{rotem_llm_compression_limits_2026,
  title  = {Practical Limits of Lossless Compression for bf16 Transformer LLM Weights},
  author = {Rotem, Nimo and Rotem, Ariel},
  year   = {2026},
  url    = {https://knowva.ai/llm-compression-limits/},
  note   = {Reproduction artifact at github.com/NimoRotem/llm-compression-limits, tag v1.0.0; Software Heritage swh:1:rel:08d597be136278838e8cc2fef2a68303d990208d.}
}

License: Apache-2.0 (code) and CC-BY-4.0 (data, profiles, figures). Smoke test passes 3/3.