# Format and quant references
llama.cpp                     commit 5e9f3a2 (2026-01)   github.com/ggerganov/llama.cpp
gguf-py                       0.19.0                     github.com/ggerganov/llama.cpp/tree/master/gguf-py

# Entropy coders
constriction                  0.4                        github.com/bamler-lab/constriction
zstandard (Python binding)    0.22                       github.com/indygreg/python-zstandard

# Model loading
safetensors                   0.4.5                      github.com/huggingface/safetensors

# Clustering / analysis
scikit-learn                  1.4                        scikit-learn.org

# Trained-profile codec
openzl                        0.4.1                      github.com/facebook/openzl

# Comparison rows (no reproductions vendored in v4.0.3; see DEVIATIONS.md §C)
DFloat11 (official)           —                          github.com/LeanModels/DFloat11
    Zhang et al., arXiv:2504.11651, NeurIPS 2025
    OpenReview: https://openreview.net/forum?id=xdNAVP7TGy
Cloudflare Unweight           —                          github.com/cloudflareresearch/unweight-kernels (CUDA, Hopper-only)
    Nikulin, Cloudflare Research, 2026
    Tech report: https://research.cloudflare.com/nikulin2026/
    Blog: https://blog.cloudflare.com/unweight-tensor-compression/
ZipNN                         —                          github.com/zipnn/zipnn
    Hershcovitch et al., arXiv:2411.05239

# Hardware reference
host                          c3-highcpu-88, Intel Sapphire Rapids, single-core
os                            Debian 12
python                        3.11