# REPRODUCTION NOTICE — `unweight`

This directory is reserved for a faithful re-implementation of **Cloudflare Unweight** from its published description (not the original authors' code).

## Status: NOT IMPLEMENTED in this artifact

Official kernels: **https://github.com/cloudflareresearch/unweight-kernels** (CUDA / BSD-3-Clause)

Cloudflare Unweight is described in:
- **Technical report:** *Unweight: Lossless MLP Weight Compression for LLM Inference*, Ivan Nikulin, Cloudflare Research, 2026 — https://research.cloudflare.com/nikulin2026/ (PDF: https://research.cloudflare.com/papers/unweight-2026.pdf)
- **Cloudflare blog post:** *"Unweight: how we compressed an LLM 22% without sacrificing quality"* — https://blog.cloudflare.com/unweight-tensor-compression/

Headline numbers from the tech report (Cf-TR-2026.04.v1, abstract / §1.4 / §2.2 #7):

- **~30% compression on MLP weights** (gate / up / down projections), i.e. ≈1.43× MLP-scope ratio.
- **~20% total model VRAM reduction** on Llama-3.1-8B, i.e. ≈1.25× whole-model ratio, since only MLP weights are compressed and non-MLP layers are stored verbatim.

The gap between the MLP-scope and whole-model numbers is a deliberate design choice in the tech report (§2.2, "Selective MLP-only compression").
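The two headline figures can be cross-checked with a few lines of arithmetic. This is a sketch for sanity-checking only: the 30% and 20% figures come from the tech report as quoted above, while the implied MLP parameter share is *derived* here, not stated in the report.

```python
# Sanity-check the headline numbers quoted from the tech report.
mlp_compression = 0.30        # fraction of MLP weight bytes saved (report)
whole_model_reduction = 0.20  # fraction of total model bytes saved (report)

# MLP-scope ratio: compressed MLP weights occupy 70% of their original size.
mlp_ratio = 1 / (1 - mlp_compression)          # ~1.43x

# Whole-model ratio: only MLP weights shrink; everything else is verbatim.
whole_ratio = 1 / (1 - whole_model_reduction)  # 1.25x

# Derived (not stated in the report): the fraction of model bytes that must
# live in MLP projections for both figures to hold simultaneously, from
#   whole_model_reduction = mlp_fraction * mlp_compression
mlp_fraction = whole_model_reduction / mlp_compression  # ~0.667

print(f"MLP-scope ratio:   {mlp_ratio:.2f}x")
print(f"Whole-model ratio: {whole_ratio:.2f}x")
print(f"Implied MLP share: {mlp_fraction:.1%}")
```

The derived ~2/3 MLP share is plausible for Llama-3.1-8B, where the gate/up/down projections dominate the parameter count, so the two quoted figures are mutually consistent.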

Key constraint for paper comparison: Unweight is Hopper-only (H100/H200; ThunderKittens LCF kernel + WGMMA). The official kernels require CUDA Toolkit 12.4+ and a Hopper GPU, so they cannot run inside the c3-highcpu-88 CPU-only reproduction container; comparing against the official implementation requires a separate GPU-equipped reproduction VM. See `../../DEVIATIONS.md §C`.
