# REPRODUCTION NOTICE — `unweight`

This directory is reserved for a faithful re-implementation of **Cloudflare Unweight** from its published description (not the original authors' code).

## Status: NOT IMPLEMENTED in this artifact

Official kernels: **https://github.com/cloudflareresearch/unweight-kernels** (CUDA / BSD-3-Clause)

Cloudflare Unweight is described in:
- **Technical report:** *Unweight: Lossless MLP Weight Compression for LLM Inference*, Ivan Nikulin, Cloudflare Research, 2026 — https://research.cloudflare.com/nikulin2026/ (PDF: https://research.cloudflare.com/papers/unweight-2026.pdf)
- **Cloudflare blog post:** *"Unweight: how we compressed an LLM 22% without sacrificing quality"* — https://blog.cloudflare.com/unweight-tensor-compression/

Headline numbers from the tech report: ~13% MLP-only footprint reduction (gate/up only) or ~22% MLP-projection-wide reduction on Llama 3.1 8B, with 30–40% throughput overhead on H100 SXM5 at v0.

Key constraint for paper comparison: Unweight is Hopper-only (H100/H200, ThunderKittens LCF kernel + WGMMA). The official kernels require CUDA Toolkit 12.4+ and a Hopper GPU, so they cannot run inside the CPU-only c3-highcpu-88 reproduction container. Comparing against the official implementation therefore requires a separate GPU-equipped reproduction VM. See `../../DEVIATIONS.md §C`.
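Because this constraint is purely environmental, a small preflight script can report whether a given machine could host the official kernels before a comparison run is attempted. The sketch below is hypothetical and is not part of `unweight-kernels`. It assumes that Hopper can be detected as compute capability 9.x through `nvidia-smi --query-gpu=compute_cap`, which needs a reasonably recent NVIDIA driver, and that the CUDA Toolkit version can be read from `nvcc --version`. The function names and `MIN_CUDA` are illustrative.

```python
"""Hypothetical preflight check: could this host run the official Unweight kernels?

Assumptions (not taken from the Unweight docs): Hopper is detected as compute
capability 9.x via `nvidia-smi --query-gpu=compute_cap`, and the CUDA Toolkit
version is parsed from `nvcc --version`.
"""
import re
import shutil
import subprocess

MIN_CUDA = (12, 4)  # CUDA Toolkit 12.4+, as required by the official kernels


def parse_nvcc_release(text):
    """Return (major, minor) from `nvcc --version` output, or None if absent."""
    m = re.search(r"release (\d+)\.(\d+)", text)
    return (int(m.group(1)), int(m.group(2))) if m else None


def has_hopper(compute_caps):
    """True if any reported compute capability is 9.x (H100/H200 report 9.0)."""
    return any(c.strip().split(".")[0] == "9" for c in compute_caps if c.strip())


def can_run_official_kernels(nvcc_text, compute_caps):
    """Both requirements must hold: CUDA >= MIN_CUDA and at least one Hopper GPU."""
    release = parse_nvcc_release(nvcc_text or "")
    return release is not None and release >= MIN_CUDA and has_hopper(compute_caps)


def _run(cmd):
    """Run a command if its binary exists; return stdout, or '' on any failure."""
    if shutil.which(cmd[0]) is None:
        return ""
    try:
        return subprocess.run(cmd, capture_output=True, text=True, timeout=30).stdout
    except (OSError, subprocess.SubprocessError):
        return ""


if __name__ == "__main__":
    nvcc_out = _run(["nvcc", "--version"])
    caps = _run(
        ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"]
    ).splitlines()
    print("official Unweight kernels runnable here:",
          can_run_official_kernels(nvcc_out, caps))
```

Inside the CPU-only container neither `nvcc` nor `nvidia-smi` is expected to be on `PATH`, so the script should report `False`. The parsing logic is kept separate from the subprocess calls so it can be checked without a GPU.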