{
  "schema_version": "v1.0",
  "notes": [
    "This manifest reflects the actual corpus used in the benchmark runs that produced the results in /provenance/ and /results/.",
    "prefix_match compares the first 8 chars of revision_full against revision_prefix_spec (i.e. revision_full[:8] == revision_prefix_spec). true => the 8-char spec prefix exactly matches the resolved SHA's leading 8 chars; false => they differ (HF main moved, spec typo, or the spec recorded a stale prefix). The Appendix A.2 / paper.md / body_fixed.tex SHA references all derive from revision_full, never from revision_prefix_spec. See NOTES.md Sec. A for additional discussion of the prefix mismatches.",
    "Five of the seven bf16 models (gpt2, opt-125m, Qwen2.5-0.5B, TinyLlama-1.1B, Qwen2.5-7B-Instruct) have spec prefixes that are off by 1-2 chars vs the resolved SHA; distilbert and MiniLM-L6-v2 have spec prefixes that are wholly unrelated to the resolved SHA (likely a spec-authoring slip). For all five bartowski GGUF rows, HF main had moved entirely between spec authorship and the benchmark date, so their spec prefixes bear no relation to the resolved SHAs.",
    "The cross-layer (Sec. 8) corpus uses SmolLM2-1.7B as a test-model substitute for Llama-3.2-3B (which is HF-gated). See provenance/ for per-stage BUILD_LOG and FINAL_VERDICT files.",
    "The cross-layer (Sec. 8) test phase was scoped out by the L_RED pre-check verdict (median |corr| = 0.0004 vs the 0.5 L_GREEN threshold); the 250 measured pairs all come from the cross_layer_train models (Qwen2.5-0.5B and Qwen2.5-1.5B, both Qwen2.5 family). The cross_layer_test_substitute entry (SmolLM2-1.7B) was prepared as a Llama-3.2-3B substitute but never measured; the Qwen2.5-7B-Instruct cross_layer_test attribution from an earlier draft of this manifest was incorrect for the same reason."
  ],
  "models": [
    {
      "repo_id": "gpt2",
      "revision_prefix_spec": "607a30d4",
      "revision_full": "607a30d783dfa663caf39e06633721c8d4cfcd7e",
      "prefix_match": false,
      "prefix_match_note": "spec 607a30d4 vs resolved 607a30d7 — differ in 8th char (HF main moved between spec authorship and benchmark date)",
      "formats": [
        "bf16",
        "fp16"
      ],
      "tensor_counts": {
        "bf16": 81,
        "fp16": 81
      },
      "split": "train",
      "total_bytes": 498000000,
      "used_in": [
        "bf16_main",
        "bf16_supplementary"
      ]
    },
    {
      "repo_id": "distilbert-base-uncased",
      "revision_prefix_spec": "1c4513b2",
      "revision_full": "12040accade4e8a0f71eabdb258fecc2e7e948be",
      "prefix_match": false,
      "prefix_match_note": "spec 1c4513b2 wholly unrelated to resolved 12040acc (spec-authoring slip)",
      "formats": [
        "bf16",
        "fp16"
      ],
      "tensor_counts": {
        "bf16": 73,
        "fp16": 73
      },
      "split": "train",
      "total_bytes": 268000000,
      "used_in": [
        "bf16_main",
        "bf16_supplementary"
      ]
    },
    {
      "repo_id": "facebook/opt-125m",
      "revision_prefix_spec": "27dcfa7c",
      "revision_full": "27dcfa74d334bc871f3234de431e71c6eeba5dd6",
      "prefix_match": false,
      "prefix_match_note": "spec 27dcfa7c vs resolved 27dcfa74 — differ in 8th char",
      "formats": [
        "bf16",
        "fp16"
      ],
      "tensor_counts": {
        "bf16": 75,
        "fp16": 75
      },
      "split": "train",
      "total_bytes": 500000000,
      "used_in": [
        "bf16_main",
        "bf16_supplementary"
      ]
    },
    {
      "repo_id": "sentence-transformers/all-MiniLM-L6-v2",
      "revision_prefix_spec": "e4ce9879",
      "revision_full": "c9745ed1d9f207416be6d2e6f8de32d1f16199bf",
      "prefix_match": false,
      "prefix_match_note": "spec e4ce9879 wholly unrelated to resolved c9745ed1 (spec-authoring slip)",
      "formats": [
        "bf16",
        "fp16"
      ],
      "tensor_counts": {
        "bf16": 13,
        "fp16": 13
      },
      "split": "test",
      "total_bytes": 90000000,
      "used_in": [
        "bf16_main",
        "bf16_supplementary"
      ]
    },
    {
      "repo_id": "Qwen/Qwen2.5-0.5B",
      "revision_prefix_spec": "060db6e4",
      "revision_full": "060db6499f32faf8b98477b0a26969ef7d8b9987",
      "prefix_match": false,
      "prefix_match_note": "spec 060db6e4 vs resolved 060db649 — differ in last 2 chars",
      "formats": [
        "bf16",
        "fp16"
      ],
      "tensor_counts": {
        "bf16": 121,
        "fp16": 121
      },
      "split": "test",
      "total_bytes": 988000000,
      "used_in": [
        "bf16_main",
        "bf16_supplementary",
        "cross_layer_train"
      ]
    },
    {
      "repo_id": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
      "revision_prefix_spec": "fe8a4eaa",
      "revision_full": "fe8a4ea1ffedaf415f4da2f062534de366a451e6",
      "prefix_match": false,
      "prefix_match_note": "spec fe8a4eaa vs resolved fe8a4ea1 — differ in 8th char",
      "formats": [
        "bf16",
        "fp16"
      ],
      "tensor_counts": {
        "bf16": 156,
        "fp16": 156
      },
      "split": "test",
      "total_bytes": 2200000000,
      "used_in": [
        "bf16_main",
        "bf16_supplementary"
      ]
    },
    {
      "repo_id": "Qwen/Qwen2.5-7B-Instruct",
      "revision_prefix_spec": "a09a3543",
      "revision_full": "a09a35458c702b33eeacc393d103063234e8bc28",
      "prefix_match": false,
      "prefix_match_note": "spec a09a3543 vs resolved a09a3545 — differ in 8th char",
      "formats": [
        "bf16"
      ],
      "tensor_counts": {
        "bf16": 196
      },
      "split": "7B_validation",
      "total_bytes": 12400000000,
      "used_in": [
        "bf16_7B_validation"
      ],
      "cross_layer_note": "Planned for the cross-layer test set; the cross-layer test phase was scoped out by the L_RED pre-check verdict (see provenance/iter5/FINAL_VERDICT.md), so no cross-layer measurements were taken on this model."
    },
    {
      "repo_id": "Qwen/Qwen2.5-1.5B",
      "revision_full": "<resolved at run time; see notes>",
      "formats": [
        "bf16"
      ],
      "split": "cross_layer_train",
      "used_in": [
        "cross_layer_train"
      ]
    },
    {
      "repo_id": "HuggingFaceTB/SmolLM2-1.7B",
      "revision_full": "<resolved at run time; substituted for Llama-3.2-3B which is gated>",
      "formats": [
        "bf16"
      ],
      "split": "cross_layer_test_substitute",
      "used_in": [],
      "cross_layer_note": "Downloaded as a test-set substitute for the HF-gated Llama-3.2-3B, in case the cross-layer benchmark needed a Llama-family stand-in; the cross-layer test phase was scoped out by the L_RED pre-check verdict (see provenance/iter5/FINAL_VERDICT.md), so no measurements were taken on this model. Kept in the manifest as an audit trail of the prepared corpus."
    },
    {
      "repo_id": "bartowski/Qwen2.5-0.5B-Instruct-GGUF",
      "quantization": "Q4_K_M",
      "revision_prefix_spec": "a8b21f63",
      "revision_full": "41ba88dbac95fed2528c92514c131d73eb5a174b",
      "prefix_match": false,
      "prefix_match_note": "spec a8b21f63 wholly unrelated to resolved 41ba88db (HF main moved between spec authorship and benchmark date)",
      "formats": [
        "Q4_K"
      ],
      "tensor_counts": {
        "Q4_K": 168,
        "Q6_K": 24
      },
      "split": "train",
      "total_bytes": 397808192,
      "used_in": [
        "Q4_K_train",
        "GGUF_artifact_train"
      ]
    },
    {
      "repo_id": "bartowski/Qwen2.5-1.5B-Instruct-GGUF",
      "quantization": "Q4_K_M",
      "revision_prefix_spec": "b4d309e1",
      "revision_full": "9eadc66189c7641e1ddd226b8267a9119b2ce2d4",
      "prefix_match": false,
      "prefix_match_note": "spec b4d309e1 wholly unrelated to resolved 9eadc661 (HF main moved between spec authorship and benchmark date)",
      "formats": [
        "Q4_K"
      ],
      "tensor_counts": {
        "Q4_K": 345,
        "Q6_K": 56
      },
      "split": "train",
      "total_bytes": 986000000,
      "used_in": [
        "Q4_K_train",
        "GGUF_artifact_train"
      ]
    },
    {
      "repo_id": "bartowski/Qwen2.5-7B-Instruct-GGUF",
      "quantization": "Q4_K_M",
      "revision_prefix_spec": "c7e5a82d",
      "revision_full": "8911e8a47f92bac19d6f5c64a2e2095bd2f7d031",
      "prefix_match": false,
      "prefix_match_note": "spec c7e5a82d wholly unrelated to resolved 8911e8a4 (HF main moved between spec authorship and benchmark date)",
      "formats": [
        "Q4_K"
      ],
      "tensor_counts": {
        "Q4_K": 196,
        "Q6_K": 28
      },
      "split": "test",
      "total_bytes": 4680000000,
      "used_in": [
        "Q4_K_test",
        "GGUF_artifact_test"
      ]
    },
    {
      "repo_id": "bartowski/Llama-3.2-3B-Instruct-GGUF",
      "quantization": "Q4_K_M",
      "revision_prefix_spec": "0cb88a4f",
      "revision_full": "5ab33fa94d1d04e903623ae72c95d1696f09f9e8",
      "prefix_match": false,
      "prefix_match_note": "spec 0cb88a4f wholly unrelated to resolved 5ab33fa9 (HF main moved between spec authorship and benchmark date)",
      "formats": [
        "Q4_K"
      ],
      "tensor_counts": {
        "Q4_K": 168,
        "Q6_K": 28
      },
      "split": "test",
      "total_bytes": 2020000000,
      "used_in": [
        "Q4_K_test",
        "GGUF_artifact_test"
      ]
    },
    {
      "repo_id": "bartowski/Mistral-7B-Instruct-v0.3-GGUF",
      "quantization": "Q4_K_M",
      "revision_prefix_spec": "e0bc86c7",
      "revision_full": "61fd4167fff3ab01ee1cfe0da183fa27a944db48",
      "prefix_match": false,
      "prefix_match_note": "spec e0bc86c7 wholly unrelated to resolved 61fd4167 (HF main moved between spec authorship and benchmark date)",
      "formats": [
        "Q4_K"
      ],
      "tensor_counts": {
        "Q4_K": 166,
        "Q6_K": 32
      },
      "split": "test",
      "total_bytes": 4370000000,
      "used_in": [
        "Q4_K_test",
        "GGUF_artifact_test"
      ]
    }
  ]
}
