Scrutica
Default-parameters derivation (Hardware path)
The table below derives the daily FLOP budget for 10,000 H100 SXM5 GPUs at the documented defaults (interconnect efficiency 0.85, MFU 0.40). The interactive calculator exposes every parameter as an adjustable slider.
| Step | Formula | Result |
|---|---|---|
| Peak FP16 | gpu_count × per_gpu_fp16_tflops (989.5 TFLOP/s dense for H100 SXM5) | 9.89 × 10¹⁸ FLOP/s |
| Effective FP16 | peak × interconnect_efficiency (0.85) | 8.41 × 10¹⁸ FLOP/s |
| Training Throughput | effective × mfu_assumption (0.4) | 3.36 × 10¹⁸ FLOP/s |
| Daily FLOP Budget | training × 86,400 s/day | 2.91 × 10²³ FLOP/day |
At this daily budget, the EU AI Act 10²⁵ FLOP threshold is reached in approximately 34.4 days. Adjust GPU count, model, interconnect efficiency, and MFU in the interactive calculator below.
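The derivation table reduces to a chain of multiplications. The following is a minimal Python sketch, assuming the dense 989.5 TFLOP/s H100 SXM5 spec and reusing the parameter names from the Formula column. The function names (`hardware_path_daily_flop`, `days_to_threshold`) are illustrative and not Scrutica's implementation. The final loop shows how the threshold date moves across the documented 0.20–0.50 MFU range.

```python
# Hardware-path daily FLOP budget: a sketch of the derivation table above.
# Assumption: dense H100 SXM5 FP16/BF16 peak of 989.5 TFLOP/s (no 2:4 sparsity).
# Function names are illustrative, not Scrutica's implementation.

SECONDS_PER_DAY = 86_400
EU_AI_ACT_THRESHOLD_FLOP = 1e25


def hardware_path_daily_flop(
    gpu_count: int,
    per_gpu_fp16_tflops: float = 989.5,
    interconnect_efficiency: float = 0.85,
    mfu_assumption: float = 0.40,
) -> float:
    """Sustained training FLOP per day, assuming 24 h operation."""
    peak = gpu_count * per_gpu_fp16_tflops * 1e12  # FLOP/s, all GPUs at spec
    effective = peak * interconnect_efficiency     # FLOP/s after interconnect losses
    training = effective * mfu_assumption          # FLOP/s delivered to the model
    return training * SECONDS_PER_DAY              # FLOP/day


def days_to_threshold(daily_flop: float,
                      threshold: float = EU_AI_ACT_THRESHOLD_FLOP) -> float:
    """Days of sustained operation needed to accumulate `threshold` FLOP."""
    return threshold / daily_flop


daily = hardware_path_daily_flop(10_000)
print(f"{daily:.3g} FLOP/day")                 # 2.91e+23 FLOP/day
print(f"{days_to_threshold(daily):.1f} days")  # 34.4 days

# Sensitivity: sweep MFU across its documented 0.20-0.50 range.
for mfu in (0.20, 0.50):
    d = days_to_threshold(hardware_path_daily_flop(10_000, mfu_assumption=mfu))
    print(f"MFU {mfu:.2f}: {d:.1f} days")      # 68.8 days, then 27.5 days
```

Because every factor is multiplicative, halving MFU doubles the time to threshold. For this reason, the calculator reports parameter-sensitivity bounds rather than a single figure.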
The calculator runs three independent estimation paths against any facility. The table below applies the same paths to 10 publicly known frontier-class training runs (≥10²⁵ FLOP) for which the lab disclosed a hardware count or Epoch AI derived a central FLOP estimate. Where both exist, both appear in the same row. The table reports the column-level disagreement, which the calculator can audit, rather than a synthesized best-of figure.
| Model | Hardware (lab disclosure) | Epoch FLOP | Epoch confidence | Source provenance |
|---|---|---|---|---|
| Grok 4 · xAI · Jul 2025 | 200,000 GPUs (class undisclosed), via Epoch derivation | 5.0e+26 | Speculative | T2 |
| Grok 3 · xAI · Feb 2025 | NVIDIA H100 SXM5 80GB × 80,000, via Epoch derivation | 3.5e+26 | Likely | T2 |
| Llama 4 Behemoth (preview) · Meta Platforms, Inc. · Apr 2025 | 32,000 GPUs (class undisclosed) | 5.2e+25 | Likely | T2 |
| Gemini 1.0 Ultra · Google DeepMind · Dec 2023 | Google TPUv4 | 5.0e+25 | Speculative | T2 |
| Llama 3.1-405B · Meta Platforms, Inc. · Jul 2024 | NVIDIA H100 80GB × 16,384 | 3.8e+25 | Confident | T2 |
| GPT-4 (Mar 2023) · OpenAI · Mar 2023 | NVIDIA A100 SXM4 40 GB × 25,000, via Epoch derivation | 2.1e+25 | Likely | T2 |
| GPT-4 (Jun 2023) · OpenAI · Mar 2023 | NVIDIA A100 SXM4 40 GB × 25,000, via Epoch derivation | 2.1e+25 | Likely | T2 |
| Nemotron-4 340B · NVIDIA · Jun 2024 | NVIDIA H100 80GB SXM5 × 6,144 | 1.8e+25 | Confident | T2 |
| Pangu Ultra · Huawei Technologies · Apr 2025 | Huawei Ascend NPUs × 8,192 | 1.1e+25 | Confident | T2 |
| Inflection-2 · Inflection AI · Nov 2023 | NVIDIA H100 × 5,000 | 1.0e+25 | Confident | T2 |
Hardware path: applies when both the number and the model of GPUs in a facility are known, and produces the narrowest estimation bounds and the most precise estimate.
Bounds vary interconnect efficiency and MFU across their documented ranges. They are parameter-sensitivity ranges derived from input bounds, not distributional estimates.
MFU 0.40 is SemiAnalysis’s reported FP16 figure for trillion-parameter H100 training runs (Patel et al., “100,000 H100 Clusters”, 2024-06-17), corroborated by LLaMA 3 405B at 0.384 and PaLM 540B at 0.462. PUE 1.15 is grounded in operator fleet disclosures (Microsoft FY25 1.16, AWS 2024 1.15, xAI Memphis 1.18, Stargate Abilene 1.12, Google fleet 1.09, Meta FY23 1.08), not in SemiAnalysis, which does not publish a PUE for the 100K H100 reference. GPU share of IT load 0.49 is a Scrutica-editorial value between two reference points: the in-server share of ~0.55 implied by SemiAnalysis’s 700 W / 1,275 W per-GPU server decomposition, and the cluster-level share of ~0.47 once external networking and storage are added. The cluster-level decomposition itself is Scrutica-derived, not published by SemiAnalysis. Dense FP16 specs are used throughout (no 2:4 sparsity). Interconnect efficiency 0.85 reflects NVLink topology (Epoch AI, Sevilla et al., 2022). The calibration substrate is a US hyperscaler-class deployment.
Export-restricted variants ship with reduced NVLink aggregate bandwidth: H800 runs at 400 GB/s vs H100’s 900 GB/s, and A800 at 400 GB/s vs A100’s 600 GB/s (NVIDIA / Lenovo product datasheets, corroborated against export-control analyses). Tensor Core BF16 throughput on the SXM SKUs is architecturally unchanged. NVIDIA never published a public H800 SXM5 datasheet, but the SM count and Tensor Core pathway match H100, and the H800 PCIe variant matches H100 PCIe at 1,513 TFLOPS BF16 (with sparsity) per the Lenovo product guide. The narrowed bandwidth bites at the cluster scale where the defaults were measured: tensor-parallel and pipeline-parallel collectives stall on inter-GPU bandwidth, and achievable MFU drops. For facilities built on these variants (for example, H800 deployments in China), move the MFU and interconnect-efficiency settings toward the lower end of the documented range rather than accepting the US-calibrated defaults. The calculator does not apply this adjustment automatically because the exact bandwidth penalty is workload-dependent.
GPU TFLOP/s values use dense (no structured sparsity) FP16/BF16 Tensor Core specs: A100 312, H100/H200 989.5, MI300X 1,307.4, and B200 2,250 TFLOP/s. NVIDIA’s 2:4 structured sparsity doubles these rates (H100 = 1,979; B200 = 4,500), and AMD likewise publishes a sparsity-inclusive 2,614.9 for MI300X. Most production training does not enable structured sparsity, so the dense baseline is the right reference, and the MFU parameter is calibrated against it. This is the convention Epoch AI’s ml_hardware dataset uses. Cloud-provider marketing typically quotes the sparsity-inclusive figure, roughly twice as high, for the same hardware.
The Compute Cost Index uses the same dense BF16 throughput (989.5 TFLOP/s per H100), so its $/petaFLOP-day values compare directly to the MFU-normalized throughput shown here. A reader who derives $/SCU from cloud $/hour against sparsity-inclusive vendor specs will get values ~2× lower than the index. The discrepancy comes from the dense-vs-sparsity convention, not from a pricing divergence between the two surfaces.
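The following sketch makes the ~2× dense-vs-sparsity gap concrete by pricing one H100-day under both conventions. It assumes a hypothetical rental price of $2.00 per GPU-hour (illustrative, not a quoted rate) and computes cost against peak throughput with no MFU adjustment. `usd_per_petaflop_day` is a hypothetical helper, not part of the Compute Cost Index.

```python
# Dense vs sparsity-inclusive specs: same hardware, same price, ~2x different
# $/petaFLOP-day. Hypothetical inputs: $2.00/GPU-hour; peak (not MFU-adjusted)
# throughput. usd_per_petaflop_day is an illustrative helper only.

SECONDS_PER_DAY = 86_400
PETAFLOP_DAY = 1e15 * SECONDS_PER_DAY  # FLOP in one petaFLOP/s sustained for a day

H100_DENSE_TFLOPS = 989.5   # dense BF16: the convention used on this page
H100_SPARSE_TFLOPS = 1_979  # 2:4 structured-sparsity figure


def usd_per_petaflop_day(usd_per_gpu_hour: float, per_gpu_tflops: float) -> float:
    """Dollars per petaFLOP-day of peak compute for one GPU rented for a day."""
    usd_per_gpu_day = usd_per_gpu_hour * 24
    flop_per_gpu_day = per_gpu_tflops * 1e12 * SECONDS_PER_DAY
    return usd_per_gpu_day / (flop_per_gpu_day / PETAFLOP_DAY)


dense = usd_per_petaflop_day(2.00, H100_DENSE_TFLOPS)
sparse = usd_per_petaflop_day(2.00, H100_SPARSE_TFLOPS)
print(f"dense:  ${dense:.2f} per petaFLOP-day")   # $48.51
print(f"sparse: ${sparse:.2f} per petaFLOP-day")  # $24.25
print(f"ratio:  {dense / sparse:.2f}x")           # 2.00x
```

The price input cancels out of the ratio. The 2× gap therefore comes entirely from the denominator convention, which is why mixing conventions across surfaces silently halves apparent cost.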
For each facility with both a disclosed chip count and disclosed nameplate power, the hardware-path and power-path FLOP estimators run independently. The ranking surfaces the two ends of the agreement distribution. Tight rows are a credibility signal: independent inputs reconcile under the methodology defaults. Loose rows are substrate-gap suspects, where the GPU model is unknown but power is disclosed, or vice versa. The cost path is excluded at the facility level; the methodology panel below explains why.
| Facility | Country | Hardware-path vs power-path divergence (normalized) | Inputs |
|---|---|---|---|
| Sakura's B200s Phase 2 | JP | 1.00× (0%) | 10,000 GPUs; 16 MW |
| xAI Fulton Georgia | US | 1.02× (2%) | 12,448 GPUs; 20 MW |
| SK Group AWS Ulsan Phase 2 | KR | 1.04× (4%) | 60,000 GPUs; 103 MW |
| Ubilink.AI Supercomputer | TW | 1.06× (6%) | 1,024 GPUs; 2 MW |
| Lawrence Livermore NL RZAdams | US | 1.06× (6%) | 512 GPUs; 1 MW |
| Nebius Kansas City Phase 2 | US | 1.07× (7%) | 26,000 GPUs; 40 MW |
| xAI Colossus Memphis Phase 3 | US | 1.07× (7%) | 230,000 GPUs; 352 MW |
| TensorWave MI300X Cluster 1 Phase 1 | US | 1.07× (7%) | 1,000 GPUs; 2 MW |
| Oracle OCI MI300X | US | 1.07× (7%) | 16,384 GPUs; 25 MW |
| Vultr Chicago Cluster | US | 1.07× (7%) | 3,000 GPUs; 5 MW |
The hardware path defaults to h100_sxm5 for any facility whose specific GPU model is not disclosed at the row level (the substrate carries gpu_count without a per-row hardware_type for most facilities). Both paths use the same H100 baseline, so a divergence here reflects (gpu_count, power_capacity_mw) reconciliation, not hardware-mix uncertainty.
Epoch's training-FLOP formula is a single-shot hardware path (chip-time × peak FLOP/s × utilization rate). Scrutica runs three back-solves independently (hardware, power-thermal envelope, and capex) and surfaces the spread between them as the credibility signal, with every parameter adjustable inline. The comparison below shows where Scrutica adds path multiplicity and per-architecture calibration that Epoch's published methodology does not surface.
| Dimension | Scrutica | Epoch AI | What Scrutica adds |
|---|---|---|---|
| Estimation paths | Three independent back-solves run in parallel: hardware (GPU count × peak TFLOP/s × interconnect × MFU × hours), power-thermal (facility MW / PUE × GPU share of IT load / GPU TDP), and capex (facility USD × GPU fraction / per-GPU price). Reconciliation across the three is the credibility signal; the spread names the parameter doing the work. | Single hardware-path formula: chip-time × peak FLOP/s × utilization rate. | Multi-path reconciliation: the spread is the credibility signal Scrutica surfaces. Epoch does not run multi-path cross-validation on the FLOP figure. |
| Utilization / MFU | Tunable slider; default 0.40 (range 0.20–0.50). Default calibrated to the SemiAnalysis H100 reference, LLaMA 3 405B (Meta technical report), and PaLM 540B (Chowdhery et al., Google). | 30–50% range. Quote: “Typical utilization rates for large distributed training runs are around 30-50%.” When unreported, estimated by reference to comparable models from similar time periods (not a fixed default). Source accessed 2026-05-20. | Scrutica exposes MFU as a per-facility slider with an explicit calibration table; Epoch uses comparable-model inference without per-architecture, per-facility tuning. |
| Dense vs sparse computation | BF16 dense throughout; 2:4 sparsity explicitly NOT applied, per the Cost Index methodology (constraint surfaced in the /cost-index methodology section). | Dense default; sparse architectures (Mixture-of-Experts) handled via an active-experts heuristic. Quote: "This can be modified by sparsity such as Mixture-of-Experts: in this case, the heuristic should use the number of connections in the number of active experts." Source accessed 2026-05-20. | Methodological agreement on dense-by-default; Scrutica makes the 2:4-sparsity-disabled choice explicit at the facility level. |
| Interconnect / server overhead | Interconnect-efficiency parameter; default 0.85; tunable slider with NVLink and InfiniBand reference points inline. | Server overhead: 1.82× for multi-GPU (versus 1.0 for single-GPU). Quote: “1.82 if hardware quantity > 1”, derived from the DGX H100 server specification. Source accessed 2026-05-20. | Scrutica surfaces interconnect as a tunable per-facility parameter; Epoch fixes server overhead at 1.82× from the DGX H100 reference. |
| PUE (power path) | Default 1.15 (range 1.05–1.45); calibrated against EPRI 2024 data-center energy research plus per-facility disclosures where available. | PUE appears in the power formula (PUE × server overhead × power per GPU × hardware quantity), but Scrutica found no per-facility PUE field on Epoch frontier-DC records, and the methodology page does not surface a published PUE default. Not published. Source accessed 2026-05-20. | Scrutica surfaces PUE as a tunable per-facility input with explicit EPRI calibration; Epoch uses PUE inside the power formula without surfacing the assumed value. |
| Hours / chip-time | Daily FLOP at 24 h sustained operation per facility (× 86,400 s/day); a facility-side question (capacity available), not a training-run-side question (capacity consumed). | Variable chip-time per training run (e.g., 2,500 V100-days); a training-run-side question, not a facility-side one. Source accessed 2026-05-20. | Different question: Scrutica asks "how much can this facility produce daily?", while Epoch asks "how much did this run consume?". Both are valid; Scrutica's framing supports downstream threshold monitoring and the Cost Index. |
| Confidence / uncertainty quantification | Parameter-sensitivity ranges, NOT statistical confidence intervals. The spread across the three paths (hardware vs power vs capex) names the parameter doing the work; an analyst can shift any slider and watch the bound move. | Three published bands as 90% confidence intervals. Source accessed 2026-05-20. | Different uncertainty philosophies: Scrutica exposes uncertainty as parameter-knob movement (so the user audits the load-bearing parameter), while Epoch publishes pre-computed bands per model. |
| Cross-validation against publicly disclosed runs | FlopCrossValidation table at /flop-engine: lab-disclosed hardware leads when present, and Epoch T2 estimates appear in an adjacent column. Disagreements (e.g., Pangu Ultra: Huawei's disclosure of 8,192 Ascend NPUs vs Epoch's 1.1e25, derived via an Ascend-MFU calibration off the NVIDIA path) are preserved as spread, not silently resolved. | Per-model FLOP estimates published in the Notable Models dataset; cross-validation against lab-primary disclosure is not surfaced as a per-row table. Not published. Source accessed 2026-05-20. | Scrutica surfaces the cross-validation as a page-rendered table; Epoch publishes the estimates without the cross-walk against lab-primary disclosures. |
| GPU fraction of capex (cost path) | Tunable; default 0.45, calibrated to SemiAnalysis AI-data-center capex composition. | Capex per facility is tracked in the frontier-DC dataset, but a GPU-fraction-of-capex back-solve to GPU count is not published as part of Epoch's FLOP methodology. Not published. Source accessed 2026-05-20. | The capex path is Scrutica's third back-solve. It widens the bound but serves as a credibility check against the hardware and power paths. |
| GPU fraction of IT load (power path) | Tunable; default 0.49, set between the in-server share implied by SemiAnalysis's per-GPU server power decomposition and a Scrutica-derived cluster-level share. | Not surfaced as a per-facility tunable in Epoch's published methodology; absorbed into the chip-time × peak FLOP/s × utilization product. Not published. Source accessed 2026-05-20. | Scrutica's power-path back-solve makes the IT-load split a first-class parameter. |
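The three back-solves each produce a GPU count that feeds the same per-GPU FLOP/day figure. Divergence between two paths therefore reduces to the ratio of their implied GPU counts. The following is a minimal sketch under stated assumptions, not Scrutica's code:

- The power path is taken to be facility MW ÷ PUE × GPU share of IT load ÷ GPU TDP, using the documented defaults (PUE 1.15, share 0.49) and 700 W for H100 SXM5.
- The capex inputs ($1.0B facility, $30,000 per GPU) are hypothetical.
- All function names are illustrative.

Under these assumptions, the sketch gives about 1.02× for xAI Fulton Georgia and 1.07× for xAI Colossus Memphis Phase 3, in line with the divergence table.

```python
# Three-path GPU-count back-solve and divergence: a hedged sketch.
# Assumptions (not Scrutica's code): power path = MW / PUE * GPU share / TDP,
# with H100 SXM5 at 700 W; the capex inputs further down are hypothetical.

PUE = 1.15
GPU_SHARE_OF_IT_LOAD = 0.49
GPU_FRACTION_OF_CAPEX = 0.45
H100_TDP_W = 700


def power_path_gpus(facility_mw: float, pue: float = PUE,
                    gpu_share: float = GPU_SHARE_OF_IT_LOAD,
                    tdp_w: float = H100_TDP_W) -> float:
    """GPU count implied by nameplate facility power."""
    it_load_w = facility_mw * 1e6 / pue   # remove cooling/facility overhead
    gpu_power_w = it_load_w * gpu_share   # portion of IT load drawn by GPUs
    return gpu_power_w / tdp_w


def capex_path_gpus(facility_usd: float, per_gpu_usd: float,
                    gpu_fraction: float = GPU_FRACTION_OF_CAPEX) -> float:
    """GPU count implied by facility capex."""
    return facility_usd * gpu_fraction / per_gpu_usd


def divergence(*gpu_counts: float) -> float:
    """Largest / smallest estimate; 1.00x means the paths agree.

    Per-GPU FLOP/day is shared across paths, so this equals the FLOP/day ratio.
    """
    return max(gpu_counts) / min(gpu_counts)


# Rows from the facility divergence table: (name, disclosed GPUs, nameplate MW).
for name, gpus, mw in [("xAI Fulton Georgia", 12_448, 20),
                       ("xAI Colossus Memphis Phase 3", 230_000, 352)]:
    print(f"{name}: {divergence(gpus, power_path_gpus(mw)):.2f}x")
# xAI Fulton Georgia: 1.02x
# xAI Colossus Memphis Phase 3: 1.07x

# Capex path with hypothetical inputs: $1.0B facility, $30,000 per GPU.
print(round(capex_path_gpus(1.0e9, 30_000)))  # 15000
```

Because the shared per-GPU FLOP term cancels, any divergence points directly at the count-side inputs (MW, PUE, IT-load share, TDP, or price). This is the sense in which the spread "names the parameter doing the work."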