Scrutica
GPU count and model known. Narrowest estimation bounds. Used when the number and type of GPUs in a facility are known; this mode produces the most precise estimate.
Bounds are produced by varying interconnect efficiency and MFU across documented ranges. They are parameter-sensitivity ranges derived from input bounds, not distributional estimates.
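As a concrete sketch of how such sensitivity bounds arise: sweep MFU and interconnect efficiency over their documented ranges and take the extremes. The GPU count, run length, and the specific ranges below are illustrative assumptions, not the calculator's actual defaults.

```python
# Sensitivity bounds from input ranges (all specific numbers here
# are hypothetical examples, not the calculator's defaults).

PEAK_TFLOPS = 989.5        # H100 dense BF16 Tensor Core spec
N_GPUS = 1_000             # assumed facility size
SECONDS = 90 * 86_400      # assumed 90-day training run

MFU_RANGE = (0.30, 0.50)            # assumed MFU bounds
INTERCONNECT_RANGE = (0.85, 0.95)   # assumed interconnect-efficiency bounds

def total_flop(mfu: float, interconnect: float) -> float:
    """Delivered training FLOP for one setting of the parameters."""
    return N_GPUS * PEAK_TFLOPS * 1e12 * mfu * interconnect * SECONDS

# Both parameters enter multiplicatively, so the extremes of the
# product give the bounds directly.
lo = total_flop(MFU_RANGE[0], INTERCONNECT_RANGE[0])
hi = total_flop(MFU_RANGE[1], INTERCONNECT_RANGE[1])
print(f"bounds: [{lo:.3e}, {hi:.3e}] FLOP")
```

Because the parameters multiply, the width of the interval is set by the product of the per-parameter ranges, which is why tightening either input narrows the final bound.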
GPU TFLOP/s values use dense (no structured sparsity) FP16/BF16 Tensor Core specs: A100: 312; H100/H200: 989.5; MI300X: 1,307.4; B200: 2,250. NVIDIA’s 2:4 structured sparsity doubles these rates (H100: 1,979; B200: 4,500), and AMD’s matrix engines likewise publish a sparsity-inclusive 2,614.9 for the MI300X. Most production training does not enable structured sparsity, so the dense baseline is the right reference, and the MFU parameter is calibrated against it. This is the same convention as Epoch AI’s ml_hardware dataset.
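The convention can be encoded directly. The dense values below are the specs quoted above; the 2:4 sparsity rate is exactly twice the dense rate, though vendor datasheets may round it slightly differently (e.g. AMD quotes 2,614.9 for the MI300X).

```python
# Dense FP16/BF16 Tensor Core specs in TFLOP/s, the baseline used
# by this calculator (same convention as Epoch AI's ml_hardware).
DENSE_TFLOPS = {
    "A100": 312.0,
    "H100": 989.5,
    "H200": 989.5,
    "MI300X": 1307.4,
    "B200": 2250.0,
}

def sparsity_spec(gpu: str) -> float:
    """Vendor 2:4 structured-sparsity headline: 2x the dense spec.
    (Datasheets may round, e.g. AMD publishes 2,614.9 for MI300X.)"""
    return 2 * DENSE_TFLOPS[gpu]

def effective_tflops(gpu: str, mfu: float) -> float:
    """MFU-normalized delivered throughput against the dense baseline."""
    return DENSE_TFLOPS[gpu] * mfu
```

Because MFU is calibrated against the dense baseline, `effective_tflops` gives the delivered throughput the estimates are built on; quoting MFU against a sparsity-inclusive spec would halve it.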
The Compute Cost Index uses the same dense BF16 throughput (989.5 TFLOP/s per H100), so $/petaFLOP-day values compare directly to the MFU-normalized throughput here. A reader deriving $/SCU from cloud $/hour against sparsity-inclusive vendor specs will get values ~2× lower than the index; that gap is a unit mismatch, not a pricing disagreement.
GPU manufacturers advertise peak performance numbers that vary with the measurement standard. This calculator uses a single consistent standard (dense FP16 operations without structured sparsity, aligned with Epoch AI’s methodology), which means the numbers here will not match what cloud providers advertise. The difference is methodological, not an error: any peak standard yields equivalent results as long as MFU is calibrated against the same baseline.
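The ~2× unit mismatch can be reproduced with a hypothetical rental price. The $2.50/H100-hour figure below is an assumption for illustration only, not the index's actual number; the ratio between the two conventions is what matters, and it is independent of the price.

```python
# One petaFLOP-day = 1e15 FLOP/s sustained for 86,400 s.
PFLOP_DAY = 1e15 * 86_400

H100_DENSE = 989.5e12    # dense BF16 spec in FLOP/s (index convention)
H100_SPARSE = 1979.0e12  # 2:4 sparsity spec in FLOP/s (vendor headline)

HOURLY_PRICE = 2.50      # assumed $/H100-hour, illustrative only

def dollars_per_pflop_day(peak_flops: float) -> float:
    """Cost of one petaFLOP-day at the given peak throughput."""
    pflop_days_per_gpu_day = peak_flops * 86_400 / PFLOP_DAY
    return HOURLY_PRICE * 24 / pflop_days_per_gpu_day

dense = dollars_per_pflop_day(H100_DENSE)
sparse = dollars_per_pflop_day(H100_SPARSE)
# Dividing by the 2x-larger sparsity spec makes compute look half as
# expensive, so the dense-convention figure is exactly 2x the other.
print(dense / sparse)
```

This is why a $/petaFLOP-day value derived from sparsity-inclusive specs reads about half the index's figure: same price, same hardware, different denominator.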