METR autonomous task completion time horizons mapped to estimated training compute. Tracks when AI systems cross autonomy thresholds relevant to governance frameworks. For broader compute-performance scaling across 31 benchmarks, see Capability Scaling. Estimate facility training compute with the FLOP Engine. At what level of training compute do AI systems begin completing tasks that take humans hours or days? This tracks autonomous capability thresholds based on METR evaluation data.