Autonomy Monitor

Maps METR's autonomous task-completion time horizons to estimated training compute. The monitor tracks when AI systems cross autonomy thresholds relevant to governance frameworks, and is a companion to Capability Scaling (a 31-benchmark surface) and the FLOP Engine (facility-level capacity).

The monitor plots estimated training compute against METR's 50% time-horizon scores to show the compute level at which a model starts completing tasks that take a human professional hours rather than minutes. This is the same compute rung that the EU AI Act's 10²⁵ FLOP presumption already gates and that the AISI evaluation programme already probes; the Threshold Atlas identifies the facilities with the physical compute to put a model across it.
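As a rough illustration of the idea above, the sketch below finds the lowest estimated training compute among models whose 50% time horizon reaches one hour, then checks that value against the 10²⁵ FLOP presumption. The `ModelPoint` type, the function names, the one-hour cut-off, and all data points are hypothetical and made up for illustration; they are not the monitor's actual data or method.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# EU AI Act presumption threshold for training compute, in FLOP.
EU_AI_ACT_FLOP_THRESHOLD = 1e25


@dataclass(frozen=True)
class ModelPoint:
    """One hypothetical point on the compute vs. time-horizon plot."""
    name: str
    training_flop: float    # estimated training compute, in FLOP
    horizon_minutes: float  # 50% time horizon, in minutes of human task time


def crossing_compute(points: Iterable[ModelPoint],
                     horizon_minutes: float = 60.0) -> Optional[float]:
    """Return the smallest training compute among models whose time
    horizon is at least `horizon_minutes`, or None if no model qualifies."""
    crossed = [p.training_flop for p in points
               if p.horizon_minutes >= horizon_minutes]
    return min(crossed) if crossed else None


def exceeds_eu_presumption(training_flop: float) -> bool:
    """True if the compute estimate is greater than 10^25 FLOP."""
    return training_flop > EU_AI_ACT_FLOP_THRESHOLD


# Made-up values, purely illustrative; not real model estimates.
points = [
    ModelPoint("model-a", 3e23, 2.0),
    ModelPoint("model-b", 2e25, 45.0),
    ModelPoint("model-c", 5e25, 90.0),
]

rung = crossing_compute(points)
if rung is not None:
    print(f"hours-scale horizon first reached at ~{rung:.1e} FLOP; "
          f"above EU presumption: {exceeds_eu_presumption(rung)}")
```

Taking the minimum over qualifying models is only one possible reading of "the compute level at which a model starts"; a fitted trend line through the points would be an equally reasonable choice.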

Scenario Analysis | Chokepoint Analysis

Geopolitical risks of facilities crossing thresholds

Built for the AI governance research community