Scrutica
Methodology overview, data source inventory with coverage notes, limitations disclosure, and the CC-BY-SA 4.0 data license.
This page defines the platform's scope and coverage boundaries and details the citation methodology. RAND has flagged that monitoring infrastructure for AI compute governance does not yet exist; this platform is an attempt to build it, with full transparency about methods and limitations.
The questions compute governance keeps returning to (which facilities can train above a regulatory threshold, how a single chokepoint propagates through the supply chain, which sovereign programmes have execution behind their announcements, which BIS designations bind which firms) all turn on resolving facilities, ownership chains, supply-chain edges, export-control designations, and sovereign procurement programmes against each other at the row level. Established references already cover pieces of that surface (Epoch on frontier facilities, CSET on chip flows, RAND on systemic risk, GovAI on training-compute thresholds, the Federal Register on BIS designations); Scrutica adds the institutional supply-chain layer underneath (licensed supply-chain edges, licensed corporate-ownership and competitor relationships, licensed fund/LP→fund chains, primary-source government procurement disclosures across the major sovereign-AI programmes), the cross-substrate joins themselves, and the derived metrics built on top (cascade criticality weighted by relationship-value and 3-month price correlation, the compute visibility index, announced-versus-deployed reconciliation per country, $/petaFLOP-day normalisation across cloud providers).
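To illustrate the last of those derived metrics, here is a minimal sketch of $/petaFLOP-day normalisation. It assumes each cloud offer can be reduced to an hourly GPU price, a peak dense throughput in petaFLOP/s, and an assumed utilisation factor. `CloudOffer`, `usd_per_pflops_day`, the provider labels, and the prices are all hypothetical, and Scrutica's own normalisation may use different inputs.

```python
from dataclasses import dataclass

# One petaFLOP/s-day = 1e15 FLOP/s sustained for 86,400 s (about 8.64e19 FLOP).


@dataclass
class CloudOffer:
    provider: str               # illustrative label, not a real provider quote
    usd_per_gpu_hour: float     # on-demand price per GPU-hour (assumed input)
    peak_pflops_per_gpu: float  # peak dense petaFLOP/s per GPU at the chosen precision
    utilisation: float = 1.0    # assumed fraction of peak actually sustained


def usd_per_pflops_day(offer: CloudOffer) -> float:
    """Dollars needed to buy one petaFLOP/s-day of effective compute from this offer."""
    effective_pflops = offer.peak_pflops_per_gpu * offer.utilisation
    if effective_pflops <= 0:
        raise ValueError("effective throughput must be positive")
    # One GPU running for 24 h delivers `effective_pflops` petaFLOP/s-days,
    # so cost per petaFLOP/s-day is the daily GPU cost divided by that figure.
    usd_per_gpu_day = offer.usd_per_gpu_hour * 24
    return usd_per_gpu_day / effective_pflops


offers = [
    CloudOffer("provider_a", usd_per_gpu_hour=4.0, peak_pflops_per_gpu=1.0),
    CloudOffer("provider_b", usd_per_gpu_hour=3.0, peak_pflops_per_gpu=0.5),
]
for o in offers:
    print(o.provider, round(usd_per_pflops_day(o), 2))
```

With these illustrative inputs, provider_b is cheaper per GPU-hour but costs 144 dollars per petaFLOP/s-day, compared with 96 for provider_a. A per-hour comparison hides this kind of inversion.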
As of 2026-05-19, BIS Entity List designations stand at 3,425; live substrate counts are updated on the homepage at every refresh.
Each substrate is bounded by a named source corpus (Epoch's frontier-DC dataset, Epoch's GPU-clusters dataset, the Federal Register's BIS Entity List, a licensed supply-chain database, a licensed corporate-ownership database, CSET ETO), with the per-source coverage rate documented on the methodology page alongside the limits each corpus carries.
The core dataset connects layers that are otherwise held in separate silos or whose schemas don't compose:
Data centers and fabrication plants with location, operator, power capacity, and (where disclosed or estimable) GPU deployment across 30+ countries.
Thousands of bilateral supplier-customer and parent-subsidiary edges from a licensed supply-chain database, a licensed corporate-ownership database, and SEC Exhibit 21 filings. Named-entity relationships, not market-share proxies; the graph resolves specific interdependencies between specific companies.
Government compute commitments tracked separately from private investment, with both announced and deployed figures where the two diverge (they usually do).
BIS Entity List designations mapped onto the supply chain graph, connecting regulatory actions to the infrastructure they actually affect.
Facility-level FLOP estimates via three independent paths (hardware inventory, power draw, capital cost), each with estimation bounds. Every parameter is documented on the methodology page and adjustable in-browser.
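The three estimation paths can be sketched as interval arithmetic. The sketch assumes every parameter is a positive (low, high) range and that each path is a chain of products and quotients. All function names, the parameter breakdown (PUE, all-in watts per GPU, share of capex spent on compute), and the example values are hypothetical; they are not the methodology page's actual parameters.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi] of strictly positive values."""
    lo: float
    hi: float

    def __post_init__(self) -> None:
        if not (0 < self.lo <= self.hi):
            raise ValueError(f"need 0 < lo <= hi, got [{self.lo}, {self.hi}]")

    def __mul__(self, other: "Interval") -> "Interval":
        # For positive intervals the product's extremes are lo*lo and hi*hi.
        return Interval(self.lo * other.lo, self.hi * other.hi)

    def __truediv__(self, other: "Interval") -> "Interval":
        # A larger denominator gives a smaller quotient, so the bounds cross over.
        return Interval(self.lo / other.hi, self.hi / other.lo)


def point(x: float) -> Interval:
    """Degenerate interval for a parameter treated as known exactly."""
    return Interval(x, x)


def hardware_path(gpu_count: Interval, flops_per_gpu: Interval) -> Interval:
    # Inventory path: counted accelerators times per-device throughput.
    return gpu_count * flops_per_gpu


def power_path(facility_w: Interval, pue: Interval,
               w_per_gpu: Interval, flops_per_gpu: Interval) -> Interval:
    # Power path: facility draw -> IT load (divide by PUE) -> GPU count
    # (divide by all-in watts per GPU, incl. host and network) -> FLOP/s.
    return (facility_w / pue) / w_per_gpu * flops_per_gpu


def cost_path(capex_usd: Interval, compute_share: Interval,
              usd_per_gpu: Interval, flops_per_gpu: Interval) -> Interval:
    # Cost path: capital spend -> share spent on compute -> GPU count -> FLOP/s.
    return capex_usd * compute_share / usd_per_gpu * flops_per_gpu


def agreement(*estimates: Interval) -> Optional[Interval]:
    """Range consistent with every path, or None if the paths do not overlap."""
    lo = max(e.lo for e in estimates)
    hi = min(e.hi for e in estimates)
    return Interval(lo, hi) if lo <= hi else None


# Hypothetical facility; every number below is illustrative only.
flops = Interval(0.9e15, 1.0e15)  # assumed dense FLOP/s per accelerator
hw = hardware_path(Interval(10_000, 12_000), flops)
pw = power_path(point(20e6), Interval(1.2, 1.4), Interval(1_300, 1_700), flops)
cs = cost_path(point(5e8), Interval(0.5, 0.6), Interval(30_000, 40_000), flops)
print(hw, pw, cs, agreement(hw, pw, cs), sep="\n")
```

Each path multiplies several uncertain factors, so the power and cost bounds come out wider than the inventory bound. `agreement` returns the range that all three paths support, or `None` when the paths conflict, which is itself worth flagging.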
Every layer above is also reachable via natural language at /query. The Query Builder is the canonical natural-language interface to Scrutica's substrate, with 55 tools spanning the raw data, derived analysis, and page content. Every numerical claim in a Query Engine response is wrapped in a citation tag carrying authority-tier provenance, and a server-side verifier flags any cited record_id that is absent from the underlying tool-result history.
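Here is a minimal sketch of the verifier's core check. It assumes citations appear as `<cite record_id="...">` tags and that each tool result carries a list of records with a `record_id` field. The tag syntax, the result shape, and `unverified_citations` are hypothetical. The check itself is a set difference between the IDs a response cites and the IDs the tools actually returned.

```python
import re
from typing import Iterable, List

# Hypothetical tag syntax: <cite record_id="...">claim</cite>.
CITE_RE = re.compile(r'<cite\s+record_id="([^"]+)"')


def unverified_citations(response: str, tool_results: Iterable[dict]) -> List[str]:
    """Cited record_ids that never appeared in any tool result, in citation order."""
    seen = {
        str(record["record_id"])
        for result in tool_results
        for record in result.get("records", [])
    }
    cited = dict.fromkeys(CITE_RE.findall(response))  # de-duplicate, keep order
    return [rid for rid in cited if rid not in seen]


# Hypothetical tool-call history and model answer.
history = [{"tool": "get_facility", "records": [{"record_id": "fac-0001"}]}]
answer = ('Example claim <cite record_id="fac-0001">...</cite>; '
          'another claim <cite record_id="bis-0042">...</cite>.')
print(unverified_citations(answer, history))
```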
Much of the underlying data derives from institutional sources — licensed corporate-ownership, supply-chain, and fund/LP databases, alongside public primary sources (SEC EDGAR, CSET ETO) — that are individually expensive, access-restricted, or both. The licensed databases are held under academic subscription and accessed under each provider’s research terms; the substrate stores extracted relational facts with provenance preserved (source class, URL, confidence tier, estimation flag) on every value, and the raw subscription datasets themselves are not redistributed.
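One way to attach provenance to every extracted value is to store a small immutable record per fact. The sketch below does this under assumed field types. `SourceClass` and its members, the 1–3 tier scale, `Fact`, and `at_or_above` are hypothetical illustrations rather than Scrutica's schema. Only the extracted relation and its provenance are kept, not the source row.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List, Optional


class SourceClass(Enum):
    # Illustrative categories only; the platform's own taxonomy may differ.
    REGULATORY_FILING = "regulatory_filing"
    LICENSED_DATABASE = "licensed_database"
    GOVERNMENT_DISCLOSURE = "government_disclosure"
    DERIVED_ESTIMATE = "derived_estimate"


@dataclass(frozen=True)
class Provenance:
    source_class: SourceClass
    url: Optional[str]    # None where the source has no public URL
    confidence_tier: int  # assumed scale: 1 strongest ... 3 expert-assessed
    is_estimate: bool

    def __post_init__(self) -> None:
        if self.confidence_tier not in (1, 2, 3):
            raise ValueError("confidence_tier must be 1, 2, or 3")


@dataclass(frozen=True)
class Fact:
    """One extracted relational fact; the raw subscription row is not stored."""
    subject: str
    relation: str
    obj: str
    provenance: Provenance


def at_or_above(facts: Iterable[Fact], tier: int,
                allow_estimates: bool = True) -> List[Fact]:
    """Keep facts whose tier is at least as strong as `tier` (lower = stronger)."""
    return [
        f for f in facts
        if f.provenance.confidence_tier <= tier
        and (allow_estimates or not f.provenance.is_estimate)
    ]


# Hypothetical facts for illustration.
facts = [
    Fact("SupplierCo", "supplies", "CustomerCo",
         Provenance(SourceClass.REGULATORY_FILING, "https://example.org/filing", 1, False)),
    Fact("SiteX", "power_mw", "100",
         Provenance(SourceClass.DERIVED_ESTIMATE, None, 3, True)),
]
print([f.subject for f in at_or_above(facts, 2)])
```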
Coverage is structurally biased toward entities with public disclosure obligations. State-owned enterprises, privately held companies, and facilities in jurisdictions with limited reporting are underrepresented, and these are often precisely the entities most relevant to compute governance.
Compute capacity estimates carry real uncertainty, particularly via the power-based and cost-based paths where assumptions compound. The methodology page makes this visible rather than hiding it.
Cascade analysis is stronger on topology than on edge weights. Graph structure comes from regulatory filings (Tier 1–2), while substitutability decay rates are expert-assessed (Tier 3). The analysis is therefore useful for identifying structurally critical nodes but less reliable for predicting precise severity several hops out.
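A simple multiplicative decay model makes the topology-versus-weights point concrete. The model below is an assumption, not Scrutica's cascade criticality metric, which per the overview also weights by relationship value and 3-month price correlation. Each edge carries a hypothetical dependency share and substitutability score in [0, 1], and the impact passed downstream is multiplied by `share * (1 - substitutability)` at every hop.

```python
from typing import Dict, List, Tuple

# supplier -> [(customer, dependency_share, substitutability)], all in [0, 1].
Edges = Dict[str, List[Tuple[str, float, float]]]


def cascade(edges: Edges, origin: str, max_hops: int = 3,
            floor: float = 1e-3) -> Dict[str, float]:
    """Propagate a full outage at `origin` downstream, keeping each node's worst case.

    Impact passed along an edge = upstream impact * dependency share
    * (1 - substitutability), so per-hop factors multiply along the path.
    """
    impact: Dict[str, float] = {origin: 1.0}
    frontier = [origin]
    for _ in range(max_hops):
        updated: Dict[str, None] = {}  # ordered set of nodes to expand next
        for supplier in frontier:
            for customer, share, subst in edges.get(supplier, []):
                transmitted = impact[supplier] * share * (1.0 - subst)
                if transmitted >= floor and transmitted > impact.get(customer, 0.0):
                    impact[customer] = transmitted
                    updated[customer] = None
        frontier = list(updated)
    return impact


# Hypothetical three-hop chain; all weights are illustrative.
edges: Edges = {
    "fab": [("designer", 0.9, 0.1)],
    "designer": [("cloud", 0.5, 0.4)],
    "cloud": [("lab", 0.8, 0.5)],
}
print(cascade(edges, "fab"))
```

Because the per-hop factors multiply, an error in any expert-assessed substitutability carries through to every node further downstream, and errors at successive hops compound. This is why reachability (topology) is more robust than the magnitudes several hops out.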
David Gringras, physician (Edinburgh) and law graduate (University of Law), Frank Knox Fellow at Harvard (MPH Health Policy; cross-registered at MIT, Harvard Law School, and the Kennedy School). Evaluations and Collaborations Lead on the FATF-to-AI governance translation project at Arcadia Impact / The Future Society; project supervisor at Orion AI Governance on evaluation-independent governance mechanisms.
Corrections welcome; methodological critique especially.