Scrutica
Methodology overview, data-source inventory with coverage notes, limitations disclosure, and the CC-BY-SA 4.0 data license.
Defines the platform's scope and coverage boundaries while detailing the citation methodology. RAND flagged that monitoring infrastructure for AI compute governance does not yet exist; this platform is an attempt to build it (with full transparency about methods and limitations).
Several questions in compute governance (which facilities can train above a regulatory threshold, how a single chokepoint propagates through the supply chain, which sovereign programmes have execution behind their announcements, which BIS designations bind which firms) can only be answered by resolving facilities, ownership chains, supply-chain edges, export-control designations, and sovereign procurement programmes against each other at the row level. Several established references already cover parts of this surface: Epoch on frontier facilities, CSET on chip flows, RAND on systemic risk, GovAI on training-compute thresholds, the Federal Register on BIS designations. Scrutica adds three things underneath: the institutional supply-chain data itself (FactSet Revere edges, WRDS-FactSet competitor relationships, PitchBook LP→fund chains, primary-source government procurement disclosures across the major sovereign-AI programmes); the cross-substrate joins that resolve these sources against each other at the row level; and the derived metrics built on top (cascade criticality weighted by relationship value and 3-month price correlation, the compute visibility index, announced-versus-deployed reconciliation per country, $/petaFLOP-day normalisation across cloud providers).
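The $/petaFLOP-day normalisation mentioned above can be sketched as follows. This is a minimal illustration, not the platform's calibrated method: the utilisation default and the idea of normalising against delivered (rather than peak) throughput are assumptions for the example.

```python
def usd_per_pflop_day(usd_per_gpu_hour, peak_pflops_per_gpu, utilisation=0.4):
    """Normalise a cloud GPU price to $/petaFLOP-day.

    One petaFLOP-day = 1e15 FLOP/s sustained for 24 hours.
    The utilisation default is an illustrative placeholder, not
    the platform's documented parameter.
    """
    delivered_pflops = peak_pflops_per_gpu * utilisation  # sustained throughput
    return usd_per_gpu_hour * 24 / delivered_pflops
```

Normalising on delivered throughput lets instances with different accelerators and real-world efficiencies be compared on one axis.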
Approximately 4,500 facilities across 110 countries; ~18,600 supply-chain edges between firms in the AI compute supply chain; 3,354 BIS Entity List designations cross-referenced onto compute-universe entities; 34 sovereign AI programmes with announced-versus-deployed reconciliation. The live counts stream into the homepage at every refresh.
Each substrate is bounded by a named source corpus (Epoch's frontier-DC dataset, Epoch's GPU-clusters dataset, the Federal Register's BIS Entity List, FactSet Revere, WRDS-FactSet, CSET ETO), with the per-source coverage rate documented on the methodology page alongside the limits each corpus carries.
The core dataset connects layers that are historically siloed or structured incompatibly:
Every layer below is also reachable in natural language at /query; the Query Builder is the canonical NL interface to Scrutica's substrate, with 55 tools spanning all data, derived analysis, and page content. Every numerical claim in a Query Engine response is wrapped in a citation tag with authority-tier provenance, and a server-side verifier flags any cited record_id absent from the underlying tool-result history.
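The server-side verification step can be sketched like this. The citation-tag syntax and function name below are assumptions for illustration; the production format is not documented here.

```python
import re

# Assumed tag shape for the sketch, e.g. <cite record_id="fac_0042">...</cite>
CITE_RE = re.compile(r'<cite record_id="([^"]+)"')

def flag_unbacked_citations(response_text, tool_result_ids):
    """Return cited record_ids that never appeared in tool results."""
    cited = set(CITE_RE.findall(response_text))
    return sorted(cited - set(tool_result_ids))
```

Any non-empty return value indicates a claim citing a record the tools never actually produced, which is exactly the failure a citation verifier exists to catch.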
Data centers and fabrication plants with location, operator, power capacity, and (where disclosed or estimable) GPU deployment across 30+ countries.
Thousands of bilateral supplier-customer and parent-subsidiary edges from FactSet Revere, WRDS, and SEC Exhibit 21 filings. Named-entity relationships, not market-share proxies; the graph resolves specific interdependencies between specific companies.
Government compute commitments tracked separately from private investment, with both announced and deployed figures where the two diverge (they usually do).
BIS Entity List designations mapped onto the supply chain graph, connecting regulatory actions to the infrastructure they actually affect.
Facility-level FLOP estimates via three independent paths (hardware inventory, power draw, capital cost), each with estimation bounds. Every parameter documented on the methodology page and adjustable in-browser.
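The power-draw path, for example, can be sketched as below. Every default here is an illustrative placeholder, not one of the platform's documented parameters (those live on the methodology page and are adjustable in-browser).

```python
def flops_from_power(facility_mw, gpu_share=0.6, gpu_tdp_kw=0.7,
                     peak_flops_per_gpu=1.0e15, utilisation=0.4):
    """Power-draw path: facility IT load (MW) -> sustained FLOP/s estimate.

    Assumptions (all placeholders): gpu_share of power reaches
    accelerators; a uniform per-GPU TDP; a flat utilisation factor.
    """
    n_gpus = (facility_mw * 1000 * gpu_share) / gpu_tdp_kw  # implied GPU count
    return n_gpus * peak_flops_per_gpu * utilisation
```

Because the estimate is a product of four assumed factors, errors compound multiplicatively, which is why this path carries wider bounds than the hardware-inventory path.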
Much of the underlying data derives from institutional sources (FactSet Revere, WRDS, SEC EDGAR, CSET) that are individually expensive, access-restricted, or both. The platform cross-references across them and maintains provenance (source, URL, confidence tier, estimation flag) on every value.
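The per-value provenance described above maps naturally to a small record type. Field names here are illustrative; the actual schema is documented on the methodology page.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Provenance:
    """Hypothetical per-value provenance record (field names assumed)."""
    source: str           # e.g. "FactSet Revere"
    url: str              # link back to the originating document
    confidence_tier: int  # 1 = primary filing ... 3 = expert-assessed
    is_estimate: bool     # True when the value is derived, not disclosed
```

Keeping provenance frozen alongside every value means downstream joins and derived metrics can always be audited back to a named source.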
Coverage is structurally biased toward entities with public disclosure obligations. State-owned enterprises, privately held companies, and facilities in jurisdictions with limited reporting are underrepresented; these are often precisely the entities most relevant to compute governance.
Compute capacity estimates carry real uncertainty, particularly via the power-based and cost-based paths where assumptions compound. The methodology page makes this visible rather than hiding it.
Cascade analysis is stronger on topology than on edge weights. The graph structure comes from regulatory filings (Tier 1–2), while substitutability decay rates are expert-assessed (Tier 3). The analysis is useful for identifying structurally critical nodes, less reliable for precise severity prediction at multiple hops.
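The multi-hop attenuation can be sketched as a shortest-path-style propagation over the supply graph. The flat per-hop decay constant below stands in for the expert-assessed, edge-specific substitutability rates; everything here is an illustrative simplification.

```python
def hop_severities(graph, origin, decay=0.5, max_hops=3):
    """Propagate a shock from `origin` outward through a dict-of-lists
    supply graph, attenuating severity by a flat substitutability
    decay per hop (edge-specific and expert-assessed in practice;
    a single illustrative constant here)."""
    severity = {origin: 1.0}
    frontier = {origin}
    for _ in range(max_hops):
        nxt = set()
        for node in frontier:
            for nb in graph.get(node, ()):
                s = severity[node] * decay
                if s > severity.get(nb, 0.0):  # keep strongest path only
                    severity[nb] = s
                    nxt.add(nb)
        frontier = nxt
    return severity
```

Because severity halves (here) at every hop, small errors in the decay rate dominate after two or three hops, which is why topology (who is reachable) is more trustworthy than the predicted magnitudes.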
David Gringras, physician (Edinburgh) and law graduate (University of Law), Frank Knox Fellow at Harvard (MPH Health Policy; cross-registered at MIT, Harvard Law School, and the Kennedy School). Evaluations and Collaborations Lead on the FATF-to-AI governance translation project at Arcadia Impact / The Future Society; project supervisor at Orion AI Governance on evaluation-independent governance mechanisms. Recent research: Safety Under Scaffolding (arXiv:2603.10044), IatroBench (arXiv:2604.07709), Defensive AI (SSRN), Frontier Lag (arXiv:2605.04135).
Built at Harvard as part of ongoing research on AI governance infrastructure, informed by work on evaluation-independent governance mechanisms (Orion AI Governance) and FATF-to-AI regulatory translation (Arcadia Impact / The Future Society).
Corrections, data contributions, and methodological critiques welcome.