Data & Citation

The packaged Scrutica datasets and how to cite them. Each bundle ships as a CSV with a paired JSON Schema, methodology Markdown, SHA-256 checksum, and Croissant JSON-LD descriptor; the Citation view carries the reuse forms for the datasets and for any number rendered on the site.

Sovereign AI Program Index

Sovereign AI investment programs across 34 jurisdictions, with separate announced, committed, and disbursed spend stages and qualitative interdependence ratings (NVIDIA, TSMC, U.S., and BIS reach). The three funding stages are tracked separately because conflating them is the single most common source of inflation in sovereign-AI reporting.

Version 2026.3 · 34 rows · Released 2026-07-17 · License CC-BY-4.0 · SHA-256 19a19e3f91fa…

Download bundle

CSV: flat data, UTF-8
Data Dictionary: column definitions
JSON Schema: column types
Methodology: Markdown
Croissant JSON-LD: MLCommons 1.0
SHA-256: integrity

Hosted mirrors

Zenodo: deposition pending. A versioned Zenodo record with a resolvable DOI is in preparation. Until it lands, cite by version, URL, SHA-256, and retrieval date.

Hugging Face: mirror pending. A Hugging Face dataset mirror is in preparation. The canonical download above is complete and hash-verifiable in the meantime.

Citation

Gringras, D., & Scrutica. (2026). Scrutica Sovereign AI Program Index (Version 2026.3) [Data set]. Zenodo. (DOI pending deposition)

BibTeX

@dataset{scrutica_sovereign_ai_index_2026_3,
  author    = {Gringras, David and {Scrutica}},
  title     = {Scrutica Sovereign AI Program Index},
  year      = {2026},
  version   = {2026.3},
  publisher = {Zenodo},
  note      = {DOI pending Zenodo deposition; cite by version + SHA-256 hash from /datasets/sovereign-ai-index until assigned},
  url       = {https://scrutica.com/datasets/sovereign-ai-index}
}

Methodology

# Scrutica Sovereign AI Program Index — v2026.3

**Release date:** 2026-07-17
**License:** CC-BY-4.0
**Suggested citation:** see the Citation section at the bottom.

## Coverage

34 sovereign AI investment programs across 34 jurisdictions. Each row carries three distinct funding stages — `announced_usd`, `committed_usd`, `disbursed_usd` — because conflating them is the single most common source of the "$X billion sovereign AI" inflation pattern in trade press. `announced_govt_only_usd` separates the government-pledged component from stacked private FDI pledges (Stargate, Cumulative Microsoft, etc.).
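As a rough illustration of why the government-only column matters, the sketch below splits a stacked headline figure into its government component and the remainder. The column names `announced_usd` and `announced_govt_only_usd` come from the dataset; the function name, the returned keys, and the figures are hypothetical and are not taken from any row.

```python
def split_announcement(announced_usd: float, announced_govt_only_usd: float) -> dict:
    """Split a stacked headline pledge into government and remaining (e.g. private FDI) parts."""
    if announced_usd <= 0:
        raise ValueError("announced_usd must be positive")
    if not 0 <= announced_govt_only_usd <= announced_usd:
        raise ValueError("government-only component must lie within the announced total")
    return {
        "govt_usd": announced_govt_only_usd,
        "stacked_private_usd": announced_usd - announced_govt_only_usd,
        "govt_share": announced_govt_only_usd / announced_usd,
    }


# Invented figures for illustration only; not taken from the dataset.
example = split_announcement(announced_usd=100e9, announced_govt_only_usd=25e9)
print(example["govt_share"])  # 0.25
```

A low `govt_share` is a hint that the headline number is mostly stacked partner pledges rather than public money.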

## Funding-stage definitions

- **Announced**: total publicly-attributed pledge, including private FDI partners. Inflation-prone.
- **Committed**: subset that has cleared legislative / board sign-off, where verifiable.
- **Disbursed**: subset that has actually been contracted / spent, where verifiable.

`*_is_estimated = true` flags when the figure is analyst-assigned (e.g., a backed-out fraction of a stacked pledge).
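Because committed and disbursed are defined as subsets, a reader can sanity-check a row by confirming that each later stage does not exceed the earlier one. The following is a minimal sketch, assuming the CSV stores the three stage columns as plain numbers and leaves a cell empty when a stage is not verifiable; the helper names are hypothetical.

```python
from typing import Optional

# Stage columns in funnel order: each later stage is a subset of the earlier one.
STAGES = ("announced_usd", "committed_usd", "disbursed_usd")


def parse_usd(raw: Optional[str]) -> Optional[float]:
    """Return the cell as a float, or None for an empty cell (assumed: not verifiable)."""
    raw = (raw or "").strip()
    return float(raw) if raw else None


def stage_violations(row: dict) -> list:
    """List every place where a later stage exceeds the nearest earlier known stage."""
    known = []
    for column in STAGES:
        value = parse_usd(row.get(column))
        if value is not None:
            known.append((column, value))
    problems = []
    for (earlier, upper), (later, lower) in zip(known, known[1:]):
        if lower > upper:
            problems.append(f"{later} exceeds {earlier}")
    return problems
```

A row with a missing `committed_usd` is still checked: `disbursed_usd` is then compared directly against `announced_usd`.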

## Interdependence scoring

Four qualitative axes (low / medium / high) describe each program's structural exposure:

- `nvidia_dependency`: reliance on NVIDIA accelerators
- `tsmc_dependency`: reliance on TSMC foundry capacity
- `us_dependency`: reliance on U.S.-jurisdiction inputs (chips, tools, IP)
- `bis_jurisdiction_reach`: whether the program sits inside BIS Entity-List / FDPR reach

These are not numeric scores; they are analyst calls anchored on the published primary chips, partners, and supply-chain edges of the program. The qualitative ladder is preserved to surface uncertainty.
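One way to use the ladder without turning it into a score is to filter on a level rather than sum across axes. The sketch below assumes each axis cell holds exactly one of the strings `low`, `medium`, or `high` (case and surrounding whitespace are normalized); the function name is hypothetical.

```python
AXES = (
    "nvidia_dependency",
    "tsmc_dependency",
    "us_dependency",
    "bis_jurisdiction_reach",
)
LADDER = ("low", "medium", "high")


def axes_at_level(row: dict, level: str = "high") -> list:
    """Return the axes on which a program is rated at the given level."""
    if level not in LADDER:
        raise ValueError(f"unknown level {level!r}")
    matches = []
    for axis in AXES:
        rating = (row.get(axis) or "").strip().lower()
        if rating not in LADDER:
            # Assumption: any other value signals a data problem worth surfacing.
            raise ValueError(f"{axis}: unexpected rating {rating!r}")
        if rating == level:
            matches.append(axis)
    return matches
```

Returning the axis names keeps the output qualitative: it says where a program is exposed, not how exposed it is relative to another program.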

## Currency handling

`currency_of_record` records the native announcement currency. USD figures use the FX rate in `fx_rate_used` on `fx_rate_date` (with the source in `fx_rate_source`); we do not re-baseline historical announcements to a single year's USD purchasing-power index — the rate is the rate that turns the literal announcement into USD at announcement time.
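The conversion described above amounts to a single multiplication at the announcement-date rate. The sketch below assumes `fx_rate_used` is quoted as USD per one unit of `currency_of_record`; the data dictionary should be checked for the actual quoting direction. The native amount is a hypothetical input, since this section does not name a native-amount column, and the figures are invented.

```python
from decimal import Decimal


def to_usd(native_amount: str, fx_rate_used: str) -> Decimal:
    """Convert a native-currency announcement to USD at the recorded rate.

    Assumption: fx_rate_used is USD per one unit of the native currency.
    """
    return Decimal(native_amount) * Decimal(fx_rate_used)


# Invented example: 2 billion units of a currency trading at 1.10 USD per unit.
usd = to_usd("2000000000", "1.10")
print(usd)  # 2200000000.00
```

`Decimal` is used instead of `float` so that large announcement figures convert without binary rounding noise.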

## Refresh path

The upstream JSON (`data/processed/sovereign_programs.json`) is curated against the program-page primary-source layer in the Scrutica platform; this dataset is regenerated by `scripts/package-datasets.ts`.

## Known limitations

- Some programs (e.g., Chinese provincial / military-aligned) have `structural_non_disclosure = true` — the announced figures should be read as floors, not actuals.
- Stacked pledges are flagged in `key_notes` and broken out in the platform's `/sovereign-ai` decomposition view; the flat CSV preserves the stacked total in `announced_usd`.
- The qualitative interdependence ratings are intended for triage, not for cross-program quantitative ranking.
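When aggregating, the first limitation means a sum should carry a marker saying it is a lower bound. A minimal sketch, assuming `structural_non_disclosure` is serialized as the string `true` or `false` and that empty `announced_usd` cells should be skipped; the function name is hypothetical.

```python
def announced_total(rows: list) -> tuple:
    """Sum announced_usd and report whether the sum should be read as a floor."""
    total = 0.0
    is_floor = False
    for row in rows:
        raw = (row.get("announced_usd") or "").strip()
        if not raw:
            continue  # no announced figure recorded for this program
        total += float(raw)
        flag = (row.get("structural_non_disclosure") or "").strip().lower()
        if flag == "true":
            is_floor = True  # at least one contributing figure is a floor
    return total, is_floor
```

Note that this total still includes stacked private pledges, because the flat CSV keeps the stacked figure in `announced_usd`.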

## Citation

```bibtex
@dataset{scrutica_sovereign_ai_index_2026_3,
  author    = {Gringras, David and {Scrutica}},
  title     = {Scrutica Sovereign AI Program Index},
  year      = {2026},
  version   = {2026.3},
  publisher = {Zenodo},
  note      = {DOI pending Zenodo deposition; cite by version + SHA-256 hash from /datasets/sovereign-ai-index until assigned},
  url       = {https://scrutica.com/datasets/sovereign-ai-index}
}
```

Data Provenance

Primary 67% · Research 33%

34 records

Sources

Source	Tier
Primary government disclosures	T1
CRS / Congressional Research Service	T1
NVIDIA earnings calls	T2

Processing

Bundle regenerated by scripts/package-datasets.ts from data/processed/. The SHA-256 checksum 19a19e3f91fad40a… verifies the integrity of the CSV.
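A downloaded copy can be checked against the published checksum with the Python standard library. This is a sketch: the file path is whatever the reader saved the CSV as, and the full 64-character hash has to come from the bundle's SHA-256 file, since the value shown on this page is truncated. The function names are hypothetical.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 and return the lowercase hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: str, published_hex: str) -> bool:
    """Compare a local file against the full published hash (not the truncated one)."""
    return sha256_of_file(path) == published_hex.strip().lower()
```

Reading in chunks keeps memory use flat regardless of file size, and normalizing the published string tolerates a trailing newline or uppercase hex.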

Reading this data

~value: Estimated (derived, modeled, or inferred). Hover for method and source.

value: Derived from another measured field (e.g. power from GPU count).

Source conflict: multiple sources disagree. Click for competing values.

Hover: hover any annotated value to see authority tier, method, source, and vintage.
