Tidy R access to the Anthropic Economic Index dataset.
The Anthropic Economic Index (AEI) is a recurring open dataset that maps real Claude conversations to occupations and tasks. Anthropic classifies millions of conversations against the U.S. Department of Labor’s O*NET task taxonomy and the Standard Occupational Classification (SOC) system, and publishes the resulting usage shares on Hugging Face under CC-BY-4.0. Each release also splits conversations into automation-style interactions (the user delegates to Claude) and augmentation-style interactions (the user works through a task with Claude). From the September 2025 release onwards, the data is broken down by country and US state. Methodology is documented in Handa et al. (2025); the privacy-preserving classification pipeline is described in Tamkin et al. (2024).
Five releases shipped between February 2025 and March 2026,
covering Claude 3.5 Sonnet through Claude Opus 4.5/4.6.
aieconindex lists releases, fetches raw and enriched usage
tables, retrieves task statements and request hierarchies, exposes
country and US-state slices, caches downloads, and produces ready-made
citations. Schema differences across releases are handled internally.
Three runtime dependencies (cli, httr2,
jsonlite) plus base R. No API key needed.
install.packages("aieconindex")
# or the development version
# install.packages("remotes")
remotes::install_github("charlescoverdale/aieconindex")
Requires R 4.1.0 or later.
library(aieconindex)
# 1. See what's available
aei_releases()
#> # AEI: releases · 5 rows
#> release_id release_date model
#> 1 release_2026_03_24 2026-03-24 Claude Opus 4.5/4.6
#> 2 release_2026_01_15 2026-01-15 Claude Sonnet 4.5
#> 3 release_2025_09_15 2025-09-15 Claude Sonnet 4
#> 4 release_2025_03_27 2025-03-27 Claude 3.7 Sonnet
#> 5 release_2025_02_10 2025-02-10 Claude 3.5 Sonnet
#> ...
# 2. Look inside a release
aei_files("2025-09-15", recursive = TRUE)
# 3. Fetch the canonical usage table
df <- aei_index("2025-09-15", source = "claude_ai", variant = "enriched")
# 4. Slice to a country
uk <- aei_geography("2025-09-15", country = "GBR")
# 5. Cite the dataset
aei_cite("2025-09-15", format = "bibtex")
| Function | Returns |
|---|---|
| aei_releases(live = TRUE) | Available releases (live Hugging Face listing plus bundled metadata) as an aei_tbl |
| aei_files(release, recursive = TRUE) | File tree for a release as an aei_tbl with columns path, type, size_bytes |
aei_releases(live = FALSE) # offline-safe (uses bundled metadata)
aei_files("latest") # tree of the most recent release
aei_files("2025-03-27", recursive = FALSE) # top-level only
| Function | Returns |
|---|---|
| aei_index(release, source, variant) | Canonical usage table as an aei_tbl |
| aei_download(release, path) | CSV files as an aei_tbl, JSON files as a parsed list, any other file type as a local path |
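Finding the usage CSV inside a release comes down to filtering the release's file tree by name. The base R sketch below works on a plain character vector of paths, such as the path column returned by aei_files(). The naming pattern aei_<variant>_<source>_*.csv is an assumption extrapolated from the raw Claude.ai file name in the provenance example. pick_usage_csv() and the example paths are hypothetical and are not the package's internals.

```r
# Hypothetical sketch: choose a usage CSV from a release's file listing by name.
# ASSUMPTION: files are named aei_<variant>_<source>_*.csv (not guaranteed).
pick_usage_csv <- function(paths,
                           source = c("claude_ai", "1p_api"),
                           variant = c("raw", "enriched")) {
  source <- match.arg(source)
  variant <- match.arg(variant)
  pattern <- sprintf("^aei_%s_%s_.*\\.csv$", variant, source)
  hits <- paths[grepl(pattern, basename(paths))]
  if (length(hits) == 0L) {
    stop("No ", variant, " ", source, " usage CSV found in this release.",
         call. = FALSE)
  }
  hits[1L]  # first match; a real implementation may need a tie-break rule
}

# Hypothetical paths standing in for the path column of aei_files()
paths <- c("data/output/aei_raw_claude_ai_example.csv",
           "data/output/aei_raw_1p_api_example.csv",
           "data/output/request_hierarchy_tree_claude_ai.json")
pick_usage_csv(paths, source = "1p_api", variant = "raw")
#> [1] "data/output/aei_raw_1p_api_example.csv"
```

Matching on basename() keeps the pattern independent of how deeply a release nests its data folders.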
aei_index() locates the canonical usage CSV by file-pattern matching. Arguments:
source: "claude_ai" (consumer product traffic) or "1p_api" (first-party API). Not all releases include both.
variant: "raw" (counts and percentages from Anthropic’s pipeline) or "enriched" (joined to O*NET / SOC metadata, with derived per-capita and tier metrics). Older releases may ship only one variant.
df_raw <- aei_index("2026-03-24", source = "claude_ai", variant = "raw")
df_enriched <- aei_index("2025-09-15", source = "claude_ai", variant = "enriched")
df_api <- aei_index("2026-03-24", source = "1p_api", variant = "raw")
aei_download() fetches any path returned by aei_files():
soc <- aei_download("2025-03-27", "SOC_Structure.csv")
hierarchy <- aei_download("2025-09-15",
"data/output/request_hierarchy_tree_claude_ai.json")
report <- aei_download("2026-01-15", "aei_v4_appendix.pdf") # returns local path
| Function | Returns |
|---|---|
| aei_clusters(release, source) | Request-hierarchy tree (Clio output) as a parsed nested list |
| aei_tasks(release) | O*NET task statements bundled with the release as an aei_tbl |
| aei_geography(release, country, geography) | Country or US-state slice of the enriched table |
# Clio-derived request hierarchy (from 2025-09-15 onwards)
tree <- aei_clusters("2025-09-15", source = "claude_ai")
# Bundled O*NET task statements (shipped with the 2025-03-27 release)
tasks <- aei_tasks("2025-03-27")
# UK country slice (geographic facets ship from 2025-09-15 onwards)
uk <- aei_geography("2025-09-15", country = "GBR")
# Australia country slice
au <- aei_geography("2025-09-15", country = "AUS")
# US state-level breakdown
us_states <- aei_geography("2025-09-15", geography = "state_us")
Country codes are ISO 3166-1 alpha-3 ("GBR", "AUS", "USA"). Releases before 2025-09-15 have no geographic data, so aei_geography() stops with an informative error for them.
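The release gate described above can be sketched in base R as follows. This is not the package's code, and both helper names are hypothetical. The sketch assumes only what the examples in this README show: a release can be written either as a date ("2025-09-15") or as a release identifier ("release_2025_09_15").

```r
# Hypothetical sketch of a release gate for geographic data.
geo_cutoff <- as.Date("2025-09-15")  # first release with country/state facets

release_date <- function(release) {
  # Accept "release_2025_09_15" ...
  if (grepl("^release_[0-9]{4}_[0-9]{2}_[0-9]{2}$", release)) {
    return(as.Date(sub("^release_", "", release), format = "%Y_%m_%d"))
  }
  # ... or "2025-09-15"
  if (grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", release)) {
    return(as.Date(release))
  }
  stop("Unrecognised release identifier: ", release, call. = FALSE)
}

require_geography <- function(release) {
  if (release_date(release) < geo_cutoff) {
    stop("Release ", release, " has no geographic data; ",
         "use a release dated ", format(geo_cutoff), " or later.",
         call. = FALSE)
  }
  invisible(TRUE)
}

require_geography("release_2025_09_15")  # passes silently
try(require_geography("2025-03-27"))     # informative error
```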
| Function | Returns |
|---|---|
| aei_compare(release_a, release_b, ...) | Release-on-release diff with columns value_a, value_b, delta, pct_change |
| aei_link(x, y, by, type) | Generic merge that preserves the aei_tbl class, for splicing AEI to user-supplied data on a shared key |
| aei_concentration(x, share_col, group_cols, top_n) | HHI, top-N concentration ratio, and Shannon entropy of usage shares |
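The three summaries in the aei_concentration() row have standard definitions. Normalise the shares s_i so they sum to one. The HHI is then the sum of the squared shares, the top-N ratio is the sum of the N largest shares, and Shannon entropy is minus the sum of s_i log s_i. The base R sketch below assumes HHI on a 0 to 1 scale and natural logarithms, and either may differ from the package's own conventions. concentration_sketch() is a hypothetical name.

```r
# Hypothetical sketch of the concentration metrics (not aei_concentration()).
concentration_sketch <- function(shares, top_n = 3) {
  s <- shares / sum(shares)                            # normalise to sum to 1
  hhi <- sum(s^2)                                      # Herfindahl-Hirschman, 0 to 1 scale
  top <- sum(head(sort(s, decreasing = TRUE), top_n))  # top-N concentration ratio
  p <- s[s > 0]                                        # 0 * log(0) is taken as 0
  entropy <- -sum(p * log(p))                          # Shannon entropy, natural log
  c(hhi = hhi, top_n_ratio = top, entropy = entropy)
}

# Four tasks with 40/30/20/10 usage: hhi = 0.30, top_n_ratio = 0.70
concentration_sketch(c(40, 30, 20, 10), top_n = 2)
```

Higher HHI and lower entropy both indicate that usage is piled onto fewer tasks. Equal shares across n tasks give the minimum HHI, 1/n, and the maximum entropy, log(n).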
# How did the cluster shares move between Sept 2025 and March 2026?
changes <- aei_compare("2025-09-15", "2026-03-24")
head(changes[order(-abs(changes$delta)), ])  # largest movers first
# Splice AEI country shares to your own GDP-per-capita table
overlay <- data.frame(
geo_id = c("GBR", "AUS", "USA"),
gdp_pc = c(48000, 65000, 80000)
)
joined <- aei_link(aei_geography("2025-09-15"), overlay, by = "geo_id")
# How concentrated is UK Claude.ai usage across O*NET tasks?
uk <- aei_geography("2025-09-15", country = "GBR")
uk_tasks <- uk[uk$facet == "onet_task" & uk$variable == "onet_task_pct", ]
aei_concentration(uk_tasks)
aei_link() is a thin wrapper over base::merge() that preserves the aei_tbl class and provenance metadata, supports left, inner, and full joins, and warns when a join produces zero rows. Use it to attach occupational crosswalks (SOC, ANZSCO, ISCO, UK SOC 2020), national labour-force data (ONS, BLS OEWS, ABS), or anything else keyed on country code or task identifier.
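The wrapper behaviour described above can be sketched in a few lines of base R. link_sketch() and the toy share values are hypothetical. Mapping the join type onto merge()'s all.x/all.y flags is one plausible design, not necessarily how aei_link() does it.

```r
# Hypothetical sketch of a class-preserving merge (not the package's aei_link()).
link_sketch <- function(x, y, by, type = c("left", "inner", "full")) {
  type <- match.arg(type)
  out <- merge(x, y, by = by,
               all.x = type %in% c("left", "full"),
               all.y = type == "full")
  if (nrow(out) == 0L) {
    warning("Join produced zero rows; check the `by` key.", call. = FALSE)
  }
  # merge() returns a plain data.frame, so restore the subclass and metadata
  class(out) <- class(x)
  attr(out, "aei_query") <- attr(x, "aei_query")
  out
}

# Toy stand-in for an aei_tbl (share values are made up for illustration)
shares <- data.frame(geo_id = c("GBR", "AUS", "USA"), share = c(0.05, 0.02, 0.40))
class(shares) <- c("aei_tbl", "data.frame")
attr(shares, "aei_query") <- list(endpoint = "geography")
overlay <- data.frame(geo_id = c("GBR", "AUS"), gdp_pc = c(48000, 65000))
joined <- link_sketch(shares, overlay, by = "geo_id")  # left join keeps USA with NA gdp_pc
```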
| Function | Returns |
|---|---|
| aei_cite(release, format, method = TRUE) | Citation in plain text, BibTeX, or bibentry form |
By default aei_cite() returns both the dataset citation
and Handa et al. (2025).
Set method = FALSE for the dataset only.
aei_cite() # text, project-wide, with paper
aei_cite("2025-09-15", format = "bibtex") # BibTeX, both refs
aei_cite("2026-03-24", format = "bibentry") # bibentry object (multi-entry)
aei_cite(format = "text", method = FALSE) # dataset only
| Function | Returns |
|---|---|
| aei_cache_dir() | Path of the cache directory, honouring the aieconindex.cache_dir option |
| aei_cache_info() | List with dir, n_files, size_bytes, size_human, files |
| aei_cache_clear() | Deletes all cached files; returns NULL invisibly |
Every tabular result is returned as an aei_tbl: a data.frame subclass with provenance metadata stored in the aei_query attribute. The metadata records the endpoint, the resolved release identifier, the facet, the source URL, and the fetch timestamp, and it is preserved across row and column subsetting.
df <- aei_index("2025-09-15")
attr(df, "aei_query")
#> $endpoint "index"
#> $release "release_2025_09_15"
#> $facet "raw/claude_ai"
#> $source_url "https://huggingface.co/datasets/Anthropic/EconomicIndex/.../aei_raw_claude_ai_*.csv"
#> $fetched_at "2026-04-28 18:34:00 BST"
# Custom print method shows the provenance header
print(df)
#> # AEI: index · release=release_2025_09_15 · facet=raw/claude_ai · 12345 rows
#> ...
# Subsetting preserves the class and attribute
sub <- df[df$value > 1, ]
class(sub)
#> [1] "aei_tbl" "data.frame"
The class inherits from data.frame, so any function that takes a data frame works without conversion. Drop the class with as.data.frame() if you need a plain data frame.
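To keep a class and an extra attribute through subsetting, one common approach is a small S3 method on `[`. Base `[.data.frame` keeps the class, but it may not carry custom attributes through every form of subsetting. The sketch below uses a hypothetical demo_tbl class so that it runs without the package installed. The real aei_tbl method may be written differently.

```r
# Hypothetical constructor for a data.frame subclass carrying provenance
new_demo_tbl <- function(df, query) {
  structure(df, class = c("demo_tbl", "data.frame"), aei_query = query)
}

# Subsetting method: let [.data.frame do the work, then re-attach metadata
`[.demo_tbl` <- function(x, ...) {
  out <- NextMethod()
  # Single-column extraction can return a bare vector; only re-wrap data frames
  if (is.data.frame(out)) {
    class(out) <- class(x)
    attr(out, "aei_query") <- attr(x, "aei_query")
  }
  out
}

tbl <- new_demo_tbl(
  data.frame(task = c("a", "b", "c"), value = c(0.5, 2, 3)),
  query = list(release = "release_2025_09_15")
)
big <- tbl[tbl$value > 1, ]         # row subset keeps class and attribute
cols <- tbl[, "task", drop = FALSE] # column subset keeps them too
plain <- as.data.frame(tbl)         # drops the subclass, leaving a plain data frame
```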
Pin a release for production. The default
release = "latest" resolves to the most recent release at
call time, which is fine for exploration but unsuitable for reproducible
pipelines. Pin a release identifier explicitly:
RELEASE <- "2025-09-15" # or "release_2025_09_15"
df <- aei_index(RELEASE, source = "claude_ai", variant = "enriched")
Replicate an Anthropic figure. Anthropic ships Python replication notebooks (v2_report_replication.ipynb) inside several releases. The first step toward the automation-vs-augmentation headline figure in R is to collapse the interaction types into the two families:
df <- aei_download("2025-03-27", "automation_vs_augmentation_v2.csv")
df$family <- ifelse(df$interaction_type %in% c("directive", "feedback loop"),
                    "Automation", "Augmentation")
Country exposure ranking. Top O*NET tasks for the UK by share of Claude.ai usage:
uk <- aei_geography("2025-09-15", country = "GBR")
top <- subset(uk, facet == "onet_task" & variable == "onet_task_pct")
top <- head(top[order(-top$value), c("cluster_name", "value")], 15)
Cross-country comparison. Per-capita usage index across countries, ranked highest first:
df <- aei_index("2025-09-15", source = "claude_ai", variant = "enriched")
country_overall <- subset(df,
geography == "country" &
variable == "usage_per_capita_index" &
cluster_name == "not_classified" &
level == 0
)
country_overall <- country_overall[order(-country_overall$value), ]
Cite in a paper. Append the BibTeX entries straight to your .bib file:
cat(aei_cite("2025-09-15", format = "bibtex"), file = "refs.bib", sep = "\n", append = TRUE)
The package recognises every release published to Hugging Face up to 2026-03-24 and discovers newer releases automatically via the Hugging Face tree API.
| Release | Headline model | Notes |
|---|---|---|
| release_2025_02_10 | Claude 3.5 Sonnet | Initial release; O*NET task mappings; automation vs augmentation |
| release_2025_03_27 | Claude 3.7 Sonnet | Cluster-level insights; v2 report replication notebook |
| release_2025_09_15 | Claude Sonnet 4 | Geographic and first-party API data added; long-format schema |
| release_2026_01_15 | Claude Sonnet 4.5 | Economic primitives added |
| release_2026_03_24 | Claude Opus 4.5/4.6 | Learning curves added |
Each release ships its own data_documentation.md on
Hugging Face. The package’s aei_releases() blends bundled
metadata (model, report URL) with a live Hugging Face listing.
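Resolving release = "latest" amounts to picking the newest date among the available release identifiers. The minimal base R sketch below assumes identifiers follow the release_YYYY_MM_DD pattern in the table above. resolve_release() is hypothetical, and it takes the list of releases as an argument rather than performing the live Hugging Face lookup the package does.

```r
# Hypothetical sketch: map "latest", a date, or an identifier to a release ID.
resolve_release <- function(release, available) {
  dates <- as.Date(sub("^release_", "", available), format = "%Y_%m_%d")
  if (identical(release, "latest")) {
    return(available[which.max(dates)])  # newest release wins
  }
  # Accept either "2025-09-15" or "release_2025_09_15"
  id <- if (grepl("^release_", release)) release else
    paste0("release_", gsub("-", "_", release, fixed = TRUE))
  if (!id %in% available) stop("Unknown release: ", release, call. = FALSE)
  id
}

available <- c("release_2025_02_10", "release_2025_03_27", "release_2025_09_15",
               "release_2026_01_15", "release_2026_03_24")
resolve_release("latest", available)
#> [1] "release_2026_03_24"
```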
Downloaded files are cached under the path returned by
aei_cache_dir(), which defaults to
tools::R_user_dir("aieconindex", "cache"). Override before
the first call:
options(aieconindex.cache_dir = "/your/preferred/path")
The cache is keyed by release identifier and relative path, so files from different releases never overwrite one another, and a repeat request returns exactly the bytes fetched the first time.
aei_cache_info()
#> $dir "/Users/.../aieconindex/cache"
#> $n_files 3
#> $size_bytes 126839425
#> $size_human "121.0 MB"
#> $files <data.frame: 3 rows>
aei_cache_clear() # removes all cached files
The usage CSVs in the latest releases are around 100 MB each, so the first call against a fresh release is bandwidth-heavy. Subsequent calls are served from disk.
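A cache keyed by release and relative path can be sketched as a lookup that calls a download function only on a miss. This is an illustration, not the package's implementation. The <cache_dir>/<release>/<path> layout and the fetch callback are assumptions. Only the aieconindex.cache_dir option and the tools::R_user_dir() default come from the description above.

```r
# Hypothetical sketch of a release/path-keyed download cache.
cache_path <- function(release, rel_path) {
  root <- getOption("aieconindex.cache_dir",
                    tools::R_user_dir("aieconindex", "cache"))
  file.path(root, release, rel_path)
}

cached_fetch <- function(release, rel_path, fetch) {
  dest <- cache_path(release, rel_path)
  if (!file.exists(dest)) {
    dir.create(dirname(dest), recursive = TRUE, showWarnings = FALSE)
    fetch(dest)  # caller-supplied function that writes the file to `dest`
  }
  dest  # on a hit or a miss, callers read the same local copy
}

# Demo: a throwaway cache under tempdir() and a fake downloader that counts calls
demo_dir <- file.path(tempdir(), "aei-cache-demo")
unlink(demo_dir, recursive = TRUE)
old <- options(aieconindex.cache_dir = demo_dir)
n_downloads <- 0
fake_fetch <- function(dest) {
  n_downloads <<- n_downloads + 1
  writeLines("id,value\n1,0.5", dest)
}
p1 <- cached_fetch("release_2025_09_15", "data/example.csv", fake_fetch)
p2 <- cached_fetch("release_2025_09_15", "data/example.csv", fake_fetch)
options(old)  # restore the previous cache setting
```

The second call finds the file on disk, so fake_fetch() runs only once.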
Anthropic ships its own replication code as Jupyter notebooks inside
several releases
(e.g. release_2025_03_27/v2_report_replication.ipynb). For
exact figure replication, use those. aieconindex is the
R-side equivalent of Hugging Face’s Python datasets loader:
typed, cached access to the same source files, with downstream analysis
left to you.
| Package | Description |
|---|---|
| inequality | Inequality and poverty measurement (labour-market distributional context) |
| ons | UK labour market data (employment, wages by occupation) |
| fred | US labour market data (employment, productivity, occupational wages) |
| readoecd | OECD international labour and skills data |
Cite both the package and the underlying dataset:
citation("aieconindex")
aei_cite("2025-09-15", format = "bibtex")
aei_cite() returns the dataset citation alongside Handa et al. (2025), the methodological source paper.
Issues and pull requests welcome at https://github.com/charlescoverdale/aieconindex/issues. Useful contributions for v0.2 include:
For Anthropic-introduced schema changes that break
aei_index() or aei_geography(), please open an
issue with a sample of the new file structure (output of
aei_files(<new_release>)).
This package is released under the MIT License.
The underlying Anthropic Economic Index dataset is released by
Anthropic under Creative Commons
Attribution 4.0 International (CC-BY-4.0). When using this package
to retrieve or redistribute that data, attribution to Anthropic and to
Handa et al. (2025) is
required. Use aei_cite() for ready-made citation
strings.
The bundled O*NET and SOC reference data (when accessed through the AEI) inherit their respective licences. See the O*NET licensing page and the BLS Standard Occupational Classification documentation.
This product uses the Anthropic Economic Index data but is not endorsed or certified by Anthropic.