
clinTrialData is a community-grown
library of clinical trial example datasets for R. The package
ships with a core set of datasets and is designed to expand over time —
anyone can contribute a new data source, and users can download any
available study on demand without waiting for a new package release.
Data is stored in Parquet format and accessed through the connector
package, giving a consistent API regardless of which study you are
working with.
The core idea is simple: datasets live as assets on GitHub Releases, not inside the package itself. This means:
# What's available to download from GitHub Releases?
list_available_studies()
#> source version size_mb cached
#> 1 cdisc_pilot v0.1.0 3.7 TRUE
#> 2 cdisc_pilot_extended v0.1.0 4.3 FALSE
# Inspect any study before downloading — fetches a tiny metadata file
dataset_info("cdisc_pilot_extended")
#> ──────────────────────────────────────────────────────────────────────────
#> cdisc_pilot_extended (v0.1.0)
#> ──────────────────────────────────────────────────────────────────────────
#> Enhanced CDISC Pilot 01 study with urinalysis data
#>
#> Domains & datasets:
#> adam (12): adsl, adae, adlb, adlbc, adlbh, adlbhy, adlburi, ...
#> sdtm (22): ae, cm, dm, ds, ex, lb, mh, qs, relrec, sc, ...
#>
#> Subjects: 254
#> Version: v0.1.0
#> License: CDISC Pilot — educational use
#> Source: https://github.com/cdisc-org/sdtm-adam-pilot-project
#> ──────────────────────────────────────────────────────────────────────────
# Download once; cached locally from then on
download_study("cdisc_pilot_extended")
# Connect and analyse — same API for every study
db <- connect_clinical_data("cdisc_pilot_extended")
adsl <- db$adam$read_cnt("adsl")

# Install from CRAN
install.packages("clinTrialData")
# Or the development version from GitHub:
# install.packages("remotes")
remotes::install_github("Lovemore-Gakava/clinTrialData")

library(clinTrialData)
# What's already on your machine?
list_data_sources()
# What's available to download?
list_available_studies()
# Download a study (only needed once — cached locally after that)
download_study("cdisc_pilot")
# Connect and explore
db <- connect_clinical_data("cdisc_pilot")
db$adam$list_content_cnt() # list ADaM datasets
db$sdtm$list_content_cnt() # list SDTM datasets
adsl <- db$adam$read_cnt("adsl")
dm <- db$sdtm$read_cnt("dm")

cdisc_pilot: Standard CDISC Pilot 01 study (11 ADaM, 22 SDTM datasets). Available immediately after installation, no download needed.
cdisc_pilot_extended: Enhanced CDISC Pilot 01 study (11 ADaM, 24 SDTM datasets) with additional features:
download_study("cdisc_pilot_extended")
connect_clinical_data("cdisc_pilot_extended")

Use list_data_sources() to see all locally available
studies and list_available_studies() to see everything on
GitHub Releases.
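The "download once, cached from then on" behaviour can be checked from the listing itself. The sketch below assumes list_available_studies() returns a data frame with the columns shown in its printed output (source, version, size_mb, cached). The helper needs_download() is hypothetical and not part of clinTrialData, and the listing here is a hand-built mock so the example runs without network access.

```r
# Hypothetical helper (not part of clinTrialData): decide whether a study
# still has to be downloaded, based on the `cached` column of the listing.
needs_download <- function(study, available) {
  row <- available[available$source == study, , drop = FALSE]
  if (nrow(row) == 0) {
    stop("Study not found in the listing: ", study)
  }
  !isTRUE(row$cached[1])
}

# Mock listing mirroring the columns printed by list_available_studies().
# Assumption: the real function returns a data frame shaped like this.
available <- data.frame(
  source  = c("cdisc_pilot", "cdisc_pilot_extended"),
  version = c("v0.1.0", "v0.1.0"),
  size_mb = c(3.7, 4.3),
  cached  = c(TRUE, FALSE)
)

needs_download("cdisc_pilot", available)           # FALSE: already cached
needs_download("cdisc_pilot_extended", available)  # TRUE: not cached yet
```

With the package loaded, the same check could guard a call such as download_study("cdisc_pilot_extended"), passing list_available_studies() as the listing, provided the assumption about its return value holds.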
Adding a new study to the library does not require a pull request or a CRAN submission. The data lives on GitHub Releases, not inside the package.
Organise your Parquet files by domain (adam/, sdtm/):

your_study/
├── adam/
│   ├── adsl.parquet
│   └── adae.parquet
└── sdtm/
    ├── dm.parquet
    └── ae.parquet
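One way to produce this layout from data frames already in your R session is sketched below. It assumes the arrow package is used to write Parquet (any Parquet writer would do). The toy data frames and the temporary output folder are illustrative only.

```r
# Build the your_study/ folder skeleton in a temporary location.
study_dir <- file.path(tempdir(), "your_study")
for (domain in c("adam", "sdtm")) {
  dir.create(file.path(study_dir, domain), recursive = TRUE, showWarnings = FALSE)
}

# Toy datasets standing in for real ADaM/SDTM data (illustrative values only).
adsl <- data.frame(USUBJID = c("01-001", "01-002"), AGE = c(63L, 71L))
dm   <- data.frame(USUBJID = c("01-001", "01-002"), SEX = c("F", "M"))

# Write one Parquet file per dataset; skipped if arrow is not installed.
if (requireNamespace("arrow", quietly = TRUE)) {
  arrow::write_parquet(adsl, file.path(study_dir, "adam", "adsl.parquet"))
  arrow::write_parquet(dm, file.path(study_dir, "sdtm", "dm.parquet"))
}

# Inspect the resulting tree.
list.files(study_dir, recursive = TRUE)
```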
source("data-raw/upload_to_release.R")
# Upload the data zip
upload_study_to_release("your_study", tag = "v1.1.0")
# Generate and upload the metadata (enables dataset_info() for your study)
generate_and_upload_metadata(
  source = "your_study",
  description = "Brief description of your study",
  version = "v1.1.0",
  license = "Your license here",
  source_url = "https://link-to-original-data",
  tag = "v1.1.0"
)

dataset_info("your_study") # inspect before downloading
download_study("your_study") # download and cache
connect_clinical_data("your_study")

All datasets, whether bundled or downloaded, are automatically protected from accidental modification. Reading is always allowed; write and delete operations are blocked with a clear error message.
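The protection can be observed by wrapping a write attempt in tryCatch(). This is a sketch only: it assumes the connector objects expose a write_cnt() method alongside read_cnt(), and it does not reproduce the actual error text. The block is guarded so it does nothing harmful when clinTrialData is not installed.

```r
outcome <- "clinTrialData not installed; nothing attempted"

if (requireNamespace("clinTrialData", quietly = TRUE)) {
  db <- clinTrialData::connect_clinical_data("cdisc_pilot")

  # Reading is always allowed.
  adsl <- db$adam$read_cnt("adsl")

  # Assumption: write_cnt() is the connector's write method. According to the
  # text above, this call should fail with an error rather than write a file.
  outcome <- tryCatch(
    {
      db$adam$write_cnt(adsl, "adsl_copy")
      "write succeeded"
    },
    error = function(e) paste("write blocked:", conditionMessage(e))
  )
}

print(outcome)
```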
The extended datasets are derived from the CDISC Pilot Study data.
Original Source: CDISC SDTM/ADaM Pilot Project
Modifications: This extended version includes additional derived variables (TRTDURY) and a simulated urinalysis dataset (ADLBURI) created for educational and development purposes.
Acknowledgments: We acknowledge and thank CDISC for making the original pilot data available. The extended datasets maintain the structure and quality of the original data while adding features to support additional analysis scenarios.
# Browse all vignettes
vignette(package = "clinTrialData")
# Cache location
cache_dir()