Help for package clinTrialData

Title:

Clinical Trial Example Datasets

Version:

0.1.3

Description:

A collection of clinical trial example datasets from multiple sources including the CDISC Pilot 01 study (CDISC https://www.cdisc.org/). All datasets are provided in Parquet format for efficient storage and can be accessed using the 'connector' package. Designed for training, testing, prototyping, and demonstrating clinical data analysis workflows.

Depends:

R (≥ 4.1.0)

License:

Apache License (≥ 2)

URL:

https://lovemore-gakava.github.io/clinTrialData/, https://github.com/Lovemore-Gakava/clinTrialData

BugReports:

https://github.com/Lovemore-Gakava/clinTrialData/issues

Encoding:

UTF-8

Language:

en-US

RoxygenNote:

7.3.3

Imports:

connector, httr, jsonlite, piggyback, tools

Suggests:

arrow, dplyr, ggplot2, testthat (≥ 3.0.0), knitr, rmarkdown, tidyr

Config/testthat/edition:

VignetteBuilder:

knitr

NeedsCompilation:

Packaged:

2026-03-16 21:45:35 UTC; lgaka

Author:

Lovemore Gakava [aut, cre, cph]

Maintainer:

Lovemore Gakava <Lovemore.Gakava@gmail.com>

Repository:

CRAN

Date/Publication:

2026-03-17 08:40:08 UTC

Load a stale study listing and refresh the `cached` column

Description

Loads the last successfully saved study listing when a fresh listing cannot be fetched, and recomputes the cached column from the local filesystem.

Usage

.load_stale_studies(reason)

Arguments

reason

Character string describing why the fallback is needed.

Value

A data frame, or NULL if no cache exists.

Package onLoad hook

Description

Called when the package is loaded. Registers bundled and cached study folders as locked (in memory only) to prevent accidental data modification. No files are written to disk and no file-system permissions are changed.

Usage

.onLoad(libname, pkgname)

Arguments

libname

Library name

pkgname

Package name

Path to the cached study-listing file

Description

Returns the path where list_available_studies() stores its last successful result for offline fallback.

Usage

.studies_cache_path()

Get the Local Cache Directory

Description

Returns the path to the local cache directory where downloaded clinical trial datasets are stored. The location follows the platform-specific user data directory convention via tools::R_user_dir().

You can delete any subdirectory here to remove a cached dataset, or clear the entire directory to free disk space.

Usage

cache_dir()

Value

A character string with the path to the cache directory.

Examples

cache_dir()
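
Because each cached dataset lives in its own subdirectory, the cache can be inspected with base R alone. The following is a minimal sketch, not part of the package: cached_study_sizes() is a hypothetical helper, and the arguments passed to tools::R_user_dir() are an assumption about how the package resolves its directory. In practice, pass the value returned by cache_dir() as root.

```r
# Hypothetical helper (not exported by clinTrialData): report the size of
# each study folder found directly under `root`.
cached_study_sizes <- function(root) {
  if (!dir.exists(root)) {
    return(data.frame(study = character(), size_mb = numeric()))
  }
  studies <- list.dirs(root, full.names = TRUE, recursive = FALSE)
  size_mb <- vapply(studies, function(d) {
    files <- list.files(d, recursive = TRUE, full.names = TRUE)
    sum(file.size(files)) / 1024^2
  }, numeric(1))
  data.frame(study = basename(studies), size_mb = unname(size_mb))
}

# Assumption: the package name and `which = "cache"` mirror what cache_dir()
# does internally; prefer cache_dir() itself when the package is loaded.
root <- tools::R_user_dir("clinTrialData", which = "cache")
cached_study_sizes(root)
```

A study can then be removed with unlink(file.path(root, "<study>"), recursive = TRUE), which matches the manual clean-up described above.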

Check if a study folder can be written to

Description

Returns TRUE if the folder is not locked; otherwise issues a warning and returns FALSE.

Usage

can_write_study(study_path, operation = "write to study folder")

Arguments

study_path

Path to the study folder

operation

Description of the operation being attempted

Value

Logical indicating if the operation can proceed

Clinical Trial Datasets

Description

The clinTrialData package contains clinical trial datasets from multiple sources, stored in Parquet format. Data is accessed using connector functions.

Available Data Sources

CDISC Pilot 01 Study

The CDISC Pilot 01 study data includes both ADaM and SDTM domains.

ADaM datasets include:

ADSL: Subject-Level Analysis Dataset
ADAE: Adverse Events Analysis Dataset
ADLBC: Laboratory Analysis Dataset (Chemistry)
ADLBH: Laboratory Analysis Dataset (Hematology)
ADLBHY: Laboratory Analysis Dataset (Hy's Law)
ADQSADAS: ADAS-Cog Questionnaire Analysis Dataset
ADQSCIBC: CIBC Questionnaire Analysis Dataset
ADQSNPIX: NPI-X Questionnaire Analysis Dataset
ADTTE: Time-to-Event Analysis Dataset
ADVS: Vital Signs Analysis Dataset

SDTM datasets include:

DM: Demographics
AE: Adverse Events
VS: Vital Signs
LB: Laboratory Test Results
And 18 additional domains (see list_data_sources() for details)

Usage

Data sources are discovered by scanning the package directory structure. List available datasets with list_data_sources().

Access data using the connection function:

# Connect to any data source (e.g., CDISC Pilot data)
db <- connect_clinical_data("cdisc_pilot")

# List available datasets
db$adam$list_content_cnt()

# Read a dataset
adsl <- db$adam$read_cnt("adsl")

# See all available data sources
list_data_sources()

Data Format

Datasets are stored in Parquet format:

Columnar storage
Fast reads
Compression
Cross-platform compatibility

Source

CDISC Pilot 01 Study Data
Various clinical trial data sources

References

CDISC. Clinical Data Interchange Standards Consortium. https://www.cdisc.org/

Connect to Clinical Data by Source

Description

Generic connection function that allows access to any data source in the package. Data sources are automatically discovered by scanning the package's example data directory structure.

Usage

connect_clinical_data(source = "cdisc_pilot")

Arguments

source

Character string specifying the data source. Use list_data_sources() to see all available options.

Value

A connectors object

Examples


if (interactive()) {
  # Connect to CDISC Pilot data
  db <- connect_clinical_data("cdisc_pilot")

  # List available datasets
  db$adam$list_content_cnt()

  # Read a dataset (requires the arrow package)
  if (requireNamespace("arrow", quietly = TRUE)) {
    adsl <- db$adam$read_cnt("adsl")
  }

  # List available sources
  list_data_sources()
}

Connect to Data Source

Description

Generic function to connect to any data source by scanning its directory structure and generating the connector configuration dynamically. Wraps all filesystem connectors with lock protection.

Resolution order:

User cache (downloaded via download_study())
Package-bundled data (⁠inst/exampledata/⁠)

Usage

connect_to_source(source_name)

Arguments

source_name

Name of the data source (e.g., "cdisc_pilot")

Value

A connectors object
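
The resolution order described above (user cache first, then package-bundled data) can be sketched in a few lines of base R. This is an illustration under stated assumptions, not the package's implementation: resolve_source_path() and its arguments are hypothetical names, and the two root directories are supplied by the caller.

```r
# Hypothetical resolver: prefer the user cache, fall back to bundled data.
resolve_source_path <- function(source_name, cache_root, bundled_root) {
  candidates <- c(
    cached  = file.path(cache_root, source_name),
    bundled = file.path(bundled_root, source_name)
  )
  # Keep only the candidates that exist, preserving the priority order.
  hit <- candidates[dir.exists(candidates)]
  if (length(hit) == 0) {
    stop("Unknown data source: ", source_name, call. = FALSE)
  }
  hit[1]  # the element name records where the data was found
}
```

Returning a named element keeps the path and its origin ("cached" or "bundled") together, which is the same distinction list_data_sources() reports in its location column.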

Inspect a Clinical Trial Dataset Without Downloading

Description

Fetches and displays metadata for any study available in the clinTrialData library – without downloading the full dataset. Metadata includes the study description, available domains and datasets, subject count, version, and data source attribution.

For studies already downloaded via download_study(), the metadata is read from the local cache and works offline. For studies not yet downloaded, a small JSON file (~2 KB) is fetched from the GitHub Release.

Usage

dataset_info(source, repo = "Lovemore-Gakava/clinTrialData")

Arguments

source

Character string. Name of the study (e.g. "cdisc_pilot_extended"). Use list_available_studies() to see all options.

repo

GitHub repository in the form "owner/repo". Defaults to the official clinTrialData release repository.

Value

Invisibly returns the metadata as a named list.

Examples


dataset_info("cdisc_pilot")

Download a Clinical Trial Study Dataset

Description

Downloads a study dataset from a GitHub Release and stores it in the local cache (see cache_dir()). Once downloaded, the study is available to connect_clinical_data() without an internet connection.

Requires the piggyback package.

Usage

download_study(
  source,
  version = "latest",
  force = FALSE,
  repo = "Lovemore-Gakava/clinTrialData"
)

Arguments

source

Character string. The name of the study to download (e.g. "cdisc_pilot"). Use list_available_studies() to see all options.

version

Character string. The release tag to download from. Defaults to "latest", which resolves to the most recent release.

force

Logical. If TRUE, re-download even if the study is already cached. Defaults to FALSE.

repo

GitHub repository in the form "owner/repo". Defaults to the official clinTrialData release repository.

Value

Invisibly returns the path to the cached study directory.

Examples


if (interactive()) {
  # Download a study not bundled with the package
  download_study("cdisc_pilot_extended")

  # Then connect as usual
  db <- connect_clinical_data("cdisc_pilot_extended")
}

Generate Connector Configuration from Directory Structure

Description

Scans a data source directory and generates a connector configuration list dynamically based on the available parquet files.

Usage

generate_connector_config(source_path)

Arguments

source_path

Path to the data source directory

Value

A list suitable for passing to connector::connect()
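
The directory scan behind this function can be approximated as follows. This sketch assumes a layout of one subdirectory per domain (for example adam and sdtm) containing .parquet files; scan_source_dir() is a hypothetical name, and the list it returns is a simplified inventory rather than the exact structure that connector::connect() expects.

```r
# Hypothetical scanner: one entry per domain folder, listing its datasets.
# Assumed layout: <source_path>/<domain>/<dataset>.parquet
scan_source_dir <- function(source_path) {
  domains <- list.dirs(source_path, full.names = TRUE, recursive = FALSE)
  names(domains) <- basename(domains)
  lapply(domains, function(d) {
    # Only Parquet files count as datasets; other files are ignored.
    files <- list.files(d, pattern = "\\.parquet$", ignore.case = TRUE)
    list(path = d, datasets = tools::file_path_sans_ext(files))
  })
}
```

Each element of the result carries the information a filesystem connector would need for that domain: a directory path and the dataset names available in it.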

Get lock status for a study folder

Description

Returns information about the lock status of a study folder.

Usage

get_lock_status(study_path)

Arguments

study_path

Path to the study folder

Value

A list with components locked (logical) and path (character).

Check if a package is available

Description

Thin wrapper around requireNamespace() to allow mocking in tests.

Usage

has_package(pkg)

Arguments

pkg

Package name.

Value

Logical.

Check if a study folder is locked

Description

Checks whether a study path is locked in the current session, indicating that the data should not be overwritten.

Usage

is_study_locked(study_path)

Arguments

study_path

Path to the study folder

Value

Logical indicating if the folder is locked

List Studies Available for Download

Description

Returns a data frame of all clinical trial studies available as GitHub Release assets, along with their local cache status. Studies marked as cached = TRUE are already downloaded and available for use with connect_clinical_data() without an internet connection.

When GitHub is unreachable, the function falls back to the last successfully fetched listing (if available) and issues a warning. The cached column is always recomputed from the local filesystem.

Requires the piggyback package.

Usage

list_available_studies(repo = "Lovemore-Gakava/clinTrialData")

Arguments

repo

GitHub repository in the form "owner/repo". Defaults to the official clinTrialData release repository.

Value

A data frame with columns:

source: Study name (pass this to download_study() or connect_clinical_data())
version: Release tag the asset belongs to
size_mb: Asset size in megabytes
cached: TRUE if the study is already in the local cache

Examples


if (interactive()) {
  list_available_studies()
}
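
The offline fallback described above follows a common fetch-then-fall-back pattern. The sketch below is a hedged illustration only: list_with_fallback(), fetch_fun, listing_file, and cache_root are hypothetical names, and storing the listing as an RDS file is an assumption about the on-disk format.

```r
# Hypothetical fetch-with-fallback; `fetch_fun` stands in for the GitHub call.
list_with_fallback <- function(fetch_fun, listing_file, cache_root) {
  listing <- tryCatch({
    fresh <- fetch_fun()
    saveRDS(fresh, listing_file)  # remember the last successful result
    fresh
  }, error = function(e) {
    # No earlier listing on disk: nothing to fall back to.
    if (!file.exists(listing_file)) return(NULL)
    warning("Using stale study listing: ", conditionMessage(e), call. = FALSE)
    readRDS(listing_file)
  })
  if (is.null(listing)) return(NULL)
  # Always recompute `cached` from the filesystem; the stored value may be stale.
  listing$cached <- dir.exists(file.path(cache_root, listing$source))
  listing
}
```

Recomputing cached after either branch means a study downloaded or deleted since the listing was saved is still reported correctly.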

List Available Clinical Data Sources

Description

Returns information about all clinical datasets available locally – both datasets bundled with the package and any datasets previously downloaded via download_study(). The location column indicates whether a dataset is "bundled" (shipped with the package) or "cached" (downloaded to the user cache directory).

To see datasets available for download from GitHub, use list_available_studies().

Usage

list_data_sources()

Value

A data frame with columns:

source: Dataset name (pass to connect_clinical_data())
description: Human-readable study description
domains: Comma-separated list of available data domains (e.g. "adam, sdtm")
format: Storage format ("parquet")
location: Either "bundled" or "cached"

Examples

list_data_sources()

Lock all study folders

Description

Locks all study folders under a base path (in-memory).

Usage

lock_all_studies(base_path = "inst/exampledata", reason = "Package installed")

Arguments

base_path

Base path to the exampledata directory

reason

Optional reason for the lock

Value

Invisible character vector of locked folder paths

Lock a study folder

Description

Marks a study path as locked for the duration of the current R session. The lock is in-memory only: no file-system permissions are modified.

Usage

lock_study(study_path, reason = "Package installed")

Arguments

study_path

Path to the study folder

reason

Optional reason for the lock (included in messages only)

Value

Logical indicating success, invisibly
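
An in-memory lock of this kind can be modelled with an environment that lives for the duration of the R session. The following is a minimal sketch with hypothetical names throughout (.lock_registry, sketch_lock(), sketch_is_locked(), sketch_unlock()); it shows the general idea and is not the package's code.

```r
# Session-scoped registry: keys are normalized paths, values are reasons.
.lock_registry <- new.env(parent = emptyenv())

# Normalize so that equivalent spellings of a path share one key.
lock_key <- function(path) {
  normalizePath(path, winslash = "/", mustWork = FALSE)
}

sketch_lock <- function(path, reason = "Package installed") {
  assign(lock_key(path), reason, envir = .lock_registry)
  invisible(TRUE)
}

sketch_is_locked <- function(path) {
  exists(lock_key(path), envir = .lock_registry, inherits = FALSE)
}

sketch_unlock <- function(path) {
  key <- lock_key(path)
  if (exists(key, envir = .lock_registry, inherits = FALSE)) {
    rm(list = key, envir = .lock_registry)
  }
  invisible(TRUE)
}
```

Nothing here touches the disk: the registry disappears when the session ends, which is why such a lock guards against accidental writes through the package but not against changes made by other tools.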

Remove Content with Lock Check

Description

S3 method for remove_cnt that checks if the study folder is locked before allowing remove operations.

Usage

## S3 method for class 'ConnectorLockedFS'
remove_cnt(connector_object, name, ...)

Arguments

connector_object

The ConnectorLockedFS object

name

The file name to remove

...

Additional arguments passed to the underlying connector

Value

Invisible connector_object

Unlock a study folder

Description

Removes the in-memory lock on a study path, allowing write operations for the remainder of the current R session.

Usage

unlock_study(study_path)

Arguments

study_path

Path to the study folder

Value

Logical indicating success, invisibly

Wrap Connectors with Lock Protection

Description

Recursively wraps all ConnectorFS objects with lock protection.

Usage

wrap_connectors_with_locks(obj, study_path)

Arguments

obj

A connectors object or connector object

study_path

Path to the study folder

Value

The wrapped object

Write Content with Lock Check

Description

S3 method for write_cnt that checks if the study folder is locked before allowing write operations.

Usage

## S3 method for class 'ConnectorLockedFS'
write_cnt(connector_object, x, name, overwrite = FALSE, ...)

Arguments

connector_object

The ConnectorLockedFS object

x

The data to write

name

The file name

overwrite

Whether to overwrite existing files

...

Additional arguments passed to the underlying connector

Value

Invisible connector_object
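
The lock check relies on ordinary S3 dispatch: a wrapper class is placed in front of the filesystem connector's class, its method checks the lock, and NextMethod() hands over to the original method. The sketch below demonstrates the pattern with stand-in names (write_item(), SketchFS, SketchLockedFS, wrap_with_lock()), all of which are hypothetical. Warning and skipping the write while locked is also an assumption, chosen to mirror the documented behaviour of can_write_study().

```r
# Hypothetical generic standing in for connector's write_cnt().
write_item <- function(con, x, name, ...) UseMethod("write_item")

# Plain filesystem "connector": a list holding a directory path.
write_item.SketchFS <- function(con, x, name, ...) {
  saveRDS(x, file.path(con$path, name))
  invisible(con)
}

# Locked variant: skip the write while locked, otherwise defer to the parent.
write_item.SketchLockedFS <- function(con, x, name, ...) {
  if (isTRUE(con$is_locked())) {
    warning("Study folder is locked: ", con$path, call. = FALSE)
    return(invisible(con))
  }
  NextMethod()
}

# Prepend the locked class so its method is dispatched first.
wrap_with_lock <- function(con, is_locked) {
  con$is_locked <- is_locked
  class(con) <- c("SketchLockedFS", class(con))
  con
}
```

Because only the class vector changes, read operations that have no SketchLockedFS method fall through to the underlying class untouched.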
