Title: Clinical Trial Example Datasets
Version: 0.1.0
Description: A collection of clinical trial example datasets from multiple sources including the CDISC Pilot 01 study (CDISC https://www.cdisc.org/). All datasets are provided in Parquet format for efficient storage and can be accessed using the 'connector' package. Designed for training, testing, prototyping, and demonstrating clinical data analysis workflows.
Depends: R (≥ 4.1.0)
License: Apache License (≥ 2)
URL: https://lovemore-gakava.github.io/clinTrialData/, https://github.com/Lovemore-Gakava/clinTrialData
BugReports: https://github.com/Lovemore-Gakava/clinTrialData/issues
Encoding: UTF-8
RoxygenNote: 7.3.3
Imports: connector, httr, jsonlite, piggyback, tools
Suggests: arrow, dplyr, ggplot2, testthat (≥ 3.0.0), knitr, rmarkdown, tidyr
Config/testthat/edition: 3
VignetteBuilder: knitr
NeedsCompilation: no
Packaged: 2026-02-25 21:48:59 UTC; lovemore.gakavagmail.com
Author: Lovemore Gakava [aut, cre, cph]
Maintainer: Lovemore Gakava <Lovemore.Gakava@gmail.com>
Repository: CRAN
Date/Publication: 2026-03-03 11:20:08 UTC

Load a stale study listing and refresh the cached column

Description

Load a stale study listing and refresh the cached column

Usage

.load_stale_studies(reason)

Arguments

reason

Character string describing why the fallback is needed.

Value

A data frame, or NULL if no cache exists.


Package onLoad hook

Description

Called when the package is loaded. Registers bundled and cached study folders as locked (in memory) to prevent accidental data modification. No files are written to disk.

Usage

.onLoad(libname, pkgname)

Arguments

libname

Library name

pkgname

Package name


Set directory permissions (Unix only)

Description

On Unix-like systems, sets the directory and its files to read-only (mode 0555 for directories, 0444 for files) or read-write (mode 0755 for directories, 0644 for files). This is a no-op on Windows, where these permission bits are not meaningful. Permissions are changed only for paths under the user cache directory.

Usage

.set_permissions(path, read_only = TRUE)

Arguments

path

Directory path.

read_only

Logical; TRUE to make read-only, FALSE to restore.
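
The mode switching described above can be approximated with base R's Sys.chmod(). The following is a minimal sketch, not the package's implementation: the helper name set_permissions_sketch is hypothetical, and the check that the path lies under the user cache directory is left out.

```r
# Hypothetical helper for illustration only; not the package's internal code.
set_permissions_sketch <- function(path, read_only = TRUE) {
  # Permission bits are not meaningful on Windows, so do nothing there.
  if (.Platform$OS.type != "unix") {
    return(invisible(FALSE))
  }

  # list.dirs() includes `path` itself; list.files() returns files only.
  dirs  <- list.dirs(path, full.names = TRUE, recursive = TRUE)
  files <- list.files(path, full.names = TRUE, recursive = TRUE)

  dir_mode  <- if (read_only) "0555" else "0755"
  file_mode <- if (read_only) "0444" else "0644"

  # use_umask = FALSE applies the modes exactly as given.
  if (length(dirs) > 0) Sys.chmod(dirs, mode = dir_mode, use_umask = FALSE)
  if (length(files) > 0) Sys.chmod(files, mode = file_mode, use_umask = FALSE)
  invisible(TRUE)
}

# Usage on a throwaway directory
demo_dir <- tempfile("perm_demo_")
dir.create(demo_dir)
writeLines("x", file.path(demo_dir, "example.txt"))
set_permissions_sketch(demo_dir, read_only = TRUE)
set_permissions_sketch(demo_dir, read_only = FALSE)  # restore before cleanup
unlink(demo_dir, recursive = TRUE)
```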


Path to the cached study-listing file

Description

Returns the path where list_available_studies() stores its last successful result for offline fallback.

Usage

.studies_cache_path()

Get the Local Cache Directory

Description

Returns the path to the local cache directory where downloaded clinical trial datasets are stored. The location follows the platform-specific user data directory convention via tools::R_user_dir().

You can delete any subdirectory here to remove a cached dataset, or clear the entire directory to free disk space.

Usage

cache_dir()

Value

A character string with the path to the cache directory.

Examples

cache_dir()
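
To illustrate how a cache location of this kind can be inspected and cleaned, here is a minimal sketch built on tools::R_user_dir(). It assumes the "cache" flavor of the user directory; the package may use a different `which` value or an extra subdirectory, so prefer cache_dir() for the real path. The helper remove_cached_study is hypothetical.

```r
# Assumption: the cache lives under the "cache" flavor of R_user_dir().
cache_root <- tools::R_user_dir("clinTrialData", which = "cache")

# Names of cached studies (empty if nothing has been downloaded yet).
cached_studies <- list.dirs(cache_root, full.names = FALSE, recursive = FALSE)
print(cached_studies)

# Hypothetical cleanup helper: remove one cached study by name.
remove_cached_study <- function(name, root = cache_root) {
  target <- file.path(root, name)
  if (!dir.exists(target)) {
    return(invisible(FALSE))
  }
  # unlink() returns 0 on success.
  invisible(unlink(target, recursive = TRUE) == 0)
}
```

On Unix-like systems a cached study may have been made read-only by the locking functions, in which case deletion may fail until write permissions are restored.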

Check if a study folder can be written to

Description

Returns TRUE if the folder is not locked; otherwise issues a warning and returns FALSE.

Usage

can_write_study(study_path, operation = "write to study folder")

Arguments

study_path

Path to the study folder

operation

Description of the operation being attempted

Value

Logical indicating if the operation can proceed


Clinical Trial Datasets

Description

The clinTrialData package contains clinical trial datasets from multiple sources, stored in Parquet format. Data is accessed using connector functions.

Available Data Sources

CDISC Pilot 01 Study

The CDISC Pilot 01 study data includes both ADaM and SDTM domains.

Call list_content_cnt() on a domain connector to see the datasets in that domain (for example, db$adam$list_content_cnt() for ADaM).

Usage

Data sources are discovered by scanning the package directory structure. List available datasets with list_data_sources().

Access data using the connection function:

# Connect to any data source (e.g., CDISC Pilot data)
db <- connect_clinical_data("cdisc_pilot")

# List available datasets
db$adam$list_content_cnt()

# Read a dataset
adsl <- db$adam$read_cnt("adsl")

# See all available data sources
list_data_sources()

Data Format

Datasets are stored in Parquet format.

Source

CDISC Pilot 01 Study Data

Various clinical trial data sources

References

CDISC. Clinical Data Interchange Standards Consortium. https://www.cdisc.org/


Connect to Clinical Data by Source

Description

Generic connection function that allows access to any data source in the package. Data sources are automatically discovered by scanning the package's example data directory structure.

Usage

connect_clinical_data(source = "cdisc_pilot")

Arguments

source

Character string specifying the data source. Use list_data_sources() to see all available options.

Value

A connectors object

Examples


# Connect to CDISC Pilot data
db <- connect_clinical_data("cdisc_pilot")

# List available datasets
db$adam$list_content_cnt()

# Read a dataset
adsl <- db$adam$read_cnt("adsl")

# List available sources
list_data_sources()


Connect to Data Source

Description

Generic function to connect to any data source by scanning its directory structure and generating the connector configuration dynamically. Wraps all filesystem connectors with lock protection.

Resolution order:

  1. User cache (downloaded via download_study())

  2. Package-bundled data (inst/exampledata/)

Usage

connect_to_source(source_name)

Arguments

source_name

Name of the data source (e.g., "cdisc_pilot")

Value

A connectors object


Inspect a Clinical Trial Dataset Without Downloading

Description

Fetches and displays metadata for any study available in the clinTrialData library – without downloading the full dataset. Metadata includes the study description, available domains and datasets, subject count, version, and data source attribution.

For studies already downloaded via download_study(), the metadata is read from the local cache, so the call works offline. For studies not yet downloaded, a small JSON file (about 2 KB) is fetched from the GitHub Release.

Usage

dataset_info(source, repo = "Lovemore-Gakava/clinTrialData")

Arguments

source

Character string. Name of the study (e.g. "cdisc_pilot_extended"). Use list_available_studies() to see all options.

repo

GitHub repository in the form "owner/repo". Defaults to the official clinTrialData release repository.

Value

Invisibly returns the metadata as a named list.

Examples


dataset_info("cdisc_pilot")


Download a Clinical Trial Study Dataset

Description

Downloads a study dataset from a GitHub Release and stores it in the local cache (see cache_dir()). Once downloaded, the study is available to connect_clinical_data() without an internet connection.

Requires the piggyback package.

Usage

download_study(
  source,
  version = "latest",
  force = FALSE,
  repo = "Lovemore-Gakava/clinTrialData"
)

Arguments

source

Character string. The name of the study to download (e.g. "cdisc_pilot"). Use list_available_studies() to see all options.

version

Character string. The release tag to download from. Defaults to "latest", which resolves to the most recent release.

force

Logical. If TRUE, re-download even if the study is already cached. Defaults to FALSE.

repo

GitHub repository in the form "owner/repo". Defaults to the official clinTrialData release repository.

Value

Invisibly returns the path to the cached study directory.

Examples


# Download the CDISC Pilot study
download_study("cdisc_pilot")

# Then connect as usual
db <- connect_clinical_data("cdisc_pilot")


Generate Connector Configuration from Directory Structure

Description

Scans a data source directory and generates a connector configuration list dynamically based on the available parquet files.

Usage

generate_connector_config(source_path)

Arguments

source_path

Path to the data source directory

Value

A list suitable for passing to connector::connect()
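
The directory scan described above might look like the following minimal sketch. It treats each immediate subdirectory that contains at least one .parquet file as a domain. The function name config_sketch is hypothetical, and the shape of the returned list (datasources entries with name and backend fields, and the "connector::connector_fs" type string) is an assumption about what connector::connect() expects; check the connector documentation before relying on it.

```r
# Hypothetical sketch; the real generate_connector_config() may differ.
config_sketch <- function(source_path) {
  # Candidate domains are the immediate subdirectories (e.g. "adam", "sdtm").
  domains <- list.dirs(source_path, full.names = FALSE, recursive = FALSE)

  # Keep only subdirectories that hold at least one Parquet file.
  has_parquet <- vapply(domains, function(d) {
    length(list.files(file.path(source_path, d), pattern = "\\.parquet$")) > 0
  }, logical(1))
  domains <- domains[has_parquet]

  # Assumed config layout: one filesystem datasource per domain.
  list(
    datasources = lapply(unname(domains), function(d) {
      list(
        name = d,
        backend = list(
          type = "connector::connector_fs",
          path = file.path(source_path, d)
        )
      )
    })
  )
}
```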


Get lock status for a study folder

Description

Returns information about the lock status of a study folder.

Usage

get_lock_status(study_path)

Arguments

study_path

Path to the study folder

Value

A list with components locked (logical) and path (character).


Check if a package is available

Description

Thin wrapper around requireNamespace() to allow mocking in tests.

Usage

has_package(pkg)

Arguments

pkg

Package name.

Value

Logical.


Check if a study folder is locked

Description

Checks whether a study path is locked in the current session, indicating that the data should not be overwritten.

Usage

is_study_locked(study_path)

Arguments

study_path

Path to the study folder

Value

Logical indicating if the folder is locked


List Studies Available for Download

Description

Returns a data frame of all clinical trial studies available as GitHub Release assets, along with their local cache status. Studies marked as cached = TRUE are already downloaded and available for use with connect_clinical_data() without an internet connection.

When GitHub is unreachable, the function falls back to the last successfully fetched listing (if available) and issues a warning. The cached column is always recomputed from the local filesystem.

Requires the piggyback package.

Usage

list_available_studies(repo = "Lovemore-Gakava/clinTrialData")

Arguments

repo

GitHub repository in the form "owner/repo". Defaults to the official clinTrialData release repository.

Value

A data frame with columns:

source

Study name (pass this to download_study() or connect_clinical_data())

version

Release tag the asset belongs to

size_mb

Asset size in megabytes

cached

TRUE if the study is already in the local cache

Examples


list_available_studies()
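
The offline fallback described above can be sketched with tryCatch(). This is a minimal illustration under stated assumptions, not the package's code: list_studies_sketch, its fetch argument (any function that returns a data frame with a source column), and the use of an RDS file for the saved listing are all hypothetical.

```r
# Hypothetical sketch of "fetch, else fall back to the last saved listing".
list_studies_sketch <- function(fetch, listing_file, cache_root) {
  listing <- tryCatch(
    {
      fresh <- fetch()
      saveRDS(fresh, listing_file)  # remember the last successful result
      fresh
    },
    error = function(e) {
      warning("Remote listing unavailable: ", conditionMessage(e), call. = FALSE)
      # No saved listing means there is nothing to fall back to.
      if (!file.exists(listing_file)) {
        return(NULL)
      }
      readRDS(listing_file)
    }
  )
  if (is.null(listing)) {
    return(NULL)
  }
  # Always recompute `cached` from the local filesystem, never from the file.
  listing$cached <- dir.exists(file.path(cache_root, listing$source))
  listing
}
```

Recomputing the cached column after the fallback keeps it accurate even when the saved listing predates a download or a manual cache cleanup.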


List Available Clinical Data Sources

Description

Returns information about all clinical datasets available locally – both datasets bundled with the package and any datasets previously downloaded via download_study(). The location column indicates whether a dataset is "bundled" (shipped with the package) or "cached" (downloaded to the user cache directory).

To see datasets available for download from GitHub, use list_available_studies().

Usage

list_data_sources()

Value

A data frame with columns:

source

Dataset name (pass to connect_clinical_data())

description

Human-readable study description

domains

Comma-separated list of available data domains (e.g. "adam, sdtm")

format

Storage format ("parquet")

location

Either "bundled" or "cached"

Examples

list_data_sources()

Lock all study folders

Description

Places an in-memory lock on every study folder under a base path.

Usage

lock_all_studies(base_path = "inst/exampledata", reason = "Package installed")

Arguments

base_path

Base path to the exampledata directory

reason

Optional reason for the lock

Value

Invisible character vector of locked folder paths


Lock a study folder

Description

Marks a study path as locked for the duration of the current R session. On Unix-like systems, cached study directories are also made read-only at the file-system level via Sys.chmod().

Usage

lock_study(study_path, reason = "Package installed")

Arguments

study_path

Path to the study folder

reason

Optional reason for the lock (included in messages only)

Value

Logical indicating success, invisibly


Remove Content with Lock Check

Description

S3 method for remove_cnt that checks if the study folder is locked before allowing remove operations.

Usage

## S3 method for class 'ConnectorLockedFS'
remove_cnt(connector_object, name, ...)

Arguments

connector_object

The ConnectorLockedFS object

name

The file name to remove

...

Additional arguments passed to the underlying connector

Value

Invisible connector_object


Unlock a study folder

Description

Removes the in-memory lock on a study path, allowing write operations for the remainder of the current R session. On Unix-like systems, also restores write permissions on cached study directories.

Usage

unlock_study(study_path)

Arguments

study_path

Path to the study folder

Value

Logical indicating success, invisibly
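
A session-scoped, in-memory lock of the kind described above can be modelled with an environment used as a registry. The sketch below is illustrative only: every function name in it is hypothetical, the file-system permission changes are omitted, and the package's actual bookkeeping may differ.

```r
# Hypothetical in-memory lock registry; it lives only for the current session.
lock_registry <- new.env(parent = emptyenv())

# Normalize so that different spellings of the same path share one key.
lock_key <- function(study_path) normalizePath(study_path, mustWork = FALSE)

lock_sketch <- function(study_path, reason = "Package installed") {
  assign(lock_key(study_path), reason, envir = lock_registry)
  invisible(TRUE)
}

unlock_sketch <- function(study_path) {
  key <- lock_key(study_path)
  if (exists(key, envir = lock_registry, inherits = FALSE)) {
    rm(list = key, envir = lock_registry)
  }
  invisible(TRUE)
}

is_locked_sketch <- function(study_path) {
  exists(lock_key(study_path), envir = lock_registry, inherits = FALSE)
}

# Mirrors the documented contract: TRUE if unlocked, FALSE with a warning.
can_write_sketch <- function(study_path, operation = "write to study folder") {
  if (is_locked_sketch(study_path)) {
    warning("Cannot ", operation, ": study folder is locked", call. = FALSE)
    return(FALSE)
  }
  TRUE
}
```

Because the registry is an ordinary environment, nothing is written to disk and all locks disappear when the R session ends.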


Wrap Connectors with Lock Protection

Description

Recursively wraps all ConnectorFS objects with lock protection.

Usage

wrap_connectors_with_locks(obj, study_path)

Arguments

obj

A connectors object or connector object

study_path

Path to the study folder

Value

The wrapped object
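
The recursive wrapping can be pictured as walking a nested structure and prepending a class to every filesystem connector found. In this minimal sketch, plain lists with the hypothetical classes "PlainFS" and "LockedFS" stand in for the real ConnectorFS and ConnectorLockedFS objects, which may be built differently; wrap_sketch is likewise a hypothetical name.

```r
# Hypothetical sketch: lists stand in for real connector objects.
wrap_sketch <- function(obj, study_path) {
  if (inherits(obj, "PlainFS")) {
    # Record which study the connector belongs to and prepend the lock class,
    # so that lock-aware S3 methods are dispatched first.
    obj$study_path <- study_path
    class(obj) <- c("LockedFS", class(obj))
    return(obj)
  }
  if (is.list(obj)) {
    # Recurse into containers; `obj[] <-` keeps names and attributes.
    obj[] <- lapply(obj, wrap_sketch, study_path = study_path)
  }
  obj
}

# Usage: a nested collection with two connectors and one non-connector field
db <- list(
  adam = structure(list(path = "adam"), class = "PlainFS"),
  extra = list(
    sdtm = structure(list(path = "sdtm"), class = "PlainFS"),
    note = "not a connector"
  )
)
wrapped <- wrap_sketch(db, study_path = "cdisc_pilot")
class(wrapped$adam)
```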


Write Content with Lock Check

Description

S3 method for write_cnt that checks if the study folder is locked before allowing write operations.

Usage

## S3 method for class 'ConnectorLockedFS'
write_cnt(connector_object, x, name, overwrite = FALSE, ...)

Arguments

connector_object

The ConnectorLockedFS object

x

The data to write

name

The file name

overwrite

Whether to overwrite existing files

...

Additional arguments passed to the underlying connector

Value

Invisible connector_object
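
The pattern of a lock check placed in front of an existing method can be sketched with S3 dispatch and NextMethod(). Everything named here is hypothetical: the generic write_item and the classes "PlainFS" and "LockedFS" stand in for write_cnt, ConnectorFS, and ConnectorLockedFS, the lock is reduced to a flag on the object, and raising an error on a locked folder is an assumption about behavior rather than a documented fact.

```r
# Hypothetical generic and classes standing in for the connector ones.
write_item <- function(con, x, name, ...) UseMethod("write_item")

# Underlying method: writes the object into the connector's directory.
write_item.PlainFS <- function(con, x, name, ...) {
  saveRDS(x, file.path(con$path, name))
  invisible(con)
}

# Lock-aware method: refuse when locked, otherwise defer to the next method.
write_item.LockedFS <- function(con, x, name, ...) {
  if (isTRUE(con$locked)) {
    stop("Study folder is locked: ", con$path, call. = FALSE)
  }
  NextMethod()
}

# Usage: the same connector, first locked and then unlocked
study_dir <- tempfile("study_")
dir.create(study_dir)
con <- structure(
  list(path = study_dir, locked = TRUE),
  class = c("LockedFS", "PlainFS")
)
blocked <- try(write_item(con, 1:3, "x.rds"), silent = TRUE)
con$locked <- FALSE
write_item(con, 1:3, "x.rds")
```

Putting the check in a subclass method leaves the underlying write logic untouched, which is why the same approach extends naturally to removal.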