In this vignette, we explore the OmopSketch functions that report how
often specific concepts occur in the data. Two functions do this:
summariseConceptCounts() and plotConceptCounts(). The former creates a
summarised result with the counts for each concept, and the latter
creates a histogram of those counts.
Let’s look at an example. To start, we load the required packages and create a mock CDM using the Eunomia database.
library(dplyr)
library(CDMConnector)
library(DBI)
library(duckdb)
library(OmopSketch)
# Connect to Eunomia database
con <- DBI::dbConnect(duckdb::duckdb(), CDMConnector::eunomia_dir())
cdm <- CDMConnector::cdmFromCon(
con = con, cdmSchema = "main", writeSchema = "main"
)
cdm
#>
#> ── # OMOP CDM reference (duckdb) of Synthea synthetic health database ──────────
#> • omop tables: person, observation_period, visit_occurrence, visit_detail,
#> condition_occurrence, drug_exposure, procedure_occurrence, device_exposure,
#> measurement, observation, death, note, note_nlp, specimen, fact_relationship,
#> location, care_site, provider, payer_plan_period, cost, drug_era, dose_era,
#> condition_era, metadata, cdm_source, concept, vocabulary, domain,
#> concept_class, concept_relationship, relationship, concept_synonym,
#> concept_ancestor, source_to_concept_map, drug_strength
#> • cohort tables: -
#> • achilles tables: -
#> • other tables: -
First, let’s define code lists for two concepts, acetaminophen and
sinusitis.
acetaminophen <- c(1125315,1127078, 1127433, 19133768, 40229134, 40231925, 40162522)
sinusitis <- c(4294548, 40481087, 4283893, 257012)
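Because summariseConceptCounts() takes a named list of code lists, it can be convenient to build that list once and run a few quick checks before querying the database. The sketch below uses only base R; the object name codelists is illustrative and not part of OmopSketch.

```r
# Concept ids copied from the two vectors defined above
acetaminophen <- c(1125315, 1127078, 1127433, 19133768, 40229134, 40231925, 40162522)
sinusitis <- c(4294548, 40481087, 4283893, 257012)

# `codelists` is an illustrative name, not an OmopSketch object
codelists <- list("acetaminophen" = acetaminophen, "sinusitis" = sinusitis)

# Number of concept ids in each code list
lengths(codelists)

# anyDuplicated() returns 0 when a code list has no repeated concept id
vapply(codelists, anyDuplicated, integer(1))

# Concept ids that appear in both code lists (none expected here)
intersect(acetaminophen, sinusitis)
```

The same named list can then be passed as the conceptId argument in the calls that follow.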
Now we want to explore the occurrence of these concepts within the
database. For that, we can use summariseConceptCounts()
from OmopSketch:
summariseConceptCounts(cdm,
conceptId = list("acetaminophen" = acetaminophen,
"sinusitis" = sinusitis)) |>
select(group_level, variable_name, variable_level, estimate_name, estimate_value) |>
glimpse()
#> ℹ Getting use of codes from acetaminophen
#> Getting use of codes ■■■■■■■■■■■■■■■■ 50% | ETA: 2s ℹ Getting use of codes from sinusitis
#> Getting use of codes ■■■■■■■■■■■■■■■■ 50% | ETA: 2s
#> Rows: 24
#> Columns: 5
#> $ group_level <chr> "acetaminophen", "acetaminophen", "acetaminophen", "ace…
#> $ variable_name <chr> "overall", "Acetaminophen 325 MG / Hydrocodone Bitartra…
#> $ variable_level <chr> NA, "40162522", "1127433", "40231925", "19133768", "402…
#> $ estimate_name <chr> "record_count", "record_count", "record_count", "record…
#> $ estimate_value <chr> "14205", "312", "9365", "306", "71", "1993", "2158", "2…
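Note that estimate_value is a character column (<chr> in the output above), so it needs converting before it can be sorted or used in arithmetic. Here is a minimal sketch with dplyr, using a small hand-made tibble in place of a real result. The object name toy_result, the concept names, and the counts are all invented for illustration.

```r
library(dplyr)

# Hand-made stand-in for a summarised result; names and counts are invented
toy_result <- tibble(
  group_level    = c("codelist_a", "codelist_a", "codelist_a"),
  variable_name  = c("overall", "concept x", "concept y"),
  estimate_name  = "record_count",
  estimate_value = c("120", "9", "111")
)

ranked <- toy_result |>
  # Drop the per-codelist total so only individual concepts are ranked
  filter(variable_name != "overall") |>
  # Convert to integer so that "9" sorts below "111" instead of above it
  mutate(estimate_value = as.integer(estimate_value)) |>
  arrange(desc(estimate_value))

ranked
```

Without the conversion, arrange() would sort the values as text, which orders "9" after "111".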
By default, the function reports both the number of records
(estimate_name == "record_count") for each concept_id and the number of
people (estimate_name == "person_count"):
summariseConceptCounts(cdm,
conceptId = list("acetaminophen" = acetaminophen,
"sinusitis" = sinusitis),
countBy = c("record","person")) |>
select(group_level, variable_name, estimate_name) |>
distinct() |>
arrange(group_level, variable_name)
#> ℹ Getting use of codes from acetaminophen
#> Getting use of codes ■■■■■■■■■■■■■■■■ 50% | ETA: 2s ℹ Getting use of codes from sinusitis
#> Getting use of codes ■■■■■■■■■■■■■■■■ 50% | ETA: 2s
#> # A tibble: 24 × 3
#> group_level variable_name estimate_name
#> <chr> <chr> <chr>
#> 1 acetaminophen Acetaminophen 160 MG Oral Tablet record_count
#> 2 acetaminophen Acetaminophen 160 MG Oral Tablet person_count
#> 3 acetaminophen Acetaminophen 21.7 MG/ML / Dextromethorphan Hydr… record_count
#> 4 acetaminophen Acetaminophen 21.7 MG/ML / Dextromethorphan Hydr… person_count
#> 5 acetaminophen Acetaminophen 325 MG / Hydrocodone Bitartrate 7.… record_count
#> 6 acetaminophen Acetaminophen 325 MG / Hydrocodone Bitartrate 7.… person_count
#> 7 acetaminophen Acetaminophen 325 MG / Oxycodone Hydrochloride 5… record_count
#> 8 acetaminophen Acetaminophen 325 MG / Oxycodone Hydrochloride 5… person_count
#> 9 acetaminophen Acetaminophen 325 MG Oral Tablet record_count
#> 10 acetaminophen Acetaminophen 325 MG Oral Tablet person_count
#> # ℹ 14 more rows
However, we can specify which one is of interest using the countBy
argument:
summariseConceptCounts(cdm,
conceptId = list("acetaminophen" = acetaminophen,
"sinusitis" = sinusitis),
countBy = "record") |>
select(group_level, variable_name, estimate_name) |>
distinct() |>
arrange(group_level, variable_name)
#> ℹ Getting use of codes from acetaminophen
#> Getting use of codes ■■■■■■■■■■■■■■■■ 50% | ETA: 1s ℹ Getting use of codes from sinusitis
#> Getting use of codes ■■■■■■■■■■■■■■■■ 50% | ETA: 1s
#> # A tibble: 12 × 3
#> group_level variable_name estimate_name
#> <chr> <chr> <chr>
#> 1 acetaminophen Acetaminophen 160 MG Oral Tablet record_count
#> 2 acetaminophen Acetaminophen 21.7 MG/ML / Dextromethorphan Hydr… record_count
#> 3 acetaminophen Acetaminophen 325 MG / Hydrocodone Bitartrate 7.… record_count
#> 4 acetaminophen Acetaminophen 325 MG / Oxycodone Hydrochloride 5… record_count
#> 5 acetaminophen Acetaminophen 325 MG Oral Tablet record_count
#> 6 acetaminophen Acetaminophen 750 MG / Hydrocodone Bitartrate 7.… record_count
#> 7 acetaminophen overall record_count
#> 8 sinusitis Acute bacterial sinusitis record_count
#> 9 sinusitis Chronic sinusitis record_count
#> 10 sinusitis Sinusitis record_count
#> 11 sinusitis Viral sinusitis record_count
#> 12 sinusitis overall record_count
One can further stratify by year, sex or age group using the year, sex,
and ageGroup arguments.
summariseConceptCounts(cdm,
conceptId = list("acetaminophen" = acetaminophen,
"sinusitis" = sinusitis),
countBy = "person",
year = TRUE,
sex = TRUE,
ageGroup = list("<=50" = c(0,50), ">50" = c(51,Inf))) |>
select(group_level, strata_level, variable_name, estimate_name) |>
glimpse()
#> ℹ Getting use of codes from acetaminophen
#> Getting use of codes ■■■■■■■■■■■■■■■■ 50% | ETA: 3s ℹ Getting use of codes from sinusitis
#> Getting use of codes ■■■■■■■■■■■■■■■■ 50% | ETA: 3s
#> Rows: 1,173
#> Columns: 4
#> $ group_level <chr> "acetaminophen", "acetaminophen", "acetaminophen", "acet…
#> $ strata_level <chr> "overall", "overall", "overall", "overall", "overall", "…
#> $ variable_name <chr> "overall", "Acetaminophen 325 MG / Hydrocodone Bitartrat…
#> $ estimate_name <chr> "person_count", "person_count", "person_count", "person_…
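The ageGroup argument is a named list in which each element gives a lower and an upper age bound. To illustrate how the two groups above partition ages, here is a base R sketch. The helper assign_age_group() is hypothetical and not part of OmopSketch, and it assumes both bounds are inclusive.

```r
age_groups <- list("<=50" = c(0, 50), ">50" = c(51, Inf))

# Hypothetical helper: label each age with the first group whose bounds
# contain it (both bounds assumed inclusive); NA if no group matches
assign_age_group <- function(age, groups) {
  labels <- rep(NA_character_, length(age))
  for (name in names(groups)) {
    bounds <- groups[[name]]
    # which() drops NA comparisons, so missing ages simply stay NA
    inside <- which(is.na(labels) & age >= bounds[1] & age <= bounds[2])
    labels[inside] <- name
  }
  labels
}

assign_age_group(c(0, 50, 51, 90), age_groups)
```

Under that assumption, ages 0 to 50 fall in the first group and ages from 51 upwards in the second, so the two groups cover every whole-number age without overlap.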
Finally, we can visualise the concept counts using
plotConceptCounts().
summariseConceptCounts(cdm,
conceptId = list("sinusitis" = sinusitis),
countBy = "person") |>
plotConceptCounts()
#> ℹ Getting use of codes from sinusitis
#> ! The following column type were changed:
#> • variable_name: from integer to character
Note that only one of person counts or record counts can be plotted at a time. If both are included in the summarised result, filter it to keep a single estimate before plotting:
summariseConceptCounts(cdm,
conceptId = list("sinusitis" = sinusitis),
countBy = c("person","record")) |>
filter(estimate_name == "person_count") |>
plotConceptCounts()
#> ℹ Getting use of codes from sinusitis
#> ! The following column type were changed:
#> • variable_name: from integer to character
Additionally, if results were stratified by year, sex or age group,
we can use the facet or colour arguments to highlight the different
strata in the plot. To identify which variables are available for
colouring or faceting, we can use tidyColumns() from the visOmopResults
package.
summariseConceptCounts(cdm,
conceptId = list("sinusitis" = sinusitis),
countBy = c("person"),
sex = TRUE,
ageGroup = list("<=50" = c(0,50), ">50" = c(51, Inf))) |>
visOmopResults::tidyColumns()
#> ℹ Getting use of codes from sinusitis
#> [1] "cdm_name" "codelist_name" "sex"
#> [4] "age_group" "variable_name" "variable_level"
#> [7] "person_count" "source_concept_name" "source_concept_id"
#> [10] "domain_id" "result_type" "package_name"
#> [13] "package_version"
summariseConceptCounts(cdm,
conceptId = list("sinusitis" = sinusitis),
countBy = c("person"),
sex = TRUE,
ageGroup = list("<=50" = c(0,50), ">50" = c(51, Inf)))|>
plotConceptCounts(facet = "sex", colour = "age_group")
#> ℹ Getting use of codes from sinusitis
#> ! The following column type were changed:
#> • variable_name: from integer to character