2 Introduction

In this vignette, we explore the OmopSketch functions that report how many times each concept appears in the clinical tables. Two key functions support this: summariseConceptIdCounts() and tableConceptIdCounts(). The former produces a summarised result with the number of counts for each concept in a clinical table, and the latter displays that result as a formatted table.

2.1 Create a mock cdm

Let’s look at these functions in practice. To start, we load the required packages and create a mock CDM using the R package omock.

library(OmopSketch)
library(dplyr)
library(omock)

cdm <- mockCdmFromDataset(datasetName = "GiBleed", source = "duckdb")
#> ℹ Reading GiBleed tables.
#> ℹ Adding drug_strength table.
#> ℹ Creating local <cdm_reference> object.
#> ℹ Inserting <cdm_reference> into duckdb.

cdm
#> 
#> ── # OMOP CDM reference (duckdb) of GiBleed ────────────────────────────────────
#> • omop tables: care_site, cdm_source, concept, concept_ancestor, concept_class,
#> concept_relationship, concept_synonym, condition_era, condition_occurrence,
#> cost, death, device_exposure, domain, dose_era, drug_era, drug_exposure,
#> drug_strength, fact_relationship, location, measurement, metadata, note,
#> note_nlp, observation, observation_period, payer_plan_period, person,
#> procedure_occurrence, provider, relationship, source_to_concept_map, specimen,
#> visit_detail, visit_occurrence, vocabulary
#> • cohort tables: -
#> • achilles tables: -
#> • other tables: -

3 Summarise concept level counts

We now use the summariseConceptIdCounts() function from the OmopSketch package to retrieve counts for each concept id and name, as well as for each source concept id and name, in a clinical table (here, drug_exposure).

summariseConceptIdCounts(cdm = cdm, omopTableName = "drug_exposure") |>
  select(group_level, variable_name, variable_level, estimate_name, estimate_value, additional_name, additional_level) |>
  glimpse()
#> Rows: 113
#> Columns: 7
#> $ group_level      <chr> "drug_exposure", "drug_exposure", "drug_exposure", "d…
#> $ variable_name    <chr> "Naproxen sodium 220 MG Oral Tablet", "Diphenhydramin…
#> $ variable_level   <chr> "1115171", "40232448", "19075601", "19129655", "19079…
#> $ estimate_name    <chr> "count_records", "count_records", "count_records", "c…
#> $ estimate_value   <chr> "1159", "105", "363", "488", "35", "6", "27", "7", "5…
#> $ additional_name  <chr> "source_concept_id &&& source_concept_name", "source_…
#> $ additional_level <chr> "1115171 &&& Naproxen sodium 220 MG Oral Tablet", "40…
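The additional_name and additional_level columns pack the source concept id and name into one string joined by " &&& ". A minimal base R sketch of splitting such pairs back into separate columns is shown below; the helper splitPair is hypothetical (not an OmopSketch function), and the two input values are copied from the outputs in this vignette.

```r
# Hypothetical helper: split " &&& "-joined strings into a data frame whose
# column names come from the matching additional_name string.
splitPair <- function(name, level) {
  keys <- strsplit(name, " &&& ", fixed = TRUE)[[1]]
  values <- strsplit(level, " &&& ", fixed = TRUE)
  out <- as.data.frame(do.call(rbind, values), stringsAsFactors = FALSE)
  names(out) <- keys
  out
}

additionalName <- "source_concept_id &&& source_concept_name"
additionalLevel <- c(
  "1115171 &&& Naproxen sodium 220 MG Oral Tablet",
  "260139 &&& Acute bronchitis"
)

parsed <- splitPair(additionalName, additionalLevel)
print(parsed)
```

Both resulting columns stay character, matching how values are stored in the summarised result; convert the id column with as.integer() if numeric joins are needed.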

By default, the function returns the number of records (estimate_name == "count_records") for each concept_id. To count individuals instead, set the countBy argument to "person" (reported as estimate_name == "count_subjects"), or to c("record", "person") to obtain both record and person counts.

summariseConceptIdCounts(
  cdm = cdm,
  omopTableName = "drug_exposure",
  countBy = c("record", "person")
) |>
  select(variable_name, estimate_name, estimate_value)
#> # A tibble: 226 × 3
#>    variable_name                                    estimate_name estimate_value
#>    <chr>                                            <chr>         <chr>         
#>  1 zoster vaccine, live                             count_records 2125          
#>  2 zoster vaccine, live                             count_subjec… 1140          
#>  3 Acetaminophen 160 MG Oral Tablet                 count_records 2158          
#>  4 Acetaminophen 160 MG Oral Tablet                 count_subjec… 1428          
#>  5 Penicillin V Potassium 500 MG Oral Tablet        count_records 1087          
#>  6 Penicillin V Potassium 500 MG Oral Tablet        count_subjec… 856           
#>  7 Acetaminophen 325 MG / Oxycodone Hydrochloride … count_records 306           
#>  8 Acetaminophen 325 MG / Oxycodone Hydrochloride … count_subjec… 306           
#>  9 varicella virus vaccine                          count_records 422           
#> 10 varicella virus vaccine                          count_subjec… 301           
#> # ℹ 216 more rows
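With both count types in the long result, it can be useful to place them side by side, for example to see how many records each person contributes on average. The sketch below rebuilds three concepts from the output above as a plain tibble (estimate_value is character, as in the summarised result) and reshapes them with dplyr alone; the column names records, subjects, and records_per_subject are illustrative choices, not OmopSketch output.

```r
library(dplyr)

# Three concepts copied from the output above, in the same long layout.
counts <- tibble(
  variable_name = rep(c("zoster vaccine, live",
                        "Acetaminophen 160 MG Oral Tablet",
                        "Penicillin V Potassium 500 MG Oral Tablet"), each = 2),
  estimate_name = rep(c("count_records", "count_subjects"), times = 3),
  estimate_value = c("2125", "1140", "2158", "1428", "1087", "856")
)

# One row per concept, with record and subject counts side by side.
perConcept <- counts |>
  mutate(estimate_value = as.integer(estimate_value)) |>
  group_by(variable_name) |>
  summarise(
    records = estimate_value[estimate_name == "count_records"],
    subjects = estimate_value[estimate_name == "count_subjects"],
    .groups = "drop"
  ) |>
  mutate(records_per_subject = round(records / subjects, 2))

print(perConcept)
```

A person can contribute several records for the same concept, so the record count is never smaller than the subject count.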

Results can be further stratified with the interval, sex, and ageGroup arguments. The interval argument accepts "overall" (no time stratification), "years", "quarters", or "months".

summariseConceptIdCounts(
  cdm = cdm,
  omopTableName = "condition_occurrence",
  countBy = "person",
  interval = "years",
  sex = TRUE,
  ageGroup = list("<=50" = c(0, 50), ">50" = c(51, Inf))
) |>
  select(group_level, strata_level, variable_name, estimate_name, additional_level) |>
  glimpse()
#> Rows: 28,358
#> Columns: 5
#> $ group_level      <chr> "condition_occurrence", "condition_occurrence", "cond…
#> $ strata_level     <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ variable_name    <chr> "Acute bronchitis", "Polyp of colon", "Laceration of …
#> $ estimate_name    <chr> "count_subjects", "count_subjects", "count_subjects",…
#> $ additional_level <chr> "260139 &&& Acute bronchitis", "4285898 &&& Polyp of …
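The ageGroup argument takes a named list of age ranges. To make the example ranges concrete, the sketch below assigns whole-year ages to those groups, assuming both bounds are inclusive (which is how c(0, 50) and c(51, Inf) cover every whole-year age without overlap). The helper assignAgeGroup is hypothetical and is not how OmopSketch computes age internally.

```r
# Hypothetical helper: assign ages to the named, inclusive ranges used in
# the ageGroup argument. Ages outside every range get NA.
assignAgeGroup <- function(age, ageGroup) {
  out <- rep(NA_character_, length(age))
  for (label in names(ageGroup)) {
    bounds <- ageGroup[[label]]
    inRange <- !is.na(age) & age >= bounds[1] & age <= bounds[2]
    # The first matching group wins if ranges were ever to overlap.
    out[inRange & is.na(out)] <- label
  }
  out
}

ageGroup <- list("<=50" = c(0, 50), ">50" = c(51, Inf))
groups <- assignAgeGroup(c(0, 50, 51, 87), ageGroup)
print(groups)
```

As the strata_level column above shows, the stratified results are returned alongside the "overall" rows.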

We can also filter the clinical table to a specific time window by setting the dateRange argument.

summarisedResult <- summariseConceptIdCounts(
  cdm = cdm,
  omopTableName = "condition_occurrence",
  dateRange = as.Date(c("1990-01-01", "2010-01-01"))
)
summarisedResult |>
  settings() |>
  glimpse()
#> Rows: 1
#> Columns: 10
#> $ result_id          <int> 1
#> $ result_type        <chr> "summarise_concept_id_counts"
#> $ package_name       <chr> "OmopSketch"
#> $ package_version    <chr> "1.0.0"
#> $ group              <chr> "omop_table"
#> $ strata             <chr> ""
#> $ additional         <chr> "source_concept_id &&& source_concept_name"
#> $ min_cell_count     <chr> "0"
#> $ study_period_end   <chr> "2010-01-01"
#> $ study_period_start <chr> "1990-01-01"
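The settings above record the window as study_period_start and study_period_end. The sketch below illustrates the idea of restricting a table to such a window before counting, using hypothetical toy records. It assumes a record is kept when its start date falls inside the window with both endpoints included; OmopSketch's exact rule (for example, how end dates are treated) is not stated in this vignette.

```r
library(dplyr)

# Hypothetical toy records; concept ids are reused from the outputs above.
records <- tibble(
  person_id = c(1L, 2L, 3L, 4L),
  condition_concept_id = c(260139L, 260139L, 4285898L, 260139L),
  condition_start_date = as.Date(c("1985-06-01", "1990-01-01",
                                   "2005-03-15", "2012-11-30"))
)

dateRange <- as.Date(c("1990-01-01", "2010-01-01"))

# Assumption: keep records whose start date lies in the closed window.
inWindow <- records |>
  filter(condition_start_date >= dateRange[1],
         condition_start_date <= dateRange[2])

windowCounts <- inWindow |>
  count(condition_concept_id, name = "count_records")
print(windowCounts)
```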

You can also restrict concept counts to a subset of subjects via the sample argument: provide an integer to randomly select that many person_ids from the person table, or a character string naming a cohort table to limit counts to its subject_ids.

summariseConceptIdCounts(
  cdm = cdm,
  omopTableName = "condition_occurrence",
  sample = 50
) |>
  select(group_level, variable_name, estimate_name) |>
  glimpse()
#> Rows: 66
#> Columns: 3
#> $ group_level   <chr> "condition_occurrence", "condition_occurrence", "conditi…
#> $ variable_name <chr> "Acute bronchitis", "Facial laceration", "Laceration of …
#> $ estimate_name <chr> "count_records", "count_records", "count_records", "coun…
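The integer form of sample draws person_ids at random and keeps only their records. The dplyr sketch below mirrors that idea on hypothetical toy tables; it is not OmopSketch's implementation, and whether the package's draw is reproducible through set.seed() is not stated here. Because the selection is random, the counts (and the number of rows above) can differ between runs.

```r
library(dplyr)

set.seed(123)  # makes this toy draw reproducible

# Hypothetical toy person and condition tables: two records per person.
person <- tibble(person_id = 1:200)
conditions <- tibble(
  person_id = rep(1:200, each = 2),
  condition_concept_id = rep(c(260139L, 4285898L), times = 200)
)

# Draw 50 distinct person_ids, then keep only their records.
sampled <- person |> slice_sample(n = 50)
sampledConditions <- conditions |> semi_join(sampled, by = "person_id")

print(n_distinct(sampledConditions$person_id))
```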

3.1 Display the results

Concept counts can then be visualised using tableConceptIdCounts(). By default, it generates an interactive reactable table, but DT datatables are also supported through type = "datatable".

result <- summariseConceptIdCounts(
  cdm = cdm,
  omopTableName = "measurement",
  countBy = "record"
)
tableConceptIdCounts(result = result, type = "reactable")

tableConceptIdCounts(result = result, type = "datatable")

The display argument in tableConceptIdCounts() controls which concept counts are shown. The default, display = "overall", shows both standard and source concept counts.

tableConceptIdCounts(result = result, display = "overall")

If display = "standard", the table shows only counts for standard concept_id and concept_name.

tableConceptIdCounts(result = result, display = "standard")

If display = "source", the table shows only counts for source concept_id and concept_name.

tableConceptIdCounts(result = result, display = "source")

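If display = "missing source", the table shows only counts for concept ids that have no corresponding source concept id. No measurement concepts in this mock dataset appear to fall into this category: the warning below comes from base R's max() receiving an empty vector, which indicates that the filtered result has no rows.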

tableConceptIdCounts(result = result, display = "missing source")
#> Warning in max(dplyr::pull(dplyr::tally(dplyr::group_by(result,
#> dplyr::across(-c("estimate_value")))), : no non-missing arguments to max;
#> returning -Inf

If display = "missing standard", the table shows only counts for source concept ids that are not mapped to a standard concept id. The same warning appears here, again indicating an empty result for this mock dataset.

tableConceptIdCounts(result = result, display = "missing standard")
#> Warning in max(dplyr::pull(dplyr::tally(dplyr::group_by(result,
#> dplyr::across(-c("estimate_value")))), : no non-missing arguments to max;
#> returning -Inf
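To make the two "missing" options concrete, the sketch below applies the row selection they are described as performing to a hypothetical toy table. It assumes a missing id is stored either as NA or as 0 (the OMOP convention for "no matching concept"); findMissing and isMissing are illustrative helpers, not OmopSketch functions.

```r
library(dplyr)

# Assumption: treat NA and 0 as a missing concept id.
isMissing <- function(id) is.na(id) | id == 0L

# Hypothetical helper: keep the rows each "missing" option is described as showing.
findMissing <- function(x, option = c("missing source", "missing standard")) {
  option <- match.arg(option)
  if (option == "missing source") {
    filter(x, !isMissing(concept_id), isMissing(source_concept_id))
  } else {
    filter(x, isMissing(concept_id), !isMissing(source_concept_id))
  }
}

# Hypothetical toy counts: one fully mapped row, one standard concept without
# a source id, and one source id without a standard mapping.
conceptCounts <- tibble(
  concept_id        = c(111L, 222L, 0L),
  source_concept_id = c(111L, NA, 333L),
  count_records     = c(120L, 40L, 7L)
)

missingSource <- findMissing(conceptCounts, "missing source")
missingStandard <- findMissing(conceptCounts, "missing standard")
```

In the mock measurement data above, both selections are empty, which is consistent with the warnings printed by tableConceptIdCounts().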

3.2 Display the most frequent concepts

The tableTopConceptCounts() function displays the most frequent concepts in an OMOP CDM table as a formatted table. By default, it returns a gt table, but other output formats are available, including flextable, datatable, and reactable.

result <- summariseConceptIdCounts(
  cdm = cdm,
  omopTableName = "drug_exposure",
  countBy = "record"
)
tableTopConceptCounts(result = result, type = "gt")

Top 10 concepts in drug_exposure table
Top	GiBleed: drug_exposure
1	Standard: Acetaminophen 325 MG Oral Tablet (1127433) Source: Acetaminophen 325 MG Oral Tablet (1127433) 9365
2	Standard: poliovirus vaccine, inactivated (40213160) Source: poliovirus vaccine, inactivated (40213160) 7977
3	Standard: tetanus and diphtheria toxoids, adsorbed, preservative free, for adult use (40213227) Source: tetanus and diphtheria toxoids, adsorbed, preservative free, for adult use (40213227) 7430
4	Standard: Aspirin 81 MG Oral Tablet (19059056) Source: Aspirin 81 MG Oral Tablet (19059056) 4380
5	Standard: Amoxicillin 250 MG / Clavulanate 125 MG Oral Tablet (1713671) Source: Amoxicillin 250 MG / Clavulanate 125 MG Oral Tablet (1713671) 3851
6	Standard: hepatitis A vaccine, adult dosage (40213296) Source: hepatitis A vaccine, adult dosage (40213296) 3211
7	Standard: Acetaminophen 160 MG Oral Tablet (1127078) Source: Acetaminophen 160 MG Oral Tablet (1127078) 2158
8	Standard: zoster vaccine, live (40213260) Source: zoster vaccine, live (40213260) 2125
9	Standard: Acetaminophen 21.7 MG/ML / Dextromethorphan Hydrobromide 1 MG/ML / doxylamine succinate 0.417 MG/ML Oral Solution (40229134) Source: Acetaminophen 21.7 MG/ML / Dextromethorphan Hydrobromide 1 MG/ML / doxylamine succinate 0.417 MG/ML Oral Solution (40229134) 1993
10	Standard: hepatitis B vaccine, adult dosage (40213306) Source: hepatitis B vaccine, adult dosage (40213306) 1916
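The ranking behind such a table can be sketched with dplyr: order concepts by count and keep the first top rows. The sketch below uses a subset of the record counts from the table above; topConcepts is a hypothetical helper, not how tableTopConceptCounts() is implemented. It passes with_ties = FALSE so that exactly top rows are returned even when counts tie.

```r
library(dplyr)

# Record counts for five concepts, copied from the table above.
drugCounts <- tibble(
  concept_name = c("Aspirin 81 MG Oral Tablet",
                   "Acetaminophen 325 MG Oral Tablet",
                   "zoster vaccine, live",
                   "poliovirus vaccine, inactivated",
                   "tetanus and diphtheria toxoids, adsorbed, preservative free, for adult use"),
  concept_id = c(19059056L, 1127433L, 40213260L, 40213160L, 40213227L),
  count = c(4380L, 9365L, 2125L, 7977L, 7430L)
)

# Hypothetical helper: keep the `top` most frequent concepts, ranked 1..top.
topConcepts <- function(x, top = 10) {
  x |>
    slice_max(count, n = top, with_ties = FALSE) |>
    mutate(rank = row_number(), .before = 1)
}

print(topConcepts(drugCounts, top = 3))
```

When top exceeds the number of concepts, slice_max() simply returns all of them.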

3.2.1 Customising the number of top concepts

By default, the function shows the top 10 concepts. You can change this using the top argument:

tableTopConceptCounts(result = result, top = 5)

Top 5 concepts in drug_exposure table
Top	GiBleed: drug_exposure
1	Standard: Acetaminophen 325 MG Oral Tablet (1127433) Source: Acetaminophen 325 MG Oral Tablet (1127433) 9365
2	Standard: poliovirus vaccine, inactivated (40213160) Source: poliovirus vaccine, inactivated (40213160) 7977
3	Standard: tetanus and diphtheria toxoids, adsorbed, preservative free, for adult use (40213227) Source: tetanus and diphtheria toxoids, adsorbed, preservative free, for adult use (40213227) 7430
4	Standard: Aspirin 81 MG Oral Tablet (19059056) Source: Aspirin 81 MG Oral Tablet (19059056) 4380
5	Standard: Amoxicillin 250 MG / Clavulanate 125 MG Oral Tablet (1713671) Source: Amoxicillin 250 MG / Clavulanate 125 MG Oral Tablet (1713671) 3851

3.2.2 Choosing the count type

If your summary includes both record and person counts, you must specify which type to display using the countBy argument:

result <- summariseConceptIdCounts(
  cdm = cdm,
  omopTableName = "drug_exposure",
  countBy = c("record", "person")
)
tableTopConceptCounts(result = result, countBy = "person")

Top 10 concepts in drug_exposure table
Top	GiBleed: drug_exposure
1	Standard: tetanus and diphtheria toxoids, adsorbed, preservative free, for adult use (40213227) Source: tetanus and diphtheria toxoids, adsorbed, preservative free, for adult use (40213227) 2660
2	Standard: Acetaminophen 325 MG Oral Tablet (1127433) Source: Acetaminophen 325 MG Oral Tablet (1127433) 2580
3	Standard: poliovirus vaccine, inactivated (40213160) Source: poliovirus vaccine, inactivated (40213160) 2140
4	Standard: Amoxicillin 250 MG / Clavulanate 125 MG Oral Tablet (1713671) Source: Amoxicillin 250 MG / Clavulanate 125 MG Oral Tablet (1713671) 2021
5	Standard: Aspirin 81 MG Oral Tablet (19059056) Source: Aspirin 81 MG Oral Tablet (19059056) 1927
6	Standard: celecoxib (1118084) Source: celecoxib 200 MG Oral Capsule [Celebrex] (44923712) 1844
7	Standard: hepatitis A vaccine, adult dosage (40213296) Source: hepatitis A vaccine, adult dosage (40213296) 1737
8	Standard: hepatitis B vaccine, adult dosage (40213306) Source: hepatitis B vaccine, adult dosage (40213306) 1560
9	Standard: Acetaminophen 160 MG Oral Tablet (1127078) Source: Acetaminophen 160 MG Oral Tablet (1127078) 1428
10	Standard: Acetaminophen 21.7 MG/ML / Dextromethorphan Hydrobromide 1 MG/ML / doxylamine succinate 0.417 MG/ML Oral Solution (40229134) Source: Acetaminophen 21.7 MG/ML / Dextromethorphan Hydrobromide 1 MG/ML / doxylamine succinate 0.417 MG/ML Oral Solution (40229134) 1393
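A ranking needs a single numeric value per concept, which is why one count type must be chosen when both are present. The dplyr sketch below uses the record and person counts for three concepts, copied from the two tables above, to show that the order changes with the count type. rankBy is a hypothetical helper for illustration only.

```r
library(dplyr)

# Record and subject counts for three concepts, copied from the tables above.
long <- tibble(
  concept_name = rep(c("Acetaminophen 325 MG Oral Tablet",
                       "poliovirus vaccine, inactivated",
                       "tetanus and diphtheria toxoids, adsorbed, preservative free, for adult use"),
                     times = 2),
  estimate_name = rep(c("count_records", "count_subjects"), each = 3),
  estimate_value = c(9365L, 7977L, 7430L, 2580L, 2140L, 2660L)
)

# Rank concepts within one count type only; mixing both types would
# interleave record and subject counts in a single ordering.
rankBy <- function(x, estimate) {
  x |>
    filter(estimate_name == estimate) |>
    arrange(desc(estimate_value)) |>
    pull(concept_name)
}

byRecord <- rankBy(long, "count_records")
bySubject <- rankBy(long, "count_subjects")
print(byRecord)
print(bySubject)
```

Acetaminophen 325 MG Oral Tablet has the most records, but the tetanus and diphtheria vaccine reaches the most people, matching the two tables above.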

4 Disconnect from CDM

Finally, disconnect from the mock CDM.

cdmDisconnect(cdm = cdm)
