Probabilistic Sensitivity Analysis

Denis Haine

Bias parameters are not always known. A probabilistic sensitivity analysis lets you specify a probability distribution for each bias parameter and then, through Monte Carlo sampling, generate a frequency distribution of the corrected estimates of effect. episensr provides a set of probsens() functions for this, including probsens() for misclassification and probsens_conf() for uncontrolled confounding, both illustrated below.

The distributions available for the bias parameters vary across these functions and include constant, uniform, triangular, trapezoidal, logit-logistic, logit-normal, and log-normal.

Probabilistic Sensitivity Analysis for Exposure Misclassification

We use a study on the effect of smoking during pregnancy on breast cancer risk by Fink & Lash, where we assume nondifferential misclassification of the exposure, smoking, with probability density functions for the sensitivities (Se) and specificities (Sp) among cases and noncases set to uniform distributions with a minimum of 0.7 and a maximum of 0.95 for the sensitivities (0.9 and 0.99, respectively, for the specificities). We correct for exposure misclassification with the argument type = "exposure" and ask for 50,000 replications (the default is 1,000). Don’t be shy with the number of iterations; episensr is fast.

The Se and Sp for cases (seca, spca) are each given as a list whose first element is the type of distribution (constant, uniform, triangular, trapezoidal, logit-logistic, or logit-normal) and whose second element gives the distribution parameters (minimum and maximum for a uniform distribution). By not providing information on the noncases (seexp, spexp), we specify nondifferential misclassification (a differential specification is sketched after the output below).

library(episensr)
## Loading required package: ggplot2
## Thank you for using episensr!
## This is version 2.0.0 of episensr
## Type 'citation("episensr")' for citing this R package in publications.
set.seed(123)
smoke.nd <- probsens(matrix(c(215, 1449, 668, 4296),
                            dimnames = list(c("BC+", "BC-"), c("Smoke+", "Smoke-")),
                            nrow = 2, byrow = TRUE),
                     type = "exposure",
                     reps = 50000,
                     seca = list("uniform", c(.7, .95)),
                     spca = list("uniform", c(.9, .99)))
## ℹ Calculating observed measures
## ✔ Assign probability distributions [27ms]
## ✔ Simple bias analysis [90ms]
## ✔ Incorporating random error [311ms]
smoke.nd
## 
## ── Observed data ───────────────────────────────────────────────────────────────
## • Outcome: BC+
## • Comparing: Smoke+ vs. Smoke-
## 
##     Smoke+ Smoke-
## BC+    215   1449
## BC-    668   4296
##                                         2.5%     97.5%
##  Observed Relative Risk: 0.9653825 0.8523766 1.0933704
##     Observed Odds Ratio: 0.9542406 0.8092461 1.1252141
## ── Bias-adjusted measures ──
##                                       Median      p2.5     p97.5
## Relative Risk -- systematic error: 0.9431650 0.8817325 0.9612154
##                       total error: 0.9413972 0.7270521 1.1382687
##    Odds Ratio -- systematic error: 0.9253996 0.8478414 0.9487841
##                       total error: 0.9226182 0.6635342 1.1894274

The output gives the 2-by-2 table, the observed measures of association, and the corrected measures of association.
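In the call above, omitting seexp and spexp gave a nondifferential analysis. For differential misclassification you would also supply distributions for the noncases; here is a hypothetical sketch reusing the same data, where the seexp and spexp distributions are purely illustrative and not taken from Fink & Lash:

smoke.d <- probsens(matrix(c(215, 1449, 668, 4296),
                           dimnames = list(c("BC+", "BC-"), c("Smoke+", "Smoke-")),
                           nrow = 2, byrow = TRUE),
                    type = "exposure",
                    reps = 50000,
                    seca = list("uniform", c(.7, .95)),    # Se among cases
                    seexp = list("uniform", c(.8, .95)),   # Se among noncases (illustrative)
                    spca = list("uniform", c(.9, .99)),    # Sp among cases
                    spexp = list("uniform", c(.85, .99)))  # Sp among noncases (illustrative)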

We saved the probsens() analysis in a new object smoke.nd. We can see its elements with the function str():

str(smoke.nd)
## List of 7
##  $ obs_data    : num [1:2, 1:2] 215 668 1449 4296
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:2] "BC+" "BC-"
##   .. ..$ : chr [1:2] "Smoke+" "Smoke-"
##  $ obs_measures: num [1:2, 1:3] 0.965 0.954 0.852 0.809 1.093 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:2] " Observed Relative Risk:" "    Observed Odds Ratio:"
##   .. ..$ : chr [1:3] " " "2.5%" "97.5%"
##  $ adj_measures: num [1:4, 1:3] 0.943 0.941 0.925 0.923 0.882 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:4] "Relative Risk -- systematic error:" "                      total error:" "   Odds Ratio -- systematic error:" "                      total error:"
##   .. ..$ : chr [1:3] "Median" "p2.5" "p97.5"
##  $ sim_df      :'data.frame':    50000 obs. of  27 variables:
##   ..$ seca   : num [1:50000] 0.772 0.897 0.802 0.921 0.935 ...
##   ..$ seexp  : num [1:50000] 0.772 0.897 0.802 0.921 0.935 ...
##   ..$ spca   : num [1:50000] 0.919 0.94 0.912 0.943 0.965 ...
##   ..$ spexp  : num [1:50000] 0.919 0.94 0.912 0.943 0.965 ...
##   ..$ A1     : num [1:50000] 117 137 95 139 174 ...
##   ..$ B1     : num [1:50000] 1547 1527 1569 1525 1490 ...
##   ..$ C1     : num [1:50000] 386 442 321 446 548 ...
##   ..$ D1     : num [1:50000] 4578 4522 4643 4518 4416 ...
##   ..$ prevca : num [1:50000] 0.0719 0.0902 0.0525 0.0775 0.1032 ...
##   ..$ prevexp: num [1:50000] 0.0753 0.0903 0.0654 0.0874 0.1017 ...
##   ..$ ppvca  : num [1:50000] 0.425 0.597 0.334 0.576 0.753 ...
##   ..$ ppvexp : num [1:50000] 0.438 0.597 0.388 0.608 0.75 ...
##   ..$ npvca  : num [1:50000] 0.981 0.989 0.988 0.993 0.992 ...
##   ..$ npvexp : num [1:50000] 0.98 0.989 0.985 0.992 0.992 ...
##   ..$ ab     : num [1:50000] 119 143 75 138 178 257 191 167 164 162 ...
##   ..$ bb     : num [1:50000] 1545 1521 1589 1526 1486 ...
##   ..$ cb     : num [1:50000] 385 451 321 434 521 837 606 472 540 508 ...
##   ..$ db     : num [1:50000] 4579 4513 4643 4530 4443 ...
##   ..$ corr_RR: num [1:50000] 0.936 0.955 0.743 0.957 1.016 ...
##   ..$ corr_OR: num [1:50000] 0.916 0.941 0.683 0.944 1.022 ...
##   ..$ rr_se_b: num [1:50000] 0.0831 0.0762 0.1062 0.0774 0.0685 ...
##   ..$ or_se_b: num [1:50000] 0.1089 0.1004 0.1315 0.1021 0.0918 ...
##   ..$ z      : num [1:50000] 0.0266 -0.6453 -0.7772 1.1848 0.5986 ...
##   ..$ tot_RR : num [1:50000] 0.934 1.003 0.807 0.874 0.975 ...
##   ..$ tot_OR : num [1:50000] 0.913 1.004 0.756 0.836 0.967 ...
##   ..$ syst_RR: num [1:50000] 0.918 0.94 0.905 0.943 0.954 ...
##   ..$ syst_OR: num [1:50000] 0.893 0.922 0.877 0.925 0.94 ...
##  $ reps        : num 50000
##  $ fun         : chr "probsens"
##  $ warnings    : NULL
##  - attr(*, "class")= chr [1:3] "episensr" "episensr.probsens" "list"

smoke.nd is a list of 7 elements in which the different pieces of information about the analysis are saved. smoke.nd$obs_data holds the observed 2-by-2 table, smoke.nd$obs_measures the observed measures of association, smoke.nd$adj_measures the adjusted measures of association, and smoke.nd$sim_df a data frame with the simulated values from each replication, such as the Se and Sp, the four cells of the adjusted 2-by-2 table, and the adjusted measures. episensr provides plot functions to display this saved information directly, for example the Se prior distribution:

plot(smoke.nd, "seca")
Sensitivity prior distribution.
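You can also work with these elements directly, using the element names reported by str() above:

smoke.nd$adj_measures                      # bias-adjusted measures as a matrix
head(smoke.nd$sim_df[, c("seca", "spca", "corr_RR", "corr_OR")])  # first simulated draws
quantile(smoke.nd$sim_df$corr_OR, probs = c(0.025, 0.5, 0.975))   # corrected ORs across replications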

There are combinations of Se, Sp, and disease (or exposure) prevalence that can produce negative cells in the corrected 2-by-2 table. For exposure misclassification, this happens when the frequency of observed exposed cases is less than the total number of diseased individuals multiplied by the false-positive proportion, i.e., when the false-positive proportion is greater than the proportion of cases that are exposed. Providing Se and Sp values close to random classification (i.e., Se ≈ 50% and Sp ≈ 50%) can therefore yield negative cell values.
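As a rough illustration of that condition, here is a minimal sketch (plain R, not episensr code) using the standard simple-bias-correction formula for exposure misclassification among cases, A = (a - (1 - Sp) * M1) / (Se + Sp - 1), with a the observed exposed cases and M1 the total number of cases:

a <- 215                  # observed exposed cases (BC+, Smoke+)
M1 <- 215 + 1449          # total number of cases
corrected_a <- function(se, sp) (a - (1 - sp) * M1) / (se + sp - 1)
corrected_a(se = 0.8, sp = 0.95)  # positive: Sp is above 1 - a/M1
corrected_a(se = 0.6, sp = 0.55)  # close to random classification: negative cell
1 - a / M1                        # ~0.87: with Se + Sp > 1, Sp below this gives a negative cell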

Note that a message is displayed if the chosen distribution(s) lead to negative cell values, together with the number of iterations affected. It is recommended to shift the distributions upward if more than 10% of the iterations are deleted (Fox et al., 2005).

Probabilistic Sensitivity Analysis for Uncontrolled Confounding

Let’s illustrate probsens_conf() with an example in which the association between occupational resin exposure and lung cancer mortality is studied in the presence of an unmeasured potential confounder, smoking (Greenland et al., 1994).

         Resins exposed  Resins unexposed
Cases                45                94
Controls            257               945

After adjustment for age and year at death, Greenland et al. found an OR of 1.77 (95% CI 1.18 to 2.64). Did smoking, for which they had no data, affect this association? How large would the association between resins and smoking have to be to remove the association between resins and lung cancer after adjustment for smoking? To answer this, you need three bias parameters: the association between smoking and lung cancer, the prevalence of smoking among those unexposed to resins, and the prevalence of smoking among those exposed to resins.

A prior probability distribution is given to each bias parameter. The prevalences of smoking among those exposed to resins and among those unexposed to resins both receive uniform prior distributions between 0.40 and 0.70. The prior distribution for the odds ratio associating smoking with lung cancer mortality is log-normal, with 95% limits of 5 and 15. Its mean on the log scale (meanlog) is [ln(15) + ln(5)] / 2 = 2.159, and its standard deviation on the log scale (sdlog) is [ln(15) - ln(5)] / 3.92 = 0.28. It looks like this:

set.seed(123)
x <- rlnorm(10000, meanlog = 2.159, sdlog = 0.28)
quantile(x, c(0.025, 0.975))
##      2.5%     97.5% 
##  4.979672 14.959055
plot(density(x))
Log-normal distribution with meanlog = 2.159 and sdlog = 0.28.

Now, let’s run probsens_conf() with 50,000 iterations:

set.seed(123)
greenland <- probsens_conf(matrix(c(45, 94, 257, 945),
                                  dimnames = list(c("Cases+", "Cases-"), c("Res+", "Res-")),
                                  nrow = 2, byrow = TRUE),
                           reps = 50000,
                           prev_exp = list("uniform", c(.4, .7)),
                           prev_nexp = list("uniform", c(.4, .7)),
                           risk = list("log-normal", c(2.159, .28)))
## ℹ Calculating observed measures
## ! Samplings lead to 1969 instances in which
## sampled cell counts were zero and discarded.
## ✔ Simple bias analysis [286ms]
## ✔ Incorporating random error [77ms]
greenland
## 
## ── Observed data ───────────────────────────────────────────────────────────────
## • Outcome: Cases+
## • Comparing: Res+ vs. Res-
##        Res+ Res-
## Cases+   45   94
## Cases-  257  945
##                                      2.5%    97.5%
## Observed Relative Risk: 1.646999 1.182429 2.294094
##    Observed Odds Ratio: 1.760286 1.202457 2.576898
## ── Bias-adjusted measures ──
##                                            Median 2.5th percentile
##            RR (MH) -- systematic error:  1.654391         1.170568
## RR (MH) -- systematic and random error:  1.655420         1.028700
##            OR (SMR) -- systematic error: 1.866437         1.212980
## OR (SMR) -- systematic and random error: 1.870259         1.090092
##                                          97.5th percentile
##            RR (MH) -- systematic error:           2.328447
## RR (MH) -- systematic and random error:           2.652687
##            OR (SMR) -- systematic error:          2.916181
## OR (SMR) -- systematic and random error:          3.248255

The median adjusted OR is 1.87 [1.09, 3.25]. The ratio of the two 95% interval bounds is 3.25/1.09 = 2.98, compared with 2.57/1.20 = 2.14 for the conventional 95% confidence limits. Although the adjusted interval is about 40% wider on this ratio scale, it still excludes 1.
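To make that comparison explicit, plain arithmetic on the limits printed above:

3.248255 / 1.090092                             # bias-adjusted OR limits: ~2.98
2.576898 / 1.202457                             # conventional OR limits: ~2.14
(3.248255 / 1.090092) / (2.576898 / 1.202457)   # ~1.39, i.e. about 40% wider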

You can plot the bias-adjusted RR including both systematic and random error, and even get a forest plot of the odds ratios:

plot(greenland, "rr_tot")
Distribution of the 50,000 confounder-adjusted relative risks (systematic and random error).
plot(greenland, "forest_or") + theme_classic()
Forest plot of odds ratios.