Title: | The Directed Prediction Index |
Version: | 2025.8 |
Date: | 2025-08-15 |
Maintainer: | Han Wu Shuang Bao <baohws@foxmail.com> |
Description: | The Directed Prediction Index ('DPI') is a simulation-based method for quantifying the relative endogeneity (relative dependence) of outcome (Y) versus predictor (X) variables in multiple linear regression models. By comparing the proportion of variance explained (R-squared) between the Y-as-outcome model and the X-as-outcome model while controlling for a sufficient number of potential confounding variables, it suggests a more plausible influence direction from a more exogenous variable (X) to a more endogenous variable (Y). Methodological details are provided at https://psychbruce.github.io/DPI/. |
License: | GPL-3 |
Encoding: | UTF-8 |
URL: | https://psychbruce.github.io/DPI/ |
BugReports: | https://github.com/psychbruce/DPI/issues |
Depends: | R (≥ 4.0.0) |
Imports: | glue, crayon, cli, ggplot2, cowplot, qgraph, bnlearn |
Suggests: | bruceR |
RoxygenNote: | 7.3.2 |
NeedsCompilation: | no |
Packaged: | 2025-08-20 01:59:58 UTC; Bruce |
Author: | Han Wu Shuang Bao |
Repository: | CRAN |
Date/Publication: | 2025-08-20 03:30:02 UTC |
DPI: The Directed Prediction Index
Description
The Directed Prediction Index ('DPI') is a simulation-based method for quantifying the relative endogeneity (relative dependence) of outcome (Y) versus predictor (X) variables in multiple linear regression models. By comparing the proportion of variance explained (R-squared) between the Y-as-outcome model and the X-as-outcome model while controlling for a sufficient number of potential confounding variables, it suggests a more plausible influence direction from a more exogenous variable (X) to a more endogenous variable (Y). Methodological details are provided at https://psychbruce.github.io/DPI/.
Author(s)
Maintainer: Han Wu Shuang Bao <baohws@foxmail.com>
See Also
Useful links:
- https://psychbruce.github.io/DPI/
- Report bugs at https://github.com/psychbruce/DPI/issues
The Directed Prediction Index (DPI).
Description
The Directed Prediction Index (DPI) is a simulation-based method for quantifying the relative endogeneity (relative dependence) of outcome (Y) vs. predictor (X) variables in multiple linear regression models. By comparing the proportion of variance explained (R-squared) between the Y-as-outcome model and the X-as-outcome model while controlling for a sufficient number of potential confounding variables, it suggests a more plausible influence direction from a more exogenous variable (X) to a more endogenous variable (Y). Methodological details are provided at https://psychbruce.github.io/DPI/.
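The simulation idea described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the helper name sim_dpi_once is hypothetical, the random covariates are assumed to be independent standard normal noise (the package may generate them differently), and the index uses the definition given in the Value section of this page, t.beta.xy^2 * (R2.Y - R2.X).

```r
# Hypothetical helper (not part of the DPI package): one simulation sample.
sim_dpi_once <- function(data, y, x, k.cov = 1) {
  d <- na.omit(data)
  # Assumption: random covariates are independent standard normal noise.
  noise <- as.data.frame(matrix(rnorm(nrow(d) * k.cov), ncol = k.cov))
  names(noise) <- paste0("Cov", seq_len(k.cov))
  d <- cbind(d, noise)
  # Y-as-outcome and X-as-outcome models, each using all other variables.
  fit_y <- lm(as.formula(paste(y, "~ .")), data = d)
  fit_x <- lm(as.formula(paste(x, "~ .")), data = d)
  t_xy <- coef(summary(fit_y))[x, "t value"]
  t_xy^2 * (summary(fit_y)$r.squared - summary(fit_x)$r.squared)
}

set.seed(1)
sims <- replicate(200, sim_dpi_once(airquality, "Ozone", "Solar.R", k.cov = 1))
summary(sims)  # distribution of the simulated index across samples
```

A consistently positive distribution would be read, under these assumptions, as Y (here Ozone) being the more endogenous variable of the pair.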
Usage
DPI(
  model,
  y,
  x,
  data = NULL,
  k.cov = 1,
  n.sim = 1000,
  seed = NULL,
  progress,
  file = NULL,
  width = 6,
  height = 4,
  dpi = 500
)
Arguments
model | Model object (e.g., fitted with lm()). |
y | Dependent (outcome) variable. |
x | Independent (predictor) variable. |
data | [Optional] Defaults to NULL. |
k.cov | Number of random covariates (simulating potential omitted variables) added to each simulation sample. Defaults to 1. |
n.sim | Number of simulation samples. Defaults to 1000. |
seed | Random seed for replicable results. Defaults to NULL. |
progress | Show progress bar. |
file | File name of saved plot. Defaults to NULL. |
width, height | Width and height (in inches) of saved plot. Defaults to 6 and 4. |
dpi | Dots per inch (figure resolution). Defaults to 500. |
Value
Return a data.frame of simulation results:
- DPI: t.beta.xy^2 * (R2.Y - R2.X)
- t.beta.xy: t value for coefficient of X predicting Y (always equal to t value for coefficient of Y predicting X) when controlling for all other covariates
- df.beta.xy: residual degrees of freedom (df) of t.beta.xy
- r.partial.xy: partial correlation (always with the same t value as t.beta.xy) between X and Y when controlling for all other covariates
- delta.R2: R2.Y - R2.X
- R2.Y: R^2 of regression model predicting Y using X and all other covariates
- R2.X: R^2 of regression model predicting X using Y and all other covariates
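The returned quantities can be reproduced by hand for a single pair of models. The sketch below uses only base R on the airquality data, without any random covariates, so its numbers are not expected to match the simulation output of DPI(). The conversion r = t / sqrt(t^2 + df) is the standard relation between a t value and a partial correlation; the object names mirror the column names listed above.

```r
d <- na.omit(airquality)

fit_y <- lm(Ozone ~ ., data = d)     # Y-as-outcome model
fit_x <- lm(Solar.R ~ ., data = d)   # X-as-outcome model

R2.Y <- summary(fit_y)$r.squared
R2.X <- summary(fit_x)$r.squared
delta.R2 <- R2.Y - R2.X

# t value of X predicting Y, and of Y predicting X (these coincide)
t.beta.xy <- coef(summary(fit_y))["Solar.R", "t value"]
t.beta.yx <- coef(summary(fit_x))["Ozone", "t value"]
df.beta.xy <- fit_y$df.residual

# Partial correlation recovered from the t value and its residual df
r.partial.xy <- t.beta.xy / sqrt(t.beta.xy^2 + df.beta.xy)

# Cross-check: correlate residuals after removing all other covariates
res_y <- resid(lm(Ozone ~ . - Solar.R, data = d))
res_x <- resid(lm(Solar.R ~ . - Ozone, data = d))

DPI.value <- t.beta.xy^2 * delta.R2
c(t.xy = t.beta.xy, t.yx = t.beta.yx,
  r.partial = r.partial.xy, r.resid = cor(res_x, res_y),
  delta.R2 = delta.R2, DPI = DPI.value)
```

Because the squared t value is non-negative, the sign of the index is carried entirely by delta.R2.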
Examples
model = lm(Ozone ~ ., data=airquality)
DPI(model, y="Ozone", x="Solar.R", seed=1)
DPI(data=airquality, y="Ozone", x="Solar.R", k.cov=10, seed=1)
The DPI curve analysis.
Description
The DPI curve analysis.
Usage
DPI_curve(
  model,
  y,
  x,
  data = NULL,
  k.covs = 1:10,
  n.sim = 1000,
  seed = NULL,
  file = NULL,
  width = 6,
  height = 4,
  dpi = 500
)
Arguments
model | Model object (e.g., fitted with lm()). |
y | Dependent (outcome) variable. |
x | Independent (predictor) variable. |
data | [Optional] Defaults to NULL. |
k.covs | An integer vector of numbers of random covariates (simulating potential omitted variables) added to each simulation sample. Defaults to 1:10. |
n.sim | Number of simulation samples. Defaults to 1000. |
seed | Random seed for replicable results. Defaults to NULL. |
file | File name of saved plot. Defaults to NULL. |
width, height | Width and height (in inches) of saved plot. Defaults to 6 and 4. |
dpi | Dots per inch (figure resolution). Defaults to 500. |
Value
Return a data.frame of DPI curve results.
Examples
model = lm(Ozone ~ ., data=airquality)
DPIs = DPI_curve(model, y="Ozone", x="Solar.R", seed=1)
plot(DPIs) # ggplot object
[S3 methods] for DPI() and DPI_curve().
Description
- summary(dpi): Summarize DPI results. Return a list (class summary.dpi) of summarized results and raw DPI data.frame.
- print(summary.dpi): Print DPI summary.
- plot(dpi): Plot DPI results. Return a ggplot object.
- print(dpi): Print DPI summary and plot.
- plot(dpi.curve): Plot DPI curve analysis results. Return a ggplot object.
Usage
## S3 method for class 'dpi'
summary(object, ...)
## S3 method for class 'summary.dpi'
print(x, digits = 3, ...)
## S3 method for class 'dpi'
plot(x, file = NULL, width = 6, height = 4, dpi = 500, ...)
## S3 method for class 'dpi'
print(x, digits = 3, ...)
## S3 method for class 'dpi.curve'
plot(x, file = NULL, width = 6, height = 4, dpi = 500, ...)
Arguments
object | Object (class dpi). |
... | Other arguments (currently not used). |
x | Object (class summary.dpi, dpi, or dpi.curve, depending on the method). |
digits | Number of decimal places. Defaults to 3. |
file | File name of saved plot. Defaults to NULL. |
width, height | Width and height (in inches) of saved plot. Defaults to 6 and 4. |
dpi | Dots per inch (figure resolution). Defaults to 500. |
[S3 methods] for cor_network() and dag_network().
Description
- print(cor.net): Plot (partial) correlation network results.
- print(dag.net): Plot Bayesian network (DAG) results.
Usage
## S3 method for class 'cor.net'
print(x, file = NULL, width = 6, height = 4, dpi = 500, ...)
## S3 method for class 'dag.net'
print(
  x,
  file = NULL,
  width = 6,
  height = 4,
  dpi = 500,
  algorithm = names(x),
  ...
)
Arguments
x | Object (class cor.net or dag.net). |
file | File name of saved plot. Defaults to NULL. |
width, height | Width and height (in inches) of saved plot. Defaults to 6 and 4. |
dpi | Dots per inch (figure resolution). Defaults to 500. |
... | Other arguments (currently not used). |
algorithm | [For dag.net] Defaults to names(x), i.e., all algorithms stored in x. |
Value
Invisibly return a grob
object ("Grid Graphical Object", or a list of them) that can be further reused in ggplot2::ggsave()
and cowplot::plot_grid()
.
Correlation and partial correlation networks.
Description
Correlation and partial correlation networks (also called Gaussian graphical models, GGMs).
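For orientation, the two edge weights that such networks display can be computed in base R. The sketch below is not how cor_network() is implemented (that is not shown in this manual); it uses the standard identity that partial correlations are the negated, standardized off-diagonal entries of the inverse correlation matrix, and it drops incomplete rows with na.omit() as a simplifying assumption.

```r
d <- na.omit(airquality)

# Zero-order correlations: edge weights of a correlation network
R <- cor(d)

# Partial correlations: each pair controlling for all other variables
P <- solve(R)          # inverse correlation (precision) matrix
pcor <- -cov2cor(P)    # negate and standardize the off-diagonal entries
diag(pcor) <- 1

round(R["Ozone", "Solar.R"], 3)     # correlation
round(pcor["Ozone", "Solar.R"], 3)  # partial correlation
```

The partial correlation for a pair equals the correlation between the residuals of the two variables after regressing each on all remaining variables, which links this network to the r.partial.xy column returned by DPI().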
Usage
cor_network(
  data,
  index = c("cor", "pcor"),
  show.value = TRUE,
  show.insig = FALSE,
  show.cutoff = FALSE,
  faded = FALSE,
  node.text.size = 1.2,
  node.group = NULL,
  node.color = NULL,
  edge.color.pos = "#0571B0",
  edge.color.neg = "#CA0020",
  edge.color.non = "#EEEEEEEE",
  edge.label.mrg = 0.01,
  title = NULL,
  file = NULL,
  width = 6,
  height = 4,
  dpi = 500,
  ...
)
Arguments
data | Data. |
index | Type of graph: "cor" (correlation network, the default) or "pcor" (partial correlation network). |
show.value | Show correlation coefficients and their significance on edges. Defaults to TRUE. |
show.insig | Show edges with insignificant correlations (p > 0.05). Defaults to FALSE. |
show.cutoff | Show cut-off values of correlations. Defaults to FALSE. |
faded | Transparency of edges according to the effect size of correlation. Defaults to FALSE. |
node.text.size | Scalar on the font size of node (variable) labels. Defaults to 1.2. |
node.group | A list that indicates which nodes belong together, with each element of list as a vector of integers identifying the column numbers of variables that belong together. |
node.color | A vector with a color for each element in node.group. |
edge.color.pos | Color for (significant) positive values. Defaults to "#0571B0". |
edge.color.neg | Color for (significant) negative values. Defaults to "#CA0020". |
edge.color.non | Color for insignificant values. Defaults to "#EEEEEEEE". |
edge.label.mrg | Margin of the background box around the edge label. Defaults to 0.01. |
title | Plot title. |
file | File name of saved plot. Defaults to NULL. |
width, height | Width and height (in inches) of saved plot. Defaults to 6 and 4. |
dpi | Dots per inch (figure resolution). Defaults to 500. |
... | Arguments passed on to qgraph::qgraph(). |
Value
Return a list (class cor.net) of (partial) correlation results and qgraph object with its grob (Grid Graphical Object).
Examples
# correlation network
cor_network(airquality)
cor_network(airquality, show.insig=TRUE)
# partial correlation network
cor_network(airquality, "pcor")
cor_network(airquality, "pcor", show.insig=TRUE)
Directed acyclic graphs (DAGs) via Bayesian networks (BNs).
Description
Directed acyclic graphs (DAGs) via Bayesian networks (BNs). It uses bnlearn::boot.strength() to estimate the strength of each edge as its empirical frequency over a set of networks learned from bootstrap samples. It computes (1) the probability of each edge (modulo its direction) and (2) the probabilities of each edge's directions conditional on the edge being present in the graph (in either direction). Stability thresholds are usually set as 0.85 for strength (i.e., an edge appearing in more than 85% of the BNs learned from bootstrap samples) and 0.50 for direction (i.e., a direction appearing in more than 50% of those BNs) (Briganti et al., 2023). Finally, for each chosen algorithm, it returns the stable Bayesian network as the final DAG.
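The thresholding step can be illustrated on a toy edge table. The values below are made up for illustration, and the column layout (from, to, strength, direction) is assumed to follow the output of bnlearn::boot.strength(); the strict "greater than" comparisons mirror the wording above and may differ from the exact rule the package applies.

```r
# Hypothetical bootstrap results for three variables (illustrative values only)
edges <- data.frame(
  from      = c("A", "B", "A", "C", "B", "C"),
  to        = c("B", "A", "C", "A", "C", "B"),
  strength  = c(0.95, 0.95, 0.60, 0.60, 0.90, 0.90),
  direction = c(0.80, 0.20, 0.55, 0.45, 0.50, 0.50)
)

# Keep an arc only if the edge is stable AND its direction is stable
stable <- subset(edges, strength > 0.85 & direction > 0.50)
stable
```

In this toy table only A -> B survives: the A-C edge is too weak, and the B-C edge is strong but has no dominant direction.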
Usage
dag_network(
  data,
  algorithm = c("pc.stable", "hc", "rsmax2"),
  algorithm.args = list(),
  n.boot = 1000,
  seed = NULL,
  strength = 0.85,
  direction = 0.5,
  node.text.size = 1.2,
  edge.width.max = 1.5,
  edge.label.mrg = 0.01,
  file = NULL,
  width = 6,
  height = 4,
  dpi = 500,
  ...
)
Arguments
data | Data. |
algorithm | Structure learning algorithms for building Bayesian networks (BNs). Should be function name(s) from the bnlearn package. Defaults to the most common algorithms: "pc.stable", "hc", and "rsmax2". |
algorithm.args | An optional list of extra arguments passed to the algorithm. |
n.boot | Number of bootstrap samples (for learning a more "stable" network structure). Defaults to 1000. |
seed | Random seed for replicable results. Defaults to NULL. |
strength | Stability threshold of edge strength: the minimum proportion (probability) of BNs (among the n.boot bootstrap samples) in which an edge appears. Defaults to 0.85. |
direction | Stability threshold of edge direction: the minimum proportion (probability) of BNs (among the n.boot bootstrap samples) in which a direction appears. Defaults to 0.5. |
node.text.size | Scalar on the font size of node (variable) labels. Defaults to 1.2. |
edge.width.max | Maximum value of edge strength to scale all edge widths. Defaults to 1.5. |
edge.label.mrg | Margin of the background box around the edge label. Defaults to 0.01. |
file | File name of saved plot. Defaults to NULL. |
width, height | Width and height (in inches) of saved plot. Defaults to 6 and 4. |
dpi | Dots per inch (figure resolution). Defaults to 500. |
... | Arguments passed on to qgraph::qgraph(). |
Value
Return a list (class dag.net) of Bayesian network results and qgraph object with its grob (Grid Graphical Object).
References
Briganti, G., Scutari, M., & McNally, R. J. (2023). A tutorial on Bayesian networks for psychopathology researchers. Psychological Methods, 28(4), 947–961. doi:10.1037/met0000479
Burger, J., Isvoranu, A.-M., Lunansky, G., Haslbeck, J. M. B., Epskamp, S., Hoekstra, R. H. A., Fried, E. I., Borsboom, D., & Blanken, T. F. (2023). Reporting standards for psychological network analyses in cross-sectional data. Psychological Methods, 28(4), 806–824. doi:10.1037/met0000471
Scutari, M., & Denis, J.-B. (2021). Bayesian networks: With examples in R (2nd ed.). Chapman and Hall/CRC. doi:10.1201/9780429347436
Examples
bn = dag_network(airquality, seed=1)
bn
# bn$pc.stable
# bn$hc
# bn$rsmax2
## All DAG objects can be directly plotted
## or saved with print(..., file="xxx.png")
# bn$pc.stable$DAG.edge
# bn$pc.stable$DAG.strength
# bn$pc.stable$DAG.direction
# bn$pc.stable$DAG
# ...
## Not run:
print(bn, file="airquality.png")
# will save three plots with auto-modified file names:
# - "airquality_DAG.NET_BNs.01_pc.stable.png"
# - "airquality_DAG.NET_BNs.02_hc.png"
# - "airquality_DAG.NET_BNs.03_rsmax2.png"
# arrange multiple plots using cowplot::plot_grid()
# but still with unknown issue on incomplete figure
c1 = cor_network(airquality, "cor")
c2 = cor_network(airquality, "pcor")
bn = dag_network(airquality, seed=1)
cowplot::plot_grid(
  ~print(c1),
  ~print(c2),
  ~print(bn$hc$DAG),
  ~print(bn$rsmax2$DAG),
  labels="AUTO"
)
## End(Not run)
Generate random data.
Description
Generate random data.
Usage
data_random(k, n, seed = NULL)
Arguments
k | Number of variables. |
n | Number of observations (cases). |
seed | Random seed for replicable results. Defaults to NULL. |
Value
Return a data.frame of random data.
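The manual does not state how the random values are drawn. As a rough stand-in only, the hypothetical helper below (random_data_sketch is not part of the package) builds a data.frame with the same k, n, and seed interface from independent standard normal draws; both the distribution and the column names are assumptions and may differ from what data_random() produces.

```r
# Hypothetical stand-in, NOT the package's data_random()
random_data_sketch <- function(k, n, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)  # make the draw replicable
  # Assumption: independent standard normal variables
  d <- as.data.frame(matrix(rnorm(n * k), nrow = n, ncol = k))
  names(d) <- paste0("X", seq_len(k))  # assumed column names
  d
}

d <- random_data_sketch(k = 5, n = 100, seed = 1)
dim(d)   # 100 observations, 5 variables
head(d, 3)
```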
Examples
d = data_random(k=5, n=100, seed=1)
cor_network(d)