This is a minor release introducing changes meant to focus S3 methods
and utility functions around two core classes: causal_model
and model_query
. Our aim is to improve the user experience
of CausalQueries
by focusing user facing functionality more
clearly around the workflow of making, updating, querying and inspecting
causal models. With respect to causal_model
objects this
release introduces more expressive and concise S3 summary and print
methods for the causal_model
class and its internal
objects. Updates to the grab()
and inspect()
functions streamline access to objects contained within a
causal_model
, facilitating more advanced use-cases or
deeper review. This release introduces the model_query
class along with S3 summary, print and plot methods for a more seamless
querying workflow. Finally, this release removes dependency on
dagitty
, restoring compatibility of
CausalQueries
with systems on which V8
JavaScript
WASM
is not supported.
The summary()
method for objects of class
causal_model
now supports an include
argument
allowing users to specify additional objects internal to the
causal_model
object for which they would like to have
summaries appended to the main output of summary()
.
Summaries have additionally been made more informative and readable.
Please see ?summary.causal_model
for extensive
documentation on the new functionality.
Internal objects of a causal_model
instance can now be
returned quietly via grab()
eliminating the need to
interact with a causal_model
instance directly.
The newly introduced model_query
class comes with a
print, summary and plot method. plot()
generates a
coefficient plot with credible intervals for evaluated queries.
This is a patch release fixing a bug in the
print.model_query()
S3 method that occurred when querying
models using paramters
.
Accessing causal-model
objects via get_
methods e.g. get_nodal_types()
, get_parameters
is no longer supported. Objects may now be accessed via a unified syntax
through the inspect()
function (see New Functionality). The
following functions are no longer exported:
get_causal_types()
get_nodal_types()
get_all_data_types()
get_event_probabilities()
get_ambiguities_matrix()
get_parameters()
get_parameter_names()
get_parmap()
get_parameter_matrix()
get_priors()
get_param_dist()
get_type_prob_multiple()
inspect()
causal-model
objects can now be accessed via
inspect()
like so:
inspect(model, "parameters_df")
See documentation for an exhaustive list of accessible objects.
causal-model
objects now additionally come with dedicated
print
methods returning short informative summaries of the
given object.
A summary of parameter values and convergence information produced by
the update_model()
Stan
model can now be
accessed via:
inspect(model, "stan_summary")
Advanced model diagnostics on raw Stan
output via
external packages is possible by saving the stan_fit
object
when updating. This is facilitated via the keep_fit
option
in update_model()
:
model <- make_model("X -> Y") |>
update_model(data, keep_fit = TRUE)
model |> inspect("stanfit")
nodal_types
to make_model()
now
implements correct error handlingPreviously this
make_model("X -> Y" , nodal_types = list(Y = c("0", "1")))
was permissible leading to setting nodal_types
:
$X
NULL
$Y
[1] "0" "1"
This led to undefined behavior and unhelpful downstream error
messages. When passing nodal_types
to
make_model()
users are now forced to specify a set of
nodal_types
on each node.
query_distribution()
are no longer overwrites type
distribution internallymake_model()
Previously hyphenated names would not throw an error and be corrupted
silently through the conversion of model definition strings into
dagitty
objects.
make_model("institutions -> political-inequality")
Statement:
[1] "institutions -> political-inequality"
DAG:
parent children
1 institutions political
Checks for correct variable naming are now reinstated.
Calls to sapply()
have ben replaced with
vapply()
wherever possible to enforce type safety.
Looping via index has been replaced by range based looping wherever possible to guard against 0 length exceptions.
goodpractice::gp()
goodpractice
code improvements have been
implemented.
query_distribution()
now supports the use of multiple
queries in one function call and thus returns a DataFrame
of distribution draws instead of a single numeric vector.
query_distribution()
: now supports the specification of
multiple queries and givens to be evaluated on a single model in one
function call.
model <- make_model("X -> Y")
query_distribution(model,
query = list("(Y[X=1] > Y[X=0])", "(Y[X=1] < Y[X=0])"),
given = list("Y==1", "(Y[X=1] <= Y[X=0])"),
using = "priors")|>
head()
query_model()
: now supports the specification of
multiple models to evaluate a set of queries on in one function
call.
models <- list(
M1 = make_model("X -> Y"),
M2 = make_model("X -> Y") |> set_restrictions("Y[X=1] < Y[X=0]")
)
query_model(
models,
query = list(ATE = "Y[X=1] - Y[X=0]", Share_positive = "Y[X=1] > Y[X=0]"),
given = c(TRUE, "Y==1 & X==1"),
using = c("parameters", "priors"),
expand_grid = FALSE)
query_model(
models,
query = list(ATE = "Y[X=1] - Y[X=0]", Share_positive = "Y[X=1] > Y[X=0]"),
given = c(TRUE, "Y==1 & X==1"),
using = c("parameters", "priors"),
expand_grid = TRUE)
This eliminates the need for redundant function calls when querying models and substantially improves computation time as computationally expensive function calls to produce data structures required for querying are now reduced to a minimum via redundancy elimination and caching.
realise_outcomes()
: specifying the node
option now produces a DataFrame
detailing how the specified
node responds to its parents in the presence or absence of do
operations. This produces a reduced form of the usual
realise_outcomes()
output detailing all causal-types; and
aids in the interpretation of both nodal- and causal-types. This update
resolves previous bugs and errors relating to specification of nodes
with multiple parents in the node
option.
model <- make_model("X1 -> M -> Y -> Z; X2 -> Y") |>
realise_outcomes(dos = list(M = 1), node = "Y")
Previously set_parameters()
and
set_priors()
would default applying changes in the order in
which parameters appeared in the parameters_df
DataFrame
; regardless of the order in which changes were
specified in the aforementioned functions. Calling:
model <- make_model("X -> Y")
set_priors(model, alphas = c(3,4), nodal_type = c("10",00))
would results in the following parameters_df
.
param_names node gen param_set nodal_type given param_value priors
<chr> <chr> <int> <chr> <chr> <chr> <dbl> <dbl>
1 X.0 X 1 X 0 "" 0.5 1
2 X.1 X 1 X 1 "" 0.5 1
3 Y.00 Y 2 Y 00 "" 0.25 3
4 Y.10 Y 2 Y 10 "" 0.25 4
5 Y.01 Y 2 Y 01 "" 0.25 1
6 Y.11 Y 2 Y 11 "" 0.25 1
Now changes to parameters values get applied in the order specified
in the function call; resulting in the following
parameters_df
for the above example:
param_names node gen param_set nodal_type given param_value priors
<chr> <chr> <int> <chr> <chr> <chr> <dbl> <dbl>
1 X.0 X 1 X 0 "" 0.5 1
2 X.1 X 1 X 1 "" 0.5 1
3 Y.00 Y 2 Y 00 "" 0.25 4
4 Y.10 Y 2 Y 10 "" 0.25 3
5 Y.01 Y 2 Y 01 "" 0.25 1
6 Y.11 Y 2 Y 11 "" 0.25 1
Additionally we have implemented helpful warnings for when
instructions identifying parameters to be updated are under specified.
This is particularly useful when setting priors or parameters on models
with confounding as changes may inadvertently be applied across
param_sets
.
Previously updating models with censored types would fail as 0s in
the w
vector induced by censoring would evaluate to -Inf as
the Stan
MCMC algorithm began sampling from the posterior
of the multinational distribution. We resolved this issue by pruning the
w
vector when the multinomial is run. This preserves the
true w
vector (event probabilities without censoring) while
still updating with the censored data-
Previously wildcards
in set_restrictions()
were erroneously interpreted as valid nodal types, leading to errors and
undefined behavior. Proper unpacking and mapping of
wildcards
to existing nodal types has been restored.
Previously misspecifications in queries like Y[X==1]=1
would lead to undefined behavior when mapping queries to nodal or causal
types. We now correct misspecified queries internally and warn about the
misspecification. For example; running:
model <- CausalQueries::make_model("X -> Y")
get_query_types(model, "Y[X=1]=1")
now produces
Causal types satisfying query's condition(s)
query = Y[X=1]==1
X0.Y01 X1.Y01
X0.Y11 X1.Y11
Number of causal types that meet condition(s) = 4
Total number of causal types in model = 8
Warning message:
In check_query(query) :
statements to the effect that the realization of a node should equal some value should be specified with `==` not `=`.
The query has been changed accordingly: Y[X=1]==1
Previously a parameter matrix P
that was attached to a
causal_model
object could not be overwritten. Overwrites
are now possible.
realise_outcomes()
We achieved a ~100 fold speed gain in the
realise_outcomes()
functionality. Nodal types on a given
node are generated as the Cartesian product of parent realizations.
Consider the meaning of nodal types on a node \(Y\) with 3 parents \([X1,X2,X3]\):
X1 | X2 | X3 |
---|---|---|
0 | 0 | 0 |
1 | 0 | 0 |
0 | 1 | 0 |
1 | 1 | 0 |
0 | 0 | 1 |
1 | 0 | 1 |
0 | 1 | 1 |
1 | 1 | 1 |
Each row in the above DataFrame
corresponds to a digit
in Y's
nodal types. The first digit of each nodal type of
\(Y\) (see first row above),
corresponds to the realization of \(Y\)
when \(X1 = 0, X2 = 0, X3 = 0\). The
fourth digit of each nodal type of \(Y\) (see fourth row above), corresponds to
the realization of \(Y\) when \(X1 = 1, X2 = 1, X3 = 0\). Finding the
position of the realization value of \(Y\) in a nodal type given parent
realizations is equivalent to finding the row number in the Cartesian
product DataFrame
. By definition of the Cartesian product,
the number of consecutive 0 or 1 elements in a given column is \(2^{columnindex}\), when indexing columns
from 0. Given a set of parent realizations \(R\) indexed from 0, the corresponding row
in a number in a DataFrame
indexed from 0 can thus be
computed via: \[row = (\sum_{i = 0}^{|R| - 1}
(2^{i} \times R_i))\]. We implement a fast C++
version of this computing powers of 2 via bit shifting.
Stan
updateWe updated to the new array syntax introduced in Stan
v2.33.0