Added option noFixed to function rbnpsd
to redraw loci that were drawn fixed for a single allele. These loci are
not polymorphic so they would normally not be considered in
analyses.
Added function fixed_loci to test for fixed loci
within rbnpsd.
coanc_to_kinship to easily obtain
kinship matrices from coancestry matrices.qis now returns a numeric admixture proportions matrix
(used to be logical).q1d and q1dc now handle
sigma = 0 special case.q1d and q1dc now provide more informative
out-of-bounds messages when sigma is missing (and
s is provided)sigma root finding in q1d and
q1dc (when s is provided) is now more robust,
explicitly tested at boundaries (min s > 0 achieved at
sigma = 0 and max s = 1 achieved at
sigma = Inf).
interval and tol from
both q1d and q1dc (users would never need to
set them now that procedure is more robust).coanc -> coanc_admixq1d -> admix_prop_1d_linearq1dc -> admix_prop_1d_circularqis -> admix_prop_indep_subpopsrpanc -> draw_p_ancrpint -> draw_p_subpopsrpiaf -> make_p_ind_admixrgeno -> draw_genotypes_admixrbnpsd -> draw_all_admixfst -> fst_admix (no deprecated version
available in this case, to eliminate conflict with
popkin::fst)Q -> admix_proportionsF -> coanc_subpops (if general matrix
is accepted), inbr_subpops (vector or scalar versions
required)s -> bias_coeffw -> weightsTheta -> coancestrym -> m_locin -> n_indk -> k_subpopspAnc -> p_ancB -> p_subpopsP -> p_indsigma = 0 bug in
admix_prop_1d_circular.draw_all_admix (compared to
deprecated rbnpsd, which retains old defaults):
require_polymorphic_loci (old noFixed) is
now TRUE by default.want_p_ind and want_p_subpops (old
wantP and wantB) are now FALSE by
default.draw_p_subpops now admits scalar inputs
p_anc and inbr_subpops, while number of loci
and number of subpopulations can be provided as additional options.coanc_subpops) or if the diagonal matrix case is required
(specified as vector or scalar inbr_subpops).admix_prop_1d_circular) to
prevent overlapping individuals on the edges, and to better agree
visually with the linear version
(admix_prop_1d_linear).draw_genotypes_admix
low_mem option could be set but filled slowly by locus
only.draw_all_admix is also now automatically low-memory
whenever want_p_ind = FALSE, and the explicit
low_mem option has also been removed.beta in function draw_p_anc
to trigger a symmetric Beta distribution for the ancestral allele
frequencies, with the desired shape parameter. The beta
option can also be set on the wrapper function
draw_all_admix. This option allows simulation of a
distribution heavier on rare variants (when beta is much
smaller than 1), more similar to real human data.q1dc,
q1d, qis, coanc,
rbnpsd, rgeno, rpanc,
rpint, rpiaf.man/figures/bias_coeff_admix_fit, which caused
it to die if the desired bias coefficient was an extreme value
(particularly 1). The error message was:
f() values at end points not of opposite sign. The actual
bug was not observed in the regular R build, but rather in a limited
precision setting where R was configured with
--disable-long-double.p_anc to function
draw_all_admix, to specify desired ancestral allele
frequencies instead of having the code generate it randomly
(default).draw_p_subpops.R, clarifying that input p_anc
can be scalar.draw_all_admix: when option p_anc
is provided as scalar and want_p_anc = TRUE, now the return
value is always a vector (in this case the input scalar value repeated
m_loci times). The previous behavior was to return
p_anc as scalar if that was the input, which could be
problematic for downstream applications.admix_prop_1d_linear and
admix_prop_1d_circular had these changes:
bias_coeff,
coanc_subpops and fst now have default values
(of NA, NULL, and NA,
respectively) instead of missing, and these “missing” values can be
passed to get the same behavior as if they hadn’t been passed at
all.bias_coeff = 1 (to fix an issue only observed on Apple
M1).admix_prop_indep_subpops: default value for
the optional parameter subpops is now made more clear in
arguments definition.DESCRIPTION,
README.md and the vignette, to point to the published
method in PLoS Genetics.draw_p_subpops_tree is the tree version of
draw_p_subpops.coanc_tree calculates the true coancestry
matrix corresponding to the subpopulations related by a tree.draw_all_admix has new argument
tree_subpops that can be used in place of
inbr_subpops (to simulated subpopulation allele frequencies
using draw_p_subpops_tree instead of
draw_p_subpops).coanc_subpops) as input, so they work if they are passed
the matrix that coanc_tree returns:
coanc_admix, fst_admix,
admix_prop_1d_linear,
admix_prop_1d_circular.admix_prop_1d_linear and
admix_prop_1d_circular, when sigma is missing
(and therefore fit to a desired coanc_subpops,
fst, and bias_coeff), now additionally return
multiplicative factor used to rescale
coanc_subpops.It’s Fangorn Forest around here with all the tree updates!
fit_tree for fitting trees to coancestry matrices!scale_tree to easily scale coancestry trees and check
for out-of-bounds values.tree_additive for calculating “additive” edges for
probabilistic edge coancestry trees, and also the reverse function .
coanc_tree, but now it’s renamed, exported, and well
documented!$root.edge to tree phylo
objects passed to these functions:
coanc_tree: edge is a shared covariance value affecting
all subpopulations.draw_all_admix and draw_p_subpops_tree: if
root edge is present, functions warn that it will be ignored.admix_prop_1d_linear and
admix_prop_1d_circular: debugged an edge case where
sigma is small but not zero and numerically-calculated
densities all come out to zero in a given row of the
admix_proportions matrix (for
admix_prop_1d_circular infinite values also arise), which
used to lead to NAs upon row normalization; now for those rows, the
closest ancestry (by coordinate distance) gets assigned the full
admixture fraction (just as for independent
subpopulations/sigma = 0).admix_prop_1d_linear,
admix_prop_1d_circular now copy names from the input
coanc_subpops (vector and matrix versions, only required
when fitting bias_coeff) to the columns of the output
admix_proportions matrix.draw_genotypes_admix now copies row and column
names from input matrix p_ind (or rownames from
p_ind and column names from the rownames of
admix_proportions when the latter is provided) to output
genotype matrixdraw_p_subpops now copies names from
p_anc to rows, names from inbr_subpops to
columns, when present and of the right dimensions.draw_p_subpops_tree now copies names from
p_anc to rows. Names from tree_subpops were
already copied to columns before.coanc_admix and fst_admix stop
if the column names of admix_proportions and the names of
coanc_subpops disagree.draw_all_admix stops if the column names of
admix_proportions and the names of either
inbr_subpops or tree_subpops disagree.draw_genotypes_admix, when
admix_proportions is passed, stops if the column names of
admix_proportions and p_ind disagree.make_p_ind_admix stops if the column names of
admix_proportions and p_subpops disagree.tree_additive now has option
force, which when TRUE simply proceeds without
stopping if additive edges were already present (in
tree$edge.length.add, which is ignored and
overwritten).New functions and bug fixes dealing with reordering tree edges and tips.
tree_reindex_tips for ensuring that tip
order agrees in both the internal labels vector and the edge matrix.
Such lack of agreement is generally possible (technically the tree is
the same for arbitrary orders of edges in the edge matrix). However,
such a disagreement causes visual disagreement in plots (for example,
trees are plotted in the order of the edge matrix, versus coancestry
matrices are ordered as in the tip labels vector instead), which can now
be fixed in general.tree_reorder for reordering tree edges
and tips to agree as much as possible with a desired tip order. The
heuristic finds the exact solution if it exists, otherwise returns a
reasonable order close to the desired order. Tip order in labels and
edge matrix agree (via tree_reindex_tips).fit_tree now outputs trees with tip order that
better agrees with the input data, and tip order in labels vector and
edge matrix now agree (via tree_reorder).tree_additive. Before this bug fix, some trees
could trigger the error message “Error: Node index 6 was not assigned
coancestry from root! (unexpected)”, where “6” could be other
numbers.draw_p_subpops_tree. Before this bug fix, some
trees could trigger the error message “Error: The root node index in
tree_subpops$edge (9) does not match
k_subpops + 1 (6) where k_subpops is the
number of tips! Is the tree_subpops object malformed?”,
where “9” and “6” could be other numbers. Other possible error messages
contain “Parent node index 6 has not yet been processed …” or “Child”
instead of “Parent”, where “6” could be other numbers.fit_tree had related fixes,
but overall fit_tree appears to have had no bugs because
users cannot provide trees, and the tree-building algorithm does not
produce scrambled edges that would have caused problems.fixed_loci and draw_all_admix
have a new parameter maf_min that, when greater than zero,
allows for treating rare variants as fixed. In
draw_all_admix, this now allows for simulating loci with
frequency-based ascertainment bias.draw_all_admix that could cause a
“stack overflow” error. The function used to call itself recursively if
require_polymorphic_loci = TRUE, and in cases where there
are very rare allele frequencies or high maf_min the number
of recursions could be so large that it triggered this error. Now the
function has a while loop, and does not recurse more than
one level at the time; there is no limit to the number of iterations and
no errors occur inherently due to large numbers of iterations.fit_tree internally simplified to use
stats::hclust, which also results in a small runtime gain.
The new code (when method = "mcquitty", which is default)
gives the same answers as before (in other words, the original algorithm
was a special case of hierarchical clustering).
method is passed to hclust.
Although all hclust methods are allowed, for this
application the only ones that make sense are “mcquitty” (WPGMA) and
“average” (UPGMA). In internal evaluations, both algorithms had similar
accuracy and runtime, but only “mcquitty” exactly recapitulates the
original algorithm.inst/CITATION (missed last time I
updated them in other locations).undiff_af for creating
“undifferentiated” allele frequency distributions based on real data but
with a lower variance (more concentrated around 0.5) according to a
given FST, useful for simulating data trying to match real data.LICENSE.md.NEWS.md slightly to improve its
automatic parsing.undiff_af:
F_max, V_in, V_out,
V_mix, and alpha.distr = "auto" cases where mixing variance
ended up being smaller than required due to roundoff errors
(alpha is now larger than given in direct formula by
eps = 10 * .Machine$double.eps, which is also a new
option.draw_all_admix added option
p_anc_distr for passing custom ancestral allele frequency
distributions (as vector or function). This differs from the similar
preexisting option p_anc, which fixed ancestral allele
frequencies per locus to those values. These two options behave
differently when loci have to be re-drawn due to being fixed or having
too-low MAFs: passing p_anc never changes those values,
whereas passing p_anc_distr results in drawing new values
as necessary. The new option is more natural biologically and results in
re-drawing fixed loci less often.undiff_af renamed parameter F to
kinship_mean, and updated all documentation to reflect the
correction that this parameter is the mean kinship and not FST (the
complete derivation will appear in a manuscript).
F_max
is similarly now kinship_mean_max.bias_coeff_admix_fit) shared by
admix_prop_1d_linear and
admix_prop_1d_circular for edge cases.
Inf, but instead an error was encountered.stats::uniroot)