After carrying out a divergence time analysis with
Beast2
, for example, we might be interested in comparing
the prior and the posterior of node ages. We can use the function
crossplot
to generate a plot where the x-axis represents
one analysis and the y-axis another, for instance, prior versus
posterior:
# load the package
library(tbea)
# crossplot operates over files or dataframes. Let's create
# two dataframes to exemplified the desirable structure of input data
log1 <- data.frame(sample=seq(from=1, to=10000, by = 100),
node1=rnorm(n =100, mean=41, sd=0.5),
node2=rnorm(n =100, mean=50, sd=1),
node3=rnorm(n =100, mean=25, sd=1))
log2 <- data.frame(sample=seq(from=1, to=10000, by = 100),
node1=rnorm(n =100, mean=41, sd=0.2),
node2=rnorm(n =100, mean=50, sd=0.8),
node3=rnorm(n =100, mean=25, sd=0.5))
head(log1)
## sample node1 node2 node3
## 1 1 41.31601 51.74525 23.02448
## 2 101 40.66821 50.77049 23.20232
## 3 201 40.46920 48.64849 23.19390
## 4 301 41.16130 47.93125 23.11945
## 5 401 41.05605 50.77709 25.37174
## 6 501 41.20240 50.11384 25.17615
# run crossplot over nodes 1 and 2 using 'idx.cols' instead of 'pattern', and
# plot the mean instead of the median.
crossplot(log1, log2,
idx.cols=c(2,3),
stat="mean",
bar.lty=1,
bar.lwd=1,
identity.lty=2,
identity.lwd=1,
extra.space=0.5,
main="My first crossplot",
xlab="log 1",
ylab="log 2",
pch=19)
# now, load empirical data
data(cynodontidae.prior)
data(cynodontidae.posterior)
# as crossplot operates also over files, let's create temporal
# files for illustration
write.table(cynodontidae.prior, "prior.tsv",
row.names=FALSE, col.names=TRUE, sep="\t")
write.table(cynodontidae.posterior, "posterior.tsv",
row.names=FALSE, col.names=TRUE, sep="\t")
# crossplot
crossplot(log1="prior.tsv",
log2="posterior.tsv",
stat="median",
skip.char="#",
pattern="mrca.date",
bar.lty=1,
bar.lwd=2,
identity.lty=2,
identity.lwd=2,
main="Prior-posterior comparison\nCynodontidae",
xlab="Prior node age (Ma)",
ylab="Posterior node age (Ma)",
pch=20, cex=2, col="blue2")
This kind of plot has been used in the literature when comparing
prior and posterior MCMC samples, as well as when comparing the same
kind of estimates coming from different independent runs or types of
analysis. The function measureSimil
integrates the area
under the curve defined as the intersection between both distributions.
It is a descriptive measure of how similar two distributions are. The
function can both plot the resulting distributions and their
intersection, as well as print out its value, or skip the plot and just
return the value:
# integrate the area under the curve
measureSimil(d1=cynodontidae.prior$mrca.date.backward.Hydrolycus.,
d2=cynodontidae.posterior$mrca.date.backward.Hydrolycus.,
ylim=c(0, 5),
xlab="Age (Ma)",
ylab="Density",
main="Similarity between prior and posterior\nof the node Hydrolycus")
## [1] 0.8078552
## [1] TRUE
## [1] TRUE