Posterior tree distribution

Gustavo A. Ballen and Sandra Reinales

2025-08-19

There are two steps necessary for summarizing tree samples. One is to decide how many distinct topologies (disregarding branch lengths) are in the collection of trees, and then we need to summarize branch lengths for each of these tree subsets using any sensible summary measure (e.g., mean or median). The functions topoFreq and summaryBrlen aid in these two tasks respectively. topoFreq uses the Robinson-Foulds (RF) distance among trees in order to find clusters which are identical in topology and then bind them together in a list that we can later operate over. The RF distance is defined over unrooted trees, so it is necessary to check whether the input trees are. Unrooting a rooted tree will destroy the root node and then add up together both adjacent branch lengths, and this will be reflected when summarizing withsummaryBrlens using the output of topoFreq. This will return one branch length less than the number present in the original sample of rooted trees. Accordingly, this method will only work on collections of unrooted trees, or collections of time trees of fixed topology where we are only interested in finding a summary values for branch lengths. It should not be used when both the topology and divergence times are being coestimated (e.g., when the posterior of trees comes from not fixing the topology as in Beast2, RevBayes, and MrBayes), but it is perfectly fine when the inference was done to estimate only the unrooted topology and branch lengths, such as in standard tree inference in MrBayes.

We see an example using these two functions below where we want to use the median branch length for each group:

# load the packages
library(ape)
library(tbea)

# these urls render correctly on html but not on pdf. The former is more important for CRAN.
# download the trees from the supplementary material of Ballen et al. (2022)
t1<-"https://raw.githubusercontent.com/gaballench/cynodontidaeWare/refs/heads/main/bayesian/mrbayes"
t2<-"https://raw.githubusercontent.com/gaballench/cynodontidaeWare/refs/heads/main/bayesian/mrbayes"

mtr1 <- read.nexus(paste(t1, "concatenatedMolmorph.nexus.run1.t", sep="/"))
mtr2 <- read.nexus(paste(t2, "concatenatedMolmorph.nexus.run2.t", sep="/"))

trees <- c(mtr1, mtr2)

# calculate topological frequencies
tpf <- topoFreq(trees, output="trees")

# summarize median branch length
sumtrees <- summaryBrlen(tpf$trees, method="median")

# sort the frequencies in decreasing order
decreasingIdx <- order(tpf$fabs, decreasing=TRUE)

# how many topologies comprise about 90% of the trees?
sum(cumsum(tpf$fabs[decreasingIdx])/sum(tpf$fabs)<0.9)
## [1] 4
# plot these four trees with branch lengths already summarized
par(mfrow=c(2,2))
par(oma=c(0, 0, 0, 0))
par(mai=c(0.5, 0.2, 0.5, 0.2))
for(i in 1:4){
    plot(sumtrees[[decreasingIdx[i]]],
         type="unrooted",
         show.node.label=FALSE,
         cex=0.4,
         main=round(tpf$frel[decreasingIdx[i]], digits=2))
}