We will use the ex_counts dataset included with ecodive.
This feature table contains counts of bacterial genera across various
samples.
Alpha diversity measures diversity within a single sample. In
ecodive, metrics are grouped into four categories based on
the aspect of diversity they quantify.
Richness metrics estimate the number of distinct features (e.g.,
genera) in a sample. The simplest metric, observed(),
counts features with non-zero abundance.
The Chao1 estimator extends this by inferring the
number of unobserved, low-abundance features based on the ratio of
singletons (counts == 1) to doubletons
(counts == 2).
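The two richness metrics described above can be sketched in base R. This is a minimal illustration that assumes a plain numeric vector of counts; the helper names `observed_sketch()` and `chao1_sketch()` are hypothetical and are not ecodive functions, and how ecodive itself handles samples without doubletons is not covered here.

```r
# Hypothetical helpers for illustration only; not part of ecodive.
observed_sketch <- function(x) sum(x > 0)

chao1_sketch <- function(x) {
  n  <- sum(x > 0)    # observed features
  f1 <- sum(x == 1)   # singletons
  f2 <- sum(x == 2)   # doubletons
  # Plain formula; yields Inf or NaN when there are no doubletons.
  n + f1^2 / (2 * f2)
}

x <- c(10, 5, 2, 2, 1, 1, 1, 0)
observed_sketch(x)
#> [1] 7
chao1_sketch(x)
#> [1] 9.25
```

Here three singletons and two doubletons add 3^2 / (2 * 2) = 2.25 inferred features to the 7 observed ones.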
Diversity metrics account for both richness and evenness (how equally abundances are distributed).
Simpson’s index is often used as a measure of evenness, representing the probability that two randomly selected individuals belong to different species.
# High Evenness (0.8) vs Low Evenness (0.075)
simpson(c(20, 20, 20, 20, 20))
#> [1] 0.8
simpson(c(100, 1, 1, 1, 1))
#> [1] 0.07507396
# Stool < Gums < Saliva < Nose
sort(simpson(counts))
#>      Stool       Gums     Saliva       Nose
#> 0.02302037 0.18806133 0.50725478 0.63539593
The Shannon diversity index (entropy) is another common metric that weights both richness and evenness.
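To make the comparison between the Shannon and Simpson indices concrete, both can be written in a few lines of base R. This is a sketch assuming a numeric count vector and the natural logarithm; `shannon_sketch()` and `gini_simpson_sketch()` are hypothetical names, not ecodive functions.

```r
# Hypothetical helpers for illustration only; not part of ecodive.
shannon_sketch <- function(x) {
  p <- x[x > 0] / sum(x)   # drop zeros so 0 * log(0) never occurs
  -sum(p * log(p))
}

gini_simpson_sketch <- function(x) {
  p <- x / sum(x)
  1 - sum(p^2)
}

# A perfectly even sample of five features has Shannon entropy ln(5).
shannon_sketch(c(20, 20, 20, 20, 20))
#> [1] 1.609438

# The uneven sample gives a low Gini-Simpson value.
gini_simpson_sketch(c(100, 1, 1, 1, 1))
#> [1] 0.07507396
```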
Dominance metrics focus on the abundance of the most common species. The Berger-Parker index is the proportional abundance of the single most abundant feature.
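As a rough base-R illustration of the Berger-Parker index, the sketch below takes the largest proportional abundance in a count vector. The name `berger_parker_sketch()` is hypothetical and is not an ecodive function.

```r
# Hypothetical helper for illustration only; not part of ecodive.
berger_parker_sketch <- function(x) max(x / sum(x))

# Even sample: no feature dominates.
berger_parker_sketch(c(20, 20, 20, 20, 20))
#> [1] 0.2

# Uneven sample: one feature holds 100 of the 104 counts.
berger_parker_sketch(c(100, 1, 1, 1, 1))
#> [1] 0.9615385
```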
Phylogenetic metrics use a phylogenetic tree to incorporate evolutionary distance. Faith’s Phylogenetic Diversity (PD) calculates the total branch length spanned by the features present in a sample.
# ex_tree:
#
#       +----------44---------- Haemophilus
#   +-2-|
#   |   +----------------68---------------- Bacteroides
#   |
#   |             +---18---- Streptococcus
#   |      +--12--|
#   |      |      +--11-- Staphylococcus
#   +--11--|
#          |      +-----24----- Corynebacterium
#          +--12--|
#                 +--13-- Propionibacterium
faith(c(Propionibacterium = 1, Corynebacterium = 1), tree = ex_tree)
#> [1] 60
faith(c(Propionibacterium = 1, Haemophilus = 1), tree = ex_tree)
#> [1] 82
# Nose < Gums < Saliva < Stool
sort(faith(counts, tree = ex_tree))
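#>   Nose   Gums Saliva  Stool
#>    101    155    180    202
Given: \(n\) is the number of features with non-zero counts, \(X_i\) is the count of the \(i\)-th feature, \(X_T = \sum_{i = 1}^{n} X_i\) is the total count, \(P_i = X_i / X_T\) is the proportional abundance of the \(i\)-th feature, and \(F_1\) and \(F_2\) are the numbers of singletons and doubletons.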
| Metric | Formula |
|---|---|
| Abundance-based Coverage Estimator (ACE) | See below. |
| Berger-Parker Index | \(\max(P_i)\) |
| Brillouin Index | \(\displaystyle \frac{\ln{[(\sum_{i = 1}^{n} X_i)!]} - \sum_{i = 1}^{n} \ln{(X_i!)}}{\sum_{i = 1}^{n} X_i}\) |
| Chao1 | \(\displaystyle n + \frac{(F_1)^2}{2 F_2}\) |
| Faith’s Phylogenetic Diversity | See below. |
| Fisher’s Alpha (\(\alpha\)) | \(\displaystyle \frac{n}{\alpha} = \ln{\left(1 + \frac{X_T}{\alpha}\right)}\) (\(\alpha\) is solved for iteratively) |
| Gini-Simpson Index | \(1 - \sum_{i = 1}^{n} P_i^2\) |
| Inverse Simpson Index | \(1 / \sum_{i = 1}^{n} P_i^2\) |
| Margalef’s Richness Index | \(\displaystyle \frac{n - 1}{\ln{X_T}}\) |
| McIntosh Index | \(\displaystyle \frac{X_T - \sqrt{\sum_{i = 1}^{n} (X_i)^2}}{X_T - \sqrt{X_T}}\) |
| Menhinick’s Richness Index | \(\displaystyle \frac{n}{\sqrt{X_T}}\) |
| Observed Features | \(n\) |
| Shannon Diversity Index | \(-\sum_{i = 1}^{n} P_i \times \ln(P_i)\) |
| Squares Richness Estimator | \(\displaystyle n + \frac{(F_1)^2 \sum_{i=1}^{n} (X_i)^2}{X_T^2 - nF_1}\) |
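Because Fisher’s alpha has no closed form, it has to be found numerically. One way to sketch this in base R is to hand the rearranged equation to `stats::uniroot()`. The function name `fisher_alpha_sketch()` and the search interval are assumptions made for this illustration; this is not necessarily how ecodive solves it.

```r
# Hypothetical helper for illustration only; not part of ecodive.
fisher_alpha_sketch <- function(x) {
  n   <- sum(x > 0)   # observed features
  x_t <- sum(x)       # total count
  # n / alpha = log(1 + x_t / alpha), rearranged so the root is alpha.
  # A root exists only when x_t > n (not every feature is a singleton).
  f <- function(alpha) alpha * log(1 + x_t / alpha) - n
  # The search interval is an assumption that suits small examples.
  uniroot(f, interval = c(1e-6, 1e6), tol = 1e-10)$root
}

x     <- c(50, 30, 12, 5, 3, 2, 2, 1, 1, 1)
alpha <- fisher_alpha_sketch(x)

# Both sides of the defining equation should agree at the solution.
isTRUE(all.equal(sum(x > 0) / alpha, log(1 + sum(x) / alpha), tolerance = 1e-6))
```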
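For ACE, given a rare-abundance cutoff \(r\), with \(F_i\) the number of features observed exactly \(i\) times, \(F_{rare} = \sum_{i=1}^{r} F_i\) the number of rare features, \(F_{abund}\) the number of features with counts above \(r\), and \(X_{rare} = \sum_{i=1}^{r} i F_i\) the total count across rare features: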
\[C_{ace} = 1 - \frac{F_1}{X_{rare}}\]
\[\gamma_{ace}^2 = \max\left[\frac{F_{rare} \sum_{i=1}^{r}i(i-1)F_i}{C_{ace}X_{rare}(X_{rare} - 1)} - 1, 0\right]\]
\[D_{ace} = F_{abund} + \frac{F_{rare}}{C_{ace}} + \frac{F_1}{C_{ace}}\gamma_{ace}^2\]
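The three ACE equations can be traced step by step in base R. The sketch below is illustrative only: `ace_sketch()` is a hypothetical name, and the `cutoff` argument (set to 10 here, a commonly used value) is an assumption about where the rare/abundant boundary lies.

```r
# Hypothetical helper for illustration only; not part of ecodive.
ace_sketch <- function(x, cutoff = 10) {
  x       <- x[x > 0]
  rare    <- x[x <= cutoff]
  f_rare  <- length(rare)        # number of rare features
  f_abund <- sum(x > cutoff)     # number of abundant features
  x_rare  <- sum(rare)           # total count across rare features
  f1      <- sum(rare == 1)      # singletons

  # Sample coverage; zero (division by zero below) if all rare
  # features are singletons.
  c_ace <- 1 - f1 / x_rare

  i   <- seq_len(cutoff)
  f_i <- tabulate(rare, nbins = cutoff)  # F_i for i = 1..cutoff
  gamma2 <- max(
    f_rare * sum(i * (i - 1) * f_i) / (c_ace * x_rare * (x_rare - 1)) - 1,
    0
  )

  f_abund + f_rare / c_ace + f1 / c_ace * gamma2
}

ace_sketch(c(50, 30, 12, 5, 3, 2, 2, 1, 1, 1))
#> [1] 12.6875
```

In this example \(C_{ace} = 0.8\) and \(\gamma_{ace}^2 = 0.25\), so the estimate is 3 + 7 / 0.8 + (3 / 0.8) * 0.25 = 12.6875.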
For Faith’s Phylogenetic Diversity, given \(n\) branches with lengths \(L\) and a binary vector \(A\) indicating the presence (1) or absence (0) of descendants on each branch:
\[\sum_{i = 1}^{n} L_i A_i\]
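As a worked instance of this sum, the branch lengths of `ex_tree` can be typed in by hand and multiplied by a presence vector. The labels for internal branches (`inner_2`, `inner_11`, `inner_12a`, `inner_12b`) are hypothetical names introduced here, and the topology is read off the tree diagram rather than taken from the `ex_tree` object.

```r
# Branch lengths read from the ex_tree diagram; internal labels are made up.
L <- c(
  Haemophilus = 44, Bacteroides = 68, inner_2 = 2,
  Streptococcus = 18, Staphylococcus = 11, inner_12a = 12,
  Corynebacterium = 24, Propionibacterium = 13, inner_12b = 12,
  inner_11 = 11
)

# A marks every branch that has at least one present descendant.
# Sample containing Propionibacterium and Corynebacterium:
A <- setNames(numeric(length(L)), names(L))
A[c("Corynebacterium", "Propionibacterium", "inner_12b", "inner_11")] <- 1
sum(L * A)
#> [1] 60

# Sample containing Propionibacterium and Haemophilus:
A[] <- 0
A[c("Propionibacterium", "inner_12b", "inner_11", "Haemophilus", "inner_2")] <- 1
sum(L * A)
#> [1] 82
```

Both totals match the `faith()` results for the same two-genus samples, which count every branch on the path back to the root.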