Alpha Diversity

Alpha diversity measures diversity within a single sample. In ecodive, metrics are grouped into four categories based on the aspect of diversity they quantify.

Richness Metrics

Richness metrics estimate the number of distinct features (e.g., genera) in a sample. The simplest metric, observed(), counts features with non-zero abundance.

# Equivalent to rowSums(counts > 0)
observed(counts)
#> Saliva   Gums   Nose  Stool 
#>      4      3      4      5

The Chao1 estimator extends this by inferring the number of unobserved, low-abundance features from the numbers of singletons (counts == 1) and doubletons (counts == 2): many singletons relative to doubletons suggest that more features remain undetected.

# Infers 8 unobserved genera
chao1(c(1, 1, 1, 1, 2, 5, 5, 5))
#> [1] 16

# Infers fewer than 1 unobserved genus (0.125)
chao1(c(1, 2, 2, 2, 2, 5, 5, 5))
#> [1] 8.125

# Samples with no doubletons give Inf (singletons present) or NaN (no singletons either)
chao1(counts)
#> Saliva   Gums   Nose  Stool 
#>    4.5    3.0    NaN    Inf
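
These results follow directly from the Chao1 formula, n + F1^2 / (2 * F2). The base R sketch below reproduces them by hand. The helper chao1_by_hand is hypothetical, written only for illustration; it is not part of ecodive, whose implementation may differ in its details.

```r
# Hypothetical helper: Chao1 computed directly from n + F1^2 / (2 * F2).
chao1_by_hand <- function(x) {
  n  <- sum(x > 0)   # observed features
  f1 <- sum(x == 1)  # singletons
  f2 <- sum(x == 2)  # doubletons
  n + f1^2 / (2 * f2)
}

chao1_by_hand(c(1, 1, 1, 1, 2, 5, 5, 5))  # 8 + 4^2 / (2 * 1)
#> [1] 16
chao1_by_hand(c(1, 2, 2, 2, 2, 5, 5, 5))  # 8 + 1^2 / (2 * 4)
#> [1] 8.125
chao1_by_hand(c(1, 5, 5))                 # singletons but no doubletons: 1 / 0
#> [1] Inf
chao1_by_hand(c(5, 5, 5))                 # neither singletons nor doubletons: 0 / 0
#> [1] NaN
```

The last two calls show where the Inf and NaN values come from: the doubleton count sits in the denominator, so a sample without doubletons divides by zero.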

Diversity Metrics

Diversity metrics account for both richness and evenness (how equally abundances are distributed).

Simpson’s index is often used as a measure of evenness. In the form computed here (the Gini-Simpson index), it represents the probability that two randomly selected individuals belong to different species.

# High evenness (0.8) vs low evenness (~0.075)
simpson(c(20, 20, 20, 20, 20))
#> [1] 0.8
simpson(c(100, 1, 1, 1, 1))
#> [1] 0.07507396

# Stool < Gums < Saliva < Nose
sort(simpson(counts))
#>      Stool       Gums     Saliva       Nose 
#> 0.02302037 0.18806133 0.50725478 0.63539593
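
The two single-sample values above can be checked against the Gini-Simpson formula, 1 - sum(P_i^2). This is a minimal base R sketch under that assumption; simpson_by_hand is a hypothetical name and not an ecodive function.

```r
# Hypothetical helper: Gini-Simpson index, 1 - sum(P_i^2).
simpson_by_hand <- function(x) {
  p <- x / sum(x)  # proportional abundances
  1 - sum(p^2)
}

simpson_by_hand(c(20, 20, 20, 20, 20))  # 1 - 5 * 0.2^2
#> [1] 0.8
simpson_by_hand(c(100, 1, 1, 1, 1))     # 1 - (100^2 + 4) / 104^2
#> [1] 0.07507396
```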

The Shannon diversity index (entropy) is another common metric that weights both richness and evenness.

# High richness, High evenness
shannon(rep(100, 100))
#> [1] 4.60517

# Stool < Gums < Saliva < Nose
sort(shannon(counts))
#>      Stool       Gums     Saliva       Nose 
#> 0.07927797 0.35692121 0.74119910 1.10615349
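
The Shannon formula, -sum(P_i * ln(P_i)), can be sketched in base R as follows. The helper shannon_by_hand is hypothetical, and dropping zero counts before taking logarithms is a choice made for this sketch, not a statement about how ecodive handles them.

```r
# Hypothetical helper: Shannon entropy with natural logarithms.
shannon_by_hand <- function(x) {
  p <- x[x > 0] / sum(x)  # drop zeros so that 0 * log(0) does not yield NaN
  -sum(p * log(p))
}

shannon_by_hand(rep(100, 100))  # 100 equally abundant features: log(100)
#> [1] 4.60517
shannon_by_hand(c(10, 0, 10))   # two equally abundant features: log(2)
#> [1] 0.6931472
```

For perfectly even samples the index reduces to the logarithm of the number of features, which is why it rises with richness as well as evenness.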

Dominance Metrics

Dominance metrics focus on the abundance of the most common species. The Berger-Parker index is the proportional abundance of the single most abundant feature.

# Stool is dominated by Bacteroides (341/345 counts -> ~0.99)
# Nose is more balanced; Corynebacterium is max (171/345 counts -> ~0.50)
sort(berger(counts))
#>      Nose    Saliva      Gums     Stool 
#> 0.4956522 0.5217391 0.8956522 0.9884058
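
The Berger-Parker values are plain ratios, so they are easy to verify by hand. In this small sketch, berger_by_hand is a hypothetical helper and the two fractions are the ones quoted in the comments above.

```r
# Hypothetical helper: proportional abundance of the most abundant feature.
berger_by_hand <- function(x) max(x) / sum(x)

berger_by_hand(c(20, 20, 20, 20, 20))  # perfectly even: 1 / 5
#> [1] 0.2
341 / 345                              # Stool: Bacteroides share
#> [1] 0.9884058
171 / 345                              # Nose: Corynebacterium share
#> [1] 0.4956522
```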

Phylogenetic Metrics

Phylogenetic metrics use a phylogenetic tree to incorporate evolutionary distance. Faith’s Phylogenetic Diversity (PD) calculates the total branch length spanned by the features present in a sample, including the branches that connect them to the root of the tree.

# ex_tree:
#
#       +----------44---------- Haemophilus
#   +-2-|
#   |   +----------------68---------------- Bacteroides  
#   |                      
#   |             +---18---- Streptococcus
#   |      +--12--|       
#   |      |      +--11-- Staphylococcus
#   +--11--|              
#          |      +-----24----- Corynebacterium
#          +--12--|
#                 +--13-- Propionibacterium


faith(c(Propionibacterium = 1, Corynebacterium = 1), tree = ex_tree)
#> [1] 60

faith(c(Propionibacterium = 1, Haemophilus = 1), tree = ex_tree)
#> [1] 82

# Nose < Gums < Saliva < Stool
sort(faith(counts, tree = ex_tree))
#>   Nose   Gums Saliva  Stool 
#>    101    155    180    202
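
To see which branches contribute to each total, the tree can be written out as a list of branch lengths paired with the tips below each branch. The sketch below transcribes the ex_tree diagram by hand and uses only base R; branch_len, branch_tips, and faith_by_hand are hypothetical names, and this is not how ecodive traverses a tree object.

```r
# Branch lengths transcribed from the ex_tree diagram, each paired with the
# tips (genera) that descend from that branch.
branch_len <- c(44, 68, 2, 18, 11, 12, 24, 13, 12, 11)
branch_tips <- list(
  "Haemophilus",
  "Bacteroides",
  c("Haemophilus", "Bacteroides"),
  "Streptococcus",
  "Staphylococcus",
  c("Streptococcus", "Staphylococcus"),
  "Corynebacterium",
  "Propionibacterium",
  c("Corynebacterium", "Propionibacterium"),
  c("Streptococcus", "Staphylococcus", "Corynebacterium", "Propionibacterium")
)

# Hypothetical helper: sum the lengths of every branch that has at least one
# present feature among its descendants.
faith_by_hand <- function(present) {
  has_descendant <- vapply(
    branch_tips,
    function(tips) any(tips %in% present),
    logical(1)
  )
  sum(branch_len[has_descendant])
}

faith_by_hand(c("Propionibacterium", "Corynebacterium"))  # 13 + 24 + 12 + 11
#> [1] 60
faith_by_hand(c("Propionibacterium", "Haemophilus"))      # 13 + 12 + 11 + 44 + 2
#> [1] 82
```

In the first call the two genera are sister tips, yet the internal branches of length 12 and 11 leading back to the root are still counted.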

Formulas

Given:

\(n\) : Number of features (e.g. species, OTUs, ASVs).
\(X_i\) : Integer count of the \(i\)-th feature.
\(X_T\) : Total of all counts (sequencing depth). \(X_T = \sum_{i=1}^{n} X_i\)
\(P_i\) : Proportional abundance of the \(i\)-th feature. \(P_i = X_i / X_T\)
\(F_1\) : Number of singletons (\(X_i = 1\)).
\(F_2\) : Number of doubletons (\(X_i = 2\)).

Metric	Formula
Abundance-based Coverage Estimator (ACE)	See below.
Berger-Parker Index	\(\max(P_i)\)
Brillouin Index	\(\displaystyle \frac{\ln{[(\sum_{i = 1}^{n} X_i)!]} - \sum_{i = 1}^{n} \ln{(X_i!)}}{\sum_{i = 1}^{n} X_i}\)
Chao1	\(\displaystyle n + \frac{(F_1)^2}{2 F_2}\)
Faith’s Phylogenetic Diversity	See below.
Fisher’s Alpha (\(\alpha\))	\(\displaystyle \frac{n}{\alpha} = \ln{\left(1 + \frac{X_T}{\alpha}\right)}\) (\(\alpha\) is solved for iteratively)
Gini-Simpson Index	\(1 - \sum_{i = 1}^{n} P_i^2\)
Inverse Simpson Index	\(1 / \sum_{i = 1}^{n} P_i^2\)
Margalef’s Richness Index	\(\displaystyle \frac{n - 1}{\ln{X_T}}\)
McIntosh Index	\(\displaystyle \frac{X_T - \sqrt{\sum_{i = 1}^{n} (X_i)^2}}{X_T - \sqrt{X_T}}\)
Menhinick’s Richness Index	\(\displaystyle \frac{n}{\sqrt{X_T}}\)
Observed Features	\(n\)
Shannon Diversity Index	\(-\sum_{i = 1}^{n} P_i \times \ln(P_i)\)
Squares Richness Estimator	\(\displaystyle n + \frac{(F_1)^2 \sum_{i=1}^{n} (X_i)^2}{X_T^2 - nF_1}\)
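
Several of the formulas in the table translate almost symbol for symbol into base R. The sketch below is illustrative only: the variable names are hypothetical, the example vector is reused from the Chao1 section, and using uniroot() for Fisher's alpha is an assumption of this sketch rather than a description of ecodive's solver.

```r
x   <- c(1, 1, 1, 1, 2, 5, 5, 5)  # example counts
n   <- sum(x > 0)                 # number of features: 8
x_t <- sum(x)                     # sequencing depth: 21
f1  <- sum(x == 1)                # singletons: 4
p   <- x / x_t                    # proportional abundances

inv_simpson <- 1 / sum(p^2)
margalef    <- (n - 1) / log(x_t)
menhinick   <- n / sqrt(x_t)
mcintosh    <- (x_t - sqrt(sum(x^2))) / (x_t - sqrt(x_t))
squares     <- n + f1^2 * sum(x^2) / (x_t^2 - n * f1)

# Brillouin: lgamma(k + 1) equals log(k!) and avoids overflow for large counts.
brillouin <- (lgamma(x_t + 1) - sum(lgamma(x + 1))) / x_t

# Fisher's alpha has no closed form, so solve n / alpha = log(1 + x_t / alpha)
# numerically. A root exists inside this bracket when n < x_t.
fisher_alpha <- uniroot(
  function(a) a * log(1 + x_t / a) - n,
  interval = c(1e-6, 1e6),
  tol = 1e-10
)$root
```

Multiplying Fisher's equation through by alpha gives a function that changes sign exactly once over the bracket, which is what makes a simple root finder sufficient here.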

Abundance-based Coverage Estimator (ACE)

Given:

\(r\) : Rare cutoff (features with \(\le r\) counts are considered rare).
\(F_{rare}\) : Number of rare features.
\(F_{abund}\) : Number of abundant features (\(> r\) counts).
\(X_{rare}\) : Total counts belonging to rare features.
\(C_{ace}\) : Sample abundance coverage estimator.
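\(\gamma_{ace}^2\) : Estimated squared coefficient of variation.
\(F_i\) : Number of features with exactly \(i\) counts.
\(D_{ace}\) : ACE estimate of the total number of features.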

\[C_{ace} = 1 - \frac{F_1}{X_{rare}}\]

\[\gamma_{ace}^2 = \max\left[\frac{F_{rare} \sum_{i=1}^{r}i(i-1)F_i}{C_{ace}X_{rare}(X_{rare} - 1)} - 1, 0\right]\]

\[D_{ace} = F_{abund} + \frac{F_{rare}}{C_{ace}} + \frac{F_1}{C_{ace}}\gamma_{ace}^2\]
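
The three ACE equations can be followed step by step in base R. This is a sketch under stated assumptions: ace_by_hand is a hypothetical helper, the default rare cutoff of 10 is the value conventionally used for ACE and is assumed here rather than taken from ecodive, and the degenerate case in which every rare feature is a singleton (so that the coverage estimate is zero) is not handled.

```r
# Hypothetical helper: ACE computed term by term from the equations above.
ace_by_hand <- function(x, r = 10) {
  x       <- x[x > 0]
  rare    <- x[x <= r]
  f_rare  <- length(rare)  # number of rare features
  f_abund <- sum(x > r)    # number of abundant features
  x_rare  <- sum(rare)     # total counts in rare features
  f1      <- sum(x == 1)   # singletons

  c_ace <- 1 - f1 / x_rare            # sample coverage of the rare features

  i   <- seq_len(r)
  f_i <- tabulate(rare, nbins = r)    # f_i[k] = number of features with k counts
  gamma2 <- max(
    f_rare * sum(i * (i - 1) * f_i) / (c_ace * x_rare * (x_rare - 1)) - 1,
    0
  )

  f_abund + f_rare / c_ace + (f1 / c_ace) * gamma2
}

# Rare: 1, 1, 2, 3 (F_rare = 4, X_rare = 7); abundant: 20, 30 (F_abund = 2).
# C_ace = 5/7, gamma^2 = 32/30 - 1, so D_ace = 2 + 5.6 + 2.8/15.
ace_by_hand(c(1, 1, 2, 3, 20, 30))
#> [1] 7.786667
```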

Faith’s Phylogenetic Diversity (Faith’s PD)

Given \(n\) branches with lengths \(L\) and a binary vector \(A\) indicating presence (1) or absence (0) of descendants on each branch:

\(\sum_{i = 1}^{n} L_i A_i\)
