Phylogenetic Eigenvector Maps (PEM) is a method to perform phylogenetic modelling. Phylogenetic modelling consists in modelling trait evolution and predicting trait values using phylogeny as an explanatory factor (Guénard, Legendre, and Peres-Neto 2013). Phylogenetic modelling allows one to predict trait values when it is difficult or impractical to obtain them, for instance when species are rare, extinct, or when information is needed for several species and trait values are only available for a relatively small number of them (Guénard et al. 2011, 2014).
To apply phylogenetic modelling, one needs to have a set of species with known phylogeny and trait values (hereafter referred to as the model species) as well as to know the locations, with respect to the phylogeny of the models species, of the species for which trait values are being predicted (hereafter referred to as the target species). Phylogenetic modelling can be performed jointly with trait correlation modelling: it is possible to use other traits with known (or estimable) values for the target species to help predict a trait of interest. Phylogenetic trees being acyclic graphs, I will hereby prefer terms belonging to the graph theory over terms phylogeneticists may be more familiar with. Therefore I will use edge over branches and vertex over root, node or tip; safe in cases where I want to be specific about what a vertex represents.
The Phylogenetic eigenvector maps (PEM) expression
Statistical maps are a type of geographic map representing the values or states of a variable across space <https://encyclopedia2.thefreedictionary.com/statistical+map).
In a paper entitled “The interpretation of statistical maps”, (Moran 1948) described tests of significance of the spatial relationships among values of qualitative variables on statistical maps.
“Moran’s eigenvector maps” (MEM), an expression coined by Dray, Legendre, and Peres-Neto (2006), describes the variation of spatial eigenvectors whose eigenvalues are proportional to Moran’s I spatial autocorrelation statistics (Moran 1950) of the corresponding eigenvectors. Spatial eigenvectors are mathematical constructs that describe the variation of quantities across space (or time) at different spatial scales. They were originally called Principal Coordinates of Neighbour Matrices by Borcard and Legendre (2002).
Phylogenetic eigenvector maps (PEM) (Diniz-Filho et al. 2012; Guénard, Legendre, and Peres-Neto 2013) are sets of eigenfunctions describing the structure of a phylogenetic graph, which represents either a Darwinian phylogenetic tree or a reticulated tree, i.e., a phylogenetic tree with reticulations. The various eigenvectors describe the variation across a phylogeny at different phylogenetic scales.
Contrary to MEM, the eigenvalues in PEM are not proportional to Moran’s I autocorrelation coefficients of the corresponding eigenvectors.
The PEM work flow consists in
calculating the influence matrix of the graph,
specifying a model of trait evolution along the edges of the phylogenetic tree,
calculating the left eigenvectors of the weighted and centred influence matrix and
use these eigenvectors as descriptors (Guénard, Legendre, and Peres-Neto 2013).
An R language implementation of that approach is found in package MPSEM. MPSEM was developed to make the aforementioned process as seamless as possible. It is a work in progress; I welcome anyone to provide relevant suggestions and constructive remarks aimed at making MPSEM a better, more efficient and user-friendly, interface to phylogenetic modelling.
Assuming package MPSEM is installed, the first step to calculate a PEM is to load package MPSEM, which depends on packages ape and MASS:
## Le chargement a nécessité le package : ape
## Le chargement a nécessité le package : magrittr
For the present tutorial, we will use the data set perissodactyla from R package caper. These data from Purvis and Rambaut (1995) are loaded into your R workspace as follows:
Fig. 1. The phylogenetic tree used for this example.
The perissodactyla data set contains
perissodactyla.tree
, a phylogenetic tree encompassing 18
odd-toed ungulate species (Fig. 1) and perissodactyla.data
,
a data frame containing trait information about the species. For the
present study we will model the \(\log_{10}\) gestation weight as a function
of phylogeny and \(\log_{10}\) adult
female weight:
Binomial | log.female.wt | log.neonatal.wt |
---|---|---|
Ceratotherium simum | 6.262 | 4.903 |
Dicerorhinus sumatrensis | 5.910 | 4.544 |
Diceros bicornis | 6.183 | 4.699 |
Equus africanus | 5.439 | 4.398 |
Equus burchelli | 5.477 | 4.477 |
Equus grevyi | 5.653 | 4.602 |
Equus hemionus | 5.462 | 4.398 |
Equus zebra | 5.462 | 4.398 |
Rhinoceros sondaicus | 6.153 | 4.699 |
Rhinoceros unicornis | 6.237 | 4.845 |
Tapirus indicus | 5.455 | 3.919 |
Tapirus pinchaque | 5.377 | 3.699 |
Tapirus terrestris | 5.332 | 3.760 |
Before going any further, it is important to make sure that the
species in the tree object are the same and presented in the same order
as those in the data table. Glancing at the data table, species clearly
cannot match since the latter feature information for only 13 of the 18
species in the tree. We will therefore match the tip labels of the
original tree in the data table using the binary (Latin) species names
in a character vector spmatch
. When no matching element
from the data table is found, the value NA
appears at the
corresponding position in spmatch
. We can therefore use
these missing values to reference the species that can be dropped from
the tree using function drop.tip()
from package
ape as follows:
match(
perissodactyla.tree$tip.label,
perissodactyla.data[,1L]
) -> spmatch
drop.tip(
perissodactyla.tree,
perissodactyla.tree$tip.label[is.na(spmatch)]
) -> perissodactyla.tree
Now that the data match the tree in terms of species content, we then need to make sure that species ordering also matches as follows:
## [,1] [,2]
## [1,] "Dicerorhinus sumatrensis" "Ceratotherium simum"
## [2,] "Rhinoceros sondaicus" "Dicerorhinus sumatrensis"
## [3,] "Rhinoceros unicornis" "Diceros bicornis"
## [4,] "Diceros bicornis" "Equus africanus"
## [5,] "Ceratotherium simum" "Equus burchelli"
## [6,] "Tapirus terrestris" "Equus grevyi"
## [7,] "Tapirus pinchaque" "Equus hemionus"
## [8,] "Tapirus indicus" "Equus zebra"
## [9,] "Equus grevyi" "Rhinoceros sondaicus"
## [10,] "Equus burchelli" "Rhinoceros unicornis"
## [11,] "Equus zebra" "Tapirus indicus"
## [12,] "Equus africanus" "Tapirus pinchaque"
## [13,] "Equus hemionus" "Tapirus terrestris"
Since they do not, we need to recalculate spmatch
with
the new, reduced, tree and re-order the data accordingly:
match(
perissodactyla.tree$tip.label,
perissodactyla.data[,1L]
) -> spmatch
perissodactyla.data[spmatch,] -> perissodactyla.data
all(perissodactyla.tree$tip.label == perissodactyla.data[,1L])
## [1] TRUE
The last code line is just a last check to guarantee that all species names are matching. As a last step before we are done with data manipulation, I will put the binary names in place of the row names and delete the table’s first row:
perissodactyla.data[,1L] -> rownames(perissodactyla.data)
perissodactyla.data[,-1L] -> perissodactyla.data
Our data of interest now appear as follows:
log.female.wt | log.neonatal.wt | |
---|---|---|
Dicerorhinus sumatrensis | 5.910 | 4.544 |
Rhinoceros sondaicus | 6.153 | 4.699 |
Rhinoceros unicornis | 6.237 | 4.845 |
Diceros bicornis | 6.183 | 4.699 |
Ceratotherium simum | 6.262 | 4.903 |
Tapirus terrestris | 5.332 | 3.760 |
Tapirus pinchaque | 5.377 | 3.699 |
Tapirus indicus | 5.455 | 3.919 |
Equus grevyi | 5.653 | 4.602 |
Equus burchelli | 5.477 | 4.477 |
Equus zebra | 5.462 | 4.398 |
Equus africanus | 5.439 | 4.398 |
Equus hemionus | 5.462 | 4.398 |
Finally, for the sake of demonstrating how to obtain predictions, we
will remove the Sumatran rhinoceros (Dicerorhinus sumatrensis),
the first species on top of the table) to obtain our training data set
{perissodactyla.train
}, keep the withdrawn data as
{perissodactyla.test
}, and calculate a tree without the
target species:
As previously announced, I use the vocabulary of the graph theory when describing PEM: a tree is a (directed) graph, a branch is an edge, and the root, nodes, and tips are vertices. PEM allows one to specify a model of trait evolution along the edges of the tree. The trait evolution model is a power function of the edge lengths, with parameters \(a\) and \(\psi\) describing the shape of the relationship between the edge lengths and trait evolution rate:
\[ w_{a,\psi}(\phi_{j})= \begin{cases} \psi\phi^{\frac{1-a}{2}} & \phi_{j}>0\\ 0 & \phi_{j}=0, \end{cases} \] where \(a\) is the steepness parameter describing how abrupt the changes in trait values occur with time following branching, \(\psi\) is the evolution rate of the trait, and \(\phi_{j}\) is the length of edge \(j\) (Guénard, Legendre, and Peres-Neto 2013).
Fig. 2. Output of the edge weighting function for different sets of parameters \(a\) and \(\psi\).
As the steepness parameter increases, the weight assigned to a given edge increases more sharply with respect to the phylogenetic distance (or evolutionary time; Fig. 2). In the context of PEM, the edge weight represent the relative rate of evolution of the trait; the greater the edge weight, the greater the trait change along that edge. When \(a=0\), trait evolution is neutral and therefore proceeds by random walk along edges. When \(a=1\), edge weights no longer increase as a function of edge lengths. That situation corresponds to the scenario in which trait evolution is driven by the strongest possible natural selection: following a speciation event, trait either change abruptly (directional selection) at the vertex or do not change at all (stabilizing selection).
Also, the shape parameters may differ for different parts of the phylogenetic tree or network. For instance, the trait may have evolved neutrally at a steady rate from the base of a tree up to a certain node, and then, may have been subjected to different levels of selection pressure and various evolution rate from some of the nodes towards their descendent tips.
The first step to build a PEM is to convert the
phylogenetic tree. The is done by giving the tree to function
Phylo2DirectedGraph()
as follows:
Here’s a snipet showing how the graph container used by MPSEM stores graph information:
## List of 2
## $ edge :List of 3
## ..$ : num [1:21] 13 14 15 16 16 15 17 17 14 18 ...
## ..$ : num [1:21] 14 15 16 1 2 17 3 4 18 19 ...
## ..$ distance: int [1:21] 9 4 3 1 1 3 1 1 5 2 ...
## $ vertex:List of 1
## ..$ species: logi [1:22] TRUE TRUE TRUE TRUE TRUE TRUE ...
## - attr(*, "ev")= int [1:2] 21 22
## - attr(*, "class")= chr "graph"
## - attr(*, "elabel")= chr [1:21] "E1" "E2" "E3" "E4" ...
## - attr(*, "vlabel")= chr [1:22] "Rhinoceros sondaicus" "Rhinoceros unicornis" "Diceros bicornis" "Ceratotherium simum" ...
This list contains two main elements, $edge
and
$vertex
, plus additional information. $edge
and $vertex
elements each contain a list of
sub-elements:
Element $edge
is a list containing information about
the edges of the graph, namely the indices of their origin and
destination vertices (the two first unnamed elements) and an arbitrary
number of supplementary elements storing other edge properties. In the
present case, a numeric vector created by
Phylo2DirectedGraph()
and called $distance
stores the phylogenetic distances (\(\phi_{j}\)), which, in this example,
correspond to the branch lengths of
perissodactyla.tree
.
The element $vertex
is a list containing an
arbitrary number of elements storing vertex properties. In the present
case, a logical vector created by Phylo2DirectedGraph()
and
called $species
stores whether a given vertex represents a
species (i.e., it is a tip) or not (i.e., it is a
node).
In addition to edge and vertex information, the container stores other useful information in the form of attributes:
ev
stores the number of edges and vertices.
elabel
stores the edge labels.
vlabel
stores the vertex labels.
In MPSEM, PEM are build using
function PEM.build()
. As an example, let us assume that the
steepness and evolution rate are \(a=0.25\) and \(\psi=2\) among genus Equus, \(a=0.8\) and \(\psi=0.5\) among genus Tapirus,
and \(a=0\) and \(\psi=1\) from the root of the tree up to
the vertex where the two latter genera begin as well as among the other
genera. The following figure will help us figure out the indices of the
edges involved:
perissodactyla.tree.train -> tree
paste("N",1L:tree$Nnode) -> tree$node.label
par(mar=c(2,2,2,2))
plot(tree,show.node.label=TRUE)
edgelabels(
1L:nrow(tree$edge),
edge=1L:nrow(tree$edge),
bg="white",
cex=0.75
)
Fig. 3. The labelled training species tree for this example.
Hence, \(a=0.25\) and \(\psi=2\) for edges 15–21, \(a=0.8\) and \(\psi=0.5\) for edges 10–13, and \(a=0\) and \(\psi=1\) for edges 1–9 and 14:
rep(0,attr(perissodactyla.pgraph,"ev")[1L]) -> steepness
rep(1,attr(perissodactyla.pgraph,"ev")[1L]) -> evol_rate
steepness[15L:21] <- 0.25
evol_rate[15L:21] <- 2
steepness[9L:13] <- 0.8
evol_rate[9L:13] <- 0.5
The PEM is obtained as follows:
PEM.build(
perissodactyla.pgraph,
d="distance",
sp="species",
a=steepness,
psi=evol_rate
) -> perissodactyla.PEM
In addition to the phylogenetic graph, function
PEM.build()
needs arguments d
, the name of the
edge property where the phylogenetic distances are stored,
sp
, the name of the vertex property specifying what vertex
is a species, as well as the user-specified steepness and evolution
rate. When the vectors given to arguments a
or
psi
, have smaller sizes then the number of edges, the
values are recycled. The default values for d
and
sp
are those produced by
Phylo2DirectedGraph()
, and can therefore be omitted in most
cases. The object that MPSEM uses to store
PEM information is rather complex and we will hereby
not browse through it. Method as.data.frame
can be used to
extract the eigenvector from the PEM-class
object. For a
set of \(n\) species, that method
returns a matrix encompassing at most \(n-1\) column vectors that can be used in
models to represent phylogenetic structure in traits. Here the
phylogenetic patterns of variation described by two eigenvectors of the
PEM we calculated above:
layout(matrix(c(1,1,1,2,2,3,3),1L,7L))
par(mar=c(5.1,2.1,4.1,2.1))
## Singular vectors are extracted using the as.data.frame method:
as.data.frame(perissodactyla.PEM) -> perissodactyla.U
plot(perissodactyla.tree.train, x.lim=60, cex=1.5)
plot(y = 1L:nrow(perissodactyla.train), ylab="", xlab = "Loading",
x = perissodactyla.U[,1L], xlim=0.5*c(-1,1),
axes=FALSE, main = expression(bold(v)[1]), cex=1.5)
axis(1)
abline(v=0)
plot(y = 1L:nrow(perissodactyla.train), ylab="", xlab = "Loading",
x = perissodactyla.U[,5L], xlim=0.5*c(-1,1),
axes=FALSE, main = expression(bold(v)[5]), cex=1.5)
axis(1)
abline(v=0)
Fig. 4. Example of two eigenvectors obtained from the training species phylogeny.
The pattern shown by the first eigenvector essentially contrasts Equids and the other odd-toed ungulate species whereas the pattern shown by the second eigenvector essentially contrasts tapirs and Rhinocerotids.
Because users do often not have information about the best set of
weighting function parameters to use for modelling,
MPSEM as has a function called
PEM.fitSimple()
that allows them to empirically estimate a
single value of parameter \(a\) for the
whole phylogeny using restricted maximum likelihood. Function
PEM.fitSimple()
does not estimate parameter \(\psi\) because the latter has no effect
when its value is assumed to be constant throughout the phylogeny. A
function to estimate different sets of weighting function parameters for
different portions of the phylogeny has yet to be developed. That
function requires a response variable that will be used to optimize the
steepness parameter (here the \(\log_{10}\) neonate weight) as well as
lower and upper bounds for the admissible parameter values and is called
as follows:
PEM.fitSimple(
y = perissodactyla.train[,"log.neonatal.wt"],
x = NULL,
w = perissodactyla.pgraph,
d = "distance",
sp="species",
lower = 0,
upper = 1
) -> perissodactyla.PEM_opt1
If other traits are to be used in the model (here the \(\log_{10}\) female weight), they are passed
to argument x
as follows:
PEM.fitSimple(
y = perissodactyla.train[,"log.neonatal.wt"],
x = perissodactyla.train[,"log.female.wt"],
w = perissodactyla.pgraph,
d = "distance",
sp="species",
lower = 0,
upper = 1
) -> perissodactyla.PEM_opt2
The results of the latter calls are a PEM similar to
that obtained using PEM.build()
, with additional
information resulting from the optimization process. It is noteworthy
that estimates of the steepness parameter (stored as element
$optim\$par
of the PEM objects) and,
consequently, the resulting phylogenetic eigenvectors, will be different
depending on the use of auxiliary traits. In the example above, for
instance, \(a\) was estimated to 0 by
PEM.fitSimple()
when no auxiliary trait is involved (first
call) and to 0.08 when the female weight is used as an auxiliary trait
(second call).
To model trait values, PEM are used as descriptors
in other modelling method. Any suitable method can be used. For
instance, package MPSEM contains a utility function
called lmforwardsequentialAICc()
that does step-wise
variable addition in multiple regression analysis on the basis of the
corrected Akaike Information Criterion (AICc) (Hurvich and Tsai 1993):
lmforwardsequentialAICc(
y = perissodactyla.train[,"log.neonatal.wt"],
object = perissodactyla.PEM_opt1
) -> lm1
summary(lm1)
##
## Call:
## lm(formula = as.formula(paste(p1, p2, sep = "")), data = df1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.122057 -0.053781 -0.004971 0.055333 0.186364
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.39975 0.02740 160.58 < 2e-16 ***
## V_2 1.31105 0.09491 13.81 7.7e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.09491 on 10 degrees of freedom
## Multiple R-squared: 0.9502, Adjusted R-squared: 0.9452
## F-statistic: 190.8 on 1 and 10 DF, p-value: 7.699e-08
lmforwardsequentialAICc(
y = perissodactyla.train[,"log.neonatal.wt"],
x = perissodactyla.train[,"log.female.wt",drop=FALSE],
object = perissodactyla.PEM_opt2
) -> lm2
summary(lm2)
##
## Call:
## lm(formula = as.formula(paste(p1, p2, sep = "")), data = df1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.024813 -0.010155 -0.003962 0.008556 0.040311
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.66734 0.12638 -21.106 7.37e-07 ***
## log.female.wt 1.23818 0.02211 56.001 2.18e-09 ***
## V_1 -0.91653 0.02742 -33.432 4.77e-08 ***
## V_9 -0.09882 0.02362 -4.183 0.00579 **
## V_8 0.08489 0.02378 3.570 0.01178 *
## V_3 0.07163 0.02378 3.012 0.02364 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.0236 on 6 degrees of freedom
## Multiple R-squared: 0.9982, Adjusted R-squared: 0.9966
## F-statistic: 648.5 on 5 and 6 DF, p-value: 4.126e-08
It is noteworthy that in order to pass a single auxiliary trait to
lmforwardsequentialAICc()
, it is necessary, as exemplified above, to setdrop=FALSE
to the extraction operator ([]
) to keep thedata.frame
property of the object. Hence, it is crucial that the auxiliary trait values be referenced with the same variable names in alldata.frame
objects involved in the analysis.Since, the extraction operator drops the
data.frame
property on its output object by default whenever a single column is selected, it is mandatory to setdrop=FALSE
when a single variable is (or could be) selected. Adata.frame
object is required for making predictions using the resulting linear model.
To obtain predictions, we need to calculate the locations of the
target species with respect to the phylogeny of the model species. This
is accomplished by getGraphLocations()
, to which we give
the tree for all species (model + targets) and the names (or indices) of
the target species. Then, we use the predict()
method for
PEM objects. The latter takes, in addition to the
PEM object, the locations of the target species as
obtained by getGraphLocations()
, an lm
(or
glm
) object involving the eigenvectors of the
PEM, and a table of auxiliary trait values for the
target species, which can be omitted if no auxiliary trait is present in
the linear model.
getGraphLocations(
perissodactyla.tree,
targets = "Dicerorhinus sumatrensis"
) -> perissodactyla.loc
predict(
object = perissodactyla.PEM_opt2,
targets = perissodactyla.loc,
lmobject = lm2,
newdata = perissodactyla.test,
interval = "prediction",
level = 0.95) -> pred
Here, the predicted neonatal weight for the Sumatran rhinoceros is 26.7 \(\,\mathrm{kg}\) and the bounds of the \(95\%\) prediction interval are 17.6 and 40.4 \(\,\mathrm{kg}\), while the observed value was actually 35 \(\,\mathrm{kg}\).
Here is and example of how to perform a leave-one-out
cross-validation of a data set using the R code from
the previous two sections. Predictions will be added to table
perissodactyla.data
:
data.frame(
perissodactyla.data,
predictions = NA,
lower = NA,
upper = NA
) -> perissodactyla.data
jackinfo <- list()
for(i in 1L:nrow(perissodactyla.data)) {
jackinfo[[i]] <- list()
getGraphLocations(
perissodactyla.tree,
targets = rownames(perissodactyla.data)[i]
) -> jackinfo[[i]][["loc"]]
PEM.fitSimple(
y = perissodactyla.data[-i,"log.neonatal.wt"],
x = perissodactyla.data[-i,"log.female.wt"],
w = jackinfo[[i]][["loc"]]$x
) -> jackinfo[[i]][["PEM"]]
lmforwardsequentialAICc(
y = perissodactyla.data[-i,"log.neonatal.wt"],
x = perissodactyla.data[-i,"log.female.wt",drop=FALSE],
object = jackinfo[[i]][["PEM"]]
) -> jackinfo[[i]][["lm"]]
predict(
object = jackinfo[[i]][["PEM"]],
targets = jackinfo[[i]][["loc"]],
lmobject = jackinfo[[i]][["lm"]],
newdata = perissodactyla.data[i,"log.female.wt",drop=FALSE],
interval = "prediction",
level = 0.95
) -> predictions
unlist(predictions) -> perissodactyla.data[i, c("predictions", "lower", "upper")]
}
## Warning in PEM.fitSimple(y = perissodactyla.data[-i, "log.neonatal.wt"], : No
## optimum found... Message from optim() - ERROR: ABNORMAL_TERMINATION_IN_LNSRCH.
## Status = 52
Because the result of getGraphLocations()
includes the
phylogenetic graph without the target species Dicerorhinus
sumatrensis, which was removed from the tree given as the argument
tpall
and can be found as its element $x
, it
is not necessary to re-calculate the tree with the target species
dropped, as well as the phylogenetic graph, as we did previously for
explanatory purposes. Also, the internal information about each
cross-validation steps is stored into a list (hereby called
jackinfo
), in order for the details of the analyses to be
accessible later on.
Fig. 5. Leave-one-out crossvalidated prediction of the neonatal weight for 13 odd-toed ungulate species.
From the present cross-validation, we found that the (\(\log_{10}\)) neonatal body mass can be predicted with a cross-validated \(R^{2}\) of 0.96 (Fig. 5).
The influence matrix is used internally to calculate PEM. It is a matrix having as many rows as the number of vertices (species + nodes) and as many columns as the number of edges. Any given element of the influence matrix is coding whether a vertex, which is represented a row of the matrix is influenced an edge, which is represented by a column of the matrix. In the context of PEM, a vertex is influenced by an edge when the former has ancestors on the latter or, in other words, when an edge is on the path leading from a tip to the root of the tree. The influence matrix is obtained as follows:
## E1 E2 E3 E4 E5 E6 E7 E8 E9 E10 E11 E12 E13 E14 E15 E16 E17
## Rhinoceros sondaicus 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0
## Rhinoceros unicornis 1 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0
## Diceros bicornis 1 1 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0
## Ceratotherium simum 1 1 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0
## Tapirus terrestris 1 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0
## Tapirus pinchaque 1 0 0 0 0 0 0 0 1 1 0 1 0 0 0 0 0
## Tapirus indicus 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0
## Equus grevyi 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1
## Equus burchelli 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0
## Equus zebra 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0
## Equus africanus 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
## Equus hemionus 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
## n1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## n2 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## n3 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## n4 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## n5 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
## n6 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
## n7 1 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0
## n8 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
## n9 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0
## n10 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0
## E18 E19 E20 E21
## Rhinoceros sondaicus 0 0 0 0
## Rhinoceros unicornis 0 0 0 0
## Diceros bicornis 0 0 0 0
## Ceratotherium simum 0 0 0 0
## Tapirus terrestris 0 0 0 0
## Tapirus pinchaque 0 0 0 0
## Tapirus indicus 0 0 0 0
## Equus grevyi 0 0 0 0
## Equus burchelli 1 0 0 0
## Equus zebra 0 1 0 0
## Equus africanus 0 0 1 0
## Equus hemionus 0 0 0 1
## n1 0 0 0 0
## n2 0 0 0 0
## n3 0 0 0 0
## n4 0 0 0 0
## n5 0 0 0 0
## n6 0 0 0 0
## n7 0 0 0 0
## n8 0 0 0 0
## n9 0 0 0 0
## n10 0 0 0 0
The calculation of the influence matrix performed by
PEM.build()
for a given phylogenetic graph need not be done
every time new weighting function parameters are to be tried. For that
reason, MPSEM provides a function called
PEM.updater()
that takes a previously calculated
PEM object, applies new edge weighting, and
recalculates the phylogenetic eigenvectors:
## A phylogenetic eigenvector map (PEM) for 12 species:
## Rhinoceros sondaicus,Rhinoceros unicornis,Diceros bicornis,Ceratotherium simum,Tapirus terrestris,Tapirus pinchaque,Tapirus indicus,Equus grevyi ... Equus hemionus
## obtained from the following phylogenetic graph:
##
## A graph with 21 edges and 22 vertices.
## Edge labels: E1 E2 E3 E4 E5 E6 E7 E8 E9 E10 E11 E12 E13 E14 E15 E16 E17 E18 E19 E20 E21
## Vertex labels: Rhinoceros sondaicus Rhinoceros unicornis Diceros bicornis Ceratotherium simum Tapirus terrestris Tapirus pinchaque Tapirus indicus Equus grevyi Equus burchelli Equus zebra Equus africanus Equus hemionus n1 n2 n3 n4 n5 n6 n7 n8 n9 n10
## Available edge information: distance
## Available vertex information: species
The result of PEM.build()
and PEM.updater()
does not contain all the information necessary to predict trait values.
Hence, neither of these functions is given information about the
response variable and auxiliary traits. To perform these preliminary
calculations, MPSEM provides the user with function
PEM.forcedSimple()
that produce the same output as
PEM.fitSimple()
with user-provided values of weighting
parameters. It is called as follows:
PEM.forcedSimple(
y = perissodactyla.train[,"log.neonatal.wt"],
x = perissodactyla.train[,"log.female.wt"],
w = perissodactyla.pgraph,
a = steepness,
psi = evol_rate
) -> res
res
## A phylogenetic eigenvector map (PEM) for 12 species:
## Rhinoceros sondaicus,Rhinoceros unicornis,Diceros bicornis,Ceratotherium simum,Tapirus terrestris,Tapirus pinchaque,Tapirus indicus,Equus grevyi ... Equus hemionus
## obtained from the following phylogenetic graph:
##
## A graph with 21 edges and 22 vertices.
## Edge labels: E1 E2 E3 E4 E5 E6 E7 E8 E9 E10 E11 E12 E13 E14 E15 E16 E17 E18 E19 E20 E21
## Vertex labels: Rhinoceros sondaicus Rhinoceros unicornis Diceros bicornis Ceratotherium simum Tapirus terrestris Tapirus pinchaque Tapirus indicus Equus grevyi Equus burchelli Equus zebra Equus africanus Equus hemionus n1 n2 n3 n4 n5 n6 n7 n8 n9 n10
## Available edge information: distance, a, psi
## Available vertex information: species
It is noteworthy that contrary to function
PEM.fitSimple()
, function PEM.forcedSimple()
can be used to apply different weighting parameters for different
edges.
PEM scores are the values of target species on the
eigenfunctions underlying the PEM. These scores are
calculated from the graph locations and a PEM object
using function Locations2PEMscores()
as follows:
## $scores
## V_1 V_2 V_3 V_4 V_5
## Dicerorhinus sumatrensis 0.2439632 0.287517 -0.01215849 -2.081668e-17 -0.351012
## V_6 V_7 V_8 V_9
## Dicerorhinus sumatrensis -0.0007381739 -0.005777055 5.085388e-17 1.995388e-16
## V_10 V_11
## Dicerorhinus sumatrensis -3.768614e-17 4.815588e-16
##
## $VarianceFactor
## Dicerorhinus sumatrensis
## y 0.004808351
The function is used internally by PEM-class
predict
method, and therefore need not be called when
performing linear phylogenetic modelling as exemplified above. It comes
in handy when the PEM is used together with other
modelling approaches (e.g. multivariate regression trees, linear
discriminant analysis, artificial neural networks) that have
predict
methods that are not specially adapted for
phylogenetic modelling.
Package MPSEM comes with functions, some implemented in the C language, to simulate quantitative traits evolution by Ornstein-Uhlenbeck process on potentially large phylogenies (Butler and King 2004). These functions are only useful to perform simulations, which is a rather advanced matter outside the scope of the present tutorial. I refer the user to MPSEM help files for further details.
In addition to function Phylo2DirectedGraph()
, which we
have seen previously MPSEM also has built-in graph
manipulation functions to populate a graph with vertices, add and remove
vertices and edges, etc. These functions were mainly intended to be
called internally by MPSEM functions. They were made
visible upon loading the package because of their potential usefulness
to some advanced applications that are outside the scope of the present
tutorial. Again, I refer the user to MPSEM help files
for further details.