library(paleoDiv)
#> Loading required package: ape
#> Loading required package: stringr
We can start the analysis by downloading some occurrence data from the Paleobiology Database, in this case for Stegosaurs:
pdb("Stegosauria")->Stegosauria
The resulting occurrence dataset should look somewhat like this (but with many more rows):
::kable(head(Stegosauria)) knitr
occurrence_no | record_type | reid_no | flags | collection_no | identified_name | identified_rank | identified_no | difference | accepted_name | accepted_rank | accepted_no | early_interval | late_interval | max_ma | min_ma | reference_no | hltax | tna | lag | eag | genus |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
149599 | occ | NA | 13214 | Stegosaurus sp. | genus | 38814 | Stegosaurus | genus | 38814 | Kimmeridgian | Tithonian | 157.3 | 145.0 | 4387 | Stegosauria | Stegosaurus sp. | 145.0 | 157.3 | Stegosaurus | ||
219973 | occ | 12664 | 22648 | Stegosaurus stenops | species | 56491 | Stegosaurus stenops | species | 56491 | Kimmeridgian | Tithonian | 157.3 | 145.0 | 58803 | Stegosauria | Stegosaurus stenops | 145.0 | 157.3 | Stegosaurus | ||
220057 | occ | NA | 22678 | Stegosaurus sp. | genus | 38814 | Stegosaurus | genus | 38814 | Late Kimmeridgian | Tithonian | 155.7 | 145.0 | 5760 | Stegosauria | Stegosaurus sp. | 145.0 | 155.7 | Stegosaurus | ||
220164 | occ | NA | 22711 | Stegosaurus sp. | genus | 38814 | Stegosaurus | genus | 38814 | Tithonian | 152.1 | 145.0 | 15179 | Stegosauria | Stegosaurus sp. | 145.0 | 152.1 | Stegosaurus | |||
225585 | occ | NA | 21852 | Stegosaurus sp. | genus | 38814 | Stegosaurus | genus | 38814 | Early Tithonian | 150.8 | 145.0 | 15567 | Stegosauria | Stegosaurus sp. | 145.0 | 150.8 | Stegosaurus | |||
260839 | occ | 23241 | 25221 | Stegosauria ? indet. | unranked clade | 147919 | Stegosauria | unranked clade | 38798 | Late Albian | Early Cenomanian | 105.3 | 93.5 | 12528 | Stegosauria | Stegosauria ? indet. | 93.5 | 105.3 | Stegosauria |
The columns that are of primary interest here are occurrence_no, tna (which at this point is simply a copy of identified_name), eag and lag (which contain the maximum and minimum ages of each occurrence).
Since the identified_name/tna column at this point will contain a fair deal of common character combinations leading to an overinflation of unique values (e.g. cf., gen. nov., sp. nov., extra spaces etc), it is recommended to run the following:
occ.cleanup(Stegosauria)->Stegosauria$tna
#> 94 factor levels reduced down to 71
This replaces the tna-column with a filtered version, with such common character strings removed. The function also prints a message giving information about the reduction in unique factor levels of tna. In this case 94 are reduced to 71. This reduces the number of duplicate taxa in the dataset, and thus will help provide more accurate absolute diversity estimates down the line. However, if complete taxonomic accuracy is the goal, then manually checking the dataset (e.g. after the next step, see below) may become necessary.
If we are interested in analyzing the pattern of paleobiodiversity, our next step in the paleoDiv workflow is to build a taxon-range table, which is simply a data.frame() containing the minimum and maximum ages for each unique factor value in tna (or, if manually modified, any other column or vector). We do this using the mk.sptab() function.
mk.sptab(Stegosauria)->sptab_Stegosauria
By default, mk.sptab() takes an occurrence dataset as input and creates a range table containing one row for each unique factor level in the column tna. However, these settings are freely modifiable, making the function applicable to any columns in a data.frame() or even individual vectors containing the maximum and minimum ages and the taxon names (or other category).
Our generated taxon-range table should now look like this:
tna | max | min | ma |
---|---|---|---|
Adratiklit boulahfa | 167.7 | 163.5 | 165.6 |
Alcovasaurus longispinus | 157.3 | 152.1 | 154.7 |
Amargastegos brevicollus | 129.4 | 125.0 | 127.2 |
Bashanosaurus primitivus | 168.3 | 163.5 | 165.9 |
Changdusaurus laminaplacodus | 174.1 | 163.5 | 168.8 |
Chialingosaurus kuani | 168.3 | 163.5 | 165.9 |
For convenience, the function pdb.autodiv() combines all of these steps into a single function, and can also be used with a vector of several taxon names as input:
pdb.autodiv(c("Stegosauria","Ankylosauria"))->Thyreophora
This represents the most convenient method of quickly downloading occurrence data and constructing taxon-range tables for multiple clades in the paleoDiv package. The output of pdb.autodiv is a list() object containing multiple data.frames(): one for each occurrence dataset downloaded from the Paleobiology database, and one for each taxon’s taxon-range table, with the prefix “sptab_”.
Having already generated a taxon-range table, we can now use it to estimate and plot diversity using the function divdistr_():
divdistr_(150,table=sptab_Stegosauria)
#> [1] 30
This tells us that at there are 30 species (or other lowest identified taxonomic levels) in our taxon-range table whose stratigraphic ranges include 150 ma. This function can be applied to an entire vector of geological ages, if we are interested in how diversity changed over time:
divdistr_(c(170:120),table=sptab_Stegosauria)
#> [1] 8 8 16 18 19 19 24 19 19 15 15 15 15 28 28 34 34 34 35 35 30 30 30 30 30
#> [26] 31 10 10 10 10 11 13 13 13 11 11 11 11 12 12 12 12 12 12 12 14 10 10 9 9
#> [51] 9
And it can also be used as a function for plotting these data, e.g. using curves():
curve(divdistr_(x,sptab_Stegosauria), xlim=c(200,100),ylim=c(-5,35), lwd=2,xlab="Age [ma]",ylab="Species Diversity Estimate")
#to add a geological timescale, we can use ts.stages() and ts.periods():
ts.stages(ylim=c(-6,-1),alpha=0.3,border=add.alpha("grey",0.3))
ts.periods(ylim=c(-6,-1),alpha=0.0)
…or as a spindle diagram, using the viol()-function provided with this package, by providing divdistr_ for its stat parameter, overriding its default binding to the density() function:
viol(c(100:200), pos=0, stat=divdistr_, table=sptab_Stegosauria, lim=c(200,100),add=F,ylab="Species Diversity Estimate",xlab="Age [ma]")
We can also plot abundance on the same graph using the abdistr_() function (which works much like divdistr_(), but by default uses occurrence datasets instead of taxa), e.g. as follows:
curve(divdistr_(x,sptab_Stegosauria), xlim=c(200,100),ylim=c(-5,35), lwd=2,xlab="Age [ma]",ylab="Species Diversity Estimate")
#to add a geological timescale, we can use ts.stages() and ts.periods():
ts.stages(ylim=c(-6,-1),alpha=0.3,border=add.alpha("grey",0.3))
ts.periods(ylim=c(-6,-1),alpha=0.0)
curve(abdistr_(x,Stegosauria,ab.val=1)/4, xlim=c(200,100),ylim=c(-5,35),col="red", lwd=2,lty=2,add=T)
axis(4,at=seq(0,30,5), lab=seq(0,30,5)*4, col.axis="red")
Note that since our dataset does not have a collumn with abundance values for each occurrence, we need to specify ab.val=1 here to treat each occurrence as having the same value. A dataset downloaded using pdb(…,full=T) would have a collumn named abundance_value, which should be specified here instead if available (this is already the standard setting).
If we are interested in how collection intensity/the number of collections compares to taxonomic diversity or abundance, we can use pdb() to download collections data instead of occurrence data:
pdb("Stegosauria", what="colls")->Stegosauria_colls
curve(divdistr_(x,sptab_Stegosauria), xlim=c(200,100),ylim=c(-5,35), lwd=2,xlab="Age [ma]",ylab="Species Diversity Estimate")
#to add a geological timescale, we can use ts.stages() and ts.periods():
ts.stages(ylim=c(-6,-1),alpha=0.3,border=add.alpha("grey",0.3))
ts.periods(ylim=c(-6,-1),alpha=0.0)
#now plot the number of collections alongside the diversity curve
curve(abdistr_(x,Stegosauria_colls,ab.val=1)/4, xlim=c(200,100),ylim=c(-5,35),col="blue", lwd=2,lty=2,add=T)
axis(4,at=seq(0,30,5), lab=seq(0,30,5)*4, col.axis="blue")
For the next examples, a calibrated phylogeny of Archosauria, a matrix used for its calibration and a list() object containing multiple taxon-range tables are provided with the package as example data:
data(archosauria)
data(tree_archosauria)
data(ages_archosauria)
We can plot the phylogeny using ape:
::plot.phylo(tree_archosauria)
apets.stages(tree_archosauria,alpha=0.8)
ts.periods(tree_archosauria, names=T, alpha=0)
One of the key functions of paleoDiv is phylo.spindles(), which is optimized for plotting diversity as a spindle-diagram relative to phylogeny:
phylo.spindles(phylo0=tree_archosauria,occ=archosauria,col=add.alpha(ggcol(13)),ages=ages_archosauria,txt.y=.5, dscale=0.005,xlim=c(260,0),axis=F,tbmar=c(1.5,.5),txt.x=c(66,ages_archosauria[2:12,"LAD"],66))
#add a timescale
ts.stages(tree_archosauria, ylim=c(-1,0),alpha=0.8)
ts.periods(tree_archosauria, names=T, ylim=c(-1,0),alpha=0)
#add an x axis with custom tick positions:
axis(1,at=tsconv(seq(300,-50,-50),tree_archosauria), lab=seq(300,-50,-50), cex=0.75,col="grey30",col.lab="grey30")
#add a short y axis serving as a scale bar
axis(2, at=c(500,1000)*0.005, lab=c(0,500))
#add vertical lines marking major mass extinctions.
abline(v=tsconv(c(252,201.3,66),tree_archosauria),lwd=2, col=add.alpha("black",0.3))
mtext(side=3, cex=1.2,"paleoDiv::divdistr_()",col="darkgrey")
#we can manually add spindles anywhere using viol(), e.g. to plot the diversity of important subtaxa, in this case that of birds:
viol(x=c(260:0),stat=divdistr_, pos=13, table=convert.sptab(archosauria$sptab_Aves,tree_archosauria),dscale=0.005, cutoff=tsconv(c(165,0),tree_archosauria),fill=add.alpha("grey40"), col=add.alpha("grey40"))
phylo0 The phylogeny (or taxa) to be plotted. This should be time-calibrated (e.g. using strap::datePhylo())
occ The dataset to be used for plotting. This has to either be a list() containing data.frames to be used as an argument for the function divdistr_() (or any other function it is overridden with) as parameter “table”, or a data.frame() or matrix() containing an x column and columns matching the names of the phylogenetic tree’s tip.labels. By default, data.frames in occ are looked for using taxon names (from phylo0) with the prefix “sptab_”. If another prefix is desired, this can be changed using the parameter named prefix (see documentation).
ages Optional data matrix used for cropping each spindle to the known stratigraphic range of the taxon. If provided, this should take the form of a matrix with row names being the same as the trees tip.labels, and two columns named “FAD” and “LAD” giving minimum and maximum geological ages for each taxon.
dscale Vertical scaling parameter for the spindles. May need manual adjustment, depending on the desired scale.
txt.x Either a single number of a vector of the same length as the number of tree tips giving the horizontal position for labels (if labels==TRUE). In this case, the second column of ages is used, placing the labels at the end of the spindles, but the first and last values are replaced by 66 in order to place labels within the plot boundaries
For further details, see ?phylo.spindles
Note that ape::plot.phylo(), which is used by this function for generating the tree, plots trees to a time coordinate counting forward from the root age. As a result, we need to convert all geological ages to this coordinate system by subtracting them from the root age of the phylogeny (facilitated by the tsconv()-function which takes ages and a tree as input).
By replacing occ with a data.frame() or matrix(), we can also use phylo.spindles() to plot a variety of other time series data, such as morphological disparity, abundance, number of occurrences, diversity estimates made using different methods and/or packages, etc. As an example, the object diversity_table contains a time series of by-stage diversity estimates for the same taxa made using divDyn::divDyn (in this case the column divRT):
data(diversity_table)
phylo.spindles(tree_archosauria,occ=diversity_table,ages=ages_archosauria,txt.y=.5,xlim=c(260,0),axis=F,dscale=0.01,col=add.alpha(ggcol(13)),tbmar=c(1.5,0.5),txt.x=c(66,ages_archosauria[2:12,"LAD"],66))
#add a timescale
ts.stages(tree_archosauria, ylim=c(-1,0),alpha=0.8)
ts.periods(tree_archosauria, names=T, ylim=c(-1,0),alpha=0)
#add an x axis with custom tick positions:
axis(1,at=tsconv(seq(300,-50,-50),tree_archosauria), lab=seq(300,-50,-50), cex=0.75,col="grey30",col.lab="grey30")
#add a short y axis serving as a scale bar
axis(2, at=c(500,1000)*0.01, lab=c(0,500))
#add vertical lines marking major mass extinctions.
abline(v=tsconv(c(252,201.3,66),tree_archosauria),lwd=2, col=add.alpha("black",0.3))
mtext(side=3, cex=1.2,"divDyn::divDyn()$divRT",col="darkgrey")
If desired, phylo.spindles can also be used to plot spindle diagrams without a phylogeny by simply specifying a list of taxa (in occ):
phylo.spindles(phylo0=tree_archosauria$tip.label,occ=archosauria,col=add.alpha(ggcol(13)),ages=ages_archosauria,txt.y=.5, dscale=0.005,xlim=c(260,0),axis=F,tbmar=c(1.5,.5),txt.x=c(66,ages_archosauria[2:12,"LAD"],66))
#add a timescale
ts.stages(ylim=c(-1,0),alpha=0.8)
ts.periods(names=T, ylim=c(-1,0),alpha=0)
#add an x axis with custom tick positions:
axis(1,at=seq(300,-50,-50), cex=0.75,col="grey30",col.lab="grey30")
#add a short y axis serving as a scale bar
axis(2, at=c(500,1000)*0.005, lab=c(0,500))
#add vertical lines marking major mass extinctions.
abline(v=c(252,201.3,66),lwd=2, col=add.alpha("black",0.3))
mtext(side=3, cex=1.2,"Phylo.spindles without Phylogeny",col="darkgrey")
Note how in this case, we are plotting to normal age values, so there is no need to convert ages to the root.time-coordinate system used by ape::plot.phylo
Similarly, we can add phylo.spindles to an already-existing phylogeny (or any other plot) by specifying add=TRUE:
::plot.phylo(tree_archosauria)
ape
phylo.spindles(phylo0=tree_archosauria,occ=archosauria,col=add.alpha(ggcol(13)),ages=ages_archosauria,txt.y=.5, dscale=0.005,xlim=c(260,0),axis=F,tbmar=c(1.5,.5),txt.x=c(66,ages_archosauria[2:12,"LAD"],66),add=T)