Taxonomic filtering

Callum Waite & Shandiya Balasubramaniam

2023-10-13

Taxonomic complexity can confound the process of searching, filtering, and downloading records using galah, but there are a few ways to ensure records are not missed.

library(galah)
library(dplyr)
galah_config(email = "your_email_here", verbose = FALSE)

search_taxa()

search_taxa() lets users look up taxonomic names before downloading data, which helps to disambiguate homonyms and to check that a search term matches the taxon name used by the Atlas of Living Australia (ALA). It returns the scientific name, authorship, rank, and full classification of the taxon matched to each search term.

search_taxa("Petroica boodang") |> gt::gt()
| search_term | scientific_name | scientific_name_authorship | taxon_concept_id | rank | match_type | kingdom | phylum | class | order | family | genus | species | vernacular_name | issues |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Petroica boodang | Petroica (Petroica) boodang | (Lesson, 1838) | https://biodiversity.org.au/afd/taxa/a3e5376b-f9e6-4bdf-adae-1e7add9f5c29 | species | exactMatch | Animalia | Chordata | Aves | Passeriformes | Petroicidae | Petroica | Petroica boodang | Scarlet Robin | noIssue |

# Muscicapa chrysoptera is a synonym for the Flame Robin, Petroica phoenicea
# Guniibuu is the Yuwaalaraay Indigenous name for the Red-Capped Robin, Petroica goodenovii
search_taxa("Muscicapa chrysoptera", "Guniibuu") |> gt::gt()

| search_term | scientific_name | scientific_name_authorship | taxon_concept_id | rank | match_type | kingdom | phylum | class | order | family | genus | species | vernacular_name | issues |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Muscicapa chrysoptera | Petroica (Littlera) phoenicea | Gould, 1837 | https://biodiversity.org.au/afd/taxa/fe74e658-4848-437a-a23d-f1001a198552 | species | exactMatch | Animalia | Chordata | Aves | Passeriformes | Petroicidae | Petroica | Petroica phoenicea | Flame Robin | noIssue |

Where homonyms exist, search_taxa() will prompt users to clarify the search term by providing one or more taxonomic ranks in a tibble. This example differentiates among the genus Morganella in three kingdoms:

search_taxa("Morganella") |> gt::gt()
## Warning: Search returned multiple taxa due to a homonym issue.
## ℹ Please provide another rank in your search to clarify taxa.
## ℹ Use a `tibble` to clarify taxa, see `?search_taxa`.
## ✖ Homonym issue with "Morganella".
| search_term | issues |
|---|---|
| Morganella | homonym |

search_taxa(tibble(kingdom = "Fungi", genus = "Morganella")) |> gt::gt()

| search_term | scientific_name | scientific_name_authorship | taxon_concept_id | rank | match_type | kingdom | phylum | class | order | family | genus | issues |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Fungi_Morganella | Morganella | Zeller | https://id.biodiversity.org.au/node/fungi/60091999 | genus | exactMatch | Fungi | Basidiomycota | Agaricomycetes | Agaricales | Agaricaceae | Morganella | noIssue |
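
When looking up many names at once, it can help to check the issues column before building a query. The sketch below is illustrative only: lookup is a hand-built tibble that mimics two columns of the search_taxa() output shown above rather than a live call, and unresolved is a name chosen here for the example.

```r
library(dplyr)

# Hand-built stand-in for search_taxa() output (columns trimmed);
# the values are the ones shown in the outputs above
lookup <- tibble(
  search_term = c("Morganella", "Petroica boodang"),
  issues = c("homonym", "noIssue")
)

# Search terms that still need a rank supplied before they can be queried
unresolved <- lookup |>
  filter(issues != "noIssue") |>
  pull(search_term)

unresolved
```

Any term listed in unresolved can then be clarified by supplying a rank in a tibble, as in the Morganella example.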

identify()

identify() is similar to search_taxa(), except that it can be used within a piped workflow to retrieve counts, species lists, or occurrence records, e.g.

galah_call() |>
  identify("Petroica boodang") |>
  count() |>
  collect()
## # A tibble: 1 × 1
##    count
##    <int>
## 1 119417
galah_call(type = "species") |>
  identify("Muscicapa chrysoptera", "Guniibuu") |>
  collect() |> 
  gt::gt()
| taxon_concept_id | species_name | scientific_name_authorship | taxon_rank | kingdom | phylum | class | order | family | genus | vernacular_name |
|---|---|---|---|---|---|---|---|---|---|---|
| https://biodiversity.org.au/afd/taxa/10dbd908-00f3-4ec2-9a9c-a2fd4782eaf1 | Petroica (Petroica) goodenovii | (Vigors & Horsfield, 1827) | species | Animalia | Chordata | Aves | Passeriformes | Petroicidae | Petroica | Red-capped Robin |
| https://biodiversity.org.au/afd/taxa/fe74e658-4848-437a-a23d-f1001a198552 | Petroica (Littlera) phoenicea | Gould, 1837 | species | Animalia | Chordata | Aves | Passeriformes | Petroicidae | Petroica | Flame Robin |

galah_call() |>
  identify(tibble(kingdom = "Fungi", genus = "Morganella")) |>
  collect() |>
  head() |> 
  gt::gt()
| recordID | scientificName | taxonConceptID | decimalLatitude | decimalLongitude | eventDate | occurrenceStatus | dataResourceName |
|---|---|---|---|---|---|---|---|
| 001ec30d-3376-4f63-ba32-b48bc3dd137d | Morganella purpurascens | https://id.biodiversity.org.au/node/fungi/60092001 | -33.66218 | 150.2708 | 2021-04-10 | PRESENT | NSW BioNet Atlas |
| 02ba39cd-8077-4868-a0ff-e50765089788 | Morganella compacta | NZOR-6-128055 | -41.28082 | 174.7943 | NA | PRESENT | New Zealand Virtual Herbarium |
| 0422009d-c1f0-4e2e-8d29-e88df6de2049 | Morganella compacta | NZOR-6-128055 | -36.44838 | 174.6714 | NA | PRESENT | New Zealand Virtual Herbarium |
| 04aeeb8d-0538-477c-aff1-29574eafa349 | Morganella compacta | NZOR-6-128055 | -36.84225 | 174.4695 | 1993-06-19 | PRESENT | New Zealand Virtual Herbarium |
| 092b6f3e-ef27-4cbf-bb28-b172c1b200c5 | Morganella purpurascens | https://id.biodiversity.org.au/node/fungi/60092001 | -26.77940 | 152.8803 | 2009-02-19 | PRESENT | National Herbarium of Victoria (MEL) AVH data |
| 0be95d59-29a4-4475-8586-a497cac607f7 | Morganella compacta | NZOR-6-128055 | -43.15002 | 171.7305 | NA | PRESENT | New Zealand Virtual Herbarium |

filter()

filter() subsets records by searching for exact matches to an expression, and can also be used for taxonomic filtering, e.g.

galah_call() |>
  filter(species == "Petroica boodang") |>
  count() |>
  collect()
## # A tibble: 1 × 1
##    count
##    <int>
## 1 119417

Alternatively, we could use filter() after first checking taxonomy with search_taxa(), in place of identify():

robins <- search_taxa("Muscicapa chrysoptera", "Guniibuu") 

galah_call() |>
  filter(taxonConceptID == robins$taxon_concept_id) |>
  count() |>
  collect()
## # A tibble: 1 × 1
##   count
##   <int>
## 1 81570
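
Because search_taxa() can return more than one row, robins$taxon_concept_id may hold several identifiers. In plain R, == compares vectors element by element with recycling, whereas %in% tests membership. The sketch below illustrates that difference with dplyr on a made-up local tibble (records, wanted, and the shortened identifiers are all hypothetical); it does not show how galah translates either operator into an ALA query.

```r
library(dplyr)

# Hypothetical local records with shortened stand-in identifiers
records <- tibble(
  recordID = 1:4,
  taxonConceptID = c("id_redcap", "id_flame", "id_flame", "id_flame")
)
wanted <- c("id_flame", "id_redcap")

# `==` recycles `wanted` along the column, so rows are compared by position
n_equal <- records |>
  filter(taxonConceptID == wanted) |>
  nrow()

# `%in%` keeps every row whose identifier appears anywhere in `wanted`
n_in <- records |>
  filter(taxonConceptID %in% wanted) |>
  nrow()

c(equal = n_equal, in_set = n_in)
```

Here == keeps one row and %in% keeps all four. If the same holds for a galah query, %in% is the safer choice whenever several identifiers are supplied.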

It is also possible to specify several species at once using filter():

aus_petroica <- c("Petroica boodang", "Petroica goodenovii", 
                  "Petroica phoenicea", "Petroica rosea",
                  "Petroica rodinogaster", "Petroica multicolor")

galah_call() |>
  filter(species %in% aus_petroica) |>
  group_by(species, vernacularName) |>
  count() |> 
  collect() |>
  gt::gt()
| species | vernacularName | count |
|---|---|---|
| Petroica boodang | Scarlet Robin | 115735 |
| Petroica boodang | Eastern Scarlet Robin | 3496 |
| Petroica boodang | South-western Scarlet Robin | 160 |
| Petroica boodang | Tasmanian Scarlet Robin | 26 |
| Petroica goodenovii | Red-capped Robin | 110284 |
| Petroica phoenicea | Flame Robin | 81570 |
| Petroica rosea | Rose Robin | 52414 |
| Petroica rodinogaster | Pink Robin | 13417 |
| Petroica rodinogaster | Mainland Pink Robin | 60 |
| Petroica rodinogaster | Tasmanian Pink Robin | 14 |
| Petroica multicolor | Pacific Robin | 6699 |
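
Several vernacular names can map to one species, so the per-name counts need to be summed to get a species total. As a quick check, the sketch below re-enters the Petroica boodang and Petroica rodinogaster rows from the table above into a local tibble (robin_counts and species_totals are names chosen here for illustration) and sums them with dplyr; it is not a new ALA query.

```r
library(dplyr)

# Rows typed in from the grouped counts above
robin_counts <- tibble(
  species = c(rep("Petroica boodang", 4), rep("Petroica rodinogaster", 3)),
  vernacularName = c(
    "Scarlet Robin", "Eastern Scarlet Robin",
    "South-western Scarlet Robin", "Tasmanian Scarlet Robin",
    "Pink Robin", "Mainland Pink Robin", "Tasmanian Pink Robin"
  ),
  count = c(115735, 3496, 160, 26, 13417, 60, 14)
)

# Collapse the vernacular-name rows into one total per species
species_totals <- robin_counts |>
  group_by(species) |>
  summarise(records = sum(count))

species_totals
```

The total for Petroica boodang, 119417, equals the count returned by the earlier identify() and filter() queries for that species.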

This can be useful when searching for paraphyletic or polyphyletic groups that are defined by excluding a taxon, which cannot be done using identify(). For example, to get counts of non-chordate animals by phylum:

galah_call() |>
  filter(kingdom == "Animalia", phylum != "Chordata") |>
  group_by(phylum) |>
  count() |>
  collect() |>
  head() |> 
  gt::gt()
| phylum | count |
|---|---|
| Arthropoda | 8567585 |
| Mollusca | 1378754 |
| Annelida | 316619 |
| Cnidaria | 273227 |
| Echinodermata | 196016 |
| Porifera | 130249 |

filter(), identify(), and taxonomic ranks

Deciding between using filter() and identify() in a query comes down to how a record has been classified, and whether or not you have the correct unique name and classification of the taxa of interest.

The ALA has fields for the primary taxonomic ranks (kingdom, phylum, class, order, family, genus, species) and some secondary ranks (e.g. subfamily, subgenus), all of which may be used with filter() and identify(). Additionally, there is a field named scientificName, which holds the name at the lowest taxonomic rank to which a record has been identified, e.g.

galah_call() |>
  identify(tibble(genus = "Pitta")) |>
  group_by(scientificName, taxonRank) |>
  count() |>
  collect() |>
  filter(!is.na(scientificName)) |>
  gt::gt()
| scientificName | taxonRank | count |
|---|---|---|
| Pitta (Pitta) versicolor | species | 26085 |
| Pitta (Pitta) iris | species | 5722 |
| Pitta (Erythropitta) | subgenus | 728 |
| Pitta (Pitta) versicolor versicolor | subspecies | 310 |
| Pitta (Erythropitta) erythrogaster | species | 190 |
| Pitta (Pitta) iris iris | subspecies | 76 |
| Pitta | genus | 74 |
| Pitta (Pitta) versicolor intermedia | subspecies | 42 |
| Pitta (Pitta) versicolor simillima | subspecies | 38 |
| Pitta (Pitta) iris johnstoneiana | subspecies | 27 |
| Pitta (Erythropitta) erythrogaster digglesi | subspecies | 21 |

If you have the correct species or subspecies name, searching for matches against the species or subspecies field, respectively, will give more precise results. This is because the field scientificName may include a subgenus (e.g. Pitta (Pitta) versicolor), so a plain binomial may not match it exactly. If you’ve used search_taxa() to get the ALA-matched name of a taxon and only want records identified to a particular level of classification, searching for matches against scientificName is recommended.
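
To see how records are spread across ranks, the counts from the Pitta table above can be tallied by taxonRank. This is a local sketch with values typed in from that table (pitta and by_rank are illustrative names), not a new ALA query.

```r
library(dplyr)

# Values typed in from the Pitta output above
pitta <- tibble(
  scientificName = c(
    "Pitta (Pitta) versicolor", "Pitta (Pitta) iris", "Pitta (Erythropitta)",
    "Pitta (Pitta) versicolor versicolor", "Pitta (Erythropitta) erythrogaster",
    "Pitta (Pitta) iris iris", "Pitta", "Pitta (Pitta) versicolor intermedia",
    "Pitta (Pitta) versicolor simillima", "Pitta (Pitta) iris johnstoneiana",
    "Pitta (Erythropitta) erythrogaster digglesi"
  ),
  taxonRank = c(
    "species", "species", "subgenus", "subspecies", "species", "subspecies",
    "genus", "subspecies", "subspecies", "subspecies", "subspecies"
  ),
  count = c(26085, 5722, 728, 310, 190, 76, 74, 42, 38, 27, 21)
)

# Total records identified to each rank, largest first
by_rank <- pitta |>
  group_by(taxonRank) |>
  summarise(records = sum(count)) |>
  arrange(desc(records))

by_rank
```

Most Pitta records are identified to species, but several hundred stop at genus or subgenus or go down to subspecies, which is why the choice of field affects what a query returns.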

Paraphyletic or polyphyletic groups may contain taxa identified to different taxonomic levels. In this case, it is simpler to use search_taxa() and identify() rather than filter(). In the example below, search_taxa() matches the search terms to one genus, three species, and two subspecies. The same vector of names can then be passed to identify() in a piped workflow.

tas_endemic <- c("Sarcophilus",                    # Tasmanian Devil
                 "Bettongia gaimardi",             # Tasmanian Bettong
                 "Melanodryas vittata",            # Dusky Robin
                 "Platycercus caledonicus",        # Green Rosella
                 "Aquila audax fleayi",            # Tasmanian Wedge-Tailed Eagle
                 "Tyto novaehollandiae castanops") # Tasmanian Masked Owl

search_taxa(tas_endemic) |> gt::gt()
| search_term | scientific_name | scientific_name_authorship | taxon_concept_id | rank | match_type | kingdom | phylum | class | order | family | genus | species | vernacular_name | issues |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Sarcophilus | Sarcophilus | Cuvier, 1837 | https://biodiversity.org.au/afd/taxa/06455b77-7d50-4ec7-9122-8ab48cfb0c1c | genus | exactMatch | Animalia | Chordata | Mammalia | Dasyuromorphia | Dasyuridae | Sarcophilus | NA | NA | noIssue |
| Bettongia gaimardi | Bettongia gaimardi | (Desmarest, 1822) | https://biodiversity.org.au/afd/taxa/8f7da937-6338-4c39-8b11-4f83807afe11 | species | exactMatch | Animalia | Chordata | Mammalia | Diprotodontia | Potoroidae | Bettongia | Bettongia gaimardi | Tasmanian Bettong | noIssue |
| Melanodryas vittata | Melanodryas (Amaurodryas) vittata | (Quoy & Gaimard, 1830) | https://biodiversity.org.au/afd/taxa/0f04889f-5489-4369-a545-8a041fba9f6d | species | exactMatch | Animalia | Chordata | Aves | Passeriformes | Petroicidae | Melanodryas | Melanodryas vittata | Dusky Robin | noIssue |
| Platycercus caledonicus | Platycercus (Platycercus) caledonicus | (Gmelin, 1788) | https://biodiversity.org.au/afd/taxa/c6e478fe-f199-463f-8576-a77108fd73e2 | species | exactMatch | Animalia | Chordata | Aves | Psittaciformes | Psittacidae | Platycercus | Platycercus caledonicus | Green Rosella | noIssue |
| Aquila audax fleayi | Aquila (Uroaetus) audax fleayi | Condon & Amadon, 1954 | https://biodiversity.org.au/afd/taxa/ac93f7f0-0686-4589-801a-5832378cb7c1 | subspecies | exactMatch | Animalia | Chordata | Aves | Accipitriformes | Accipitridae | Aquila | Aquila audax | Tasmanian Wedge-tailed Eagle | noIssue |
| Tyto novaehollandiae castanops | Tyto novaehollandiae castanops | (Gould, 1837) | https://biodiversity.org.au/afd/taxa/2c30d58b-572b-4dab-8644-b222c28eb0ec | subspecies | exactMatch | Animalia | Chordata | Aves | Strigiformes | Tytonidae | Tyto | Tyto novaehollandiae | Tasmanian Masked Owl | noIssue |

galah_call() |>
  identify(tas_endemic) |>
  group_by(scientificName) |>
  count() |>
  collect() |>
  arrange(scientificName) |>
  gt::gt()
| scientificName | count |
|---|---|
| Aquila (Uroaetus) audax fleayi | 4935 |
| Bettongia gaimardi | 1941 |
| Bettongia gaimardi cuniculus | 41 |
| Bettongia gaimardi gaimardi | 9 |
| Melanodryas (Amaurodryas) vittata | 14131 |
| Melanodryas (Amaurodryas) vittata kingi | 15 |
| Melanodryas (Amaurodryas) vittata vittata | 39 |
| Platycercus (Platycercus) caledonicus | 43463 |
| Platycercus (Platycercus) caledonicus brownii | 24 |
| Platycercus (Platycercus) caledonicus caledonicus | 33 |
| Sarcophilus | 3 |
| Sarcophilus harrisii | 36302 |
| Tyto novaehollandiae castanops | 63 |
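
The result contains child taxa (species within the genus, subspecies within the species) as well as the names that were searched for. One way to roll these back up to the original six search terms is to strip the parenthesised subgenus from scientificName and match on the name prefix. The sketch below does this locally on counts typed in from the table above; match_term(), counts, and rolled_up are hypothetical helpers written for this example, not part of galah.

```r
library(dplyr)

tas_endemic <- c("Sarcophilus", "Bettongia gaimardi", "Melanodryas vittata",
                 "Platycercus caledonicus", "Aquila audax fleayi",
                 "Tyto novaehollandiae castanops")

# Values typed in from the output above
counts <- tibble(
  scientificName = c(
    "Aquila (Uroaetus) audax fleayi", "Bettongia gaimardi",
    "Bettongia gaimardi cuniculus", "Bettongia gaimardi gaimardi",
    "Melanodryas (Amaurodryas) vittata",
    "Melanodryas (Amaurodryas) vittata kingi",
    "Melanodryas (Amaurodryas) vittata vittata",
    "Platycercus (Platycercus) caledonicus",
    "Platycercus (Platycercus) caledonicus brownii",
    "Platycercus (Platycercus) caledonicus caledonicus",
    "Sarcophilus", "Sarcophilus harrisii", "Tyto novaehollandiae castanops"
  ),
  count = c(4935, 1941, 41, 9, 14131, 15, 39, 43463, 24, 33, 3, 36302, 63)
)

# Return the first search term that a name starts with, or NA if none does
match_term <- function(name, terms) {
  is_prefix <- vapply(terms, function(term) startsWith(name, term), logical(1))
  if (any(is_prefix)) terms[is_prefix][[1]] else NA_character_
}

rolled_up <- counts |>
  mutate(
    # Drop " (Subgenus)" so names line up with the search terms
    plain_name = gsub(" \\([^)]*\\)", "", scientificName),
    search_term = vapply(plain_name, match_term, character(1),
                         terms = tas_endemic, USE.NAMES = FALSE)
  ) |>
  group_by(search_term) |>
  summarise(records = sum(count))

rolled_up
```

Prefix matching works here because no search term is a prefix of another; a list that contained both a genus and one of its species would need a more careful rule.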