This vignette illustrates how to detect ambiguity and inconsistency in a merged taxonomy. Start by loading the 2000-row sample dataset that ships with taxonbridge:
library(taxonbridge)
sample <- load_sample()
dim(sample)
#> [1] 2000 20
Next, retrieve all rows that have lineage information in both the GBIF backbone and NCBI:
lineages <- get_lineages(sample)
Then validate the lineages at the kingdom and family taxonomic ranks, and collect the resulting tibbles in a list. The phylum, class, and order ranks may also be used. In this example, setting valid = FALSE returns the entries that failed validation.
kingdom <- get_validity(lineages, rank = "kingdom", valid = FALSE)
#> Term conversion carried out on kingdom taxonomic rank
family <- get_validity(lineages, rank = "family", valid = FALSE)
candidates <- list(kingdom, family)
Finally, detect candidate incongruencies (excluding those with uninomial scientific names):
get_inconsistencies(candidates, uninomials = FALSE)
#> [1] "Gordonia neofelifaecis" "Attheya septentrionalis"
Two binomial names exhibit incongruency. Upon reference to the literature and the individual entries it can be seen that:
Attheya septentrionalis is assigned to different families of the problematic order Chaetocerotales
Gordonia neofelifaecis is a plant in GBIF (family: Theaceae) but a bacterium in NCBI (family: Gordoniaceae)
Attheya septentrionalis has the status “synonym” in the GBIF data:
lineages[lineages$canonicalName == "Attheya septentrionalis", "taxonomicStatus"]
#> # A tibble: 1 × 1
#> taxonomicStatus
#> <chr>
#> 1 synonym
Restricting the lineages to accepted names with get_status() and rerunning the exercise leaves only Gordonia neofelifaecis as a binomial incongruency with biological provenance:
lineages <- get_status(get_lineages(sample), status = "accepted")
kingdom <- get_validity(lineages, rank = "kingdom", valid = FALSE)
#> Term conversion carried out on kingdom taxonomic rank
family <- get_validity(lineages, rank = "family", valid = FALSE)
candidates <- list(kingdom, family)
get_inconsistencies(candidates, uninomials = FALSE)
#> [1] "Gordonia neofelifaecis"
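The logic of the exercise can be mimicked on a toy table in base R. This is only a sketch of the idea, not taxonbridge's implementation. The helpers failed_rank() and candidate_names(), the gbif_*/ncbi_* column names, and the taxa themselves are hypothetical. The sketch assumes that a candidate is a name that fails validation at every supplied rank, and it skips the kingdom term conversion that get_validity() reports.

```r
# Toy stand-in for a merged taxonomy. The taxa, their values, and the
# gbif_*/ncbi_* column names are invented for illustration only.
toy <- data.frame(
  canonicalName   = c("Aus bus", "Cus dus", "Eus fus", "Aus"),
  taxonomicStatus = c("accepted", "synonym", "accepted", "accepted"),
  gbif_kingdom    = c("Plantae", "Chromista", "Animalia", "Plantae"),
  ncbi_kingdom    = c("Bacteria", "Protista", "Animalia", "Bacteria"),
  gbif_family     = c("Family A", "Family C", "Family E", "Family A"),
  ncbi_family     = c("Family B", "Family D", "Family E", "Family B"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: rows whose GBIF and NCBI values differ at one rank.
# which() drops NA comparisons instead of returning NA rows.
failed_rank <- function(df, rank) {
  gbif <- df[[paste0("gbif_", rank)]]
  ncbi <- df[[paste0("ncbi_", rank)]]
  df[which(gbif != ncbi), , drop = FALSE]
}

# Hypothetical helper: names present in every table of failures,
# optionally dropping uninomials (names without a space).
candidate_names <- function(tables, uninomials = TRUE) {
  shared <- Reduce(intersect, lapply(tables, function(t) t$canonicalName))
  if (!uninomials) {
    shared <- shared[grepl(" ", shared, fixed = TRUE)]
  }
  shared
}

# First pass: all rows, binomials only.
candidates <- list(failed_rank(toy, "kingdom"), failed_rank(toy, "family"))
all_hits <- candidate_names(candidates, uninomials = FALSE)

# Second pass: keep accepted names only, then repeat.
accepted <- toy[toy$taxonomicStatus == "accepted", , drop = FALSE]
candidates_accepted <- list(failed_rank(accepted, "kingdom"),
                            failed_rank(accepted, "family"))
accepted_hits <- candidate_names(candidates_accepted, uninomials = FALSE)

print(all_hits)
print(accepted_hits)
```

In this toy table the synonym drops out of the second pass, which mirrors how filtering on status narrows the candidates in the vignette.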