Version 1.4.0 brings in three major changes in how galah works:
galah_callBelow we discuss each of these changes in turn. Please note, however, that these changes are by no means set in stone - it is absolutely possible to change syntax in future versions of galah if alternative names are easier to use and understand. We would appreciate any feedback from users about what works or what doesn’t work. It is our goal to create a package that is as easy and intuitive for users as possible!
dplyrgalah_ functions now evaluate arguments just like dplyr. To see what we mean, let’s look at an example of how dplyr::filter() works. Notice how dplyr::filter and galah_filter both require logical arguments to be added by using the == sign:
library(dplyr)
mtcars %>%
filter(mpg == 21)## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21 6 160 110 3.9 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21 6 160 110 3.9 2.875 17.02 0 1 4 4
galah_call() %>%
galah_filter(year == 2021) %>%
atlas_counts()## # A tibble: 1 × 1
## count
## <int>
## 1 1161557
As another example, notice how galah_group_by() + atlas_counts() works very similarly to dplyr::group_by() + dplyr::count():
mtcars %>%
group_by(vs) %>%
count()## # A tibble: 2 × 2
## # Groups: vs [2]
## vs n
## <dbl> <int>
## 1 0 18
## 2 1 14
galah_call() %>%
galah_group_by(biome) %>%
atlas_counts()## # A tibble: 2 × 2
## biome count
## <chr> <int>
## 1 TERRESTRIAL 93590939
## 2 MARINE 3519480
We made this move towards tidy evaluation to make it possible to use piping for building queries to the Atlas of Living Australia. In practice, this means that data queries can be filtered just like how you might filter a data.frame with the tidyverse suite of functions.
Prior to version 1.4.0, galah naming conventions had two major problems:
ala, but actually could query many other living atlasesselect, but this is reserved in dplyr (and elsewhere) for operations specifically on columnsTo address these concerns (and other smaller points), we have completed a rewrite of our function names to increase clarity (see table below). Deprecated function names will now return a warning message when used, suggesting to users that they switch to the new syntax.
| galah 1.3.1 and earlier | galah 1.4.0 |
|---|---|
| galah_call | |
| select_taxa | galah_identify |
| select_filters | galah_filter |
| select_columns | galah_select |
| select_locations | galah_geolocate |
| galah_group_by | |
| galah_down_to | |
| ala_counts | atlas_counts |
| ala_occurrences | atlas_occurrences |
| ala_species | atlas_species |
| ala_media | atlas_media |
| ala_taxonomy | atlas_taxonomy |
| ala_citation | atlas_citation |
| select_taxa | search_taxa, search_identifiers |
| search_fields | search_fields |
| show_all_fields | |
| find_profiles | show_all_profiles |
| find_ranks | show_all_ranks |
| find_atlases | show_all_atlases |
| find_reasons | show_all_reasons |
| find_cached_files | show_all_cached_files |
| find_field_values | search_field_values |
| find_profile_attributes | search_profile_attributes |
galah_call()Perhaps the largest change from galah 1.4.0 is the implementation of piping using galah_call().
Beginning a query with galah_call() (be sure to add the parentheses!) tells galah that you will be using pipes to construct your query. Follow this with your preferred pipe (|> from base or %>% from magrittr). You can then narrow your query line-by-line using galah_ functions. Finally, end with an atlas_ function to identify what type of data you want from your query.
Unlike old function names, which will be removed from future versions, we do intend to continue supported un-piped syntax in future, although piping only works with revamped function names. If you’re new to piping, here’s a comparison against code from previous versions of galah.
Previously, if you wanted to look up the number of records of each bandicoot species every year from 2010 to 2021, you’d have had to do something like this:
library(purrr)
library(dplyr)
taxa <- ala_species(taxa = select_taxa("perameles"))$species
years <- select_filters(year = seq(2010:2021))
taxa %>%
map_dfr( ~ ala_counts(
taxa = select_taxa(list(species = .x)),
filters = years,
group_by = "year")Not very easy because you had to use multiple atlas_ functions and you had to use loops. However, now with piping you can do it like this:
galah_call() %>%
galah_identify("perameles") %>%
galah_filter(year > 2010) %>%
galah_group_by(species, year) %>%
atlas_counts()And a second example, if you wanted to download occurrence records of bandicoots in 2021, and also to include information on which records had zero coordinates, previously you would have had to do this:
atlas_occurrences(taxa = select_taxa("perameles"),
filters = select_filters(year = 2021),
columns = select_columns(group = "basic", "ZERO_COORDINATE"))Now with piping:
galah_call() %>%
galah_identify("perameles") %>%
galah_filter(year == 2021) %>%
galah_select(group = "basic", ZERO_COORDINATE) %>%
atlas_occurrences()