Title: Access Brazilian Public Health Data
Version: 0.1.1
Description: Provides easy access to Brazilian public health data from multiple sources including VIGITEL (Surveillance of Risk Factors for Chronic Diseases by Telephone Survey), PNS (National Health Survey), SIM (Mortality Information System), SINASC (Live Birth Information System), and other health information systems. Data is downloaded from the Brazilian Ministry of Health VIGITEL repository https://svs.aids.gov.br/download/Vigitel/. Data is returned in tidy format following tidyverse conventions.
License: MIT + file LICENSE
Encoding: UTF-8
RoxygenNote: 7.3.3
Depends: R (≥ 4.2.0)
Imports: tibble, dplyr, readxl, curl, cli, rlang, stringr, janitor, purrr
Suggests: testthat (≥ 3.0.0), knitr, rmarkdown, furrr, future, arrow
Config/testthat/edition: 3
VignetteBuilder: knitr
URL: https://github.com/SidneyBissoli/healthbR
BugReports: https://github.com/SidneyBissoli/healthbR/issues
NeedsCompilation: no
Packaged: 2026-02-04 02:06:15 UTC; SIDNEY
Author: Sidney Bissoli [aut, cre] (ORCID iD)
Maintainer: Sidney Bissoli <sbissoli76@gmail.com>
Repository: CRAN
Date/Publication: 2026-02-04 08:20:36 UTC

healthbR: Access Brazilian Public Health Data

Description

Provides easy access to Brazilian public health data from multiple sources including VIGITEL (Surveillance of Risk Factors for Chronic Diseases by Telephone Survey), PNS (National Health Survey), SIM (Mortality Information System), SINASC (Live Birth Information System), and other health information systems. Data is downloaded from the Brazilian Ministry of Health VIGITEL repository https://svs.aids.gov.br/download/Vigitel/. Data is returned in tidy format following tidyverse conventions.

Author(s)

Maintainer: Sidney Bissoli <sbissoli76@gmail.com> (ORCID)

See Also

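Useful links:

  • https://github.com/SidneyBissoli/healthbR

  • Report bugs at https://github.com/SidneyBissoli/healthbR/issues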


Check arrow availability and stop with informative message

Description

Checks whether the arrow package is installed and, if it is not, stops with an informative error message that names the feature requiring it.

Usage

check_arrow(feature = "Parquet file support")

Arguments

feature

Character describing what feature requires arrow

Value

NULL (invisibly). Stops with an error if arrow is not available.


Check if arrow package is available

Description

Check if arrow package is available

Usage

has_arrow()

Value

TRUE if arrow is available, FALSE otherwise
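
The availability check and the informative stop described for has_arrow() and check_arrow() follow a common pattern for optional dependencies. The sketch below is only an illustration of that pattern, not the package's code: sketch_has_pkg() and sketch_check_pkg() are hypothetical names, and the wording of the error message is an assumption.

```r
# Hypothetical helpers illustrating the optional-dependency pattern;
# the real has_arrow()/check_arrow() may be implemented differently.
sketch_has_pkg <- function(pkg) {
  # requireNamespace() loads the namespace without attaching it
  isTRUE(requireNamespace(pkg, quietly = TRUE))
}

sketch_check_pkg <- function(pkg, feature = "Parquet file support") {
  if (!sketch_has_pkg(pkg)) {
    stop(
      sprintf("Package '%s' is required for %s. ", pkg, feature),
      sprintf("Install it with install.packages(\"%s\").", pkg),
      call. = FALSE
    )
  }
  invisible(NULL)
}

# A base package is always installed, so this passes silently
sketch_check_pkg("stats")
```

Checking with requireNamespace() rather than library() avoids attaching the optional package to the search path as a side effect.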


List Available Data Sources

Description

Returns information about all data sources available in healthbR.

Usage

list_sources()

Value

A tibble with columns:

Examples

list_sources()

Utility Functions for healthbR

Description

Utility Functions for healthbR


Get VIGITEL base URL

Description

Get VIGITEL base URL

Usage

vigitel_base_url()

Value

Character string with base URL


Get VIGITEL cache directory

Description

Get VIGITEL cache directory

Usage

vigitel_cache_dir(cache_dir = NULL)

Arguments

cache_dir

Optional custom cache directory. If NULL, uses default user cache directory.

Value

Path to cache directory
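
One plausible way to resolve the cache directory is sketched below. It assumes the default location comes from tools::R_user_dir() and that the directory is created on demand; neither detail is confirmed by this documentation, and sketch_cache_dir() is a hypothetical name.

```r
# Hypothetical resolver; the default location used by vigitel_cache_dir()
# is an assumption here.
sketch_cache_dir <- function(cache_dir = NULL) {
  if (is.null(cache_dir)) {
    # Assumed default: per-user cache directory (requires R >= 4.0)
    cache_dir <- tools::R_user_dir("healthbR", which = "cache")
  }
  if (!dir.exists(cache_dir)) {
    dir.create(cache_dir, recursive = TRUE)
  }
  cache_dir
}

# A custom directory under tempdir() does not persist across sessions
custom <- file.path(tempdir(), "vigitel-cache")
sketch_cache_dir(custom)
```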


Get VIGITEL cache status

Description

Shows which years are cached and file sizes.

Usage

vigitel_cache_status(cache_dir = NULL)

Arguments

cache_dir

Character. Optional custom cache directory. If NULL (default), uses the standard user cache directory.

Value

A tibble with cache information

Examples

# check cache status
vigitel_cache_status()
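
To give a rough idea of the kind of information a cache status report can contain, the base R sketch below lists Excel and Parquet files in a directory together with their sizes. The function name, the returned column names, and the file name vigitel_2023.parquet are all hypothetical; the tibble returned by vigitel_cache_status() may be structured differently.

```r
# Hypothetical illustration; not the package's implementation.
sketch_cache_status <- function(cache_dir) {
  files <- list.files(
    cache_dir,
    pattern = "\\.(xlsx?|parquet)$",
    full.names = TRUE
  )
  data.frame(
    file = basename(files),
    size_mb = file.size(files) / 1024^2,
    stringsAsFactors = FALSE
  )
}

# Demonstrate on a throwaway directory with one placeholder file
demo_dir <- file.path(tempdir(), "cache-status-demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines("placeholder", file.path(demo_dir, "vigitel_2023.parquet"))
sketch_cache_status(demo_dir)
```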

Clear VIGITEL cache

Description

Removes all cached VIGITEL data files (Excel and Parquet).

Usage

vigitel_clear_cache(keep_parquet = FALSE, cache_dir = NULL)

Arguments

keep_parquet

Logical. If TRUE, keep Parquet files and only remove Excel files. Default is FALSE (remove all).

cache_dir

Character. Optional custom cache directory. If NULL (default), uses the standard user cache directory.

Value

NULL (invisibly)

Examples

# remove all cached files from default cache
vigitel_clear_cache()
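
The keep_parquet argument can be combined with a custom cache directory. The example below uses only the documented arguments; the directory name is arbitrary, and the call is guarded so the snippet also runs where healthbR is not installed.

```r
# Throwaway cache directory for the demonstration (name is arbitrary)
cache <- file.path(tempdir(), "vigitel-clear-demo")
dir.create(cache, showWarnings = FALSE)

if (requireNamespace("healthbR", quietly = TRUE)) {
  # remove downloaded Excel files but keep converted Parquet files
  healthbR::vigitel_clear_cache(keep_parquet = TRUE, cache_dir = cache)
}
```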

Convert Excel file to Parquet format

Description

Convert Excel file to Parquet format

Usage

vigitel_convert_to_parquet(year, force = FALSE, cache_dir = NULL)

Arguments

year

Integer year

force

Logical. If TRUE, reconvert even if the Parquet file already exists. Default is FALSE.

cache_dir

Optional custom cache directory

Value

Path to the Parquet file (invisibly)
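
A conversion step of this kind can be sketched with readxl::read_excel() and arrow::write_parquet(). This is an assumption about the approach, not the package's actual code: sketch_excel_to_parquet() is a hypothetical helper that takes explicit file paths instead of a year, and it skips any cleaning the real function may perform.

```r
# Hypothetical helper; vigitel_convert_to_parquet() may work differently.
sketch_excel_to_parquet <- function(excel_path, parquet_path, force = FALSE) {
  # Skip the conversion when the Parquet file exists, unless forced
  if (file.exists(parquet_path) && !force) {
    return(invisible(parquet_path))
  }
  df <- readxl::read_excel(excel_path)   # read the first sheet
  arrow::write_parquet(df, parquet_path) # write columnar output
  invisible(parquet_path)
}
```

With force = FALSE, an existing Parquet file is returned untouched, which mirrors the documented meaning of the force argument.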


Load VIGITEL microdata

Description

Downloads (if necessary) and loads VIGITEL survey microdata into R. Data is automatically converted to Parquet format for faster subsequent loading. The data includes survey weights for proper statistical analysis.

Usage

vigitel_data(
  year,
  vars = NULL,
  force_download = FALSE,
  parallel = TRUE,
  lazy = FALSE,
  cache_dir = NULL
)

Arguments

year

Year(s) of the survey. Can be:

  • Single year: 2023

  • Range: 2021:2023

  • Vector: c(2021, 2023)

  • Character: c("2021", "2023")

  • All years: "all"

vars

Character vector. Variable names to select, or NULL for all variables. Default is NULL.

force_download

Logical. If TRUE, re-download and reconvert data. Default is FALSE.

parallel

Logical. If TRUE, download and process multiple years in parallel. Default is TRUE; it only has an effect when multiple years are requested.

lazy

Logical. If TRUE, return an Arrow Dataset for lazy evaluation instead of loading all data into memory. Useful for filtering large datasets before collecting. Use collect() to retrieve results. Default is FALSE.

cache_dir

Character. Optional custom cache directory. If NULL (default), uses the standard user cache directory. Use tempdir() for temporary storage that won't persist.

Details

On first access, data is downloaded from the Ministry of Health and converted to Parquet format. Subsequent loads read directly from the Parquet file, which is significantly faster.

The arrow package is required for Parquet file support. If not installed, an informative error message will be shown with installation instructions.

For parallel downloads, the function uses the furrr and future packages if installed. Install them with install.packages(c("furrr", "future")) to enable parallel processing. The number of workers is automatically set based on available CPU cores. If these packages are not installed, processing falls back to sequential mode.

When lazy = TRUE, the function returns an Arrow Dataset that supports dplyr operations (filter, select, mutate, etc.) without loading data into memory. This is useful for working with large datasets or when you only need a subset of the data. Call collect() to retrieve the results as a tibble.
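
The lazy workflow can be sketched as follows. The wrapper load_older_adults() is a hypothetical helper, and the filter assumes that idade holds age as a number, which this documentation does not state. The column names are taken from the examples in this help page. The function is only defined here; calling it downloads data on first use and requires healthbR, arrow, and dplyr.

```r
# Hypothetical wrapper around the documented lazy = TRUE workflow.
load_older_adults <- function(years = 2021:2023, cache_dir = tempdir()) {
  healthbR::vigitel_data(years, lazy = TRUE, cache_dir = cache_dir) |>
    dplyr::filter(idade >= 60) |>                       # assumes numeric age
    dplyr::select(cidade, sexo, idade, pesorake) |>     # keep a few columns
    dplyr::collect()                                    # materialize as tibble
}

# older <- load_older_adults()  # uncomment to run; downloads on first call
```

Filtering and selecting before collect() means only the matching rows and columns are read into memory.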

The VIGITEL survey uses complex sampling weights. For proper statistical analysis, use survey packages like survey or srvyr. The weight variable is named pesorake.
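
A weighted estimate with the survey package might look like the sketch below. The data frame is made up for illustration, and declaring the design with ids = ~1 (no clustering) is an assumption; consult the VIGITEL methodology before relying on it. weighted_mean_age() is a hypothetical helper.

```r
# Toy data for illustration only; these are not VIGITEL values
toy <- data.frame(
  idade = c(25, 40, 60, 75),
  pesorake = c(2.0, 1.0, 0.5, 0.5)
)

# Hypothetical helper: weighted mean age using the pesorake weights
weighted_mean_age <- function(df) {
  design <- survey::svydesign(ids = ~1, weights = ~pesorake, data = df)
  survey::svymean(~idade, design, na.rm = TRUE)
}

# Run only when the survey package is installed
if (requireNamespace("survey", quietly = TRUE)) {
  print(weighted_mean_age(toy))
}
```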

Value

A tibble with the VIGITEL microdata. When multiple years are requested, a year column is added to identify the source year. If lazy = TRUE, returns an Arrow Dataset that can be queried with dplyr verbs before calling collect().

Examples


# single year (uses tempdir to avoid leaving files on system)
df <- vigitel_data(2023, cache_dir = tempdir())

# specific variables
df <- vigitel_data(2023, vars = c("cidade", "sexo", "idade", "pesorake"),
                   cache_dir = tempdir())


Load single year of VIGITEL data

Description

Load single year of VIGITEL data

Usage

vigitel_data_single(
  year,
  vars = NULL,
  force_download = FALSE,
  lazy = FALSE,
  cache_dir = NULL
)

Arguments

year

Integer year

vars

Character vector of variables or NULL

force_download

Logical. If TRUE, re-download and reconvert data.

lazy

Logical. If TRUE, return Arrow object for lazy evaluation.

cache_dir

Optional custom cache directory

Value

A tibble or Arrow Table (if lazy = TRUE)


Get VIGITEL variable dictionary

Description

Returns the data dictionary with variable descriptions, labels, and coding information for VIGITEL surveys.

Usage

vigitel_dictionary(force_download = FALSE, cache_dir = NULL)

Arguments

force_download

Logical. If TRUE, re-download the dictionary.

cache_dir

Character. Optional custom cache directory. If NULL (default), uses the standard user cache directory. Use tempdir() for temporary storage that won't persist.

Value

A tibble with variable metadata

Examples


# get the dictionary (uses tempdir to avoid leaving files)
dict <- vigitel_dictionary(cache_dir = tempdir())

# view column names
names(dict)


Download VIGITEL microdata for a specific year

Description

Downloads the VIGITEL survey microdata file from the Ministry of Health website. Files are cached locally to avoid repeated downloads.

Usage

vigitel_download(year, force = FALSE, cache_dir = NULL)

Arguments

year

Integer. Year of the survey (use vigitel_years() to see available years).

force

Logical. If TRUE, re-download even if file exists in cache. Default is FALSE.

cache_dir

Character. Optional custom cache directory. If NULL (default), uses the standard user cache directory. Use tempdir() for temporary storage that won't persist.

Value

Path to the downloaded file (invisibly)

Examples


# download 2023 data (uses tempdir to avoid leaving files)
vigitel_download(2023, cache_dir = tempdir())
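
The caching behaviour described above (download once, reuse the local file afterwards) can be sketched with curl::curl_download(). The curl package is listed in Imports, but whether the package uses this particular call is an assumption, and sketch_cached_download() is a hypothetical helper that takes an explicit URL and destination.

```r
# Hypothetical helper; vigitel_download() may be implemented differently.
sketch_cached_download <- function(url, dest, force = FALSE) {
  # Reuse the cached copy unless a fresh download is forced
  if (file.exists(dest) && !force) {
    return(invisible(dest))
  }
  dir.create(dirname(dest), recursive = TRUE, showWarnings = FALSE)
  curl::curl_download(url, dest, quiet = TRUE)
  invisible(dest)
}
```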


Download VIGITEL data dictionary

Description

Downloads the official VIGITEL data dictionary from the Ministry of Health.

Usage

vigitel_download_dictionary(force = FALSE, cache_dir = NULL)

Arguments

force

Logical. If TRUE, re-download even if cached.

cache_dir

Optional custom cache directory

Value

Path to the downloaded file (invisibly)


Get path to Excel file for a specific year

Description

Get path to Excel file for a specific year

Usage

vigitel_excel_path(year, cache_dir = NULL)

Arguments

year

Integer year

cache_dir

Optional custom cache directory

Value

Path to the Excel file


Build VIGITEL file URL for a specific year

Description

Build VIGITEL file URL for a specific year

Usage

vigitel_file_url(year)

Arguments

year

Integer year

Value

Character string with file URL


Get VIGITEL survey information

Description

Returns metadata about the VIGITEL survey.

Usage

vigitel_info()

Value

A list with survey information

Examples

vigitel_info()

Get path to Parquet file for a specific year

Description

Get path to Parquet file for a specific year

Usage

vigitel_parquet_path(year, cache_dir = NULL)

Arguments

year

Integer year

cache_dir

Optional custom cache directory

Value

Path to the Parquet file


Parse year argument

Description

Converts various year input formats to an integer vector.

Usage

vigitel_parse_years(year)

Arguments

year

Year specification (integer, character, vector, or "all")

Value

Integer vector of years
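
The normalization of the accepted year formats can be illustrated with a small base R sketch. sketch_parse_years() is a hypothetical function, its validation rules are assumptions, and the vector of available years below is illustrative rather than the actual list returned by vigitel_years().

```r
# Hypothetical parser; vigitel_parse_years() may validate differently.
sketch_parse_years <- function(year, available) {
  available <- as.integer(available)
  if (is.character(year) && length(year) == 1L && tolower(year) == "all") {
    return(available)
  }
  years <- suppressWarnings(as.integer(year))
  if (length(years) == 0L || anyNA(years)) {
    stop("`year` must be numeric, a numeric-like string, or \"all\".",
         call. = FALSE)
  }
  missing_years <- setdiff(years, available)
  if (length(missing_years) > 0L) {
    stop("Years not available: ", paste(missing_years, collapse = ", "),
         call. = FALSE)
  }
  sort(unique(years))
}

available <- 2019:2023  # illustrative only, not the real list of years
sketch_parse_years(2023, available)
sketch_parse_years(c("2021", "2023"), available)
sketch_parse_years("all", available)
```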


List VIGITEL variables

Description

Returns a character vector of variable names available in a VIGITEL survey year.

Usage

vigitel_variables(year, cache_dir = NULL)

Arguments

year

Integer. Year of the survey.

cache_dir

Character. Optional custom cache directory. If NULL (default), uses the standard user cache directory. Use tempdir() for temporary storage that won't persist.

Value

A character vector of variable names

Examples


# list variables for 2023 (uses tempdir to avoid leaving files)
vigitel_variables(2023, cache_dir = tempdir())


List available VIGITEL survey years

Description

Returns a vector of years for which VIGITEL microdata is available for download from the Ministry of Health website.

Usage

vigitel_years()

Value

An integer vector of available years

Examples

vigitel_years()