| Type: | Package |
| Title: | Fast Access to Brazilian Public Health Data from 'DATASUS' |
| Version: | 0.1.0 |
| Description: | Provides fast, in-memory reading of 'DATASUS' 'DBC' files using native 'C' code, along with a catalog of public health data sources, 'FTP' file discovery, caching downloads, and a high-level datasus_fetch() function that lists, downloads, and reads files in a single call. Bundles the 'blast' decompressor from 'zlib' contrib/blast to decode 'PKWare DCL' compressed 'DBC' files and parses 'DBF' records directly for efficient import into tibbles. See the 'DATASUS' file transfer site https://datasus.saude.gov.br and Adler (2003) https://github.com/madler/zlib/tree/master/contrib/blast for details on the underlying data and compression format. |
| License: | MIT + file LICENSE |
| URL: | https://strategicprojects.github.io/datasusr/, https://github.com/StrategicProjects/datasusr |
| BugReports: | https://github.com/StrategicProjects/datasusr/issues |
| Encoding: | UTF-8 |
| Language: | en-US |
| Depends: | R (≥ 4.1.0) |
| Imports: | cli, curl (≥ 5.0.0), dplyr (≥ 1.1.0), purrr, rlang (≥ 1.0.0), stringr, tibble, tidyr |
| Suggests: | knitr, pkgdown, rmarkdown, testthat (≥ 3.0.0) |
| VignetteBuilder: | knitr |
| Config/testthat/edition: | 3 |
| RoxygenNote: | 7.3.3 |
| NeedsCompilation: | yes |
| SystemRequirements: | C99 |
| Packaged: | 2026-04-29 18:58:43 UTC; leite |
| Author: | Andre Leite [aut, cre], Marcos Wasilew [aut], Hugo Vasconcelos [aut], Carlos Amorin [aut], Diogo Bezerra [aut], Mark Adler [ctb, cph] (Author of bundled blast.c and blast.h from zlib contrib/blast) |
| Maintainer: | Andre Leite <leite@castlab.org> |
| Repository: | CRAN |
| Date/Publication: | 2026-05-04 11:50:02 UTC |
datasusr: Fast Access to Brazilian Public Health Data from 'DATASUS'
Description
Provides fast, in-memory reading of 'DATASUS' 'DBC' files using native 'C' code, along with a catalog of public health data sources, 'FTP' file discovery, caching downloads, and a high-level datasus_fetch() function that lists, downloads, and reads files in a single call. Bundles the 'blast' decompressor from 'zlib' contrib/blast to decode 'PKWare DCL' compressed 'DBC' files and parses 'DBF' records directly for efficient import into tibbles. See the 'DATASUS' file transfer site https://datasus.saude.gov.br and Adler (2003) https://github.com/madler/zlib/tree/master/contrib/blast for details on the underlying data and compression format.
Author(s)
Maintainer: Andre Leite leite@castlab.org
Authors:
Marcos Wasilew marcos.wasilew@gmail.com
Hugo Vasconcelos hugo.vasconcelos@ufpe.br
Carlos Amorin carlos.agaf@ufpe.br
Diogo Bezerra diogo.bezerra@ufpe.br
Other contributors:
Mark Adler (Author of bundled blast.c and blast.h from zlib contrib/blast) [contributor, copyright holder]
See Also
Useful links:
https://strategicprojects.github.io/datasusr/
https://github.com/StrategicProjects/datasusr
Report bugs at https://github.com/StrategicProjects/datasusr/issues
Internal FTP paths for territorial and documentation resources
Description
Internal FTP paths for territorial and documentation resources
Usage
.datasus_extra_paths()
Value
A named list of FTP URLs.
Internal DATASUS file type catalog
Description
Internal DATASUS file type catalog
Usage
.datasus_file_types_tbl()
Value
A tibble.
Internal DATASUS modality catalog
Description
Internal DATASUS modality catalog
Usage
.datasus_modalities_tbl()
Value
A tibble.
Internal path templates for DATASUS FTP
Description
Internal path templates for DATASUS FTP
Usage
.datasus_path_templates_tbl()
Value
A tibble.
Internal DATASUS program catalog
Description
Internal DATASUS program catalog
Usage
.datasus_programs_tbl()
Value
A tibble.
Internal DATASUS source catalog
Description
Internal DATASUS source catalog
Usage
.datasus_sources_tbl()
Value
A tibble.
Build DATASUS FTP paths from filters
Description
Builds candidate FTP directories for a given source and file type. For SIH/SIA, the function selects historical vs. current trees by year/month. For SIM and SINASC, preliminary trees are included when requested.
Usage
datasus_build_path(
source,
file_type,
year = NULL,
month = NULL,
include_prelim = TRUE
)
Arguments
source: Source code (e.g. "SIHSUS").
file_type: File type code (e.g. "RD").
year: Integer vector of years.
month: Integer vector of months.
include_prelim: Logical. Include preliminary trees when available (default TRUE).
Value
A tibble with columns source, file_type, period, and path.
Examples
datasus_build_path(source = "SIHSUS", file_type = "RD", year = 2024, month = 1)
Clear cached DATASUS files
Description
Removes files from the cache directory. By default all cached files are
removed; pass a character vector to files to remove specific paths.
Usage
datasus_cache_clear(cache_dir = NULL, files = NULL, verbose = TRUE)
Arguments
cache_dir: Optional cache directory.
files: Optional character vector of file paths to remove. When NULL (the default), all cached files are removed.
verbose: Logical. Emit progress messages (default TRUE).
Value
A tibble with columns path and removed.
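Examples
The sketch below assumes a session cache rooted at tempdir(); it touches no network and only removes files it finds in that directory.
```r
# Remove everything from a session cache under tempdir() (safe offline).
cleared <- datasus_cache_clear(cache_dir = tempdir(), verbose = FALSE)

# Or remove one specific cached file, if any are present.
cached <- datasus_cache_list(cache_dir = tempdir())
if (nrow(cached) > 0) {
  datasus_cache_clear(cache_dir = tempdir(), files = cached$path[1])
}
```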
Resolve the datasusr cache directory
Description
Returns the active cache directory path. Checks, in order: the cache_dir
argument, the DATASUSR_CACHE_DIR environment variable, the
datasusr.cache_dir option, and finally a session-scoped subdirectory of
tempdir(). To opt in to a persistent cache across sessions, set the
DATASUSR_CACHE_DIR environment variable, the datasusr.cache_dir option,
or pass cache_dir explicitly (for example,
tools::R_user_dir("datasusr", "cache")).
Usage
datasus_cache_dir(cache_dir = NULL)
Arguments
cache_dir: Optional cache directory supplied by the caller.
Value
A single path string.
Examples
datasus_cache_dir()
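The opt-in persistent cache described above can be sketched as follows; note that the environment variable takes precedence over the option in the resolution order.
```r
# Opt in to a persistent, user-level cache via the documented option.
options(datasusr.cache_dir = tools::R_user_dir("datasusr", "cache"))
datasus_cache_dir()

# Or via the environment variable, which is checked before the option.
Sys.setenv(DATASUSR_CACHE_DIR = tools::R_user_dir("datasusr", "cache"))
datasus_cache_dir()
```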
Summarise the datasusr cache
Description
Returns a one-row tibble summarising the current cache: directory path, file count, total size, and modification time range.
Usage
datasus_cache_info(cache_dir = NULL, verbose = TRUE)
Arguments
cache_dir: Optional cache directory.
verbose: Logical. Print a summary to the console (default TRUE).
Value
A tibble with one row.
Examples
datasus_cache_info()
List cached DATASUS files
Description
Returns a tibble describing every file currently stored in the cache directory.
Usage
datasus_cache_list(cache_dir = NULL)
Arguments
cache_dir: Optional cache directory.
Value
A tibble with columns path, file_name, size_bytes,
modified_time, and accessed_time.
Examples
datasus_cache_list()
Prune the datasusr cache
Description
Selectively removes cached files based on age and/or total size. Older and least-recently-accessed files are removed first.
Usage
datasus_cache_prune(
cache_dir = NULL,
max_size_bytes = NULL,
older_than_days = NULL,
verbose = TRUE
)
Arguments
cache_dir: Optional cache directory.
max_size_bytes: Maximum total cache size in bytes. When exceeded, the least-recently-accessed files are removed until the cache fits.
older_than_days: Age threshold in days. Files with a modification time older than this are removed.
verbose: Logical. Emit progress messages (default TRUE).
Value
A tibble with columns path, removed, and reason.
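Examples
A minimal sketch of combining both pruning criteria, assuming a session cache under tempdir(); the size and age thresholds are illustrative values.
```r
# Keep the cache under ~512 MB and drop files whose modification time
# is more than 30 days old. Least-recently-accessed files go first.
pruned <- datasus_cache_prune(
  cache_dir = tempdir(),
  max_size_bytes = 512 * 1024^2,
  older_than_days = 30,
  verbose = FALSE
)
```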
Get FTP URLs for DATASUS documentation
Description
Returns the FTP URLs where documentation files (layouts, data dictionaries)
for each source system can be found. Use datasus_ftp_ls() to list the
actual files at each URL.
Usage
datasus_docs_url(source = NULL)
Arguments
source: Optional source code to filter (e.g. "CNES").
Value
A tibble with columns source and docs_url.
Examples
datasus_docs_url()
datasus_docs_url("CNES")
tryCatch({
# List documentation files for CNES (network required).
docs <- datasus_docs_url("CNES")
datasus_ftp_ls(docs$docs_url[[1]], verbose = FALSE)
}, error = function(e) message("FTP unavailable: ", conditionMessage(e)))
Download DATASUS files
Description
Downloads one or many DATASUS files. When use_cache = TRUE, files that
already exist in the cache directory are reused instead of re-downloaded.
Usage
datasus_download(
files = NULL,
...,
dest_dir = NULL,
overwrite = FALSE,
timeout = 240,
use_cache = TRUE,
cache_dir = NULL,
refresh = FALSE,
verbose = TRUE
)
Arguments
files: A tibble returned by datasus_list_files(), or NULL.
...: Filters passed to datasus_list_files() when files is NULL.
dest_dir: Optional destination directory. When NULL, downloads are stored in the cache directory.
overwrite: Logical. Overwrite existing files (default FALSE).
timeout: Timeout in seconds for each download (default 240).
use_cache: Logical. Store and reuse downloads in the cache directory (default TRUE).
cache_dir: Optional cache directory.
refresh: Logical. Force re-download even when a cached file exists (default FALSE).
verbose: Logical. Emit progress messages (default TRUE).
Value
A tibble with a local_file column containing the paths to the
downloaded files, plus a downloaded flag.
Examples
tryCatch({
files <- datasus_list_files(
source = "SIHSUS", file_type = "RD",
year = 2024, month = 1, uf = "AC",
verbose = FALSE
)
downloads <- datasus_download(
files,
cache_dir = tempdir(),
verbose = FALSE
)
}, error = function(e) message("FTP unavailable: ", conditionMessage(e)))
Fetch DATASUS data in one step
Description
A convenience wrapper that lists, downloads, and reads DATASUS files in a single call. Particularly useful for interactive / exploratory work.
Usage
datasus_fetch(
source,
file_type,
year = NULL,
month = NULL,
uf = NULL,
...,
bind = TRUE,
include_prelim = TRUE,
timeout = 240,
use_cache = TRUE,
cache_dir = NULL,
verbose = TRUE
)
Arguments
source: Character vector of source codes.
file_type: Character vector of file type codes.
year: Integer vector of years.
month: Integer vector of months (required for monthly sources).
uf: Character vector of UF codes (required for UF-scoped sources).
...: Additional arguments forwarded to read_datasus_dbc() (for example, select).
bind: Logical. When TRUE (the default), all files are combined into a single tibble; otherwise a named list of tibbles is returned.
include_prelim: Logical. Include preliminary data trees (default TRUE).
timeout: Timeout in seconds for FTP and download operations (default 240).
use_cache: Logical. Reuse cached downloads (default TRUE).
cache_dir: Optional cache directory.
verbose: Logical. Emit progress messages (default TRUE).
Value
A tibble (when bind = TRUE) or a named list of tibbles.
Examples
tryCatch({
# Fetch a small SIHSUS slice into tempdir() (network required).
df <- datasus_fetch(
source = "SIHSUS",
file_type = "RD",
year = 2024,
month = 1,
uf = "AC",
select = c("uf_zi", "ano_cmpt", "munic_res", "val_tot"),
cache_dir = tempdir(),
verbose = FALSE
)
}, error = function(e) message("FTP unavailable: ", conditionMessage(e)))
List DATASUS file types
Description
Returns the internal catalog of file types. Optionally filtered by source and/or file type codes.
Usage
datasus_file_types(source = NULL, file_type = NULL)
Arguments
source: Optional character vector of source codes (e.g. "SIHSUS").
file_type: Optional character vector of file type codes (e.g. "ST").
Value
A tibble.
Examples
datasus_file_types()
datasus_file_types(source = "SIHSUS")
datasus_file_types(source = "CNES", file_type = "ST")
List files and directories from the DATASUS FTP
Description
Fetches a raw directory listing from a DATASUS FTP path.
Usage
datasus_ftp_ls(path, timeout = 120, ftp_use_epsv = FALSE, verbose = TRUE)
Arguments
path: FTP path or full URL. Relative paths are prefixed with the DATASUS public FTP root.
timeout: Timeout in seconds (default 120).
ftp_use_epsv: Logical. Passed to curl (default FALSE).
verbose: Logical. Emit progress messages (default TRUE).
Value
A tibble with columns ftp_url and entry.
Examples
tryCatch(
datasus_ftp_ls("SIHSUS/200801_/Dados/", verbose = FALSE),
error = function(e) message("FTP unavailable: ", conditionMessage(e))
)
Download DATASUS territorial tables
Description
Downloads and reads tables from the DATASUS territorial data section. The DATASUS FTP publishes a ZIP archive per year containing reference tables for municipalities, health regions, and other geographic divisions used by SUS, in CSV, DBF, and TXT formats.
Usage
datasus_get_territory(
table = "tb_municip",
year = NULL,
format = "csv",
cache_dir = NULL,
verbose = TRUE
)
Arguments
table: Name of the table to read. Common values include "tb_municip" and "tb_uf".
year: Year of the territorial table. Defaults to the current year.
format: File format to extract from the ZIP: "csv" (the default), "dbf", or "txt".
cache_dir: Optional cache directory.
verbose: Logical. Emit progress messages (default TRUE).
Value
A tibble with column names in snake_case.
Examples
tryCatch({
# Download territorial tables into tempdir() (network required).
municipios <- datasus_get_territory(
"tb_municip",
cache_dir = tempdir(),
verbose = FALSE
)
ufs <- datasus_get_territory(
"tb_uf",
cache_dir = tempdir(),
verbose = FALSE
)
}, error = function(e) message("FTP unavailable: ", conditionMessage(e)))
List available DATASUS files
Description
Builds candidate file names from the internal catalog and, optionally, validates them against the DATASUS FTP.
Usage
datasus_list_files(
source,
file_type,
year = NULL,
month = NULL,
uf = NULL,
include_prelim = TRUE,
check_exists = TRUE,
timeout = 120,
ftp_use_epsv = FALSE,
verbose = TRUE
)
Arguments
source: Character vector of source codes.
file_type: Character vector of file type codes.
year: Integer vector of years.
month: Integer vector of months (required for monthly sources).
uf: Character vector of UF codes (required for UF-scoped sources).
include_prelim: Logical. Include preliminary data trees (default TRUE).
check_exists: Logical. Query the FTP and keep only files that exist (default TRUE).
timeout: Timeout in seconds for FTP requests (default 120).
ftp_use_epsv: Logical. Passed to curl (default FALSE).
verbose: Logical. Emit progress messages (default TRUE).
Value
A tibble with one row per file, including its FTP URL and metadata.
Examples
tryCatch(
datasus_list_files(
source = "SIHSUS",
file_type = "RD",
year = 2024,
month = 1,
uf = c("PE", "PB"),
verbose = FALSE
),
error = function(e) message("FTP unavailable: ", conditionMessage(e))
)
List DATASUS modalities
Description
Returns the available DATASUS data modalities (data, documentation, programs, etc.).
Usage
datasus_modalities()
Value
A tibble.
Examples
datasus_modalities()
List DATASUS data sources
Description
Returns the internal catalog of DATASUS sources available in datasusr.
Usage
datasus_sources()
Value
A tibble with one row per source, including its code, description, default scope, and flags for monthly and UF support.
Examples
datasus_sources()
List UF codes used by DATASUS downloads
Description
Returns a character vector of the 27 Brazilian state abbreviations accepted by DATASUS file naming conventions.
Usage
datasus_ufs()
Value
A character vector of length 27.
Examples
datasus_ufs()
Format byte sizes for display
Description
Converts raw byte counts into human-readable strings (e.g. "1.23 MB").
Usage
format_bytes(x)
Arguments
x: Numeric vector of byte sizes.
Value
A character vector with formatted sizes.
Examples
format_bytes(c(1024, 1048576, NA))
Read DATASUS DBC files
Description
Reads a DATASUS .dbc file directly into R using native C code for
PKWare DCL decompression and DBF parsing. The function always returns a
tibble and is designed to work well with the tidyverse.
Usage
read_datasus_dbc(
file,
select = NULL,
n_max = Inf,
trim_ws = TRUE,
encoding = "latin1",
guess_types = TRUE,
col_types = NULL,
parse_dates = FALSE,
clean_names = TRUE,
verbose = TRUE
)
Arguments
file: Path to a .dbc file.
select: Optional character vector of column names to keep. When NULL (the default), all columns are read.
n_max: Maximum number of rows to read. Defaults to Inf (all rows).
trim_ws: Logical. Trim leading/trailing whitespace from character fields (default TRUE).
encoding: Encoding of the DBF character fields. Typically "latin1" (the default) for DATASUS files.
guess_types: Logical. Inspect numeric fields to distinguish integer-like columns from double columns (default TRUE).
col_types: Optional named character vector of explicit column types.
parse_dates: Logical. If TRUE, parse DBF date fields as Date columns (default FALSE).
clean_names: Logical. If TRUE (the default), convert column names to snake_case.
verbose: Logical. Emit progress messages (default TRUE).
Value
A tibble.
Examples
# The example downloads a small DATASUS file into tempdir() and reads it.
# Skipped automatically if the FTP is unreachable.
tmp <- file.path(tempdir(), "RDAC2401.dbc")
url <- paste0(
"ftp://ftp.datasus.gov.br/dissemin/publicos/SIHSUS/200801_/Dados/",
"RDAC2401.dbc"
)
ok <- tryCatch(
{
utils::download.file(url, tmp, mode = "wb", quiet = TRUE)
TRUE
},
error = function(e) FALSE,
warning = function(w) FALSE
)
if (ok) {
# Basic read (column names in snake_case by default)
x <- read_datasus_dbc(tmp, verbose = FALSE)
# Select works with either case
x <- read_datasus_dbc(
tmp,
select = c("uf_zi", "ano_cmpt", "val_tot"),
verbose = FALSE
)
# Keep original uppercase names
x <- read_datasus_dbc(tmp, clean_names = FALSE, verbose = FALSE)
unlink(tmp)
}