A tidy, pipe-friendly toolkit for reproducible web crawling and structured data collection, inspired by the architecture of the 'Crawlee' library. Provides a unified crawler with a deduplicating, resumable request queue, content-type aware handlers, structured storage backends, and rich console logging via 'cli'. Supports crawling HTML pages, sitemaps, RSS and Atom feeds, and PDF documents, with optional headless-browser rendering and helpers for retrieval-augmented generation.
| Version: | 0.1.0 |
| Depends: | R (≥ 4.1.0) |
| Imports: | cli, httr2, R6, rlang, rvest, tibble, vctrs, xml2 |
| Suggests: | arrow, chromote, DBI, dplyr, duckdb, httptest2, jsonlite, knitr, later, nanoparquet, pdftools, polite, promises, rmarkdown, testthat (≥ 3.0.0) |
| Published: | 2026-07-03 |
| DOI: | 10.32614/CRAN.package.crawlee (may not be active yet) |
| Author: | Andre Leite [aut, cre], Marcos Wasilew [aut], Hugo Vasconcelos [aut], Carlos Amorin [aut], Diogo Bezerra [aut] |
| Maintainer: | Andre Leite <leite at castlab.org> |
| BugReports: | https://github.com/StrategicProjects/crawlee/issues |
| License: | MIT + file LICENSE |
| URL: | https://github.com/StrategicProjects/crawlee, https://strategicprojects.github.io/crawlee/ |
| NeedsCompilation: | no |
| Language: | en-US |
| Materials: | README, NEWS |
| CRAN checks: | crawlee results |
| Reference manual: | crawlee.html, crawlee.pdf |
| Vignettes: | Getting started with crawlee (source, R code); Crawling a website (source, R code); A RAG pipeline (source, R code); Scaling and politeness (source, R code); Storage and resumable runs (source, R code) |
| Package source: | crawlee_0.1.0.tar.gz |
| Windows binaries: | r-devel: not available, r-release: not available, r-oldrel: not available |
| macOS binaries: | r-release (arm64): crawlee_0.1.0.tgz, r-oldrel (arm64): crawlee_0.1.0.tgz, r-release (x86_64): not available, r-oldrel (x86_64): crawlee_0.1.0.tgz |
Please use the canonical form https://CRAN.R-project.org/package=crawlee to link to this page.