Fast and memory-friendly tools for text vectorization, topic
modeling (LDA, LSA), word embeddings (GloVe), and similarity computation. This package
provides a source-agnostic streaming API, which allows researchers to
analyze collections of documents that are larger than available RAM. All
core functions are parallelized to benefit from multicore machines.
| Version: | 0.6.4 |
| Depends: | R (≥ 3.6.0), methods |
| Imports: | Matrix (≥ 1.5-2), Rcpp (≥ 1.0.3), R6 (≥ 2.3.0), data.table (≥ 1.9.6), rsparse (≥ 0.3.3.4), stringi (≥ 1.1.5), mlapi (≥ 0.1.0), lgr (≥ 0.2), digest (≥ 0.6.8) |
| LinkingTo: | Rcpp, digest (≥ 0.6.8) |
| Suggests: | magrittr, udpipe (≥ 0.6), glmnet, testthat, covr, knitr, rmarkdown, proxy |
| Published: | 2023-11-09 |
| DOI: | 10.32614/CRAN.package.text2vec |
| Author: | Dmitriy Selivanov [aut, cre, cph], Manuel Bickel [aut, cph] (Coherence measures for topic models), Qing Wang [aut, cph] (Author of the WarpLDA C++ code) |
| Maintainer: | Dmitriy Selivanov <selivanov.dmitriy at gmail.com> |
| BugReports: | https://github.com/dselivanov/text2vec/issues |
| License: | GPL-2 \| GPL-3 \| file LICENSE [expanded from: GPL (≥ 2) \| file LICENSE] |
| URL: | http://text2vec.org |
| NeedsCompilation: | yes |
| Materials: | README, NEWS |
| In views: | NaturalLanguageProcessing |
| CRAN checks: | text2vec results [issues need fixing before 2025-11-15] |