metaphonebr

Lifecycle: experimental

The goal of metaphonebr is to simplify Brazilian names phonetically using a custom metaphoneBR algorithm that preserves final vowels. It was created to aid dataset pairing when no unambiguous keys are available.

Installation

The package is currently being submitted to CRAN. Once it is accepted, the stable version can be installed with:

install.packages("metaphonebr")

You can install the development version of metaphonebr from GitHub with:

# install.packages("remotes")
remotes::install_github("ipeadata-lab/metaphonebr")

Example

This is a basic example which shows how to use the main function:

example_names <- c("João da Silva", "Maria", "Marya",
                    "Helena", "Elena", "Philippe", "Filipe", "Xavier", "Chavier")
phonetic_codes <- metaphonebr::metaphonebr(example_names)
print(data.frame(original = example_names, metaphonebr = phonetic_codes))
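Since the function returns one code per input name, the codes can serve as an approximate join key when pairing datasets. The sketch below is illustrative and not part of the package: `pair_by_code()` and the `name` column are hypothetical, and it assumes only that the encoder returns a character vector with the same length as its input, as in the example above.

```r
# Hypothetical helper (not part of metaphonebr): pair two data frames on a
# phonetic key computed from their `name` columns.
# `encoder` is any function that maps a character vector to a character
# vector of the same length, e.g. metaphonebr::metaphonebr.
pair_by_code <- function(a, b, encoder) {
  stopifnot("name" %in% names(a), "name" %in% names(b))
  a$code <- encoder(a$name)
  b$code <- encoder(b$name)
  # Inner join on the code; the two name columns get suffixes so they
  # remain distinguishable in the result.
  merge(a, b, by = "code", suffixes = c("_a", "_b"))
}

# Usage, assuming the package is installed:
# registry <- data.frame(name = c("Helena", "Filipe"), id_a = 1:2)
# survey   <- data.frame(name = c("Elena", "Philippe"), id_b = 1:2)
# pair_by_code(registry, survey, encoder = metaphonebr::metaphonebr)
```

Names that share a code are candidate pairs only; they usually still need review or additional fields (such as a birth date) before being treated as the same person.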

The metaphoneBR phonetic encoding algorithm proceeds as follows:

  1. Initial Cleanup & Preparation
  2. Silent Letter Removal
  3. Digraph Simplification (Sound Grouping)
  4. Similar Consonant Simplification
  5. Terminal Nasal Sound Simplification
  6. Duplicate Vowel Removal
  7. Final Cleanup (Duplicate Letters & Spaces)

The resulting code attempts to represent the phonetic signature of a name in a simplified, standardized form suited to Brazilian Portuguese. By construction, it preserves final vowels, since they generally carry gender information in Brazilian names (e.g., ADRIANO and ADRIANA).
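To make the seven stages concrete, here is a minimal base R sketch of a pipeline with the same shape. It is not the package's implementation: the function name `toy_phonetic_key()` is hypothetical, and every substitution rule in it (which letters count as silent, which digraphs are merged, and so on) is an assumption chosen for illustration. The actual rules used by metaphonebr may differ.

```r
# Toy illustration only; the rules below are assumptions, not metaphonebr's.
toy_phonetic_key <- function(x) {
  # 1. Initial cleanup: uppercase, map common accented letters to plain
  #    ones (Á À Â Ã É Ê Í Ó Ô Õ Ú Ü Ç, written as escapes), drop the rest.
  x <- toupper(x)
  x <- chartr(
    "\u00c1\u00c0\u00c2\u00c3\u00c9\u00ca\u00cd\u00d3\u00d4\u00d5\u00da\u00dc\u00c7",
    "AAAAEEIOOOUUS", x
  )
  x <- gsub("[^A-Z ]", "", x, perl = TRUE)
  # 2. Silent letters: drop a word-initial H (assumed rule).
  x <- gsub("(^| )H", "\\1", x, perl = TRUE)
  # 3. Digraphs: group spellings that sound alike (assumed rules).
  x <- gsub("PH", "F", x, fixed = TRUE)
  x <- gsub("CH|SH", "X", x, perl = TRUE)
  x <- gsub("TH", "T", x, fixed = TRUE)
  # 4. Similar consonants and letters: Y -> I, W -> V, Z -> S (assumed).
  x <- chartr("YWZ", "IVS", x)
  # 5. Terminal nasal: a word-final N becomes M (assumed rule).
  x <- gsub("N( |$)", "M\\1", x, perl = TRUE)
  # 6. Duplicate vowels: collapse runs of the same vowel.
  x <- gsub("([AEIOU])\\1+", "\\1", x, perl = TRUE)
  # 7. Final cleanup: collapse remaining doubled letters and extra spaces.
  x <- gsub("([A-Z])\\1+", "\\1", x, perl = TRUE)
  x <- gsub(" +", " ", x, perl = TRUE)
  trimws(x)
}

toy_phonetic_key(c("Philippe", "Filipe"))   # both give "FILIPE"
toy_phonetic_key(c("Adriano", "Adriana"))   # "ADRIANO" and "ADRIANA" stay distinct
```

No stage in this sketch removes a final vowel, which is how it keeps pairs such as ADRIANO and ADRIANA apart while still merging spelling variants.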

Ipea Note

metaphonebr is developed by a team of researchers at the Instituto de Pesquisa Econômica Aplicada (Ipea).