The goal of metaphonebr is to simplify Brazilian names phonetically using a custom metaphoneBR algorithm that preserves ending vowels. It was created to aid dataset pairing when no unambiguous keys are available.
The package is in the process of submission to CRAN. When it is accepted, the stable version can be installed with:
install.packages("metaphonebr")
You can install the development version of metaphonebr from GitHub with:
# install.packages("remotes")
remotes::install_github("ipeadata-lab/metaphonebr")
This is a basic example which shows how to use the main function:
example_names <- c("João da Silva", "Maria", "Marya",
                   "Helena", "Elena", "Philippe", "Filipe", "Xavier", "Chavier")
phonetic_codes <- metaphonebr::metaphonebr(example_names)
print(data.frame(original = example_names, metaphonebr = phonetic_codes))
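Since the package is meant to help pair datasets that lack unambiguous keys, one possible way to use the codes is as a join key. The sketch below is illustrative only: `link_by_key()`, `records_a`, and `records_b` are hypothetical names that are not part of the package, and the only package call assumed is `metaphonebr::metaphonebr()` applied to a character vector, as in the example above.

```r
# Hypothetical helper (not part of metaphonebr): join two data frames on a
# phonetic key derived from their `name` columns. `key_fun` is any function
# that maps a character vector to a character vector of the same length.
link_by_key <- function(a, b, key_fun) {
  a$key <- key_fun(a$name)
  b$key <- key_fun(b$name)
  # Inner join on the key; the two `name` columns get the suffixes below.
  merge(a, b, by = "key", suffixes = c("_a", "_b"))
}

# Made-up records using spelling variants from the example above.
records_a <- data.frame(id_a = 1:3, name = c("Maria", "Philippe", "Xavier"),
                        stringsAsFactors = FALSE)
records_b <- data.frame(id_b = 1:3, name = c("Marya", "Filipe", "Chavier"),
                        stringsAsFactors = FALSE)

# With the package installed, pass its main function as the key function.
if (requireNamespace("metaphonebr", quietly = TRUE)) {
  print(link_by_key(records_a, records_b, metaphonebr::metaphonebr))
}
```

A phonetic key can group distinct people who share a similar-sounding name, so in practice the candidate pairs produced this way would usually be checked against other fields.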
The metaphoneBR phonetic encoding algorithm proceeds as follows:

- LH is replaced by 1 (representing a palatal lateral approximant, as in “Filha” -> “FI1A”).
- NH is replaced by 3 (representing a palatal nasal, as in “Manhã” -> “MA3A”).
- CH is replaced by X (representing the /ʃ/ sound, as in “Chico” -> “XICO”).
- SH is replaced by X (for foreign names with the /ʃ/ sound, as in “Shirley” -> “XIRLEY”).
- SCH is replaced by X (approximating /ʃ/ or /sk/, as in “Schmidt” -> “XMIT”).
- PH is replaced by F (as in “Philip” -> “FILIP”).
- SC followed by E or I becomes S (as in “SCENA” -> “SENA”).
- SC followed by A, O, or U becomes SK (as in “ESCOVA” -> “ESKOVA”).
- QU or QÜ followed by E or I becomes K (e.g., “QUEIJO” -> “KEIJO”).
- GU or GÜ followed by E or I becomes G (the U is silent, e.g., “GUERRA” -> “GERRA”).
- QU becomes K (e.g., “QUANTO” -> “KANTO”).
- Ç is replaced by S.
- C followed by E or I is replaced by S (as in “CELSO” -> “SELSO”).
- C (not part of an already transformed digraph like CH or SC) is replaced by K (as in “CARLOS” -> “KARLOS”).
- G followed by E or I is replaced by J (as in “GELO” -> “JELO”; GUE/GUI already handled).
- Q (that wasn’t part of QU) is replaced by K.
- W is replaced by V (common Brazilian Portuguese pronunciation, e.g., “WALTER” -> “VALTER”).
- Y is replaced by I (e.g., “YARA” -> “IARA”).
- Z is replaced by S (e.g., “ZEBRA” -> “SEBRA”).
- X immediately before S is removed (e.g., “EXCELENTE” -> “ESELENTE”, to avoid a double /s/ representation from SKS).
- N is replaced by M (e.g., “JOAQUIN” -> “JOAQUIM”).
- AO is replaced by OM (e.g., “JOÃO” -> “JOOM”).
- ÃES is replaced by AES (e.g., “MÃES” -> “MAES”).
- Repeated adjacent letters (… 1 for LH or 3 for NH) are reduced to a single letter (e.g., “CARRO” might become “CARO”, “LESSA” becomes “LESA”). Note: this rule simplifies sounds like ‘RR’ and ‘SS’ to their single counterparts, which is a common Metaphone-style simplification.

The resulting code attempts to represent the phonetic signature of the name in a simplified, standardized way for a Brazilian Portuguese context. In particular, by construction it preserves ending vowels, since they generally carry gender information in Brazilian names (e.g., ADRIANO and ADRIANA).
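To make the rule ordering concrete, here is a minimal base R sketch of a few of the substitutions described above. It is not the package's implementation: `sketch_code()` is a hypothetical name, it assumes plain ASCII input, and it deliberately leaves out the accent-dependent rules, the G, X, N, and AO rules, and any special handling of the placeholders 1 and 3. Its output may therefore differ from what `metaphonebr::metaphonebr()` returns.

```r
# Hypothetical sketch (not the package's code): applies a subset of the rules
# in a fixed order. Assumes ASCII input; accents are not handled here.
sketch_code <- function(x) {
  x <- toupper(x)
  x <- gsub("SCH", "X", x, fixed = TRUE)    # SCH -> X, before CH and SC
  x <- gsub("LH", "1", x, fixed = TRUE)     # LH -> 1
  x <- gsub("NH", "3", x, fixed = TRUE)     # NH -> 3
  x <- gsub("CH|SH", "X", x)                # CH, SH -> X
  x <- gsub("PH", "F", x, fixed = TRUE)     # PH -> F
  x <- gsub("SC([EI])", "S\\1", x)          # SC before E or I -> S
  x <- gsub("QU([EI])", "K\\1", x)          # QU before E or I -> K
  x <- gsub("GU([EI])", "G\\1", x)          # GU before E or I -> G
  x <- gsub("QU", "K", x, fixed = TRUE)     # QU elsewhere -> K
  x <- gsub("C([EI])", "S\\1", x)           # C before E or I -> S
  x <- gsub("C", "K", x, fixed = TRUE)      # any C left over -> K
  x <- chartr("WYZ", "VIS", x)              # W -> V, Y -> I, Z -> S
  # Collapse runs of the same letter; digits are left alone (an assumption).
  gsub("([A-Z])\\1+", "\\1", x, perl = TRUE)
}

sketch_code(c("Filha", "Queijo", "Walter", "Lessa"))
# "FI1A" "KEIJO" "VALTER" "LESA"

# Spelling variants collapse to one code, while ending vowels stay distinct.
sketch_code(c("Xavier", "Chavier"))   # "XAVIER" "XAVIER"
sketch_code(c("Adriano", "Adriana"))  # "ADRIANO" "ADRIANA"
```

Order matters in a sketch like this: SCH has to be handled before CH and SC, and the context-sensitive C rule before the catch-all C to K rule. Otherwise the broader pattern consumes the letters first.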
metaphonebr is developed by a team of researchers at Instituto de Pesquisa Econômica Aplicada (Ipea).