% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/stroke_edit_distance.R
\name{sedist}
\alias{sedist}
\title{Compute the stroke edit distances between two sets of kanji}
\usage{
sedist(k1, k2, type = c("full", "before_slash", "first"))
}
\arguments{
\item{k1, k2}{atomic vectors or lists of kanji in any format that can be treated by \code{\link[=convert_kanji]{convert_kanji()}}}

\item{type}{the type of stroke edit distance to compute. See details.}
}
\value{
A \code{length(k1)} x \code{length(k2)} matrix of stroke edit distances.
}
\description{
Variants of the stroke edit distance proposed by Yencken (2010).
Each kanji is encoded as a sequence of stroke types according to
its stroke order, using the type attribute from the kanjiVG data. Then the
edit distance (a.k.a.\ Levenshtein distance) between the two sequences is computed and
divided by the larger of the two stroke counts.
}
\details{
The kanjiVG type attribute is a single string composed of a CJK strokes Unicode character, an optional
Latin letter providing further information, and possibly a variant (another CJK strokes character with optional
letter) separated by "/". If \code{type} is \code{"full"}, a match is only counted if two strings are exactly the
same; \code{"before_slash"} ignores any slash and what comes after it; \code{"first"} only considers the first
character of each string (i.e. the first CJK strokes character) when counting matches.

The stroke edit distance used by Yencken (2010) is obtained by setting \code{type = "full"} (the default),
except that the underlying kanjiVG data has changed significantly since then. Compared with the values
in \link{dstrokedit}, the distances agree in 96.3 percent of cases; the remaining distances differ by
a small amount (usually 1-2 edit operations).
}
\section{Warning}{

Requires the kanjistat.data package.
}

\examples{
ind1 <- 384
k1 <- convert_kanji(ind1, "character")
# dstrokedit contains only the "closest" kanji
ind2 <- which(dstrokedit[ind1, ] > 0)
k2 <- convert_kanji(ind2, "character")
row_a <- dstrokedit[ind1, ind2]
if (requireNamespace("kanjistat.data", quietly = TRUE)) {
  row_b <- sedist(k1, k2)
  mat <- rbind(row_a, row_b)
  rownames(mat) <- c(k1, k1)
  colnames(mat) <- k2
  mat
}
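
# A minimal sketch of the three matching rules described in Details.
# This is an illustration under stated assumptions, not the code used
# inside sedist(): the stroke type strings are hand-made toy values (not
# taken from the kanjiVG data), and reduce_type() and lev() are
# hypothetical helpers defined only for this example.
st_h <- intToUtf8(0x31D0)  # CJK stroke H
st_s <- intToUtf8(0x31D1)  # CJK stroke S
st_p <- intToUtf8(0x31D2)  # CJK stroke P
st_d <- intToUtf8(0x31D4)  # CJK stroke D
st_t <- intToUtf8(0x31C0)  # CJK stroke T
# One string per stroke: stroke character, optional letter, optional "/variant"
s1 <- c(st_h, paste0(st_s, "a"), paste0(st_d, "/", st_t))
s2 <- c(st_h, st_s, st_d, st_p)
# Reduce each stroke type string according to the chosen rule
reduce_type <- function(x, type = c("full", "before_slash", "first")) {
  type <- match.arg(type)
  switch(type,
    full = x,                           # compare whole strings
    before_slash = sub("/.*$", "", x),  # drop the slash and the variant
    first = substr(x, 1, 1)             # keep only the first character
  )
}
# Plain Levenshtein distance between two vectors of tokens (one per stroke)
lev <- function(a, b) {
  d <- matrix(0, length(a) + 1, length(b) + 1)
  d[, 1] <- 0:length(a)
  d[1, ] <- 0:length(b)
  for (i in seq_along(a)) {
    for (j in seq_along(b)) {
      cost <- as.integer(a[i] != b[j])
      d[i + 1, j + 1] <- min(d[i, j + 1] + 1, d[i + 1, j] + 1, d[i, j] + cost)
    }
  }
  d[length(a) + 1, length(b) + 1]
}
# Edit distance divided by the larger stroke count, for each rule
res <- sapply(c("full", "before_slash", "first"), function(tp) {
  lev(reduce_type(s1, tp), reduce_type(s2, tp)) / max(length(s1), length(s2))
})
res  # full 0.75, before_slash 0.50, first 0.25 for these toy sequences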

}
\references{
Yencken, Lars (2010). Orthographic support for passing the reading hurdle in Japanese.\if{html}{\out{<br>}}
PhD Thesis, University of Melbourne, Australia.
}
