New function convert_kanji for universal conversion
between kanji formats.
New function sedist for computing the stroke edit
distance by Lars Yencken.
compare_neighborhoods gave obscure errors when stroke
edit distances involved kanji with index > 2133. Fixed by returning
an explicit error if the key kanji has such an index and setting the
corresponding return value to NA if any of the closest kanji in the
kanji distance has such an index.kanjidist with approx = "pc" or
approx = "pcweighted" now runs only for
kanjivec objects generated with kanjistat 0.13.0 or
newer.The structure of kanjivec objects has been extended.
Each strokes in the stroketree component now has an
additional attribute "beziermat" which describes the Bézier
curves of the stroke in a standardized 2 x (1+3n) matrix format (n =
number of curves). The new structure is fully backward compatible.
Whether a given kanjivec object kan follows the new
structure can be tested by
attr(kan, "kanjistat_version") >= 0.13.0. The
kvecjoyo dataset on https://github.com/dschuhmacher/kanjistat.data has been
updated accordingly.
New function compare_neighborhoods, which currently
compares stroke edit distances and kanji distances in a dstrokedit
neighborhood of a given kanji and optionally extends the comparison to
nearest neighbors in the kanji distance. This function is still somewhat
experimental.
kanjidist and kanjidistmat have a new
parameter minor_warnings which toggles any warnings that
can be ignored by most users. These warnings usually point to issues in
the underlying kanjivec data or the kanjidist
computation that are currently addressed by workarounds.
approx = "pc" or
approx = "pcweighted" runs considerably faster with the new
kanjivec objects, because the inefficient (multiple)
parsing of d attributes from previous versions is now
avoided.kanjivec objects. Fixed in the internal
functions. Both kanjivec with non-default parameter
bezier_discr and kanjidist with
approx = "pc" or approx = "pcweighted" should
run now in all cases without problems (tested for Jouyou kanji).Function kanjidist has a new argument
approx, which specifies how the strokes are to be
approximated for computing component distances. The three options
“grid”, “pc” or “pcweighted” work in any combination with the three
options for the type argument (which now strictly specifies
the type of distance used for the components).
Function kanjivec has a new argument
bezier_discr, which may be any of “svgparser”, “eqtimed”
and “eqspaced”, specifing, for the discretization of the strokes in the
stroketree component, which code is used and according to
which strategy the points are placed.
Data set pooled_similarity contains the human
similarity judgements of kanji from Yencken and Baldwin (2008).
point cloud approximations (“pc” and “pcweighted”) use (approximately) equispaced points on the Bézier curves now.
Various speed improvements to options “pc” and “pcweighted”.
kanjidist for compo_seg_depth1 >= 5 returned
an error. Fixed.Function kanjidist accepts two new type
arguments “pc” and “pcweighted” for computing component distances based
on (weighted) point clouds rather than bitmap images.
Data sets dstrokedit and dyehli added
with stroke edit and Yeh-Li (bag-of-radicals) distances between Jouyou
kanji and (usually a bit more than) their closest ten neighbors. Based
on the PhD thesis by Lars Yencken (2010).
kanjimat cut off part of the kanji
under the default setting marging = 0 on Windows. The
algorithm for setting the effective margin in the bitmap representation
has been improved.read_kanjidic2, which reads a KANJIDIC2 file
and converts it to a list. All kanji information in the original file is
retained, but the structure is simplified.cjk_escape, which replaces CJK characters
by their Unicode escape sequences in files.More extensive readme file and main package vignette.
Add package website using pkgdown.
plotkanji. This function now
plots several kanji in possibly different fonts. A parameter
filename was added for devices that plot to a file.print.kanjivec() to package exports.