The first vignette (“Getting Started with sumer”) introduced the
basic concepts: cuneiform sign representations, dictionary lookup, the
type system, text analysis, and the interactive translate()
function. This vignette describes the complete workflow for translating
an entire document and building a custom dictionary from the
results.
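The workflow consists of the following steps:
1. Set up a project folder and a translation context.
2. Translate the text line by line with translate_line().
3. Build a project dictionary from the translated lines with make_dictionary().
4. Merge it with an existing dictionary using merge_dictionaries() and save the result.
Each translated line improves the dictionary, and the improved dictionary makes the next translation easier. This creates a virtuous cycle.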
A translation project consists of a folder with the following structure:
project/
  complete_cuneiform_text.txt   # cuneiform text
  lines/                        # translated lines
    Line_1.txt
    Line_2.txt
    ...
The text file contains the cuneiform text. If you
have a transliterated text, you can convert it with
as.cuneiform(). Each line can optionally begin with a line
number (e.g. 8)\t...). Lines starting with #
are treated as comments and ignored during analysis.
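As a rough illustration of these conventions, the following base R sketch drops comment lines and separates an optional leading line number from the rest of each line. The helper parse_text_lines() and the sample lines are hypothetical (sign names stand in for cuneiform signs), and the sketch assumes that a line number is followed by ")" and whitespace; the package's own parsing may differ.

```r
# Hypothetical helper (not part of sumer): drop comment lines and split an
# optional "n)" prefix from the remaining lines.
parse_text_lines <- function(lines) {
  lines  <- lines[!startsWith(lines, "#")]        # comments are ignored
  has_no <- grepl("^[0-9]+\\)\\s", lines)          # e.g. "8)\t..."

  number <- rep(NA_integer_, length(lines))
  number[has_no] <- as.integer(sub("\\).*$", "", lines[has_no]))

  text <- lines
  text[has_no] <- sub("^[0-9]+\\)\\s+", "", lines[has_no])

  data.frame(number = number, text = text, stringsAsFactors = FALSE)
}

# Made-up sample input: one comment, two numbered lines, one unnumbered line
sample_lines <- c("# a comment", "7)\tAN.EN.KI", "8)\tAN.A.NUN.NA", "KI.EN")
parsed <- parse_text_lines(sample_lines)
print(parsed)
```

Lines without a number get NA in the number column, so they can still be addressed by position.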
The lines/ subfolder stores one file
per translated line. The file for line n is called
Line_n.txt. These files are created automatically by
translate_line() when you click “Done”.
The package includes an example project for the Sumerian myth “Enki and the World Order”. Since the project folder inside the installed package is read-only, we first copy it to a temporary directory:
pkg_path <- system.file("extdata", "project", package = "sumer")
file.copy(from = pkg_path, to = tempdir(), recursive = TRUE)
#> [1] TRUE
project_dir <- file.path(tempdir(), "project")
cat(head(list.files(project_dir, recursive=TRUE)), sep="\n")
#> enki_and_the_world_order.txt
#> lines/Line_1.txt
#> lines/Line_10.txt
#> lines/Line_11.txt
#> lines/Line_12.txt
#> lines/Line_13.txt
Now we can set up the translation context and load the text:
ctx <- translation_context(
  project_dir   = project_dir,
  text          = "enki_and_the_world_order.txt",
  dic           = file.path(pkg_path, "sumer-dictionary.txt"),
  mapping       = NULL,
  sentence_prob = 0.25
)
text <- readLines(ctx$text, encoding = "UTF-8")
The parameters of translation_context() are:
project_dir: The project directory, which contains the subfolder lines/.
text: The file path (or filename relative to project_dir) of the file with the full cuneiform text.
dic: One or more dictionary files. The first has priority for automatic suggestions.
mapping: A custom sign mapping table (data frame or file path). If NULL, the package’s built-in mapping is used. A custom mapping is needed when working with texts that contain signs not covered by the default table.
sentence_prob: Corrects for verb underrepresentation in the dictionary. A value of 0.25 means that an estimated 25% of the dictionary entries come from complete sentences; verb probabilities are upweighted accordingly.
To translate a specific line, for example line 8, call translate_line() for that line.
This opens the interactive translation tool. If a file
Line_8.txt already exists in the lines/
folder, the previous translation is loaded so that you can continue
where you left off. When you click “Done”, the result is saved back to
the file.
In addition, translate_line() builds a project-specific dictionary on the fly
from all previously translated lines in the lines/ folder and makes it
available in the Shiny app. The current line is excluded to avoid
confirmation bias. This project dictionary appears alongside the primary
dictionary in the lookup panel.
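The exclusion of the current line can be pictured with a small base R sketch. The helper other_line_files() and the temporary folder are hypothetical, and the sketch only assumes the Line_n.txt naming scheme described above; how translate_line() selects the files internally is not shown in this vignette.

```r
# Hypothetical sketch: select the line files that could feed a project
# dictionary while line `current` is being translated.
lines_dir <- file.path(tempfile("project_"), "lines")
dir.create(lines_dir, recursive = TRUE)
invisible(file.create(file.path(lines_dir, sprintf("Line_%d.txt", c(1L, 2L, 8L, 10L)))))

other_line_files <- function(lines_dir, current) {
  files <- list.files(lines_dir, pattern = "^Line_[0-9]+\\.txt$", full.names = TRUE)
  # drop the file that belongs to the line currently being translated
  files[basename(files) != sprintf("Line_%d.txt", current)]
}

remaining <- other_line_files(lines_dir, current = 8L)
basename(remaining)
```

A vector like remaining could then be passed to make_dictionary(), which accepts a vector of line files.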
The gadget displays several sections on a scrollable page. The following sections describe each of them.
The first section displays frequent sign combinations (n-grams) computed from the entire text that appear in the current line. Recurring patterns point to fixed terms or compound words. Combinations that also appear in neighbouring lines are marked with a checkmark in the “Theme” column – these are thematic connections across lines.
Outside the gadget, the same analysis is available through the
functions ngram_frequencies() and
mark_ngrams() (see Vignette 1, Section 5.1).
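To make the idea of frequent sign combinations concrete, here is a minimal base R sketch that counts n-grams over lines written as dot-separated sign names. The function count_ngrams() is hypothetical and is not the package's ngram_frequencies(); the first two sample lines are the sign names of line 8, and the third is made up for illustration.

```r
# Hypothetical sketch: count sign n-grams across lines of dot-separated sign names.
count_ngrams <- function(sign_lines, n = 2) {
  grams <- unlist(lapply(strsplit(sign_lines, ".", fixed = TRUE), function(s) {
    if (length(s) < n) return(character(0))
    starts <- seq_len(length(s) - n + 1)
    vapply(starts, function(i) paste(s[i:(i + n - 1)], collapse = "."), character(1))
  }))
  sort(table(grams), decreasing = TRUE)
}

sign_lines <- c("AN.EN.KI.EN.GAN.IG.LA",   # first sentence of line 8
                "AN.A.NUN.NA.KID.NE",      # second sentence of line 8
                "AN.EN.KI.NUN.NA")         # made-up line
freq <- count_ngrams(sign_lines, n = 2)
freq[freq > 1]                             # combinations that occur more than once
```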
This section lists sign combinations from the current line for which one of the dictionaries offers a translation. This helps identify multi-sign expressions that have known meanings and can guide you in setting up bracket structures.
The neighbouring lines (up to 2 before and 2 after the current line) are shown with frequent n-grams marked in curly braces. This reveals patterns that repeat across line boundaries and helps understand the thematic flow of the text.
Outside the gadget, you can mark n-grams in any text with
mark_ngrams() (see Vignette 1, Section 5.1).
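The context window and the curly-brace marking can be sketched in a few lines of base R. Both helpers below are hypothetical and only mimic the behaviour described above; the exact output format of mark_ngrams() may differ.

```r
# Hypothetical sketch: indices of up to `k` lines before and after line `i`.
neighbour_window <- function(i, n_lines, k = 2) {
  setdiff(seq(max(1, i - k), min(n_lines, i + k)), i)
}

# Hypothetical sketch: wrap every occurrence of an n-gram in curly braces.
mark_ngram <- function(x, ngram) {
  gsub(ngram, paste0("{", ngram, "}"), x, fixed = TRUE)
}

neighbour_window(8, n_lines = 20)   # two lines on each side
neighbour_window(1, n_lines = 20)   # window is cut off at the start of the text
marked <- mark_ngram("AN.A.NUN.NA.KID.NE", "NUN.NA")
marked
```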
A bar chart shows the probability of each grammatical type for each
sign in the current line, based on the dictionary. This is the same
visualization produced by plot_sign_grammar() (see Vignette
1, Section 5.2). Tall green bars suggest a noun (S), red bars suggest a
verb (V), and blue bars suggest an operator producing an attribute
(A).
The main interactive section is the translation area. Here you see the skeleton template with input fields for type and translation. This is where the actual translation work happens – assigning types, looking up dictionary entries, adjusting the bracket structure, and composing translations. The basic mechanics (green lookup button, brown compose button, bracket input, “Update Skeleton”) are described in Vignette 1, Section 6. Verb prefixes and suffixes are explained in Vignette 1, Section 4.3.
Let us demonstrate the translation process on line 8, which features verb prefixes.
i <- which(startsWith(text, "8)"))
cat(text[i], sep = "\n")
#> 8) 𒀭𒂗𒆠𒂗𒃶𒅅𒆷𒀭𒀀𒉣𒈾𒆤𒉈
cat(as.sign_name(text[i]), sep = "\n")
#> 8) AN.EN.KI.EN.GAN.IG.LA.AN.A.NUN.NA.KID.NE
Line 8 contains two sentences:
First sentence: 𒀭𒂗𒆠𒂗𒃶𒅅𒆷
Here, 𒀭𒂗𒆠 forms the subject (Enki), 𒂗 is the object (“cultural leader”), and 𒃶𒅅𒆷 is a complex verb with two prefixes:
| Sign | Type | Translation |
|---|---|---|
| an=AN=𒀭 | ☒S→S | the god of heaven who is S |
| en=EN=𒂗 | ☒S→S | the cultural leader of S |
| ki=KI=𒆠 | S | the Earth |
| en=EN=𒂗 | S | cultural leader |
| gan=GAN=𒃶 | ☒V→V | may V |
| ig=IG=𒅅 | ☒V→V | V with the task of establishing sustenance of human existence |
| la=LA=𒆷 | Vt | to equip S |
The verb builds up from the core outward: 𒆷 (Vt) is the core verb, 𒅅 wraps it with additional meaning, and 𒃶 adds modality. The final composed verb is: “may equip S with the task of establishing sustenance of human existence” (Vt).
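The inside-out composition can be imitated with plain string substitution. The sketch below is only an illustration of the principle: apply_prefix() is a hypothetical helper, and it assumes that each operator translation contains the placeholder V exactly once and that a leading “to” is dropped from the core verb before substitution.

```r
# Hypothetical sketch: substitute the inner verb for the placeholder V.
apply_prefix <- function(template, inner) {
  sub("\\bV\\b", inner, template, perl = TRUE)
}

core     <- sub("^to\\s+", "", "to equip S", perl = TRUE)   # core verb without "to"
with_ig  <- apply_prefix("V with the task of establishing sustenance of human existence", core)
with_gan <- apply_prefix("may V", with_ig)                   # outermost prefix last
with_gan
```

The result reproduces the composed verb given above, which suggests that the order of application (core first, outermost prefix last) is what matters.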
The bracket structure ((𒀭𒂗𒆠)𒂗(𒃶𒅅𒆷)) groups the subject
(𒀭𒂗𒆠) and the verb (𒃶𒅅𒆷) as units within the
sentence. This grouping guides the skeleton hierarchy and makes the
compose button work correctly for each unit. Sign combinations that
should later be included in a dictionary must be written in
brackets.
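When typing bracket structures by hand, it can help to check them before updating the skeleton. The following base R sketch uses two hypothetical helpers to test whether the brackets are balanced and to list the innermost groups; sign names stand in for the cuneiform signs, and the gadget's own parser is not shown here.

```r
# Hypothetical sketch: are all brackets opened and closed in the right order?
is_balanced <- function(x) {
  ch <- strsplit(x, "")[[1]]
  if (length(ch) == 0) return(TRUE)
  depth <- cumsum((ch == "(") - (ch == ")"))   # nesting depth after each character
  all(depth >= 0) && depth[length(depth)] == 0
}

# Hypothetical sketch: extract groups that contain no further brackets.
innermost_groups <- function(x) {
  regmatches(x, gregexpr("\\([^()]+\\)", x))[[1]]
}

bracketed <- "((AN.EN.KI)EN(GAN.IG.LA))"   # first sentence of line 8 in sign names
is_balanced(bracketed)
groups <- innermost_groups(bracketed)
groups
```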
Second sentence: 𒀭𒀀𒉣𒈾𒆤𒉈
This sentence demonstrates the operator type S☒→A, which
produces an attribute:
| Sign | Type | Translation |
|---|---|---|
| an=AN=𒀭 | ☒S→S | god of heaven with S |
| a=A=𒀀 | S | transformative power |
| nun=NUN=𒉣 | S | exaltedness |
| na=NA=𒈾 | S☒→S | being bound to S |
| ke4=KID=𒆤 | S☒→A | who is defined as S |
| ne=NE=𒉈 | V | to be used as a resource |
The sign sequence AN.A.NUN.NA.KID denotes the
Anunnaki, the gods of the Sumerian pantheon. The
compositional translation of this sign sequence is: “the gods of heaven
with transformative power who are defined as being bound to
exaltedness”. Here, KID (S☒→A) transforms the noun phrase
to its left into an attribute (A). The attribute then combines with the
remaining noun phrase (S + A -> S) before meeting the verb.
Structure: ((𒀭𒂗𒆠)𒂗(𒃶𒅅𒆷)). ((𒀭𒀀𒉣𒈾𒆤)𒉈).
|an-en-ki-en-gan-ig-la-an-a-nun-na-ke4-ne: SEN: Enki, the god of heaven who is the cultural leader of the Earth may equip cultural leaders with the task of establishing sustenance of human existence. The Anunnaki, the gods of heaven with transformative power who are defined as being bound to exaltedness are used as a resource.
|an-en-ki-en-gan-ig-la=AN.EN.KI.EN.GAN.IG.LA=𒀭𒂗𒆠𒂗𒃶𒅅𒆷: SEN: ...
|  an-en-ki=AN.EN.KI=𒀭𒂗𒆠: S: Enki, the god of heaven who is the cultural leader of the Earth
|    an=AN=𒀭: ☒S→S: the god of heaven who is S
|    en=EN=𒂗: ☒S→S: the cultural leader of S
|    ki=KI=𒆠: S: the Earth
|  en=EN=𒂗: S: cultural leader
|  gan-ig-la=GAN.IG.LA=𒃶𒅅𒆷: Vt: may equip S with the task of establishing sustenance of human existence
|    gan=GAN=𒃶: ☒V→V: may V
|    ig=IG=𒅅: ☒V→V: V with the task of establishing sustenance of human existence
|    la=LA=𒆷: Vt: to equip S
|an-a-nun-na-ke4-ne=AN.A.NUN.NA.KID.NE=𒀭𒀀𒉣𒈾𒆤𒉈: SEN: ...
|  an-a-nun-na-ke4=AN.A.NUN.NA.KID=𒀭𒀀𒉣𒈾𒆤: S: The Anunnaki, the gods of heaven with transformative power who are defined as being bound to exaltedness
|    an=AN=𒀭: ☒S→S: god of heaven with S
|    a=A=𒀀: S: transformative power
|    nun=NUN=𒉣: S: exaltedness
|    na=NA=𒈾: S☒→S: being bound to S
|    ke4=KID=𒆤: S☒→A: who is defined as S
|  ne=NE=𒉈: V: to be used as a resource
The line files produced by translate() and
translate_line() use a pipe format where each entry
starting with | becomes a dictionary entry. When building a
dictionary from these files, some automatic normalization is applied. It
is helpful to understand these conventions when writing
translations:
Curly braces {specific meaning} in a
translation indicate a context-specific interpretation. For example,
“container {country}” means that the general compositional meaning is
“container” but in this context it refers to “country”. When composing
entries with the compose button, only the specific meaning inside the
curly braces is used for substitution.
Angle brackets <comment> in a
translation contain comments or annotations. The text inside angle
brackets is stripped out from translations. This can be used to add
explanatory notes, for example: “S <the agent of the transitive
verb>”.
Leading articles are removed. Nouns and noun phrases should be translated as they fit into the sentence, including articles where appropriate. When the dictionary is built, leading articles (“the”, “a”, “an”) at the beginning of a translation string are automatically stripped. This ensures clean dictionary entries while allowing natural English in the line files.
Verbs should be in base form. Verb translations should be in their base form, optionally preceded by “to” (e.g. “to create” or “create”). A leading “to” is automatically removed when building the dictionary.
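The conventions above can be pictured as simple string transformations. The four helpers in this base R sketch are hypothetical and only approximate the described behaviour with regular expressions; the normalization actually applied by the package may handle more cases.

```r
# Hypothetical helpers sketching the conventions described above.

# Remove <comments> in angle brackets.
strip_comments <- function(x) trimws(gsub("\\s*<[^>]*>", "", x, perl = TRUE))

# Return the context-specific meaning in {curly braces}, if there is one.
specific_meaning <- function(x) {
  m <- regmatches(x, regexpr("\\{[^}]*\\}", x, perl = TRUE))
  if (length(m) == 1) gsub("[{}]", "", m) else x
}

# Remove a leading article.
strip_article <- function(x) sub("^(the|a|an)\\s+", "", x, ignore.case = TRUE, perl = TRUE)

# Remove a leading "to" from a verb translation.
strip_to <- function(x) sub("^to\\s+", "", x, perl = TRUE)

strip_comments("S <the agent of the transitive verb>")
specific_meaning("container {country}")
strip_article("the Earth")
strip_to("to equip S")
```

Note that strip_article() leaves words such as “abundance” untouched, because the article must be followed by whitespace.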
Once you have translated several lines, the line files in the
lines/ folder can be combined into a dictionary. The
function make_dictionary() reads all line files and
aggregates the entries:
line_files <- list.files(ctx$line_folder, full.names = TRUE)
head(basename(line_files))
#> [1] "Line_1.txt" "Line_10.txt" "Line_11.txt" "Line_12.txt" "Line_13.txt"
#> [6] "Line_14.txt"
project_dic <- make_dictionary(line_files)
The function counts how often each combination of sign name, type, and translation occurs across all files. Signs that appear frequently with the same meaning get higher counts, making them more reliable dictionary entries.
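The counting step can be illustrated with base R alone. The data frame below is made up and the column names are assumptions chosen for this sketch; it is not the internal representation used by make_dictionary().

```r
# Hypothetical sketch: count identical (sign, type, translation) triples.
entries <- data.frame(
  sign        = c("AN", "AN", "AN", "KI"),
  type        = c("S", "S", "S", "S"),
  translation = c("god of heaven", "god of heaven", "heaven", "Earth"),
  stringsAsFactors = FALSE
)
entries$count <- 1L   # every occurrence in a line file counts once

counts <- aggregate(count ~ sign + type + translation, data = entries, FUN = sum)
counts[order(-counts$count), ]
```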
Let us inspect some entries:
look_up("AN", project_dic)
#>
#> ──────────────────────────────────────────────────────────────────────────────────────────
#> Search: AN
#> ──────────────────────────────────────────────────────────────────────────────────────────
#>
#> Cuneiform: 𒀭
#> Sign Names: AN
#>
#> ▶ Translations:
#> [ 4] S god of heaven
#> [ 3] ☒S→S god of heaven who is S
#> [ 3] ☒S→S god of heaven with S
#> [ 1] ☒S→S god of heaven with the task of S
#> [ 1] S heaven
#>
#> ──────────────────────────────────────────────────────────────────────────────────────────
The sign AN appears with multiple types and meanings, each with a count reflecting how often that particular usage was attested in the translated lines.
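Counts like these can be turned into relative frequencies per grammatical type. The sketch below uses the five counts listed for AN and assumes, for illustration only, that the share of a type is simply proportional to its summed counts; the probabilities shown in the grammar bar chart also involve the sentence_prob correction, which is not modelled here.

```r
# Hypothetical sketch: share of each grammatical type among the entries for AN.
an_entries <- data.frame(
  type  = c("S", "\u2612S\u2192S", "\u2612S\u2192S", "\u2612S\u2192S", "S"),
  count = c(4, 3, 3, 1, 1)   # counts from the lookup output above
)

type_counts <- tapply(an_entries$count, an_entries$type, sum)
type_share  <- type_counts / sum(type_counts)
round(type_share, 2)
```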
A project dictionary built from a single text is most useful when
combined with a broader dictionary. The function
merge_dictionaries() combines two or more dictionaries:
dic1 <- read_dictionary()
#> ###---------------------------------------------------------------
#> ### Sumerian Dictionary
#> ###
#> ### Author: Robin Wellmann
#> ### Year: 2026
#> ### Version: 0.5
#> ### Watch for Updates: https://founder-hypothesis.com/en/sumerian-mythology/downloads/
#> ###---------------------------------------------------------------
merged_dic <- merge_dictionaries(dic1, project_dic)
Translation entries that agree in sign name, type, and meaning are merged by summing their counts. Cuneiform and reading rows are taken from the first dictionary.
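The summing rule can be reproduced on two small count tables. The tables and column names below are made up for this sketch, and it covers only the translation entries; the handling of cuneiform and reading rows in merge_dictionaries() is not modelled.

```r
# Hypothetical sketch: merge two count tables by summing matching entries.
dic_a <- data.frame(sign = c("AN", "KI"), type = c("S", "S"),
                    translation = c("god of heaven", "Earth"),
                    count = c(4L, 2L), stringsAsFactors = FALSE)
dic_b <- data.frame(sign = c("AN", "LAM"), type = c("S", "S"),
                    translation = c("god of heaven", "abundance"),
                    count = c(1L, 1L), stringsAsFactors = FALSE)

# Entries that agree in sign, type, and translation collapse into one row.
merged_counts <- aggregate(count ~ sign + type + translation,
                           data = rbind(dic_a, dic_b), FUN = sum)
merged_counts
```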
We can verify that the merged dictionary contains entries from both sources:
look_up("LAM", dic1)
#>
#> ──────────────────────────────────────────────────────────────────────────────────────────
#> Search: LAM
#> ──────────────────────────────────────────────────────────────────────────────────────────
#>
#> Cuneiform: 𒇴
#> Sign Names: LAM
#>
#> ▶ Translations:
#> (no entries found)
#>
#> ──────────────────────────────────────────────────────────────────────────────────────────
look_up("LAM", project_dic)
#>
#> ──────────────────────────────────────────────────────────────────────────────────────────
#> Search: LAM
#> ──────────────────────────────────────────────────────────────────────────────────────────
#>
#> Cuneiform: 𒇴
#> Sign Names: LAM
#>
#> ▶ Translations:
#> [ 1] ☒S→S abundance of S
#>
#> ──────────────────────────────────────────────────────────────────────────────────────────
look_up("LAM", merged_dic)
#>
#> ──────────────────────────────────────────────────────────────────────────────────────────
#> Search: LAM
#> ──────────────────────────────────────────────────────────────────────────────────────────
#>
#> Cuneiform: 𒇴
#> Sign Names: LAM
#>
#> ▶ Translations:
#> [ 1] ☒S→S abundance of S
#>
#> ──────────────────────────────────────────────────────────────────────────────────────────
A note on combining dictionaries: Merging is most meaningful when the underlying texts come from comparable periods and regions of Mesopotamia. The same sign can carry different meanings across time and place, so combining dictionaries from widely different epochs may produce misleading frequency counts.
The completed dictionary can be saved with metadata:
save_dictionary(
  dic     = merged_dic,
  file    = "my_dictionary.txt",
  author  = "My Name",
  year    = "2026",
  version = "1.0",
  url     = "https://example.com/dictionary"
)
The saved dictionary can be loaded in future sessions with
read_dictionary("my_dictionary.txt") and used as the
primary dictionary for new translation projects.
The workflow described in this vignette forms a self-reinforcing cycle:
1. Translate a line with translate_line(), guided by the dictionary and n-gram analysis.
2. Each time you open translate_line(), it automatically builds a project dictionary from all saved lines (excluding the current one). This project dictionary appears alongside the primary dictionary, providing suggestions based on your own previous work.
3. Build a dictionary from the line files with make_dictionary() and merge it with an existing one using merge_dictionaries().
With each translated line, the dictionary grows. Frequent signs and expressions accumulate higher counts, and the automatic pre-filling of translation templates becomes increasingly accurate. Over time, you build a comprehensive dictionary grounded in your own texts.