Mormula: Grammatically Annotated Mordvin Texts
Keywords: literary language, dialects
The Mormula corpus was created in the late 1970s. It contains texts in the two Mordvinic languages, Erzya and Moksha, which are spoken in the European part of Russia. Both dialectal and literary language are represented in the corpus.
The size of the whole corpus is 244,368 words. The Erzya and Moksha parts of the corpus contain 129,535 and 114,833 words, respectively.
The corpus has part-of-speech (POS) and morphological annotation. The annotation was created manually and uses 216 different tags: 64 POS tags and 152 morphological tags (see Pos_table.pdf and Tags_table.pdf). The large number of tags reflects the morphological richness of the Mordvinic languages. Work on adding syntactic tags to the corpus is under way.
Corpus texts
The corpus texts are stored in nine files, whose names all begin with “kk”. All the texts are provided with German or Finnish translations.
Erzya
kk11 = H. Paasonen 1894: Proben der mordwinischen Volksliteratur. Suomalais-Ugrilaisen Seuran aikakauskirja, 12. Pp. 1–152. Helsinki. • Erzya spells, offering prayers, riddles, proverbs and folktales (dialectal texts) • German translations • Size 14,613 words
kk21 = H. Paasonen & P. Ravila 1941: Mordwinische Volksdichtung III. Suomalais-Ugrilaisen Seuran toimituksia, 84. Pp. 3–211. Helsinki. • Erzya offering prayers and spells (dialectal texts) • German translations • Size 16,910 words
kk22 = the above source, pp. 215–343. • Erzya folktales (dialectal texts) • German translations • Size 13,610 words
kk31 = Ustno-poeticheskoe tvorchestvo mordovskogo naroda. Tom tretiy, Chast vtoraya. Erzyanskie skazki. Pp. 7–367. Saransk 1967. • Erzya folktales (literary Erzya) • Finnish translations (Russian parallel texts can be found in the original publication) • Size 55,449 words
kk41 = Syatko 1978: 1. Pp. 3–80. Saransk. • Syatko (‘sparkle’) is an Erzya periodical containing belletristic prose and poetry as well as articles with political and social content (literary language) • Finnish translations • Size 28,953 words
Moksha
kk51 = H. Paasonen & P. Ravila 1947: Mordwinische Volksdichtung IV. Suomalais-Ugrilaisen Seuran toimituksia, 91. Pp. 3–154. Helsinki. • Moksha songs (dialectal texts) • German translations • Size 9,102 words
kk52 = the above source, pp. 797–897. • Moksha folktales (dialectal texts) • German translations • Size 11,193 words
kk61 = Ustno-poeticheskoe tvorchestvo mordovskogo naroda. Tom tretiy, Chast pervaya. Mokshanskie skazki. Pp. 17–364. Saransk 1966. • Moksha folktales (literary Moksha) • Finnish translations (Russian parallel texts can be found in the original publication) • Size 53,947 words
kk71 = V. Levin & F. Levin 1978: Guryan. Povestt i rasskast. Pp. 3–211. Saransk. • Moksha stories (literary Moksha) • Finnish translations • Size 40,591 words
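The per-file sizes listed above can be summed and compared with the totals stated for the Erzya part, the Moksha part and the whole corpus. The following is a minimal Python sketch of that check. The dictionary and function names are illustrative assumptions and are not part of the corpus distribution; only the file names and word counts come from the list above.

```python
# Word counts per corpus file, copied from the list above.
# The names ERZYA_SIZES, MOKSHA_SIZES and total_words are hypothetical helpers.
ERZYA_SIZES = {
    "kk11": 14_613,
    "kk21": 16_910,
    "kk22": 13_610,
    "kk31": 55_449,
    "kk41": 28_953,
}
MOKSHA_SIZES = {
    "kk51": 9_102,
    "kk52": 11_193,
    "kk61": 53_947,
    "kk71": 40_591,
}


def total_words(sizes):
    """Return the sum of the word counts in a file-name -> size mapping."""
    return sum(sizes.values())


erzya_total = total_words(ERZYA_SIZES)
moksha_total = total_words(MOKSHA_SIZES)
corpus_total = erzya_total + moksha_total

print(f"Erzya:  {erzya_total:,} words in {len(ERZYA_SIZES)} files")
print(f"Moksha: {moksha_total:,} words in {len(MOKSHA_SIZES)} files")
print(f"Total:  {corpus_total:,} words")
```

Run as written, the sums agree with the stated figures: 129,535 words for Erzya, 114,833 words for Moksha and 244,368 words for the whole corpus, spread over nine files.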
For more information, see the Mormula corpus manual, which can be downloaded from the main page of the Finno-Ugric Corpora portal, https://finno-ugric-corpora.utu.fi/cqpweb/.
Details about the resource
- Language: Erzya, Moksha
- Form: written language
- Genre: fiction, newspaper writing, folklore
- Dataset size: 244,368 words
- Annotation: part of speech, morphology
The tags used are listed in Pos_table.pdf and Tags_table.pdf.
| Alho Alhoniemi | founder and original principal investigator (PI) |
| Jorma Luutonen | coordinator |
Available at
Contact person
| Jussi Ylikoski | volgaserver *at* utu.fi |