Mormula: Grammatically Annotated Mordvin Texts

Keywords: literary language, dialects

The Mormula corpus was created in the late 1970s. It contains texts in the two Mordvinic languages, Erzya and Moksha, which are spoken in the European part of Russia. Both dialectal texts and texts in the literary languages are represented in the corpus.

The size of the whole corpus is 244,368 words. The Erzya and Moksha parts of the corpus contain 129,535 and 114,833 words, respectively.

The corpus has part-of-speech (POS) and morphological annotation. The annotation was created manually and uses 216 different tags: 64 POS tags and 152 morphological tags (see Pos_table.pdf and Tags_table.pdf). The large number of tags reflects the morphological richness of the Mordvinic languages. Work on adding syntactic tags to the corpus is in progress.

Corpus texts

The corpus texts are contained in nine files, whose names all begin with “kk”. All texts are provided with German or Finnish translations.

Erzya

kk11 = H. Paasonen 1894: Proben der mordwinischen Volksliteratur. Suomalais-Ugrilaisen Seuran aikakauskirja, 12. Pp. 1–152. Helsinki. • Erzya spells, offering prayers, riddles, proverbs and folktales (dialectal texts) • German translations • Size 14,613 words

kk21 = H. Paasonen & P. Ravila 1941: Mordwinische Volksdichtung III. Suomalais-Ugrilaisen Seuran toimituksia, 84. Pp. 3–211. Helsinki. • Erzya offering prayers and spells (dialectal texts) • German translations • Size 16,910 words

kk22 = the above source, pp. 215–343. • Erzya folktales (dialectal texts) • German translations • Size 13,610 words

kk31 = Ustno-poeticheskoe tvorchestvo mordovskogo naroda. Tom tretiy, Chast vtoraya. Erzyanskie skazki. Pp. 7–367. Saransk, 1967. • Erzya folktales (literary Erzya) • Finnish translations (Russian parallel texts can be found in the original publication) • Size 55,449 words

kk41 = Syatko 1978: 1. Pp. 3–80. Saransk. • Syatko (‘sparkle’) is an Erzya periodical containing belletristic prose and poetry as well as articles with political and social content (literary language) • Finnish translations • Size 28,953 words

Moksha

kk51 = H. Paasonen & P. Ravila 1947: Mordwinische Volksdichtung IV. Suomalais-Ugrilaisen Seuran toimituksia, 91. Pp. 3–154. Helsinki. • Moksha songs (dialectal texts) • German translations • Size 9,102 words

kk52 = the above source, pp. 797–897. • Moksha folktales (dialectal texts) • German translations • Size 11,193 words

kk61 = Ustno-poeticheskoe tvorchestvo mordovskogo naroda. Tom tretiy, Chast pervaya. Mokshanskie skazki. Pp. 17–364. Saransk, 1966. • Moksha folktales (literary Moksha) • Finnish translations (Russian parallel texts can be found in the original publication) • Size 53,947 words

kk71 = V. Levin & F. Levin 1978: Guryan. Povestt i rasskast. Pp. 3–211. Saransk. • Moksha stories (literary Moksha) • Finnish translations • Size 40,591 words
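As a quick consistency check, the per-file sizes listed above add up to the Erzya, Moksha and whole-corpus totals given at the top of this description. The following is a minimal Python sketch of that arithmetic; the dictionary and function names are illustrative only and do not correspond to any file format or tool distributed with the corpus.

```python
# Word counts per corpus file, copied from the file descriptions above.
# The layout (two dicts keyed by file name) is a hypothetical illustration.
ERZYA = {
    "kk11": 14_613,
    "kk21": 16_910,
    "kk22": 13_610,
    "kk31": 55_449,
    "kk41": 28_953,
}
MOKSHA = {
    "kk51": 9_102,
    "kk52": 11_193,
    "kk61": 53_947,
    "kk71": 40_591,
}


def total(sizes: dict[str, int]) -> int:
    """Sum the word counts of a group of corpus files."""
    return sum(sizes.values())


erzya_total = total(ERZYA)
moksha_total = total(MOKSHA)
corpus_total = erzya_total + moksha_total

print(f"Erzya:  {erzya_total:,}")   # Erzya:  129,535
print(f"Moksha: {moksha_total:,}")  # Moksha: 114,833
print(f"Total:  {corpus_total:,}")  # Total:  244,368
```

The nine files split into five Erzya files and four Moksha files, and the sums match the stated sizes of 129,535, 114,833 and 244,368 words.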

For more information, see the Mormula corpus manual. You can download it from the main page of the Finno-Ugric Corpora portal, https://finno-ugric-corpora.utu.fi/cqpweb/.

Details about the resource

Content
  • Language: Erzya, Moksha
  • Form: written language
  • Genre: fiction, newspaper writing, folklore
  • Dataset size: 244,368 words
Annotations
  • part of speech
  • morphology

The tags used are listed in the tables below.

Authors
Alho Alhoniemi, founder and original principal investigator (PI)
Jorma Luutonen, coordinator
Availability

Contact person

Jussi Ylikoski, volgaserver *at* utu.fi