Diachronic Corpus of Literary Mordvin

Keywords: literary language, journalistic language

The Diachronic Corpus of Literary Mordvin contains newspaper articles from different periods in the development of the Erzya and Moksha literary languages. The oldest texts date from 1920 and the newest from 2008.

The articles are divided into four periods:

  • 1920s-1937
  • 1938-1950s
  • 1960s-1970s
  • 2000s

They are also classified according to their content: 

  • politics and society (I)
  • economics (II)
  • culture and education (III)
  • fiction (IV)

The corpus contains 516 texts: 281 in Erzya and 235 in Moksha. The total number of word tokens is ca. 336,000: 187,000 in Erzya and 149,000 in Moksha.

The corpus can be used to study changes in the Erzya and Moksha literary languages.

The corpus is accessible through the Finno-Ugric Corpora portal.

Details about the resource

Content
  • Language: Erzya, Moksha
  • Form: written language
  • Genre: newspaper text
  • Dataset size: 516 texts, ca. 336,000 word tokens
  • Timescale: 1920–2008
Authors
Jorma Luutonen 
Mihail Mosin 
Valentina Shtshankina
 
Availability

Contact person

Jussi Ylikoski
volgaserver *at* utu.fi