MokshEr Corpus
Keywords: literary language, journalistic language
The MokshEr Corpus contains a collection of newspaper and journal articles from the years 2002–2009, as well as a few works of fiction. The texts are unannotated.
The Erzya part of the corpus consists of 2,991 texts, containing ca. 2,785,000 word tokens. The Moksha part consists of 1,300 texts, containing ca. 1,742,000 word tokens.
The Erzya texts originate from the journal Syatko (2003–2004, 2006–2008) and the newspapers Erzyan pravda (2005–2008) and Erzyan mastor (2003–2009).
The Moksha texts have been taken from the journals Moksha (2002–2003, 2005–2007) and Yakster tyashtenya (2005), and the newspaper Mokshen pravda (2002–2005).
The corpus is accessible through the Finno-Ugric Corpora portal.
Information about the resource
Content
- Languages: Erzya, Moksha
- Form: written language
- Genre: fiction, newspaper and journal texts
- Dataset size: 4,291 texts, ca. 4,527,000 word tokens
- Timescale: 2002–2009
Availability
Available at
Contact person
Jussi Ylikoski, volgaserver *at* utu.fi