Turku Izhevsk Corpus

Keywords: literary language, journalistic language

The Turku Izhevsk Corpus contains about 11,000 Udmurt texts from a newspaper and five journals:

  • Udmurt dunne 10,366 texts (years 1997, 1998, 1999, 2000, 2001)
  • Dzhetshbur 152 texts
  • Vordskem kyl 139 texts
  • Invozho 130 texts
  • Kenesh 116 texts
  • Kizili 116 texts

The number of word tokens in the corpus is ca. 4,232,000.

The corpus was created by the Research Unit for Volgaic Languages (University of Turku) in collaboration with the Language Department of the Udmurt Institute of History, Language and Literature (Izhevsk).

The corpus is accessible through the Finno-Ugric Corpora portal.

For further details, see the attached PDF file.

Details about the resource

Content
  • Language: Udmurt
  • Form: written language
  • Genre: journalism
  • Dataset size: ca. 11,000 texts, ca. 4,232,000 word tokens
  • Timescale: 1997–2002
Authors
Jorma Luutonen, coordinator
Availability

Contact person

Jussi Ylikoski, volgaserver *at* utu.fi

Detailed corpus description (pdf)