Turku Izhevsk Corpus

Keywords: literary language, journalistic language

The Turku Izhevsk Corpus contains about 11,000 Udmurt texts from a newspaper and five journals:

Udmurt dunne 10,366 texts (years 1997, 1998, 1999, 2000, 2001)
Dzhetshbur 152 texts
Vordskem kyl 139 texts
Invozho 130 texts
Kenesh 116 texts
Kizili 116 texts

The number of word tokens in the corpus is ca. 4,232,000.

The corpus was created by the Research Unit for Volgaic Languages (University of Turku) in collaboration with the Language Department of the Udmurt Institute of History, Language and Literature (Izhevsk).

The corpus is accessible through the Finno-Ugric Corpora portal.

For further details, see the attached PDF file.

Details about the resource

Content

Language: Udmurt
Form: written language
Genre: journalism
Dataset size: ca. 11,000 texts, ca. 4,232,000 word tokens
Timescale: 1997–2002

Authors

Jorma Luutonen (coordinator)

Availability

Available at

https://finno-ugric-corpora.utu.fi/cqpweb/

Contact person

Jussi Ylikoski

volgaserver *at* utu.fi

Detailed corpus description (PDF)

Turku Izhevsk Corpus description

Links

UTU-Digilang front page