Turku Izhevsk Corpus
Keywords: literary language, journalistic language
The Turku Izhevsk Corpus contains about 11,000 Udmurt texts from a newspaper and five journals:
- Udmurt dunne: 10,366 texts (years 1997, 1998, 1999, 2000, 2001)
- Dzhetshbur: 152 texts
- Vordskem kyl: 139 texts
- Invozho: 130 texts
- Kenesh: 116 texts
- Kizili: 116 texts
The number of word tokens in the corpus is ca. 4,232,000.
The corpus was created by the Research Unit for Volgaic Languages (University of Turku) in collaboration with the Language Department of the Udmurt Institute of History, Language and Literature (Izhevsk).
The corpus is accessible through the Finno-Ugric Corpora portal.
For further details, see the attached PDF file.
Details about the resource
Content
- Language: Udmurt
- Form: written language
- Genre: journalism
- Dataset size: about 11,000 texts, ca. 4,232,000 word tokens
- Timescale: 1997–2002
Authors
| Jorma Luutonen | coordinator |
Availability
Available at
Contact person
| Jussi Ylikoski | volgaserver *at* utu.fi |