Turku Chuvash Corpus (TuChC)

Keywords: literary language, journalistic language, academic language

The Turku Chuvash Corpus v1.1 is a collection of Chuvash texts of various genres. The corpus comprises 219 texts containing ca. 1,237,000 word tokens.

The distribution of texts by genre, with approximate word token counts, is as follows:

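Fictional prose: 45 texts; 295,000 word tokens
Poetry: 14 texts; 13,000 word tokens
Translations: 4 texts; 3,000 word tokens
Journalism: 78 texts; 157,000 word tokens
Scholarly: 44 texts; 560,000 word tokens
Bible: 31 texts; 209,000 word tokens
Miscellaneous: 3 texts; 500 word tokens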
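
As a quick consistency check, the per-genre figures in the list above can be tallied against the stated corpus totals. The following is only an illustrative sketch: the dictionary name and layout are assumptions made for this example, and the token counts are the rounded figures given above, not exact counts from the corpus.

```python
# Per-genre figures copied from the list above: (texts, approximate word tokens).
# The token counts are rounded, so the total is approximate as well.
genres = {
    "Fictional prose": (45, 295_000),
    "Poetry": (14, 13_000),
    "Translations": (4, 3_000),
    "Journalism": (78, 157_000),
    "Scholarly": (44, 560_000),
    "Bible": (31, 209_000),
    "Miscellaneous": (3, 500),
}

total_texts = sum(texts for texts, _ in genres.values())
total_tokens = sum(tokens for _, tokens in genres.values())

print(f"texts:  {total_texts}")
print(f"tokens: {total_tokens:,}")

# Share of each genre in the approximate token total.
for name, (texts, tokens) in genres.items():
    share = 100 * tokens / total_tokens
    print(f"{name:<16}{texts:>4}{tokens:>10,}{share:>7.1f}%")
```

The text counts add up to 219, and the rounded token counts add up to 1,237,500, which agrees with the stated total of ca. 1,237,000 word tokens.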

The texts were collected by Jorma Luutonen and Eduard Fomin in the years 2003–2009.

The corpus is accessible through the Finno-Ugric Corpora portal.

Details about the resource

Content

Language: Chuvash
Form: written language
Genre: fiction, journalism, poetry, scientific texts, religious texts
Dataset size: 219 texts, ca. 1,237,000 word tokens

Authors

Jorma Luutonen (coordinator)
Eduard Fomin

Availability

Available at

https://finno-ugric-corpora.utu.fi/cqpweb/

Contact person

Jussi Ylikoski

volgaserver *at* utu.fi

Links

UTU-Digilang front page