Turku Chuvash Corpus (TuChC)
Keywords: literary language, journalistic language, academic language
The Turku Chuvash Corpus v1.1 is a collection of Chuvash texts of various genres. The corpus comprises 219 texts containing ca. 1,237,000 word tokens.
The distribution of texts across genres, with approximate word token counts, is as follows:
- Fictional prose: 45 texts; 295,000 word tokens
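- Poetry: 14 texts; 13,000 word tokens
- Translations: 4 texts; 3,000 word tokens
- Journalism: 78 texts; 157,000 word tokens
- Scholarly: 44 texts; 560,000 word tokens
- Bible: 31 texts; 209,000 word tokens
- Miscellaneous: 3 texts; 500 word tokens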
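The genre figures above can be cross-checked against the stated corpus totals. The following is a minimal sketch in Python; the figures are copied from the list, while the variable names are illustrative assumptions and not part of the corpus distribution. Because the token counts are approximate, the sum is only expected to agree with the stated total to within about a thousand tokens.

```python
# Genre figures copied from the list above: (number of texts, approximate word tokens).
# The dictionary and variable names are hypothetical, chosen for this illustration only.
genres = {
    "Fictional prose": (45, 295_000),
    "Poetry": (14, 13_000),
    "Translations": (4, 3_000),
    "Journalism": (78, 157_000),
    "Scholarly": (44, 560_000),
    "Bible": (31, 209_000),
    "Miscellaneous": (3, 500),
}

# Sum the per-genre counts to compare with the stated totals.
total_texts = sum(texts for texts, _ in genres.values())
total_tokens = sum(tokens for _, tokens in genres.values())

print(f"texts: {total_texts}, tokens: ~{total_tokens:,}")

# Show each genre's share of the approximate token total.
for name, (texts, tokens) in genres.items():
    share = 100 * tokens / total_tokens
    print(f"{name:<16} {texts:>3} texts  {share:5.1f}% of tokens")
```

The per-genre counts sum to 219 texts and roughly 1,237,500 word tokens, which agrees with the stated totals given the rounding.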
The texts were collected by Jorma Luutonen and Eduard Fomin in the years 2003–2009.
The corpus is accessible through the Finno-Ugric Corpora portal.
Details about the resource
Content
- Language: Chuvash
- Form: written language
- Genre: fiction, journalism, poetry, scientific texts, religious texts
- Dataset size: 219 texts, ca. 1,237,000 word tokens
Authors
| Jorma Luutonen | coordinator |
| Eduard Fomin | |
Availability
Available at
Contact persons
| Jussi Ylikoski | volgaserver *at* utu.fi |