Turku Chuvash Corpus (TuChC)

Keywords: literary language, journalistic language, academic language

The Turku Chuvash Corpus v1.1 is a collection of Chuvash texts of various genres. The corpus comprises 219 texts containing ca. 1,237,000 word tokens.

The distribution of texts across genres, with approximate word token counts, is as follows:

  • Fictional prose: 45 texts; 295,000 word tokens
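  • Poetry: 14 texts; 13,000 word tokens
  • Translations: 4 texts; 3,000 word tokens
  • Journalism: 78 texts; 157,000 word tokens
  • Scholarly: 44 texts; 560,000 word tokens
  • Bible: 31 texts; 209,000 word tokens
  • Miscellaneous: 3 texts; 500 word tokens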
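
As a quick consistency check, the per-genre figures above can be summed and compared with the stated corpus totals. The following is a minimal Python sketch: the numbers are copied from the list, while the variable names are chosen here for illustration and are not part of the corpus distribution.

```python
# Per-genre figures copied from the list above:
# genre -> (number of texts, approximate word tokens).
genres = {
    "Fictional prose": (45, 295_000),
    "Poetry": (14, 13_000),
    "Translations": (4, 3_000),
    "Journalism": (78, 157_000),
    "Scholarly": (44, 560_000),
    "Bible": (31, 209_000),
    "Miscellaneous": (3, 500),
}

# Sum both columns across all genres.
total_texts = sum(texts for texts, _ in genres.values())
total_tokens = sum(tokens for _, tokens in genres.values())

print(total_texts, total_tokens)
```

The text counts add up to the stated 219 texts, and the rounded token counts add up to 1,237,500, in line with the stated total of ca. 1,237,000 word tokens.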

The texts were collected by Jorma Luutonen and Eduard Fomin in the years 2003–2009.

The corpus is accessible through the Finno-Ugric Corpora portal.

Details about the resource

Content
  • Language: Chuvash
  • Form: written language
  • Genre: fiction, journalism, poetry, scientific texts, religious texts
  • Dataset size: 219 texts, 1,237,000 word tokens
Authors
Jorma Luutonen, coordinator
Eduard Fomin 
Availability

Contact persons

Jussi Ylikoski, volgaserver *at* utu.fi