Turku Komi-Permyak Corpus (TuKPC)
Keywords: literary language, journalistic language, academic language
Turku Komi-Permyak Corpus v1.0 is a collection of Komi-Permyak literature. It consists mostly of fiction (prose, poetry, drama), but also includes some scholarly and journalistic texts.
The corpus contains ca. 3,011,000 word tokens.
The corpus was created by Jorma Luutonen on the basis of texts collected by Enye Lav. It is accessible through the Finno-Ugric Corpora portal.
Details about the resource
- Language: Komi-Permyak
- Form: written language
- Genre: fiction, journalistic texts, poetry, drama, scientific texts
- Dataset size: 3,011,000 word tokens
| Jorma Luutonen | coordinator |
| Enye Lav | researcher |
Available at
Contact persons
| Jussi Ylikoski | volgaserver *at* utu.fi |
Reference instructions
A reference to the corpus should contain the following parts: 1) name of the corpus; 2) abbreviation of the name of the text; and 3) line number in the text.
The name of the corpus is Turku Komi-Permyak Corpus, abbreviated TuKPC.
No fixed abbreviations for the corpus text names are available. You can form your own abbreviations on the basis of the following information.
If you have access to the corpus through the Finno-Ugric Corpora portal, you can look up information about a specific text as follows. After running a query, click the text identification code (for example, A67) in the "Text" column on the left. The portal then shows the "nice name" of the text, which contains the name of the publication.
The original fixed line numbers of the corpus text files appear inside the query result lines in the form "8323:", "8324:", etc. They can be used to specify a location within a given text.
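The three parts of a reference can be assembled mechanically. The following Python sketch is only an illustration: the corpus prescribes which parts a reference must contain, not how they are punctuated, so the output layout "TuKPC, <abbreviation>: <line>" is an assumption. The helper names and the abbreviation "MyAbbr" are hypothetical, and the pattern assumes that the first run of digits followed by a colon in a result line is the fixed line number.

```python
import re
from typing import Optional

# Abbreviation of the corpus name, as given in the reference instructions.
CORPUS_ABBREVIATION = "TuKPC"

# Assumption: the fixed line number is the first run of digits followed
# by a colon (e.g. "8323:") in a query result line.
LINE_NUMBER_PATTERN = re.compile(r"\b(\d+):")


def extract_line_number(result_line: str) -> Optional[int]:
    """Return the fixed line number found in a result line, or None."""
    match = LINE_NUMBER_PATTERN.search(result_line)
    return int(match.group(1)) if match else None


def format_reference(text_abbreviation: str, line_number: int) -> str:
    """Combine corpus abbreviation, text abbreviation, and line number.

    The punctuation used here is an assumed layout, not a prescribed one.
    """
    return f"{CORPUS_ABBREVIATION}, {text_abbreviation}: {line_number}"


if __name__ == "__main__":
    # "MyAbbr" stands for an abbreviation you have formed yourself.
    line_number = extract_line_number("8323: example result line")
    if line_number is not None:
        print(format_reference("MyAbbr", line_number))
```

Because no fixed abbreviations exist for the text names, it is worth listing the abbreviations you coin, together with the corresponding "nice names", wherever you cite the corpus.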