ArkiSyn Database of Finnish Conversational Discourse
Keywords: spoken language, everyday conversation
The project aims to produce a morphosyntactically annotated corpus of everyday Finnish-language conversations to support grammatical research based on a large body of everyday interaction. The corpus enables comparative research on morphosyntactic phenomena in conversational data and in other types of language use. The project also promotes the availability and accessibility of language corpora.
The project is funded by the Kone Foundation (2013–2019) as part of its language programme. It has received additional funding from the Turku University Foundation and the FIN-CLARIN consortium.
Details about the resource
- Language: Finnish
- Form: audio, transcriptions
- Genre: everyday conversation
- Dataset size: 29 hours of audio, 26 texts, 44,606 sentences, 13,473 words, 278,909 word tokens
- Annotation: lemmatization, part of speech, morphology, syntax
| Marja-Liisa Helasvuo | principal investigator |
| Mikael Varjo | |
| Kukka-Maaria Wessman | |
| Klaus Kurki | |
| Ilari Sairanen | |
Contact person
| Marja-Liisa Helasvuo | mlhelas *at* utu.fi |
Usage license
Other notices
Permanent address of dataset
Reference instructions
University of Turku, School of Languages and Translation Studies (2017). ArkiSyn Database of Finnish Conversational Discourse, Helsinki Korp Version [data set]. Kielipankki. http://urn.fi/urn:nbn:fi:lb-2017022801