ArkiSyn Database of Finnish Conversational Discourse

Keywords: spoken language, everyday conversation

The project aims to produce a morphosyntactically annotated corpus of everyday Finnish-language conversations, to facilitate grammatical research based on a large body of everyday interaction. The corpus enables comparative research on morphosyntactic phenomena in conversational data and in other types of language use. The project also promotes the availability and accessibility of language corpora.

The project is funded by the Kone Foundation (2013–2019) as part of its language programme. It has received additional funding from the Turku University Foundation and the FIN-CLARIN consortium.

Details about the resource

Content
  • Language: Finnish
  • Form: audio, transcriptions
  • Genre: everyday conversation
  • Dataset size: 29 hours of audio, 26 texts, 44,606 sentences, 13,473 words, 278,909 word tokens
Annotations
  • lemmatization
  • part of speech
  • morphology
  • syntax
Authors
Marja-Liisa Helasvuo, principal investigator
Mikael Varjo 
Kukka-Maaria Wessman 
Klaus Kurki 
Ilari Sairanen 
Availability

Contact person

Marja-Liisa Helasvuo, mlhelas *at* utu.fi

Usage license

CC BY-ND

Other notices

contains personal data
Referring

Permanent address of dataset

Reference instructions

University of Turku, School of Languages and Translation Studies (2017). ArkiSyn Database of Finnish Conversational Discourse, Helsinki Korp Version [data set]. Kielipankki. http://urn.fi/urn:nbn:fi:lb-2017022801