The Corpus of Academic Finnish (LAS1)
Keywords: academic language
A text corpus is a large, systematically compiled collection of texts that contains examples of natural language use. A digital corpus allows linguists to use computers to search for vocabulary, grammatical structures and the contexts in which language is used.
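To make the idea of a corpus search concrete, here is a minimal keyword-in-context (KWIC) concordance sketch in Python. It is not the LAS1 search interface. The function name `kwic`, the naive `\w+` tokeniser and the invented sample sentences are illustrative assumptions only.

```python
import re
from typing import List, Tuple


def kwic(text: str, keyword: str, window: int = 3) -> List[Tuple[str, str, str]]:
    """Return (left context, match, right context) for every whole-token,
    case-insensitive occurrence of `keyword` in `text`."""
    # Naive tokenisation (an assumption): runs of word characters.
    # Python's \w is Unicode-aware, so letters such as ä and ö are kept.
    tokens = re.findall(r"\w+", text)
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits


# Invented sample text, not taken from the corpus.
sample = ("Tutkimuksen aineisto koostuu tutkielmista. "
          "Aineisto on laaja, ja aineisto on julkinen.")

for left, match, right in kwic(sample, "aineisto", window=2):
    print(f"{left:>25} [{match}] {right}")
```

Exact token matching finds only the form `aineisto`. Inflected forms such as `aineistoa` or `aineiston` are missed, which is one reason why lemmatisation is valuable for a highly inflected language like Finnish.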
The University of Turku has a long tradition of producing text corpora, including corpora of dialects, standard Finnish and Finnish as a second language (LAS2). The Corpus of Academic Finnish, a subproject of the larger Digilang project, aims to create an additional digital corpus composed of two subcorpora: LAS1, which consists of Master's theses written by native speakers of Finnish, and a second subcorpus of research papers written in Finnish. Together, these subcorpora are intended to offer a large collection of academic Finnish texts representing all fields of academic research, for use in teaching and research.
For further information about the project, please contact Professor Ilmari Ivaska.
Details about the resource
- Language: Finnish
- Form: written language
- Genre: theses
- Dataset size: 22,365 sentences, 317,282 words, 404,933 word tokens
- Annotation: lemmatisation, morphology, syntax
The resource is manually annotated using the Syntax Archive's guidelines.
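The distribution format of LAS1 is not described here. Assuming, hypothetically, a CoNLL-U-style export (one token per line, with tab-separated columns for word form, lemma, part of speech, morphological features and dependency relation), the three annotation layers could be queried as in the following sketch. The column layout, the helper names `parse_conllu` and `feats`, and the sample sentence with its analysis are all invented for illustration.

```python
from collections import Counter
from typing import Dict, List

# ASSUMPTION: a CoNLL-U-style, tab-separated layout with these ten columns.
# The actual LAS1 export format may differ.
FIELDS = ["id", "form", "lemma", "upos", "xpos", "feats", "head", "deprel", "deps", "misc"]


def parse_conllu(text: str) -> List[Dict[str, str]]:
    """Return one dict per token line, skipping comments and blank lines."""
    tokens = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        tokens.append(dict(zip(FIELDS, line.split("\t"))))
    return tokens


def feats(token: Dict[str, str]) -> Dict[str, str]:
    """Split a FEATS value such as 'Case=Gen|Number=Sing' into a dict."""
    if token["feats"] == "_":
        return {}
    return dict(pair.split("=", 1) for pair in token["feats"].split("|"))


# Invented sample sentence with an illustrative (not authoritative) analysis.
ROWS = [
    ["1", "Aineistoa", "aineisto", "NOUN", "_", "Case=Par|Number=Sing", "2", "obj", "_", "_"],
    ["2", "analysoitiin", "analysoida", "VERB", "_", "Mood=Ind|Tense=Past|Voice=Pass", "0", "root", "_", "_"],
    ["3", "aineiston", "aineisto", "NOUN", "_", "Case=Gen|Number=Sing", "2", "obl", "_", "_"],
    ["4", "avulla", "avulla", "ADP", "_", "AdpType=Post", "3", "case", "_", "_"],
    ["5", ".", ".", "PUNCT", "_", "_", "2", "punct", "_", "_"],
]
SAMPLE = ("# text = Aineistoa analysoitiin aineiston avulla.\n"
          + "\n".join("\t".join(row) for row in ROWS) + "\n")

tokens = parse_conllu(SAMPLE)

# Lemma query: finds both inflected forms of 'aineisto' ('material, data').
forms = [t["form"] for t in tokens if t["lemma"] == "aineisto"]
print(forms)  # ['Aineistoa', 'aineiston']

# Morphology query: grammatical case of every noun.
cases = Counter(feats(t).get("Case") for t in tokens if t["upos"] == "NOUN")
print(dict(cases))  # {'Par': 1, 'Gen': 1}
```

Unlike the plain string match, the lemma column groups the partitive `Aineistoa` and the genitive `aineiston` under one headword, while the FEATS and DEPREL columns make morphological and syntactic queries possible.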
Project team
| Elisa Reunanen | project researcher |
| Markku Nikulin | project researcher |
| Kirsti Siitonen | founder and member of the steering committee |
Contact persons
| Elisa Reunanen | etreun *at* utu.fi |
| Markku Nikulin | marnik *at* utu.fi |
| Nobufumi Inaba | ninaba *at* utu.fi |
Reference instructions
LAS1 = The Corpus of Academic Finnish. University of Turku, School of Languages and Translation Studies, Department of Finnish and Finno-Ugric Languages