UTU-Digilang

UTU-Digilang is an infrastructure that brings together digital language resources developed at the University of Turku. UTU-Digilang makes these resources accessible to researchers, students and everyone interested in languages. The resources have been developed at the School of Languages and Translation Studies as well as the Department of Computing.

The University of Turku has a long history of developing digital language resources. These include text corpora and voice recordings covering both historical and more recent language. UTU-Digilang connects them with resources built on newer technologies, such as those detailing writing processes and text corpora that describe Internet language. Bringing together resources developed in different departments showcases the variety of linguistic research at the University of Turku and offers new viewpoints for research.

UTU-Digilang resources

The table below presents an overview of UTU-Digilang resources. More information can be found on the pages linked from the name of each resource. The pages will open in new tabs.

If you want your resource added to UTU-Digilang, please fill in the Webropol form.

Name | Abbreviation | Language | Keywords | Content
A database of pronouns and animal characters in original and translated Finnish picture books for children |  | Finnish | literary language, fiction | written language, numeric data
ArkiSyn Database of Finnish Conversational Discourse | Arkisyn | Finnish | spoken language, everyday conversation | audio, transcriptions
Diachronic Corpus of Literary Meadow Mari |  | Meadow Mari | literary language, journalistic language | written language
Diachronic Corpus of Literary Mordvin |  | Erzya, Moksha | literary language, journalistic language | written language
Electronic Word Lists: Mari, Mordvin, Udmurt, Komi, Chuvash, Tatar |  | Mari, Mordvin, Udmurt, Komi, Chuvash, Tatar | word lists | word lists
FinCORE | FinCORE | Finnish | Internet language | written language
'Finland - Past and Present' Corpus (parallel texts) |  | Finnish, Russian, Erzya, Moksha, Meadow Mari, Udmurt, Komi | parallel texts | written language
Finnish Dialect Syntax Archive |  | Finnish | dialects | audio, transcriptions
Finnish Internet Parsebank |  | Finnish | Internet language | written language
Finnish referative constructions and subordinate että clauses in the Bible translations 1548–2020, coded verses |  | Finnish | literary language, Old Literary Finnish | written language
FreCORE | FreCORE | French | Internet language | written language
Linguistic Variation in the Province of Satakunta in the 21st Century | Sapu | Finnish | dialects, spoken language | audio, transcriptions
LOG: Post-editing Finnish |  | Finnish | eye tracking, keystroke logging, machine translation, post-editing, TransLog | logs
LOG: Writing English |  | English | keystroke logging, ScriptLog | written language, logs
LOG: Writing Finnish |  | Finnish | keystroke logging, ScriptLog | written language, logs
LOG: Writing French |  | French | keystroke logging, ScriptLog | logs
LOG: Writing German |  | German | keystroke logging, ScriptLog | written language, logs
LOG: Writing Swedish |  | Swedish | keystroke logging, ScriptLog | written language, logs
MarKo Corpus (Mari texts) | MarKo | Meadow Mari, Hill Mari | literary language, journalistic language, academic language | written language
MokshEr Corpus | MokshEr | Erzya, Moksha | literary language, journalistic language | written language
Mormula: Grammatically Annotated Mordvin Texts | Mormula | Erzya, Moksha | literary language, dialects | written language
Multilingual writers’ writing processes: graph-theory based visualisation of formulaic sequences and fluency patterns | KISUVI | Finnish, English, French, German, Swedish, a few texts in Spanish, Estonian and Japanese | keystroke logging, learner language, university level, subsequent interview | written language, video, transcriptions
Namibian teachers' beliefs and practices |  | English |  | audio, transcriptions
New version of the digitized Dialect Atlas of Finnish by Lauri Kettunen | Dialect Atlas of Finnish | Finnish | dialects, spoken language, speaker areas | geographic information, information about language variants by parish, polygons and coordinates of speaker areas
Register Classified OSCAR | OSCAR | Arabic, English, Spanish, French, Hindi, Portuguese, Swahili, Urdu, Chinese | Internet language | written language
SweCORE | SweCORE | Swedish | Internet language | written language
The Advanced Finnish Learners' Corpus | LAS2 | Finnish | academic language, longitudinal corpus | written language
The Corpus of Academic Finnish | LAS1 | Finnish | academic language | written language
The Corpus of Prosodic Variation of Finnish | Prosovar | Finnish | dialects, spoken language, prosody, elicited recording tasks | audio
The Finnish Language Recording Archive of UTU: Collection of Dialect Recordings | TYSKÄ | Finnish | dialects | audio
The Finnish Language Recording Archive of UTU: Culture history recordings | TYSKÄ | Finnish |  | audio
The Finnish Language Recording Archive of UTU: Recordings of Turku Colloquial | TYSKÄ | Finnish | dialects, spoken language | audio, transcriptions
The Morpho-Syntactic Database of Mikael Agricola's Works |  | Finnish | Old Literary Finnish | written language
Turku Chuvash Corpus | TuChC | Chuvash | literary language, journalistic language, academic language | written language
Turku Izhevsk Corpus |  | Udmurt | literary language, journalistic language | written language
Turku Komi-Permyak Corpus | TuKPC | Komi-Permyak | literary language, journalistic language, academic language | written language
Turku Onchyko Corpus |  | Meadow Mari | literary language, journalistic language, academic language | written language
Turku 'Pavlik Morozov' Corpus |  | Russian, Finnish, Erzya, Moksha, Meadow Mari, Hill Mari, Udmurt, Komi-Permyak, Komi, Khanty, Mansi, Hungarian, Chuvash, Tatar | parallel texts, literary language | written language
Turku Tatar Corpus | TuTaC | Tatar | literary language, journalistic language | written language
Uralic Typological Atlas | UraTyp | Uralic languages | language typology, cross-linguistic comparison | typological question list, numerical data