UTU-Digilang
Digital language resources and language technology tools
UTU-Digilang combines the University of Turku’s long tradition of digital language resource creation and upkeep with new technology and top-class language technology resources.
UTU-Digilang is an infrastructure that brings together digital language resources developed at the University of Turku. UTU-Digilang makes these resources accessible to researchers, students and everyone interested in languages. The resources have been developed at the School of Languages and Translation Studies as well as the Department of Computing.
The University of Turku has a long history of developing digital language resources. These include text corpora and voice recordings covering both historical and contemporary language. UTU-Digilang connects them with newer kinds of resources, such as data documenting writing processes and text corpora of Internet language. Bringing together resources developed in different departments showcases the variety of linguistic research at the University of Turku and offers new viewpoints for research.
UTU-Digilang resources
The table below presents an overview of UTU-Digilang resources. More information can be found on the pages linked to the names of each resource. The pages will open in new tabs.
If you would like your resource to be added to UTU-Digilang, please fill in the Webropol form.
| Name | Abbreviation | Language | Keywords | Content |
|---|---|---|---|---|
| A database of pronouns and animal characters in original and translated Finnish picture books for children | | Finnish | literary language, fiction | written language, numeric data |
| ArkiSyn Database of Finnish Conversational Discourse | Arkisyn | Finnish | spoken language, everyday conversation | audio, transcriptions |
| Diachronic Corpus of Literary Meadow Mari | | Meadow Mari | literary language, journalistic language | written language |
| Diachronic Corpus of Literary Mordvin | | Erzya, Moksha | literary language, journalistic language | written language |
| Electronic Word Lists: Mari, Mordvin, Udmurt, Komi, Chuvash, Tatar | | Mari, Mordvin, Udmurt, Komi, Chuvash, Tatar | word lists | word lists |
| FinCORE | FinCORE | Finnish | Internet language | written language |
| 'Finland - Past and Present' Corpus (parallel texts) | | Finnish, Russian, Erzya, Moksha, Meadow Mari, Udmurt, Komi | parallel texts | written language |
| Finnish Dialect Syntax Archive | | Finnish | dialects | audio, transcriptions |
| Finnish Internet Parsebank | | Finnish | Internet language | written language |
| Finnish referative constructions and subordinate että clauses in the Bible translations 1548–2020, coded verses | | Finnish | literary language, Old Literary Finnish | written language |
| FreCORE | FreCORE | French | Internet language | written language |
| Linguistic Variation in the Province of Satakunta in the 21st Century | Sapu | Finnish | dialects, spoken language | audio, transcriptions |
| LOG: Post-editing Finnish | | Finnish | eye tracking, keystroke logging, machine translation, post-editing, TransLog | logs |
| LOG: Writing English | | English | keystroke logging, ScriptLog | written language, logs |
| LOG: Writing Finnish | | Finnish | keystroke logging, ScriptLog | written language, logs |
| LOG: Writing French | | French | keystroke logging, ScriptLog | logs |
| LOG: Writing German | | German | keystroke logging, ScriptLog | written language, logs |
| LOG: Writing Swedish | | Swedish | keystroke logging, ScriptLog | written language, logs |
| MarKo Corpus (Mari texts) | MarKo | Meadow Mari, Hill Mari | literary language, journalistic language, academic language | written language |
| MokshEr Corpus | MokshEr | Erzya, Moksha | literary language, journalistic language | written language |
| Mormula: Grammatically Annotated Mordvin Texts | Mormula | Erzya, Moksha | literary language, dialects | written language |
| Multilingual writers’ writing processes: graph-theory based visualisation of formulaic sequences and fluency patterns | KISUVI | Finnish, English, French, German, Swedish, a few texts in Spanish, Estonian and Japanese | keystroke logging, learner language, university level, subsequent interview | written language, video, transcriptions |
| Namibian teachers' beliefs and practices | | English | | audio, transcriptions |
| New version of the digitized Dialect Atlas of Finnish by Lauri Kettunen | Dialect Atlas of Finnish | Finnish | dialects, spoken language, speaker areas | geographic information, information about language variants by parish, polygons and coordinates of speaker areas |
| Register Classified OSCAR | OSCAR | Arabic, English, Spanish, French, Hindi, Portuguese, Swahili, Urdu, Chinese | Internet language | written language |
| SweCORE | SweCORE | Swedish | Internet language | written language |
| The Advanced Finnish Learners' Corpus | LAS2 | Finnish | academic language, longitudinal corpus | written language |
| The Corpus of Academic Finnish | LAS1 | Finnish | academic language | written language |
| The Corpus of Prosodic Variation of Finnish | Prosovar | Finnish | dialects, spoken language, prosody, elicited recording tasks | audio |
| The Finnish Language Recording Archive of UTU: Collection of Dialect Recordings | TYSKÄ | Finnish | dialects | audio |
| The Finnish Language Recording Archive of UTU: Culture history recordings | TYSKÄ | Finnish | | audio |
| The Finnish Language Recording Archive of UTU: Recordings of Turku Colloquial | TYSKÄ | Finnish | dialects, spoken language | audio, transcriptions |
| The Morpho-Syntactic Database of Mikael Agricola's Works | | Finnish | Old Literary Finnish | written language |
| Turku Chuvash Corpus | TuChC | Chuvash | literary language, journalistic language, academic language | written language |
| Turku Izhevsk Corpus | | Udmurt | literary language, journalistic language | written language |
| Turku Komi-Permyak Corpus | TuKPC | Komi-Permyak | literary language, journalistic language, academic language | written language |
| Turku Onchyko Corpus | | Meadow Mari | literary language, journalistic language, academic language | written language |
| Turku 'Pavlik Morozov' Corpus | | Russian, Finnish, Erzya, Moksha, Meadow Mari, Hill Mari, Udmurt, Komi-Permyak, Komi, Khanty, Mansi, Hungarian, Chuvash, Tatar | parallel texts, literary language | written language |
| Turku Tatar Corpus | TuTaC | Tatar | literary language, journalistic language | written language |
| Uralic Typological Atlas | UraTyp | Uralic languages | language typology, cross-linguistic comparison | typological question list, numerical data |