Hanna-Mari Kupari
Doctoral Researcher, Digital Language Studies, Chinese, French, German, Italian, Spanish
Master of Arts (filosofian maisteri)
Medieval Latin with corpus linguistics methods


Arcanuminkuja 1

Areas of expertise

Medieval Latin
corpus linguistics
automatic morpho-syntactic parsing


I am a doctoral researcher in digital language studies at the University of Turku, funded by the Emil Aaltonen Foundation. In my work I combine medieval data with the state of the art in machine learning. I have a Master's degree in Classical Philology with a major in Latin. I am particularly interested in the study of grammar, quantitative methods and aspects of local history.

I am interested in science communication and have worked as an associate editor of the online journal Hiiskuttua.

For some years now I have been an active member of the Tohtoriverkosto society.


University of Tartu, Estonia

A practical workshop on automatic morpho-syntactic annotation of large language corpora using the Universal Dependencies framework, spring 2024. Five sessions on automatic parsing for PhD students and staff. Topics covered: theory, terminology, parsing tools, and building your own treebank in practice.

Course github:



Digital resources course at the University of Tartu. Treebanks and automatic linguistic annotation for Classical Languages, spring 2024. One lecture for undergraduate students.

University of Turku, Finland

Digital Interaction Lecture Course, spring 2024. Using computer-assisted methods for parsing grammar. One lecture.

Corpus Linguistics and Language Technology for undergraduates, fall 2023. Five lectures. Topics covered: student project, ethics and large language models, named-entity recognition, sentiment analysis, automatic morpho-syntactic parsing, representing language as vectors, and supervised and unsupervised machine learning.

Linguistic landscapes course for undergraduates, spring 2023. One lecture on 2023-03-15 with Professor Marko Lamberg: "Historiallisten kirjallisten lähteiden näkökulmia kielimaisemiin Turussa" (Perspectives from historical written sources on linguistic landscapes in Turku).


Modern methods for medieval texts

In my digital humanities doctoral dissertation I study medieval Apostolic Penitentiary documents and the Registrum Ecclesiae Aboensis copybook with corpus linguistics methods. I explore language use and linguistic variation (i.e. register analysis) in Medieval Latin using metadata-enriched and morpho-syntactically annotated corpora. My work promotes open-access research, and I publish all my code, data and results along with my publications.

Member of TurkuNLP and TUCEMEMS research groups.


My work has been made possible by the Emil Aaltonen Foundation (2022 to 2024), a Turku University Foundation travel grant (2023), University of Turku research grants (2021 and 2022), a Finnish Cultural Foundation Varsinais-Suomi Regional Fund grant (2021) and Uskelan opintorahastosäätiö (2020). I have also received Turku University Foundation Villa Tammekann grants (Tartu, Estonia) in 2023 and 2024. I spent January 2024 at the Finnish Institute in Rome, working on my PhD and visiting the Penitentiary archive and libraries.



FinGPT: Large Generative Models for a Small Language (2023)

Conference on Empirical Methods in Natural Language Processing
Luukkonen Risto, Komulainen Ville, Luoma Jouni, Eskelinen Anni, Kanerva Jenna, Kupari Hanna-Mari, Ginter Filip, Laippala Veronika, Muennighoff Niklas, Piktus Aleksandra, Wang Thomas, Tazi Nouamane, Scao Le Teven, Wolf Thomas, Suominen Osma, Sairanen Samuli, Merioksa Mikko, Heinonen Jyrki, Vahtola Aija, Antao Samuel, Pyysalo Sampo
(Peer-reviewed article in conference proceedings (A4))