Doctoral Researcher, Digital Language Studies, Chinese, French, German, Italian, Spanish
Master of Arts (Filosofian maisteri)
Digital Language Studies
Areas of expertise
corpus linguistics
computational linguistics
Late Modern English
register studies
I am a doctoral researcher in Digital Language Studies. My research applies computational methods to historical language data.
I use automatic text classification methods to model noisy historical data. More specifically, I focus on predicting and modelling registers (text varieties) in large Late Modern English datasets with machine learning methods.
Other research projects I've participated in:
- Project researcher, Prosovar project (Digilang)
- Project researcher, Universal Parsebanks project (Digilang)
- Project assistant, Structuring Language Use Across Multilingual Web Corpora
Towards Automatic Register Classification in Unrestricted Databases of Historical English (2024)
(A3 Peer-reviewed article in an edited volume)
Intersecting Register and Genre: Understanding the Contents of Web-Crawled Corpora (2024)
4th International Conference on Natural Language Processing for Digital Humanities
(A4 Peer-reviewed article in conference proceedings)
Filosofian tohtoreiden urapolkupuheenvuorot: moniosaaminen voimavarana (2024)
Hiiskuttua: Turun yliopiston humanistisen tiedekunnan verkkolehti
(D1 Article in a trade journal or post in a professional blog)
From Discrete to Continuous Classes: A Situational Analysis of Multilingual Web Registers with LLM Annotations (2024)
(A4 Peer-reviewed article in conference proceedings)
In search of founding era registers: automatic modeling of registers from the corpus of Founding Era American English (2023)
Digital Scholarship in the Humanities
(A1 Peer-reviewed original article in a scientific journal)
Explainable Publication Year Prediction of Eighteenth Century Texts with the BERT Model (2022)
Workshop on Computational Approaches to Historical Language Change
(A4 Peer-reviewed article in conference proceedings)
Beyond the English web: Zero-shot cross-lingual and lightweight monolingual classification of registers (2021)
European Chapter of the Association for Computational Linguistics
(A4 Peer-reviewed article in conference proceedings)
From Web Crawl to Clean Register-Annotated Corpora (2020)
Web as Corpus Workshop
(A4 Peer-reviewed article in conference proceedings)