Sampo Pyysalo
University Research Fellow, Data analytics

Areas of expertise

natural language processing
machine learning
scientific text mining

Biography

I am a researcher in the TurkuNLP group (https://turkunlp.org/) and Research Fellow at the Department of Computing, University of Turku. My work focuses on machine learning for natural language processing, with application domains including scientific text mining, Finnish language technology, and large language models.

After defending my PhD thesis in computer science at the University of Turku, I held researcher positions at the University of Tokyo, University of Manchester and University of Cambridge before returning to the University of Turku in 2019.

Teaching

My current teaching focuses on the natural language processing study module shared between the departments of Languages and Computing, with courses ranging from an introductory course to a course on deep learning for natural language processing.

Research

The primary focus of my research is natural language processing using machine learning approaches, with recent emphasis on deep learning methods and large language models. I have been working on scientific text mining as an application area for nearly 20 years, with a specific focus on the English biomedical literature, and have in recent years also addressed a variety of tasks in the processing of Finnish text as well as multi- and cross-lingual applications. My work covers the full range of natural language processing development, from initial task design to building practical applications and organizing community challenges. It also includes running manual annotation efforts and developing annotation tools and machine learning methods for various natural language processing tasks.

Publications

FinGPT: Large Generative Models for a Small Language (2023)

Conference on Empirical Methods in Natural Language Processing
Luukkonen Risto, Komulainen Ville, Luoma Jouni, Eskelinen Anni, Kanerva Jenna, Kupari Hanna-Mari, Ginter Filip, Laippala Veronika, Muennighoff Niklas, Piktus Aleksandra, Wang Thomas, Tazi Nouamane, Le Scao Teven, Wolf Thomas, Suominen Osma, Sairanen Samuli, Merioksa Mikko, Heinonen Jyrki, Vahtola Aija, Antao Samuel, Pyysalo Sampo
(Peer-reviewed article in conference proceedings (A4))

Scaling Data-Constrained Language Models (2023)

Conference on Neural Information Processing Systems, Advances in Neural Information Processing Systems
Muennighoff Niklas, Rush Alexander M., Barak Boaz, Le Scao Teven, Piktus Aleksandra, Tazi Nouamane, Pyysalo Sampo, Wolf Thomas, Raffel Colin A.
(Peer-reviewed article in conference proceedings (A4))

Towards better structured and less noisy Web data: Oscar with Register annotations (2022)

International Conference on Computational Linguistics
Laippala Veronika, Salmela Anna, Rönnqvist Samuel, Aji Alham Fikri, Chang Li-Hsin, Dhifallah Asma, Goulart Larissa, Kortelainen Henna, Pàmies Marc, Prina Dutra Deise, Skantsi Valtteri, Sutawika Lingtang, Pyysalo Sampo
(Peer-reviewed article in conference proceedings (A4))