Finnish Internet Parsebank

Keywords: Internet language

A sample of the Finnish-language Internet with automatically produced syntactic annotation, containing approximately 3.7 billion tokens.

Search tool for morpho-syntactic features: depsearch-depsearch.rahtiapp.fi/ds_demo/

Search tool for words and their context: http://bionlp-www.utu.fi/nse/

More information about the corpus: https://turkunlp.org/finnish_nlp.html#parsebank

Details about the resource

Content
  • Language: Finnish
  • Form: written language
  • Genre: Internet language
  • Dataset size: approximately 3.7 billion tokens
Annotations
  • syntax
  • morphology
Authors
  • Juhani Luotolahti
  • Jenna Kanerva
  • Veronika Laippala
  • Sampo Pyysalo
  • Filip Ginter
Availability

Contact person

Veronika Laippala, mavela *at* utu.fi
Referring

Reference instructions

J. Luotolahti; J. Kanerva; V. Laippala; S. Pyysalo; F. Ginter. Towards Universal Web Parsebanks. Proceedings of the International Conference on Dependency Linguistics (Depling’15). 2015