Syntax Analysis of a Billion Words for Research Purposes

13.05.2014

Veronika Laippala and Filip Ginter from the University of Turku gave a keynote presentation titled "Internet into a Language Data – A Syntax Analysis of a Billion Words and then What?" at the Finnish Conference of Linguistics.

Researcher Jorma Laaksonen presented the CoBaSiL project at the Finnish Conference of Linguistics.

Laippala and Ginter's research aims to collect the entire Finnish-language internet into a corpus, that is, a collection of texts, which will be freely available to researchers.

The corpus can be used, for example, to study the syntax of the language or the similarity between words.

"So far, we have analysed 115 million sentences, 1.5 billion words and 6 million web pages, and we aim to finish the project in 2016," Ginter said.

New Means for Annotating Finnish Sign Language

The topics of Friday's workshops included brand-new research on Finnish Sign Language.

Jorma Laaksonen from Aalto University discussed the Content-based video analysis and annotation of Finnish Sign Language (CoBaSiL) project, which can be used in linguistic research on sign language.

The study has analysed, for instance, the signer’s mouth position, eye openness and head movements.

Academic Research Fellow Tommi Jantunen from the University of Jyväskylä has studied the duration of signs, that is, when they begin and end, from the perspective of sign language annotation.

Traditionally, the duration of a sign has been considered to match the duration of the hand movement, but the duration of other structural features of sign language is increasingly being taken into account.

According to Jantunen, certain structural features, such as mouth position, handshape and orientation, are not only in place before the hand movement begins but also remain after it has ended.

The theme of the Finnish Conference of Linguistics, held at the University of Turku on 8–10 May, was "Language and Linguistics in a technological world".

The conference, the 41st of its kind, offered diverse perspectives on the latest linguistic research.

More information: http://www.utu.fi/fi/sivustot/ktp2014/Sivut/home.aspx
The web pages of Laippala and Ginter’s project: https://github.com/TurkuNLP/Finnish-dep-parser

Text: Aura Jaakkola
