SweCORE
Keywords: Internet language
This corpus is a sample of the searchable Swedish-language Internet. The texts have been manually annotated for register. The annotation follows the taxonomy presented by Douglas Biber and Jesse Egbert (see Biber, D., & Egbert, J. (2018). Register Variation Online. Cambridge University Press.), which consists of 8 main registers and 33 subregisters intended to cover the full range of linguistic variation on the Internet.
The annotated texts have been split into the files train.tsv, dev.tsv and test.tsv in the folder data/SweCORE. In the TSV files, each row contains the register assigned to the text in the first column and the text itself in the second column. In total, the corpus contains 2,182 annotated texts.
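The file layout described above can be read with a few lines of Python. This is a minimal sketch, not part of the resource: it assumes the files are UTF-8 encoded, have no header row, hold one text per line, and separate the register from the text at the first tab character. The names `parse_rows`, `load_split` and `register_counts` are hypothetical helpers. How a text with two registers is encoded in the first column is not specified here, so the label is kept as a raw string.

```python
from collections import Counter
from pathlib import Path
from typing import Iterable, List, Tuple

# Folder holding the splits, as described above (relative to the repository root).
DATA_DIR = Path("data/SweCORE")


def parse_rows(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Turn TSV lines into (register, text) pairs.

    Assumption: one text per line, no header row, and the register label
    is everything before the first tab (later tabs stay in the text).
    """
    rows = []
    for line in lines:
        line = line.rstrip("\r\n")  # drop the line ending only
        if not line:
            continue  # skip blank lines
        register, text = line.split("\t", 1)
        rows.append((register, text))
    return rows


def load_split(name: str, data_dir: Path = DATA_DIR) -> List[Tuple[str, str]]:
    """Read one split ("train", "dev" or "test"), assuming UTF-8 encoding."""
    with open(data_dir / f"{name}.tsv", encoding="utf-8") as f:
        return parse_rows(f)


def register_counts(rows: List[Tuple[str, str]]) -> Counter:
    """Count how often each raw register label occurs."""
    return Counter(register for register, _ in rows)
```

Run from the directory containing `data/SweCORE`, the sizes of `load_split("train")`, `load_split("dev")` and `load_split("test")` should add up to the 2,182 texts stated above if these assumptions about the file format hold.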
Details about the resource
- Language: Swedish
- Form: written language
- Genre: Internet language
- Dataset size: 2,182 texts
- Annotation: register (each text has been manually annotated with 1–2 registers)
Contributors
| Veronika Laippala | |
| Jesse Egbert | |
| Douglas Biber | |
| Sampo Pyysalo | |
| Saara Hellström | |
| Anna Salmela | |
| Liina Repo | |
| Samuel Rönnqvist | |
| Miika Oinonen | |
Contact person
| Veronika Laippala | mavela *at* utu.fi |