FreCORE
Keywords: Internet language
This corpus is a sample of the French-language searchable Internet. The texts have been manually annotated by register. The annotation follows the taxonomy presented by Douglas Biber and Jesse Egbert (Biber, D., & Egbert, J. (2018). Register Variation Online. Cambridge University Press), which consists of 8 main registers and 33 subregisters that aim to cover all linguistic variation on the Internet.
The annotated texts have been split into the files train.tsv, dev.tsv and test.tsv in the folder data/FreCORE. In the TSV files, each row has the register label (or labels) assigned to the text in the first column and the text itself in the second column. In total, the corpus includes 1,818 annotated texts.
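The two-column layout described above can be read with a few lines of Python. This is a minimal sketch, not an official loader: the function names `parse_rows` and `load_split` are hypothetical, and it assumes that the files are UTF-8 encoded, have no header row, and separate multiple registers in the first column with whitespace. None of these details are stated in this description, so check them against the actual files.

```python
from pathlib import Path


def parse_rows(lines):
    """Yield (registers, text) pairs from tab-separated lines.

    Assumption: when a text has two registers, they are separated by
    whitespace within the first column.
    """
    for line in lines:
        line = line.rstrip("\r\n")
        if "\t" not in line:
            continue  # skip blank or malformed lines
        # Split on the first tab only, so any tabs inside the text are kept.
        labels, text = line.split("\t", 1)
        yield labels.split(), text


def load_split(name, data_dir="data/FreCORE"):
    """Load one split ("train", "dev" or "test") as a list of pairs."""
    path = Path(data_dir) / f"{name}.tsv"
    # Assumption: the files are UTF-8 encoded.
    with path.open(encoding="utf-8") as f:
        return list(parse_rows(f))
```

With the files in place, `sum(len(load_split(s)) for s in ("train", "dev", "test"))` should match the 1,818 texts reported here if the assumptions hold.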
Details about the resource
- Language: French
- Form: written language
- Genre: Internet language
- Dataset size: 1,818 texts
register
Each text has been manually annotated with 1–2 registers.
| Veronika Laippala | |
| Jesse Egbert | |
| Douglas Biber | |
| Sampo Pyysalo | |
| Saara Hellström | |
| Anna Salmela | |
| Liina Repo | |
| Samuel Rönnqvist | |
| Miika Oinonen | |
Contact person
| Veronika Laippala | mavela *at* utu.fi |