Royal Society Corpus Universal Dependency Parsed

Universität des Saarlandes SFB 1102 CLARIN B Centre

Description

This is a Universal Dependency parsed version of the Royal Society Corpus (RSC) 6.0 Open.

In preparing the corpus, "good sentences" were extracted from RSC V6.0 Open by excluding (a) sentences beginning with a lowercase word, together with the sentence preceding them (incomplete); (b) sentences with fewer than 8 tokens (too short); (c) sentences lacking a verb (verbless); and (d) sentences in a language other than English.
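The four exclusion criteria can be sketched as a filter over parsed sentences. This is a minimal illustration under stated assumptions, not the procedure actually used to build the corpus: the `Sentence` class and its fields are hypothetical, "lacking a verb" is approximated as "no token tagged with the UPOS tag `VERB`", and the language label is assumed to come from some separate language identifier.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical sentence representation (assumption): the real data model
# used in corpus preparation is not described on this page.
@dataclass
class Sentence:
    tokens: List[str]  # word forms
    upos: List[str]    # Universal POS tags, one per token
    lang: str          # language code from an (unspecified) language identifier

MIN_TOKENS = 8  # criterion (b): fewer than 8 tokens counts as too short


def good_sentences(sentences: List[Sentence]) -> List[Sentence]:
    """Return the sentences that survive exclusion criteria (a) to (d)."""
    excluded = set()
    for i, s in enumerate(sentences):
        # (a) incomplete: starts with a lowercase word, so drop it
        # together with the sentence preceding it
        if s.tokens and s.tokens[0][:1].islower():
            excluded.add(i)
            if i > 0:
                excluded.add(i - 1)
        # (b) too short
        if len(s.tokens) < MIN_TOKENS:
            excluded.add(i)
        # (c) verbless (assumption: only UPOS "VERB" counts, not "AUX")
        if "VERB" not in s.upos:
            excluded.add(i)
        # (d) not English
        if s.lang != "en":
            excluded.add(i)
    return [s for i, s in enumerate(sentences) if i not in excluded]
```

Criterion (a) is applied to pairs of sentences, which is why the sketch collects indices first and filters afterwards rather than deciding sentence by sentence.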

The downloadable corpus has the following annotations

Citation

Persistent identifier: http://hdl.handle.net/21.11119/0000-000A-A556-B

Krielke, Marie-Pauline, Luigi Talamo, Jörg Knappen, and Mahmoud Fawzi (2022). Tracing Syntactic Change in the Scientific Genre: Two Universal Dependency-parsed Diachronic Corpora of Scientific English and German. In Proceedings of LREC 2022. http://www.lrec-conf.org/proceedings/lrec2022/pdf/2022.lrec-1.514.pdf

Licence

The Royal Society Corpus UD Parsed is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Download

The fully annotated corpus is available as a zip file containing the corpus as a single VRT file (236 MB). Checksums: MD5 eb99c0e87cd81d3cd6512060916d6489, SHA-1 3012437b13a03435c1b4b35620a6ab5561488dce.
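After downloading, the archive can be checked against the published checksums. The following is a minimal sketch using Python's standard `hashlib` module; the function names are illustrative, and the archive's file name is not given on this page, so the path is left to the caller.

```python
import hashlib

# Checksums published on this page for the zip archive.
EXPECTED_MD5 = "eb99c0e87cd81d3cd6512060916d6489"
EXPECTED_SHA1 = "3012437b13a03435c1b4b35620a6ab5561488dce"


def file_digests(path, chunk_size=1 << 20):
    """Compute MD5 and SHA-1 hex digests of a file, reading it in 1 MiB chunks."""
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    with open(path, "rb") as f:
        # Read until f.read() returns b"" (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()


def verify(path, expected_md5=EXPECTED_MD5, expected_sha1=EXPECTED_SHA1):
    """Return True only if both digests match the expected values."""
    md5_hex, sha1_hex = file_digests(path)
    return md5_hex == expected_md5 and sha1_hex == expected_sha1
```

Calling `verify("path/to/downloaded.zip")` (a placeholder path) returns `True` for an intact download. Reading in chunks keeps memory use constant regardless of the archive's size.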

Tools

Links to the corpus building tools
