This is a Universal Dependency-parsed version of the Royal Society Corpus (RSC) 6.0 Open.
In preparing the corpus, "good sentences" were extracted from RSC V6.0 Open. The following sentences were excluded: (a) sentences beginning with a lower-case word, together with the sentence preceding them (incomplete); (b) sentences with fewer than 8 tokens (too short); (c) sentences lacking a verb (verbless); and (d) sentences in a language other than English.
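The four exclusion criteria can be sketched in Python as follows. This is only an illustration, not the pipeline used to build the corpus: the `Sentence` type, the `filter_good` function, the choice of UPOS tags that count as a verb (`VERB` and `AUX`), and the precomputed `lang` field are all assumptions made for the example.

```python
from typing import List, NamedTuple, Tuple


class Sentence(NamedTuple):
    """Hypothetical container: (word form, UPOS tag) pairs plus a language code."""
    tokens: List[Tuple[str, str]]
    lang: str


MIN_TOKENS = 8               # criterion (b): fewer than 8 tokens is too short
VERB_TAGS = {"VERB", "AUX"}  # assumption: which UPOS tags count as "a verb"


def starts_lowercase(sentence: Sentence) -> bool:
    """True if the first token begins with a lower-case letter."""
    return bool(sentence.tokens) and sentence.tokens[0][0][:1].islower()


def filter_good(sentences: List[Sentence]) -> List[Sentence]:
    """Keep only sentences that pass criteria (a) to (d)."""
    good = []
    for i, sent in enumerate(sentences):
        following = sentences[i + 1] if i + 1 < len(sentences) else None
        # (a) a lower-case start marks a bad split; drop that sentence
        #     and the one immediately before it
        incomplete = starts_lowercase(sent) or (
            following is not None and starts_lowercase(following)
        )
        too_short = len(sent.tokens) < MIN_TOKENS                        # (b)
        verbless = not any(tag in VERB_TAGS for _, tag in sent.tokens)   # (c)
        not_english = sent.lang != "en"                                  # (d)
        if not (incomplete or too_short or verbless or not_english):
            good.append(sent)
    return good
```

The language code is taken as given here; how the language of each sentence was actually identified is not stated above.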
The downloadable corpus has the following annotations
Persistent identifier: http://hdl.handle.net/21.11119/0000-000A-A556-B
Krielke, Marie-Pauline, Luigi Talamo, Jörg Knappen, and Mahmoud Fawzi (2022). Tracing Syntactic Change in the Scientific Genre: Two Universal Dependency-parsed Diachronic Corpora of Scientific English and German. LREC 2022. http://www.lrec-conf.org/proceedings/lrec2022/pdf/2022.lrec-1.514.pdf
The Royal Society Corpus UD Parsed is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
The fully annotated corpus is available as a zip file containing the corpus as a single vrt file (236MB). Checksums: MD5 eb99c0e87cd81d3cd6512060916d6489, SHA1 3012437b13a03435c1b4b35620a6ab5561488dce.
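After downloading, the zip file can be checked against the checksums given above. The following is a minimal sketch using Python's standard `hashlib` module; the function names `file_digest` and `verify_download` are made up for this example, and the local file name must be supplied by the user.

```python
import hashlib

# Checksums as published for the zip file
EXPECTED_MD5 = "eb99c0e87cd81d3cd6512060916d6489"
EXPECTED_SHA1 = "3012437b13a03435c1b4b35620a6ab5561488dce"


def file_digest(path: str, algorithm: str) -> str:
    """Return the hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str) -> bool:
    """True if both the MD5 and the SHA1 digest match the published values."""
    return (
        file_digest(path, "md5") == EXPECTED_MD5
        and file_digest(path, "sha1") == EXPECTED_SHA1
    )
```

Calling `verify_download` with the path of the downloaded zip file returns `True` only if both digests match; a mismatch usually indicates an incomplete or corrupted download.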
Links to the corpus building tools