For corpus query and analysis, the RSC is encoded in CQP format (cf. IMS Open Corpus Workbench (CWB)).

## CQPweb

The corpus can be searched via the CQPweb server of the Department of Linguistics and Language Technology at Saarland University. Registration with your e-mail address is required but free.

Sample queries:

The CWB requires a simple XML format as input. In the so-called VRT format (vertical text format), annotations on the token level (positional attributes, e.g. word, pos, lemma) are represented one word per line, with TAB-delimited columns for each positional attribute. Annotations beyond the token level (structural attributes, e.g. texts, sentences, pages) are represented as SGML tags with optional attribute-value pairs. Metadata, for example, are encoded as attributes of the <text> element. More information on the annotation of the corpus can be found here
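To make the VRT layout described above more concrete, here is a minimal Python sketch that reads such a file into memory. It is an illustration, not part of the RSC tooling: the function `parse_vrt`, the class `Text`, the column order `word`, `pos`, `lemma`, and the sample attributes `id` and `year` are all assumptions made for the example, and the actual corpus may use more columns and different metadata attributes.

```python
import re
from dataclasses import dataclass, field

# Matches attribute-value pairs such as id="example_1" inside a tag.
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')


@dataclass
class Text:
    """One <text> region: its metadata attributes and its token rows."""
    meta: dict
    tokens: list = field(default_factory=list)


def parse_vrt(lines, columns=("word", "pos", "lemma")):
    """Parse VRT lines into a list of Text objects.

    `columns` names the TAB-delimited positional attributes in order;
    the default order is an assumption for this sketch.
    """
    texts = []
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        if line == "<text>" or line.startswith("<text "):
            # Metadata are the attributes of the <text> element.
            current = Text(meta=dict(ATTR_RE.findall(line)))
            texts.append(current)
        elif line == "</text>":
            current = None
        elif "\t" not in line and line.startswith("<") and line.endswith(">"):
            # Other structural attributes (sentences, pages, ...) are skipped here.
            continue
        elif current is not None:
            # A token line: one word per line, one column per positional attribute.
            current.tokens.append(dict(zip(columns, line.split("\t"))))
    return texts


# Hypothetical sample data, not taken from the corpus.
SAMPLE = (
    '<text id="example_1" year="1700">\n'
    "<s>\n"
    "The\tDT\tthe\n"
    "experiments\tNNS\texperiment\n"
    "</s>\n"
    "</text>\n"
)

texts = parse_vrt(SAMPLE.splitlines())
```

For real use, the corpus would normally be imported into the CWB or CQPweb rather than parsed by hand; a reader like this is mainly useful for quick inspection or for exporting tokens to other tools.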

You can download the corpus as a compressed VRT file (419 MB), which can be imported into the Corpus Workbench or CQPweb.

    $ md5sum Royal_Society_Corpus_v2.0.2_final.zip
    9dc54d20820a6507ac3a3957a24a5131  Royal_Society_Corpus_v2.0.2_final.zip

You can download the OCR correction tools as an archive of sed source files (15 kB).

    $ md5sum rsc-tool-2.0.tar.gz
    b410be474c8df77ac952eb0ee1246bbc  rsc-tool-2.0.tar.gz
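
On systems without the `md5sum` command, the same check can be sketched in Python with the standard `hashlib` module. The helper names `md5_of_file` and `verify` are made up for this example; the file names and checksums are the ones listed above, and the files are assumed to have been downloaded already.

```python
import hashlib

# Checksums as published for the two downloads.
EXPECTED = {
    "Royal_Society_Corpus_v2.0.2_final.zip": "9dc54d20820a6507ac3a3957a24a5131",
    "rsc-tool-2.0.tar.gz": "b410be474c8df77ac952eb0ee1246bbc",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected):
    """Return True if the file's MD5 digest matches the expected hex string."""
    return md5_of_file(path) == expected.lower()
```

With the archive in the current directory, `verify("rsc-tool-2.0.tar.gz", EXPECTED["rsc-tool-2.0.tar.gz"])` should return `True` for an intact download.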