The corpus is available for download and can be searched online.
The corpus can be searched via the CQPweb server of the Department of Language Science and Technology at Saarland University. Registration with your e-mail address is required but free.
Access to the visualizations is password-protected.
Please ask for credentials.
The corpus can be downloaded in several file formats:
.vrt
.txt
.tei.xml
.tcf.xml
.vrt is the default file format and contains all available annotations.
The other file formats are provided for convenience only and may lack some annotations, although they contain all tokens.
The text metadata can be downloaded separately.
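The metadata archive (Royal_Society_Corpus_open_v6.0.4_meta.tsv.zip in the checksum list) can be read without unpacking it first. The Python sketch below assumes that the archive holds a UTF-8 tab-separated file with a header row; the function name `read_tsv_from_zip` is hypothetical, and no particular metadata columns are assumed.

```python
import csv
import io
import zipfile


def read_tsv_from_zip(source, member=None):
    """Return the rows of a TSV file inside a ZIP archive as a list of dicts.

    `source` may be a path or a binary file object. If `member` is None,
    the first entry whose name ends in ".tsv" is used (an assumption about
    the archive layout). The first TSV line is treated as the header row.
    """
    with zipfile.ZipFile(source) as archive:
        if member is None:
            tsv_names = [n for n in archive.namelist() if n.endswith(".tsv")]
            if not tsv_names:
                raise ValueError("no .tsv file found in the archive")
            member = tsv_names[0]
        with archive.open(member) as raw:
            # Decode the binary stream as UTF-8 text for the csv module.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.DictReader(text, delimiter="\t"))
```

For example, `rows = read_tsv_from_zip("Royal_Society_Corpus_open_v6.0.4_meta.tsv.zip")` followed by `rows[0].keys()` shows the column names actually used in the file.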
More information on the annotation of the corpus can be found on a separate page.
In the so-called vrt format (vertical text format), annotations on the token level (positional attributes, e.g. word, pos, lemma) are represented one word per line, with TAB-delimited columns for each positional attribute. Annotations beyond the token level (structural attributes, e.g. texts, sentences, pages) are represented as XML tags with optional attribute-value pairs. Metadata, for example, are encoded as attributes of the <text> element. Files in vrt format can be imported into the Corpus Workbench or CQPweb.
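To illustrate the layout, here is a minimal Python sketch of a reader that groups token lines by <text> element and collects that element's attributes as metadata. The column names ("word", "pos", "lemma") and their order are assumptions for illustration only; the actual positional attributes and their order should be taken from the annotation documentation. The function `parse_vrt` is hypothetical, and structural tags other than <text> are simply skipped.

```python
import re
from xml.sax.saxutils import unescape

TAG_NAME_RE = re.compile(r"</?([^\s>/]+)")
ATTR_RE = re.compile(r'([\w.-]+)="([^"]*)"')
ENTITIES = {"&quot;": '"', "&apos;": "'"}


def parse_vrt(lines, columns=("word", "pos", "lemma")):
    """Yield (text_attributes, tokens) for each <text> element in a .vrt stream.

    Assumption: `columns` names the positional attributes in file order.
    """
    text_attrs, tokens = None, []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue
        # Structural attributes: a whole line holding a single XML tag.
        if line.startswith("<") and line.endswith(">") and "\t" not in line:
            match = TAG_NAME_RE.match(line)
            name = match.group(1) if match else ""
            if name != "text":
                continue  # sentences, pages, etc. are ignored in this sketch
            if line.startswith("</"):
                yield text_attrs, tokens
                text_attrs, tokens = None, []
            else:
                # Metadata are stored as attribute-value pairs on <text>.
                text_attrs = {k: unescape(v, ENTITIES) for k, v in ATTR_RE.findall(line)}
                tokens = []
            continue
        # Positional attributes: one token per line, TAB-separated columns.
        values = [unescape(v, ENTITIES) for v in line.split("\t")]
        tokens.append(dict(zip(columns, values)))
```

A line-oriented reader is used instead of an XML parser because a .vrt file is usually not a single well-formed XML document (token lines are bare text and a file may contain many top-level <text> elements). Usage: `with open(path, encoding="utf-8") as f: for meta, toks in parse_vrt(f): ...`.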
Checksums (md5sum):
33e50f29c2137a4152c6b996f83ee08f Royal_Society_Corpus_open_v6.0.4_corpus.tei.xml.zip
999fb9aefad9ea47be4e1b0cb9494632 Royal_Society_Corpus_open_v6.0.4_corpus.vrt.zip
40e02025587649e9f23d5e83760b9230 Royal_Society_Corpus_open_v6.0.4_meta.tsv.zip
55940b45ba6bb7f330f7aeba20234f15 Royal_Society_Corpus_open_v6.0.4_texts_tcf.zip
09f425cd79aa72f5ff4f2cd87af85061 Royal_Society_Corpus_open_v6.0.4_texts_tei.zip
a38ea641fb2665db22a161bb1e9c97d4 Royal_Society_Corpus_open_v6.0.4_texts_txt.zip
b1f16a23637b2a48228b88ad6beb982e Royal_Society_Corpus_open_v6.0.4_texts_vrt.zip
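After downloading, the archives can be checked against these checksums. Below is a minimal Python sketch using the standard `hashlib` module; the file names and digests are copied from the list above, while the helper names and the assumption that all archives are saved under their original names in one directory are illustrative.

```python
import hashlib
from pathlib import Path

# File names and MD5 digests copied from the checksum list above.
EXPECTED_MD5 = {
    "Royal_Society_Corpus_open_v6.0.4_corpus.tei.xml.zip": "33e50f29c2137a4152c6b996f83ee08f",
    "Royal_Society_Corpus_open_v6.0.4_corpus.vrt.zip": "999fb9aefad9ea47be4e1b0cb9494632",
    "Royal_Society_Corpus_open_v6.0.4_meta.tsv.zip": "40e02025587649e9f23d5e83760b9230",
    "Royal_Society_Corpus_open_v6.0.4_texts_tcf.zip": "55940b45ba6bb7f330f7aeba20234f15",
    "Royal_Society_Corpus_open_v6.0.4_texts_tei.zip": "09f425cd79aa72f5ff4f2cd87af85061",
    "Royal_Society_Corpus_open_v6.0.4_texts_txt.zip": "a38ea641fb2665db22a161bb1e9c97d4",
    "Royal_Society_Corpus_open_v6.0.4_texts_vrt.zip": "b1f16a23637b2a48228b88ad6beb982e",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory, expected=EXPECTED_MD5):
    """Map each expected file name to True (match), False (mismatch) or None (missing)."""
    results = {}
    for name, checksum in expected.items():
        path = Path(directory) / name
        results[name] = md5_of_file(path) == checksum if path.exists() else None
    return results
```

Reading in chunks keeps memory use constant even for the large corpus archives. For example, `verify_downloads("downloads")` checks a hypothetical `downloads` folder; any `False` entry indicates a corrupted or incomplete download.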
Each release of the RSC has been assigned a persistent identifier (PID).
The Royal Society Corpus is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
If you use the Royal Society Corpus in your research, please refer to:
Fischer, Stefan, Jörg Knappen, Katrin Menzel, and Elke Teich. 2020. “The Royal Society Corpus 6.0: Providing 300+ Years of Scientific Writing for Humanistic Study.” In Proceedings of the 12th Language Resources and Evaluation Conference, 794–802. Marseille, France: European Language Resources Association. https://www.aclweb.org/anthology/2020.lrec-1.99.
Kermes, Hannah, Stefania Degaetano-Ortlieb, Ashraf Khamis, Jörg Knappen, and Elke Teich. 2016. “The Royal Society Corpus: From Uncharted Data to Corpus.” In Proceedings of the Tenth International Conference on Language Resources and Evaluation, 1928–31. Portorož, Slovenia: European Language Resources Association. https://www.aclweb.org/anthology/L16-1305.