The corpus is available for download and can be searched online.
Access to the visualizations is password-protected.
Please ask for credentials.
The corpus can be downloaded in several file formats:
.vrt is the default file format containing all available annotations.
The other file formats are provided as a convenience only and may be incomplete, although they do contain all tokens.
The text metadata can be downloaded separately.
More information on the annotation of the corpus can be found on a separate page.
In the so-called vrt format (vertical text format), annotations on the token level (positional attributes, e.g. word, pos, lemma) are represented in a one-word-per-line layout with tab-delimited columns, one column per positional attribute. Annotations beyond the token level (structural attributes, e.g. texts, sentences, pages) are represented as XML tags with optional attribute-value pairs. Metadata, for example, are encoded as attributes of the <text> element. Files in vrt format can be imported into the Corpus Workbench or CQPweb.
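The layout described above can be read with a few lines of Python. The following is only a minimal sketch: the column order (word, pos, lemma), the attribute names in the sample (id, year) and the function name parse_vrt are assumptions made for illustration and are not taken from the corpus documentation. It also treats every line that starts with "<" and ends with ">" as a structural tag, which is a simplification.

```python
import re

# Matches attribute="value" pairs inside a structural tag.
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')
# Matches the opening tag of a <text> element (with or without attributes).
TEXT_OPEN_RE = re.compile(r"<text[\s>]")


def parse_vrt(lines, columns=("word", "pos", "lemma")):
    """Yield (metadata, tokens) for each <text> element.

    `columns` names the tab-delimited positional attributes; the
    default order is an assumption, so adjust it to the actual file.
    """
    meta, tokens = None, []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if TEXT_OPEN_RE.match(line):
            # Text metadata is stored as attributes of <text>.
            meta = dict(ATTR_RE.findall(line))
            tokens = []
        elif line.startswith("</text>"):
            yield meta, tokens
            meta, tokens = None, []
        elif line.startswith("<") and line.endswith(">"):
            # Other structural attributes (sentences, pages, ...) are skipped here.
            continue
        elif line:
            # Token line: one tab-delimited column per positional attribute.
            tokens.append(dict(zip(columns, line.split("\t"))))


# Hypothetical sample in vrt layout, not taken from the corpus itself.
sample = [
    '<text id="t1" year="1665">\n',
    "<s>\n",
    "The\tDT\tthe\n",
    "corpus\tNN\tcorpus\n",
    "</s>\n",
    "</text>\n",
]

for metadata, toks in parse_vrt(sample):
    print(metadata, [t["word"] for t in toks])
```

For a real file, the same function can be fed an open file handle instead of the sample list, since it only iterates over lines.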
33e50f29c2137a4152c6b996f83ee08f  Royal_Society_Corpus_open_v6.0.4_corpus.tei.xml.zip
999fb9aefad9ea47be4e1b0cb9494632  Royal_Society_Corpus_open_v6.0.4_corpus.vrt.zip
40e02025587649e9f23d5e83760b9230  Royal_Society_Corpus_open_v6.0.4_meta.tsv.zip
55940b45ba6bb7f330f7aeba20234f15  Royal_Society_Corpus_open_v6.0.4_texts_tcf.zip
09f425cd79aa72f5ff4f2cd87af85061  Royal_Society_Corpus_open_v6.0.4_texts_tei.zip
a38ea641fb2665db22a161bb1e9c97d4  Royal_Society_Corpus_open_v6.0.4_texts_txt.zip
b1f16a23637b2a48228b88ad6beb982e  Royal_Society_Corpus_open_v6.0.4_texts_vrt.zip
Each release of the RSC has been assigned a persistent identifier (PID).
The Royal Society Corpus is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
If you use the Royal Society Corpus in your research, please cite:
Fischer, Stefan, Jörg Knappen, Katrin Menzel, and Elke Teich. 2020. “The Royal Society Corpus 6.0: Providing 300+ Years of Scientific Writing for Humanistic Study.” In Proceedings of the 12th Language Resources and Evaluation Conference, 794–802. Marseille, France: European Language Resources Association. https://www.aclweb.org/anthology/2020.lrec-1.99.
Kermes, Hannah, Stefania Degaetano-Ortlieb, Ashraf Khamis, Jörg Knappen, and Elke Teich. 2016. “The Royal Society Corpus: From Uncharted Data to Corpus.” In Proceedings of the Tenth International Conference on Language Resources and Evaluation, 1928–31. Portorož, Slovenia: European Language Resources Association. https://www.aclweb.org/anthology/L16-1305.