The corpus is available for download and can be searched online.


The corpus can be searched via the CQPweb server of the Department of Language Science and Technology at Saarland University. Registration with your e-mail address is required but free.

Sample queries:

Helpful links:


Access to the visualizations is password-protected.
Please ask for credentials.



The corpus can be downloaded in several file formats:

  • vertical text format (CWB/CQPweb) .vrt
  • plain text format .txt
  • TEI format (Text Encoding Initiative) .tei.xml
  • TCF format (WebLicht Text Corpus Format) .tcf.xml

.vrt is the default file format containing all available annotations.

The other file formats are provided for convenience only and may lack some annotations, although they contain all tokens.

The text metadata can be downloaded separately.

More information on the annotation of the corpus can be found on a separate page.

In the vrt format (vertical text format), annotations on the token level (positional attributes, e.g. word, pos, lemma) are represented in a one-word-per-line layout with one TAB-delimited column per positional attribute. Annotations beyond the token level (structural attributes, e.g. texts, sentences, pages) are represented as XML tags, optionally with attribute-value pairs. Text metadata, for example, are encoded as attributes of the <text> element. Files in vrt format can be imported into the Corpus Workbench (CWB) or CQPweb.
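To illustrate the layout described above, here is a minimal Python sketch that reads vrt lines and pairs each token with the metadata of its enclosing <text> element. The column order (word, pos, lemma), the attribute names (id, year), and the sample lines are assumptions made for illustration only; the actual columns and attributes are defined by the corpus release. The sketch also assumes that tags stand on their own line.

```python
import re

# Assumed column order for illustration; check the corpus documentation
# for the actual positional attributes and their order.
COLUMNS = ("word", "pos", "lemma")

# A structural line is a single XML tag, optionally with attribute="value" pairs.
TAG_RE = re.compile(r'^<(/?)([A-Za-z_][\w.-]*)((?:\s+[\w.:-]+="[^"]*")*)\s*/?>$')
ATTR_RE = re.compile(r'([\w.:-]+)="([^"]*)"')


def parse_vrt(lines, columns=COLUMNS):
    """Yield (text_metadata, token) pairs from an iterable of vrt lines."""
    text_meta = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # skip empty lines
        match = TAG_RE.match(line)
        if match:
            closing, name, attrs = match.groups()
            if name == "text":
                # Metadata are attributes of <text>; reset them at </text>.
                text_meta = {} if closing else dict(ATTR_RE.findall(attrs))
            continue  # other structural tags (e.g. sentences) are ignored here
        # A token line: one TAB-delimited column per positional attribute.
        yield text_meta, dict(zip(columns, line.split("\t")))


# Made-up sample in vrt layout (not taken from the corpus).
SAMPLE = """<text id="example_1" year="1700">
<s>
The\tDT\tthe
experiments\tNNS\texperiment
</s>
</text>
"""

for meta, token in parse_vrt(SAMPLE.splitlines()):
    print(meta.get("year"), token["word"], token["pos"], token["lemma"])
```

For real files, the same function can be fed an open file handle instead of the sample string, since it only iterates over lines.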


Checksums (md5sum):
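A downloaded file can be compared against its published MD5 checksum with the md5sum command-line tool or, as a rough equivalent, with a short Python sketch like the one below. The function names are hypothetical helpers introduced here, not part of any corpus tooling.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large corpus files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_hex):
    """Compare a file's MD5 digest with an expected value, ignoring case and whitespace."""
    return md5_of_file(path) == expected_hex.strip().lower()
```

Calling `matches_checksum(path, expected)` with the path of a downloaded file and the checksum published for it returns True when the two agree.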


Release History

  • v6.0.4 Open: Version 6.0.3 with additional topic annotation on texts
  • v6.0.3 Open: new long-term release
  • v4.0.1: more file formats (same data)
  • v4.0.0: new long-term release
  • v2.0.2: first long-term release

Persistent Identifier

Each release of the RSC was assigned a PID.


Creative Commons License

The Royal Society Corpus is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

If you use the Royal Society Corpus in your research, please refer to:

Fischer, Stefan, Jörg Knappen, Katrin Menzel, and Elke Teich. 2020. “The Royal Society Corpus 6.0: Providing 300+ Years of Scientific Writing for Humanistic Study.” In Proceedings of the 12th Language Resources and Evaluation Conference, 794–802. Marseille, France: European Language Resources Association.

Kermes, Hannah, Stefania Degaetano-Ortlieb, Ashraf Khamis, Jörg Knappen, and Elke Teich. 2016. “The Royal Society Corpus: From Uncharted Data to Corpus.” In Proceedings of the Tenth International Conference on Language Resources and Evaluation, 1928–31. Portorož, Slovenia: European Language Resources Association.

CLARIN-D, German Research Foundation (DFG), German Federal Ministry of Education and Research