The Old Bailey Corpus is a sociolinguistically, pragmatically and textually annotated corpus based on the Proceedings of the Old Bailey. These speech-related texts document Late Modern English as used in London’s Central Criminal Court. The Proceedings of the Old Bailey were published from 1674 to 1913 and constitute a large body of Late Modern English texts. The 2163 volumes contain almost 200,000 trials, totalling ca. 134 million words. Since the proceedings were taken down in shorthand by scribes in the courtroom, the verbatim passages are arguably as near as we can get to the spoken word of the period. The material thus offers the rare opportunity of analyzing spoken language in a period that has been neglected both with regard to the compilation of primary linguistic data and the description of the structure, variability, and change of English.

With 24.4 million spoken words, version 2.0 of the Old Bailey Corpus is 10 million words larger than version 1.0. It consists of 637 selected Proceedings of the Old Bailey and contains speech-related texts from 1720 to 1913. The Old Bailey Corpus is one of the largest diachronic collections of spoken English with detailed utterance-level sociolinguistic annotation. Almost 200 years of spoken Late Modern English were tagged for sociobiographical, pragmatic and textual parameters.

Its detailed sociobiographical, pragmatic and textual annotation makes the Old Bailey Corpus an ideal text collection for fine-tuned, multivariate studies, including historical sociolinguistic approaches. Because of its size, the Old Bailey Corpus is a valuable resource for the analysis of low-frequency features. There are two ways to access the Old Bailey Corpus:

For an overview of the corpus see the OBC Manual tab. For detailed background information on the Old Bailey and the publication history of the Proceedings consult the excellent Old Bailey Proceedings Online.

We are indebted to Robert Shoemaker (University of Sheffield), Tim Hitchcock (University of Sussex) and Sharon Howard, who kindly provided us with digitized transcripts of the Proceedings. We gratefully acknowledge the support of the German Science Foundation (DFG, HU 884/6-1, HU 884/6-2), the German Federal Ministry of Education and Research, the German section of the Common Language Resources and Technology Infrastructure (CLARIN-D) and the CLARIN-D Service Centre of Saarland University in creating and hosting the Old Bailey Corpus.


All versions of the Old Bailey Corpus are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.