Definition

The Text Encoding Initiative (TEI) is a standard for representing textual material in digital form by means of text encoding. The standard is the collaborative product of a community of scholars, chiefly from the humanities, social sciences, and linguistics, who are organized in the TEI Consortium.

TEI Structure

TEI uses XML as its governing metalanguage. This means that all TEI metadata are expressed as XML elements and thus comply with the World Wide Web Consortium XML Recommendation. Information (plain text) is contained in XML elements, delimited by start tags (e.g. <TEI>) and end tags (e.g. </TEI>). Additional information about an element can be given in attributes, each consisting of a name (e.g. xml:lang) and a value (e.g. de).
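
As a small illustration of these building blocks, the fragment below shows a paragraph element whose start tag carries an xml:lang attribute. The German sentence is invented for this sketch and is not taken from any TEI source.

<p xml:lang="de">Heute scheint die Sonne.</p>

Here <p xml:lang="de"> is the start tag, </p> is the end tag, xml:lang is the attribute name, and de is its value, marking the content as German.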

XML comments are delimited by start markers (<!--) and end markers (-->).

A full TEI document consists of a TEI header (<teiHeader>) followed by a text (<text>). This common structure is mandatory for all TEI documents.

This basic structural pair is contained by a <TEI> element, on which it is customary to specify the TEI namespace using the xmlns attribute. A namespace is an XML concept. Its function is to identify the vocabulary from which a group of element names is drawn, using a standard identifier resembling a web address. The namespace for all TEI elements is http://www.tei-c.org/ns/1.0

The basic structure therefore looks as follows:

<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <!--...-->
  </teiHeader>
  <text>
    <!--...-->
  </text>
</TEI>
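
The same namespace can also be bound to a prefix instead of being declared as the default. The sketch below should be equivalent to the skeleton above as far as a namespace-aware XML parser is concerned. The prefix tei is an arbitrary choice made for this illustration.

<tei:TEI xmlns:tei="http://www.tei-c.org/ns/1.0">
  <tei:teiHeader>
    <!--...-->
  </tei:teiHeader>
  <tei:text>
    <!--...-->
  </tei:text>
</tei:TEI>

In both forms, every element belongs to the namespace http://www.tei-c.org/ns/1.0.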

TEI Header

The TEI header (<teiHeader>) is mandatory and contains descriptive meta-information about the document. At a minimum, the <teiHeader> contains a description of the electronic file in a file description (<fileDesc>). The latter element consists of three mandatory components:

  • the title statement (<titleStmt>), providing information about the title (<title>), author (<author>) and others responsible for the electronic text
  • the publication statement (<publicationStmt>), providing publication details about the electronic text in a structured way or as prose inside a paragraph (<p>)
  • a description of the source (<sourceDesc>), documenting bibliographic details about the electronic text’s material source (if any) in a structured way or in a prose paragraph (<p>)

<teiHeader>
  <fileDesc>
    <titleStmt>
      <title>The Strange Adventures of Dr. Burt Diddledygook: a machine-readable transcription</title>
      <respStmt>
        <resp>editor</resp>
        <name xml:id="JS">John Smith</name>
      </respStmt>
    </titleStmt>
    <publicationStmt>
      <p>Not for distribution.</p>
    </publicationStmt>
    <sourceDesc>
      <p>Transcribed from the diaries of the late Dr. Roy Offire.</p>
    </sourceDesc>
  </fileDesc>
</teiHeader>
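
The example above records the publication statement and the source description as prose. A structured alternative might look like the following sketch. The elements used are TEI elements, but the author, publisher, place and date values are placeholders invented for illustration.

<teiHeader>
  <fileDesc>
    <titleStmt>
      <title>The Strange Adventures of Dr. Burt Diddledygook: a machine-readable transcription</title>
      <!-- hypothetical author, for illustration only -->
      <author>Jane Doe</author>
    </titleStmt>
    <publicationStmt>
      <!-- placeholder publication details -->
      <publisher>Example University Press</publisher>
      <pubPlace>Exampleton</pubPlace>
      <date when="2008">2008</date>
      <availability>
        <p>Not for distribution.</p>
      </availability>
    </publicationStmt>
    <sourceDesc>
      <!-- structured description of the material source -->
      <bibl>
        <author>Roy Offire</author>
        <title>Diaries</title>
      </bibl>
    </sourceDesc>
  </fileDesc>
</teiHeader>

The structured form makes individual details such as the publisher or the date addressable by software, whereas the prose form is quicker to write.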

TEI Text

The <text> element contains a single text of any kind, together with its encoding. At a minimum, a <text> contains a text body (<body>). The body contains lower-level text structures such as paragraphs (<p>), or different structures for text genres other than prose: lines for poetry, speeches for drama.

TEI Body

<text>
  <body>
    <p>For the first time in twenty-five years, Dr Burt Diddledygook decided not to turn up to the annual meeting of the Royal Academy of Whoopledywhaa (RAW). It was a sunny day in late September 1960 bang on noontime and Dr Burt was looking forward to a stroll in the park instead. He hoped his fellow members of the RAW weren't even going to notice his absence.</p>
  </body>
</text>
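
For genres other than prose, the body would hold different structures. The sketch below shows one common way to encode verse with a line group (<lg>) and lines (<l>), and drama with a speech (<sp>), a speaker label (<speaker>) and a stage direction (<stage>). These element names are not introduced in the text above, and the lines themselves are invented for illustration.

<text>
  <body>
    <!-- verse: a line group containing individual lines -->
    <lg type="stanza">
      <l>The sun was high, the park was near,</l>
      <l>the meeting hall was far from here.</l>
    </lg>
    <!-- drama: a speech with its speaker label and a stage direction -->
    <sp>
      <speaker>Dr Burt</speaker>
      <p>I shall take a stroll instead.</p>
      <stage>Exit.</stage>
    </sp>
  </body>
</text>

A real document would rarely mix verse and drama in one body like this. The two are shown together only to keep the sketch short.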

TEI Front

Alongside the <body>, a text can optionally contain front matter, encoded with <front>. Typical examples are title pages, headers, prefaces, and dedications. With the exception of the title page, for which the TEI defines specific elements, front matter should be encoded using the same elements as the rest of a text. This means that there are no dedicated elements for prefaces, dedications, abstracts, frontispieces, etc. Instead, either numbered or un-numbered divisions (<div>) with a @type attribute are used to distinguish the different components of a <front>. The following suggested values for the @type attribute may be used for this purpose:

  • preface: a foreword or preface addressed to the reader
  • ack: a formal declaration of acknowledgement by the author
  • dedication: a formal offering or dedication of a text by the author
  • abstract: a summary of the content of a text as continuous prose
  • contents: a table of contents. A <list> element should be used to mark its structure
  • frontispiece: a pictorial frontispiece, possibly including some text

<front>
  <div type="dedication">
    <p>In memory of Mary Smith.</p>
  </div>
  <div type="contents">
    <head>Table of Contents</head>
    <list>
      <item>I. The Decision</item>
      <item>II. The Fuss</item>
      <item>III. The Celebration</item>
    </list>
  </div>
</front>
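
The title page is the one kind of front matter with dedicated elements. A minimal sketch follows, assuming the title given in the header example. The author and date are left as placeholder comments because the source does not supply them.

<front>
  <titlePage>
    <docTitle>
      <titlePart type="main">The Strange Adventures of Dr. Burt Diddledygook</titlePart>
    </docTitle>
    <byline>by <docAuthor><!--Author name--></docAuthor></byline>
    <docDate><!--Date of publication--></docDate>
  </titlePage>
  <div type="dedication">
    <p>In memory of Mary Smith.</p>
  </div>
</front>

The <titlePage> element sits next to the ordinary divisions, so the dedication is still encoded as a <div> with a @type attribute.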

TEI Back

All back matter of a text may be grouped within <back>. As with <front>, either numbered or un-numbered divisions (<div>) with a @type attribute are used to distinguish the different components. The following values may be supplied for @type to distinguish the various kinds of division characteristic of back matter:

  • appendix: an appended self-contained section of a work, often providing additional information or text
  • glossary: contains a list of terms and their explanations
  • notes: a section in which textual or other kinds of notes are gathered together
  • bibliogr: contains a list of bibliographical citations
  • index: any form of index to the work
  • colophon: a statement appearing at the end of a book describing the conditions of its physical production

<back>
  <div type="colophon">
    <p>Typeset in Haselfoot 37 and Henry 8. Printed and bound by Stryer, Germany.</p>
  </div>
</back>
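
A further sketch shows back matter encoded with numbered divisions (<div1>) rather than un-numbered ones, here holding a glossary and the colophon. The glossary entry reuses the abbreviation from the sample paragraph. The list markup with <label> and <item> is one possible encoding, chosen for illustration.

<back>
  <!-- numbered divisions: div1 is the top level, div2 would nest inside it -->
  <div1 type="glossary">
    <head>Glossary</head>
    <list type="gloss">
      <label>RAW</label>
      <item>Royal Academy of Whoopledywhaa</item>
    </list>
  </div1>
  <div1 type="colophon">
    <p>Typeset in Haselfoot 37 and Henry 8. Printed and bound by Stryer, Germany.</p>
  </div1>
</back>

As far as the TEI content models go, numbered and un-numbered divisions are not meant to be mixed within the same <front>, <body> or <back>.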

Full TEI <text> Example

<text>
  <front>
    <div type="dedication">
      <p>In memory of Mary Smith.</p>
    </div>
    <div type="contents">
      <head>Table of Contents</head>
      <list>
        <item>I. The Decision</item>
        <item>II. The Fuss</item>
        <item>III. The Celebration</item>
      </list>
    </div>
  </front>
  <body>
    <p>For the first time in twenty-five years, Dr Burt Diddledygook decided not to turn up to the annual meeting of the Royal Academy of Whoopledywhaa (RAW). It was a sunny day in late September 1960 bang on noontime and Dr Burt was looking forward to a stroll in the park instead. He hoped his fellow members of the RAW weren't even going to notice his absence.</p>
  </body>
  <back>
    <div type="colophon">
      <p>Typeset in Haselfoot 37 and Henry 8. Printed and bound by Stryer, Germany.</p>
    </div>
  </back>
</text>

TEI Composite Text

Apart from simple texts, TEI provides means to encode composite texts, either by grouping structurally related texts in a <group> element inside <text>, or treating them as a corpus of diverse texts, using <teiCorpus> as the outermost element.
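
Both approaches can be sketched as skeletons. The first groups two related texts inside a single <text>. The comments are placeholders, not content from any real document.

<text>
  <group>
    <text>
      <body>
        <!--First text-->
      </body>
    </text>
    <text>
      <body>
        <!--Second text-->
      </body>
    </text>
  </group>
</text>

The second treats the texts as a corpus. Each member is a complete <TEI> document with its own header, and the corpus as a whole carries an additional header.

<teiCorpus xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <!--Header for the corpus as a whole-->
  </teiHeader>
  <TEI>
    <teiHeader>
      <!--Header for the first text-->
    </teiHeader>
    <text>
      <!--...-->
    </text>
  </TEI>
  <TEI>
    <teiHeader>
      <!--Header for the second text-->
    </teiHeader>
    <text>
      <!--...-->
    </text>
  </TEI>
</teiCorpus>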

Empty framework of a basic TEI document structure

<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>
          <!--Title-->
        </title>
      </titleStmt>
      <publicationStmt>
        <p>
          <!--Publication Information-->
        </p>
      </publicationStmt>
      <sourceDesc>
        <p>
          <!--Information about the source-->
        </p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <!--Some structural division, paragraph, line group, speech, ...-->
    </body>
  </text>
</TEI>

Example of a basic TEI document structure

<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>The Strange Adventures of Dr. Burt Diddledygook: a machine-readable transcription</title>
        <respStmt>
          <resp>editor</resp>
          <name xml:id="JS">John Smith</name>
        </respStmt>
      </titleStmt>
      <publicationStmt>
        <p>Not for distribution.</p>
      </publicationStmt>
      <sourceDesc>
        <p>Transcribed from the diaries of the late Dr Marky Mark.</p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <front>
      <div type="dedication">
        <p>In memory of Mary Smith.</p>
      </div>
      <div type="contents">
        <head>Table of Contents</head>
        <list>
          <item>I. The Decision</item>
          <item>II. The Fuss</item>
          <item>III. The Celebration</item>
        </list>
      </div>
    </front>
    <body>
      <p>For the first time in twenty-five years, Dr Marky Mark decided not to turn up to the annual meeting of the Royal Academy of Mounth (RAM). It was a sunny day in late September 1980 bang on noontime and Dr Mark was looking forward to a stroll in the park instead. He hoped his fellow members of the RAM weren't even going to notice his absence.</p>
    </body>
    <back>
      <div type="colophon">
        <p>Typeset in Haselfoot 37 and Henry 8. Printed and bound by Stryer, Germany.</p>
      </div>
    </back>
  </text>
</TEI>

TEI Guidelines

TEI Guidelines should:

TEI Schema

Minimal TEI customization file:


<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:lang="en">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>A TBE customisation</title>
        <author>The TBE Crew</author>
      </titleStmt>
      <publicationStmt>
        <p>for use by whoever wants it</p>
      </publicationStmt>
      <sourceDesc>
        <p>created on Thursday 24th July 2008 10:20:17 AM by the form at http://www.tei-c.org/Roma/</p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <front>
      <divGen type="toc"/>
    </front>
    <body>
      <p>My TEI Customization starts with modules tei, core, header, and textstructure</p>
      <schemaSpec ident="TBEcustom" docLang="en" xml:lang="en" prefix="">
        <moduleRef key="tei"/>
        <moduleRef key="header"/>
        <moduleRef key="core"/>
        <moduleRef key="textstructure"/>
      </schemaSpec>
    </body>
  </text>
</TEI>
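
A customization can go beyond selecting whole modules. The following <schemaSpec> fragment is a hedged sketch of two common adjustments: pulling in an additional module and deleting a single element from a module already selected. The choice of the verse module and of the <soCalled> element is arbitrary and made only for illustration.

<schemaSpec ident="TBEcustom" docLang="en" xml:lang="en" prefix="">
  <moduleRef key="tei"/>
  <moduleRef key="header"/>
  <moduleRef key="core"/>
  <moduleRef key="textstructure"/>
  <!-- additional module providing elements for verse -->
  <moduleRef key="verse"/>
  <!-- remove one element from the resulting schema (here soCalled, from core) -->
  <elementSpec ident="soCalled" module="core" mode="delete"/>
</schemaSpec>

This fragment would replace the <schemaSpec> in the customization file above. The surrounding <TEI>, <teiHeader> and <text> stay the same.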