a mandatory component of a TEI file for documenting a digital text object's metadata

The TEI Header (<teiHeader>) is a key component of a TEI file. It helps both humans (typically librarians or archivists) and computer systems identify important information about a TEI file as a digital textual object, in the form of metadata.1 As a (data) summary of the entire TEI document, the header sits at the top of the TEI structure, preceding the encoded body of the work, which lives under <text>.
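
As a rough sketch of that top-level layout, the outline below places <teiHeader> before <text> inside the root <TEI> element. The title and placeholder sentences are illustrative assumptions, not taken from a real TAPAS file.

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <!-- Metadata about the digital file comes first -->
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>A sample TEI edition</title>
      </titleStmt>
      <publicationStmt>
        <p>Unpublished sample file.</p>
      </publicationStmt>
      <sourceDesc>
        <p>Born digital.</p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <!-- The encoded work itself follows the header -->
  <text>
    <body>
      <p>The encoded text goes here.</p>
    </body>
  </text>
</TEI>
```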

Because the TEI Header must be responsive to a wide range of potentially encodable document genres and project purposes, its metadata structure is inherently a complex component of the TEI file. For those just getting started with the TEI, it can initially be difficult to digest and develop. Have no fear! The aim of this guide is to give beginning-level practitioners an entry point for working with the TEI Header, and to give practitioners at all levels a demonstration of how TAPAS manages and reveals your header metadata.

"Front Matter" of a TEI File?

One way we might make initial sense of the form and purpose of the TEI Header (and of our individual purposes for it) is to compare it to the front matter of a printed text. While there are differences between these two types of metadata sections, the TEI Header performs a similar function. Consider the title page, dedication, table of contents, or copyright page of, say, a novel. These pages may or may not have much to do with the narrative plot itself, but they greatly inform how we access, store, share, and interpret the text. Each page or section provides us with ostensibly transparent information about the work as a work: its history of publication, community of reception, material conditions of production, in some cases information on character, setting, and plot, and, to some degree, authorial intentionality. Note that the TEI Header is not front matter in the TEI structure: front matter (<front>) is itself a possible and separate component of a TEI file, located within the <text> section that follows the <teiHeader>. Still, where front matter functions as an introduction to a material textual object, the TEI Header similarly offers an introduction to the TEI file as a digitized textual object. It is helpful to remember that the Header may include some data (bibliographic, generic) about the text that follows it, but it is primarily interested in recording information about the digital object itself and as a whole.

Now let's review some basic teiHeader components.

A sample Header structure

(you can analyze, download, or share this file from the TAPAS Commons)

Header Sections

To restate, we look to the TEI Header section for a record of a text's linguistic and formal characteristics, source information, publication standards, and persons or organizations involved in the creation of the encoded document. It is also the first section of the TEI file structure and a required component of any valid TEI file.

The information provided within TEI Headers can be simple or complex depending on the document and its intended purposes. However, it includes a few required and suggested sections. Toggle a sample header file that includes all required and suggested sections as outlined by the TEI-C guidelines on headers.

The <teiHeader> contains the following five sections, of which only <fileDesc> is required:
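
The five sections can be outlined as follows. This is only a skeleton: the contents of each section are elided as comments, so the outline is not valid TEI until they are filled in.

```xml
<teiHeader>
  <!-- (1) Required: bibliographic description of the file -->
  <fileDesc>
    <!-- titleStmt, publicationStmt, sourceDesc, ... -->
  </fileDesc>
  <!-- (2) Recommended: encoding practices -->
  <encodingDesc>
    <!-- projectDesc, editorialDecl, ... -->
  </encodingDesc>
  <!-- (3) Recommended: non-bibliographic profile of the text -->
  <profileDesc>
    <!-- langUsage, creation, ... -->
  </profileDesc>
  <!-- (4) Optional: metadata in non-TEI formats -->
  <xenoData>
    <!-- non-TEI metadata goes here -->
  </xenoData>
  <!-- (5) Highly recommended: revision history -->
  <revisionDesc>
    <!-- change entries go here -->
  </revisionDesc>
</teiHeader>
```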

(1) <fileDesc>

Description |

<fileDesc> is a required section of the TEI Header. It records bibliographic information of the encoded document using the subSections listed below.

Notes |

For <fileDesc> to be valid, it must include, at a minimum, the <titleStmt>, <publicationStmt>, and <sourceDesc> subsections.
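
A minimal sketch of a <fileDesc> carrying only those three subsections might look like this. The title and the prose statements are placeholder assumptions.

```xml
<fileDesc>
  <titleStmt>
    <title>A sample TEI edition</title>
  </titleStmt>
  <publicationStmt>
    <p>Unpublished sample file.</p>
  </publicationStmt>
  <sourceDesc>
    <p>Born digital.</p>
  </sourceDesc>
</fileDesc>
```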

subSections |

(1.1) <titleStmt> is a mandatory section of fileDesc. It groups elements for recording information about the title of a work and those responsible, at various levels, for the creation or editorial work of its content. This section includes:

(1.1.1) <title> declares the title of the TEI file. Note that this may be different from the source title of the work encoded in the <text> section. For example, if we were encoding Oscar Wilde's The Importance of Being Earnest, we might name our TEI file something like "Oscar Wilde's The Importance of Being Earnest: a TEI edition."

To record the original source title (and other source text data), we can use <bibl> under <sourceDesc>. We would include the original source creator under <author>, described in the next section below. We would record information on the TEI edition's creator/encoder (if different from the source author) under <respStmt>.

(1.1.2) <author> declares the original source author (a person or organization) responsible for producing the work. We can repeat <author> for works where there is more than one primary creator. We can also nest <persName> in <author> to associate persons listed in personography data; similarly, use <orgName> for organizations listed in orgography data where an organization is responsible as creator.

(1.1.3) <sponsor> declares any person(s)/group(s) responsible for providing intellectual sponsorship for the creation of the work.

(1.1.4) <funder> declares any person(s)/group(s) responsible for providing funding for the creation of the work.

(1.1.5) <editor> declares any person(s) with an editorial role, in cases where the document is part of an edition.

(1.1.6) <principal> declares any person(s) responsible for principal research, in cases where the document is the product of formal research.

(1.1.7) <respStmt> groups elements for recording information on contributors involved in the creation of the TEI edition (e.g., those on an encoding team involved in the encoding of the TEI edition); or anyone not already named in above sections
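
Putting these elements together, a <titleStmt> for the Wilde example might be sketched as below. Only the title and author come from the example above; every other name is a placeholder assumption.

```xml
<titleStmt>
  <!-- Title of the TEI file, not necessarily the source title -->
  <title>Oscar Wilde's The Importance of Being Earnest: a TEI edition</title>
  <!-- Original source author -->
  <author>
    <persName>Oscar Wilde</persName>
  </author>
  <!-- All names below are placeholders -->
  <sponsor>Example University Library</sponsor>
  <funder>Example Humanities Fund</funder>
  <editor>Example Editor</editor>
  <principal>Example Researcher</principal>
  <respStmt>
    <resp>TEI encoding by</resp>
    <name>Example Encoder</name>
  </respStmt>
</titleStmt>
```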

(1.2) <editionStmt> is optional, but recommended by TEI-C guidelines. It is used to group information on a single edition of text, where there may exist other editions of the work elsewhere, with their own editionStmt's.

(1.2.1) <edition> should note whether it is a first edition, new edition, revised edition, etc., and note the date of release to distinguish it from other known editions. The @n attribute on <edition> can help declare the version number across multiple existing editions of the work: <edition n="1"> marks a first edition.

(1.2.2) <respStmt> records the primary contributor(s) to the edition; repeat this section where necessary to record other contributors to the edition.

(1.2.3) <name> (inside <respStmt>) declares the person responsible for the changes made to this edition.

(1.2.4) <resp> (inside <respStmt>) describes that person's role in the edits.
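
As an illustration, an <editionStmt> for a revised edition could be sketched like this. The date, role, and name are hypothetical.

```xml
<editionStmt>
  <!-- @n gives the version number of this edition -->
  <edition n="2">Second edition, revised <date when="2016-05-01">May 2016</date></edition>
  <respStmt>
    <resp>Transcription errors corrected by</resp>
    <name>Example Encoder</name>
  </respStmt>
</editionStmt>
```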

(1.3) <extent> is optional. It is used for documenting the size of the textual medium; in the case of the TEI document, the file size.

As a minimum structure, we can declare the file size between <extent> tags. We can supply byte size, or use terms to describe physical (disk, tape, etc.) or logical (pages, paragraphs, sentences, etc.) units. <p> tags are not used between parent tags:

<extent>12 paragraphs</extent>

Approximate declarations are acceptable if precise file size is not known: using such terms as "less than," "between," "over."

<extent>Less than 1 Mb</extent>

(1.3.1) <measure>, with the @unit and @quantity attributes, declares machine-friendly measurements inside <extent> (for more information on file size naming, see this reference):

<extent><measure unit="MiB" quantity="1.5">About 1.5 megabytes</measure></extent>

(1.4) <publicationStmt> is required. It groups information concerning the publication or distribution of the digital TEI object. By contrast, we use <sourceDesc> (the last section of <fileDesc>) to record publication data for the TEI file's source text, in cases where the TEI edition is derived from an earlier source rather than being "born digital."

Unless it consists only of prose paragraphs (<p>), this section must contain at least one of the following tags: <publisher>, <distributor>, or <authority>. (The code snippet includes all three.)

(1.4.1) <publisher> names the person or organization responsible for the creation of the TEI edition, and who maintains rights over its production and/or distribution. If unknown, state "Publisher Unknown."

(1.4.2) <address> (repeat <addrLine> as needed) or <pubPlace> records the postal address or place of publication.

(1.4.3) <date> may be used either to mark the publication date of the TEI document or, following <availability>, to mark the date of the availability statement if it differs from the publication date.

(1.4.4) <distributor> identifies the organization responsible for the distribution of the TEI edition.

(1.4.5) <availability> clarifies the general copyright and fair use policy governing reuse of the TEI edition. It may include just a simple paragraph statement and/or a <licence> (note the TEI spelling) that uses the @target attribute to link directly to the license.
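
A <publicationStmt> using the elements above might be sketched as follows. All names, the address, and the date are placeholders, and the Creative Commons license is chosen only for illustration.

```xml
<publicationStmt>
  <publisher>Example University Press</publisher>
  <address>
    <addrLine>1 Example Street</addrLine>
    <addrLine>Example City</addrLine>
  </address>
  <date when="2016-05-01">1 May 2016</date>
  <distributor>Example Text Archive</distributor>
  <authority>Example Encoding Project</authority>
  <availability>
    <p>Freely available for non-commercial reuse with attribution.</p>
    <!-- @target links directly to the license text -->
    <licence target="https://creativecommons.org/licenses/by-nc/4.0/">Creative Commons Attribution-NonCommercial 4.0 International</licence>
  </availability>
</publicationStmt>
```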

(1.5) <seriesStmt> (Optional) records the relation between separate documents that are each understood as part of a series (e.g., collected works: essays, lectures, articles, document volumes, etc.) by declaring the parent work/identity. This section accepts either a block of text wrapped in <p> or the following section tags (not both):

(1.5.1) <title> declares the title of the series of which this document is considered to be a part.

(1.5.2) <respStmt> (containing expected <name> and <resp> tags) declares the name of person or entity responsible for series, e.g., editor or organization, and identified role.

(1.5.3) <biblScope> declares the bibliographic scope of the TEI document as included within the series, for example as a list of page numbers, or a named subdivision of a larger work. Using @unit, @from, @to, for example, we can state,

<biblScope unit="page" from="25" to="29">pp 25-29</biblScope>

(1.5.4) <idno> is an identifier tag in the TEI used to record an ISSN, ISBN, or other standard identifier; the @type attribute names the kind of identifier.

<idno type="ISBN">978-1-234567-89-1</idno>
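
Taken together, a <seriesStmt> might be sketched like this. The series title and editor are placeholders; the <biblScope> and <idno> values repeat the examples above.

```xml
<seriesStmt>
  <title>Example Collected Plays</title>
  <respStmt>
    <resp>Series editor</resp>
    <name>Example Editor</name>
  </respStmt>
  <biblScope unit="page" from="25" to="29">pp 25-29</biblScope>
  <idno type="ISBN">978-1-234567-89-1</idno>
</seriesStmt>
```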

(1.6) <notesStmt> (Optional) compiles together notes about the encoded document that are not recorded elsewhere within the header.

As notes are somewhat open-ended, we suggest reviewing the Guidelines to assess what kinds of information you might put here based on your document and/or project purposes. The Guidelines offer a detailed overview of the possible uses of the notes section, with links to further explanation where possible, and help show when note data may be more appropriately placed elsewhere in the TEI Header.

At a minimum, apply a container made up of a parent <notesStmt> containing <note>. You may repeat note as needed.
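
For instance, a minimal <notesStmt> with two notes could look like this. The note text is invented for illustration.

```xml
<notesStmt>
  <note>Stage directions in the source are printed in italics.</note>
  <note>Encoded as a classroom exercise.</note>
</notesStmt>
```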

(1.7) <sourceDesc> (Required) records the source from which an electronic text was derived or generated, where there is one.

Typically this comes in the form of a bibliographic description using one of the following containers:

<bibl>, <biblStruct>, <listBibl> (review the differences between them in the Guidelines). Where the TEI file is born digital, we can state "born digital" using <p>.
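
The two cases can be sketched side by side. A header would normally use one or the other; the <bibl> here records only the author and title from the Wilde example and leaves out publication details.

```xml
<!-- Case 1: the TEI file is derived from an existing source -->
<sourceDesc>
  <bibl>
    <author>Oscar Wilde</author>
    <title>The Importance of Being Earnest</title>
  </bibl>
</sourceDesc>

<!-- Case 2: the TEI file has no prior source -->
<sourceDesc>
  <p>Born digital.</p>
</sourceDesc>
```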

(2) <encodingDesc>

Description |

<encodingDesc> is a recommended section of the TEI Header. It records information on the encoding practices used and on the relationship between the encoded work and, where there is one, its source text. This section can range from a brief statement of the encoding approach under <projectDesc> to a more detailed overview of editorial decisions and the handling of the source text under <editorialDecl>.

Notes |

For <encodingDesc> to be valid, it must include, at a minimum, one or more <p> tags or, for more detailed and structured encoding descriptions, one or more of the element sections discussed below:

subSections |

(2.1) <projectDesc> a prose description on purpose of TEI edition, project objectives, teams, etc., wrapped in <p>.

(2.2) <samplingDecl> a prose description about the selection process behind choosing text(s) or textual fragments that comprise the encoded document in the case where encoded document is of a source text. Often this will include information on sample size, methods, purpose, and more general discussion of textual source.

(2.3) <editorialDecl> a prose description of the editorial practices employed in the encoding of a document in which one may either use <p> or one or more of the following elements:

(2.3.1) <correction> prose description of corrections, if any, made to the source text in creation of TEI edition

(2.3.2) <normalization> prose description on any normalization decisions made on source text in creation of TEI edition

(2.3.3) <quotation> prose description on handling of quotation markers of source text in creation of TEI edition

(2.3.4) <hyphenation> prose description on handling hyphenation encoding of source text in creation of TEI edition

(2.3.5) <segmentation> prose description on segmentation of source text in creation of TEI edition

(2.3.6) <stdVals> prose description of approach to standardizing date and number values in TEI edition

(2.3.7) <interpretation> prose description of any interpretive information added in the creation of TEI edition

(2.3.8) <punctuation> prose description of handling of punctuation of source text in creation of TEI edition
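
A short <encodingDesc> combining <projectDesc>, <samplingDecl>, and a few <editorialDecl> children might be sketched as below. The prose statements describe a hypothetical project.

```xml
<encodingDesc>
  <projectDesc>
    <p>Encoded as a teaching edition for an undergraduate drama course.</p>
  </projectDesc>
  <samplingDecl>
    <p>The complete text of the play was encoded; advertisements bound with the source were omitted.</p>
  </samplingDecl>
  <editorialDecl>
    <correction>
      <p>Obvious printing errors were corrected silently.</p>
    </correction>
    <normalization>
      <p>Original spelling was retained.</p>
    </normalization>
    <hyphenation>
      <p>End-of-line hyphens were removed and the words rejoined.</p>
    </hyphenation>
  </editorialDecl>
</encodingDesc>
```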

(2.4) <tagsDecl> documents use of tags in TEI edition using,

(2.4.1) <rendition> describes the rendition (display) of elements in the TEI edition, where @selector points to the elements in <text> to which the rendition applies; @scheme is "css"

<rendition xml:id="medium" scheme="css" selector="front div">font-size: medium;</rendition>

A really great example can be found in the Guidelines.

(2.4.2) <namespace> declares namespace to which tag (<tagUsage>) belongs

(2.4.3) <tagUsage> details usage of tag; repeat element for multiple tag descriptions
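
The three elements nest as sketched below: <tagUsage> sits inside <namespace>, which sits beside <rendition> inside <tagsDecl>. The element counts and descriptions are invented for illustration.

```xml
<tagsDecl>
  <rendition xml:id="medium" scheme="css" selector="front div">font-size: medium;</rendition>
  <!-- @name gives the namespace the listed tags belong to -->
  <namespace name="http://www.tei-c.org/ns/1.0">
    <!-- @gi names the element; @occurs counts its occurrences -->
    <tagUsage gi="div" occurs="12">Marks acts and scenes.</tagUsage>
    <tagUsage gi="sp" occurs="340">Marks each speech.</tagUsage>
  </namespace>
</tagsDecl>
```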

(2.5) <styleDefDecl> declares the style definition language (for example, CSS) named by the @scheme attribute on <rendition>:

<styleDefDecl scheme="css" schemeVersion="2.1"/>
<rendition xml:id="medium" scheme="css" selector="front div">font-size: medium;</rendition>

(2.6) <refsDecl> documents how canonical references are formed for the TEI edition.

(2.7) <classDecl> declares the taxonomies used by classification schemes in the TEI edition.

(2.8) <geoDecl> documents the notation used for geographic coordinates in the TEI edition.

(2.9) <schemaSpec> contains a schema specification, allowing an ODD to be included in the header of the TEI document.

(2.10) <schemaRef> points to an external ODD.
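
Three of these declarations are simple enough to sketch briefly. The taxonomy, the coordinate statement, the @type value, and the example.org URL are all hypothetical.

```xml
<encodingDesc>
  <!-- A one-category taxonomy for classifying texts by genre -->
  <classDecl>
    <taxonomy xml:id="genre">
      <category xml:id="drama">
        <catDesc>Drama</catDesc>
      </category>
    </taxonomy>
  </classDecl>
  <!-- Names the coordinate system used for geographic data -->
  <geoDecl datum="WGS84">Coordinates are given as latitude and longitude in decimal degrees.</geoDecl>
  <!-- Points to an ODD file stored outside this document -->
  <schemaRef type="odd" url="https://example.org/schema/my-project.odd"/>
</encodingDesc>
```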

(3) <profileDesc>

Description |

<profileDesc> is a recommended section of the TEI Header. It provides a profile description of the encoded file, or as the guidelines state: "provides a detailed description of non-bibliographic aspects of a text, specifically the languages and sublanguages used, the situation in which it was produced, the participants and their setting."

Notes |

Available sections under <profileDesc> are listed below:

subSections |

This section is in progress. Here is a link to profileDesc in the TEI Guidelines. Post your questions to the TAPAS Header forum.

(3.1) <handNotes>

(3.2) <creation>

(3.3) <langUsage>

(3.4) <abstract>

(3.5) <textDesc>

(3.6) <settingDesc>

(3.7) <particDesc>
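
Until this section is filled in, a small <profileDesc> using three of the elements listed above might be sketched as follows. The date and abstract are placeholders.

```xml
<profileDesc>
  <!-- When the text was created (placeholder date) -->
  <creation>
    <date when="1900">1900</date>
  </creation>
  <!-- Languages used in the text -->
  <langUsage>
    <language ident="en">English</language>
  </langUsage>
  <abstract>
    <p>A short summary of the encoded text.</p>
  </abstract>
</profileDesc>
```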

(4) <xenoData>

Description |

<xenoData> is an optional section of the TEI Header. It holds metadata recorded in non-TEI formats.

Notes |

<xenoData> sections are listed below:

subSections |

This section is in progress. Here is a link to xenoData in the TEI Guidelines. Post your questions to the TAPAS Header forum.
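
As one possible use, <xenoData> can carry a Dublin Core record expressed in RDF/XML. The record below is a sketch with illustrative values.

```xml
<xenoData>
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
    <rdf:Description rdf:about="">
      <dc:title>Oscar Wilde's The Importance of Being Earnest: a TEI edition</dc:title>
      <dc:creator>Oscar Wilde</dc:creator>
      <dc:format>application/tei+xml</dc:format>
    </rdf:Description>
  </rdf:RDF>
</xenoData>
```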


(5) <revisionDesc>

Description |

<revisionDesc> is a highly recommended section of the TEI Header. It summarizes the revision history of the file.

Notes |

<revisionDesc> sections are listed below:

subSections |

This section is in progress. Here is a link to revisionDesc in the TEI Guidelines. Post your questions to the TAPAS Header forum.
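
A revision history is usually recorded as a series of <change> elements, typically with the most recent first. The dates, the "#ee" pointer, and the descriptions below are hypothetical.

```xml
<revisionDesc>
  <!-- @when dates the change; @who points to the person responsible -->
  <change when="2016-06-15" who="#ee">Corrected typos in Act 2.</change>
  <change when="2016-05-01" who="#ee">Created the file and encoded Act 1.</change>
</revisionDesc>
```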


Here you can learn about the different ways TAPAS manages your TEI Header. This section will develop over time as we release new features of TAPAS that take advantage of TEI header metadata. If you have questions about TAPAS handling of TEI Header metadata, please let us know!

View of Metadata tab in TAPAS Reader

(View this file in the TAPAS Commons)