
Programmbereich Korpuslinguistik - Projekt Korpusausbau

I5: The IDS text model

Derivation from TEI P5 via ODD file

C. M. Sperberg-McQueen
31 December 2012
Modifications and Extensions by IDS 2012-present


This document provides a formal definition of I5, a derivation of the IDS/XCES vocabulary as a customization of TEI P5. It contains the various specGrp elements needed to specify a customization of TEI, together with accompanying prose explaining the logic of the customization.

IDS/XCES [IDS 2006] is a DTD for corpus materials developed at the Institut für deutsche Sprache in Mannheim. It is based on XCES, an XML version of the Corpus Encoding Standard (CES) [Ide 1998], [Ide/Bonhomme/Romary 2000], which in turn was based on version TEI P3 of the Text Encoding Initiative Guidelines [ACH/ACL/ALLC 1994].

The primary goal is to provide a definition of the IDS/XCES vocabulary on the basis of TEI P5 [TEI 2007], and not (via XCES and CES) on the basis of TEI P3. TEI P3 customization involved the preparation of DTD files in tightly prescribed forms containing declarations which overrode the default declarations for the entities, elements, and attributes concerned. TEI P5 customization involves the preparation of an ODD (for ‘one document does it all’) document which describes changes to the base TEI vocabulary using a specialized vocabulary defined in chapter 22 of TEI P5.

A secondary goal is to document the structure of the customization, specifying what is included without change from TEI P5, what is excluded, and what is changed. Some differences between TEI P3 and IDS/XCES originated with CES or XCES and others were introduced when IDS/XCES was adapted from XCES; since those have different significance for further development and maintenance of the vocabulary, those two sets of differences are distinguished here. Another secondary goal is to provide at least rudimentary documentation for all elements in the vocabulary.

The first section below describes the vocabulary's use of the required TEI modules; the next section describes use of optional modules. There follows a section describing elements added by IDS/XCES, and a driver section which gathers together all the ODD fragments included earlier in the document. A final section describes some conformance and design issues which may need attention.

The brief descriptions of elements in the TEI and CES/XCES vocabularies are taken from the documentation for those encoding schemes; thanks are due to the authors and publishers of that documentation. Descriptions are included in the appendix for elements suppressed from modules which are otherwise included, in order to simplify review of the design and consideration of possible changes.

1 Required TEI Modules

This section of this ODD file includes a number of TEI P5 modules; eventually it will also describe differences between the P5 version of the elements involved and the older IDS/XCES versions.

Spec fragment required-modules
Include spec fragment specgroup-tei.
Include spec fragment specgroup-core.
Include spec fragment specgroup-header.
Include spec fragment specgroup-textstructure.

This spec fragment is referred to by top-level schema fragment ids_v2a.

A note on notation: this document is not a description of the ODD file which generates the I5 version of the TEI P5 vocabulary; it is the ODD document. Blocks labeled “Spec fragment”, like the one just shown, are used to specify selections from and modifications to the TEI P5 vocabulary. As may be seen, such spec fragments may include cross references to other spec fragments elsewhere in the document, which are included by reference in the set of modifications; ODD documents are thus a specialized form of ‘literate programming’ as defined by the computer scientist Donald Knuth and used in the publication of his TeX and MetaFont programs. The literate-programming structure allows the formal specification of changes to be embedded in prose documentation intended to explain what is happening.
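
For readers unfamiliar with ODD source, the spec fragment shown above corresponds roughly to a TEI specGrp element whose children are specGrpRef cross references. The following is a minimal sketch and not the literal source of this document; it assumes that each fragment name doubles as the xml:id of its specGrp and that ids_v2a is the ident of the top-level schemaSpec.

```xml
<!-- Sketch only: xml:id values and the schemaSpec ident are assumptions
     derived from the fragment names used in this document. -->
<div xmlns="http://www.tei-c.org/ns/1.0">
  <schemaSpec ident="ids_v2a">
    <!-- The top-level schema pulls in the group of required modules. -->
    <specGrpRef target="#required-modules"/>
  </schemaSpec>

  <specGrp xml:id="required-modules">
    <!-- Each specGrpRef includes another spec fragment by reference. -->
    <specGrpRef target="#specgroup-tei"/>
    <specGrpRef target="#specgroup-core"/>
    <specGrpRef target="#specgroup-header"/>
    <specGrpRef target="#specgroup-textstructure"/>
  </specGrp>
</div>
```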

1.1 The tei module

The tei module is required for any TEI profile. The following specification fragment includes the tei module and makes appropriate modifications to it.
Spec fragment specgroup-tei
Include module tei.

Redefine one macro.

Include spec fragment specgroup-tei-redefinitions.
Spec fragment specgroup-tei-redefinitions

Redefine macro.limitedContent.

<macroSpec module = "tei" ident = "macro.limitedContent" mode = "change">
Contains
(#PCDATA | %model.limitedPhrase; | %model.inter; | s)*
</macroSpec>

This spec fragment is referred to by specgroup-tei.

This spec fragment is referred to by required-modules.
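
In ODD source, a module is included with moduleRef, and a macro redefinition carries its new content model inside a content element. The sketch below expresses the content model given above in RELAX NG, which is one conventional way to write it in P5 ODD files; the actual source of this document may be written differently, and the xml:id values are assumptions.

```xml
<!-- Sketch only: one possible ODD encoding of the two fragments above. -->
<div xmlns="http://www.tei-c.org/ns/1.0"
     xmlns:rng="http://relaxng.org/ns/structure/1.0">
  <specGrp xml:id="specgroup-tei">
    <moduleRef key="tei"/>
    <specGrpRef target="#specgroup-tei-redefinitions"/>
  </specGrp>

  <specGrp xml:id="specgroup-tei-redefinitions">
    <macroSpec module="tei" ident="macro.limitedContent" mode="change">
      <content>
        <!-- (#PCDATA | model.limitedPhrase | model.inter | s)* -->
        <rng:zeroOrMore>
          <rng:choice>
            <rng:text/>
            <rng:ref name="model.limitedPhrase"/>
            <rng:ref name="model.inter"/>
            <rng:ref name="s"/>
          </rng:choice>
        </rng:zeroOrMore>
      </content>
    </macroSpec>
  </specGrp>
</div>
```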

1.2 The core module

The tei and core modules are required for any TEI profile. The following specification fragment includes the core module and makes appropriate modifications to it.
Spec fragment specgroup-core
Include module core.

Delete unneeded elements.

Include spec fragment specgroup-core-deletions.

Rename some elements.

Include spec fragment specgroup-core-renamings.

Redefine some elements.

Include spec fragment specgroup-core-redefinitions.

This spec fragment is referred to by required-modules.

Numerous elements in the TEI core module are suppressed; for short descriptions of these elements see the appendix.
Spec fragment specgroup-core-deletions

Suppression of unused elements in core module.

Drop element add.
Drop element binaryObject.
Drop element cb.
Drop element choice.
Drop element del.
Drop element divGen.
Drop element expan.
Drop element headItem.
Drop element headLabel.
Drop element index.
Drop element listBibl.
Drop element measureGrp.
Drop element meeting.
Drop element milestone.
Drop element postBox.
Drop element postCode.
Drop element rs.
Drop element said.
Drop element series.
Drop element sic.
Drop element soCalled.
Drop element street.
Drop element unclear.

This spec fragment is referred to by specgroup-core.
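
Each "Drop element" line above corresponds to an elementSpec in delete mode. A minimal sketch of the first few entries follows; the xml:id is an assumption, and the remaining elements in the list would follow the same pattern.

```xml
<!-- Sketch only: deletion of unused core elements. -->
<specGrp xmlns="http://www.tei-c.org/ns/1.0"
         xml:id="specgroup-core-deletions">
  <elementSpec module="core" ident="add" mode="delete"/>
  <elementSpec module="core" ident="binaryObject" mode="delete"/>
  <elementSpec module="core" ident="cb" mode="delete"/>
  <!-- ... one elementSpec per remaining element in the list above ... -->
</specGrp>
```

Newer releases of TEI P5 also allow the same effect to be expressed with an except attribute on moduleRef; the per-element form shown here matches the structure of the fragment above.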

Some elements in the core module are renamed in obvious ways. The teiCorpus element is renamed idsCorpus, and its content model is adjusted: idsCorpus contains a sequence of idsDoc elements, not idsText (corresponding to TEI) elements, so the default content model is not appropriate.
Spec fragment specgroup-core-renamings

Renaming elements in core module:

Change element teiCorpus: rename as “idsCorpus”.
Change contents to
(idsHeader, idsDoc+)
Attributes
 
type
Type
an XSD string value
version
(redefine this attribute)
Type
an XSD string value
TEIform
Type
an XSD string value
Default value
teiCorpus.2

This spec fragment is referred to by specgroup-core.
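
A renaming of this kind is typically written in ODD with altIdent inside an elementSpec in change mode. The following sketch shows the renaming and the new content model only; attribute declarations are omitted, and the use of the pattern name teiHeader (rather than idsHeader) inside the content model is an assumption about how renamed elements are referenced.

```xml
<!-- Sketch only: rename teiCorpus and replace its content model. -->
<elementSpec xmlns="http://www.tei-c.org/ns/1.0"
             xmlns:rng="http://relaxng.org/ns/structure/1.0"
             module="core" ident="teiCorpus" mode="change">
  <!-- altIdent supplies the element name used in the generated schema. -->
  <altIdent>idsCorpus</altIdent>
  <content>
    <!-- (idsHeader, idsDoc+) -->
    <rng:group>
      <!-- Assumption: patterns are referenced by ident, not by altIdent. -->
      <rng:ref name="teiHeader"/>
      <rng:oneOrMore>
        <rng:ref name="idsDoc"/>
      </rng:oneOrMore>
    </rng:group>
  </content>
  <!-- attList with type, version and TEIform omitted in this sketch. -->
</elementSpec>
```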

The remainder of the elements in the core module are included in the IDS/XCES vocabulary; that remainder includes the elements listed below. (For the most part, these elements are also included in CES and XCES, but editor, gloss, lb, orig, and pb are not in XCES but are added back into the vocabulary by IDS/XCES.)

Some elements are included without change from TEI, at least in the sense that the same parameter entity or pattern names are used in the declarations. (The extension of some element classes in IDS/XCES does of course mean the effective content model is not actually the same. But we do not need to supply a different model in this ODD file.)

For other elements, IDS/XCES declares a content model which is a restriction of the content model in TEI P5. One simple way to move I5 closer to TEI P5 would be to drop these restrictions and use the TEI P5 declarations for these elements unchanged.

Finally, for some elements IDS/XCES declares a content model which extends or modifies the declarations in TEI P5. Sometimes the change consists merely in the addition of one or more attributes, or adding the element as a member of this or that class. In other cases the content model is rewritten.
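
The two lightest kinds of change mentioned here, adding a class membership and adding an attribute, might look as follows in ODD source. This is a hedged sketch: it uses date and orig as examples, and it assumes that model.token is declared elsewhere in this customization, since it is not a class of unmodified TEI P5.

```xml
<!-- Sketch only: class addition and attribute addition in change mode. -->
<specGrp xmlns="http://www.tei-c.org/ns/1.0">
  <!-- Add date to the (customization-specific) model.token class. -->
  <elementSpec module="core" ident="date" mode="change">
    <classes mode="change">
      <memberOf key="model.token" mode="add"/>
    </classes>
  </elementSpec>

  <!-- Add a reg attribute to orig. -->
  <elementSpec module="core" ident="orig" mode="change">
    <attList>
      <attDef ident="reg" mode="add">
        <desc>gives a regularized (normalized) form of the text.</desc>
      </attDef>
    </attList>
  </elementSpec>
</specGrp>
```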

The following specification fragment indicates which elements are changed from the TEI P5 declarations.
Spec fragment specgroup-core-redefinitions

Redefinition of elements in core module.

Include spec fragment add_abbr_to_token_class.
Include spec fragment redefine_analytic.
Include spec fragment add_date_to_token_class.
Include spec fragment redefine_gap.
Include spec fragment add_hi_to_basic_class.
Include spec fragment redefine_imprint.
Include spec fragment redefine_l_part_attribute.
Include spec fragment add_lb_to_ids.milestones.
Include spec fragment redefine_lg_part_attribute.
Include spec fragment redefine_list.
Include spec fragment redefine_monogr.
Include spec fragment add_name_to_token_class.
Include spec fragment add_num_to_token_class.
Include spec fragment add_attributes_to_orig.
Include spec fragment add_TEIform_to_pb.
Include spec fragment ptr_as_milestone.
Include spec fragment add_attributes_to_q.
Include spec fragment redefine_sp.
Include spec fragment add_term_to_token_class.
Include spec fragment add_time_to_token_class.
Include spec fragment redefine_corr.
Include spec fragment redefine_reg.
Include spec fragment redefine_foreign.
Include spec fragment redefine_respStmt.
Include spec fragment redefine_gloss.
Include spec fragment redefine_p.
Include spec fragment redefine_head.
Include spec fragment redefine_item.
Include spec fragment redefine_emph.
Include spec fragment redefine_desc.

This spec fragment is referred to by specgroup-core.

Where possible, the list below notes the nature of the changes made.
  • abbr (abbreviation): contains an abbreviation of any sort.
    Spec fragment add_abbr_to_token_class
    Change element abbr:
    Classes
    (add) model.token
    (add) model.basic
    Attributes
     
    expan
    (add this attribute)
    gives an expansion of the abbreviation.
    Example:
    <abbr expan = "Deutsche Volkspartei 
    (1918-1933)">DVP</abbr>
    Example:
    <abbr expan = "Unabhängige 
    Sozialdemokratische Partei">USPD</abbr>
    Example:
    <abbr expan = "Deutsche 
    Demokratische Partei (1918-1930)">DDP</abbr>
    Example:
    <abbr expan = 
    "Arbeiter-und-Bauern-Fakultät">ABF</abbr>
    Example:
    <abbr expan = "Deutsche 
    Akademie der Künste (Berlin/Ost)">DAK</abbr>
    Example:
    <abbr expan = "Deutsche
    Akademie der Wissenschaften
    (Berlin/Ost)">DAW</abbr>
    Example:
    <abbr expan = "Deutsches 
    Pädagogisches Zentralinstitut">DPZI</abbr>
    Example:
    <abbr expan = "Deutscher 
    Schriftstellerverband">DSV</abbr>

    This spec fragment is referred to by specgroup-core-redefinitions.

  • address: contains a postal address, for example of a publisher, an organization, or an individual. (Not present in samples.)
  • analytic (analytic level): contains bibliographic elements describing an item (e.g. an article or poem) published within a monograph or journal and not as an independent publication. IDS/XCES modifies the content model to fit the three-level structure of IDS corpora.
    Spec fragment redefine_analytic
    Change element analytic:
    Change contents to
    (h.title+, (h.author | editor)*, (biblScope | biblNote)*, (edition, respStmt?)*, imprint+, idno*, (biblNote | biblScope)*)

    This spec fragment is referred to by specgroup-core-redefinitions.

    CES restricts the TEI content model and renames some elements; IDS/XCES extends the CES definition. The model given here is the same as for monogr.
  • author: in a bibliographic reference, contains the name(s) of the author(s), personal or corporate, of a work; for example in the same form as that provided by a recognized bibliographic name authority.
  • bibl (bibliographic citation): contains a loosely-structured bibliographic citation of which the sub-components may or may not be explicitly tagged.
  • biblScope (scope of citation): defines the scope of a bibliographic reference, for example as a list of page numbers, or a named subdivision of a larger work.
  • biblStruct (structured bibliographic citation): contains a structured bibliographic citation, in which only bibliographic sub-elements appear and in a specified order.
  • corr (correction): contains the correct form of a passage apparently erroneous in the copy text. IDS/XCES adds the attribute @sic (which gives the original form), as used, for example, in hi1bb.xces.
    Spec fragment redefine_corr
    Change element corr:
    Attributes
     
    sic
    (add this attribute)
    from the CES documentation: "gives the original form"
    Example:
    ToDo

    This spec fragment is referred to by specgroup-core-redefinitions.

  • date: contains a date in any format. CES declares this element as a member of the token class.
    Spec fragment add_date_to_token_class
    Change element date:
    Classes
    (add) model.token

    This spec fragment is referred to by specgroup-core-redefinitions.

  • desc: att.typed added for wiki talk (HLU, 2020-01-24). Note that in the current TEI, desc already has @type.
    Spec fragment redefine_desc
    Change element desc:
    Classes
    (add) att.typed
    (add) model.descLike

    This spec fragment is referred to by specgroup-core-redefinitions.

  • distinct: identifies any word or phrase which is regarded as linguistically distinct, for example as archaic, technical, dialectal, non-preferred, etc., or as forming part of a sublanguage.
  • editor: secondary statement of responsibility for a bibliographic item, for example the name of an individual, institution or organization, (or of several such) acting as editor, compiler, translator, etc.
  • emph: att.typed added for wiki talk (HLU, 2020-01-24).
    Spec fragment redefine_emph
    Change element emph:
    Classes
    (add) att.typed

    This spec fragment is referred to by specgroup-core-redefinitions.

  • foreign: identifies a word or phrase as belonging to some language other than that of the surrounding text. IDS/XCES allows it to contain q, e.g. in loz-div-pub.xces.
    Spec fragment redefine_foreign
    Change element foreign:
    Change contents to
    (#PCDATA | %model.phrase; | %model.global; | q)*

    This spec fragment is referred to by specgroup-core-redefinitions.

  • gap: indicates a point where material has been omitted in a transcription, whether for editorial reasons described in the TEI header, as part of sampling practice, or because the material is illegible, invisible, or inaudible. (In TEI P5, the description of the gap has moved from a desc attribute to a desc child; we revert this change for compatibility with existing data.)
    Spec fragment redefine_gap
    Change element gap:
    Change contents to
    Empty.
    Attributes
     
    desc
    (add this attribute)
    gives a description of the omitted material.
    Example:
    <p>
      <s type="manual">die anspruchsvolle Alternative 
        für Leichtraucher.</s>
      <s type="manual">Astor mild im Rauch 
        nikotinarm<gap desc="Zigarettenschachtel" 
        reason="omitted"/>.</s>
      <s type="manual">zum Anbieten und Verschenken 
        Astor mild-Kassette 48 Cigaretten DM 6,- 
        20 Astor mild DM 2,50.</s>
        <!--* ... *-->
    </p>
    Example:
    <s>Weit treffender haben aber jene 
    (die Griechen) die unteilbare Subsistenz 
    einer vernünftigen Natur mit dem Wort 
    'o'<gap desc="GREEKSMALLLETTERSTIGMA" 
            reason="omitted"/>benannt« 
    (Boethius, zitiert nach 
    Brasser 1999: 52).</s>

    This spec fragment is referred to by specgroup-core-redefinitions.

  • gloss: identifies a phrase or word used to provide a gloss or definition for some other word or phrase.
    Spec fragment redefine_gloss
    Change element gloss:
    Attributes
     
    target
    (change this attribute)
    the target of the pointer
    Type
    an XSD IDREF value

    The original TEI target attribute, from the class att.pointing, comes out as type CDATA in the DTD generated by ODD2DTD. (According to the TEI Guidelines the type is 'datapointer', which stands for a single URI; accordingly it correctly comes out as xs:anyURI in the generated i5.xsd.) In ids-xces.dtd, however, target on gloss was of type IDREF, so it is declared here with an explicit IDREF type.

    This spec fragment is referred to by specgroup-core-redefinitions.

  • head (heading): contains any type of heading, for example the title of a section, or the heading of a list, glossary, manuscript description, etc. Changed 2020-01-21 so that it can contain signed, for wiki talk.
    Spec fragment redefine_head
    Change element head:
    Change contents to
    (#PCDATA | %model.gLike; | %model.phrase; | %model.inter; | %model.global; | lg | %model.lLike; | %model.floatP.cmc;)*

    This spec fragment is referred to by specgroup-core-redefinitions.

  • hi (highlighted): marks a word or phrase as graphically distinct from the surrounding text, for reasons concerning which no claim is made.
    Spec fragment add_hi_to_basic_class
    Change element hi:
    Classes
    (add) model.basic
    Change contents to
    (#PCDATA | %model.gLike; | %model.phrase; | %model.inter; | %model.global; | lg | %model.lLike; | %model.floatP.cmc;)*

    This spec fragment is referred to by specgroup-core-redefinitions.

  • imprint: groups information relating to the publication or distribution of a bibliographic item. CES redefines this to use the pubDate element instead of date.
    Spec fragment redefine_imprint
    Change element imprint:
    Change contents to
    (pubPlace | publisher | pubDate)*

    This spec fragment is referred to by specgroup-core-redefinitions.

  • item: contains one component of a list. Redefined to contain signed as well, for wiki talk (HLU, 2020-01-02).
    Spec fragment redefine_item
    Change element item:
    Change contents to
    (#PCDATA | %model.gLike; | %model.phrase; | %model.inter; | %model.divPart; | %model.global; | %model.floatP.cmc;)*

    This spec fragment is referred to by specgroup-core-redefinitions.

  • l (verse line): contains a single, possibly incomplete, line of verse. CES redefined the meaning and values of the part attribute.
    Spec fragment redefine_l_part_attribute
    Change element l:
    Attributes
     
    part
    (redefine this attribute)
    indicates whether the verse line is metrically complete.
    Values
    y
    the line is metrically complete.
    n
    the line is metrically incomplete.
    u
    metricality is not known or inapplicable.

    Given the attribute name part, the value y might seem intuitively to mean “Yes, this is a partial line, not a full line,” but the CES documentation glosses y and n as shown.

    The attribute appears not to be actively used in any IDS samples, in any case; all values given are the default u.

    This spec fragment is referred to by specgroup-core-redefinitions.

  • label: contains the label associated with an item in a list; in glossaries, marks the term being defined.
  • lb (line break): marks the start of a new (typographic) line in some edition or version of a text. IDS-XCES adds @TEIform to the attribute list as used in fsp.xces and gr1.xces.
    Spec fragment add_lb_to_ids.milestones
    Change element lb:
    Classes
    (add) model.ids.milestones
    Attributes
     
    TEIform
    (add this attribute)
    TEIform
    Default value
    pb

    This spec fragment is referred to by specgroup-core-redefinitions.

  • lg (line group): contains a group of verse lines functioning as a formal unit, e.g. a stanza, refrain, verse paragraph, etc.
    Spec fragment redefine_lg_part_attribute
    Change element lg:
    Attributes
     
    part
    (redefine this attribute)
    indicates whether the verse line group is metrically complete.
    Values
    y
    the line is metrically complete.
    n
    the line is metrically incomplete.
    u
    metricality is not known or inapplicable.

    The attribute appears not to be actively used in any IDS samples; all values given are the default u.

    This spec fragment is referred to by specgroup-core-redefinitions.

  • list: contains any sequence of items organized as a list. IDS/XCES allows milestones and xptr elements among the children.
    Spec fragment redefine_list
    Change element list:
    Change contents to
    (head?, (item | (label, %model.ids.milestones;*, item) | %model.ids.milestones;)*)

    This spec fragment is referred to by specgroup-core-redefinitions.

  • measure: contains a word or phrase referring to some quantity of an object or commodity, usually comprising a number, a unit, and a commodity name.
  • mentioned: marks words or phrases mentioned, not used. (Not present in samples.)
  • monogr (monographic level): contains bibliographic elements describing an item (e.g. a book or journal) published as an independent item (i.e. as a separate physical object). CES restricts the TEI content model and renames some elements; IDS/XCES extends the CES definition. The model given here is the same as for analytic.
    Spec fragment redefine_monogr
    Change element monogr:
    Change contents to
    (h.title+, (h.author | editor)*, (biblScope | biblNote)*, (edition, respStmt?)*, imprint+, idno*, (biblNote | biblScope)*)

    This spec fragment is referred to by specgroup-core-redefinitions.

  • name (name, proper noun): contains a proper noun or noun phrase.
    Spec fragment add_name_to_token_class
    Change element name:
    Classes
    (add) model.token

    This spec fragment is referred to by specgroup-core-redefinitions.

  • note: contains a note or annotation.
  • num (number): contains a number, written in any form.
    Spec fragment add_num_to_token_class
    Change element num:
    Classes
    (add) model.token
    (add) model.basic

    This spec fragment is referred to by specgroup-core-redefinitions.

  • orig (original form): contains a reading which is marked as following the original, rather than being normalized or corrected. The reg attribute was dropped in TEI P5 and must be restored. CES also adds a regalt attribute which must be defined.
    Spec fragment add_attributes_to_orig
    Change element orig:
    Attributes
     
    reg
    (add this attribute)
    gives a regularized (normalized) form of the text.
    regalt
    (add this attribute)
    gives an alternate form of the regularized (normalized) text.
    Example:
    <s type="manual">der warnt - 
    wider die eiserne Regel des 
    <orig reg="Wahlkampfes" 
          regalt="Wahlkrampfes">Wahlk(r)ampfes</orig>, 
    einen Gegner durch Nichtnennung zu strafen - 
    davor, PDS zu wählen.</s>

    This spec fragment is referred to by specgroup-core-redefinitions.

  • p (paragraph): marks paragraphs in prose.
    Spec fragment redefine_p
    Change element p:
     
    (paragraph) marks paragraphs in prose. In CMC documents, notably wiki talk pages, signed must also be allowed inside paragraphs, because users insert their signature as part of the paragraph. This is the only change to the original content model of p.
    Change contents to
    (#PCDATA | %model.gLike; | %model.phrase; | %model.inter; | %model.global; | lg | %model.lLike; | %model.floatP.cmc;)*
    Example:

    Usenet news message

    <egXML>
    <p>Wer die ruhrtour Mailingliste noch nicht kennt, der schaut bitte weiter unten nach!</p>
    </egXML>

    This spec fragment is referred to by specgroup-core-redefinitions.

  • pb (page break): marks the boundary between one page of a text and the next in a standard reference system.
    Spec fragment add_TEIform_to_pb
    Change element pb:
    Classes
    (add) model.ids.milestones
    Attributes
     
    TEIform
    (add this attribute)
    Default value
    pb

    This spec fragment is referred to by specgroup-core-redefinitions.

  • ptr (pointer): defines a pointer to another location. IDS/XCES adds this element to the ids.milestones class.
    Spec fragment ptr_as_milestone
    Change element ptr:
    Classes
    (add) att.text
    (drop) att.global
    (add) model.ids.milestones
    Attributes
     
    target
    (change this attribute)
    the target of the pointer
    Type
    an XSD IDREFS value

    The original TEI target attribute, from the class att.pointing, comes out as type CDATA in the DTD generated by ODD2DTD. (According to the TEI Guidelines the type is 'datapointer', which stands for a single URI; accordingly it correctly comes out as xs:anyURI in the generated i5.xsd.) In ids-xces.dtd, however, it was of type IDREFS, so it is declared here with an explicit IDREFS type.

    This spec fragment is referred to by specgroup-core-redefinitions.

  • pubPlace (publication place): contains the name of the place where a bibliographic item was published.
  • publisher: provides the name of the organization responsible for the publication or distribution of a bibliographic item.
  • q (separated from the surrounding text with quotation marks): contains material which is marked as (ostensibly) being somehow different than the surrounding text, for any one of a variety of reasons including, but not limited to: direct speech or thought, technical terms or jargon, authorial distance, quotations from elsewhere, and passages that are mentioned but not used. CES adds several attributes.
    Spec fragment add_attributes_to_q
    Change element q:
    Attributes
     
    next
    (change this attribute)
    points to the next element of a virtual aggregate of which the current element is part. Specifically, for q elements, gives the ID of a subsequent q element which contains a continuation of the same quotation.
    Type
    an XSD IDREF value

    In TEI P5, this attribute is URI-valued; in P3 (and IDS/XCES), it is IDREF-valued.

    prev
    (change this attribute)
    points to the previous element of a virtual aggregate of which the current element is part. Specifically, for q elements, gives the ID of a preceding q element which contains the immediately preceding portion of the same quotation.
    Type
    an XSD IDREF value

    In TEI P5, this attribute is URI-valued; in P3 (and IDS/XCES), it is IDREF-valued.

    direct
    may be used to indicate whether the quoted matter is regarded as direct or indirect speech.
    Values
    y
    speech or thought is represented directly.
    n
    speech or thought is represented indirectly, e.g. by use of a marked verbal aspect.
    unspecified
    no claim is made.
    broken
    indicates whether this quotation or piece of dialog is broken between two or more q elements (linked using the next and prev attributes).
    Values
    y
    quotation is broken across two or more elements.
    n
    quotation is not broken across multiple elements.
    unspecified
    no claim is made.

    This spec fragment is referred to by specgroup-core-redefinitions.

  • quote (quotation): contains a phrase or passage attributed by the narrator or author to some agency external to the text.
  • ref (reference): defines a reference to another location, possibly modified by additional text or comment. IDS-XCES adds @orig.
  • reg (regularization): contains a reading which has been regularized or normalized in some sense.
    Spec fragment redefine_reg
    Change element reg:
    Attributes
     
    orig
    (add this attribute)
    for the original
    Example:
    ToDo

    This spec fragment is referred to by specgroup-core-redefinitions.

  • respStmt (statement of responsibility): supplies a statement of responsibility for the intellectual content of a text, edition, recording, or series, where the specialized elements for authors, editors, etc. do not suffice or do not apply.
    Spec fragment redefine_respStmt
    Change element respStmt:
    Change contents to
    (resp*, %model.nameLike.agent;+)

    This spec fragment is referred to by specgroup-core-redefinitions.

  • sp (speech): An individual speech in a performance text, or a passage presented as such in a prose or verse text. CES restricts this content model severely; IDS/XCES brings it back closer to the TEI form, and adds the class of IDS milestones to the legal content.
    Spec fragment redefine_sp
    Change element sp:
    Change contents to
    (speaker | p | quote | poem | stage | %model.ids.milestones;)*
    Attributes
     
    role
    (add this attribute)
    Role in plenary debate, such as presidency or ordinary MP
    Type
    #PCDATA
    name
    (add this attribute)
    Name of speaker
    Type
    #PCDATA
    parliamentary_group
    (add this attribute)
    Parliamentary group (German "Fraktion") of speaker
    Type
    #PCDATA
    party
    (add this attribute)
    Party of speaker
    Type
    #PCDATA
    Example:
    <sp who="Korzcak">
      <speaker/>
      <p>
        <s>Sie irren sich,
          <stage>erwiderte Korczak,</stage>
          nicht jeder ist ein Schuft<stage>, 
          und er schlug die Waggontür 
          hinter sich zu</stage>.</s>
      </p>
    </sp>

    This spec fragment is referred to by specgroup-core-redefinitions.

  • speaker: A specialized form of heading or label, giving the name of one or more speakers in a dramatic text or fragment.
  • stage (stage direction): contains any kind of stage direction within a dramatic text or fragment.
  • term: contains a single-word, multi-word, or symbolic designation which is regarded as a technical term.
    Spec fragment add_term_to_token_class
    Change element term:
    Classes
    (add) model.token

    This spec fragment is referred to by specgroup-core-redefinitions.

  • time: contains a phrase defining a time of day in any format.
    Spec fragment add_time_to_token_class
    Change element time:
    Classes
    (add) model.token

    This spec fragment is referred to by specgroup-core-redefinitions.

  • title: contains a title for any kind of work.
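
As a hedged illustration of two of the modifications listed above, the fragments below show how instance data might look under the revised declarations: a quotation split across two q elements linked by IDREF-valued next and prev attributes, and a list in which milestone elements appear between a label and its item. The text content (adapted from the sp example above), the identifier values q1 and q2, and the use of an attribute named id are assumptions made for illustration; they are not taken from the IDS samples.

```xml
<!-- Hypothetical instance data; not from the IDS corpora. -->

<!-- Fragment 1: a broken quotation linked via next/prev. -->
<p>
  <q id="q1" next="q2" broken="y" direct="y">Sie irren sich,</q>
  erwiderte er,
  <q id="q2" prev="q1" broken="y" direct="y">nicht jeder ist ein Schuft.</q>
</p>

<!-- Fragment 2: a list matching
     (head?, (item | (label, milestones*, item) | milestones)*). -->
<list>
  <head>Inhalt</head>
  <item>Vorwort</item>
  <label>1</label>
  <pb/>
  <item>Einleitung</item>
  <lb/>
</list>
```

Because next and prev are IDREF-valued here, they carry bare identifiers such as q2 rather than the URI references (for example #q2) used in unmodified TEI P5.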

1.3 The header module

The header module is also essential. We include it here:
Spec fragment specgroup-header
Include module header.
Include spec fragment specgroup-header-renamings.
Include spec fragment specgroup-header-deletions.
Include spec fragment specgroup-header-redefinitions.

This spec fragment is referred to by required-modules.

The teiHeader element is renamed to idsHeader, and we add some attributes (the status attribute was in TEI P3 but seems to have disappeared from P5):
Spec fragment specgroup-header-renamings

Renaming teiHeader as idsHeader ...

Change element teiHeader: rename as “idsHeader”.
Attributes
 
pattern
(add this attribute)
status
(add this attribute)
Default value
new
Values
new
update
version
(add this attribute)
TEIform
(add this attribute)
Default value
teiHeader

This spec fragment is referred to by specgroup-header.

Some elements in the TEI header module are suppressed; for short descriptions see the appendix.
Spec fragment specgroup-header-deletions

Deleting unused elements in header module ...

Drop element appInfo.
Drop element application.
Drop element authority.
Drop element cRefPattern.
Drop element geoDecl.
Drop element handNote.
Drop element interpretation.
Drop element namespace.
Drop element notesStmt.
Drop element principal.
Drop element refState.
Drop element rendition.
Drop element scriptNote.
Drop element seriesStmt.
Drop element sponsor.
Drop element stdVals.
Drop element typeNote.

This spec fragment is referred to by specgroup-header.

The IDS/XCES vocabulary includes the elements listed below from the TEI header module, sometimes with content models which extend or otherwise modify the content models of TEI P5 in such a way that instances of the revised element type are not valid against the unmodified TEI P5 schema, and sometimes without change. In several cases, CES changes a content model from requiring a sequence of paragraphs to allowing just character data. In other cases, specialized child elements are added to the content model.
Spec fragment specgroup-header-redefinitions

Modifying some elements and classes in header module ...

Include spec fragment modify_declarable_class.
Include spec fragment redefine_availability.
Include spec fragment redefine_creation.
Include spec fragment redefine_correction_as_PCDATA.
Include spec fragment redefine_edition.
Include spec fragment redefine_editionStmt_as_PCDATA.
Include spec fragment redefine_editorialDecl.
Include spec fragment redefine_language.
Include spec fragment redefine_normalization_as_PCDATA.
Include spec fragment redefine_projectDesc_as_PCDATA_or_p.
Include spec fragment redefine_publicationStmt.
Include spec fragment redefine_quotation_as_PCDATA.
Include spec fragment redefine_samplingDecl_as_PCDATA.
Include spec fragment redefine_segmentation_as_PCDATA.
Include spec fragment redefine_sourceDesc.
Include spec fragment redefine_tagsDecl.
Include spec fragment redefine_taxonomy.
Include spec fragment redefine_textClass.
Include spec fragment redefine_titleStmt.
Include spec fragment redefine_encodingDesc.
Include spec fragment redefine_catRef.
Include spec fragment define_xenoData.
Include spec fragment define_meta.

This spec fragment is referred to by specgroup-header.

In the attribute class declarable, CES renames one attribute from default to Default, and changes its values from yes and no to y and n.
Spec fragment modify_declarable_class
Change class att.declarable.
Attributes
 
default
(delete this attribute)
Default
(add this attribute)
Default value
n
Values
y
n

This spec fragment is referred to by specgroup-header-redefinitions.
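
A class modification like this one could be written in ODD roughly as follows. This is a sketch reconstructed from the summary above, not the actual I5 source; in particular, no type is given on valList because the summary does not say whether the list is closed.

```xml
<!-- Sketch only: reconstructed from the rendered summary above -->
<classSpec ident="att.declarable" type="atts" mode="change">
  <attList>
    <!-- remove the TEI attribute -->
    <attDef ident="default" mode="delete"/>
    <!-- add the CES-style replacement -->
    <attDef ident="Default" mode="add">
      <defaultVal>n</defaultVal>
      <valList>
        <valItem ident="y"/>
        <valItem ident="n"/>
      </valList>
    </attDef>
  </attList>
</classSpec>
```

In an instance, an element belonging to this class (editorialDecl, for example) would then carry Default="y" when it is to be treated as the default declaration.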

The elements included from the header module are these.
  • availability: supplies information about the availability of a text, for example any restrictions on its use or distribution, its copyright status, etc.
    Spec fragment redefine_availability
    Change element availability:
    Change contents to
    ((#PCDATA | %model.availabilityPart; | %model.pLike;)*)
    Attributes
     
    region
    (add this attribute)
    Default value
    world
    label
    (add this attribute)
    Values
    (closed list)
    QAO-NC
    QAO-NC-LOC:ids
    QAO-NC-LOC:ids-NU:1
    ACA-NC
    ACA-NC-LC
    CC-BY-SA
    Example:

    <availability region = "world" status = "unknown"/>

    This spec fragment is referred to by specgroup-header-redefinitions.

The IDS/XCES vocabulary includes the following elements from the TEI header module with content models which restrict the content models of TEI P5.
  • biblFull (fully-structured bibliographic citation): contains a fully-structured bibliographic citation, in which all components of the TEI file description are present.
  • catDesc (category description): describes some category within a taxonomy or text typology, either in the form of a brief prose description or in terms of the situational parameters used by the TEI formal textDesc.
  • catRef (category reference): specifies one or more defined categories within some taxonomy or text typology.
    Spec fragment redefine_catRef
    Change element catRef:
    Attributes
     
    target
    (change this attribute)
    the target of the pointer
    Type
    an XSD IDREFS value

    The original TEI target attribute of the class att.pointing is given the type CDATA by the ODD-to-DTD conversion. (According to the TEI Guidelines its type is 'datapointer', which stands for a single URI; this is why it correctly comes out as xs:anyURI in the generated i5.xsd.) In ids-xces.dtd, however, it was IDREFS. Hence I include it here with an explicit specification of the IDREFS type.

    This spec fragment is referred to by specgroup-header-redefinitions.

  • category: contains an individual descriptive category, possibly nested within a superordinate category, within a user-defined taxonomy.
  • change: summarizes a particular change or correction made to a particular version of an electronic text which is shared between several researchers.
  • classCode
  • classDecl (classification declarations): contains one or more taxonomies defining any classificatory codes used elsewhere in the text.
  • correction (correction principles): states how and under what circumstances corrections have been made in the text.
    Spec fragment redefine_correction_as_PCDATA
    Change element correction:
    Change contents to
    character data

    This spec fragment is referred to by specgroup-header-redefinitions.

  • creation: contains information about the creation of a text.
    Spec fragment redefine_creation
    Change element creation:
    Change contents to
    (creatDate, creatRef?, creatRefShort?)

    This spec fragment is referred to by specgroup-header-redefinitions.

  • distributor: supplies the name of a person or other agency responsible for the distribution of a text.
  • edition: describes the particularities of one edition of a text.
    Spec fragment redefine_edition
    Change element edition:
    Change contents to
    (further, kind, appearance)

    This spec fragment is referred to by specgroup-header-redefinitions.

  • editionStmt (edition statement): groups information relating to one edition of a text.
    Spec fragment redefine_editionStmt_as_PCDATA
    Change element editionStmt:
    Change contents to
    character data
    Attributes
     
    version

    This spec fragment is referred to by specgroup-header-redefinitions.

  • editorialDecl (editorial practice declaration): provides details of editorial principles and practices applied during the encoding of a text. CES changes the set of children for this element from P3 (suppressing interpretation and stdVals and adding transduction and conformance); IDS/XCES adds pagination to the children.
    Spec fragment redefine_editorialDecl
    Change element editorialDecl:
    Change contents to
    (pagination | correction | quotation | hyphenation | segmentation | transduction | normalization | conformance)+
    Attributes
     
    version

    This spec fragment is referred to by specgroup-header-redefinitions.

  • encodingDesc (encoding description): documents the relationship between an electronic text and the source or sources from which it was derived. IDS-XCES additionally allows an empty encodingDesc as used in dck.xces.
    Spec fragment redefine_encodingDesc
    Change element encodingDesc:
    Change contents to
    (%model.encodingDescPart; | %model.pLike;)*

    This spec fragment is referred to by specgroup-header-redefinitions.

  • extent: describes the approximate size of a text as stored on some carrier medium, whether digital or non-digital, specified in any convenient units.
  • fileDesc (file description): contains a full bibliographic description of an electronic file.
  • hyphenation: summarizes the way in which hyphenation in a source text has been treated in an encoded version of it.
  • idno (identifier): supplies any form of identifier used to identify some object, such as a bibliographic item, a person, a title, an organization, etc. in a standardized way.
  • keywords: contains a list of keywords or phrases identifying the topic or nature of a text.
  • langUsage (language usage): describes the languages, sublanguages, registers, dialects, etc. represented within a text.
  • language: characterizes a single language or sublanguage used within a text. TEI P5 uses an ident attribute, not id, to give the language code; IDS/XCES follows P3 in this.
    Spec fragment redefine_language
    Change element language:
    Attributes
     
    ident
    (delete this attribute)
    id
    (change this attribute)
    Supplies a language code constructed as defined in BCP 47 which is used to identify the language documented by this element, and which is referenced by the global attributes lang and xml:lang.
    Type
    an XSD ID value
    Example:
    <language id="de" usage="100">Deutsch</language>

    Note that for technical reasons it is not possible to assign the type ID to both the id and xml:id attributes. This version of I5 assigns the ID type to attribute id.

    This spec fragment is referred to by specgroup-header-redefinitions.

  • normalization: indicates the extent of normalization or regularization of the original source carried out in converting it to electronic form.
    Spec fragment redefine_normalization_as_PCDATA
    Change element normalization:
    Change contents to
    character data

    This spec fragment is referred to by specgroup-header-redefinitions.

  • profileDesc (text-profile description): provides a detailed description of non-bibliographic aspects of a text, specifically the languages and sublanguages used, the situation in which it was produced, the participants and their setting.
  • projectDesc (project description): describes in detail the aim or purpose for which an electronic file was encoded, together with any other relevant information concerning the process by which it was assembled or collected.
    Spec fragment redefine_projectDesc_as_PCDATA_or_p
    Change element projectDesc:
    Change contents to
    (#PCDATA | %model.pLike;)*

    This spec fragment is referred to by specgroup-header-redefinitions.

  • publicationStmt (publication statement): groups information concerning the publication or distribution of an electronic or other text.
    Spec fragment redefine_publicationStmt
    Change element publicationStmt:
    Change contents to
    ((distributor, pubAddress, telephone*, fax*, eAddress*, idno*, availability, pubDate, pubPlace*) | p+)
    Example:

    An example (actually, all of the publicationStmt elements in the available samples look like this, or else have no contents in any of their children):

    <publicationStmt>
      <distributor
        >Institut für Deutsche Sprache</distributor>
      <pubAddress
        >Postfach 10 16 21, D-68016 Mannheim</pubAddress>
      <telephone
        >+49 (0)621 1581 0</telephone>
      <availability region="world" 
        status="unknown"/>
      <pubDate/>
    </publicationStmt>

    This spec fragment is referred to by specgroup-header-redefinitions.

  • quotation: specifies editorial practice adopted with respect to quotation marks in the original.
    Spec fragment redefine_quotation_as_PCDATA
    Change element quotation:
    Change contents to
    character data

    This spec fragment is referred to by specgroup-header-redefinitions.

  • refsDecl (references declaration): specifies how canonical references are constructed for this text.
  • revisionDesc (revision description): summarizes the revision history for a file.
  • samplingDecl (sampling declaration): contains a prose description of the rationale and methods used in sampling texts in the creation of a corpus or collection.
    Spec fragment redefine_samplingDecl_as_PCDATA
    Change element samplingDecl:
    Change contents to
    (#PCDATA | p)*

    This spec fragment is referred to by specgroup-header-redefinitions.

  • segmentation: describes the principles according to which the text has been segmented, for example into sentences, tone-units, graphemic strata, etc.
    Spec fragment redefine_segmentation_as_PCDATA
    Change element segmentation:
    Change contents to
    character data

    This spec fragment is referred to by specgroup-header-redefinitions.

  • sourceDesc (source description): describes the source from which an electronic text was derived or generated, typically a bibliographic description in the case of a digitized text, or a phrase such as "born digital" for a text which has no previous existence.
    Spec fragment redefine_sourceDesc
    Change element sourceDesc:
    Change contents to
    (%model.pLike;*, (biblFull | biblStruct)+, reference*)

    This spec fragment is referred to by specgroup-header-redefinitions.

  • tagUsage: supplies information about the usage of a specific element within a text.
  • tagsDecl (tagging declaration): provides detailed information about the tagging applied to a document. TEI P5 requires that the tagUsage elements in the tagsDecl element be wrapped in a namespace element; IDS/XCES follows P3.
    Spec fragment redefine_tagsDecl
    Change element tagsDecl:
    Change contents to
    tagUsage+

    This spec fragment is referred to by specgroup-header-redefinitions.

  • taxonomy: defines a typology used to classify texts either implicitly, by means of a bibliographic citation, or explicitly by a structured taxonomy.
    Spec fragment redefine_taxonomy
    Change element taxonomy:
    Change contents to
    (category+ | ((h.bibl | biblStruct), category*))

    This spec fragment is referred to by specgroup-header-redefinitions.

  • textClass (text classification): groups information which describes the nature or topic of a text in terms of a standard classification scheme, thesaurus, etc.
    Spec fragment redefine_textClass
    Change element textClass:
    Change contents to
    (catRef | classCode | h.keywords)*

    This spec fragment is referred to by specgroup-header-redefinitions.

  • titleStmt (title statement): groups information about the title of a work and those responsible for its intellectual content. IDS/XCES adjusts the content model here to use the specialized title elements it defines for the different corpus levels. A corpus level-specific I5 {c|d|t}.title element is obligatory; in addition, original TEI title elements with suitable attributes may be specified, e.g. to give subtitles.
    Spec fragment redefine_titleStmt
    Change element titleStmt:
    Change contents to
    (((korpusSigle, c.title) | (dokumentSigle, d.title) | (textSigle, t.title) | (x.title)), %model.respLike;*)

    This spec fragment is referred to by specgroup-header-redefinitions.

  • xenoData
    Spec fragment define_xenoData
    <xenoData> element (new)
    Classes
    (add) model.teiHeaderPart
    Contains
    meta+

    This spec fragment is referred to by specgroup-header-redefinitions.

  • meta
    Spec fragment define_meta
    <meta> element (new)
    Contains
    character data
    Attributes
     
    name
    (add this attribute)
    name of feature
    Type
    #PCDATA
    project
    (add this attribute)
    name of project or collection
    Type
    #PCDATA
    type
    (add this attribute)
    string or text or keyword or integer
    Values
    (closed list)
    string
    string
    text
    text
    keyword
    keyword
    integer
    integer
    uri
    uri
    attachment
    only retrievable
    date
    date
    desc
    (add this attribute)
    description
    Type
    #PCDATA

    This spec fragment is referred to by specgroup-header-redefinitions.
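
As an illustration of the two new elements, a xenoData block might look like the following. The feature names, the project name, the descriptions and the values are all invented for this sketch; only the element names, the attribute names and the type values come from the definitions above.

```xml
<!-- Illustrative only: feature names, project name and values are invented -->
<xenoData>
  <meta name="pageCount" project="exampleProject" type="integer"
        desc="number of pages in the source">12</meta>
  <meta name="sourceUrl" project="exampleProject" type="uri"
        desc="address of the original document">http://www.example.org/doc/42</meta>
</xenoData>
```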

1.4 The textstructure module

The TEI textstructure module is also included:
Spec fragment specgroup-textstructure
Include module textstructure.
Include spec fragment specgroup-text-structure-renamings.
Include spec fragment specgroup-text-structure-deletions.
Include spec fragment specgroup-text-structure-changes.

This spec fragment is referred to by required-modules.

The TEI element is renamed idsText:
  • TEI (TEI document): contains a single TEI-conformant document, comprising a TEI header and a text, either in isolation or as part of a teiCorpus element. This corresponds in essential ways to the IDS idsText element.
Spec fragment specgroup-text-structure-renamings

Renaming ...

Change element TEI: rename as “idsText”.

This spec fragment is referred to by specgroup-textstructure.
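
With both renamings in place (teiHeader as idsHeader, TEI as idsText), the outline of a single text might look like this. The skeleton is a sketch only: the header content is abbreviated to a comment, and any attributes the elements may require are left out.

```xml
<!-- Skeleton only: header content and attributes are abbreviated -->
<idsText>
  <idsHeader>
    <!-- fileDesc, encodingDesc, profileDesc, ... -->
  </idsHeader>
  <text>
    <body>
      <div type="section">
        <p>...</p>
      </div>
    </body>
  </text>
</idsText>
```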

Several elements in this module are suppressed. Descriptions of these elements are given in the appendix.
Spec fragment specgroup-text-structure-deletions

Suppressing unused elements ...

Drop element argument.
Drop element div1.
Drop element div2.
Drop element div3.
Drop element div4.
Drop element div5.
Drop element div6.
Drop element div7.
Drop element docDate.
Drop element floatingText.
Drop element group.
Drop element imprimatur.

This spec fragment is referred to by specgroup-textstructure.

The IDS/XCES vocabulary includes the elements listed below from the TEI textstructure module. A few of these are included in CES and XCES, with declarations which extend or otherwise modify those in the TEI. Others are omitted from CES and XCES and have been added back into the vocabulary by IDS. In some cases, the IDS/XCES declaration extends that of TEI.
Spec fragment specgroup-text-structure-changes
Include spec fragment modify_attributes_for_div.
Include spec fragment redefine_opener.
Include spec fragment redefine_signed.

This spec fragment is referred to by specgroup-textstructure.

  • back (back matter): contains any appendixes, etc. following the main part of a text.
  • body (text body): contains the whole body of a single unitary text, excluding any front or back matter.
  • byline: contains the primary statement of responsibility given for a work on its title page or at the head or end of the work.
  • closer: groups together salutations, datelines, and similar phrases appearing as a final group at the end of a division, especially of a letter.
  • dateline: contains a brief description of the place, date, time, etc. of production of a letter, newspaper story, or other work, prefixed or suffixed to it as a kind of heading or trailer.
  • div (text division): contains a subdivision of the front, body, or back of a text. IDS/XCES eliminates the internal structure of the declarations in TEI and CES and allows a mixture of children in any order. We additionally allow the element posting from the DeRiK TEI proposal for CMC.
    Spec fragment modify_attributes_for_div
    Change element div:
    Change contents to
    (opener | head | byline | p | sp | u | %model.inter; | caption | figure | note | %model.divLike; | closer | %model.ids.milestones; | dateline)*
    Attributes
     
    complete
    (add this attribute)
    indicates whether the section is complete or a sample.
    Type
    an XSD NMTOKEN value
    Default value
    y
    Values
    (closed list)
    y
    The text section is complete.
    n
    The text section is incomplete (typically because it's a sample).
    type
    (redefine this attribute)
    the type of the text section.
    Type
    #PCDATA

    The most frequent values include: “section”, “Zeitung”, “book”, “Enzyklopädie-Artikel”, “Agenturmeldungen”, “figures”, “marginnotes”, “Rede”, “Zeitschrift”, “footnotes”, “Roman”, “content”, “preface”.

    Other values include: “abstract”, “Anmerkung”, “Ansprache”, “appendix”, “Aufruf”, “Aufsatz”, “Ausgabenvermerk”, “Beschluss”, “bibliography”, “Brief”, “captions”, “dedication”, “endnotes”, “Erklärung”, “Erzählung”, “Erzählungen”, “Fabeln”, “Forderung”, “Geschichte”, “glossary”, “Handzettel”, “Information”, “Interview”, “Kolumnen”, “Kriminalroman”, “Kurzgeschichten”, “Merkblatt”, “Nachwort”, “Novelle”, “postface”, “Predigt”, “Protokoll”, “Referat”, “Sachbuch”, “Sachbuch, Ratgeber”, “Schilderung”, “Sprechchöre und Transparente”, “Vorlesung”, “Vortrag”, “Wissenschaftszeitung”, and “Zeitungsartikel”.

    what
    (add this attribute)
    Subject of section in plenary debate according to GermaParlTEI
    Type
    #PCDATA
    desc
    (add this attribute)
    Description of section in plenary debate according to GermaParlTEI
    Type
    #PCDATA

    This spec fragment is referred to by specgroup-text-structure-changes.

  • docAuthor (document author): contains the name of the author of the document, as given on the title page (often but not always contained in a byline).
  • docEdition (document edition): contains an edition statement as presented on a title page of a document.
  • docImprint (document imprint): contains the imprint statement (place and date of publication, publisher name), as given (usually) at the foot of a title page.
  • docTitle (document title): contains the title of a document, including all its constituents, as given on a title page.
  • epigraph: contains a quotation, anonymous or attributed, appearing at the start of a section or chapter, or on a title page.
  • front (front matter): contains any prefatory matter (headers, title page, prefaces, dedications, etc.) found at the start of a document, before the main body.
  • opener: groups together dateline, byline, salutation, and similar phrases appearing as a preliminary group at the start of a division, especially of a letter. IDS-XCES adds gap to the content model.
    Spec fragment redefine_opener
    Change element opener:
    Change contents to
    (#PCDATA | %model.phrase; | dateline | keywords | salute | list | %model.global.edit; | pb | lb)*
    Attributes
     
    type
    (add this attribute)
    indicates the type of opener.
    Type
    an XSD NMTOKEN value
    Default value
    unspecified
    Values
    (closed list)
    lead
    unspecified

    This spec fragment is referred to by specgroup-text-structure-changes.

  • salute (salutation): contains a salutation or greeting prefixed to a foreword, dedicatory epistle, or other division of a text, or the salutation in the closing of a letter, preface, etc.
  • signed (signature): contains the closing salutation, etc., appended to a foreword, dedicatory epistle, or other division of a text.
    Spec fragment redefine_signed
    Change element signed:
     
    (signature) contains the closing salutation, etc., appended to a foreword, dedicatory epistle, or other division of a text, or appearing freely within paragraphs, sentences, quotations or the post as a whole especially of an email, or of a user contribution on a Wikipedia talk page.
    Classes
    (add) model.floatP.cmc
    Attributes
     
    type
    (add this attribute)
    Values
    (closed list)
    signed
    indicates that the corresponding posting was explicitly signed by a registered user using a user signature mark up (e.g. ~~~~).
    unsigned
    indicates that the corresponding posting was marked by either a registered or unregistered user using the Unsigned or Help template.
    user_contribution
    indicates that the corresponding posting was marked using a [[Special:Contributions/IP]] link (e.g. by an unregistered user).
    special_contribution
    added 2019-06-14; this is actually the same as "user_contribution": it indicates that the corresponding posting was marked using a [[Special:Contributions/IP]] link (e.g. by an unregistered user).

    This spec fragment is referred to by specgroup-text-structure-changes.

  • text: contains a single text of any kind, whether unitary or composite, for example a poem or drama, a collection of essays, a novel, a dictionary, or a corpus sample.
  • titlePage (title page): contains the title page of a text, appearing within the front or back matter.
  • titlePart: contains a subsection or division of the title of a work, as indicated on a title page.
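
The modified declarations of div, opener and signed can be illustrated with a short instance fragment. This is a sketch only: the text content and the user name are invented, and the second div assumes that signed may occur inside p, as its revised description states.

```xml
<!-- Illustrative only: the text content and the user name are invented -->
<body>
  <div type="Zeitungsartikel" complete="y">
    <head>Beispielüberschrift</head>
    <opener type="lead">Mannheim. Ein kurzer Vorspann zum Artikel.</opener>
    <p>Erster Absatz des Artikels.</p>
  </div>
  <div type="section" complete="n">
    <p>Ein Diskussionsbeitrag. <signed type="signed">Benutzer:Beispiel</signed></p>
  </div>
</body>
```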

2 Optional TEI modules

This section lists the optional TEI modules incorporated in whole or part into the IDS/XCES vocabulary.

Spec fragment optional-modules
Include spec fragment specgroup-analysis.
Include spec fragment specgroup-corpus.
Include spec fragment specgroup-figures.
Include spec fragment specgroup-namesdates.
Include spec fragment specgroup-linking.
Include spec fragment specgroup-spoken.

This spec fragment is referred to by top-level schema fragment ids_v2a.

2.1 The analysis module

The IDS/XCES vocabulary includes the following elements from the TEI analysis module:
  • s (s-unit): contains a sentence-like division of a text. CES and the existing IDS/XCES DTD define this using the parameter entity phrase.seq, but this relies on a different meaning for the phrase class than is present in TEI P5.
    Spec fragment redefine_s
    Change element s:
    Change contents to
    (#PCDATA | %model.phrase; | %model.global; | q | list | stage | %model.floatP.cmc; | quote | poem)*
    Attributes
     
    broken
    indicates whether this sentence is broken between two or more s elements (linked using the next and prev attributes).
    Values
    y
    sentence is represented by multiple s elements.
    n
    sentence is represented by a single s element.
    unspecified
    no claim is made.

    This attribute appears not to be in use in the IDS samples.

    This spec fragment is referred to by specgroup-analysis.

  • w (word): represents a grammatical (not necessarily orthographic) word.
    Spec fragment add_w_to_token_class
    Change element w:
    Classes
    (add) model.token
    Attributes
     
    pos
    contains a POS value
    Type
    #PCDATA
    orig
    (original) gives the original string or is the empty string when the element does not appear in the source text.
    Type
    #PCDATA
    head
    (add this attribute)
    head indicator as in conll-u
    Type
    teidata.count
    deprel
    (add this attribute)
    Type
    #PCDATA
    msd
    (add this attribute)
    Type
    #PCDATA
    join
    When present, it provides information on whether the token in question is adjacent to another, and if so, on which side. The definition of this attribute is adapted from ISO MAF (Morpho-syntactic Annotation Framework), ISO 24611:2012.
    Values
    (closed list)
    no
    The token is not adjacent to another
    left
    There is no whitespace on the left side of the token
    right
    There is no whitespace on the right side of the token
    both
    There is no whitespace on either side of the token
    overlap
    The token overlaps with another; other devices (specifying the extent and the area of overlap) are needed to more precisely locate this token in the character stream.

    This spec fragment is referred to by specgroup-analysis.
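
A tokenized sentence using these attributes might look like the following. The sketch makes several assumptions that are not part of the definitions above: the pos values are STTS tags, the deprel values are Universal Dependencies labels, and head follows the CoNLL-U convention of giving the position of the governing token, with 0 for the root. The join values record that Haus and the full stop are written without intervening whitespace.

```xml
<!-- Illustrative only: STTS tags and UD relation labels are assumptions -->
<s>
  <w pos="ART" head="2" deprel="det" join="no">Das</w>
  <w pos="NN" head="0" deprel="root" join="right">Haus</w><w pos="$." head="2" deprel="punct" join="left">.</w>
</s>
```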

The remaining elements in this module are suppressed:
Spec fragment specgroup-analysis
Include module analysis.
Include spec fragment redefine_s.
Include spec fragment add_w_to_token_class.
Drop element c.
Drop element cl.
Drop element interp.
Drop element interpGrp.
Drop element m.
Drop element pc.
Drop element phr.
Drop element span.
Drop element spanGrp.

This spec fragment is referred to by optional-modules.

See the appendix for brief descriptions.

2.2 The corpus module

The IDS/XCES vocabulary includes the following elements from the TEI corpus module:
  • particDesc (participation description): describes the identifiable speakers, voices, or other participants in any kind of text.

  • textDesc (text description): provides a description of a text in terms of its situational parameters.

    Spec fragment redefine_textDesc
    Change element textDesc:
    Change contents to
    (textType?, textTypeRef?, textTypeArt*, textDomain?, column?)

    This spec fragment is referred to by specgroup-corpus.
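
An instance conforming to the revised content model might look like this. All values are invented, and the sketch assumes that the child elements, which are defined elsewhere, contain plain text.

```xml
<!-- Illustrative only: all values are invented -->
<textDesc>
  <textType>Zeitung: Tageszeitung</textType>
  <textTypeArt>Bericht</textTypeArt>
  <textDomain>Politik</textDomain>
  <column>Lokales</column>
</textDesc>
```

The optional textTypeRef is simply left out here; the remaining children appear in the order the content model prescribes.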

The remaining elements in the module are suppressed.
Spec fragment specgroup-corpus
Include module corpus.
Include spec fragment redefine_textDesc.
Drop element channel.
Drop element constitution.
Drop element derivation.
Drop element domain.
Drop element factuality.
Drop element interaction.
Drop element locale.
Drop element preparedness.
Drop element purpose.

This spec fragment is referred to by optional-modules.

2.3 The figures module

The IDS/XCES vocabulary includes the following elements from the TEI figures module:
  • cell: contains one cell of a table.
  • figDesc (description of figure): contains a brief prose description of the appearance or content of a graphic figure, for use when documenting an image without displaying it.
  • row: contains one row of a table.
  • table: contains text displayed in tabular form, in rows and columns.
The following elements in the TEI figures module are redefined:
  • figure: groups elements representing or containing graphic information such as an illustration or figure. IDS-XCES adds ptr to the content model.
    Spec fragment redefine_figure
    Change element figure:
    Change contents to
    (%model.headLike; | ptr | %model.common; | figDesc | %model.graphicLike; | %model.global; | %model.divBottomPart; | %model.divWrapper; | %model.descLike;)*
    Example:
    ToDo

    This spec fragment is referred to by specgroup-figures.

The figures module is included, with figure redefined as specified above:
Spec fragment specgroup-figures
Include module figures.
Include spec fragment redefine_figure.

This spec fragment is referred to by optional-modules.

2.4 The spoken module

As a conservative extension, the I5 vocabulary now includes the element u from the spoken module, to allow the inclusion of transcripts of spoken language in DeReKo.

The following specification fragment includes the spoken module and makes appropriate modifications to it.
Spec fragment specgroup-spoken
Include module spoken.

The following elements are deleted from the module spoken:

Drop element annotationBlock.
Drop element broadcast.
Drop element equipment.
Drop element incident.
Drop element kinesic.
Drop element pause.
Drop element recording.
Drop element recordingStmt.
Drop element scriptStmt.
Drop element shift.
Drop element transcriptionDesc.
Drop element vocal.
Drop element writing.

This spec fragment is referred to by optional-modules.

2.5 The namesdates module

As a conservative extension, the I5 vocabulary now includes 11 elements from the namesdates module. The rest of the module could equally well be included, but its elements are suppressed because they do not seem to be needed for now.

The following specification fragment includes the namesdates module and makes appropriate modifications to it.
Spec fragment specgroup-namesdates
Include module namesdates.

The following elements are deleted from the module namesdates:

Drop element addName.
Drop element affiliation.
Drop element age.
Drop element birth.
Drop element bloc.
Drop element climate.
Drop element death.
Drop element district.
Drop element education.
Drop element event.
Drop element faith.
Drop element floruit.
Drop element genName.
Drop element geo.
Drop element geogFeat.
Drop element langKnowledge.
Drop element langKnown.
Drop element listEvent.
Drop element listNym.
Drop element listPlace.
Drop element listRelation.
Drop element location.
Drop element nameLink.
Drop element nationality.
Drop element nym.
Drop element occupation.
Drop element offset.
Drop element org.
Drop element personGrp.
Drop element place.
Drop element population.
Drop element relation.
Drop element relationGrp.
Drop element residence.
Drop element roleName.
Drop element settlement.
Drop element sex.
Drop element socecStatus.
Drop element state.
Drop element terrain.
Drop element trait.

This spec fragment is referred to by optional-modules.

2.6 The linking module

In DeReKo, the elements ref, ptr, and xptr are used for linking. ref is already included in I5 through the core module. The elements xref and xptr were declared in the linking module in TEI P3 and P4, but they are no longer part of TEI P5. From the TEI P5 linking module, only the elements seg, timeline, and when are taken. They are needed for the encoding of CMC documents.

The following specification fragment includes the linking module and deletes all elements except seg, timeline, and when.

Spec fragment specgroup-linking
Include module linking.

The following elements are deleted from the module linking:

Drop element ab.
Drop element anchor.
Drop element alt.
Drop element altGrp.
Drop element join.
Drop element joinGrp.
Drop element link.
Drop element linkGrp.

Choosing the linking module automatically includes the linking attributes corresp, synch, sameAs, copyOf, next, prev, exclude, and select. All linking attributes also belong to att.global and can thus appear almost anywhere.

This spec fragment is referred to by optional-modules.
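
In ODD source, the pattern of including a module and then dropping individual elements is usually written with a moduleRef followed by one elementSpec in delete mode per unwanted element. The following sketch is reconstructed from the list above and is not copied from the I5 ODD file.

```xml
<!-- Sketch only: reconstructed from the rendered summary above -->
<specGrp xml:id="specgroup-linking">
  <moduleRef key="linking"/>
  <!-- each unwanted element is removed with an elementSpec in delete mode -->
  <elementSpec ident="ab" module="linking" mode="delete"/>
  <elementSpec ident="anchor" module="linking" mode="delete"/>
  <elementSpec ident="alt" module="linking" mode="delete"/>
  <elementSpec ident="altGrp" module="linking" mode="delete"/>
  <elementSpec ident="join" module="linking" mode="delete"/>
  <elementSpec ident="joinGrp" module="linking" mode="delete"/>
  <elementSpec ident="link" module="linking" mode="delete"/>
  <elementSpec ident="linkGrp" module="linking" mode="delete"/>
</specGrp>
```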

2.7 TEI modules not included

The following optional TEI modules are not included in this customization:
  • The certainty module (for recording points of uncertainty and dispute).

  • The dictionaries module (for print or electronic dictionaries).

  • The drama module. (N.B. the caption element of TEI P5 which is included in this module has nothing to do with the caption element introduced by CES as an extension of TEI, and retained by IDS/XCES.)

  • The gaiji module (for extending the Unicode / ISO 10646 universal character set).

  • The fs module (for the representation of feature structures for linguistic or other analysis).

  • The msdescription module (for description of manuscript materials).

  • The nets module (for representation of graphs, networks, and trees).

  • The tagdocs module (for documentation of XML vocabularies).

  • The textcrit module (for the representation of text-critical apparatus as used in scholarly editions).

  • The transcr module (for markup of transcriptions of original source material).

  • The verse module (for markup of metrical phenomena in verse).

3 Models from cmc-core

Spec fragment cmc-core

The module cmc, the classes model.floatP.cmc and att.global.cmc

<moduleSpec ident = "cmc">
 
Elements, attributes, and models for computer-mediated communication (CMC). This module collects all CMC-specific extensions, so they are added to a TEI schema only when this module is selected in a customisation.
</moduleSpec>
Add class model.floatP.cmc.
 
includes the TEI element signed (see the definition of signed) so that it can occur freely and repeatedly within the elements that have model.floatP.cmc as part of their content model. In the original TEI, signed is p-like, i.e. restricted to occur between paragraphs only, reflecting the more rigid structure of written letters. This extension is needed to mark user signatures in Wikipedia talk pages, which occur within p, s, head, hi, and item.
Add class att.global.cmc.
 
For the time being, this class contains only the attribute creation, an additional global attribute that records how the content of an element was created in a CMC environment.
Attributes
 
creation
(add this attribute)
Marks how the content of the respective element was generated in a CMC environment.
Values
(closed list)
human
The content of the respective element was "naturally" typed or spoken by a human user.
template
The content of the respective element was generated after a human user activated a template for its insertion.
system
The content of the respective element was generated by the system, i.e. the CMC environment.
bot
The content of the respective element was generated by a bot, i.e. a non-human agent, mostly external to the CMC environment.
unspecified
How the content of the respective element was generated is unknown or unspecified.
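
In ODD source, an attribute class with a closed value list of this kind is typically declared as follows. This is a sketch rather than the actual source of the cmc module: the desc texts are shortened paraphrases, and usage="opt" is an assumption not stated in this section.

```xml
<classSpec xmlns="http://www.tei-c.org/ns/1.0"
           ident="att.global.cmc" type="atts" mode="add" module="cmc">
  <desc>provides the global attribute creation for CMC content</desc>
  <attList>
    <!-- usage="opt" is assumed here; the section does not state it -->
    <attDef ident="creation" mode="add" usage="opt">
      <desc>marks how the content of the element was generated
        in a CMC environment</desc>
      <!-- closed list: no other values are allowed -->
      <valList type="closed">
        <valItem ident="human">
          <desc>typed or spoken by a human user</desc>
        </valItem>
        <valItem ident="template">
          <desc>inserted by a template that a human user activated</desc>
        </valItem>
        <valItem ident="system">
          <desc>generated by the CMC environment itself</desc>
        </valItem>
        <valItem ident="bot">
          <desc>generated by a bot, i.e. a non-human agent</desc>
        </valItem>
        <valItem ident="unspecified">
          <desc>unknown or unspecified</desc>
        </valItem>
      </valList>
    </attDef>
  </attList>
</classSpec>
```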
Example:

automatic system message in chat: user moves on to another chatroom

<egXML>
<post type = "event" creation = "system" who = "#system" rend = "color:blue">
<p>
<name type = "nickname" corresp = "#A02">McMike</name>
geht in einen anderen Raum:
<name type = "roomname">Kreuzfahrt</name>
</p>
</post>
</egXML>
Example:

automatic system message in chat: user enters a chatroom

<egXML>
<post type = "event" creation = "system">
<p>
<name type = "nickname" corresp = "#A08">c_bo</name>
betritt den Raum. </p>
</post>
</egXML>
Example:

automatic system message in chat: user changes his font color

<egXML>
<post type = "event" creation = "system" rend = "color:red">
<p>
<name type = "nickname" corresp = "#A08">c_bo</name>
hat die Farbe gewechselt. </p>
</post>
</egXML>
Example:

An automatic user signature including an automatic timestamp (Wikipedia discussion, anonymized). The specification of creation on the inner element signed is meant to override the specification on the outer element post. This is generally possible when the outer creation value is "human".

<egXML>
<post type = "standard" creation = "human" indentLevel = "2" synch = "t00394407" who = "WU00005582">
<p> Kurze Nachfrage: Die Hieros für den Goldnamen stammen auch von Beckerath gem. Literatur ? Grüße --
<signed creation = "template">
<gap reason = "signatureContent"></gap>
<time creation = "template" who = "WU00005582">18:50, 22. Okt. 2008 (CEST)</time>
</signed>
</p>
</post>
</egXML>
Example:

Usenet news message: a client-generated line that introduces a quotation from a previous message (similar to email):

<egXML>
<cit type = "replyCit">
<bibl type = "introQuote" creation = "template">Am 03.04.2015 um 09:46 schrieb [_NAME_]:</bibl>
<quote> </quote>
</cit>
</egXML>
Example:

Wikipedia talk page, user signature

<egXML>
<signed creation = "template" who = "WU00018921">
<gap reason = "signatureContent"></gap>
<time creation = "template" who = "WU00018921">12:01, 12. Jun. 2009 (CEST)</time>
</signed>
</egXML>

This spec fragment is referred to by top-level schema fragment ids_v2a.

4 Elements added by IDS

The elements described in this section are not direct equivalents of any individual TEI element. They fall into several categories, which are distinguished primarily for purposes of vocabulary maintenance:
  • elements taken over without change from CES and XCES
  • elements defined by (X)CES and modified by IDS/XCES
  • renamings of TEI elements
  • elements taken over from TEI P3 which are no longer present in P5
  • elements added by IDS
  • elements defined by DERIK
  • elements defined by the TEI Correspondence SIG
The following specification fragment includes each of these groups in turn:
Spec fragment IDS-additions
Include spec fragment specgroup-XCES-unchanged.
Include spec fragment specgroup-XCES-changed.
Include spec fragment specgroup-renamings.
Include spec fragment specgroup-TEI-P3.
Include spec fragment specgroup-IDS-specific.
Include spec fragment specgroup-DERIK.
Include spec fragment specgroup-Correspondence.

This spec fragment is referred to by top-level schema fragment ids_v2a.
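
In ODD source, a fragment that does nothing but include other fragments is usually a specGrp containing one specGrpRef per included fragment. The following sketch shows how IDS-additions might look under that assumption; the use of xml:id values identical to the fragment names is also an assumption.

```xml
<specGrp xmlns="http://www.tei-c.org/ns/1.0" xml:id="IDS-additions">
  <!-- Each reference pulls in one of the groups described in sections 4.1 ff. -->
  <specGrpRef target="#specgroup-XCES-unchanged"/>
  <specGrpRef target="#specgroup-XCES-changed"/>
  <specGrpRef target="#specgroup-renamings"/>
  <specGrpRef target="#specgroup-TEI-P3"/>
  <specGrpRef target="#specgroup-IDS-specific"/>
  <specGrpRef target="#specgroup-DERIK"/>
  <specGrpRef target="#specgroup-Correspondence"/>
</specGrp>
```

The top-level schema fragment ids_v2a would then need only a single reference to #IDS-additions to obtain all seven groups.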

4.1 Elements taken over without change from CES and XCES

A number of elements in IDS/XCES are taken over from the CES and XCES vocabularies.
  • annotation: provides information about one external annotation document associated with the text. (Not present in samples.)
    Spec fragment ids-header-annotation
    <annotation> element (new)
     
    provides information about one external annotation document associated with the text.
    Classes
    att.global
    Contains
    character data
    Attributes
     
    type
    indicates the type of annotation.
    Type
    #PCDATA
    Values
    (open list)
    segment
    annotation file contains segmentation into words and sentences.
    gram
    annotation file contains morpho-syntactic category information for the words in the text.
    align
    annotation file contains alignment links to a parallel translation.
    ann.loc
    provides information (path/file name, URL, etc.) about the location of the annotation file.
    Type
    #PCDATA
    trans.loc
    for annotation file containing alignment information, provides information (path/file name, URL, etc.) about the location of the file containing the aligned text.
    Type
    #PCDATA

    This spec fragment is referred to by specgroup-XCES-unchanged.

  • annotations (in file ids.xheader.elt): child of profileDesc. Groups annotation elements. (Not present in samples.)
    Spec fragment ids-header-annotations
    <annotations> element (new)
     
    groups information about annotation documents associated with the text.
    Classes
    att.global
    Contains
    annotation+

    This spec fragment is referred to by specgroup-XCES-unchanged.

  • biblNote: a descriptive note supplying additional information of any kind relating to a bibliographic item described within a corpus or text header.
    Spec fragment ids-header-biblNote
    <biblNote> element (new)
     
    child of analytic and monogr. #PCDATA, but otherwise roughly equivalent to TEI note.
    Classes
    att.global
    Contains
    character data
    Example:
    <biblNote n="1">Die Datengrundlage 
    der Tagebücher selbst (17. Juni bis 31. Dezember 1945) 
    bildet: Klemperer, Victor: So sitze ich denn zwischen 
    allen Stühlen, Bd. 1, Tagebücher 1945-1949, Hrsg.: 
    Nowojski, Walter; unter Mitarbeit von Christian Löser. - 
    Berlin: Aufbau-Verlag</biblNote>
                          
    Example:
    <biblNote n="1">ID:5FDC73, 
    2007.01.01 10:11</biblNote>

    This spec fragment is referred to by specgroup-XCES-unchanged.

  • byteCount: contains the count of bytes in the file containing the text together with its markup. (Not present in samples.)
    Spec fragment ids-header-byteCount
    <byteCount> element (new)
     
    child of extent; #PCDATA.
    Classes
    att.global
    Contains
    character data
    Attributes
     
    units
    Default value
    kb
    Values
    bytes
    kb
    mb
    gb

    This spec fragment is referred to by specgroup-XCES-unchanged.

  • changeDate: gives the date of a change (as child of change). (Not present in samples.)
    Spec fragment ids-header-changeDate
    <changeDate> element (new)
     
    child of change; #PCDATA; context-dependent specialization of TEI date.
    Classes
    att.global
    Contains
    character data
    Attributes
     
    value
    Type
    an XSD date value

    This spec fragment is referred to by specgroup-XCES-unchanged.

  • conformance: provides the CES level of conformance for the text or corpus. (Not present in samples.)
    Spec fragment ids-header-conformance
    <conformance> element (new)
     
    child of editorialDecl; #PCDATA plus level attribute.
    Classes
    att.global
    Contains
    character data
    Attributes
     
    level
    Default value
    0
    Values
    0
    1
    2
    3

    This spec fragment is referred to by specgroup-XCES-unchanged.

  • eAddress: gives an electronic address of the person or institution who distributes the text or corpus. Note that more than one occurrence of this tag can appear, so that multiple addresses (possibly of different types) can be included. (Not present in samples.)
    Spec fragment ids-header-eAddress
    <eAddress> element (new)
     
    child of publicationStmt, provides electronic address of distributor; #PCDATA.
    Classes
    att.global
    Contains
    character data
    Attributes
     
    type
    Type
    an XSD string value
    Default value
    email

    This spec fragment is referred to by specgroup-XCES-unchanged.

  • extNote: a descriptive note supplying additional information of any kind relating to the extent information provided within a corpus or text header. (Not present in samples.)
    Spec fragment ids-header-extNote
    <extNote> element (new)
     
    child of extent; provides additional information about extent of document. #PCDATA.
    Classes
    att.global
    Contains
    character data

    This spec fragment is referred to by specgroup-XCES-unchanged.

  • fax: gives the fax number of the person or institution who distributes the text or corpus, in a format conformant to ITU-T/CCITT Recommendation E.123. (Not present in samples.)
    Spec fragment ids-header-fax
    <fax> element (new)
     
    child of publicationStmt, provides fax number of distributor in CCITT E.123 form; #PCDATA.
    Classes
    att.global
    Contains
    character data

    This spec fragment is referred to by specgroup-XCES-unchanged.

  • h.author: in a bibliographic reference, contains the name of an author (personal or corporate) of a work; a context-specific renaming of the TEI author element. CES specifies that names should be given in a canonical form, with surnames preceding forenames, but IDS practice is not consistent in this regard.
    Spec fragment ids-header-h.author
    <h.author> element (new)
     
    child of analytic and monogr; context-specific renaming of TEI author element; #PCDATA.
    Classes
    att.global
    Contains
    character data
    Example:
    <h.author>Matthias Kunert</h.author>
    Example:
    <h.author>Fronk, Eleonore; 
    Andreas, Werner</h.author>

    This spec fragment is referred to by specgroup-XCES-unchanged.

  • h.bibl: character data only, suitable for very simple citations.
    Spec fragment ids-header-h.bibl
    <h.bibl> element (new)
     
    child of taxonomy; #PCDATA (sic).
    Classes
    att.global
    Contains
    character data
    Example:
    <taxonomy id="topic">
      <h.bibl>Thementaxonomie (siehe 
        http://www.ids-mannheim.de/kl/projekte/methoden/te.html)
      </h.bibl>
      <category id="topic.fiktion">
        <catDesc>Fiktion</catDesc>
        <category id="topic.fiktion.vermischtes">
          <catDesc>Fiktion:Vermischtes</catDesc>
        </category>
      </category>
      ...
    </taxonomy>

    This spec fragment is referred to by specgroup-XCES-unchanged.

  • h.item: (as child of change element) specifies the nature of the change(s). One or more occurrences of this element may appear within each change element. Context-dependent renaming of standard TEI item. (Not present in samples.)
    Spec fragment ids-header-h.item
    <h.item> element (new)
     
    child of change; context-dependent renaming of standard TEI item.
    Classes
    att.global
    Contains
    character data

    This spec fragment is referred to by specgroup-XCES-unchanged.

  • h.keywords: contains a list of keywords or phrases identifying the topic or nature of a text, each of which is tagged as a term. (Renaming of TEI keywords, plus modified content model.)
    Spec fragment ids-header-h.keywords
    <h.keywords> element (new)
     
    (in file ids.xheader.elt): child of textClass. Contains a list of keywords or phrases identifying the topic or nature of a text, each of which is tagged as a term. (Renaming of TEI keywords, plus modified content model.)
    Classes
    att.global
    Contains
    keyTerm+
    Example:
    <h.keywords>
      <keyTerm>Bau/Leiharbeit</keyTerm>
    </h.keywords>

    This spec fragment is referred to by specgroup-XCES-unchanged.

  • h.title: the title of the electronic file, including alternative titles or subtitles. Context-specific renaming of TEI title.
    Spec fragment ids-header-h.title
    <h.title> element (new)
     
    child of analytic and monogr; context-specific renaming of TEI title; #PCDATA.
    Classes
    att.global
    Contains
    character data
    Attributes
     
    type
    Type
    an XSD NMTOKEN value
    Default value
    main
    Values
    main
    sub
    abbr
    level
    Type
    an XSD NMTOKEN value
    Values
    m
    a
    Example:
    <h.title type="main">IKB verschiebt 
    Halbjahresbericht erneut</h.title>

    This spec fragment is referred to by specgroup-XCES-unchanged.

  • keyTerm (in file ids.xheader.elt): child of h.keywords, encloses one keyword term describing the text. Context-specific renaming of standard TEI element term.
    Spec fragment ids-header-keyTerm
    <keyTerm> element (new)
     
    child of h.keywords, encloses one keyword term describing the text. Context-specific renaming of standard TEI element term.
    Classes
    att.global
    Contains
    character data
    Attributes
     
    type
    indicates the type of keyTerm (person, country)
    Type
    #PCDATA
    subtype
    indicates the subtype of keyTerm
    Type
    #PCDATA

    This spec fragment is referred to by specgroup-XCES-unchanged.

  • pubAddress (in file ids.xheader.elt): child of publicationStmt, provides address of distributor. Context-specific specialization of TEI address element; #PCDATA.
    Spec fragment ids-header-pubAddress
    <pubAddress> element (new)
     
    child of publicationStmt, provides address of distributor. Context-specific specialization of TEI address element; #PCDATA.
    Classes
    att.global
    Contains
    character data

    This spec fragment is referred to by specgroup-XCES-unchanged.

  • pubDate (in file ids.xheader.elt): child of publicationStmt, provides date of publication. Context-specific specialization of TEI date element; #PCDATA.
    Spec fragment ids-header-pubDate
    <pubDate> element (new)
     
    child of publicationStmt, provides date of publication. Context-specific specialization of TEI date element; #PCDATA.
    Classes
    att.global
    att.datable.iso
    Contains
    character data
    Attributes
     
    type
    Type
    an XSD NMTOKEN value
    Values
    year
    month
    day
    time

    This spec fragment is referred to by specgroup-XCES-unchanged.

  • respName (in file ids.xheader.elt): child of respStmt (where it is a context-dependent renaming of name) and change. Contains #PCDATA only.
    Spec fragment ids-header-respName
    <respName> element (new)
     
    child of respStmt (where it is a context-dependent renaming of name) and change. Contains #PCDATA only.
    Classes
    att.global
    Contains
    character data

    This spec fragment is referred to by specgroup-XCES-unchanged.

  • respType (in file ids.xheader.elt): child of respStmt; context-specific renaming of standard TEI resp
    Spec fragment ids-header-respType
    <respType> element (new)
     
    child of respStmt; context-specific renaming of standard TEI resp
    Classes
    att.global
    Contains
    character data

    This spec fragment is referred to by specgroup-XCES-unchanged.

  • telephone (in file ids.xheader.elt): child of publicationStmt, provides telephone number of distributor in CCITT E.123 form; #PCDATA.
    Spec fragment ids-header-telephone
    <telephone> element (new)
     
    child of publicationStmt, provides telephone number of distributor in CCITT E.123 form; #PCDATA.
    Classes
    att.global
    Contains
    character data

    This spec fragment is referred to by specgroup-XCES-unchanged.

  • transduction: (as child of editorialDecl) describes the principles according to which the text has been transduced, either in transcribing it from audio tape to written form, or in converting from an electronic original.
    Spec fragment ids-header-transduction
    <transduction> element (new)
     
    child of editorialDecl; #PCDATA.
    Classes
    att.header
    att.declarable
    Contains
    character data

    This spec fragment is referred to by specgroup-XCES-unchanged.

  • translation (in file ids.xheader.elt): child of translations. Gives information about one translation of the text.
    Spec fragment ids-header-translation
    <translation> element (new)
     
    child of translations. Gives information about one translation of the text.
    Classes
    att.global
    Contains
    character data
    Attributes
     
    trans.loc
    Type
    an XSD string value

    This spec fragment is referred to by specgroup-XCES-unchanged.

  • translations (in file ids.xheader.elt): child of profileDesc; groups translation elements.
    Spec fragment ids-header-translations
    <translations> element (new)
     
    child of profileDesc; groups translation elements.
    Classes
    att.global
    Contains
    (translation, translator?)+

    This spec fragment is referred to by specgroup-XCES-unchanged.

  • translator (in file ids.xheader.elt): identifies the translator responsible for one translation.
    Spec fragment ids-header-translator
    <translator> element (new)
     
    identifies the translator responsible for one translation.
    Classes
    att.global
    Contains
    character data

    This spec fragment is referred to by specgroup-XCES-unchanged.

  • wordCount: (as child of extent) contains the count of words in the text. (Not present in samples.)
    Spec fragment ids-header-wordCount
    <wordCount> element (new)
     
    child of extent; #PCDATA.
    Classes
    att.global
    Contains
    character data

    This spec fragment is referred to by specgroup-XCES-unchanged.

  • writingSystem (in file ids.xheader.elt): child of wsdUsage; describes one character set used in the document; can point to an external writing system declaration. (This element appears to be a survival from the SGML version of CES; in XML, character set issues are typically handled at a different level.) (Not present in samples.)
    Spec fragment ids-header-writingSystem
    <writingSystem> element (new)
     
    child of wsdUsage; describes one character set used in the document; can point to an external writing system declaration.
    Classes
    att.global
    Contains
    character data

    This spec fragment is referred to by specgroup-XCES-unchanged.

  • wsdUsage: groups information describing the character set(s) used within a text. (Not present in samples.)
    Spec fragment ids-header-wsdUsage
    <wsdUsage> element (new)
     
    child of profileDesc; groups writingSystem elements.
    Classes
    att.global
    Contains
    writingSystem+

    This spec fragment is referred to by specgroup-XCES-unchanged.
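
Several of the elements above are marked as not present in the samples. The following hypothetical header fragment sketches how annotations, annotation, translations, translation, and translator might be combined according to the content models and attributes given above. The file names and text content are invented for illustration, and the order of the children within profileDesc is an assumption, since it is not specified in this section.

```xml
<profileDesc>
  <!-- annotations groups one or more annotation elements -->
  <annotations>
    <!-- type values taken from the open list: segment, gram, align -->
    <annotation type="segment" ann.loc="sample-seg.xml">Segmentation
      into words and sentences</annotation>
    <!-- for alignment files, trans.loc locates the aligned text -->
    <annotation type="align" ann.loc="sample-align.xml"
                trans.loc="sample-en.xml">Alignment with a
      parallel translation</annotation>
  </annotations>
  <!-- translations contains (translation, translator?)+ -->
  <translations>
    <translation trans.loc="sample-en.xml">English translation</translation>
    <translator>N. N.</translator>
  </translations>
</profileDesc>
```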

CES also defined some new classes for attribute inheritance and content models.
Spec fragment ces-attribute-classes
Add class att.header.
Classes
att.global
Add class att.text.
Classes
att.global
Attributes
 
wsd
Type
an XSD string value
Add class model.token.
 
identifies a group of low-level tokens and similar elements

This spec fragment is referred to by specgroup-XCES-unchanged.

CES moves the rend attribute from the list of global attributes to the text class; for now, we follow TEI here.

The following specification fragment incorporates all the descriptions for the elements just mentioned:
Spec fragment specgroup-XCES-unchanged
Include spec fragment ces-attribute-classes.
Include spec fragment ids-header-annotation.
Include spec fragment ids-header-annotations.
Include spec fragment ids-header-biblNote.
Include spec fragment ids-header-byteCount.
Include spec fragment ids-header-changeDate.
Include spec fragment ids-header-conformance.
Include spec fragment ids-header-eAddress.
Include spec fragment ids-header-extNote.
Include spec fragment ids-header-fax.
Include spec fragment ids-header-h.author.
Include spec fragment ids-header-h.bibl.
Include spec fragment ids-header-h.item.
Include spec fragment ids-header-h.keywords.
Include spec fragment ids-header-h.title.
Include spec fragment ids-header-keyTerm.
Include spec fragment ids-header-pubAddress.
Include spec fragment ids-header-pubDate.
Include spec fragment ids-header-respName.
Include spec fragment ids-header-respType.
Include spec fragment ids-header-telephone.
Include spec fragment ids-header-transduction.
Include spec fragment ids-header-translation.
Include spec fragment ids-header-translations.
Include spec fragment ids-header-translator.
Include spec fragment ids-header-wordCount.
Include spec fragment ids-header-writingSystem.
Include spec fragment ids-header-wsdUsage.

This spec fragment is referred to by IDS-additions.

4.2 Elements defined by (X)CES and modified by IDS/XCES

  • caption: (1) a heading, title etc. attached to a picture or diagram (2) a "pull quote" or other text about or extracted from a text and superimposed upon it to draw attention to it. This element was added by CES; it conflicts (possibly unintentionally) with a different element also named caption defined by TEI P3 for text displayed in a film (or text in a screenplay intended for such display). IDS/XCES modifies the CES version by allowing IDS milestones in the content.
    Spec fragment ids-doc-caption
    <caption> element (new)
     
    (1) a heading, title etc. attached to a picture or diagram; (2) a ‘pull quote’ or other text about or extracted from a text and superimposed upon it to draw attention to it. IDS/XCES adds ptr to the content model.
    Classes
    att.text
    Contains
    (head*, (p | %model.inter; | %model.ids.milestones;)+)
    Attributes
     
    type
    categorizes the caption
    Default value
    unspec
    Values
    byline
    caption containing authorship of an article
    display
    extra-textual caption (displayed box, etc.)
    attached
    caption describing a figure, photograph, etc.
    unspec
    not specified or unknown
    Example:
    <caption id="zi3.62406-0-c1" type="display">
      <quote>
        <s>"die Käufer werden dem Champagner
           treu bleiben, auch wenn er wieder 
           teurer wird"</s>
        <s>
          <title>Claude Taittinger
           Generaldirektor von Champagne 
           Taittinger</title>
        </s>
      </quote>
    </caption>

    This spec fragment is referred to by specgroup-XCES-changed.

  • poem (in file ids.xesdoc.dtd): contains a poem, or an extract from a poem, appearing within or between paragraphs; an inter-level element. IDS changes CES's definition to allow milestone elements among the children.
    Spec fragment ids-doc-poem
    <poem> element (new)
     
    contains a poem appearing within or between paragraphs; an inter-level element.
    Classes
    att.text
    model.inter
    Contains
    (head?, (lg | l | %model.ids.milestones;)+)
    Example:
    <p>
      ...
      <s>Denn auch für diesen hilflosen, 
      aber unbestechlichen Chronisten der 
      dunkelsten deutschen Jahre erweisen 
      die Verse Brechts ihre Gültigkeit:</s>
    </p>
    <poem>
      <lg part="u">
        <l part="u">
          <s>[...]</s>
        </l>
      </lg>
      <lg part="u">
        <l part="u">
          <s>Ihr, die ihr auftauchen werdet 
             aus der Flut</s>
        </l>
        <l part="u">
          <s>In der wir untergegangen sind</s>
        </l>
        ...
      </lg>
      ...
    </poem>

    This spec fragment is referred to by specgroup-XCES-changed.

IDS/XCES also redefines the base.seq parameter entity in such a way that it becomes not a sequence but an element class. To try to avoid confusion, it is here renamed basic.
Spec fragment ids-basic-class
Add class model.basic.
 
identifies a group of basic phrase-level elements allowed in some cases where other phrase-level elements are not allowed.
Add class model.ids.milestones.
 
identifies a set of elements used by IDS/XCES as milestones, distinct from the built-in milestone classes of TEI P5.

This spec fragment is referred to by specgroup-XCES-changed.
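
For orientation, new model classes such as these are normally declared in ODD with classSpec elements of type "model". The sketch below assumes that idiom and abbreviates the descriptions; it is not the actual source of ids-basic-class.

```xml
<specGrp xmlns="http://www.tei-c.org/ns/1.0" xml:id="ids-basic-class">
  <!-- replaces the base.seq parameter entity of IDS/XCES -->
  <classSpec ident="model.basic" type="model" mode="add">
    <desc>identifies a group of basic phrase-level elements</desc>
  </classSpec>
  <!-- milestone-like elements usable in caption, poem, etc. -->
  <classSpec ident="model.ids.milestones" type="model" mode="add">
    <desc>identifies a set of elements used by IDS/XCES as milestones</desc>
  </classSpec>
</specGrp>
```

An element joins such a class by naming it in the classes section of its own elementSpec, for example with a memberOf element whose key attribute is "model.ids.milestones".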

Spec fragment specgroup-XCES-changed
Include spec fragment ids-doc-caption.
Include spec fragment ids-doc-poem.
Include spec fragment ids-basic-class.

This spec fragment is referred to by IDS-additions.

4.3 Renamings of TEI elements

Several IDS/XCES elements are essentially renamings (or context-dependent renamings) of TEI elements.

  • idsCorpus (in file ids.xesdoc.dtd): renaming of teiCorpus, with slight difference in content model
  • idsHeader (in file ids.xheader.elt): renaming of teiHeader
  • idsDoc (in file ids.xesdoc.dtd): intermediate level between idsText and idsCorpus; conceptually similar to the TEI element TEI and structurally similar to teiCorpus, but (just for that reason) it cannot be declared as a renaming of either. HLU: For the time being, idsDoc is not put in the namespace http://www.ids-mannheim.de/i5, so that in an I5 document it will work like idsCorpus, idsText, and idsHeader. The latter are technical renamings of original TEI elements (using altIdent) and therefore cannot be put in a namespace other than the TEI namespace.
    Spec fragment ids-doc-idsDoc
    <idsDoc> element (new)
     
    contains a single document within an IDS corpus; may contain one or several texts.
    Classes
    att.global
    Contains
    (idsHeader, idsText+)
    Attributes
     
    type
    Type
    an XSD string value
    Default value
    text
    version
    Type
    #PCDATA
    TEIform
    Type
    an XSD NMTOKEN value
    Default value
    TEI.2

    This spec fragment is referred to by specgroup-renamings.

  • idsText (in file ids.xesdoc.dtd): renaming of the TEI element TEI
Spec fragment specgroup-renamings
Include spec fragment ids-doc-idsDoc.

This spec fragment is referred to by IDS-additions.
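
The technical renamings via altIdent mentioned above can be sketched as follows. This is an illustration of the ODD mechanism only: it assumes that idsText renames the element named TEI, and it leaves out the slight change to the content model of idsCorpus.

```xml
<specGrp xmlns="http://www.tei-c.org/ns/1.0">
  <!-- The element keeps its TEI identity; only its name in the
       generated schema changes. -->
  <elementSpec ident="teiCorpus" mode="change">
    <altIdent>idsCorpus</altIdent>
  </elementSpec>
  <elementSpec ident="teiHeader" mode="change">
    <altIdent>idsHeader</altIdent>
  </elementSpec>
  <!-- assumption: idsText is the renamed TEI element -->
  <elementSpec ident="TEI" mode="change">
    <altIdent>idsText</altIdent>
  </elementSpec>
</specGrp>
```

Because altIdent changes only the generated name, the renamed elements remain in the TEI namespace, which matches the remark on idsDoc above.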

4.4 TEI P3 elements no longer in P5

Some elements and attributes used by IDS/XCES were taken over from TEI P3, but are no longer present in TEI P5: xptr, xref, dateRange, and timeRange.

Of these, only xptr appears in the samples, so for now that's the only one we define.
Spec fragment p3-xptr-element
<xptr> element (new)
i.e.
external pointer
 
defines a pointer to a location outside the current document.
Classes
model.ptrLike
att.xPointer
att.typed
att.declaring
(add) model.ids.milestones
Contains
Empty.
Attributes
 
TEIform
(add this attribute)
Default value
ptr
Example:
<xptr targType = "pb" targOrder = "u" doc = "korpref.bio"
      from = "TK1.00018-5-PB5" to = "DITTO" TEIform = "xptr"/>

This spec fragment is referred to by specgroup-TEI-P3.

TEI P5 has dropped the id attribute from the global class, and the targOrder attribute from the att.pointing class; these must be restored.
Spec fragment specgroup-TEI-P3
Include spec fragment p3-xptr-element.
Change class att.global.
Classes
(add) att.global.cmc
Attributes
 
xml:id
(change this attribute)
<altIdent>id</altIdent>
Change class att.pointing.
Attributes
 
targOrder
(add this attribute)
where more than one identifier is supplied as the value of the target attribute, this attribute specifies whether the order in which they are supplied is significant.
Values
y
Yes: the order in which IDREF values are specified as the value of a target attribute should be followed when combining the targeted elements.
n
No: the order in which IDREF values are specified as the value of a target attribute has no significance when combining the targeted elements.
u
Unspecified: the order in which IDREF values are specified as the value of a target attribute may or may not be significant.
targType
(add this attribute)
specifies the kinds of elements to which this pointer may point.
Type
a list of XSD values

If this attribute is supplied, every element specified as a target must be of one or other of the types specified. An application may choose whether or not to report failures to satisfy this constraint as errors, but may not access an element of the right identifier but the wrong type.

Add class att.xPointer.
 
defines a set of attributes used by all those elements which use the TEI P3 extended pointer mechanism to point at locations which have no XML ID.
Classes
att.pointing
Attributes
 
doc
specifies the document within which the desired location is to be found.
Type
an XSD ENTITY value

In principle, the value of this attribute is supposed by TEI P3 to be the name of an external entity declared in the DTD (often in the internal DTD subset); in practice, in IDS documents it appears to be a relative reference to a file, in the form of a filename.

from
specifies the start of the destination of the pointer.
Type
an XSD string value

In principle, the value of this attribute is supposed by TEI P3 to be a TEI extended pointer; in practice, in IDS documents it appears to be an ID in the document indicated by the doc attribute.

to
specifies the end of the destination of the pointer.
Type
an XSD string value

In principle, the value of this attribute is supposed by TEI P3 to be a TEI extended pointer; in practice, in IDS documents it appears always to be the default value, DITTO.

This spec fragment is referred to by IDS-additions.
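
The class changes in specgroup-TEI-P3 can be pictured in ODD roughly as below. The sketch is abridged: it omits the inclusion of p3-xptr-element, the targType attribute, and the new class att.xPointer. The closed value list for targOrder and the shortened descriptions are assumptions.

```xml
<specGrp xmlns="http://www.tei-c.org/ns/1.0" xml:id="specgroup-TEI-P3">
  <classSpec ident="att.global" type="atts" mode="change">
    <!-- make the CMC attribute creation available globally -->
    <classes mode="change">
      <memberOf key="att.global.cmc" mode="add"/>
    </classes>
    <attList>
      <!-- restore the P3 name id for the global identifier -->
      <attDef ident="xml:id" mode="change">
        <altIdent>id</altIdent>
      </attDef>
    </attList>
  </classSpec>
  <classSpec ident="att.pointing" type="atts" mode="change">
    <attList>
      <attDef ident="targOrder" mode="add">
        <desc>specifies whether the order of multiple target
          identifiers is significant</desc>
        <!-- closed list assumed -->
        <valList type="closed">
          <valItem ident="y"><desc>order is significant</desc></valItem>
          <valItem ident="n"><desc>order is not significant</desc></valItem>
          <valItem ident="u"><desc>unspecified</desc></valItem>
        </valList>
      </attDef>
    </attList>
  </classSpec>
</specGrp>
```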

4.5 Elements and attributes added by IDS

The following elements are not present in the TEI or in CES/XCES, but have been added by IDS. All but one are intended to appear in the header.
  • appearance: ‘physical appearance’ of the source (BOT+e)
    Spec fragment ids-header-appearance
    <appearance> element (new)
     
    A child of edition.
    Classes
    att.global
    Contains
    character data

    This spec fragment is referred to by specgroup-IDS-specific.

  • c.title: corpus title; a context-specific specialization of TEI title.
    Spec fragment ids-header-c.title
    <c.title> element (new)
     
    child of titleStmt. #PCDATA only, otherwise a context-specific specialization of TEI title.
    Classes
    att.global
    Contains
    character data

    This spec fragment is referred to by specgroup-IDS-specific.

  • column: original label of the newspaper column or section as given in the source (BOT+ress)
    Spec fragment ids-header-column
    <column> element (new)
     
    child of textDesc. #PCDATA only.
    Classes
    att.global
    Contains
    character data
    Example:
    <textDesc>
      <textTypeArt>Bericht</textTypeArt>
      <textDomain/>
      <column>TB-KLN2 (Abk.)</column>
    </textDesc>

    This spec fragment is referred to by specgroup-IDS-specific.

  • creatDate: time of creation.
    Spec fragment ids-header-creatDate
    <creatDate> element (new)
     
    child of creation (in profileDesc). #PCDATA only.
    Classes
    att.global
    att.responsibility
    Contains
    character data
    Example:
    <creation>
      <creatDate>2001.01.13</creatDate>
    </creation>

    This spec fragment is referred to by specgroup-IDS-specific.

  • creatRef: reference to (creation of text and) first edition.
    Spec fragment ids-header-creatRef
    <creatRef> element (new)
     
    child of creation (in profileDesc). #PCDATA only.
    Classes
    att.global
    att.responsibility
    Contains
    character data
    Example:
    <creation>
      <creatDate>1998</creatDate>
      <creatRef>(Erstveröffentlichung: 
        Oberhausen, 1998)</creatRef>
      <creatRefShort>(Erstv. 1998)</creatRefShort>
    </creation>

    This spec fragment is referred to by specgroup-IDS-specific.

  • creatRefShort: short version of reference to (creation of text and) first edition.
    Spec fragment ids-header-creatRefShort
    <creatRefShort> element (new)
     
    child of creation (in profileDesc). #PCDATA only.
    Classes
    att.global
    att.responsibility
    Contains
    character data
    Example:
    <creation>
      <creatDate>1959</creatDate>
      <creatRef>(Erstveröffentlichung: 
        Frankfurt a.M., 1959)</creatRef>
      <creatRefShort>(Erstv. 1959)</creatRefShort>
    </creation>

    This spec fragment is referred to by specgroup-IDS-specific.

  • d.title: document title; a context-specific specialization of TEI title.
    Spec fragment ids-header-d.title
    <d.title> element (new)
     
    child of titleStmt. #PCDATA only, otherwise a context-specific specialization of TEI title.
    Classes
    att.global
    att.responsibility
    Contains
    character data

    This spec fragment is referred to by specgroup-IDS-specific.

  • dokumentSigle: document ID (formerly BOTD).
    Spec fragment ids-header-dokumentSigle
    <dokumentSigle> element (new)
     
    child of titleStmt. #PCDATA only.
    Classes
    att.global
    Contains
    character data
    Example:
    <titleStmt>
      <dokumentSigle>A01/AUG</dokumentSigle>
      <d.title>St. Galler Tagblatt, August 2001</d.title>
    </titleStmt>

    This spec fragment is referred to by specgroup-IDS-specific.

  • further: further edition of the same source with year (BOT+gg)
    Spec fragment ids-header-further
    <further> element (new)
     
    child of edition. #PCDATA only.
    Classes
    att.global
    att.responsibility
    Contains
    character data
    Example:
    <edition>
      <further>5. Auflage 1998 (1. Auflage 1997)</further>
      <kind/>
      <appearance/>
    </edition>

    This spec fragment is referred to by specgroup-IDS-specific.

  • kind: kind of edition of the source (BOT+g)
    Spec fragment ids-header-kind
    <kind> element (new)
     
    child of edition. #PCDATA only.
    Classes
    att.global
    Contains
    character data
    Example:
    <monogr>
      <h.title type="main">Im Gegenteil</h.title>
      <h.title type="sub">Kolumnen 1986-1990</h.title>
      <h.title type="abbr" level="m">Bichsel: Im Gegenteil</h.title>
      <h.author>Bichsel, Peter</h.author>
      <editor/>
      <edition>
        <further/>
        <kind>suhrkamp taschenbuch</kind>
        <appearance/>
      </edition>
      ...
    </monogr>

    This spec fragment is referred to by specgroup-IDS-specific.

  • korpusSigle: corpus ID (formerly BOTC).
    Spec fragment ids-header-korpusSigle
    <korpusSigle> element (new)
     
    child of titleStmt. #PCDATA only.
    Classes
    att.global
    Contains
    character data
    Example:
    <titleStmt>
      <korpusSigle>A01</korpusSigle>
      <c.title>St. Galler Tagblatt 2001</c.title>
    </titleStmt>

    This spec fragment is referred to by specgroup-IDS-specific.

  • numRange (in file ids.xcesdoc.dtd): member of the token class, modeled on timeRange and dateRange. (Not present in samples.)
    Spec fragment ids-doc-numRange
    <numRange> element (new)
     
    a range of numbers.
    Classes
    att.text
    model.token
    model.basic
    Contains
    %model.basic;*
    Attributes
     
    from
    Type
    an XSD string value
    Values
    yes
    no
    to
    Type
    an XSD string value
    Values
    yes
    no
    type
    Type
    an XSD string value

    This spec fragment is referred to by specgroup-IDS-specific.

  • pagination: whether page numbering is present or not (processing info; formerly BOTP).
    Spec fragment ids-header-pagination
    <pagination> element (new)
     
    indicates whether page numbering is present or not.
    Classes
    att.global
    Contains
    character data
    Attributes
     
    type
    Type
    an XSD NMTOKEN value
    Values
    yes
    no
    Example:
    <editorialDecl Default="n">
      <pagination type="yes"/>
    </editorialDecl>

    This spec fragment is referred to by specgroup-IDS-specific.

  • reference: bibliographic reference string.
    Spec fragment ids-header-reference
    <reference> element (new)
     
    a child of sourceDesc.
    Classes
    att.header
    att.responsibility
    Contains
    character data
    Attributes
     
    type
    Type
    an XSD NMTOKEN value
    Values
    complete
    super
    short
    former
    assemblage
    Type
    an XSD NMTOKEN value
    Values
    external
    regular
    non-automatic
    existence
    Type
    an XSD NMTOKEN value
    Values
    no
    yes
    origin
    Type
    an XSD NMTOKEN value
    Values
    BOTfile
    notBOTfile
    Example:
    <reference type="complete" assemblage="regular">A01/JAN.02562 St.
      Galler Tagblatt, [Tageszeitung], 13.01.2001, Jg. 57. - Originalressort:
      TB-KLN2 (Abk.), [Bericht]</reference>

    This spec fragment is referred to by specgroup-IDS-specific.

  • t.title: text title. A context-specific specialization of TEI title.
    Spec fragment ids-header-t.title
    <t.title> element (new)
     
    child of titleStmt. #PCDATA only, otherwise a context-specific specialization of TEI title.
    Classes
    att.global
    att.responsibility
    Contains
    character data
    Attributes
     
    assemblage
    Type
    an XSD NMTOKEN value
    Values
    external
    regular
    non-automatic

    This spec fragment is referred to by specgroup-IDS-specific.

  • textDomain: subject area of the text (BOT+r)
    Spec fragment ids-header-textDomain
    <textDomain> element (new)
     
    child of textDesc. #PCDATA only.
    Classes
    att.global
    Contains
    character data
    Example:
    <textDesc>
      <textTypeArt/>
      <textDomain>Wissenschaft</textDomain>
      <column>Wissenschaft</column>
    </textDesc>

    This spec fragment is referred to by specgroup-IDS-specific.

  • textSigle: text ID (formerly BOTT).
    Spec fragment ids-header-textSigle
    <textSigle> element (new)
     
    child of titleStmt. #PCDATA only.
    Classes
    att.global
    Contains
    character data
    Example:
    <titleStmt>
      <textSigle>A01/JAN.02562</textSigle>
      <t.title assemblage="regular"
        >A01/JAN.02562 St. Galler Tagblatt, 
        13.01.2001, 
        Ressort: TB-KLN2 (Abk.)</t.title>
    </titleStmt>

    This spec fragment is referred to by specgroup-IDS-specific.

  • textType: text type according to type inventory (BOT+x)
    Spec fragment ids-header-textType
    <textType> element (new)
     
    child of textDesc. #PCDATA only.
    Classes
    att.header
    Contains
    character data
    Example:
    <textType>Zeitung: Tageszeitung</textType>
    <textType>Ausgabenvermerk</textType>
    <textType>Anmerkung</textType>

    This spec fragment is referred to by specgroup-IDS-specific.

  • textTypeArt: text type of a specific article (BOT+xa).
    Spec fragment ids-header-textTypeArt
    <textTypeArt> element (new)
     
    child of textDesc. #PCDATA only.
    Classes
    att.global
    Contains
    character data
    Example:
    <textDesc>
      <textTypeArt>Bericht</textTypeArt>
      <textDomain/>
      <column>TB-KLN2 (Abk.)</column>
    </textDesc>

    This spec fragment is referred to by specgroup-IDS-specific.

  • textTypeRef: text type as it should appear in bibliographic string (BOT+X).
    Spec fragment ids-header-textTypeRef
    <textTypeRef> element (new)
     
    child of textDesc. #PCDATA only.
    Classes
    att.global
    Contains
    character data
    Example:
    <textDesc>
      <textType>Zeitschrift: Wochenzeitschrift</textType>
      <textTypeRef>Wochenzeitschrift</textTypeRef>
      <textTypeArt>Interview</textTypeArt>
      <textDomain>Gesellschaft</textDomain>
      <column/>
    </textDesc>
    Example:
    <textDesc>
      <textType>Zeitung: Tageszeitung</textType>
      <textTypeRef>Tageszeitung</textTypeRef>
    </textDesc>

    This spec fragment is referred to by specgroup-IDS-specific.

  • x.title (in file ids.xheader.elt): child of titleStmt; title of some object which is not a corpus (which would use c.title), not a document in the IDS-specific sense (which would use d.title), and not a text in the IDS sense (which would use t.title). Contains #PCDATA.
    Spec fragment ids-header-x.title
    <x.title> element (new)
     
    child of titleStmt; title of some object which is not a corpus (which would use c.title), not a document in the IDS-specific sense (which would use d.title), and not a text in the IDS sense (which would use t.title). Contains #PCDATA.
    Classes
    att.global
    Contains
    character data

    This spec fragment is referred to by specgroup-IDS-specific.
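
The parent statements in the list above ("child of titleStmt", "child of edition", and so on) can be collected into a lookup table and checked mechanically. The following Python sketch is an illustration only and not part of the I5 tooling: the function name misplaced, the wrapper element header, and the sample fragment are hypothetical, and pagination is left out because the list does not state its parent explicitly.

```python
import xml.etree.ElementTree as ET

# Parent element stated in this section for each IDS-specific header element.
EXPECTED_PARENT = {
    "appearance": "edition", "further": "edition", "kind": "edition",
    "c.title": "titleStmt", "d.title": "titleStmt", "t.title": "titleStmt",
    "x.title": "titleStmt", "korpusSigle": "titleStmt",
    "dokumentSigle": "titleStmt", "textSigle": "titleStmt",
    "column": "textDesc", "textDomain": "textDesc", "textType": "textDesc",
    "textTypeArt": "textDesc", "textTypeRef": "textDesc",
    "creatDate": "creation", "creatRef": "creation",
    "creatRefShort": "creation",
    "reference": "sourceDesc",
}


def misplaced(root):
    """Return (child, actual_parent) pairs that contradict EXPECTED_PARENT."""
    found = []
    for parent in root.iter():
        for child in parent:
            expected = EXPECTED_PARENT.get(child.tag)
            if expected is not None and parent.tag != expected:
                found.append((child.tag, parent.tag))
    return found


# Hypothetical fragment: <kind> is deliberately misplaced under titleStmt,
# and <header> is an invented wrapper used only to hold the two siblings.
SAMPLE = """
<header>
  <titleStmt>
    <korpusSigle>A01</korpusSigle>
    <c.title>St. Galler Tagblatt 2001</c.title>
    <kind>suhrkamp taschenbuch</kind>
  </titleStmt>
  <edition>
    <further/>
    <appearance/>
  </edition>
</header>
"""

result = misplaced(ET.fromstring(SAMPLE))
print(result)
```

A check of this kind only covers the placement of the new elements; it does not replace validation against the schema generated from the ODD.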

Spec fragment specgroup-IDS-specific
Include spec fragment ids-header-appearance.
Include spec fragment ids-header-c.title.
Include spec fragment ids-header-column.
Include spec fragment ids-header-creatDate.
Include spec fragment ids-header-creatRef.
Include spec fragment ids-header-creatRefShort.
Include spec fragment ids-header-d.title.
Include spec fragment ids-header-dokumentSigle.
Include spec fragment ids-header-further.
Include spec fragment ids-header-kind.
Include spec fragment ids-header-korpusSigle.
Include spec fragment ids-doc-numRange.
Include spec fragment ids-header-pagination.
Include spec fragment ids-header-reference.
Include spec fragment ids-header-t.title.
Include spec fragment ids-header-textDomain.
Include spec fragment ids-header-textSigle.
Include spec fragment ids-header-textType.
Include spec fragment ids-header-textTypeArt.
Include spec fragment ids-header-textTypeRef.
Include spec fragment ids-header-x.title.
Include spec fragment ids-derik-timestamp (!! not found !!).
Include spec fragment ids-cmc (!! not found !!).

This spec fragment is referred to by IDS-additions.
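
The two includes marked "(!! not found !!)" above name fragments for which this section gives no definition. A minimal Python sketch of such a resolution check follows, assuming that the marker means the fragment name could not be resolved when the documentation was generated; the names HEADER_ELEMENTS, DEFINED, INCLUDED, and unresolved are introduced here for illustration only.

```python
# Header elements whose fragments ("ids-header-<name>") are defined in section 4.5.
HEADER_ELEMENTS = [
    "appearance", "c.title", "column", "creatDate", "creatRef",
    "creatRefShort", "d.title", "dokumentSigle", "further", "kind",
    "korpusSigle", "pagination", "reference", "t.title", "textDomain",
    "textSigle", "textType", "textTypeArt", "textTypeRef", "x.title",
]
DEFINED = {"ids-header-" + name for name in HEADER_ELEMENTS} | {"ids-doc-numRange"}

# Fragments included by specgroup-IDS-specific, as listed above.
# The order is simplified; only membership matters for this check.
INCLUDED = sorted(DEFINED) + ["ids-derik-timestamp", "ids-cmc"]


def unresolved(included, defined):
    """Return the included fragment names that have no definition."""
    return [name for name in included if name not in defined]


missing = unresolved(INCLUDED, DEFINED)
print(missing)
```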

4.6 Module, classes and elements from the TEI CMC SIG proposals "Beißwenger/Ermakova/Geyken/Lemnitzer/Storrer (2013): An XML Schema for the Representation of CMC Genres in TEI"

The following elements are not present in the TEI or in CES/XCES, but have been proposed as part of a TEI extension for computer-mediated communication (CMC) by BBAW and Dortmund Technical University. The part adopted here is the one that declares the posting structure.
  • posting:
    Spec fragment posting
    <posting> element (new)
     
    describes a stretch of text that an individual user has produced in private and then passed on to the server through performing a "posting" action (usually by hitting the [ENTER] key on the keyboard or by clicking on a [SEND] or [SUBMIT] button on the screen). Postings are the largest structural units in CMC documents that can be assigned to one author and one point in time. Their function is to make a (written) contribution to the ongoing dialogue.
    Classes
    model.divLike
    att.datable
    att.global
    att.typed
    att.ascribed
    Contains
    (#PCDATA | %model.headLike; | opener | %model.pLike; | %model.gLike; | %model.phrase; | %model.inter; | %model.global; | lg | %model.lLike; | %model.divBottom;)*
    Attributes
     
    indentLevel
    (add this attribute)
    marks the (relative) level of indentation of the respective posting (as defined by its author and in relation to the standard level of indentation which is described as „0“).
    Type
    data.count

    This spec fragment is referred to by specgroup-DERIK.

  • autoSignature:
    Spec fragment autoSignature
    <autoSignature> element (new)
     
    is an empty element used for representing the position of the user signature in a posting.
    Classes
    model.pPart.edit
    att.pointing
    att.global
    Contains
    (#PCDATA | s | timestamp)*
    Attributes
     
    type
    Values
    (closed list)
    signed
    indicates that the corresponding posting was explicitly signed by a registered user using a user signature mark up (e.g. ~~~~).
    unsigned
    indicates that the corresponding posting was marked by either a registered or unregistered user using the Unsigned or Help template.
    user_contribution
    "user_contribution" indicates that the corresponding posting was marked using a [[Special:Contributions/IP]] link (e.g. by an unregistered user).
    special_contribution
    "special_contribution" indicates that the corresponding posting was marked using a [[Special:Contributions/IP]] link (e.g. by an unregistered user). Added 2019-06-14; this is actually the same as "user_contribution".

    This spec fragment is referred to by specgroup-DERIK.

  • signatureContent:
    Spec fragment signatureContent
    <signatureContent> element (new)
     
    is used to describe an individual user's signature in the header of the user profile. [Comment for I5.odd by HLU 2013-09-05: this element will not be available as long as there is no element using the model.persStateLike, like listPerson.]
    Classes
    model.persStateLike
    att.global
    Contains
    (#PCDATA | ref | %model.hiLike; | %model.milestoneLike; | figure)*

    This spec fragment is referred to by specgroup-DERIK.

  • emoticon:
    Spec fragment emoticon
    <emoticon> element (new)
     
    describes an interaction sign which is an iconic unit that has been created with the keyboard and which typically serves as an emotion or irony marker or as a responsive.
    Classes
    model.gLike
    att.global
    Contains
    character data
    Attributes
     
    style
    describes the native region of an emoticon.
    Values
    (closed list)
    Western
    Japanese
    Korean
    other
    systemicFunction
    describes the general, context-independent function of the emoticon
    Values
    (closed list)
    emotionMarker:positive
    emotionMarker:negative
    emotionMarker:neutral
    emotionMarker:unspec
    virtualEvent
    illocutionMarker
    ironyMarker
    responsive
    contextFunction
    describes the function of the respective instance of the emoticon in its given context.
    Values
    (closed list)
    emotionMarker:positive
    emotionMarker:negative
    emotionMarker:neutral
    emotionMarker:unspec
    virtualEvent
    illocutionMarker
    ironyMarker
    responsive
    topology
    the position of the emoticon relative to the text to which it belongs.
    Values
    (closed list)
    front_position
    back_position
    intermediate_position
    standalone

    This spec fragment is referred to by specgroup-DERIK.

  • interactionWord:
    Spec fragment interactionWord
    <interactionWord> element (new)
     
    describes an interaction sign which is a symbolic linguistic unit whose morphologic construction is based on a word or a phrase and describes expressions, gestures, bodily actions, or virtual events―cf. the units sing, g (< grins, “grin”), fg (< fat grin), s (< smile), wildsei (“being wild”).
    Classes
    model.global.spoken
    att.global
    Contains
    character data
    Attributes
     
    formType
    is used to describe morphological properties of the interaction word.
    Values
    (closed list)
    simple
    complex
    abbreviated
    systemicFunction
    describes the general, context-independent function of the interaction word.
    Values
    (closed list)
    emotionMarker:positive
    emotionMarker:negative
    emotionMarker:neutral
    emotionMarker:unspec
    virtualEvent
    illocutionMarker
    ironyMarker
    responsive
    contextFunction
    describes the function of the respective instance of the interaction word in its given context.
    Values
    (closed list)
    emotionMarker:positive
    emotionMarker:negative
    emotionMarker:neutral
    emotionMarker:unspec
    virtualEvent
    illocutionMarker
    ironyMarker
    responsive
    semioticSource
    is used to describe the semiotic mode that forms the basis for an interaction word.
    Values
    mimic
    gesture
    bodyReaction
    sound
    action
    process
    emotion
    sentiment
    topology
    the position of the interaction word relative to the text to which it belongs.
    Values
    (closed list)
    front_position
    back_position
    intermediate_position
    standalone
    openingTag
    closingTag

    This spec fragment is referred to by specgroup-DERIK.

  • interactionTerm:
    Spec fragment interactionTerm
    <interactionTerm> element (new)
     
    describes instances of one or several interaction signs (i.e., of emoticons, interaction words, interaction templates, and/or addressing terms).
    Classes
    model.phrase
    att.global
    Contains
    (emoticon | interactionWord)*

    This spec fragment is referred to by specgroup-DERIK.

  • timestamp:
    Spec fragment timestamp
    <timestamp> element (new)
     
    is an empty element used for representing the timestamp in a posting, which was automatically inserted when the user pressed a button. This element is an addition by IDS, i.e. not from the Derik ODD.
    Classes
    model.pPart.edit
    att.pointing
    att.global
    Contains
    character data

    This spec fragment is referred to by specgroup-DERIK.
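
Several attributes in the list above are declared with closed value lists. The Python sketch below shows one way to check instance data against those lists. It is an illustration under stated assumptions, not part of the I5 or DeRiK tooling: the function name invalid_values and the sample posting are hypothetical, and semioticSource is omitted because its values are not marked as a closed list.

```python
import xml.etree.ElementTree as ET

# Values shared by systemicFunction and contextFunction.
FUNCTIONS = {
    "emotionMarker:positive", "emotionMarker:negative",
    "emotionMarker:neutral", "emotionMarker:unspec",
    "virtualEvent", "illocutionMarker", "ironyMarker", "responsive",
}
TOPOLOGY = {"front_position", "back_position",
            "intermediate_position", "standalone"}

# Closed lists as given in this section: element -> attribute -> values.
CLOSED_LISTS = {
    "autoSignature": {
        "type": {"signed", "unsigned",
                 "user_contribution", "special_contribution"},
    },
    "emoticon": {
        "style": {"Western", "Japanese", "Korean", "other"},
        "systemicFunction": FUNCTIONS,
        "contextFunction": FUNCTIONS,
        "topology": TOPOLOGY,
    },
    "interactionWord": {
        "formType": {"simple", "complex", "abbreviated"},
        "systemicFunction": FUNCTIONS,
        "contextFunction": FUNCTIONS,
        "topology": TOPOLOGY | {"openingTag", "closingTag"},
    },
}


def invalid_values(root):
    """Return (element, attribute, value) triples outside the closed lists."""
    problems = []
    for elem in root.iter():
        for attr, allowed in CLOSED_LISTS.get(elem.tag, {}).items():
            value = elem.get(attr)
            if value is not None and value not in allowed:
                problems.append((elem.tag, attr, value))
    return problems


# Hypothetical posting; topology="middle" is deliberately not a listed value.
SAMPLE = """
<posting indentLevel="0">
  Danke
  <interactionTerm>
    <emoticon style="Western" topology="back_position">:-)</emoticon>
    <interactionWord formType="abbreviated" topology="middle">g</interactionWord>
  </interactionTerm>
</posting>
"""

problems = invalid_values(ET.fromstring(SAMPLE))
print(problems)
```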

Spec fragment specgroup-DERIK
Include spec fragment posting.
Include spec fragment autoSignature.
Include spec fragment signatureContent.
Include spec fragment timestamp.
Include spec fragment emoticon.
Include spec fragment interactionTerm.
Include spec fragment interactionWord.

This spec fragment is referred to by IDS-additions.

4.7 Elements from the TEI Correspondence SIG proposal

Copied from the GitHub site of the TEI Correspondence SIG, specifically from the file proposal.xml as of 2015-01-08.

4.7.1 LICENSE for the correspondence Elements

Copyright (c) 2013, TEI-Correspondence-SIG

All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  • Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
  • Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

4.7.2 The following was copied from the file proposal.xml:
Spec fragment specgroup-Correspondence
<moduleSpec ident="correspondence" mode="add">
  Module for correspondence, including letters, telegrams, postcards, e-mail etc.
</moduleSpec>
Add class model.correspDescPart.
 
groups together metadata elements for describing correspondence
Add class model.correspContextPart.
 
groups elements which may appear as part of the correspContext element
Add class model.correspActionPart.
 
groups elements which define the parts (usually names, dates and places) of one action related to the correspondence.
<correspDesc> element (new)
i.e.
correspondence description
 
a wrapper element for metadata pertaining to correspondence
Classes
model.profileDescPart
att.declarable
att.canonical
att.global
att.typed
Contains
(%model.correspDescPart;+ | %model.pLike;+)
Example:
<egXML>
  <ct:correspDesc>
    <ct:correspAction type="sending">
      <persName>Adelbert von Chamisso</persName>
      <settlement>Vertus</settlement>
      <date when="1807-01-29">29 January 1807</date>
    </ct:correspAction>
    <ct:correspAction type="receiving">
      <persName>Louis de La Foye</persName>
      <settlement>Caen</settlement>
    </ct:correspAction>
    <ct:correspContext>
      <ref type="prev" target="http://tei.ibi.hu-berlin.de/berliner-intellektuelle/manuscript?Brief023ChamissoandeLaFoye#1">Previous letter of Adelbert von Chamisso to Louis de La Foye: 16 January 1807</ref>
      <ref type="next" target="http://tei.ibi.hu-berlin.de/berliner-intellektuelle/manuscript?Brief025ChamissoandeLaFoye#1">Next letter of Adelbert von Chamisso to Louis de La Foye: 07 May 1810</ref>
    </ct:correspContext>
  </ct:correspDesc>
</egXML>
<correspAction> element (new)
 
contains a structured description of the place, the name of a person/organization and the date related to the sending/receiving of a message or any other action related to the correspondence
Classes
att.global
att.typed
att.sortable
model.correspDescPart
Contains
%model.correspActionPart;+
Attributes
 
type
(change this attribute)
Type
data.enumerated
Values
(open list)
sending
identifies a/the sending action of the message
receiving
identifies a/the receiving action of the message
transmitting
identifies a/the transmitting action of the message
redirecting
identifies a/the redirecting action of the message
forwarding
identifies a/the forwarding action of the message
Example:
<egXML>
  <ct:correspAction type="sending">
    <persName>Adelbert von Chamisso</persName>
    <settlement>Vertus</settlement>
    <date when="1807-01-29">29 January 1807</date>
  </ct:correspAction>
</egXML>
<correspContext> element (new)
i.e.
correspondence context
i.e.
Korrespondenzstelle
 
provides references to preceding or following correspondence related to this piece of correspondence
Classes
model.correspDescPart
Contains
%model.correspContextPart;+
Example:
<egXML>
  <ct:correspContext>
    <ptr type="next" subtype="toAuthor" target="http://tei.ibi.hu-berlin.de/berliner-intellektuelle/manuscript?Brief101VarnhagenanBoeckh"/>
    <ptr type="prev" subtype="fromAuthor" target="http://tei.ibi.hu-berlin.de/berliner-intellektuelle/manuscript?Brief103BoeckhanVarnhagen"/>
  </ct:correspContext>
</egXML>
Example:
<egXML>
  <ct:correspContext>
    <ref target="http://weber-gesamtausgabe.de/A040962">Previous letter of Carl Maria von Weber to Caroline Brandt: December 30, 1816</ref>
    <ref target="http://weber-gesamtausgabe.de/A041003">Next letter of Carl Maria von Weber to Caroline Brandt: January 5, 1817</ref>
  </ct:correspContext>
</egXML>
Change class model.nameLike.
Classes
model.correspActionPart
Change class model.dateLike.
Classes
model.correspActionPart
Change class model.ptrLike.
Classes
model.correspContextPart
Change class model.pLike.
Classes
model.correspContextPart
Change class model.addressLike.
Classes
model.correspActionPart
Change element note:
Classes
(add) model.correspDescPart
(add) model.correspActionPart
(add) model.correspContextPart

This spec fragment is referred to by IDS-additions.

5 Top-level driver

Spec fragment ids_v2a
Include spec fragment required-modules.
Include spec fragment optional-modules.
Include spec fragment cmc-core.
Include spec fragment IDS-additions.

6 Conformance and design issues

This section of the document records some conformance and design issues which may need attention.

  • The current IDS/XCES DTD, and the DeReKo documents which conform to it, use no namespaces. Conforming customizations of TEI P5, however, are required to use the TEI namespace http://www.tei-c.org/ns/1.0 for TEI elements, and to put extensions to the vocabulary in a different namespace.

    That is, if the goal of the I5 project is to create a customization of TEI P5 which (1) accepts existing DeReKo documents as valid and (2) conforms to TEI P5, then the two goals are not simultaneously satisfiable.

  • One of the goals for this ODD file is to produce a schema which more or less matches the existing IDS/XCES DTD in the effective document grammar. Another is to ensure that the DeReKo documents provided as samples are valid against the document grammar defined here.

    These goals prove to be incompatible: some documents in the samples are not valid against the DTD. Specifically:
    • Document DeReKo-Sample-08/mld.sample.xces contains a number of idsDoc elements with numeric IDs (for example <idsDoc id="951" type="text" version="1.0" TEIform="TEI.2">); these numeric strings are not type-valid against the ID type.
    This ODD document defines the id attribute as having type ID, which means that the documents just mentioned remain invalid. (The alternative of declaring it as having type NCName would render other documents invalid.)
  • One of the goals for defining I5 is to align IDS's XML vocabulary better with TEI P5. In cases where the existing IDS/XCES vocabulary deviates from TEI P5, this goal raises the question: should I5 follow TEI P5 or the existing IDS/XCES DTD?

    This ODD document follows the existing DTD in all cases where the change made in the DTD is necessary to make DeReKo data (as represented by the samples available to the author) valid. In other cases, however, this ODD document follows TEI P5. In particular, if all instances of an element in the available samples are valid against the TEI P5 declaration of that element type, no change was made to the declaration. This means that as defined here, I5 follows TEI P5 and not the existing IDS/XCES DTD in all cases where IDS/XCES DTD restricts elements to a subset of what is allowed by the TEI's definition.

    In some cases, this may make the schema defined here looser than desired.

  • In the same spirit, element types taken over by the IDS/XCES DTD from TEI P3 that are not present in TEI P5 have been declared here if and only if instances of the element types are present in the samples. So xptr has been declared here, but xref, dateRange, and timeRange have not been declared.

    It is easy to add declarations for these element types if they prove to be needed.
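
Two of the issues listed above can be made concrete with a short Python sketch. It is an illustration under stated assumptions, not a validator: the regular expression is an ASCII-only approximation of the XML Name production that ID values must match, the identifier "d951" is a hypothetical repaired form, and the helper names is_id_like and qualified_tag are introduced here.

```python
import re
import xml.etree.ElementTree as ET

# ASCII-only approximation of the XML Name production (assumption: the
# non-ASCII name characters allowed by the XML specification are ignored).
NAME_ASCII = re.compile(r"[A-Za-z_:][A-Za-z0-9._:-]*")

TEI_NS = "http://www.tei-c.org/ns/1.0"


def is_id_like(value):
    """True if value could serve as an XML ID under the ASCII approximation."""
    return NAME_ASCII.fullmatch(value) is not None


def qualified_tag(xml_text):
    """Return the root tag as ElementTree reports it.

    Namespaced elements are reported as '{namespace}local', elements
    without a namespace as 'local'.
    """
    return ET.fromstring(xml_text).tag


# A purely numeric ID, as in mld.sample.xces, does not start with a name
# start character; the prefixed form is a hypothetical repair.
print(is_id_like("951"), is_id_like("d951"))

# The same element name with and without the TEI namespace is a different
# element type as far as a namespace-aware processor is concerned.
print(qualified_tag("<p>text</p>"))
print(qualified_tag('<p xmlns="%s">text</p>' % TEI_NS))
```

The second half illustrates why a namespace-aware schema for TEI P5 and the existing namespace-free DeReKo documents cannot both be satisfied without converting one side.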


A References

ACH/ACL/ALLC 1994
Association for Computers and the Humanities, Association for Computational Linguistics, and Association for Literary and Linguistic Computing. 1994. Guidelines for Electronic Text Encoding and Interchange (TEI P3). Ed. C. M. Sperberg-McQueen and Lou Burnard. Chicago, Oxford: Text Encoding Initiative, 1994.
Ide 1998
Ide, Nancy. 1998. “Corpus Encoding Standard: SGML Guidelines for Encoding Linguistic Corpora.” Proceedings of the First International Language Resources and Evaluation Conference, 463–470. Granada, Spain.
Ide/Bonhomme/Romary 2000
Ide, Nancy, Patrice Bonhomme, and Laurent Romary. 2000. “XCES: An XML-based Standard for Linguistic Corpora.” Proceedings of the Second Language Resources and Evaluation Conference (LREC), 825–830. Athens, Greece.
IDS 2006
Institut für deutsche Sprache. [IDS/XCES DTD]. Mannheim: IDS, 2006. On the Web at http://corpora.ids-mannheim.de/idsxces1/DTD/
Kupietz n.d.
[Kupietz, Marc.] IDS-Textmodell: Unterschiede gegenüber XCES. Mannheim: IDS, n.d. On the Web at http://www.ids-mannheim.de/kl/projekte/korpora/idsxces.html
TEI 2001
TEI Consortium [together with] The Association for Computers and the Humanities, The Association for Computational Linguistics, and The Association for Literary and Linguistic Computing. 2001. TEI P4: Guidelines for Electronic Text Encoding and Interchange. Ed. C. M. Sperberg-McQueen and Lou Burnard. XML conversion by Syd Bauman, Lou Burnard, Steven DeRose, and Sebastian Rahtz. Oxford, Providence, Charlottesville, Bergen: The TEI Consortium, 2001, rpt. 2002.
TEI 2007
TEI Consortium. 2007. TEI P5: Guidelines for Electronic Text Encoding and Interchange. Ed. Lou Burnard and Syd Bauman. Oxford, Providence, Charlottesville, Nancy: The TEI Consortium, 2007, rev. 2010.

B TEI elements suppressed

This appendix lists elements present in TEI P5 (and present in modules included by this ODD file) which are suppressed.

B.1 Elements suppressed from the Core module

The following elements in the TEI core module are suppressed. Almost all of these were explicitly suppressed by CES and XCES, but some (binaryObject, choice, desc, graphic, measureGrp, and said) were not present in TEI P3 or TEI P4; they were added to TEI in TEI P5.
  • add (addition): contains letters, words, or phrases inserted in the text by an author, scribe, annotator, or corrector.
  • binaryObject: provides encoded binary data representing an inline graphic or other object.
  • cb (column break): marks the boundary between one column of a text and the next in a standard reference system.
  • choice: groups a number of alternative encodings for the same point in a text.
  • del (deletion): contains a letter, word, or passage deleted, marked as deleted, or otherwise indicated as superfluous or spurious in the copy text by an author, scribe, annotator, or corrector.
  • desc (description): contains a brief description of the object documented by its parent element, including its intended usage, purpose, or application where this is appropriate.
  • divGen (automatically generated text division): indicates the location at which a textual division generated automatically by a text-processing application is to appear.
  • expan (expansion): contains the expansion of an abbreviation.
  • headItem (heading for list items): contains the heading for the item or gloss column in a glossary list or similar structured list.
  • headLabel (heading for list labels): contains the heading for the label or term column in a glossary list or similar structured list.
  • index (index entry): marks a location to be indexed for whatever purpose.
  • listBibl (citation list): contains a list of bibliographic citations of any kind.
  • measureGrp (measure group): contains a group of dimensional specifications which relate to the same object, for example the height and width of a manuscript page.
  • meeting: contains the formalized descriptive title for a meeting or conference, for use in a bibliographic description for an item derived from such a meeting, or as a heading or preamble to publications emanating from it.
  • milestone: marks a boundary point separating any kind of section of a text, typically but not necessarily indicating a point at which some part of a standard reference system changes, where the change is not represented by a structural element.
  • postBox (postal box or post office box): contains a number or other identifier for some postal delivery point other than a street address.
  • postCode (postal code): contains a numerical or alphanumeric code used as part of a postal address to simplify sorting or delivery of mail.
  • resp (responsibility): contains a phrase describing the nature of a person's intellectual responsibility.
  • rs (referencing string): contains a general purpose name or referring string.
  • said (speech or thought): indicates passages thought or spoken aloud, whether explicitly indicated in the source or not, whether directly or indirectly reported, whether by real people or fictional characters.
  • series (series information): contains information about the series in which a book or other bibliographic item has appeared.
  • sic (Latin for thus or so): contains text reproduced although apparently incorrect or inaccurate.
  • soCalled: contains a word or phrase for which the author or narrator indicates a disclaiming of responsibility, for example by the use of scare quotes or italics.
  • street: a full street address including any name or number identifying a building as well as the name of the street or route on which it is located.
  • teiCorpus: contains the whole of a TEI encoded corpus, comprising a single corpus header and one or more TEI elements, each containing a single text header and a text.
  • unclear: contains a word, phrase, or passage which cannot be transcribed with certainty because it is illegible or inaudible in the source.

B.2 Elements suppressed from the TEI header module

The following elements in the TEI header module are suppressed:
  • appInfo (application information): records information about an application which has edited the TEI file.
  • application: provides information about an application which has acted upon the document.
  • authority (release authority): supplies the name of a person or other agency responsible for making an electronic file available, other than a publisher or distributor.
  • cRefPattern (canonical reference pattern): specifies an expression and replacement pattern for transforming a canonical reference into a URI.
  • geoDecl (geographic coordinates declaration): documents the notation and the datum used for geographic coordinates expressed as content of the geo element elsewhere within the document.
  • handNote (note on hand): describes a particular style or hand distinguished within a manuscript.
  • interpretation: describes the scope of any analytic or interpretive information added to the text in addition to the transcription.
  • namespace: supplies the formal name of the namespace to which the elements documented by its children belong.
  • notesStmt (notes statement): collects together any notes providing information about a text additional to that recorded in other parts of the bibliographic description.
  • principal (principal researcher): supplies the name of the principal researcher responsible for the creation of an electronic text.
  • refState (reference state): specifies one component of a canonical reference defined by the milestone method.
  • rendition: supplies information about the rendition or appearance of one or more elements in the source text.
  • scriptNote: describes a particular script distinguished within the description of a manuscript or similar resource.
  • seriesStmt (series statement): groups information about the series, if any, to which a publication belongs.
  • sponsor: specifies the name of a sponsoring organization or institution.
  • stdVals (standard values): specifies the format used when standardized date or number values are supplied.
  • teiHeader (TEI Header): supplies the descriptive and declarative information making up an electronic title page prefixed to every TEI-conformant text.
  • typeNote: describes a particular font or other significant typographic feature distinguished within the description of a printed resource.

B.3 Elements suppressed from the TEI text structure module

The following elements from the TEI text structure module are suppressed:
  • argument: contains a formal list or prose description of the topics addressed by a subdivision of a text.
  • div1 (level-1 text division): contains a first-level subdivision of the front, body, or back of a text.
  • div2 (level-2 text division): contains a second-level subdivision of the front, body, or back of a text.
  • div3 (level-3 text division): contains a third-level subdivision of the front, body, or back of a text.
  • div4 (level-4 text division): contains a fourth-level subdivision of the front, body, or back of a text.
  • div5 (level-5 text division): contains a fifth-level subdivision of the front, body, or back of a text.
  • div6 (level-6 text division): contains a sixth-level subdivision of the front, body, or back of a text.
  • div7 (level-7 text division): contains the smallest possible subdivision of the front, body or back of a text, larger than a paragraph.
  • docDate (document date): contains the date of a document, as given (usually) on a title page.
  • floatingText: contains a single text of any kind, whether unitary or composite, which interrupts the text containing it at any point and after which the surrounding text resumes.
  • group: contains the body of a composite text, grouping together a sequence of distinct texts (or groups of such texts) which are regarded as a unit for some purpose, for example the collected works of an author, a sequence of prose essays, etc.
  • imprimatur: contains a formal statement authorizing the publication of a work, sometimes required to appear on a title page or its verso.
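
In a TEI P5 ODD, suppressing an element is typically expressed as an elementSpec with mode="delete" that names the element and the module it comes from. The fragment below is a minimal sketch of that mechanism for the numbered divisions listed above. It is not copied from the I5 ODD source, and the specGrp identifier is a hypothetical name chosen for illustration.

```xml
<!-- Sketch only: the xml:id value is hypothetical, not taken from the I5 ODD. -->
<specGrp xmlns="http://www.tei-c.org/ns/1.0" xml:id="sketch.textstructure.deletions">
  <!-- Each elementSpec removes one element of the textstructure module from the schema. -->
  <elementSpec ident="div1" module="textstructure" mode="delete"/>
  <elementSpec ident="div2" module="textstructure" mode="delete"/>
  <elementSpec ident="div3" module="textstructure" mode="delete"/>
  <elementSpec ident="div4" module="textstructure" mode="delete"/>
  <elementSpec ident="div5" module="textstructure" mode="delete"/>
  <elementSpec ident="div6" module="textstructure" mode="delete"/>
  <elementSpec ident="div7" module="textstructure" mode="delete"/>
</specGrp>
```

The remaining elements in the list (argument, docDate, floatingText, group, imprimatur) would follow the same pattern, one elementSpec per element.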

B.4 Elements suppressed from optional modules

The following elements in the TEI analysis module are suppressed:
  • c (character): represents a character.
  • cl (clause): represents a grammatical clause.
  • interp (interpretation): summarizes a specific interpretative annotation which can be linked to a span of text.
  • interpGrp (interpretation group): collects together a set of related interpretations which share responsibility or type.
  • m (morpheme): represents a grammatical morpheme.
  • pc (punctuation character): a character or string of characters regarded as constituting a single punctuation mark.
  • phr (phrase): represents a grammatical phrase.
  • span: associates an interpretative annotation directly with a span of text.
  • spanGrp (span group): collects together span tags.
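
When most of a module is suppressed, as with the analysis module here, TEI P5 also allows the selection to be stated positively on the moduleRef itself. The sketch below assumes that s and w are the only elements of the analysis module not named in the list above, which depends on the TEI release in use. Whether the I5 ODD uses this form or individual deletions is not stated here.

```xml
<!-- Sketch only: keeps s and w and omits every other element of the analysis module. -->
<!-- Assumption: the module contains no elements beyond s, w, and those listed as suppressed. -->
<moduleRef xmlns="http://www.tei-c.org/ns/1.0" key="analysis" include="s w"/>
```
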
The following elements in the TEI corpus module are suppressed:
  • activity: contains a brief informal description of what a participant in a language interaction is doing other than speaking, if anything.
  • channel (primary channel): describes the medium or channel by which a text is delivered or experienced. For a written text, this might be print, manuscript, e-mail, etc.; for a spoken one, radio, telephone, face-to-face, etc.
  • constitution: describes the internal composition of a text or text sample, for example as fragmentary, complete, etc.
  • derivation: describes the nature and extent of originality of this text.
  • domain (domain of use): describes the most important social context in which the text was realized or for which it is intended, for example private vs. public, education, religion, etc.
  • factuality: describes the extent to which the text may be regarded as imaginative or non-imaginative, that is, as describing a fictional or a non-fictional world.
  • interaction: describes the extent, cardinality and nature of any interaction among those producing and experiencing the text, for example in the form of response or interjection, commentary, etc.
  • locale: contains a brief informal description of the kind of place concerned, for example: a room, a restaurant, a park bench, etc.
  • preparedness: describes the extent to which a text may be regarded as prepared or spontaneous.
  • purpose: characterizes a single purpose or communicative function of the text.
  • setting: describes one particular setting in which a language interaction takes place.
  • settingDesc (setting description): describes the setting or settings within which a language interaction takes place, either as a prose description or as a series of setting elements.