Editorial Guide

The schema of the WCO project is based on the Text Encoding Initiative (TEI) P5 Guidelines, a widely used open standard that provides device-independent and system-independent methods for storing and processing texts in electronic form. These Guidelines serve as an introduction to our transcriptions and an outline of the tags used.

Publishing manuscripts on the web requires a clearly defined editorial model, which the editor of the publication adopts and follows. Digital publications can be of different types, such as raw texts on the web or reading editions. Each type can be examined from the viewpoint of how well it represents Georgian manuscripts and what problems arise from that representation. It is therefore important to describe the encoding principles and the elements that appear in these files, in both the header and the body, so that they are used properly. In addition, two indexes, one of persons and one of places, were compiled while preparing the digital edition.

Manuscript encoding

Each XML file consists of the metadata, a set of images and the full text of the manuscript:

  • <teiHeader> contains all metadata determined for a concrete manuscript;
  • <facsimile> contains a set of images;
  • <text> contains the text of a manuscript.
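The three-part layout can be checked mechanically. The following is a minimal sketch in Python 3.9 or later using the standard-library ElementTree parser. It assumes the files use the standard TEI namespace (http://www.tei-c.org/ns/1.0) and that <facsimile> may be absent. The sample document is a hypothetical, heavily abridged file, and the function names are illustrative.

```python
import xml.etree.ElementTree as ET

TEI_NS = "{http://www.tei-c.org/ns/1.0}"

# Hypothetical, heavily abridged file; real files carry full metadata and text.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader/>
  <facsimile/>
  <text><body/></text>
</TEI>"""


def top_level_parts(xml_string: str) -> list[str]:
    """Return the local names of the children of the root <TEI> element."""
    root = ET.fromstring(xml_string)
    return [child.tag.replace(TEI_NS, "") for child in root]


def has_expected_layout(parts: list[str]) -> bool:
    """<teiHeader> first, then an optional <facsimile>, then <text>."""
    return parts in (["teiHeader", "facsimile", "text"], ["teiHeader", "text"])


parts = top_level_parts(SAMPLE)
print(parts)  # ['teiHeader', 'facsimile', 'text']
print(has_expected_layout(parts))  # True
```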

The header (<teiHeader>)

In accordance with the TEI P5 Customization and Encoding Guidelines provided by the Bodleian Libraries, the <teiHeader> element consists of the following four subdivisions:

  • <fileDesc> - contains a full bibliographic description of the file, i.e. its main characteristics;
  • <encodingDesc> - documents the relationship between an electronic text and the source or sources from which it was derived and, in particular, describes the editorial rules of the publication;
  • <profileDesc> - provides a detailed description of the non-bibliographic aspects of a text;
  • <revisionDesc> - summarises the revision history of the file.

Inside the header, titles are given in English and Georgian. Georgian titles are generally in Mkhedruli script, while Asomtavruli and Nuskhuri are used for marginalia, items, titles, lists of persons and places, etc.

<fileDesc> encompasses the following elements:

  • <titleStmt> - groups information about the title of a work and those responsible for its content. It therefore encodes the title of the document (<title>) and the statement of responsibility for the edition (<respStmt>).
  • The <title> element contains two types of title: the main title as given in the catalogue (the shelfmark), and its alternatives in English, Modern Georgian and ALA-LC romanization, e.g.
      <title type="full">
        <title type="main">MS. Georg. d. 2</title>
        <title type="alt" xml:lang="en">Typicon of the Georgian monastery of the Holy Cross near Jerusalem</title>
        <title type="alt" xml:lang="ka">ტიპიკონი ჯვრის მონასტრისა</title>
        <title type="alt" xml:lang="ka-la">tipikoni jvris monastrisa</title>
      </title>
  • The <funder> element contains "JISC";
  • The <principal> element always contains the name of Gillian Evison, the Keeper of Oriental Collections of the Bodleian Libraries.

Types of Digital Editions

Raw texts – a digital representation of text without images, using alphanumeric characters in Unicode or ASCII. UTF-8 (Unicode) is the standard for all three Georgian scripts: Mkhedruli, Asomtavruli and Nuskhuri. It requires no additional fonts and causes no problems with how characters are displayed on the web. Problems occur only if a server interprets Unicode characters differently and displays Georgian characters, especially Asomtavruli and Nuskhuri with diacritics, as squares. ASCII, by contrast, requires special fonts such as AcadNusx or LitNusx, and depends on the reader being able to find and install them. If a reader cannot find them, the browser displays the Georgian letters as Latin characters, which follow neither transliteration (ALA-LC, ISO 9984:1996) nor transcription (IPA) standards;

Reading edition – a publishing format that presents a text in the form prepared by a particular scholar or editor. The main problem with this format is that it gives the false impression that texts, in our case Old or Middle Georgian ones, follow the structure and characteristics of modern texts. Modern texts consist of letters, words, punctuation marks and white spaces, with clearly defined chapters, paragraphs, etc. In most cases Old Georgian manuscripts did not follow this structure. They used no punctuation marks except the dot (.) and the three dots (჻), which mark sometimes the end of a paragraph and sometimes the end of a word. Their use of white space was fragmentary, even chaotic. As a result, the chapters, paragraphs, etc. of a reading edition differ greatly from those of the manuscripts;

Critical edition – the result of comparing manuscript fragments to reveal the original or most significant form of a text. Establishing such a text relies on the well-known Lachmann method. This method lets an editor or scholar reconstruct a text not from a single manuscript corrected sporadically, but through the systematic collection, examination, classification and evaluation of all extant witnesses, including manuscripts, citations, scholia and other evidence (Most, 2016). Generally, there are two options: a) to present the text of a particular manuscript with comments, or b) to prepare a composite version from textual fragments, reconstructing the best text. Both approaches are widely used in publishing Georgian manuscripts in print and online, and both share most of the problems already described for reading editions;

Semi-diplomatic edition – a text that reproduces the original document as closely as possible but makes it accessible to readers through appropriate expansions or explanations;

Diplomatic edition (some scholars also distinguish ultra-diplomatic editions; Pierazzo, 2015) – a transcription of a particular manuscript that preserves, as far as possible, the original readings, punctuation, line divisions, marginalia, etc. The semi-diplomatic and diplomatic editions of Georgian manuscripts available online are generally based on critical printed editions and share the problems described above. The corpus of Middle and Old Georgian can be considered an example of the diplomatic representation of Georgian manuscripts;

Facsimile edition – a text preserved in the form of photographs. This type of edition is rare for Georgian manuscripts. Exceptions include a Menologion for March-August, followed by biographies of saintly women, and some other manuscripts available at Digital Bodleian.
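To make the Unicode point above concrete, the sketch below (Python 3.9 or later, standard library only) reports which Georgian scripts occur in a string. It uses the Unicode ranges for Asomtavruli (U+10A0 to U+10CF), Mkhedruli (U+10D0 to U+10FF) and Nuskhuri (U+2D00 to U+2D2F). The function name is illustrative, and the Mtavruli capitals added in Unicode 11 (U+1C90 onwards) are not covered.

```python
import unicodedata

# Unicode ranges of the three Georgian scripts. Unassigned code points in
# these ranges have category Cn and are skipped below like any non-letter.
SCRIPT_RANGES = {
    "Asomtavruli": (0x10A0, 0x10CF),
    "Mkhedruli": (0x10D0, 0x10FF),
    "Nuskhuri": (0x2D00, 0x2D2F),
}


def scripts_used(text: str) -> set[str]:
    """Return the Georgian scripts whose letters occur in `text`."""
    found = set()
    for ch in text:
        # Skip spaces, punctuation such as U+10FB and combining titlos.
        if not unicodedata.category(ch).startswith("L"):
            continue
        for name, (low, high) in SCRIPT_RANGES.items():
            if low <= ord(ch) <= high:
                found.add(name)
    return found


print(sorted(scripts_used("ⴇⴄⴍⴃⴍⴐⴄ")))  # ['Nuskhuri']
print(sorted(scripts_used("\u10aaავრენტის")))  # ['Asomtavruli', 'Mkhedruli']
print(len("თ".encode("utf-8")))  # 3: a Georgian letter takes three bytes in UTF-8
```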

  • <editionStmt> - groups information relating to one edition of a text. It records the particulars of the edition (<edition>) and the organisation responsible for funding the project (<funder>). In the transcribed texts, the funder is the Shota Rustaveli National Science Foundation;
  • <publicationStmt> - groups information concerning the publication or distribution of an electronic or other text. It names the Bodleian Libraries as responsible for distributing the item (<publisher>, <distributor>) and gives the publisher's postal address (<address>) and the identifiers (<idno>). This part of the document thus describes the Special Collections of the Bodleian Libraries, and the <availability> element adds information on the availability of the text;
  • <sourceDesc> - describes the source from which an electronic text was derived or generated, typically a bibliographic description in the case of a digitised text and contains the following:
    • <msDesc> - a description of a single identifiable manuscript, which contains the following units:
      • <msIdentifier> - manuscript identifier;
      • <msContents> - describes manuscript parts;
      • <physDesc> - contains a full physical description of a manuscript;
      • <history> - groups elements describing the full history of a manuscript;
      • <additional> - groups additional information about the manuscript.

Within <sourceDesc>, the Digital Bodleian schema was extended with the following elements:

  • <listWit> - an element needed for the critical apparatus, which lists definitions for all the witnesses referred to in the apparatus;

This list contains a <witness> element for each available fragment of a text. Each witness has an @xml:id so that it can be referred to from inside the text. The basic structure of this list is as follows:


<listWit>
  <witness xml:id="W1">
    Typicon of the Shio-Mgvime Monastery: XIII c.,
    prepared for publication by E. Kochlamazashvili and E. Giunashvili,
    Tbilisi: Shio-Mgvime Monastery Pub.,
    2005 (see further
    <ptr target="http://eprints.iliauni.edu.ge/7918/"/>)
  </witness>
</listWit>

  • <listPerson> - a list of descriptions, each of which provides information about an identifiable person or a group of people referred to in a historical source;

The list consists of a <listPerson> element with a @type attribute and a <person> element for each person in the list. Each <person> element contains <persName> elements with an @xml:lang attribute, whose values are ka (Modern Georgian), oge (Old Georgian) and en (English). The basic structure of this list is as follows:

<listPerson type="saints">
  <head xml:lang="ka">წმინდანები</head>
  <head xml:lang="en">Saints</head>
  <person xml:id="person315">
    <persName xml:lang="ka">თეოდორე</persName>
    <persName xml:lang="oge">ⴇⴄⴍⴃⴍⴐⴄ</persName>
    <persName xml:lang="en">Theodore, Theodore of Sykeon</persName>
    <!-- Byzantine Saint, Q3526533 -->
  </person>
</listPerson>


  • <listPlace> - a list of places.

The list of places is contained in a <listPlace> element with a @type attribute and a <place> element for each place. Each <place> element contains <placeName> elements in Old Georgian, Modern Georgian and English, and a <location> element with a <geo> element inside. The basic structure of this list is as follows:

<listPlace type="places">
  <head xml:lang="ka">გეოგრაფიული სახელწოდებები</head>
  <head xml:lang="en">Geographical Places</head>
  <place xml:id="place002">
    <placeName xml:lang="ka">იერუსალიმი</placeName>
    <placeName xml:lang="oge">ⴈⴄⴐⴓⴑⴀⴊⴈⴋⴈ</placeName>
    <placeName xml:lang="en">Jerusalem</placeName>
    <location>
      <geo>31.47 35.13</geo>
    </location>
    <!-- http://dbpedia.org/page/Jerusalem, Q1218 -->
  </place>
</listPlace>



These last two elements are especially important because most texts in the Old Collection contain unique information on saints, historical figures and places referred to in the manuscripts.
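Names in the text point to these lists, so it can be useful to load the lists into a lookup table. The following is a minimal sketch that assumes the TEI namespace and the <person>/<persName> and <place>/<placeName> structure described above. The function name build_index and the abridged sample are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML = "{http://www.w3.org/XML/1998/namespace}"  # namespace of xml:id / xml:lang

# Abridged sample built from the two list examples in this guide.
SAMPLE = """<sourceDesc xmlns="http://www.tei-c.org/ns/1.0">
  <listPerson type="saints">
    <person xml:id="person315">
      <persName xml:lang="ka">თეოდორე</persName>
      <persName xml:lang="oge">ⴇⴄⴍⴃⴍⴐⴄ</persName>
      <persName xml:lang="en">Theodore, Theodore of Sykeon</persName>
    </person>
  </listPerson>
  <listPlace type="places">
    <place xml:id="place002">
      <placeName xml:lang="ka">იერუსალიმი</placeName>
      <placeName xml:lang="en">Jerusalem</placeName>
    </place>
  </listPlace>
</sourceDesc>"""


def build_index(xml_string):
    """Map each person/place @xml:id to its names keyed by @xml:lang."""
    root = ET.fromstring(xml_string)
    index = {}
    for entry, name_tag in (("person", "persName"), ("place", "placeName")):
        for element in root.iter(TEI + entry):
            index[element.get(XML + "id")] = {
                name.get(XML + "lang"): name.text
                for name in element.findall(TEI + name_tag)
            }
    return index


print(build_index(SAMPLE)["place002"]["en"])  # Jerusalem
```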

After <fileDesc>, the second top-level component of the header is <encodingDesc>, which defines the taxonomies used for classification codes, as follows:


<classDecl>
  <taxonomy xml:id="LCSH">
    <bibl>
      <ref target="http://id.loc.gov/authorities/about.html#lcsh">Library of Congress Subject Headings</ref>
    </bibl>
  </taxonomy>
</classDecl>





The third top-level component of the header is <profileDesc>, which specifies the languages of the manuscript as follows:


<langUsage>
  <language ident="ka" usage="99">Georgian Language</language>
  <language ident="el" usage="1">Greek Language</language>
</langUsage>



Each language is identified by the @ident attribute, and the approximate percentage of its use is given in the @usage attribute. If there is only one language, the @usage attribute is omitted. This component also records the <textClass> of the manuscript according to the Library of Congress Subject Headings (LCSH), as follows:


<textClass>
  <keywords scheme="#LCSH">
    <term key="subject_sh2006003480">Christian Literature, Georgian</term>
  </keywords>
</textClass>
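The language and subject information can be read back from <profileDesc> as sketched below. The sketch assumes the standard TEI wrappers <langUsage> and <textClass>. It treats a missing @usage (the single-language case) as 100 per cent; this is our assumption, not a rule stated in the schema.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

SAMPLE = """<profileDesc xmlns="http://www.tei-c.org/ns/1.0">
  <langUsage>
    <language ident="ka" usage="99">Georgian Language</language>
    <language ident="el" usage="1">Greek Language</language>
  </langUsage>
  <textClass>
    <keywords scheme="#LCSH">
      <term key="subject_sh2006003480">Christian Literature, Georgian</term>
    </keywords>
  </textClass>
</profileDesc>"""


def languages(profile):
    """Map each @ident to its @usage; a missing @usage is read as 100 (assumption)."""
    return {
        lang.get("ident"): int(lang.get("usage", "100"))
        for lang in profile.iter(TEI + "language")
    }


def lcsh_terms(profile):
    """Return (key, heading) pairs from <keywords scheme="#LCSH">."""
    return [
        (term.get("key"), term.text)
        for keywords in profile.iter(TEI + "keywords")
        if keywords.get("scheme") == "#LCSH"
        for term in keywords.iter(TEI + "term")
    ]


profile = ET.fromstring(SAMPLE)
print(languages(profile))  # {'ka': 99, 'el': 1}
print(lcsh_terms(profile))  # [('subject_sh2006003480', 'Christian Literature, Georgian')]
```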






The last top-level component of the header is <revisionDesc>, which summarises the revision history of the file. Each revision is recorded in a <change> element. The date of the change is given in the @when attribute, and the name of the person responsible in a <persName> element, e.g.


<revisionDesc>
  <change when="2019-12-15">
    <persName>Irina Lobzhanidze</persName> Expanded meta-annotation and provided the transcription.
  </change>
  <change when="2017-07-26">
    <persName>James Cummings</persName> Up-converted the markup using <ref …
  </change>
  <change when="2011-05-12">
    <persName>Nikoloz Aleksidze</persName>
  </change>
</revisionDesc>

The facsimile (<facsimile>)

The <teiHeader> element is followed by <facsimile>, which represents the written source as a set of images aligned with the original text. Not all manuscripts have this element. It is included only when Digital Bodleian has images available to end users, and it links to the main host of the images through the <surface>, <zone> and <graphic> elements.

  • <surface> defines a written surface as a two-dimensional coordinate space;
  • <zone> defines any two-dimensional area within a surface element;
  • <graphic> indicates the location of an illustration or of an image of the written surface.

If there are no images of a manuscript, the <facsimile> element is omitted.
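The sketch below shows how a local pointer such as "#zone-pb-2r-3" (the form used in the @facs attribute of page breaks) could be resolved to an image through <surface>, <zone> and <graphic>. It assumes that each <surface> holds one <graphic> and that zones are identified by @xml:id. The image URL is a placeholder, not a real Digital Bodleian link, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical facsimile; the URL is a placeholder, not a real image link.
SAMPLE = """<facsimile xmlns="http://www.tei-c.org/ns/1.0">
  <surface xml:id="surface-2r">
    <graphic url="https://example.org/images/2r.jpg"/>
    <zone xml:id="zone-pb-2r-3"/>
  </surface>
</facsimile>"""


def zone_images(facsimile):
    """Map every <zone> @xml:id to the @url of the <graphic> on its <surface>."""
    lookup = {}
    for surface in facsimile.iter(TEI + "surface"):
        graphic = surface.find(TEI + "graphic")
        url = graphic.get("url") if graphic is not None else None
        for zone in surface.iter(TEI + "zone"):
            lookup[zone.get(XML_ID)] = url
    return lookup


def resolve_facs(facs, lookup):
    """Resolve a local pointer such as '#zone-pb-2r-3' to an image URL, or None."""
    return lookup.get(facs.lstrip("#"))


images = zone_images(ET.fromstring(SAMPLE))
print(resolve_facs("#zone-pb-2r-3", images))  # https://example.org/images/2r.jpg
```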

The text (<text>)

The <facsimile> element (or, where it is absent, the <teiHeader>) is followed by <text>, which always contains the <body>. The elements that occur inside the <body> may appear in any Georgian manuscript and can easily be adapted to any kind of digital project.

Text division (<div>) and heading (<head>)

In our project, the <body> element contains <div> sub-elements. Depending on the structure of the text, each <div> has two attributes. The @type attribute distinguishes the Old Georgian original, its Modern Georgian translation and chapters, and the @n attribute gives the number of the division, e.g.


<body>
  <div type="original" n="1">
    <div type="chapter" n="1">
      ...
    </div>
  </div>
  <div type="translation" xml:lang="ka" n="2">
    <div type="chapter" n="1">
      ...
    </div>
  </div>
</body>
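Because divisions nest, a short recursive walk gives an outline of a file's structure, which is handy for checking that @type and @n are consistent. The following is a minimal sketch assuming the TEI namespace. The outline function is hypothetical, and the chapters are left empty.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

SAMPLE = """<body xmlns="http://www.tei-c.org/ns/1.0">
  <div type="original" n="1"><div type="chapter" n="1"/></div>
  <div type="translation" xml:lang="ka" n="2"><div type="chapter" n="1"/></div>
</body>"""


def outline(element, depth=0):
    """Yield (depth, @type, @n, @xml:lang) for every nested <div>."""
    for div in element.findall(TEI + "div"):
        yield depth, div.get("type"), div.get("n"), div.get(XML_LANG)
        yield from outline(div, depth + 1)


for depth, div_type, n, lang in outline(ET.fromstring(SAMPLE)):
    print("  " * depth + f"{div_type} {n}" + (f" ({lang})" if lang else ""))
# original 1
#   chapter 1
# translation 2 (ka)
#   chapter 1
```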




Each chapter has a heading, which is recorded in the <msContents> of the <teiHeader> and also marked up within the text. Chapter headings are given in a <head> element inside the <div> that marks the beginning of a new chapter, e.g.

<head hand="#h2">
  <hi rend="color(FF0000)"></hi><hi rend="color(FF0000)">ⴣⴎⴈⴉⴍⴌⴈ ⴑⴀⴄⴉⴊⴄⴑⴈⴍⴢⴑⴀ ⴜⴄⴑⴈⴑⴀⴢ<lb/>
  ⴉⴇ&#x0360;ⴊ ⴂ&#x0360;ⴌⴂⴄⴁⴓ<damage><gap quantity="2" unit="chars" reason="illegible"/></damage>ⴈⴑⴀ ⴜ&#x0360;ⴈⴑⴀⴃⴀ ⴖ&#x0360;ⴇ ⴘⴄⴋⴍ<lb/>
  ⴈⴊⴈⴑⴀ ⴋ&#x0360;ⴋⴈⴑⴀ ⴙ&#x0360;ⴌⴈⴑⴀ ⴑⴀⴁⴀⴢⴑⴊⴀⴅⴐⴈ<lb/>
  <gap quantity="1" unit="chars" reason="illegible"/>ⴀⴢ ⴐ&#x0360;ⴈ ⴄⴑⴄ ⴄⴑⴐⴄⴇⴅⴄ ⴈⴕⴋⴌⴄⴁⴈⴑⴑⴞ&#x0360;ⴇⴀ<lb/>
  ⴚⴀ ⴗⴇⴀ ⴋⴍⴌⴀⴑⴒⴄⴐⴇⴀ ⴘ&#x0360;ⴀ ⴎⴀⴊⴄⴑⴒⴈⴌⴈ<lb/>
  ⴑⴀⴇⴀ: Ⴂ&#x0360;ⴌⴜⴄⴑⴄⴁ&#x0360;ⴢ ⴖⴀⴋⴈⴑⴇⴄⴅⴈⴑⴀⴢ჻</hi>
</head>

Page breaks (<pb>) and catchwords (<fw>)

Page breaks are encoded with the <pb> element, which has several attributes: @f gives the folio number, @n the page number and @facs the link to the image, e.g.

<pb f="2r" n="3" xml:id="pb-orig-2r-3" facs="#zone-pb-2r-3"/>

Catchwords at the bottom of a page are placed in an <fw> element with a @type attribute, as follows:

<fw type="catch">სა</fw><lb/>

<pb f="2v" n="4" xml:id="pb-orig-2v-4" facs="#zone-pb-2v-4"/>
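The @xml:id and @facs values in the examples above follow a regular pattern built from @f and @n: pb-orig-{folio}-{page} and #zone-pb-{folio}-{page}. We infer this pattern from the examples rather than from a formal rule. Assuming it holds throughout, a small check can catch inconsistencies, and the XML parser itself rejects an element that repeats an attribute, such as two @n attributes on one <pb/>. The check_pb name is hypothetical.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def check_pb(pb_markup):
    """Return a list of problems found in a single <pb/> element (empty if none)."""
    try:
        pb = ET.fromstring(pb_markup)
    except ET.ParseError as err:  # e.g. the same attribute given twice
        return [f"not well-formed: {err}"]
    folio, page = pb.get("f"), pb.get("n")
    if folio is None or page is None:
        return ["missing @f or @n"]
    problems = []
    # Naming convention inferred from the examples in this guide.
    if pb.get(XML_ID) != f"pb-orig-{folio}-{page}":
        problems.append("@xml:id does not match @f/@n")
    if pb.get("facs") != f"#zone-pb-{folio}-{page}":
        problems.append("@facs does not match @f/@n")
    return problems


print(check_pb('<pb f="2r" n="3" xml:id="pb-orig-2r-3" facs="#zone-pb-2r-3"/>'))  # []
```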

Paragraphs (<p>) and line breaks (<lb>)

Each <div> element consists of paragraphs encoded in <p> elements. Like <head>, these carry a @hand attribute that distinguishes the different hands in titles, paragraphs and additions. The hands found in a manuscript are described in the <handDesc> element of the <teiHeader> and pointed to by the @hand attribute from within the text. Every line break is tagged with an <lb/> element placed at the end of the line, except that no <lb/> is placed immediately before the end of a paragraph, e.g.

<p hand="#h3">
  ქ჻ლაზარე უ<choice><am>&#x0360;</am><ex>ფალს</ex></choice>ა Ⴊავრენტის<lb/>
  უღირსსა ბერსა ლავრენტის<lb/>
  შეუწევს ღ&#x0360;ნი. ჩ.ყ.ვ.
</p>
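The rule that no <lb/> is placed immediately before the end of a paragraph can be checked automatically. The following is a minimal sketch assuming the TEI namespace. It inspects only the direct children of each <p>, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

SAMPLE = """<div xmlns="http://www.tei-c.org/ns/1.0">
  <p>first line<lb/>
  last line</p>
  <p>first line<lb/>
  last line<lb/></p>
</div>"""


def ends_with_lb(paragraph):
    """True if <p> closes right after an <lb/> (only whitespace follows it)."""
    children = list(paragraph)
    if not children:
        return False
    last = children[-1]
    return last.tag == TEI + "lb" and not (last.tail or "").strip()


def paragraphs_ending_with_lb(xml_string):
    """Return the 0-based positions of <p> elements that break the rule."""
    root = ET.fromstring(xml_string)
    return [i for i, p in enumerate(root.iter(TEI + "p")) if ends_with_lb(p)]


print(paragraphs_ending_with_lb(SAMPLE))  # [1]
```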


Physical condition of the manuscript: damage (<damage>), gap (<gap>) and supplied text (<supplied>)

Sometimes the text of a manuscript is missing or illegible because of damage. The <damage> element marks an area of damage in the text and may contain a <gap> element, e.g.

<damage><gap quantity="1" unit="chars" reason="illegible"/></damage>

When a piece of text is missing without any physical damage, we use the <gap> element alone. It takes three attributes: @quantity specifies the number of units omitted; @unit names the unit of measurement (words, characters), generally with the value chars; and @reason gives the reason for the omission, with values such as illegible and omitted, e.g.

<gap quantity="1" unit="chars" reason="illegible"/>
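Because every <gap/> records a quantity, a unit and a reason, the extent of lost text can be totalled per file. The sketch below does this and separates gaps inside <damage> from the rest. It assumes the TEI namespace, the letters in the sample merely stand in for manuscript text, and gap_report is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TEI = "{http://www.tei-c.org/ns/1.0}"

# Hypothetical fragment: the letters a, b, c stand in for manuscript text.
SAMPLE = """<p xmlns="http://www.tei-c.org/ns/1.0">
  a<damage><gap quantity="2" unit="chars" reason="illegible"/></damage>b
  <gap quantity="1" unit="chars" reason="illegible"/>c
  <gap quantity="3" unit="chars" reason="omitted"/>
</p>"""


def gap_report(xml_string):
    """Sum @quantity per (@reason, @unit, inside <damage>?) over all <gap/>s."""
    root = ET.fromstring(xml_string)
    # ElementTree keeps no parent links, so first note the gaps inside <damage>.
    damaged = {id(gap) for d in root.iter(TEI + "damage") for gap in d.iter(TEI + "gap")}
    totals = Counter()
    for gap in root.iter(TEI + "gap"):
        key = (gap.get("reason"), gap.get("unit"), id(gap) in damaged)
        totals[key] += int(gap.get("quantity", "0"))
    return totals


for (reason, unit, in_damage), total in sorted(gap_report(SAMPLE).items()):
    print(reason, unit, "damage" if in_damage else "no damage", total)
```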

As mentioned above, the Old Georgian manuscripts were also translated into Modern Georgian to adapt the texts to the needs of readers. In the translation, all omitted or damaged sections were restored with appropriate Modern Georgian words. These restorations are encoded with the <supplied> element and an appropriate @reason, e.g.

<supplied reason="illegible"></supplied>

<supplied reason="omitted"></supplied>

Changes in the manuscript: deletions (<del>), additions (<add>) and substitutions (<subst>)

Handwritten manuscripts contain deletions and additions, encoded with the <del> and <add> elements respectively. A deletion or addition made by a different hand carries a @hand attribute pointing to the corresponding hand described in the <handDesc> element of the header. The <del> element has a @rend attribute with the value strikethrough or overwritten, e.g.

<del rend="strikethrough"><gap quantity="4" unit="chars" reason="illegible"/></del>მღდელმონაზონსა


<del rend="strikethrough" hand="#h3">მღდელ</del>მონაზონსა

The <subst> element groups a deletion and an addition when they occur together, as follows:


<subst>
  <del rend="overwritten"><hi rend="color(FF0000)"></hi></del>
  <add>…</add>
</subst>



Authors, copyists and readers made different kinds of additions and notes to the text. To record them, the <add> element has a @place attribute indicating the place of insertion, with the following values:

  • above - above the line;
  • below - below the line;
  • bottom - at the foot of the page;
  • top - at the top of the page.

For example:

<hi rend="color(FF0000)">Ⴤ</hi><add place="above">&#x0360;</add>

However, most Georgian medieval manuscripts also contain marginalia, i.e. various additions to the main text. For example, the Typicon of the Holy Cross near Jerusalem (XIV c.) contains many marginal notes giving the numbers of the psalms to be read alongside the main text. For marginalia, the @place attribute of the <add> element was therefore extended with the following values:

  • margin-left - on the left margin;
  • margin-right - on the right margin;
  • margin-bottom - on the bottom margin;
  • margin-top - on the top margin;
  • margin-right-vertical - text written vertically on the right margin;
  • margin-left-vertical - text written vertically on the left margin.

For example:

<add place="margin-right-vertical" hand="#h6">Ⴀⴋ&#x0360;ⴑⴅⴄ ⴃⴖⴄⴑⴀ ⴎ&#x0360;ⴄ ⴕⴀⴐⴇⴅⴄⴊⴈⴑⴀ:</add>
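Since @place is restricted to the two value lists above, additions can be grouped and stray values flagged automatically. The following is a minimal sketch assuming the TEI namespace. The group names, the "sideways" test value and the function name are our own.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# @place values listed in this guide.
BASIC = {"above", "below", "bottom", "top"}
MARGINAL = {
    "margin-left", "margin-right", "margin-bottom", "margin-top",
    "margin-right-vertical", "margin-left-vertical",
}

SAMPLE = """<p xmlns="http://www.tei-c.org/ns/1.0">
  <hi rend="color(FF0000)">A</hi><add place="above">&#x0360;</add>
  <add place="margin-right-vertical" hand="#h6">Ⴀⴋ&#x0360;ⴑⴅⴄ</add>
  <add place="sideways">x</add>
</p>"""


def classify_additions(xml_string):
    """Group <add> elements by @place: basic, marginal, or not in either list."""
    groups = {"basic": [], "marginal": [], "unknown": []}
    for add in ET.fromstring(xml_string).iter(TEI + "add"):
        place = add.get("place")
        group = "basic" if place in BASIC else "marginal" if place in MARGINAL else "unknown"
        groups[group].append((place, add.get("hand"), "".join(add.itertext())))
    return groups


for group, items in classify_additions(SAMPLE).items():
    print(group, [place for place, _, _ in items])
```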

Unclear segments: unclear (<unclear>), supplied (<supplied>)

The <unclear> element with the attributes @quantity, @unit and @reason is used for segments that are difficult to read, e.g.

<unclear quantity="1" unit="chars" reason="illegible"></unclear>&#x0360;ⴊⴑⴀ

If the segment is missing, the <gap> element with the @quantity, @unit and @reason attributes is used. If, however, we know or can infer the omitted text, we use <supplied> instead of <gap>, e.g.

<hi rend="color(FF0000)"></hi><supplied reason="omitted">ემდგომად</supplied>

Tint of ink: shift of hand (<handShift>) and highlighted mark (<hi>)

Georgian medieval manuscripts often set off characters, words and sometimes longer passages by writing them in different inks. In most cases, the Asomtavruli characters used for titles and for the initial letters of words or paragraphs are written in red, yellow or another ink distinct from the rest of the text, while Nuskhuri and Mkhedruli are written in black. Following the TEI recommendations, such cases can be described with the <handShift> element, whose @medium attribute indicates the tint of the ink, as follows:

<handShift medium="red"/><handShift medium="black"/><ex>&#x0303;</ex>

The values of the @medium attribute are red, yellow and black. In our project we tried to avoid the <handShift> element, because it marks a change of ink for the whole section that follows it. Instead, we used the <hi> element to mark a word or phrase as graphically distinct from the surrounding text, with a @rend attribute indicating the colour, e.g.

<hi rend="color(FF0000)"></hi><ex>&#x0303;</ex>

This approach is similar for diplomatic and critical editions.
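Since colour is recorded in @rend values of the form color(RRGGBB), rubricated passages can be collected automatically. The following is a sketch under that assumption. The INK_NAMES mapping is hypothetical and covers only the red used in the examples, and empty <hi> elements are skipped.

```python
import re
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
COLOUR = re.compile(r"color\(([0-9A-Fa-f]{6})\)")

# Hypothetical mapping from hex values to the ink names used for @medium.
INK_NAMES = {"FF0000": "red"}

SAMPLE = """<p xmlns="http://www.tei-c.org/ns/1.0">
  <hi rend="color(FF0000)">A</hi>bc <hi rend="color(FF0000)"></hi>
  <hi rend="color(ffff00)">Y</hi> <hi rend="bold">z</hi>
</p>"""


def coloured_segments(xml_string):
    """Return (ink, text) for each non-empty <hi> whose @rend is color(RRGGBB)."""
    segments = []
    for hi in ET.fromstring(xml_string).iter(TEI + "hi"):
        match = COLOUR.fullmatch(hi.get("rend", ""))
        text = "".join(hi.itertext())
        if match and text:
            hex_value = match.group(1).upper()
            segments.append((INK_NAMES.get(hex_value, hex_value), text))
    return segments


print(coloured_segments(SAMPLE))  # [('red', 'A'), ('FFFF00', 'Y')]
```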

Abbreviations (<choice>)

Georgian medieval manuscripts contain many abbreviated forms: suspensions, contractions, truncations, brevigraphs, etc. These forms, marked with titlo diacritics, can be left unexpanded in a diplomatic edition. Adapting the text to Modern Georgian, however, and comparing it with already published fragments of manuscripts both require expansion, i.e. the omitted characters must be restored. TEI P5 generally suggests the <choice>, <am> and <ex> elements for this. In our project we have used the <am> element for the abbreviation marks present in the manuscript and the <ex> element for the letters added by the editor, e.g.

For Old Georgian text:


For its translation into Modern Georgian:

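A single encoding yields both readings: keeping <am> gives the diplomatic form with its titlo, while keeping <ex> gives the expanded form. The following is a minimal sketch using the abbreviated word უ<choice><am>&#x0360;</am><ex>ფალს</ex></choice>ა from the paragraph example in this guide. The <seg> wrapper and the function name are our own.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# The abbreviated word from the <p> example in this guide, wrapped in a <seg>.
SAMPLE = ('<seg xmlns="http://www.tei-c.org/ns/1.0">'
          'უ<choice><am>&#x0360;</am><ex>ფალს</ex></choice>ა</seg>')


def reading(element, mode):
    """Flatten text; 'diplomatic' keeps <am> and drops <ex>, 'expanded' the reverse."""
    dropped = TEI + ("ex" if mode == "diplomatic" else "am")
    parts = [element.text or ""]
    for child in element:
        if child.tag != dropped:
            parts.append(reading(child, mode))
        parts.append(child.tail or "")  # the tail belongs to the parent's text
    return "".join(parts)


seg = ET.fromstring(SAMPLE)
print(reading(seg, "diplomatic"))  # უ, then the titlo U+0360, then ა
print(reading(seg, "expanded"))  # უფალსა
```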

Special characters

All our .xml documents are UTF-8 encoded, and most Asomtavruli, Nuskhuri and Mkhedruli characters are entered directly as Unicode characters. However, some characters, especially diacritics, are typed as numeric character references placed after the letters they modify, as follows:

͠ - combining double tilde, &#x0360;

͛ - combining zigzag above, &#x035B;

̒ - combining turned comma above, &#x0312;

̄ - combining macron, &#x0304;

̇ - combining dot above, &#x0307;

͂ - combining greek perispomeni, &#x0342;

̂ - combining circumflex accent, &#x0302;

The number of unidentifiable glyphs, so-called gaiji, is small. Where they occur in the text, they are marked with the <g/> element.
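The references listed above can be decoded and checked against the Unicode character names with the standard library alone. The following is a minimal sketch; the describe helper is our own. The final line illustrates that Unicode normalization (NFC) leaves a letter-plus-diacritic pair unchanged, since Unicode has no precomposed Georgian letters with these marks.

```python
import html
import unicodedata

# Numeric character references listed in this guide.
REFERENCES = ["&#x0360;", "&#x035B;", "&#x0312;", "&#x0304;",
              "&#x0307;", "&#x0342;", "&#x0302;"]


def describe(reference):
    """Decode a numeric character reference and return (code point, Unicode name)."""
    char = html.unescape(reference)
    return f"U+{ord(char):04X}", unicodedata.name(char)


for ref in REFERENCES:
    print(ref, *describe(ref))

# The diacritic follows its base letter; NFC keeps the pair as it is.
print(unicodedata.normalize("NFC", "\u2d02\u0360") == "\u2d02\u0360")  # True
```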

Date (<date>) and numerical values

Dates are placed in a <date> element with a @when attribute that supplies a normalized form of the date (year-month-day), e.g.

<date when="2019-12-15">15 December 2019</date>


<date when="2019-12">December 2019</date>


<date when="2019">2019</date>


<date when="--12-15">15 December</date>


<date when="--12">December</date>
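The five forms of @when shown above are year-month-day, year-month, year, --month-day and --month. They correspond to W3C XML Schema date types that TEI accepts. TEI allows further forms, which this sketch deliberately ignores. The following is a minimal validator; classify_when and its labels are hypothetical.

```python
import re
from datetime import date

# The five @when shapes used in this guide (a subset of what TEI accepts).
PATTERNS = {
    "date": re.compile(r"(\d{4})-(\d{2})-(\d{2})"),
    "year-month": re.compile(r"(\d{4})-(\d{2})"),
    "year": re.compile(r"(\d{4})"),
    "month-day": re.compile(r"--(\d{2})-(\d{2})"),
    "month": re.compile(r"--(\d{2})"),
}


def classify_when(value):
    """Return the kind of @when value, or None if it is not a valid one of the five."""
    for kind, pattern in PATTERNS.items():
        match = pattern.fullmatch(value)
        if not match:
            continue
        parts = [int(p) for p in match.groups()]
        try:
            if kind == "date":
                date(*parts)
            elif kind == "year-month":
                date(parts[0], parts[1], 1)
            elif kind == "month-day":
                date(2000, parts[0], parts[1])  # a leap year, so --02-29 passes
            elif kind == "month":
                date(2000, parts[0], 1)
        except ValueError:  # e.g. month 13 or 31 April
            return None
        return kind
    return None


for value in ["2019-12-15", "2019-12", "2019", "--12-15", "--12", "2019-13-01"]:
    print(value, classify_when(value))
```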

In most Georgian medieval manuscripts, numerical values are written as letters with titlos. Such values are placed in a <num> element with a @value attribute, e.g.

<num value="50">&#x0360;</num>
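Georgian alphabetic numerals are additive: the value of a letter group is the sum of its letter values. For example, ჩ, ყ and ვ give 1000 + 800 + 6 = 1806. The sketch below computes @value from the letters inside <num>, ignoring titlos and dots, and folds Asomtavruli and Nuskhuri onto Mkhedruli by code-point offset. The value table is the traditional one as we understand it. It leaves out un (უ), which is commonly treated as having no numeric value. Please verify the table against a reference before using it in production.

```python
import unicodedata

# Mkhedruli letters in numeral order: 1-9, 10-90, 100-900, 1000-9000, then 10000.
# un (U+10E3) is left out because it is commonly treated as having no value.
ORDER = "აბგდევზჱთიკლმნჲოპჟრსტჳფქღყშჩცძწჭხჴჯჰჵ"
STEPS = [digit * 10 ** power for power in range(4) for digit in range(1, 10)] + [10000]
assert len(ORDER) == len(STEPS) == 37
VALUES = dict(zip(ORDER, STEPS))


def to_mkhedruli(ch):
    """Fold an Asomtavruli or Nuskhuri letter onto its Mkhedruli counterpart."""
    cp = ord(ch)
    if 0x10A0 <= cp <= 0x10C5:  # Asomtavruli
        return chr(cp + 0x30)
    if 0x2D00 <= cp <= 0x2D25:  # Nuskhuri
        return chr(cp - 0x2D00 + 0x10D0)
    return ch


def numeral_value(text):
    """Sum the letter values, ignoring titlos (combining marks), dots and spaces."""
    total = 0
    for ch in text:
        if unicodedata.combining(ch) or ch in ". ":
            continue
        total += VALUES[to_mkhedruli(ch)]  # KeyError: not a numeral letter
    return total


print(numeral_value("\u2d0c\u0360"))  # 50 (Nuskhuri nar with a titlo)
print(numeral_value("ჩ.ყ.ვ."))  # 1806
```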

Named entities: persons (<persName>) and places (<placeName>)

Every time the name of a place or a person appears in a text, it is placed in a <placeName> or <persName> element. The @key attribute links the name to its index entry by pointing to the corresponding @xml:id. The @ref attribute links it to Wikidata by giving the corresponding Wikidata identifier (Q-number), e.g.

<placeName key="place001" ref="Q23792">პალესტინისათა</placeName>

<persName key="person016" ref="Q44258"><choice><am>&#x0360;</am><ex></ex></choice>სილისი</persName>
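Both links can be checked automatically. Every @key should match an @xml:id in the index lists, and every @ref can be turned into a Wikidata URL by appending the Q-number to https://www.wikidata.org/wiki/. The following is a minimal sketch with an abridged, hypothetical file in which person016 is deliberately left out of the index. The function name is our own.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
WIKIDATA = "https://www.wikidata.org/wiki/"

# Hypothetical, abridged file: person016 is deliberately missing from the index.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <listPlace><place xml:id="place001"/></listPlace>
  <p>
    <placeName key="place001" ref="Q23792">პალესტინისათა</placeName>
    <persName key="person016" ref="Q44258">სილისი</persName>
  </p>
</TEI>"""


def link_names(root):
    """Return (@key, key found in the index?, Wikidata URL) for each linked name."""
    ids = {el.get(XML_ID) for el in root.iter() if el.get(XML_ID)}
    links = []
    for tag in ("persName", "placeName"):
        for name in root.iter(TEI + tag):
            key, ref = name.get("key"), name.get("ref")
            if key is None:
                continue  # e.g. names inside the index lists themselves
            links.append((key, key in ids, WIKIDATA + ref if ref else None))
    return links


for link in link_names(ET.fromstring(SAMPLE)):
    print(link)
```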

Critical apparatus: apparatus (<app>)

The critical apparatus depends on the existence of published and/or unpublished manuscript fragments. It also depends on their description in the <listWit> element of the header, which lists all the witnesses cited in the apparatus. To record textual variants, we have used the <app> element, which contains a lemma and one or more readings of the relevant passage by different scribes or publishers. Each apparatus entry includes a <lem> element containing the base text and <rdg> elements, each containing a single reading within the textual variation. The <rdg> element carries a @wit attribute that links the reading to the @xml:id of the corresponding witness in the list of witnesses, e.g.



<app>
  <lem>…</lem>
  <rdg wit="#W1">თჳთოეუ</rdg>
</app>


In our project we followed the rules of a negative critical apparatus, using the parallel segmentation method. In a negative apparatus, every witness not listed in a @wit attribute is taken to have the text of the <lem> element. This type of apparatus is quicker to encode. The parallel segmentation method also lets us place the apparatus entries inline, rather than outside the text or in a separate .xml file.
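Under a negative apparatus, the full text of any witness can be reconstructed from the inline <app> entries. The sketch below does this with parallel segmentation: a witness named in some <rdg> @wit takes that reading, and every other witness takes the <lem>. The witnesses and readings are invented for illustration, and text_for is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Invented readings: only W1 differs from the base text, so W2 follows <lem>.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <listWit>
    <witness xml:id="W1"/>
    <witness xml:id="W2"/>
  </listWit>
  <p>alpha <app><lem>beta</lem><rdg wit="#W1">gamma</rdg></app> delta</p>
</TEI>"""


def text_for(element, witness=None):
    """Text as read by `witness` (an @xml:id); None returns the base text."""
    parts = [element.text or ""]
    for child in element:
        if child.tag == TEI + "app":
            chosen = child.find(TEI + "lem")  # negative apparatus: default to lemma
            for rdg in child.findall(TEI + "rdg"):
                if witness and "#" + witness in rdg.get("wit", "").split():
                    chosen = rdg
            if chosen is not None:
                parts.append(text_for(chosen, witness))
        else:
            parts.append(text_for(child, witness))
        parts.append(child.tail or "")
    return "".join(parts)


root = ET.fromstring(SAMPLE)
paragraph = root.find(TEI + "p")
print("base:", text_for(paragraph))  # base: alpha beta delta
for wit in root.iter(TEI + "witness"):
    print(wit.get(XML_ID) + ":", text_for(paragraph, wit.get(XML_ID)))
# W1: alpha gamma delta
# W2: alpha beta delta
```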


For the romanization table and an online converter between Georgian script and ALA-LC romanization, see the Converter for Georgian and ALA-LC Romanization.