Attention: These guidelines are for internal use only.
Version: Created 1999. Updated December 2004 by Natasha Smith and Elizabeth Wright. Special thanks to Marisa Ramírez.
Scope: These guidelines are written for graduate Research Assistants working on the digital initiative Documenting the American South. DocSouth Research Assistants should use these guidelines to work with texts encoded in SGML and prepared by our outsourcing company, Apex Data Services. Research Assistants will receive in-house training before using these guidelines. Please share your comments with Natasha Smith, Digitization Librarian.
We use Standard Generalized Markup Language (SGML) to encode all original materials that we publish on the web. We follow the rules and guidelines for encoding in SGML provided by the Text Encoding Initiative (TEI). For more background, please see "A Gentle Introduction to SGML" by C. M. Sperberg-McQueen and Lou Burnard, available at http://www.tei-c.org/Papers/gentleguide.pdf.
We use SGML to divide the text into its constituent parts. SGML-encoded texts have a hierarchical structure that is based on an analysis of the text's content. The formatting of an original text should not drive the SGML encoding, but rather the meaning and structure of a text should determine the encoding.
SGML, like many markup languages, uses Elements and Attributes. An element, also referred to as a tag, surrounds a block of text and describes the function of that text. An attribute further defines an element. These guidelines explain which elements to use for common textual content and explain which attributes must be assigned to certain elements.
We use SoftQuad Author/Editor 3.5 to edit TEI SGML files. Open Author/Editor by going to programs in the start menu. Use File>Open to select a file to work on. Check the bottom right corner of the program window to make sure it says "Rules Checking: On." If the rules are not on, go to "Special" and choose "Turn Rules Checking On."
Also, you should see the SGML tags surrounding the text in the window. If the tags do not show up, go to "View" and choose "Show Tags."
There are lots of keyboard shortcuts and ways to make your editing job more efficient in Author/Editor.
Our outsourcing company encodes all texts in SGML according to DocSouth specifications. When we receive the marked-up file, we check it carefully to verify it has been encoded correctly. The bulk of the work has already been done, but often the file needs further markup and some corrections.
Make sure you have the original book or a full copy of the original in front of you while you are encoding. The first step is to look through the book and analyze its structure. For encoding purposes, we interpret the book as a hierarchical structure. A book often has three large sections: Front, Body, and Back. A document must include a Body, but the Front and Back are not always present. Within these large sections, we nest smaller sections, like chapters and paragraphs.
The Front section includes all preliminary material before the real content begins. The Front may include:
Look carefully at the book and make sure you can tell where the Front section begins and ends.
Now, examine the end of the book. The Back section may include an index, epilogue, appendix, afterword, or other added material that appears after the main portion of the text.
The main content of the book makes up the Body section. In a novel, this would be all the chapters of the novel. In a report, this would include the main articles, letters, tables, and data that make up the report.
After you have analyzed the original book, check the encoded text. The <front> tag should surround all the material that belongs in the Front section. If you disagree with the way the text has been divided, add or subtract encoded text to the Front. All encoded text should be in the Front, Body, or Back sections. If you move something out of the Front section, you have to move it into the Body section. There are several ways to move text in and out of larger sections. For assistance see Lisa or Natasha.
Go to the end of the file and check to make sure the Back section has been surrounded by the <back> tag correctly. The Body tag should surround all the rest of the text between the Front and Back.
The text within the Front, Body, and Back sections is divided into Divisions. Everything except the <titlePage> section should be surrounded by a <div1> tag. <div1> is then divided into smaller sections starting with <div2>. <div2> sections may include <div3>, <div3> sections may include <div4>, and so on, as needed. Smaller divisions are nested within larger divisions. Division tags should reflect the structure of the original document. Does the document have illustrations, a table of contents, list of illustrations, or index? Is the text composed of chapters, letters, poems, diary entries, or short stories? Are chapters divided into subchapters? Does a poem or letter appear within a chapter? Are poems or short stories grouped according to theme or author?
Division tags should be assigned to correspond with the natural divisions of the text. In the front portion of the text, the title page and title page verso are surrounded by the <titlePage> tag. All other sections, such as the table of contents, list of illustrations, dedication, introduction, prologue, etc., should each be surrounded by a <div1> tag.
Please note that the front matter should also contain separate divisions (<div1>) for each image that will be included in the digitized version of the text. Examples of these images would include: cover, spine, frontispiece, title page, and title page verso. For more information see the section about figures.
Remember that the tag for the last division within the front section should be placed directly before the closing </front> tag. In this way, tags are nested within each other. For example:
Example: Common Front Divisions
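The nesting described above can be sketched as follows. This is a minimal illustration, not a copy of a real file: the title, the text, and the "type=" values are assumptions, and the front matter image divisions that would normally come first are left out to keep the sketch short.

```sgml
<front>
<!-- the title page is the one section that is not wrapped in a div1 -->
<titlePage>
<docTitle><titlePart type="main">A Hypothetical Title</titlePart></docTitle>
</titlePage>
<div1 type="dedication">
<p>To my mother.</p>
</div1>
<!-- the last division closes directly before the closing front tag -->
<div1 type="contents">
<head>CONTENTS</head>
<list>
<item>Chapter I</item>
<item>Chapter II</item>
</list>
</div1>
</front>
```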
Divisions within the Body of the text are often more complicated than divisions within the Front and Back. In deciding how to assign divisions to the text body, examine the first page of the text's main body (look for the opening <body> tag). Does the document's title appear on this page in addition to chapter, short story, section, or poem titles/designations? If so, the entire section within the <body> tag needs to be surrounded by a <div1> tag and each chapter, poem, or other subdivision should be surrounded by <div2>. If not, each chapter, short story, poem, or main section will be surrounded by the <div1> tag, unless it is more logical to divide those sections in a different manner. For example, if a collection of poems features selections by several authors that are grouped according to author, each author's section of poems should be surrounded by <div1> and each individual poem by <div2>.
Example: Document title on first page of the <body>
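A minimal sketch of this case, assuming a book whose title is repeated on the first page of the body. The title, the chapter text, and the "type=" values (including "book") are hypothetical.

```sgml
<body>
<!-- one div1 wraps the whole body because the document title appears here -->
<div1 type="book">
<head>A HYPOTHETICAL TITLE</head>
<div2 type="chapter">
<head>CHAPTER I.</head>
<p>Text of the first chapter.</p>
</div2>
<div2 type="chapter">
<head>CHAPTER II.</head>
<p>Text of the second chapter.</p>
</div2>
</div1>
</body>
```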
Example: Document with no title on first page of the <body>.
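A minimal sketch of this case, assuming the first page of the body shows only a chapter heading. The headings, text, and "type=" values are hypothetical.

```sgml
<body>
<!-- no document title on the first page, so each chapter is a div1 -->
<div1 type="chapter">
<head>CHAPTER I.</head>
<p>Text of the first chapter.</p>
</div1>
<div1 type="chapter">
<head>CHAPTER II.</head>
<p>Text of the second chapter.</p>
</div1>
</body>
```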
Example: Document with poems by one author and no title on first page.
Example: Document with poems by several authors, grouped according to author, and title on first page.
Example: Document with poems by several authors not grouped by author, no title on first page.
Note: If two or more poems by one author appear in succession at some point in the above example (even though this is not the general pattern in the document itself), it would be permissible to surround that section with <div1> and each of the poems in that section with <div2> (or <div2> and <div3> if a title appears on the first page).
Example: Diary entry in a text divided into chapters, no title on first page of <body>.
Example: Diary entries in a text not divided into chapters, with no title on first page of <body>.
Please Note: In many cases, a letter, poem, or other item may appear within a chapter, diary entry, or other division. If the item is a poem or quotation that appears before the beginning of the chapter text, it is tagged as an epigraph. However, if a letter, poem, quotation, etc., appears in the middle of a document's chapter, it is surrounded with the <q> tag. See the Quotation section for further information.
Please Note: See also the section on Figures for a description on how to encode illustrations and other graphic materials that appear within the text.
Divisions for the back section (if one is present) follow much the same pattern as those for the front section. Each index, epilogue, or other section should be surrounded by <div1>.
Example: A typical Back section.
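A minimal sketch, assuming a book that ends with an appendix and an index. The content and the "type=" values are hypothetical.

```sgml
<back>
<div1 type="appendix">
<head>APPENDIX.</head>
<p>Text of the appendix.</p>
</div1>
<div1 type="index">
<head>INDEX.</head>
<list>
<item>Cotton, 12, 48</item>
<item>Railroads, 31</item>
</list>
</div1>
</back>
```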
Each division must be assigned an appropriate "type=" attribute. Our outsourcing company usually assigns all "type=" attributes, but you should double-check the attributes for the divisions as you review the file. Sometimes you will create new divisions and you must remember to assign the "type=" attribute.
Begin by placing the cursor after the <div> tag. Hit the F6 key or select Markup from the Author/Editor toolbar and then Edit Attributes from the menu. On the screen that appears type in a description of the division contents beside the "type=" line. Typical descriptions include:
When the division contains an image, the description should be specific to that image. For example, the division created for the image of the cover should be called "cover image". The division created for the frontispiece should be called "frontispiece image", etc. The description should correspond to whatever materials are contained within the <div> tag. You will assign these descriptions for all division levels (e.g., <div1>, <div2>, <div3>, etc.). After typing in the description, click on the Apply button, or hit enter.
According to DocSouth convention, the page break tag <pb> precedes the text of the relevant page, independently of where this page number appears in the original text. Our outsourcing company inserts all page breaks and assigns <pb> attributes. Check several page breaks to ensure that the page numbers have been assigned correctly.
The <pb> tag should be assigned attributes for "id=" and "n=". To view <pb> attributes, place the cursor within the tag and press F6. The correct "id=" should be the letter "p" plus the page number. The "n=" should be just the page number. For the title page verso, the "id=" should be "pverso" and the "n=" should be "verso".
Example: Attributes for a <pb> tag at the top of Page 34.
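Applying the rule above, the tag would look roughly like this. It is written in SGML source form, on the assumption that <pb> is declared as an empty element with no closing tag.

```sgml
<pb id="p34" n="34">
```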
Example: For page iii in the introduction to a book.
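Following the same rule for a page numbered in roman numerals, and assuming the numeral is carried over exactly as printed, a sketch would be:

```sgml
<pb id="piii" n="iii">
```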
Example: For the title page verso of a book.
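Using the special values given above for the title page verso, a sketch would be:

```sgml
<pb id="pverso" n="verso">
```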
Please Note: In cases where a division begins at the top of a page, it is important that the <pb> tag is placed within the <div> tag, not before.
Example: A <pb> tag with attributes where a division begins on that page.
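A minimal sketch of the placement, assuming a chapter that begins at the top of page 34. The heading, text, and "type=" value are hypothetical.

```sgml
<!-- the page break sits inside the division, not before it -->
<div1 type="chapter">
<pb id="p34" n="34">
<head>CHAPTER III.</head>
<p>Text of the chapter.</p>
</div1>
```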
One of the most common sections you will encode is the transcription of a title page and title page verso. The title page transcription should appear in the file after all the front matter images and any other transcribed pages that precede it. Title pages vary widely in style, appearance, and the amount of information they contain. Depending on the information available, make sure the appropriate tags surround the information. The following tags should be used for encoding title pages:
<titlePage> used to surround the title page and title page verso, if present. Do not put a <div> tag around the <titlePage> tag.
<docTitle> used to surround one or more <titlePart> tags.
<titlePart> used to surround the title of the document as it appears on the title page. A title page always includes a "main" <titlePart>, and sometimes also a "subtitle". Always define the "type=" attribute for <titlePart>. Use two <titlePart> tags in cases where the author clearly indicates that there is a main title and a lesser title. Look for punctuation or syntax that indicates a clear split between a main title and a description of the book. Semicolons, colons, and the word "or" are common indicators that a second title follows. Many DocSouth books have very long titles; please enclose the entire title as it appears on the title page within the <titlePart> tag. Consult with Lisa or Natasha as needed.
<byline> used to surround the word "by" or the phrase "written by" if this word or phrase appears on its own line in the title page. If the word "by" appears inline with the author's name, use <docAuthor>.
<docAuthor> used to surround just the name of the author or editor of the document (sometimes used with the <byline> tag).
<epigraph> used to surround any quote or verse (anonymous or attributed) that may appear on the title page or title page verso. (See also guidelines for encoding epigraphs.)
<docEdition> used to surround the document's edition statement found on the title page or title page verso.
<docImprint> used to surround the imprint statement (place and date of publication, name of the publisher).
<docDate> used to surround the date of the document.
<pubPlace> used to surround the location of the publisher.
<publisher> used to surround the name of the document's publisher.
Example: A brief title page.
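A minimal sketch using the tags listed above, assuming a title page on which the word "BY" sits on its own line. The title, author, and imprint details are invented for illustration.

```sgml
<titlePage>
<docTitle>
<titlePart type="main">A Hypothetical Narrative</titlePart>
</docTitle>
<!-- "BY" is on its own line in this imagined original, so it gets a byline -->
<byline>BY</byline>
<docAuthor>Jane Doe</docAuthor>
<docImprint>
<pubPlace>Raleigh:</pubPlace>
<publisher>Example Press,</publisher>
<docDate>1860.</docDate>
</docImprint>
</titlePage>
```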
Example: A fuller title page without a title page verso.
Example: A fuller title page with a title page verso.
The <lb> tag is used to place line breaks within a set of tags when the preservation of the appearance of the original is important. Line breaks are most often used on title pages to retain the appearance of the original. Do not use line breaks between tags.
Example: A title page with a long title using line breaks and two <titlePart> tags.
Please note: In the example above, William Wells Brown's name is included as part of the title because a part of the title follows his name (i.e., "With a Sketch of the Author's Life"). Consult with Lisa or Natasha as needed.
In addition to the main title, many documents feature chapter or section headings. These headings should be surrounded by the <head> tag. No attributes need to be assigned for headings.
Example: Front section with an introduction and table of contents both with headings.
Example: First page of a document with the document title and chapter designation present.
Example: First page of a document with only chapter title and designation present.
Example: Head tags in a book of poems where each poem is surrounded by <div1>.
Example: Diary entry in a document divided into chapters with no document title on first page.
The <p> tag is the most common element in most books. Use <p> to surround each paragraph. No attributes need to be assigned for paragraph tags.
Please Note: When rules checking is on, you cannot surround a paragraph with the <p> tag unless the section in which the paragraph is located (chapter, diary entry, etc.) is first surrounded by a division tag.
Example: A chapter of text.
Lines of poetry are not surrounded by the paragraph tag, but instead have their own group of tags. Verse (even if it is only one line) should be surrounded by the line group <lg> tag.
Each <lg> must be assigned an attribute for "type=". Valid attribute values are:
Once the section/stanza has been surrounded by the <lg> tag, each line of poetry needs to be surrounded by the line tag <l>, even if the line of poetry physically takes up more than one line in the original. The <l> surrounds a complete line of verse in the poem or song, and does not reflect the visual appearance of the verse on the page. No attributes need to be assigned for the <l> tag.
Example: A simple, one-stanza poem.
To accommodate many stanzas within a longer poem, you can nest line group elements for each stanza within the line group for the entire poem.
Example: A many-stanza poem.
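A minimal sketch of nested line groups. The "type=" values "poem" and "stanza" are placeholders only, since the list of valid values is not reproduced in this copy; substitute values from that list. The verse and the division type are invented.

```sgml
<div1 type="poem">
<head>A HYPOTHETICAL SONG</head>
<!-- outer line group for the whole poem; type value is a placeholder -->
<lg type="poem">
<!-- one inner line group per stanza; type value is a placeholder -->
<lg type="stanza">
<l>First line of the first stanza,</l>
<l>Second line of the first stanza.</l>
</lg>
<lg type="stanza">
<l>First line of the second stanza,</l>
<l>Second line of the second stanza.</l>
</lg>
</lg>
</div1>
```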
Please note: When one or a few lines of verse are included in a paragraph of prose, surround the verse with a <q> tag, then with an <lg> tag inside the <q>, and surround each line with an <l> tag. For more explanation, see the section on quotations.
Please Note: Sometimes the poem is very long or the structure of the document makes it impossible to surround the entire poem in a line group. In these cases, assign <lg> to each stanza only. See Lisa or Natasha with any questions.
Occasionally, you may encounter texts that use the conventions of a printed drama. Examples include an entire work that is a drama, a book of collected works that contains one or more dramas, or a book that excerpts a drama or presents some of its dialogue in dramatic style. In any of these cases, you will need to use a specific set of tags to mark up the textual elements of drama. Most dramas include acts and scenes, stage directions, speaker headings, and speeches. Speeches may be in verse or prose.
Use <div1>, etc., to divide a dramatic work into acts and scenes. Use the <head> tag to surround the text that announces the act and scene.
Use <stage> to surround any stage directions or descriptions. These elements usually appear in italics in the original text.
Use <sp> to surround an entire speech made by a character, including the speaker's name at the beginning. Use <speaker> to surround the name of the person who is speaking at the beginning of the speech. If the character's speech is in verse, surround each line of verse with <l>. If the speech is in prose, surround each paragraph with <p>.
Example: A drama in verse.
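A minimal sketch of these drama tags, assuming one act with one scene. The characters, lines, and "type=" values are invented for illustration.

```sgml
<div1 type="act">
<head>ACT I.</head>
<div2 type="scene">
<head>SCENE I.</head>
<stage>A room in the tavern. Enter the CAPTAIN.</stage>
<!-- each speech is one sp: the speaker's name, then the lines of verse -->
<sp>
<speaker>CAPTAIN.</speaker>
<l>The night is cold, the road is long,</l>
<l>And none will hear the sentry's song.</l>
</sp>
<sp>
<speaker>HOST.</speaker>
<l>Then rest you here until the day.</l>
</sp>
</div2>
</div1>
```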
Please Note: Oral history interviews also use the <sp> and <speaker> tags. See the section on Oral History Interviews below.
We provide images of all graphic materials that appear in the original. At the beginning of the text, this includes any front matter images, such as images of the cover, title page, title page verso, etc. Images from the original are scanned in-house and saved to the shared DocSouth drive (i:\ drive). In the SGML file, you will encode a reference to the image file.
Each illustration is surrounded by the <figure> tag and assigned attributes. Each <figure> tag must be within a <p> tag.
Front matter images. All front matter images should be encoded directly after the teiHeader and before the transcriptions of any of the preliminary pages such as the dedication or title page. Each front matter image should be placed within its own <div1> tag (assign the appropriate "type=" attribute). Within the <div1> tag, insert a <p> tag. Nested within the <p> tag, insert the <figure> tag. Within the <figure> tag, insert the <p> tag for the caption.
Captions. You must include a caption for every image. Captions should be encoded within a <p> tag. Nest the <p> tag within the <figure> tag. If the original does not have a caption, please provide one in square brackets. You should capitalize the first letter of every significant word when you provide your own caption. Use the following captions for front matter images:
For small drawings or figures that appear at the end of a chapter or section, supply the caption "vignette" in square brackets. For most other illustrations, supply the caption "Illustration" in square brackets.
Captions and Headings. All images in any book or other item should have a caption. If the original has no caption, please supply an appropriate caption in square brackets. The caption is encoded within a <p> tag inside the <figure> tag. Occasionally, an image will have a heading. A heading is a description or designation that comes above the image. Headings are also encoded within the <figure> tag using the <head> tag.
Example: A few front matter images.
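A minimal sketch of two front matter image divisions. The "id=" values and the division types follow the rules in these guidelines; the entity names (other than smithcv, borrowed from the filename example in this section) and the bracketed caption wording are assumptions.

```sgml
<div1 type="cover image">
<!-- figure sits inside a p, and the caption is a p inside the figure -->
<p><figure id="cover" entity="smithcv"><p>[Cover Image]</p></figure></p>
</div1>
<div1 type="frontispiece image">
<p><figure id="frontis" entity="smithfp"><p>[Frontispiece Image]</p></figure></p>
</div1>
```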
Example: A figure with a heading.
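A minimal sketch, assuming the third illustration in a book has a designation printed above it and a caption below it. The heading, caption, and entity name are hypothetical.

```sgml
<p><figure id="ill3" entity="smith3">
<!-- the heading comes from the text printed above the image -->
<head>PLATE III.</head>
<p>The Old Mill.</p>
</figure></p>
```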
Images between chapters. Sometimes the illustrations for a book will appear in between chapters or other section breaks. For these images, the <figure> will be in its own division.
Example: A figure that appears between chapters, where each chapter is a <div2>.
Images within chapters. If the image is placed in the middle of a paragraph of text, insert the <figure> tag within that paragraph. If the image appears between paragraphs, first insert a <p> tag, then insert the <figure> tag within the <p>.
Example: An image in the middle of a paragraph within a chapter.
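A minimal sketch, assuming the first illustration in a book interrupts a paragraph and has no printed caption. The sentence and the entity name are hypothetical.

```sgml
<p>The road ran down to the river,
<figure id="ill1" entity="smith1"><p>[Illustration]</p></figure>
where the ferry waited for us.</p>
```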
Example: A supplied caption for a drawing at the end of a book.
If the image has a title (something that appears above the image), it should be surrounded by a <head> tag.
Create entities for images. Figure tags must be assigned values for the id and entity attributes. First, create an entity for each image associated with the document. Go to Entities>Edit Text Entities. In the Edit Text Entity screen, select the line that says #DEFAULT. For name, type in the name of the file without the extension. Look at the images on the i:\ drive to learn their filenames. Next to content, type in the full name of the image file (e.g., smithcv.jpg). Click on the New button to add that image to the entity list. Please Note: Do not use upper case in assigning entities.
Assign attributes for each figure tag. With your cursor inside the <figure> tag, press F6 to edit the attributes for that tag. Each figure tag must be assigned a unique id and the correct entity must be selected from the drop down menu. All front images have a special "id=" according to the type of image. The following should be used for front images: cover, spine, frontis, title, verso. For the back cover, please assign 'id="back"'. If there is more than one title page or more than one frontispiece, see Lisa or Natasha.
All other images should be assigned an "id=" beginning with "ill" and followed by a number that reflects the order of the images in the book. The first illustration of a text should be assigned 'id="ill1"'. For the 24th illustration of a text, the id would be "ill24".
To assign a value to the entity attribute, select the entity that you've created for the illustration from the drop down list provided.
If a text includes a list, use the <list> tag to surround the entire list. Surround each item in the list with the <item> tag. If the list
Example: A basic list.
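A minimal sketch of the <list> and <item> tags; the items are invented.

```sgml
<list>
<item>Flour</item>
<item>Salt pork</item>
<item>Coffee</item>
</list>
```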
Encoding tables is very similar to the process of encoding lists. A basic table is arranged with a title on top, and with a series of columns running down, and rows running across. See this example:

Apex sometimes encodes material as a list when it should be a table. Be alert for this type of mistake, and consult Lisa or Natasha with questions.
The table is divided into <row>(s) and the rows are divided into <cell>(s). The <table> tag must be inside a <p> tag. Make sure each row has a consistent number of cells. Even if a cell is empty, you must include an empty <cell></cell> in that row. For more complex tables, see Lisa or Natasha.
There are two attributes that must be assigned for tables. In most cases, Apex has correctly assigned them. Do spot-checks to make sure the attributes are okay. If they are not, consult with Lisa or Natasha. The <table> tag should be assigned attributes for "rows=" and "cols=", where "rows=" is followed by the number of rows in the table and "cols=" is followed by the number of columns in the table.
The <row> and <cell> elements must be assigned the attribute for "role=". The default value for these elements is "data". For a row across the top of the table that serves to label the data that comes below, the attribute will be 'role="label"'. For all the cells in a left-hand column that serves to label the information appearing to the right, assign each cell the attribute 'role="label"'.
Example: A simple table.
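A minimal sketch of a three-row, three-column table with a label row, a label column, and one empty cell. The data are invented, marking the cells of the label row as "label" is an assumption, and this sketch does not show footnotes.

```sgml
<p>
<table rows="3" cols="3">
<head>Hypothetical School Returns</head>
<!-- top row labels the columns -->
<row role="label">
<cell role="label">County</cell>
<cell role="label">Schools</cell>
<cell role="label">Pupils</cell>
</row>
<row role="data">
<!-- left-hand cell labels the row -->
<cell role="label">Orange</cell>
<cell role="data">12</cell>
<cell role="data">340</cell>
</row>
<row role="data">
<cell role="label">Wake</cell>
<cell role="data">20</cell>
<!-- empty cell kept so that every row has three cells -->
<cell role="data"></cell>
</row>
</table>
</p>
```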
There are many exceptions to the basic table. As you can see in the above example, footnotes must be placed at the end of the table and not inline. It is very important to make sure that all of the appropriate cells line up. This means accounting for empty cells and counting across. For a table, every row must have the same number of cells or the information can become distorted.
If a table runs on for multiple pages, you must break the table at the end of each page and restart it on the next page. If the column header or table title is repeated on the next page in the original, repeat it in the next page in the file. If the table title is not repeated, please provide the title in square brackets at the top of the next page. See Lisa or Natasha with any questions. You will need to change the attributes for the <table> tag if you break up tables as they were encoded by our outsourcing company.
Many tables have information that is not simply in a grid. They may have subdivided column headers, larger blank areas or columns that cease to have information, totals for certain columns but not others, etc. Each of these cases must be evaluated individually. In some cases it will be necessary to forgo the encoded table and settle for an image, but do not give up too easily. Please ask questions.
Quotations that do not occur inline in the text and that are set off typographically in some way should be encoded within the <q> element.
Once a section of text has been surrounded by a division tag and either paragraph or line/line group tags, it may be necessary to assign additional tags to highlight special features within the surrounded text. Frequently the text of a document contains words or phrases that appear in italics. These italicized words and phrases need to be surrounded by the highlight tag, <hi>. For the <hi> tag you must assign the rend attribute as italics.
Example: Sentence with the word "she" in italics.
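A minimal sketch of the <hi> tag with the rend value named above; the sentence itself is invented.

```sgml
<p>I knew that <hi rend="italics">she</hi> would come.</p>
```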
When boldface type occurs within the document title or chapter titles, the <head> tag is sufficient. However, words occasionally appear in bold type within the text of paragraphs, tables of contents, lists and/or poetry. In these cases, surround the word(s) with the <hi> tag and assign "bold" as the "rend=" attribute.
Example: Sentence with words "cats" and "birds" in bold type.
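A minimal sketch of bold highlighting; the sentence itself is invented.

```sgml
<p>The <hi rend="bold">cats</hi> chased the <hi rend="bold">birds</hi>.</p>
```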
It is common to see a string of asterisks or periods in the middle of poems, chapters, and other formal sections of a text. These informal divisions are called milestones, and are encoded using the <milestone> tag. The <milestone> tag must be assigned attributes for "unit=" and "n=". Always assign the attribute unit="typography". For "n=", type in the way the milestone looks in the original, using the exact number of periods or stars with or without spaces according to the way it appears in the original. Like <pb> and <lb>, the milestone tag should be empty—no characters can appear between the opening and closing tags of the milestone element.
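A minimal sketch of a milestone between two paragraphs, assuming the original shows five asterisks separated by spaces. The tag is written in SGML source form as an empty element, and the surrounding text is invented.

```sgml
<p>So ended the first day of the march.</p>
<!-- n reproduces the asterisks exactly as they appear in the original -->
<milestone unit="typography" n="* * * * *">
<p>On the second day the rain began.</p>
```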
Like words in bold type and italics, foreign words and phrases need to be assigned special tags. You must use your own judgment about whether words or phrases should be encoded as foreign words. Be judicious: not every common Latin or French phrase needs to be marked up. The purpose of the <foreign> tag is to acknowledge that foreign words and phrases are used that may not be known to the general public. Different texts will require different uses of the foreign tag. Your goal is to alert the reader to the use of foreign languages without marking so many words that the tag loses its significance. If you begin working with a text that contains a lot of foreign words, consult with Lisa or Natasha for how to determine what to mark up and what to leave unmarked.
You must assign a Lang IDREF for each <foreign> tag. In the attributes dialog box, fill in a three-letter code based on which language is used. Consult the Appendix for a list of language codes that includes the ISO 639 three-letter code and the USMARC Code List for Languages (Library of Congress 2003), also available at http://www.loc.gov/marc/languages/langhome.html. Frequently used foreign language codes include:
Example: A sentence with the words "couteau de chasse"
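A minimal sketch, assuming the three-letter code "fre" for French (check it against the language code list in the Appendix before using it). The surrounding sentence is invented.

```sgml
<p>He drew his <foreign lang="fre">couteau de chasse</foreign> from its sheath.</p>
```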
There are two types of letters you can encounter when encoding: (1) letters that are quoted within a chapter or other division and (2) letters that are presented as their own divisions or section. If the letter is quoted within a chapter or other division, you will need to encode it as a quotation.
The only attribute that should be assigned for letters is the designation "letter" next to type for the <q> and <div> tags.
Example: A letter quoted within a chapter.
Please note that this is different from when you are dealing with a collection of letters, or a chapter that includes only letters. For these situations, you will not need to encode the letter as a quotation. Each letter must be its own division within the chapter or division.
Once a letter has been surrounded by a division (either as a div1 within a quotation or as a separate div within a chapter), you will use the following tags to encode the letter:
<opener> surrounds the opening elements of a letter including the date and/or place the letter was written and a salutation or greeting that appears at the beginning of the letter.
<closer> surrounds the closing elements of a letter including any date, location, salutation, or signature that appears at the close of a letter.
<dateline> surrounds the date, place, and/or time in which the letter was written (can appear at the beginning or end of a letter).
<date> surrounds the date of the letter. This element must be surrounded by the <dateline> tag.
<name> if the dateline includes the place from which the letter writer is writing, or the location of the addressee, surround the place name with the <name>. Assign the <name> tag 'type="place"'.
<salute> surrounds any salutation or greeting that opens or closes a letter and is not part of a paragraph within the letter.
<p> surrounds each paragraph (even if only a word or sentence long) that occurs within the letter.
<signed> surrounds the signature of the one who has written the letter.
Example: A simple letter included within a division.
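A minimal sketch of the letter tags listed above, assuming the letter is a <div2> inside a chapter. The names, date, place, and text are invented.

```sgml
<div2 type="letter">
<opener>
<!-- date must sit inside dateline; the place is a name of type "place" -->
<dateline><name type="place">Raleigh, N. C.</name>, <date>May 3, 1862.</date></dateline>
<salute>Dear Sir:</salute>
</opener>
<p>Your letter of the first reached me this morning.</p>
<closer>
<salute>Yours truly,</salute>
<signed>J. SMITH.</signed>
</closer>
</div2>
```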
Example: A letter in its own division in which the salutation appears as part of the first paragraph and a dateline appears in the closer.
Working with footnotes in electronic texts is different from working with footnotes in print. From an encoding perspective, there are two parts to each footnote: the reference and the note. The reference is the superscript number or symbol within the main text on the page that alerts the reader that there is a pertinent note. The note is the text at the bottom of the page, beginning with the number or symbol and then a citation or some further information. To facilitate reading online, we move the note to the point of reference within the text. For example, an asterisk after the second sentence on the page points the reader to a note at the bottom of the page. In the encoded text, the note would be moved so that it appears right after the second sentence.
All numbers or symbols within the text that are references to footnotes must be encoded with the <ref> tag. Assign the following attributes for the <ref> tag: id, rend, target idrefs, n.
The text of the footnote, including whatever number or symbol identifies it, must be surrounded by the <note> tag. Assign the following attributes for the <note> tag: id, rend, anchored, place, target idrefs, n.
Please note: When placing one or more <p>s inside of a <note> element, make sure that there are NO spaces between the <p> tags, as well as between <note> and <p>.
Example: This is correct.
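A minimal sketch of the correct spacing, with attribute values that assume this is the first note in the text. The note text is invented.

```sgml
<note id="n1" rend="sc" anchored="yes" place="foot" target="ref1" n="1"><p>First paragraph of the note.</p><p>Second paragraph of the note.</p></note>
```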
Example: This is NOT correct.
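A minimal sketch of the spacing to avoid, with the same assumed attribute values and invented text. Note the spaces after the opening <note> tag, between the two <p> elements, and before the closing </note> tag.

```sgml
<note id="n1" rend="sc" anchored="yes" place="foot" target="ref1" n="1"> <p>First paragraph of the note.</p> <p>Second paragraph of the note.</p> </note>
```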
As with figures, we will assign a unique "id=" for every <ref> and <note> in the text. The "id=" for each note should begin with the letter "n" and be followed by the note's ordinal number. The id for each reference will begin with the word "ref", followed by the ref's ordinal number. Thus, the first note in a book will have the 'id="n1"' and the 130th note will have the 'id="n130"'. The first reference in a book will have the 'id="ref1"' and the 130th reference will have the 'id="ref130"'.
ID. The note's "id=" is an assigned attribute and does not relate to whatever number or symbol appears as a label for the note. In this manner, if a footnote found on page 150 of a text is the 25th note in the entire text, it will be note id="n25" even if it is labeled in the text with a superscript 1. To ensure that a footnote is assigned the correct number, it is important to keep track of all footnotes as you go through a text. It is a good idea to keep a running total on a notepad as you encode to make sure that you assign the correct number and do not assign the same number to two notes.
REND. For rend, enter sc. In addition, caps and bold are also occasionally used. Consult Lisa or Natasha if you are not sure.
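TARGET id refs. This attribute creates a circular reference between the <ref> and the <note>. For each ref, the target should be the note that is associated with it, and for each note, the target should be its ref. Thus, for <ref id="ref23"> the target will be "n23", and for <note id="n23"> the target will be "ref23".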
N. Enter the same number you used for the unique id as the n attribute. If your attributes for id and target id refs are ref23 and n23, then you will enter 23 next to n.
ANCHORED (for <note> only). Make sure "yes" is selected.
PLACE (for <note> only). For footnotes, enter "foot".
Example: A <ref> and <note> pair with attributes assigned.
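A minimal sketch of such a pair, assuming the 23rd footnote in a text is labeled with an asterisk. The sentences and note text are placeholders, the rend value is the default named above, and the note is placed directly after its reference as described at the start of this section.

```sgml
<p>First sentence. Second sentence.<ref id="ref23" rend="sc" target="n23" n="23">*</ref><note id="n23" rend="sc" anchored="yes" place="foot" target="ref23" n="23"><p>* Text of the footnote.</p></note> Third sentence.</p>
```

The <ref> targets the note and the <note> targets the ref, which gives the circular reference that the final checks look for.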
One or more marginal notes should be placed BEFORE the relevant paragraph or section. Each note should be surrounded by a separate <note> tag.
The <note> attributes are different for marginal notes. Assign id, rend, and n the same as for footnotes. Next to anchored choose no. Next to place, enter the word margin.
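A minimal sketch of a marginal note, assuming it is the 24th note in the text; the note text and paragraph are placeholders. The note sits before the paragraph it belongs to, and no target is shown because no matching <ref> is described for marginal notes.

```sgml
<note id="n24" rend="sc" anchored="no" place="margin" n="24">Text of the marginal note.</note>
<p>The paragraph that the marginal note accompanies.</p>
```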
In general, an epigraph is a quotation (anonymous or attributed) that appears at the start of a section or chapter, or on a title page. Be careful not to confuse an epigraph with an argument. If you are unsure, consult with your colleagues, Lisa, or Natasha.
There are two different kinds of epigraphs—those that cite an author, and those that do not. For those with no author, the structure is fairly simple.
Example: A prose epigraph.
Example: A verse epigraph.
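A minimal sketch of both unattributed forms, with placeholder text. It assumes that prose is wrapped in <p> and that verse is wrapped in <lg> with one <l> per line; follow the verse section of these guidelines if it specifies different tagging.

```sgml
<!-- Prose epigraph -->
<epigraph>
<p>Text of the quotation.</p>
</epigraph>

<!-- Verse epigraph -->
<epigraph>
<lg>
<l>First line of the verse,</l>
<l>Second line of the verse.</l>
</lg>
</epigraph>
```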
For epigraphs that cite an author, the structure is more formal. The <epigraph> element will generally contain a <q> element for the quotation and a <bibl> element for the bibliographic reference.
Example: An attributed epigraph at the beginning of a chapter
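A minimal sketch of an attributed epigraph at the start of a chapter. The heading, quotation, and author are placeholders, and the 'type="chapter"' value on <div1> is an assumption used only for illustration.

```sgml
<div1 type="chapter">
<head>CHAPTER I.</head>
<epigraph>
<q>Text of the quotation.</q>
<bibl>Name of the Author.</bibl>
</epigraph>
<p>First paragraph of the chapter.</p>
</div1>
```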
An argument usually appears at the beginning of a chapter and summarizes what occurs in that chapter. When using the <argument> tag, you should also surround paragraphs with a <p> tag. This will allow you to assign tags such as <hi> and <foreign> within the <argument> tag. Attributes do not need to be assigned for epigraphs or arguments.
Example: A chapter with an argument
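A minimal sketch of a chapter that opens with an argument. The heading and text are placeholders, and the 'type="chapter"' value on <div1> is an assumption used only for illustration.

```sgml
<div1 type="chapter">
<head>CHAPTER II.</head>
<argument>
<p>Summary of the events of the chapter.</p>
</argument>
<p>First paragraph of the chapter.</p>
</div1>
```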
If the text contains mistakes such as incorrect spellings, missing or incorrect punctuation, pagination or other errors, DocSouth generally marks up such errors without correcting them. To mark up errors use the <sic> and <corr> tags.
The <sic> tag is used to surround errors found within the text (especially spelling errors). By surrounding the error, the encoder alerts readers to a mistake attributable to the author or publisher. In this way, the reader will be aware that the mistake is part of the original text and not an error made by the typist or encoder. To encode using the <sic> tag, highlight the error and insert the element <sic>.
Once the error has been surrounded by the <sic> tag, assign the attribute for "corr=". The attribute for "corr=" is the correct spelling of the word. For example, if you have surrounded the word "buisy" with the <sic> tag, the correct spelling "busy" will be entered next to "corr=" on the attributes screen. In this way, the error remains in the main text, but the correction is listed on the attributes screen.
Example: The <sic> tag.
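A minimal sketch using the misspelling discussed above; the surrounding sentence is a placeholder. The error stays in the text and the correction is carried in the corr attribute.

```sgml
<p>She was too <sic corr="busy">buisy</sic> to answer the letter.</p>
```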
Always run spellcheck in Author/Editor when you have completed reviewing the file. Spellcheck in A/E is helpful, but far from perfect. A/E is often too sensitive and highlights words that are actually spelled correctly. Consult online dictionaries, including the Oxford English Dictionary, available through the Library's home page by clicking "Articles & More." Correct spellings for many words have changed over time, and it is not necessary to mark up words that were correctly spelled in the author's time. In addition, if a word is consistently misspelled, it may not be necessary to use the <sic> tag.
The <corr> tag works like the <sic> tag only in reverse. With the <corr> tag, you locate an error, make the correction to the text and then surround it with the <corr> tag. The <corr> tag is preferable for missing elements like punctuation. For other mistakes, please use the <sic> tag.
You must assign the "sic=" attribute for the <corr> tag. For example, if you have a sentence that is missing an end punctuation mark, you can add the punctuation mark and surround it with the <corr> tag. In this instance, assign 'sic="[no punctuation]"'.
Example: The <corr> tag.
Original:
The carriage was running out of control with no way to stop it and avoid disaster. "What can we do" Sally shouted. Hal didn't know but he hoped they would be rescued by someone?
Correction:
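One possible encoding of the passage above, offered as a sketch. The missing question mark is supplied with <corr>, following the rule for missing punctuation. Treating the final question mark as an error and marking it with <sic> is one reading of the passage and an assumption here. The quotation marks are written as entities, as required elsewhere in these guidelines.

```sgml
<p>The carriage was running out of control with no way to stop it and avoid disaster. &ldquo;What can we do<corr sic="[no punctuation]">?</corr>&rdquo; Sally shouted. Hal didn't know but he hoped they would be rescued by someone<sic corr=".">?</sic></p>
```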
Tables of contents are usually included in the front section of a text and should be surrounded by the <front> tag. The entire table of contents should be surrounded by the <div1> tag with a 'type="contents"' as the attribute. The heading of the table of contents should then be surrounded by the <head> tag. No attributes need to be assigned for the <head> tag. The list of contents should be encoded as a <list>. Surround each entry in the table of contents with the <item> tag.
Each item in the list will include a page number or range of page numbers. Each page number in the table of contents is encoded as a reference. Surround each page number with the <ref> tag. The only attribute that is assigned for the <ref> tag in the table of contents is "target=". In the attributes screen, next to the word target, type in the id of the page that is being referenced. For example, if the second chapter of a book by Smith begins on page 28, the id would be p28. If instead of a single page number, a range of pages is listed, use the number of the first page. As a result, a book by Smith that lists chapter three on pages 43-56 would have an id of p43.
Example: Table of Contents.
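A minimal sketch of a table of contents, using the page ids from the paragraph above for the second and third chapters. The heading, chapter titles, and first entry are placeholders.

```sgml
<front>
<div1 type="contents">
<head>CONTENTS.</head>
<list>
<item>Chapter I. <ref target="p5">5</ref></item>
<item>Chapter II. <ref target="p28">28</ref></item>
<item>Chapter III. <ref target="p43">43-56</ref></item>
</list>
</div1>
</front>
```

The third entry lists a page range, so its target is the id of the first page in the range.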
Encoding a list of illustrations is almost identical to encoding a table of contents. For the attribute "type=", assign "list of illustrations". In the <ref> tag, assign the figure id for the "target=" attribute, e.g., ill1, ill2.
Example: List of Illustrations.
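A minimal sketch of a list of illustrations, with placeholder captions and page labels. Each <ref> targets a figure id rather than a page id.

```sgml
<div1 type="list of illustrations">
<head>ILLUSTRATIONS.</head>
<list>
<item>Portrait of the Author <ref target="ill1">Frontispiece</ref></item>
<item>View of the House <ref target="ill2">28</ref></item>
</list>
</div1>
```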
Indices are surrounded by <div1> tags with the attribute 'type="index"'. The page numbers will NOT need to be encoded, because of the full-text searching capacity provided by web browsers. If Apex has already encoded the references, spot-check to make sure they were encoded correctly, and leave the ref tags. For an example of an index, see /southlit/greenfact/menu.html.
Left and right double quotation marks, left and right single quotation marks, ampersands, em dashes, and other special characters must be encoded as entities. First, make sure they are in the list of text entities and that they are defined. To define entities entered by Apex, go to Entities>Edit Text Entities. If you want to define the entity ldquo, select that from the menu and below make sure the NAME is ldquo. In the third box, CONTENT, type in “. Click the CHANGE button. This will enable A/E to validate this document. If you click NEW instead of CHANGE, A/E will not let you overwrite the entity.
Left and right single quotation marks should also be encoded. Do NOT use entities for apostrophes or accent marks used in writing dialects.
Tip: To find entities use the Find>Find and Replace command. In the find box, type the entity with an ampersand and a semicolon. For example, if you want to find a left double quotation mark, type &ldquo;.
Special characters are frequently encountered during encoding, especially in foreign words and phrases. These special characters include letters with diacritics such as accent marks, tildes, circumflexes, and umlauts. These special characters must receive special attention during encoding. Apex usually inserts all these entities, but spot-check some of them to make sure they are correct. You will need to define the entities for all these characters.
At DocSouth, we use a similar template for all teiHeaders, but each project has its own slightly different template. To fill out the teiHeader, you will need to consult the Library's online catalog and the title page of the original book or document. Please fill out the teiHeader to the best of your ability, and ask your colleagues for assistance when you have questions.
The teiHeader is a collection of information at the beginning of the encoded text that tells about the electronic text and the original text it was created from, i.e. metadata. The teiHeader has several sections, and it takes practice to fill it out correctly. It is often best to review the teiHeader after you have reviewed all the other encoding for a document because by then you will be more familiar with the text.
The first section of the teiHeader is the <fileDesc>, which includes the <sourceDesc>. The first part of the <fileDesc> describes the electronic edition and the <sourceDesc> describes the original text. The remaining sections of the teiHeader describe the way the book was digitized, the editorial decisions that were made in the process of digitizing the book, the activities that were done in the digitization process, and the cataloging information, including languages.
Example: A teiHeader.
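A rough skeleton of the sections described above, offered only as an orientation sketch and not as the project template. Element names follow the TEI header; all content is placeholder text, and the order and required sub-elements should be taken from the template for your project.

```sgml
<teiHeader>
<fileDesc>
<!-- first part: describes the electronic edition -->
<titleStmt>
<title>Title of the electronic edition</title>
<author>Name of the Author</author>
</titleStmt>
<publicationStmt>
<p>Publisher and date of the electronic edition.</p>
</publicationStmt>
<!-- sourceDesc: describes the original text -->
<sourceDesc>
<bibl>Citation for the original book or document.</bibl>
</sourceDesc>
</fileDesc>
<encodingDesc>
<projectDesc><p>How the book was digitized.</p></projectDesc>
<editorialDecl><p>Editorial decisions made during digitization.</p></editorialDecl>
</encodingDesc>
<profileDesc>
<!-- cataloging information, including languages -->
<langUsage><language id="eng">English</language></langUsage>
</profileDesc>
<revisionDesc>
<!-- record of the activities carried out during digitization -->
<change><date>Date</date><respStmt><name>Name of the encoder</name><resp>encoder</resp></respStmt><item>Description of the work done.</item></change>
</revisionDesc>
</teiHeader>
```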
Sometimes a book will include more than one title. See, for example, Proceedings of the Bible Convention of the Confederate States of America, available at http://docsouth.unc.edu/imls/biblconv/menu.html. This pamphlet includes the "Proceedings..." as well as "The Word of God...", a sermon given at the convention. The sermon begins after page 24 of the "Proceedings..." and has its own title page as well as pagination. Although these items are bound together and are related, they have a stand-alone structure as well. We encode these separate items using the <group> tag. This way, within the <group> tag we can place multiple instances of the <text> tag. Both "Proceedings..." and "The Word of God..." are encoded as their own "Text."
To encode a book as a group of texts, surround the entire file from immediately after the teiHeader with a <text> tag, as is usual practice. Place all the front materials that relate to the entire book in a <front> tag immediately within the first <text>. Examples of front materials for entire books are: front images (such as the cover, frontispiece, etc.), a preface to the full edition, and similar items.
After the shared front matter, surround the rest of the volume, excluding any shared back material, with <group>. Within the <group> tag, surround each separate work with its own <text> tag. Inside the <text> tag you can assign <front> <body> and <back> sections to each separate work. This is especially helpful when there are different title pages for different works included in the book.
Example: How to encode a group of texts.
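A structural sketch of a volume containing two separate works, following the steps above. The comments stand in for real content, so the fragment shows nesting only and will not validate as written.

```sgml
<text>
<front>
<!-- front matter shared by the whole volume: cover, frontispiece, preface to the full edition -->
</front>
<group>
<text>
<front><!-- title page of the first work --></front>
<body><!-- the first work --></body>
<back><!-- back matter of the first work, if any --></back>
</text>
<text>
<front><!-- title page of the second work --></front>
<body><!-- the second work --></body>
</text>
</group>
<back>
<!-- back matter shared by the whole volume, if any -->
</back>
</text>
```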
Check to see that all left and right single and double quotation marks have been correctly labeled and that no quotation marks are missing (quotation marks are frequently omitted in the scanning process and may not have been discovered during previous proofreadings). Use the Find function to check that each left double quotation mark has a matching right double quotation mark. If not, consult with Lisa or Natasha about how to fix it.
Double check to see that all images have been assigned a figure tag; all footnotes have a circular reference; all items in a list are tagged as items; all items in tables of contents, lists of illustrations, and indexes have the appropriate reference tags and attributes assigned; and all poetry/verse is surrounded by the appropriate tags.
Once you have thoroughly checked the text, perform a spell check by going to Edit on the main toolbar and selecting Check Spelling. A thorough spell check is essential because the Author/Editor spell check often picks up mistakes that have previously been missed.
Occasionally you may work with a text that has a lot of typos in it that were introduced by Apex. Always spot-check for typos by proofreading a few different pages in their entirety. If you find persistent errors, please alert Lisa or Natasha. It may be necessary to proofread the document more carefully in these cases.
Search the text for <unclear> and [UNK]. Our outsourcing company uses this tag and phrase to mark things they could not transcribe. We try to remove all of these and replace them with the text transcription or with the <gap> tag when necessary.
Search the text for the character prime (`) (on the keyboard it is on the same key as the tilde (~)). This character is often mistakenly inserted by Apex. Most of the time you will replace the prime (`) with an apostrophe or a left single quotation mark.
Remove all empty head tags. Meanwhile, be on the lookout for any headings that have italics—Apex often missed the italics in headings.
Remove all instances of the <seg> tag.
In lists and other parts of the text, you may find dot leaders in the original. All strings of periods and hyphens in original works should be transcribed as five periods with a space in between each period: ". . . . .".
Once you have added the TeiHeader and done the final proofreading, you can validate the document to find errors you might have missed. To validate the document, go to Special>Validate Document (or hit the F9 key).
If the document does not validate, the program will take you to areas that need corrections. Examine these areas, make the appropriate corrections, and continue the validation process. If you do not know how to make a given correction, consult with your colleagues, Lisa, or Natasha. After you have successfully validated the document, save the file as an Author/Editor file. Append the phrase "-done" to the end of the filename, before the ".ae" extension.
Note: you may validate a section of your file by highlighting it and choosing Special>Validate Selection.
Rules checking is like a safety net: it keeps you from making any structural mistakes. However, when you begin editing a text in Author/Editor, you may need to turn rules checking off to perform certain edits. If Author/Editor gives you a warning that you will need to turn off rules checking, think through what you want to do and then turn off the rules. To turn off the rules, go to Special>Turn Rules Checking On/Off. Turn the rules on again as soon as you have fixed the problem. If you have trouble turning the rules back on, check with Natasha or Lisa.
When using Markup>Insert Element, which provides you with a long list of options to choose from, you can either scroll through the list to find the one you want or hit the key on the keyboard that corresponds to the first letter of the item you are looking for. For example, if you are on the Insert Element dialog box and you want to select the <pb> tag, hit the p key on the keyboard and the screen will take you to the selections that begin with the letter p. If you type pb it will take you to the first tag beginning with p followed by b. You can type in as many letters as you want to move quickly to the correct tag.
There are many other useful tricks to speed up your work in Author/Editor. When you're ready to accelerate, see Lisa.
These guidelines cover the majority of encoding practices used in the DocSouth digital initiative. For questions or further analysis, please make use of the following resources:
Currently, DocSouth is working with the Southern Oral History Program and the Manuscripts Department to digitize several oral history interviews. These interviews are encoded according to the following guidelines.