Introducing the TEI (Text Encoding Initiative) framework for minimal Digital Scholarly Editions

Authors

  • Saniya Irfan, Indian Institute of Technology Delhi, India.

Abstract

This study presents a framework for encoding Humanities and Social Sciences data according to the principles of the Text Encoding Initiative (TEI). TEI is not yet widely known in the Indian Digital Humanities community. This methods paper sets out a structured workflow for creating a Digital Scholarly Edition (DSE) that serves as an archive of significant academic work. The study walks through the process from data extraction and preprocessing to a detailed guide on encoding the data with TEI tags, highlighting the significance of markup and the role of TEI in digital archiving, particularly for Humanities data. Digitising cultural manuscripts is essential: it helps preserve the original documents, increases their accessibility, and reduces the need to handle rare manuscripts that are frequently consulted. Going forward, Indian Digital Humanities must adopt techniques that package material together with embedded metadata to support better preservation. A DSE is a critical representation of historical materials, carefully curated and presented digitally to make them more accessible and understandable to scholars and the general public. Using digital tools, DSEs can present multiple versions of a text, highlight the differences between them, and support comprehensive analyses that are impractical in print. This approach preserves the original content while extending its usefulness and relevance for modern scholarship. TEI thus provides a methodology for converting unstructured, plain digitised text into a DSE with encoded metadata, which improves information retrieval, computational analysis, and visual presentation.
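As a rough illustration of that last point (plain text in, encoded text with a metadata header out), the sketch below wraps blank-line-separated paragraphs of plain text in a minimal TEI P5 skeleton using Python's standard `xml.etree.ElementTree`. The element names (`TEI`, `teiHeader`, `fileDesc`, `titleStmt`, `publicationStmt`, `sourceDesc`, `text`, `body`, `p`), the `n` attribute, and the namespace `http://www.tei-c.org/ns/1.0` come from the TEI Guidelines. The function name `plain_text_to_tei`, the placeholder publication statement, and the assumption that a blank line marks a paragraph break are hypothetical choices made for this example; this is not necessarily the workflow the paper itself describes, and a real edition would need much richer markup plus validation against a TEI schema.

```python
import xml.etree.ElementTree as ET

# TEI P5 namespace; registering it with an empty prefix makes it the default
# namespace (xmlns="...") in the serialized output.
TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)


def q(tag: str) -> str:
    """Return the namespace-qualified ("Clark notation") name of a TEI element."""
    return f"{{{TEI_NS}}}{tag}"


def plain_text_to_tei(title: str, source_note: str, text: str) -> str:
    """Wrap plain text in a minimal TEI document (hypothetical helper).

    Assumption: paragraphs in `text` are separated by blank lines.
    """
    tei = ET.Element(q("TEI"))

    # teiHeader: the metadata block. A fileDesc containing titleStmt,
    # publicationStmt and sourceDesc is the minimum the TEI header requires.
    header = ET.SubElement(tei, q("teiHeader"))
    file_desc = ET.SubElement(header, q("fileDesc"))
    title_stmt = ET.SubElement(file_desc, q("titleStmt"))
    ET.SubElement(title_stmt, q("title")).text = title
    pub_stmt = ET.SubElement(file_desc, q("publicationStmt"))
    ET.SubElement(pub_stmt, q("p")).text = "Unpublished draft (placeholder)."
    source_desc = ET.SubElement(file_desc, q("sourceDesc"))
    ET.SubElement(source_desc, q("p")).text = source_note

    # text/body: the transcription itself, one <p> per paragraph,
    # numbered with the TEI @n attribute.
    body = ET.SubElement(ET.SubElement(tei, q("text")), q("body"))
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    for n, para in enumerate(paragraphs, start=1):
        p_el = ET.SubElement(body, q("p"), {"n": str(n)})
        # Collapse internal line breaks and runs of whitespace into single spaces.
        p_el.text = " ".join(para.split())

    return ET.tostring(tei, encoding="unicode")


if __name__ == "__main__":
    sample = "First paragraph of the transcription.\n\nSecond paragraph,\nwrapped over two lines."
    print(plain_text_to_tei("Sample edition", "Transcribed from a printed source.", sample))
```

Keeping the metadata in `teiHeader` and the transcription in `text/body` is what lets later tools query the two separately, for example retrieving every edition from a given source without parsing the body text.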

Keywords

Indian DH, Humanities Data, Text Encoding Initiative (TEI), Digital Scholarly Edition (DSE), Encoding

References

Arko, Robert A., Kathryn M. Ginger, Kim A. Kastens, and John Weatherley. 2006. “Using Annotations to Add Value to a Digital Library for Education.” D-Lib Magazine 12 (5). https://doi.org/10.1045/may2006-arko.

Bansode, Sadanand. 2008. “Creation of Digital Library of Manuscripts at Shivaji University, India.” Library Hi Tech News 25 (1): 13–15. https://doi.org/10.1108/07419050810877508.

Cummings, James. 2013. “The Text Encoding Initiative and the Study of Literature.” In A Companion to Digital Literary Studies, edited by Ray Siemens and Susan Schreibman, 451–76. John Wiley & Sons. https://doi.org/10.1002/9781405177504.ch25.

“CSS Introduction.” n.d. W3Schools. Accessed November 19, 2022. https://www.w3schools.com/css/css_intro.asp.

Faruqi, Khwaja Ahmad, trans. 1970. Dastanbūy: A Diary of the Indian Revolt of 1857 by Mirza Asadullah Khan Ghalib. Asia Publishing House. https://franpritchett.com/00ghalib/texts/txt_dastanbu_kafaruqi.pdf.

“HTML vs XHTML: Know the Difference [2022 Edition].” n.d. Simplilearn. Accessed November 19, 2022. https://www.simplilearn.com/tutorials/html-tutorial/html-vs-xhtml.

Liu, Alan. 2004. “Transcendental Data: Toward a Cultural History and Aesthetics of the New Encoded Discourse.” p. 37.

Parks, M. B. 1992. Pause and Effect: An Introduction to the History of Punctuation in the West. 1st ed. Routledge.

Sahle, Patrick. 2016. “What Is a Scholarly Digital Edition?” In Digital Scholarly Editing: Theories and Practices, edited by Matthew James Driscoll and Elena Pierazzo, 19–39. Open Book Publishers.

Singh, Anil. 2012. “Digital Preservation of Cultural Heritage Resources and Manuscripts: An Indian Government Initiative.” IFLA Journal 38 (4): 289–96. https://doi.org/10.1177/0340035212463139.

Roueché, Charlotte. 2012. “Why Do We Mark Up Texts?” In Collaborative Research in the Digital Humanities. Taylor and Francis Group.

Taj, Amreen, and Bhakti Gala. 2024. “Digitization Projects for Cultural Heritage Materials: A Study with Special Reference to Arabic, Persian, and Urdu Manuscripts.” In Advances in Library and Information Science, edited by K. R. Senthilkumar, 238–55. IGI Global. https://doi.org/10.4018/979-8-3693-2782-1.ch013.

“Tei_header.” n.d. Replit. Accessed November 19, 2022. https://replit.com/@snowka/teiheader.

“TEI: Text Encoding Initiative.” n.d. Accessed November 20, 2022. https://tei-c.org/.

“TEIgarage.” n.d. Accessed June 11, 2025. https://teigarage.tei-c.org.

“XML Editor.” n.d. Oxygen XML. Accessed November 19, 2022. https://www.oxygenxml.com/xml_editor.html.

Conference presentation slides for the forthcoming paper: https://doi.org/10.5281/zenodo.13997483

Published

15-05-2026

How to Cite

Irfan, S. (2026). Introducing the TEI (Text Encoding Initiative) framework for minimal Digital Scholarly Editions. Digital Humanities Intersections, 1(1). Retrieved from https://dhi.iiti.ac.in/index.php/dhjournal/article/view/20

Section

Methods Papers