Introducing the TEI (Text Encoding Initiative) framework for minimal Digital Scholarly Editions
Abstract
This study presents a framework for encoding Humanities and Social Sciences data according to the principles of the Text Encoding Initiative (TEI). TEI is not yet widely known within the Indian Digital Humanities community. This method paper sets out a structured workflow for creating a Digital Scholarly Edition (DSE) that serves as an archive of significant academic work. It walks through the process from data extraction to preprocessing, then gives a detailed guide to encoding the data with TEI tags. Along the way, it highlights the significance of markup and the role of TEI in digital archiving, particularly for Humanities data. Digitising cultural manuscripts is essential: it aids the preservation of original documents, increases accessibility, and reduces the need to handle rare manuscripts that are frequently consulted. Going forward, Indian Digital Humanities needs to adopt techniques that embed metadata within digitised material to support better preservation strategies. A Digital Scholarly Edition is a critical representation of historical materials, carefully curated and presented digitally so that scholars and the general public can access and understand them more easily. Using digital tools, DSEs can present multiple versions of a text, highlight the differences between them, and offer comprehensive analyses that are impractical in print. This approach preserves the original content while extending its usefulness and relevance to modern scholarship. TEI provides an XML-based method for converting unstructured, plain digitised text into a DSE with encoded metadata, which improves information retrieval, computational analysis, and visual presentation.
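The conversion from plain digitised text to a TEI-encoded document can be illustrated with a small sketch. It wraps plain text in a minimal TEI P5 skeleton. The skeleton consists of a `<teiHeader>` whose `<fileDesc>` contains `<titleStmt>`, `<publicationStmt>` and `<sourceDesc>`, followed by `<text><body>` with one `<p>` per paragraph. The helper name `to_minimal_tei`, the placeholder publication statement and the sample text are hypothetical. The sketch uses only Python's standard `xml.etree.ElementTree` and is not the workflow described in this paper.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"  # TEI P5 namespace
ET.register_namespace("", TEI_NS)  # serialise TEI elements without a prefix


def tei(tag: str) -> str:
    """Return a namespace-qualified TEI tag name."""
    return f"{{{TEI_NS}}}{tag}"


def to_minimal_tei(title: str, source: str, plain_text: str) -> str:
    """Wrap plain digitised text in a minimal TEI document (hypothetical helper)."""
    root = ET.Element(tei("TEI"))

    # Metadata: a minimal teiHeader holds a fileDesc with three statements.
    header = ET.SubElement(root, tei("teiHeader"))
    file_desc = ET.SubElement(header, tei("fileDesc"))
    title_stmt = ET.SubElement(file_desc, tei("titleStmt"))
    ET.SubElement(title_stmt, tei("title")).text = title
    pub_stmt = ET.SubElement(file_desc, tei("publicationStmt"))
    # Placeholder text (assumption): a real edition states its publisher here.
    ET.SubElement(pub_stmt, tei("p")).text = "Unpublished draft encoding."
    source_desc = ET.SubElement(file_desc, tei("sourceDesc"))
    ET.SubElement(source_desc, tei("p")).text = source

    # Content: one <p> per blank-line-separated paragraph of the plain text.
    body = ET.SubElement(ET.SubElement(root, tei("text")), tei("body"))
    for para in plain_text.split("\n\n"):
        para = " ".join(para.split())  # collapse line breaks and extra spaces
        if para:
            ET.SubElement(body, tei("p")).text = para

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = "First paragraph of a\ndigitised page.\n\nSecond paragraph."
    print(to_minimal_tei("Sample transcription",
                         "Transcribed from a printed source.", sample))
```

A real edition would go further. It would encode structure such as headings, line breaks and editorial interventions with the appropriate TEI elements rather than plain paragraphs, and it would validate the result against a TEI schema.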
License
Copyright (c) 2026 Saniya Irfan

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Readers may share this work with proper attribution, for non-commercial purposes only, without modification. For permissions beyond this license, contact editor.dhi@iiti.ac.in


