Manuscript Growth and Episodic Composition

Last week, at the XVIIth Congress of the International Association of Buddhist Studies in Vienna, I presented the paper “Manuscript Growth and Episodic Composition: Commentaries and Avadānas in Early South Asia” in which I argue that several of the Gāndhārī scrolls containing scholastic texts and narrative sketches show signs of having been compiled and added to over a period of time. Proceedings for the conference panel, containing an extended version of the paper, are in the early planning stages. In a side note of my paper, I also announced a recent discovery that I made when reading the Khotan Dharmapada with my students and that may be of wider interest: The colophon of this scroll does not (as per Brough’s edition) specify the monastery where it was written, but rather that the scribe was a certain Dharmaśrava. I briefly present the evidence in my article “Gandhāran Scrolls: Rediscovering an Ancient Manuscript Type,” and am working on a comprehensive discussion of my new reading of the Khotan Dharmapada colophon and its implications.

Manuscript Coverage

We have completed coverage of all published Gāndhārī manuscripts in our Dictionary, including the Khotan Dharmapada, the recent manuscript discoveries published in the Gandhāran Buddhist Texts series, the various published samples from the British Library, Senior, Bajaur and Split Collections, and some Central Asian manuscript fragments on palm leaf, silk and paper. The total number of Dictionary articles is now 3,912.

Completeness of the Dictionary

A new volume on South Asian Buddhist manuscripts (From Birch Bark to Digital Data: Recent Advances in Buddhist Manuscript Research) has just been published by the Austrian Academy of Sciences. It brings together papers from the 2009 conference ‘Indic Buddhist Manuscripts: The State of the Field’ at Stanford University and provides the most comprehensive and detailed overview of the topic available. In spite of its title (adapted from my 2012 paper “From Birch Bark to Digital Editions”), the volume does not address the digital representation of manuscripts, but our Catalog of Gāndhārī Texts and Dictionary of Gāndhārī are mentioned in the survey of Gāndhārī manuscript studies by Richard Salomon (pp. 1, 14–15). We are given more credit for completeness than currently due: The cited number of 125,000 (now 138,560) “entries” refers to individual word tokens in the source corpus for our Dictionary. The number of proper Dictionary articles (most of them recently written) is currently 1,941, though happily growing at a steady pace. We anticipate that we will complete lexicographic coverage of published Gāndhārī manuscripts by the end of this year.

Preliminaries to a Grammar of Gāndhārī

This Monday, at the 224th Annual Meeting of the American Oriental Society in Phoenix, I presented the paper “Preliminaries to a Grammar of Gāndhārī: Sound System and Morphological Categories.” After an overview of the main varieties of literary Gāndhārī, I proposed a chronology of sound changes between Old Indo‐Aryan and late Gāndhārī. I then discussed two central problems that I face in my preparation of a working grammar of Gāndhārī for the use of manuscript editors: the set of productive morphological categories in Gāndhārī and their relationship to fossilized forms, and the range and practical handling of linguistic variation in Gāndhārī sources. I concluded by giving an update on the tagged corpus of Gāndhārī texts prepared by Andrew Glass and myself for our Dictionary of Gāndhārī.

Collaborative Research Tools for Gāndhārī and Sanskrit Buddhist Manuscripts

At the international symposium “Humanities Studies in the Digital Age and the Role of Buddhist Studies” at the University of Tokyo last week, I presented the paper “Collaborative Research Tools for Gāndhārī and Sanskrit Buddhist Manuscripts.” After an overview of the field of Gāndhārī manuscript and epigraphic studies and the particular challenges of its source material, I describe the resources and software solutions that we provide on this website. In addition to our Dictionary, Bibliography and Catalog, over the last fifteen years we have assembled a comprehensive corpus of Gāndhārī source texts (2,441 manuscripts, inscriptions and coins) and linked our reference works and their source corpus by a custom software system. At this juncture, the standardization of tools and data formats has assumed special importance in order to ensure the long‐term usefulness of our content and an improved interchange with other projects engaged in the study of Pali, Sanskrit, Chinese and Tibetan Buddhist literature. I describe how we plan to address these challenges and introduce a new software development effort (supported by the Bavarian Academy of Sciences and Humanities, University of Washington, University of Lausanne and Prakaś Foundation) implementing our designs and enhancing the collaborative research tools on this website.


Over the last few weeks, we collected all word instances in the text collection that currently have Sanskrit parallels associated with them (3,404 items). We arranged these under 1,960 separate lemmata, and I associated each lemma with a standardized Gāndhārī spelling. The result is now available from the Dictionary → GD submenu, while the old Dictionary search interface (based on word forms rather than lemmata) continues to be available internally under Dictionary → Index. At this point I would like to solicit feedback from the users of this website on my proposed standardized spelling for lemmata. The spelling basically uses only those graphemes that are common to the various Gāndhārī orthographies; applies anusvāra consistently; uses the spelling sp for the reflex of OIA sibilant + m or v; uses g for the lenition product of velar stops, but y for the lenition product of palatal stops and original y; and does not mark optional palatalizations. My aim is to provide a standardized spelling that is as central to the overall Gāndhārī tradition as possible, while being as helpful as possible to the users of our Dictionary. Please have a look through the list of lemmata that is now online and let me know whether you think I have come close to these goals.

Sample TEI Encoding for Gāndhārī Manuscripts

In the spring of 2010, as I began making plans for using the Text Encoding Initiative’s Guidelines to encode and process the source texts on this website, I produced an annotated sample TEI document that illustrates the encoding of a manuscript using two verses from Richard Salomon’s edition of the British Library Anavataptagāthā (Salomon 2008: 204–207):

eva ṇadu tada thero
bhrad[a] budha[sa tadi](*ṇo
śpaya karmu viagha)ṣe
aṇodatu mahasare ☒

cadudiśami saghami
[kuḍa]gharo maya kridu ◦
vivaśisa praveaṇo
badhumadi[raya](*dha)[ṇ](*i)e ◦

The sample document illustrates how to indicate textual divisions in TEI XML markup; how to record states of preservation, certainty of reading and editorial interventions; how to associate lexical and grammatical information with the transcribed text; how to add editorial notes to variously‐sized elements of the transcribed text; and how to link the transcription with a manuscript image for display and paleographical applications. The sample document consists of two parts: the main file (available here) and a supplementary file containing grammatical feature definitions for reuse among several documents (available here).
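
As a rough illustration of what recording states of preservation and editorial interventions in TEI can look like, the following Python sketch converts the bracket sigla of the verses quoted above into TEI elements. It is not taken from the sample document. It assumes that square brackets mark unclear or partially preserved readings and that (*…) marks text lost in the manuscript and supplied by the editor, and it maps these to the TEI elements `<unclear>` and `<supplied reason="lost">`. The function name `sigla_to_tei` is hypothetical, and the sketch does not handle nested sigla.

```python
import re
from xml.sax.saxutils import escape


def sigla_to_tei(text: str) -> str:
    """Convert edition sigla into TEI elements (illustrative sketch only).

    Assumed conventions (not confirmed by the sample document):
      [x]   unclear or partially preserved reading -> <unclear>x</unclear>
      (*x)  lost text supplied by the editor       -> <supplied reason="lost">x</supplied>
    """
    # Escape XML special characters before any markup is added.
    out = escape(text)
    # re.S lets a supplied span run across a line break, as in "(*ṇo ... viagha)".
    out = re.sub(r"\(\*(.*?)\)", r'<supplied reason="lost">\1</supplied>', out, flags=re.S)
    out = re.sub(r"\[(.*?)\]", r"<unclear>\1</unclear>", out, flags=re.S)
    return out


# First verse as quoted above, without the final punctuation sign.
verse = (
    "eva ṇadu tada thero\n"
    "bhrad[a] budha[sa tadi](*ṇo\n"
    "śpaya karmu viagha)ṣe\n"
    "aṇodatu mahasare"
)
print(sigla_to_tei(verse))
```

Line breaks are left as plain newlines here because the supplied span in the first verse crosses a line boundary. Wrapping each line in its own element would then require splitting that span so that the markup stays well-formed.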

From Birch Bark to Digital Editions

At the Seventh Biennial International Conference on Buddhist Texts (“Critical Edition, Transliteration and Translation”) at the Somaiya Vidyavihar in Mumbai, I presented the paper “Buddhist Manuscripts from Gandhāra: From Birch Bark to Digital Editions.” After an overview of the Gandhāran manuscript tradition, I propose a new digital research environment for the study and publication of Gandhāran manuscripts (and other ancient documents) that is based on a close linking of visual evidence (images of manuscripts or inscriptions) and textual interpretation. A core feature of the proposed system is the storage of several alternative interpretations of a single manuscript or inscription in parallel, which I illustrated using the Śatruleka reliquary inscription (as interpreted in Bailey 1982, Salomon 1984 and Mukherjee 1984) and the Kopśakasa reliquary (as interpreted in Fussman 1984, Falk 2010 and Baums 2012). In addition to documenting the published record of research, such parallel storage of multiple interpretations also allows concurrent active users of the system to record their ideas with individual credit and without danger of overwriting each other’s interpretations. An extended version of my paper will appear in the proceedings of the conference.
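
The idea of storing several interpretations of one object in parallel can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the design of the proposed system: the class names `Interpretation` and `InterpretationStore`, the object identifier and the placeholder readings are all hypothetical. Each interpretation is appended as its own credited record, so adding a new one never overwrites an earlier one.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Interpretation:
    """One credited interpretation of a manuscript or inscription."""
    source: str   # who proposed it, e.g. a publication or a user name
    reading: str  # the interpreted text (placeholders in this sketch)


class InterpretationStore:
    """Keeps all interpretations of each object side by side."""

    def __init__(self) -> None:
        self._by_object = defaultdict(list)

    def add(self, object_id: str, source: str, reading: str) -> Interpretation:
        # Append only: existing interpretations are never modified or replaced.
        interpretation = Interpretation(source, reading)
        self._by_object[object_id].append(interpretation)
        return interpretation

    def for_object(self, object_id: str) -> list:
        # Return a copy so that callers cannot alter the stored record.
        return list(self._by_object.get(object_id, []))


store = InterpretationStore()
# Placeholder readings; the actual published readings are not reproduced here.
store.add("satruleka-reliquary", "Bailey 1982", "<reading A>")
store.add("satruleka-reliquary", "Salomon 1984", "<reading B>")
store.add("satruleka-reliquary", "Mukherjee 1984", "<reading C>")

for item in store.for_object("satruleka-reliquary"):
    print(item.source, item.reading)
```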

Digital PTSD

We are pleased to announce the availability of a digital version of the Pali Text Society’s Pali‐English Dictionary as part of the Dictionary of Gāndhārī system. This valuable addition now makes it possible for users of this website to compare both Sanskrit and Pali lexical information while consulting the Dictionary of Gāndhārī. The PTSD remains under copyright by the Pali Text Society. Special thanks are due to the Council of the PTS (and to Rupert Gethin in particular) for granting their kind permission to provide this digital version on this website, as well as to James Nye of the Digital South Asia Library for generously sharing the DSAL’s digitization of the PTSD on which our version builds.