Lemmatization

Over the last few weeks, we collected all word instances in the Gandhari.org text collection that currently have Sanskrit parallels associated with them (3,404 items). We arranged these under 1,960 separate lemmata, and I assigned each lemma a standardized Gāndhārī spelling. The result is now available from the Dictionary → GD submenu. The old Dictionary search interface, which is based on word forms rather than lemmata, remains available internally under Dictionary → Index.

At this point I would like to solicit feedback from the users of Gandhari.org on my proposed standardized spelling for lemmata. The spelling uses only those graphemes that are common to the various Gāndhārī orthographies. It applies anusvāra consistently and uses sp for the reflex of OIA sibilant + m or v. It uses g for the lenition product of velar stops, but y for the lenition product of palatal stops and of original y. Finally, it does not mark optional palatalizations.

My aim is to provide a standardized spelling that is as central as possible to the overall Gāndhārī tradition while also being as helpful as possible to the users of our Dictionary. Please look through the list of lemmata that is now online and let me know whether you think I come close to these goals.