Methods of text representation for minority/endangered languages.


The last ten to fifteen years have seen a significant theoretical effort resulting in cumulative methodological developments in the area of minority language documentation, including methods of interlinear glossing. The results have been comprehensively presented in papers by Christian Lehmann as well as in the so-called Leipzig Glossing Rules. However, some key steps remain: raising awareness of these methods among practicing descriptivists, obtaining these descriptivists' feedback, and standardizing the practice of text representation.

Russia, with its ethnically, culturally and linguistically diverse population, presents a unique situation. On the one hand, issues and activities concerned with documenting minority languages are of paramount importance. On the other hand, standardization of text representation (including standardization of interlinear glossing) is still rudimentary. There exist a number of isolated traditions which are more or less internally consistent but deviate in more or less important respects from each other and from the Leipzig rules (one example is the tradition of text representation of the Moscow State University Field School). Bringing these traditions to a unified standard requires a united, cooperative effort of field linguists nationwide. At the suggested Round Table, the existing standards will be reviewed, explained and critically assessed. This will be a critical first step in that direction, especially since it will bring together not only Russian linguists from at least three different regions and traditions (Siberia, Moscow, St. Petersburg) but also colleagues from the Max Planck Institute for Evolutionary Anthropology, where the Leipzig rules were created, as well as from other institutions outside Russia.

Importantly, one of the immediate objectives of the field activities is to create an electronic corpus of interlinearly glossed texts of the minority languages of the RF. A representational standard would not only ensure comparable representation of the texts in paper publications; it would also constitute progress toward a unified model of entering the texts into a computer, greatly simplifying their processing and search, including searching by grammatical glosses.
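The payoff of a unified standard for machine processing can be sketched with a toy example. Assuming, purely for illustration, that texts are stored as parallel morpheme and gloss lines with Leipzig-style delimiters (hyphens separating segmentable morphemes, periods separating cumulated categories within one gloss), searching by grammatical gloss becomes straightforward; the function names and data format below are hypothetical, not part of any proposed standard:

```python
def parse_igt(morpheme_line: str, gloss_line: str):
    """Split aligned morpheme and gloss lines into (morpheme, gloss) pairs.

    Words are whitespace-separated; morphemes within a word are separated
    by hyphens, as in the Leipzig Glossing Rules. A consistent standard
    guarantees the two lines stay token-by-token aligned.
    """
    pairs = []
    for word, gloss_word in zip(morpheme_line.split(), gloss_line.split()):
        morphs = word.split("-")
        glosses = gloss_word.split("-")
        if len(morphs) != len(glosses):
            raise ValueError(f"misaligned: {word!r} vs {gloss_word!r}")
        pairs.extend(zip(morphs, glosses))
    return pairs


def find_gloss(pairs, category: str):
    """Return morphemes whose gloss contains the given category label.

    A period separates cumulated categories within one gloss, so a search
    for GEN matches GEN.PL without matching accidental substrings.
    """
    return [m for m, g in pairs if category in g.split(".")]


# Latin example from the Leipzig Glossing Rules:
pairs = parse_igt("insul-arum", "island-GEN.PL")
print(find_gloss(pairs, "GEN"))  # ['arum']
```

A non-standard corpus, by contrast, would force every such query to special-case each local glossing tradition.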

The issues to be considered at the Round Table include the following: the inventory of morphological glosses; the inventory of delimiters and the rules for their use; standard (IPA) transcription and related issues; representation of phenomena typical of special kinds of texts (e.g. representing false starts and self-repairs in spontaneously produced narratives and dialogues); and technical problems of representing glossed texts by means of software accessible worldwide. Some of these issues simply call for discussion. One of the most important objectives of the discussion could be a search for a balance between representational standardization and the efficiency of the standard for representing the data of particular languages. Specifically, a unified glossing standard must be both strict enough to make the representation of structurally different data comparable and loose enough to provide the researcher with a certain degree of flexibility regarding the choices necessary for an adequate and efficient representation of linguistic data. (It should be noted that not all choices of this sort are equally essential or obligatory, because the researcher's knowledge of the language is often limited.) Some specific problems and questions are:


        Is it necessary or redundant to represent prosodic data (stress, phrasal prosody, etc.)?

        The difficulty of choosing one or another delimiter because of insufficient data or the scalar nature of the opposition (e.g. dot vs. dash in cases of fusion/cumulation);

        The absence of a robust theoretical approach, or of empirical data sufficient to determine whether a given segment is a separate phonetic word or a clitic.
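The dot-versus-dash question above is not merely notational: the choice encodes whether a category has its own segmentable morph or is expressed cumulatively, and any tool processing the corpus must interpret it accordingly. A minimal sketch, assuming hypothetical Leipzig-style glossed words (the function name and examples are illustrative only):

```python
def boundary_type(gloss_word: str, category: str):
    """Report how a grammatical category is marked in one glossed word.

    Returns 'segment' if the category has its own hyphen-separated gloss,
    'cumulative' if it shares a single morph with other categories
    (period-separated, i.e. fusion/cumulation), or None if absent.
    """
    for gloss in gloss_word.split("-"):
        categories = gloss.split(".")
        if category in categories:
            return "segment" if len(categories) == 1 else "cumulative"
    return None


print(boundary_type("island-GEN.PL", "PL"))  # cumulative
print(boundary_type("child-PL", "PL"))       # segment
```

Where the segmentation itself is uncertain (as in the clitic question above), either delimiter choice commits the transcriber to an analysis that such tools will then take at face value, which is precisely why the standard must leave room for explicitly marking undecided cases.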


Just as in the case of the meta-language of theoretical linguistics, it is quite obvious that the search for a universal meta-language of text representation must be based on data from typologically diverse languages and requires a joint effort by experts in these languages. LENCA-3 is a forum that suits both these requirements very well.


Languages: Russian, English.

Duration: one session.


A.E. Kibrik (Moscow State University)

M.A. Daniel (Moscow State University)

A.Yu. Filchenko (Tomsk Pedagogical State University)