On the data bank project


MPI-EVA, Department of Linguistics, Leipzig

The Max Planck Institute for Evolutionary Anthropology, Department of Linguistics, Leipzig, is developing a Multimodal data bank and data server. Work on the data bank and data server started in spring 2000. The data bank will contain all kinds of electronic linguistic data: dictionaries, grammars, running texts, word lists, audio-visual material, and linguistic data stored in different kinds of databases. In addition to the linguistic data, cultural and demographic material and maps showing the distribution of linguistic and cultural information will be available on the server. Special attention in collecting data will be paid to endangered languages. The server will also contain basic tools for use in research. The multimodal linguistic data server can also serve as a forum for informing the public about the languages and cultures of the world, their diversity and universals.

The main part of the data bank runs on the UNIX operating system, but other operating systems will also be included in the data bank and data server. At this stage of the work, the main directories of the data bank are as follows:


The data are located in the directory /data, the tools for analyzing the data are located in the directory /tools, and the directory /users contains the directories of the users of the data bank. The directory /data is organized according to data type.


Within the different directories, the data are organized by language family and/or specific data type. For instance, the directory /syntactic-typology contains the following directories and files (files are separated by commas):

/mon-khmer-lgs/ Khasi
/munda-lgs/ Kharia
/dravidian-lgs/ Kurux
/indo-aryan-lgs/ Bagri, Bangani, Konkani, Konkani-Choraon, Konkani-Shiroda, Nepali, Sambhalpuri
/tibeto-burman-lgs/ Manipuri, Naga
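The listing above can be read as a simple mapping from language-family directories to language files. The following is only an illustrative sketch in Python, not part of the data bank's own tools: the root path /data/syntactic-typology is an assumption (the text implies, but does not state, that the directory sits under /data), and the names SYNTACTIC_TYPOLOGY, family_of, and language_paths are hypothetical.

```python
from pathlib import PurePosixPath

# Family directory -> language files, copied from the listing above.
SYNTACTIC_TYPOLOGY = {
    "mon-khmer-lgs": ["Khasi"],
    "munda-lgs": ["Kharia"],
    "dravidian-lgs": ["Kurux"],
    "indo-aryan-lgs": [
        "Bagri", "Bangani", "Konkani", "Konkani-Choraon",
        "Konkani-Shiroda", "Nepali", "Sambhalpuri",
    ],
    "tibeto-burman-lgs": ["Manipuri", "Naga"],
}


def family_of(language):
    """Return the family directory that holds the given language file, or None."""
    for family, languages in SYNTACTIC_TYPOLOGY.items():
        if language in languages:
            return family
    return None


def language_paths(root="/data/syntactic-typology"):
    """Build the full path of every language file.

    The default root is an assumption, not a path confirmed by the source.
    """
    base = PurePosixPath(root)
    return [
        str(base / family / language)
        for family, languages in SYNTACTIC_TYPOLOGY.items()
        for language in languages
    ]


if __name__ == "__main__":
    print(family_of("Nepali"))
    for path in language_paths():
        print(path)
```

Keeping the layout in one mapping makes it easy to look up where a given language is stored without walking the file system.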

In the directory /metadata-descriptions, the data are arranged first by language name and then by data type. The data are adapted to the data bank in such a way that they should be portable and platform-independent. Data prepared in character sets other than basic Latin-1 are converted to Unicode.
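As a rough illustration of such a conversion, the following Python sketch decodes bytes from a known legacy character set and re-encodes them as Unicode. The text does not say which source character sets, which Unicode encoding form, or which tools are used in the data bank, so the choice of ISO 8859-5 as the source, UTF-8 as the target, and the function name to_utf8 are all assumptions made for the example.

```python
def to_utf8(raw, source_encoding):
    """Decode bytes from a known legacy encoding and re-encode them as UTF-8.

    Raises UnicodeDecodeError if the bytes are not valid in source_encoding.
    """
    text = raw.decode(source_encoding)
    return text.encode("utf-8")


# Example input: the Russian word for "language", stored in ISO 8859-5
# (an assumed source encoding, chosen only for illustration).
legacy_bytes = "язык".encode("iso8859_5")
converted = to_utf8(legacy_bytes, "iso8859_5")

if __name__ == "__main__":
    print(len(legacy_bytes), "bytes in ISO 8859-5")
    print(len(converted), "bytes in UTF-8")
    print(converted.decode("utf-8"))
```

The source encoding has to be known in advance; the bytes themselves do not reliably identify their character set.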

Already at this very early stage of the work, the data bank contains dictionaries, running texts, syntactic and morphological typological data, and word lists from several languages. The largest dictionary will be the IDS dictionary, which at present (July 2001) contains material from the Uralic languages. The data to be prepared during the DOBES project will also be located in the data bank. Information on all the data is given in the README files located in the data directories. Information on the data will also be provided as metadata on the website of the MPI-EVA, Department of Linguistics.

So far, the directory /tools contains some basic scripts for analyzing linguistic data. Instructions for using the scripts are, or will be, provided with each script. All the tools available in the UNIX operating system can also be used in the work.

The data located in the Multimodal data bank in Leipzig can be used in research and teaching. Anyone who would like to use the data bank should sign a computer account application form, which is available at the following address: Computer account application form. The completed and signed application form should be sent to the following address:

Peter Froehlich
Max Planck Institute for Evolutionary Anthropology
Department of Linguistics
Inselstrasse 22
D-04103 Leipzig
E-mail: froelich@eva.mpg.de
Telephone: +49 - (0)341 - 99 52 412
Fax: +49 - (0)341 - 99 52 119

Information on the computer account needed to use the data will be sent in return as soon as possible.

Anyone willing to contribute data can find the contract covering copyright and ownership questions at the following web address: Agreement form.

For more information on the data bank, please contact the following address:

Max Planck Institute for Evolutionary Anthropology
Department of Linguistics, Inselstrasse 22
D-04103 Leipzig
Fax: +49 - (0)341 - 99 52 119

Pirkko Suihkonen, July 2001
Last modified: Nov. 16, 2001.