The data bank project at the Max Planck Institute for Evolutionary Anthropology, Department of Linguistics, Leipzig, started in April 2000. The work covered the following topics:
The preparation of a system for documenting electronic linguistic data, especially data on endangered languages, so that the data can be used in research.
The development of a system for archiving and using electronic linguistic data.
The system for the documentation of electronic linguistic data consists of the following parts: a) adapting data to the Unicode format, b) preparing information on the data structure, and c) preparing metadata descriptions of the data. The system for adapting data to the electronic archive comprises metadata descriptions of the electronic linguistic data and a catalogue held in the archive. The metadata descriptions have been developed in cooperation with the international metadata project steered by the Technical Department of the Max Planck Institute for Psycholinguistics, Nijmegen (International Standards for Language Engineering). The contracts between the data owners who give data to the bank and the MPI-EVA, Department of Linguistics, are also part of the archive's data-collection system.
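As an illustration of parts b) and c), the following minimal sketch shows how a metadata description and a catalogue could be represented. It is not the project's actual metadata scheme: the class name `MetadataRecord`, all of its fields, the function `build_catalogue`, and the sample values are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataRecord:
    """Hypothetical metadata description of one data set (fields are assumptions)."""
    identifier: str      # catalogue key for the data set
    language: str        # language documented by the data
    data_type: str       # e.g. "running text", "dictionary", "relational database"
    encoding: str        # character encoding of the archived files
    structure_note: str  # short description of the data structure


def build_catalogue(records):
    """Group record identifiers by language, preserving input order."""
    catalogue = {}
    for record in records:
        catalogue.setdefault(record.language, []).append(record.identifier)
    return catalogue


# Invented sample entries, for illustration only.
records = [
    MetadataRecord("txt-001", "Language A", "running text", "UTF-8", "one sentence per line"),
    MetadataRecord("dic-001", "Language A", "dictionary", "UTF-8", "headword, gloss"),
    MetadataRecord("txt-002", "Language B", "running text", "UTF-8", "interlinear glosses"),
]
catalogue = build_catalogue(records)
```

With such a catalogue, a user could list all data sets available for a language via `catalogue["Language A"]` and then consult the corresponding metadata descriptions.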
The data bank is hosted on a UNIX operating system. The data types include running texts, dictionaries and material prepared in a relational database format. Data containing texts originally written in a non-Latin-1 script (e.g. in Cyrillic or in IPA characters) has been adapted to the Unicode format. In addition, the bank includes examples of maps and video data.
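The adaptation of non-Latin-1 texts to Unicode can be sketched as follows. This is only an illustration under stated assumptions, not the procedure used in the project: the legacy encoding KOI8-R, the choice of UTF-8 as the storage form, the NFC normalization step, and the function name `to_unicode` are all assumptions made for the example.

```python
import unicodedata


def to_unicode(raw: bytes, source_encoding: str) -> str:
    """Decode legacy-encoded bytes and normalize the result to NFC.

    NFC composes base letters with their combining marks where possible,
    which keeps accented Cyrillic or IPA characters consistent across files.
    """
    text = raw.decode(source_encoding)
    return unicodedata.normalize("NFC", text)


# Simulate a legacy Cyrillic file (KOI8-R is an assumed source encoding).
legacy_bytes = "язык".encode("koi8_r")

text = to_unicode(legacy_bytes, "koi8_r")
utf8_bytes = text.encode("utf-8")  # form in which the text would be archived
```

Decoding requires knowing the source encoding of each file, which is one reason why recording the encoding in the metadata description is useful.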
The basic elements of the system for using data in the data bank are a server that can be used in practical work, the operating principles of which were presented in May 2001, and tools that interact with the server. The data bank contains some tools for organizing data and for collecting different kinds of information on the data. The data bank can be used at the MPI-EVA, Department of Linguistics, Leipzig. The possibility of using the data bank from outside the institute must be discussed separately (cf. the info page).
The following researchers and research assistants participated in the project:
The following researchers have donated data to the data bank:
The contract used when data is received into the data bank has been drawn up on the basis of the contract used at the University of Helsinki, Department of General Linguistics, for acquiring data for the University of Helsinki Language Corpus Server. Likewise, the computer account application form for people outside the MPI-EVA, Department of Linguistics, Leipzig, is based on the form used at the University of Helsinki.
The head of the pilot project was Pirkko Suihkonen. One aim of this pilot has been to develop a basis for electronic linguistic corpora that can be used in research on language typology.