The data bank project at the Max Planck Institute for Evolutionary Anthropology, Department of Linguistics, Leipzig, started in April 2000. The work covered the following topics:
The preparation of a system for the documentation of electronic
linguistic data, especially data on endangered languages. This data
can be used in research.
The development of a system for archiving and using electronic
linguistic data.
The system for the documentation of electronic linguistic data consists of the following parts: a) adapting data to Unicode, b) preparing information on the data structure, and c) preparing metadata descriptions of the data. The system for adapting data to the electronic archive comprises metadata descriptions of the electronic linguistic data and a catalogue located in the archive. The metadata descriptions have been developed in cooperation with the international metadata project coordinated by the Technical Department of the Max Planck Institute for Psycholinguistics, Nijmegen (International Standards for Language Engineering). The contracts between the data owners who contribute data to the bank and the MPI-EVA, Department of Linguistics, are also part of the archive's data-collecting system.
The data bank runs on a UNIX operating system. The data types include running texts, dictionaries, and material prepared in a relational database format. Data containing texts originally written in a non-Latin-1 script (e.g. in Cyrillic or in IPA characters) has been converted to Unicode. In addition, the bank includes examples of maps and video data.
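The conversion of non-Latin-1 texts to Unicode can be illustrated with a minimal sketch. It is not the project's own tooling: the KOI8-R source encoding, the sample word, and the function name `to_unicode` are assumptions chosen only for illustration, since the encodings of the original files are not stated here.

```python
def to_unicode(raw: bytes, source_encoding: str) -> str:
    """Decode bytes stored in a legacy encoding into a Unicode string."""
    return raw.decode(source_encoding)


# Hypothetical sample: the Russian word for "language" stored as KOI8-R bytes
# (the legacy encoding is an assumption made for this example).
legacy_bytes = "язык".encode("koi8_r")

# Decode the legacy bytes into a Unicode string.
text = to_unicode(legacy_bytes, "koi8_r")

# Re-encode as UTF-8, a common storage form for Unicode text.
utf8_bytes = text.encode("utf-8")
```

In practice the source encoding of each file has to be known or determined beforehand, because decoding with the wrong encoding usually produces garbled text rather than an error.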
The system for using data in the data bank has two basic elements: a server that can be used in practical work, the operating principles of which were presented in May 2001, and tools that interact with the server. The data bank contains some tools for organizing data and for collecting different kinds of information on the data. The data bank can be used at the MPI-EVA, Department of Linguistics, Leipzig. The possibility of using the data bank from outside the institute must be discussed separately (cf. the info page).
The following researchers and research assistants
participated in the project:
The
following researchers have donated data to the data bank:
The contract
used on receipt of data in the data bank has been drawn up on
the basis of the contract used at the University of Helsinki,
Department of General Linguistics, for acquiring data for the
University of Helsinki Language Corpus Server. Likewise, the computer
account application form for people outside the MPI-EVA,
Department of Linguistics, Leipzig, is one that has been used at the
University of Helsinki.
The
head of the pilot project was Pirkko Suihkonen. One aim of this pilot has
been to develop a basis for electronic linguistic corpora that can be
used in research on language typology.