DATABANK FOR
ENDANGERED FINNO-UGRIC LANGUAGES
Report
I. ORGANIZATION OF THE PROJECT
Goals of the Project:
- Providing a text corpus collection of endangered Finno-Ugric languages in machine-readable form;
- Basic linguistic research.
Funding:
- Academy of Finland (SA)
- University of Helsinki: Department of Finno-Ugric Studies and Department of General Linguistics
- Joint Committee of the Nordic Research Councils for the Humanities (NOS-H)
- The Nordic Research Council
Length of the Research Period:
- Academy of Finland: 1996-1998
- Joint Committee of the Nordic Research Councils for the Humanities: 1996-1997
Project personnel:
- Seppo Suhonen, University of Helsinki, Department of Finno-Ugric Studies, head of the project.
- Researchers and research assistants:
Finland
- University of Helsinki, Department of Finno-Ugric Studies:
  NOS-H: Jelena Adel (part-time researcher), Jarmo Alatalo (full-time researcher), Miikul Pahomov (part-time researcher), and Merja Salo (full-time researcher)
  Academy of Finland: Erja Kujala (short-term research assistant), Jack Rueter and Tapani Salminen (part-time researchers).
- University of Helsinki, Department of General Linguistics:
  Academy of Finland: Pirkko Suihkonen (full-time researcher).
Sweden
- University of Uppsala, Department of Finno-Ugric Studies:
  NOS-H: André Hesselbäck and Manja Lehto (full-time researchers)
- University of Umeå, Institute for Saami:
  NOS-H: Olavi Korhonen (part-time researcher).
Norway
- Nord-Trøndelag College, Department of Education:
  NOS-H: Nora Bransfjell (part-time researcher)
- Norwegian Computing Centre for the Humanities:
  NOS-H: Sjur Moshagen (full-time researcher)
- Norwegian University of Science and Technology, Department of Linguistics:
  NOS-H: Sagka Renander (short-term research assistant).
II. COMPUTER CORPORA
Languages from which the computer corpora will be created:
- Finland: Komi and Erzya (Jack Rueter), Khanty (Merja Salo), Nenets (Tapani Salminen), Selkup and Kamassian (Jarmo Alatalo), Livonian (Seppo Suhonen);
- Sweden: Ingrian (Manja Lehto), Hill Mari (André Hesselbäck) and Ume Saami (Olavi Korhonen);
- Norway: Southern Saami (Nora Bransfjell, Sjur Moshagen, Sagka Renander).
Samples of the following languages were adapted for use in the University of Helsinki Language Corpus Server (UHLCS) in 1996-1999:
Uralic languages: Livvi, Dvina-Karelian, Ludian, Ingrian, Veps, Livonian, Kildin Saami, South Saami, Ume Saami, Erzya, Moksha, East Mari, West Mari, Komi Zyrian, Komi Permyak, Khanty, Mansi, Hungarian, Enets, Nenets, Selkup, and Kamas;
Indo-European languages: Kurdish, Ossete, Tajik, Armenian, Latvian, Lithuanian, Belorussian, Ukrainian, Serbo-Croatian, and Moldavian (Romanian);
Caucasian languages: Avar, Lak, and Tabassaran;
Turkic languages: Altai, Azerbaijani, Balkar, Bashkir, Crimean Tatar, Gagauz, Khakas, Kirghiz, Kumyk, Kazakh, Turkmen, Tuvin, Uyghur, Uzbek, and Yakut;
Mongolic languages: Buryat and Kalmyk;
Tungusic languages: Even, Evenki, and Nanay;
Chukotko-Kamchatkan languages: Chukchi and Koryak.
Pirkko Suihkonen, Aug. 9, 1998. Updated in Aug. 2002.