COMPUTER CORPORA OF AFRO-ASIATIC AND NIGER-CONGO LANGUAGES

Hebrew, Somali, Swahili



The Hebrew computer corpora

The Hebrew corpus contains the following documents:

  1. Biblia Hebraica Stuttgartensia. (c) Copyright 1967/1977, 1983. Deutsche Bibelgesellschaft, Stuttgart. Used by permission.
  2. Morphologically tagged text of BHS. (c) Copyright 1991. Center for the application of Technology to Biblical and Theological Education (CATBE), Westminster Theological Seminary, Philadelphia, PA.

The documents are licenced directly from the copyright holders. The analysis in the original versions was corrected when the documents were adapted for the University of Helsinki Language Corpus Server. Use of the corpora located at the University of Helsinki Corpus Server is restricted to research and teaching. Papers that use the corpora as a source must cite them. For further information, please contact: address of the contact person.

Metadata descriptions for the Hebrew corpus



The Somali computer corpus

The Somali corpus at the University of Helsinki Language Corpus Server consists of the Somali translations of articles published in the Finnish newspaper "Selkouutiset" (News in clear language) in issues 13, 18, and 20 of 1994. The newspaper is published by Kehitysvammaliitto ry with the support of a working group drawn from foundations for social and health affairs.

Use of the corpora located at the University of Helsinki Corpus Server is restricted to research and teaching. Papers that use the corpora as a source must cite them. Address of contact person.

Metadata descriptions



The Swahili computer corpora

The Swahili computer corpora at the University of Helsinki Language Corpus Server consist of two large corpora: (1) the Swahili text corpus and (2) a corpus containing data from various Swahili dialects.

"The Swahili corpus is located in tuuri (a server of the Department of General Linguistics at the University of Helsinki) corpus directory /corp/swa. Everyone that has a username in tuuri has access to it. The corpus consists of (1) standard Swahili texts and (2) interviews done in various Swahili dialects and speech forms on the islands of Zanzibar, Pemba, Tumbatu and Mafia of the Indian Ocean, and of the coastal area of Tanzania. The dialectal texts were collected in conjunction with The Institute of Kiswahili Research (University of Dar-es-Salaam), and The Institute for Asian and African Studies (University of Helsinki), in 1989-1991. New texts are being added to the corpus from time to time." (Arvi Hurskainen, March 20, 2000)

More information on the Swahili corpora, and on the tools prepared for analysing them, is available in the directory /corp/swa/ within the Swahili corpora.

Use of the corpora located at the University of Helsinki Corpus Server is restricted to research and teaching. Papers that use the corpora as a source must cite them. For further information, please contact: address of the contact person:

Metadata descriptions


University of Helsinki Language Corpus Server
P.S., 1998; 2002; 2007.