Organization of the Directories and the Corpora of the Multilingual Resource Collection of the UHLCS


The main directory of the UHLCS is organized according to the institutions and organizations that accept computer account applications. The UNIX groups follow the same scheme. There are four main directories at the top level of the directory tree: /general-linguistics, /general-linguistics-kotus, /language-departments and /multilingual-language-archive:

  1. /general-linguistics: the computer account application is accepted by the Department of General Linguistics, University of Helsinki.
  2. /general-linguistics-kotus: the computer account application is accepted by the Department of General Linguistics, University of Helsinki, or the Research Institute for the Languages of Finland.
  3. /language-departments: the computer account applications are accepted by a language department of the University of Helsinki. In practice the application is accepted by the authorized person who is the owner, or a representative of the owner, of the corpus or corpora located at the UHLCS in the directory /language-departments.
  4. /multilingual-language-archive: the computer account applications are accepted by the Department of General Linguistics, University of Helsinki, but the owner of the data or the representative of the owner must be informed of the computer account application (the names of the owners and the representatives of the owners of the corpora).
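
The list above amounts to a lookup from top-level directory to the body that accepts account applications. A minimal sketch of that lookup follows; the directory names are taken from the list, while the table name `ACCEPTED_BY` and the function `accepting_bodies` are hypothetical and not part of the UHLCS itself.

```python
# Illustrative only: ACCEPTED_BY and accepting_bodies are hypothetical names.
DEPT = "Department of General Linguistics, University of Helsinki"

ACCEPTED_BY = {
    "general-linguistics": (DEPT,),
    "general-linguistics-kotus": (
        DEPT,
        "Research Institute for the Languages of Finland",
    ),
    # In practice: the corpus owner or the owner's representative.
    "language-departments": (
        "a language department of the University of Helsinki",
    ),
    # The data owner or the owner's representative must also be informed.
    "multilingual-language-archive": (DEPT,),
}


def accepting_bodies(path):
    """Return the bodies that accept applications for a UHLCS path."""
    parts = [p for p in path.split("/") if p]
    if not parts or parts[0] not in ACCEPTED_BY:
        raise KeyError(f"not under a known main directory: {path!r}")
    return ACCEPTED_BY[parts[0]]
```

For example, `accepting_bodies("/general-linguistics-kotus")` returns both institutions named in item 2.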

At the top level of the directory tree there is also the directory /ADM, which is needed for administrative purposes, and README files which contain information on the UHLCS, the UNIX groups, computer accounts, and the copyrights concerning the electronic data located at the UHLCS.

The sub-directories are organized on the basis of the language families. The top level of this sub-directory tree is a directory that corresponds to a language phylum: /indo-european-lgs, /uralic-lgs, etc. The directories below it branch according to the main branches of the language family tree. The third level of the language family tree is optional, and further sub-nodes can be added to the directory tree if needed. The last node in the language family tree is the name of the language. Data directories containing small corpora are organized according to the owners of the corpora, and directories containing several corpora are organized according to the corpora and data types.

  1. /The name of the language family (language phylum)
    1. /The name of the branch at a lower level in the language family tree (a daughter language group)
      1. /The name of the branch at a lower level in the language family tree (a daughter language group)
        1. /The name of the language
          1. /The data directories
            1. /Data directories
              1. /Data
            2. /Metadata files

At the same level as the data directories, there are also directories which contain information on the corpora and instructions for editing the corpora. The original data files are also located in the data directories.
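
The layout described above can be sketched as a small path builder. This is only an illustration under stated assumptions: the function `language_dir` is hypothetical, as are the branch and language names in the usage note (only /uralic-lgs and /indo-european-lgs appear in the text). The result is kept relative, because where the phylum directory sits in relation to the four main directories is not assumed here.

```python
from pathlib import PurePosixPath


def language_dir(phylum, branches, language):
    """Build the relative path phylum/branch[/branch...]/language.

    `branches` lists the intermediate language-group directories, from the
    main branch downwards; the levels below the main branch are optional.
    """
    components = [phylum, *branches, language]
    for name in components:
        # Each component must be a single, non-empty directory name.
        if not name or "/" in name:
            raise ValueError(f"invalid directory name: {name!r}")
    return PurePosixPath(*components)
```

For instance, `language_dir("uralic-lgs", ["finno-ugric-lgs"], "finnish")` yields the relative path `uralic-lgs/finno-ugric-lgs/finnish`, under which the data directories and metadata files would sit; the branch and language directory names here are made up for the example.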


P.S. 2007; Last modified: Mon Nov 24 16:35:01 EET 2008