INDEX OF /uhlcs/metadata/


The "Metadata" directory contains all the metadata descriptions of the machine-readable corpora located on the University of Helsinki Language Corpus Server.

The scripts in the directory "Data-structure scripts" convert the text corpora into XML format by inserting tags that describe the structure of the text. The scripts described in the documentation are used to interpret that structure. The tags are always written on separate lines, so they increase the line count of the corpus.
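
The effect on the line count can be illustrated with a minimal sketch. It is not one of the actual scripts: the function name tag_paragraphs, the tag name P, and the rule that a blank line separates paragraphs are all assumptions made only for this illustration.

```python
def tag_paragraphs(lines, tag="P"):
    """Wrap each blank-line-separated paragraph in tag lines.

    The tag name "P" is hypothetical; the real scripts may use other tags.
    Every tag is written on a line of its own, as described above.
    """
    out = []
    in_para = False
    for line in lines:
        if line.strip():
            if not in_para:
                out.append(f"<{tag}>")   # opening tag on its own line
                in_para = True
            out.append(line)
        else:
            if in_para:
                out.append(f"</{tag}>")  # closing tag on its own line
                in_para = False
            out.append(line)
    if in_para:
        out.append(f"</{tag}>")
    return out


corpus = [
    "First paragraph, line one.",
    "First paragraph, line two.",
    "",
    "Second paragraph.",
]
tagged = tag_paragraphs(corpus)
print(len(corpus), "lines before,", len(tagged), "lines after tagging")
```

Each tagged paragraph adds two lines, so any line numbers recorded against the untagged corpus no longer match the tagged version.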

The scripts skip all text inside <PUBLICATION_INFO> tags. Where needed, these tags should be added manually to the text corpus in advance, because some corpora include the first page of the actual book, which would confuse automatic interpretation of the text.
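
The skipping behaviour might look roughly like the following sketch. It assumes that the block is closed by </PUBLICATION_INFO> and that both tags stand on lines of their own; the function name split_publication_info is hypothetical and does not come from the actual scripts.

```python
OPEN_TAG = "<PUBLICATION_INFO>"
CLOSE_TAG = "</PUBLICATION_INFO>"  # assumed closing form of the tag


def split_publication_info(lines):
    """Separate lines to be interpreted from lines inside the tagged block.

    Returns (body, skipped): body holds the lines outside the
    <PUBLICATION_INFO> block, skipped holds the lines inside it.
    """
    body, skipped = [], []
    inside = False
    for line in lines:
        stripped = line.strip()
        if stripped == OPEN_TAG:
            inside = True
            continue
        if stripped == CLOSE_TAG:
            inside = False
            continue
        (skipped if inside else body).append(line)
    return body, skipped


sample = [
    "<PUBLICATION_INFO>",
    "Title page of the printed book",
    "Publisher and year",
    "</PUBLICATION_INFO>",
    "Chapter 1",
    "The actual text starts here.",
]
body, skipped = split_publication_info(sample)
print(body)
print(skipped)
```

Only the lines in body would be passed on to structure interpretation; the title-page lines stay out of the way.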

Data-structure scripts

Updated by P.S., Sept. 20, 2003.