This work is based on the assumption, proposed by Harris (1954; 1968), that words with similar syntactic usage have similar meanings. We study this assumption from two standpoints: first, different meanings (word senses) of a word should manifest themselves in different usages (contexts); and second, similar usages (contexts) should lead to similar meanings (word senses).
If we start with the different meanings of a word, we should be able to find distinct contexts for those meanings in text corpora. We separate the meanings by grouping and labelling contexts in an unsupervised or weakly supervised manner (Linden and Lagus 2002; Linden 2003; Linden 2004a). This raises the question of how best to represent contexts in order to induce effective context classifiers, because differences in context are the only means we have of separating word senses.
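The idea of separating senses by grouping contexts can be illustrated with a minimal sketch. It assumes a plain bag-of-words representation of each context, a small hypothetical stopword list, and a simple greedy clustering with a hypothetical similarity `threshold`; none of these choices is taken from the cited publications, and the names `group_contexts`, `context_vector` and `STOPWORDS` are illustrative only. Each resulting group of contexts is treated as one candidate sense of the target word.

```python
from collections import Counter
import math

# Hypothetical stopword list: function words would otherwise link every context.
STOPWORDS = {"the", "a", "in", "on", "of", "near", "to"}

def context_vector(context, target):
    """Bag-of-words vector for one context, excluding the target and stopwords."""
    return Counter(w for w in context.lower().split()
                   if w != target and w not in STOPWORDS)

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def group_contexts(contexts, target, threshold=0.2):
    """Greedy single-pass clustering (an assumed stand-in, not the thesis method).

    Each context joins the group whose summed vector it is most similar to,
    provided the similarity reaches `threshold`; otherwise it starts a new
    group. Returns lists of context indices, one list per candidate sense.
    """
    groups = []  # each entry: (summed Counter, list of context indices)
    for i, ctx in enumerate(contexts):
        vec = context_vector(ctx, target)
        best, best_sim = None, 0.0
        for g in groups:
            sim = cosine(vec, g[0])
            if sim > best_sim:
                best, best_sim = g, sim
        if best is not None and best_sim >= threshold:
            best[0].update(vec)   # Counter.update adds counts in place
            best[1].append(i)
        else:
            groups.append((Counter(vec), [i]))
    return [members for _, members in groups]

# Toy contexts for the ambiguous word "bank" (invented for illustration).
contexts = [
    "deposit money in the bank account",
    "the bank raised interest on the account",
    "fish swim near the river bank",
    "grass grew on the muddy river bank",
]

print(group_contexts(contexts, "bank"))  # [[0, 1], [2, 3]]
```

The two groups correspond to a financial and a riverside sense of "bank". Labelling a group (weak supervision) would then amount to attaching a sense tag to each index list; how well such groups separate senses depends heavily on the context representation, which is exactly the question raised above.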
If we start with similar contexts, we should be able to discover similarities in meaning. We can do this monolingually or bilingually. In the monolingual material, we find synonyms and other related words in an unsupervised way (Linden and Piitulainen 2004b). In the bilingual material, we find translations by supervised learning of transliterations (Linden 2004c; Linden 2005a). In both the monolingual and the bilingual case, we first discover words with similar contexts, yielding synonym or translation lists. In the monolingual case, we also aim at finding structure in these lists by discovering groups of similar words, e.g., synonym sets.
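The monolingual step of finding words with similar contexts can be sketched as follows, assuming that a word's context is the set of words within a fixed window around it and that similarity is the cosine of co-occurrence count vectors. The window size, the toy corpus and the function names (`cooccurrence_vectors`, `similar_words`) are hypothetical choices for illustration, not the procedure of Linden and Piitulainen (2004b).

```python
from collections import Counter, defaultdict
import math

def cooccurrence_vectors(sentences, window=2):
    """Map each word to counts of the words within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sent in sentences:
        words = sent.lower().split()
        for i, w in enumerate(words):
            lo, hi = max(0, i - window), min(len(words), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[w][words[j]] += 1
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def similar_words(target, vectors, top_n=3):
    """Rank all other words by context similarity to `target` (a candidate synonym list)."""
    target_vec = vectors.get(target, Counter())
    scores = [(cosine(target_vec, vec), w)
              for w, vec in vectors.items() if w != target]
    scores.sort(key=lambda p: (-p[0], p[1]))  # highest score first, ties alphabetical
    return [w for _, w in scores[:top_n]]

# Invented toy corpus in which "car" and "automobile" share all their contexts.
sentences = [
    "she drove the car to work",
    "she drove the automobile to work",
    "he parked the car outside",
    "he parked the automobile outside",
    "she ate the apple at work",
]

vectors = cooccurrence_vectors(sentences)
print(similar_words("car", vectors, top_n=1))  # ['automobile']
```

Only the top neighbour is meaningful in a corpus this small: further down the list, frequent function words such as "she" and "he" outrank content words, which is why a practical system would weight or filter them. Grouping the resulting lists into synonym sets would be a further clustering step over these similarity scores.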
In the introduction (Linden 2005b), we consider the larger background issues of how meaning arises, how it is quantized into word senses, and how it can be stored. We also consider how to define context, how to collect realistic contexts, and how to represent them. We discuss how to evaluate the context classifiers and word sense classifications, and finally we present the word sense discovery and disambiguation methods proposed in the publications for this work. This work confirms that the hypothesis proposed by Harris is useful, and we implement three methods for exploiting it, which have practical consequences for creating thesauruses and translation dictionaries, e.g., for information retrieval and machine translation purposes.