Graduate School of Language Technology in Finland

Kieliteknologian valtakunnallinen tutkijakoulu - Språkteknologiska forskarskolan i Finland

Course on Unsupervised Learning in Language Technology (1 - 2 cr)


Roman Yangarber (New York University)


The course will explore unsupervised approaches to machine learning for Language Technology/Natural Language Processing. We will consider several problems in NLP, some of which may yield to approaches involving supervised learning. Supervised learning, however, generally depends on the existence of labeled (annotated) training data. This poses two problems for language technology:

Therefore it becomes important to devise methods which remove reliance on pre-annotated data. The course will review a series of problems whose solutions use large corpora, but require no human supervision. These solutions rely instead on different kinds of "bootstrapping" from a few initial examples. Rather than learning from labeled data, these methods attempt to exploit the redundancy inherent in natural language.

The course will illustrate these ideas in several areas of NLP: bilingual text alignment, word-sense disambiguation, dictionary induction, name classification, and acquisition of patterns for information extraction. We will study papers from recent literature. We will also pay attention to evaluation schemes devised to assess the efficacy of the methods.


Department of General Linguistics
Siltavuorenpenger 20 A

Duration and Organization

The course will take place on the following dates:

May 5, May 6, May 7, May 8, and May 9, 2003


Participants who are able to program will have the option of programming exercises. Others may participate in exercises which consist of experiments and evaluation with existing systems. Some material for the exercises is available here:


Participants are expected to download these articles and study them before the beginning of the course:


