Helper for the Background Material of Mohri's Course on
Weighted Finite-State Transducers in Text and Speech
Processing
Edited by Anssi Yli-Jyrä
Note:Anssi will give an introductory lecture on
the subject on 28 April 2002 from 14:00 o'clock at
Siltavuorenpenger. Join in if you are interested! There will be
lots of time for questions.
The items 1-4 below contain references to introductory
material in Finite-State Transducers (FST). They are not
required if the student is already familiar with the
finite-state technology. Before you can learn about Weighted
FSTs you should have learned what finite-state transducers are
and why they are useful. The readings 1-2 below are application
oriented and easy for a novice. The readings 3 and 4 construct
a theoretical basis which should cover a lot of the background
which is assumed in the course on weighted finite-state
transducers.
The background articles in (5) will be studied during the
course. They are mainly very difficult and a novice should know
that this true also for advanced students. The function of the
course is to make the central results in these articles more
accessible to the students. However, it would be beneficial if
the students have a look at some of the most accessible sections
in these articles as listed below in (5). These sections
amount for some 10 pages in total.
1. Lauri Karttunen, Jean-Pierre Chanod, Gregory Grefenstette,
Anne Schiller. 1996. Regular Expressions for Language
Engineering. Natural Language Engineering, 2 (4)
305-328.Cambridge University Press.
http://citeseer.nj.nec.com/karttunen96regular.html
Easy to read. Application oriented. Motivating. This
article does not require familiarity with any specific
mathematical notation, but you need to know a little bit about
set theory (e.g. intersection, complement). Some basic concepts
about regular languages may be easier to grasp from the article
bye Roche and Schabes (below), but if you know them, this
article makes them motivate you with applications. It
introduces a formalism for regular expressions and uses it
throughout the text. It contains examples of applications and
the reader can pick his favourite from them, because the text is
accessible to grasshopper readers. After reading this article,
you should learn more about the underlying theory: about
automata, transducers and their diagrams.
2. Xeroxin Äärellisten Automaattien Kääntäjä. A WWW
document. The Document Company - Xerox, XRCE. http://www.xrce.xerox.com/competencies/content-analysis/fsCompiler/
- "Finite-State Networks (Äärelliset automaattiverkot)" - This
page introduces the diagram representation for tranducers.
- "Application of Finite-State Networks" - This page
explains how a finite-state network (transducer) is applied to
an input string to produce output strings.
- "Examples of Networks and Regular Expressions (Esimerkkejä
verkoista ja säännöllisistä lausekkeista)" - This page
contains examples to illustrate the compilation of
finite-state networks (transducers) from regular expressions.
The page contains several examples which should help the
reader to see various applications of the finite-state
technology.
3. Emmanuel Roche and Yves Schabes. Introduction to
Finite-State Devices in Natural Language Processing. In
Finite-state Language Processing, E. Roche and Y. Schabes
(eds.). Also available as Technical Report MERL-TR-96-13,
Mitsubishi Electric Research Laboratories.
http://www.merl.com/reports/TR96-13/index.php
This is the introductory chapter for an important
collection of articles. As a whole, it contains far too much
material for an average need. A novice is adviced to read pages
1-18 in the report (1-17 in the book) carefully. If the
mathematical notation is too difficult, a text book in discrete
mathematics, computational linguistics, algebra, computer
science etc. should make it familiar right in the beginning.
After you have learned the first 18 pages, you will be well
prepared for finite-state transducers.
4. Lauri Karttunen. A short history of two-level
morphology. Presented at ESSLLI 2001 Special Event "Twenty
Years of Two-Level Morphology" organised by Lauri Karttunen,
Kimmo Koskenniemi and Gertjan van Noord. Helsinki. http://www.helsinki.fi/esslli/evening/20years/twol-history.pdf
This is an excellent historical review on finite-state
methods in phonology and morphology. It connects finite-state
methods to works of Chomsky (1960) and Halle (Chomsky and Halle
1968) and presents the development of the current state-of-art
in computational morphophonology.
5. Articles mentioned in the articles
section of the course page.:
- Finite-State Transducers in Language and Speech
Processing. Try to read pages 1-4, section 2.4, page 11,
and sections 4-5 which mainly contain motivations and
reflections. This article should be considered as a "Theory
Handbook". The students are not required to grasp all the
material in this article, but later on in the course we will
use software that is based on the theoretical basis described
in this article.
- Weighted Finite-State Transducers in Speech
Recognition. This contains the definitions and results on
transducer that are most relevant for speech processing.
Reading the sections 1 - 2.1 (3½ pages) gives a good start for
the course.
- Weighted Grammar Tools: the GRM Library. This
article describes some new ideas based on the rewriting rules
known from Chomsky's Sound Pattern of English and from
generative phonology. Karttunen's review on Two-Level
Morphology (4) may serve as a background. In the first part,
the notation is relatively simple, although a novice can only
try to grasp a general idea without paying too much attention
to the details and to complicated diagrams. A novice should
skip sporadic references to rational power series and try to
get an intuitive idea of meaning of the figure 2.1 as an
realization for a rule. The second part of Mohri's
article (starting form page 29) is not very accessible and is
absolutely only for advanced students after the course.