## Graduate School of Language Technology in Finland## Kieliteknologian valtakunnallinen tutkijakoulu - Språkteknologiska forskarskolan i Finland |

Edited by *Anssi Yli-Jyrä*

**Note:**Anssi will give an **introductory lecture** on
the subject on 28 April 2002 from 14:00 o'clock at
Siltavuorenpenger. Join in if you are interested! There will be
lots of time for questions.

The items 1-4 below contain references to introductory material in Finite-State Transducers (FST). They are not required if the student is already familiar with the finite-state technology. Before you can learn about Weighted FSTs you should have learned what finite-state transducers are and why they are useful. The readings 1-2 below are application oriented and easy for a novice. The readings 3 and 4 construct a theoretical basis which should cover a lot of the background which is assumed in the course on weighted finite-state transducers.

The background articles in (5) will be studied during the course. They are mainly very difficult and a novice should know that this true also for advanced students. The function of the course is to make the central results in these articles more accessible to the students. However, it would be beneficial if the students have a look at some of the most accessible sections in these articles as listed below in (5). These sections amount for some 10 pages in total.

1. Lauri Karttunen, Jean-Pierre Chanod, Gregory Grefenstette,
Anne Schiller. 1996. **Regular Expressions for Language
Engineering**. *Natural Language Engineering*, 2 (4)
305-328.Cambridge University Press.
http://citeseer.nj.nec.com/karttunen96regular.html

- Easy to read. Application oriented. Motivating. This
article does not require familiarity with any specific
mathematical notation, but you need to know a little bit about
set theory (e.g. intersection, complement). Some basic concepts
about regular languages may be easier to grasp from the article
bye Roche and Schabes (below), but if you know them, this
article makes them motivate you with applications. It
introduces a formalism for regular expressions and uses it
throughout the text. It contains examples of applications and
the reader can pick his favourite from them, because the text is
accessible to grasshopper readers. After reading this article,
you should learn more about the underlying theory: about
automata, transducers and their diagrams.

2. **Xeroxin Äärellisten Automaattien Kääntäjä**. A WWW
document. The Document Company - Xerox, XRCE. http://www.xrce.xerox.com/competencies/content-analysis/fsCompiler/

- "Finite-State Networks (Äärelliset automaattiverkot)" - This page introduces the diagram representation for tranducers.
- "Application of Finite-State Networks" - This page explains how a finite-state network (transducer) is applied to an input string to produce output strings.
- "Examples of Networks and Regular Expressions (Esimerkkejä verkoista ja säännöllisistä lausekkeista)" - This page contains examples to illustrate the compilation of finite-state networks (transducers) from regular expressions. The page contains several examples which should help the reader to see various applications of the finite-state technology.

3. Emmanuel Roche and Yves Schabes. **Introduction to
Finite-State Devices in Natural Language Processing.** In
*Finite-state Language Processing*, E. Roche and Y. Schabes
(eds.). Also available as Technical Report MERL-TR-96-13,
Mitsubishi Electric Research Laboratories.
http://www.merl.com/reports/TR96-13/index.php

- This is the introductory chapter for an important
collection of articles. As a whole, it contains far too much
material for an average need. A novice is adviced to read pages
1-18 in the report (1-17 in the book) carefully. If the
mathematical notation is too difficult, a text book in discrete
mathematics, computational linguistics, algebra, computer
science etc. should make it familiar right in the beginning.
After you have learned the first 18 pages, you will be well
prepared for finite-state transducers.

4. Lauri Karttunen. **A short history of two-level
morphology.** Presented at ESSLLI 2001 Special Event "Twenty
Years of Two-Level Morphology" organised by Lauri Karttunen,
Kimmo Koskenniemi and Gertjan van Noord. Helsinki. http://www.helsinki.fi/esslli/evening/20years/twol-history.pdf

- This is an excellent historical review on finite-state
methods in phonology and morphology. It connects finite-state
methods to works of Chomsky (1960) and Halle (Chomsky and Halle
1968) and presents the development of the current state-of-art
in computational morphophonology.

5. Articles mentioned in the articles section of the course page.:

**Finite-State Transducers in Language and Speech Processing**. Try to read pages 1-4, section 2.4, page 11, and sections 4-5 which mainly contain motivations and reflections. This article should be considered as a "Theory Handbook". The students are not required to grasp all the material in this article, but later on in the course we will use software that is based on the theoretical basis described in this article.**Weighted Finite-State Transducers in Speech Recognition**. This contains the definitions and results on transducer that are most relevant for speech processing. Reading the sections 1 - 2.1 (3½ pages) gives a good start for the course.**Weighted Grammar Tools: the GRM Library**. This article describes some new ideas based on the rewriting rules known from Chomsky's Sound Pattern of English and from generative phonology. Karttunen's review on Two-Level Morphology (4) may serve as a background. In the first part, the notation is relatively simple, although a novice can only try to grasp a general idea without paying too much attention to the details and to complicated diagrams. A novice should skip sporadic references to rational power series and try to get an intuitive idea of meaning of the figure 2.1 as an realization for a rule. The second part of Mohri's article (starting form page 29) is not very accessible and is absolutely only for advanced students after the course.

Last updated: Thursday, 27-Mar-2003 10:21:54 EET