Journal of Natural Language Engineering
Special Issue on:

Finite State Methods and Models in Natural Language Processing

Download Call For Papers


Special Issue Description

The languages described by regular expressions are exactly those recognized by automata that have a finite number of states (Kleene 1956). This fundamental result has been extended to string transductions, sets of trees, formal power series, grammars, semigroups, and finite models. Furthermore, the interest in finite equivalence relations in these systems has led to the study of families of recognizable languages. The study was pioneered by Schützenberger, McNaughton, Papert and Kamp, who established the equivalence between star-free generalized regular expressions, counter-free automata, first-order logic and temporal logic. Kleene's theorem, its extensions and its restrictions have tremendous methodological relevance to speech and natural language processing (NLP).

In 1972, C. Douglas Johnson pioneered the study of formal aspects of phonological descriptions by reducing phonological grammars to finite state systems. This was the dawn for the currently existing finite state approached to phonology and morphology with wide-spread applications to written languages. In addition, many other NLP areas admit that they employ finite state systems or their extensions. In particular, we mention decision trees, deterministic parsing, chunk parsing, edit distance, rational kernels, hidden Markov models, Viterbi algorithms, semiring parsing, and weighted tree automata. In sum, finite state methods, both statistical and knowledge-based, are now being used in various areas of linguistic computation, ranging from speech and character recognition to message understanding and statistical machine translation. Finite state methods in NLP continue to be an area of further research.

The current special issue has two goals. The first is to summarize the state of the art in the finite state methods and models of the NLP community. The second is to increase awareness of the community about open issues and new perspectives: the study of the adequacy of finite-state methods and models extends from the traditional richness and use to new generalizations and ambitious tasks.

The call for papers is open to everyone: New contributions are warmly encouraged; in addition, the issue is ideal also for extended and updated versions of the papers presented in workshops on Finite State Methods and Natural Language Processing (FSMNLP Helsinki 2005, Potsdam 2007, Ispra 2008, and Pretoria 2009). Submissions of completely new papers under the special theme are welcomed as equally appropriate as extended and updated papers.


Topics of Interest

1. New or updated work on the traditional topics of FSMNLP workshops

The traditional topics in the series of FSMNLP workshops are appropriate. The submitted paper must clearly deal with finite states. An extended version of a meeting paper can be submitted, provided that the contribution has been updated as appropriate given the progress in the field. The forums for any preliminary versions of the paper must be indicated.

2. Study of new questions raised by fundamental results in finite state phonology and morphology

In linguistics, morphological grammars describe the structure of word forms in a language, and phonological grammars describe the systematic use of sounds as units that encode the word forms. Finite state phonology and morphology consider grammars and rules whose formal semantics can be defined using finite state transducers. The fundamental result in these fields states that one can construct finite state transducers aka lexical transducers from phonological (Johnson 1972, Kaplan and Kay 1994) and morphological grammars (Koskenniemi 1983, Karttunen 1994, Beesley and Karttunen 2003). This result is not the end of inquiries but it paves the way for the study of many interesting problems, including the following:

  • adaptations of the fundamental result to less rigid formalisms and descriptive approaches used in field linguistics
  • correlation between availability of computational morphology and language development of under-resourced languages
  • approaches to the adaptation of lexical transducers to a cluster of languages
  • seamless construction of computational lexicons from dynamically changing linguistic descriptions
  • model-checking and automatic verification of phonological grammars
  • finite state approaches to language variation and diachronic description
  • portability and long-term archiving of resources in computational morphology and morpho-syntax
  • constructing refreshed lexical transducers quickly from updated extended regular expressions
  • lexical transducers for tonal languages from auto-segmental accounts of spreading and alignment of articulatory features
  • learning and training finite state grammars from small samples
  • designs of weighted phonological and morphological grammars
  • feasible finite state restrictions of optimality-theoretic and multi-tiered phonology
  • efficient finite state re-implementation of competitively efficient ad hoc methods (see the article by S. Wintner in NLE 14(4) 2007).

3. Study of new methods with connections between languages, trees and finite state automata

The theory of classical string automata has a natural extension to tree automata, which found applications in NLP. In 1982, Joshi and Levy pointed out in Computational Linguistics 8(1) that phrase structure grammars actually generalize to tree automata that bring more descriptive power. The link between phrase-structure grammars, tree grammars and tree automata has enriched the study of enumerative grammars that generate trees and has given birth to model-theoretic grammars that instead describe trees. The study includes but is not restricted to:

  • representations of languages by tree automata and tree grammars
  • non-projectivity and partial commutativity of yields of tree automata and grammars
  • hierarchies of tree automata and tree grammars
  • weighted extensions of representations of languages and transductions
  • tree logics and model-theoretic syntax
  • language generation through tree logics
  • efficient and dynamic construction of automata from updated rules or logical formulas
  • representations of trees and automata-based query languages for tree banks
  • learning, training and minimization of representations of languages and transductions
  • semiring parsing and other generalizations of classical parsing algorithms
  • applications and case studies of methods based on the theory of tree automata.

In 1963, Chomsky and Schützenberger gave a morphic representation for the context-free languages i.e. the yields of local tree automata. The representation involves bracketed strings that encode the structure of the parse trees. Such use of bracketed strings and local trees gives rise to methods that decompose tree automata into simpler components many of which have regular yields. The relevant topics include:

  • representation of tree grammars and tree automata through decompositions into constraints
  • Chomsky-Schützenberger representations for tree grammars and weighted tree grammars
  • feasible constituent, dependency and alignment bracketing schemes for grammars, tree banks and parallel corpora
  • intersection grammars and conjunctive grammars with recognizable sets of string or trees
  • chart parsing, restarting automata, nested word automata and visibly pushdown languages
  • regular bracketed string languages that approximate sets of trees.

4. Study of advantages of restrictions defined in algebraic theories of automata and languages or in finite model theory

Feasible finite state methods in natural language processing are an application of a vast body of basic research in mathematics and computer science. There remain, however, situations where straightforward and general methods fail to be applicable or efficient. It is in these situations where NLP applications motivate interest in special families of regular relations, automata, semirings, and formal power series. Connections between such families and NLP tasks have not yet been fully elaborated. If we are lucky, the study of the connections will lead to new fundamental results that give rise to freshly identified subfamilies of finite state methodologies in NLP. The linguistically relevant restrictions on the power of general-purpose finite state methods might include:

  • finite ambiguity
  • sequential functions
  • step functions
  • idempotent semirings
  • locally finite semirings
  • descriptive complexity
  • deterministic transition closure
  • first-order definability
  • aperiodic, counter-free and star-free sets
  • unambiguous concatenation
  • partial commutativity
  • discounted weights
  • finite synchronization delay
  • limited center-embedding
  • local testability
  • finite bandwidth.

In addition, the topics of interest include tools that support experiments with these restrictions and their specialized algorithms:

  • new kinds of implementations of finite state compilers, libraries and on-demand operations
  • optimized algorithms for manipulation of automata or related formalisms
  • benchmarks suitable for evaluation of performance of algorithms
  • methods that construct, minimize or decompose automata or transducers.


Editorial Board

Guest Editors

  • Anssi Yli-Jyrä (Department of Modern Languages, University of Helsinki, Finland)
  • András Kornai (Harvard Univerity and Hungarian Academy of Sciences)
  • Jacques Sakarovitch (CNRS and Ecole Nationale Supérieure des Télécommunications, Paris, France)

Guest Editorial Board

  • Julie Berndsen (School of Computer Science & Informatics, University College Dublin, Ireland)
  • Francisco Casacuberta (Instituto Tecnologico De Informática, Valencia, Spain)
  • Jean-Marc Champarnaud (Université de Rouen, France)
  • Jan Daciuk (Gdańsk University of Technology, Poland)
  • Manfred Droste (Institut für Informatik, Universität Leipzig, Germany)
  • Dafydd Gibbon (Fakultät für Linguistik und Literaturwissenschaft, University of Bielefeld, Germany)
  • Colin de la Higuera (University of Nantes, France)
  • Lauri Karttunen (Palo Alto Research Center and Department of Linguistics, Stanford University, USA)
  • André Kempe (CADEGE Technologies & Consulting, France)
  • Kevin Knight (Computer Science Department, University of Southern California)
  • Hans-Ulrich Krieger (DFKI GmbH, Saarbrücken, Germany)
  • Marco Kuhlmann (Department of Linguistics and Philology, Uppsala University, Sweden)
  • Andreas Maletti (Universitat Rovira i Virgili, Spain)
  • Stoyan Mihov (Bulgarian Academy of Sciences, Sofia, Bulgaria)
  • Mark-Jan Nederhof (School of Computer Science, University of St Andrews, Scotland)
  • Kemal Oflazer (Sabanci University, Turkey)
  • Jakub Piskorski (Polish Academy of Sciences, Warsaw, Poland)
  • Michael Riley (Google Research, New York, USA)
  • Strahil Ristov (Rudjer Boskovic Institute, Zagreb, Croatia)
  • Max Silberztein (Université de Franche-Comté, France)
  • Bruce Watson (Dept. of Computer Science, University of Pretoria, South Africa)
  • Menno van Zaanen (Department of Communication and Information Sciences, Tilburg University, the Netherlands)


Submission

Articles submitted to this special issue must adhere to the NLE instructions for contributors. Style Guide and LaTeX style files can be found here. We encourage authors to keep their submissions below 30 pages.

Up to date information will be available at http://www.ling.helsinki.fi/projects/jnle/.

Contact and Submission of Manuscripts

Anssi Yli-Jyrä  
Department of Modern Languages 
University of Helsinki
firstname (dot) lastname-with-dash (at) helsinki (dot) fi
Web page

Important Dates

  • Call for papers: 23 January 2010
  • Deadline for submissions: 23 May 2010 (extension to the deadline is available on request)
  • First decision: (10 July 2010) 1 September 2010
  • Submission of revised version: (15 August 2010) 23 Sept 2010
  • Final decision: (22 September 2010) 21 October 2010
  • Submission of camera-ready versions: (23 October 2010) 28 October 2010