The Fifth International Workshop on Finite-State Methods and Natural Language Processing

University of Helsinki, Finland
1 - 2 September 2005
(This document will be replaced by a new, fully featured website by 15 February 2005. An unofficial version of it is now in a temporary location.)

Papers due: 25th April 2005

FSMNLP 2005 will be a forum for researchers working on applications of finite-state methods (FSM) in natural language processing (NLP) or on the theoretical and implementation aspects of such methods that are relevant to NLP. The aim of the workshop is to bring together members of the academic, research, and industrial communities working on finite-state based models in language technology, computational linguistics, linguistics and cognitive science or on related methods in fields such as computer science and mathematics.

We invite novel, high-quality papers on themes including, but not limited to, the following:

  1. NLP applications and linguistic aspects of finite-state methods

    The topic includes but is not restricted to:
    – speech, sign language, phonology, hyphenation, prosody
    – scripts, text normalization, segmentation, tokenization, indexing
    – morphology, stemming, lemmatisation, information retrieval, spelling correction
    – syntax, POS tagging, partial parsing, disambiguation, information extraction
    – machine translation, translation memories, glossing, dialect adaptation
    – annotated corpora and treebanks, semi-automatic annotation, error mining, searching

  2. Finite-state models of language

    With this more focused topic (inside 1) we invite papers on aspects that motivate sufficiency of finite-state methods or their subsets for capturing various requirements of natural language processing.

    The topic includes but is not restricted to:
    – performance, linguistic applicability, finite-state hypotheses
    – Zipf's law and coverage, model checking against finite corpora
    – regular approximations under parameterized complexity, limitations and definitions of relevant complexities such as ambiguity, recursion, crossings, rule applications, constraint violations, reduplication, exponents, discontinuity, path-width, and induction depth
    – similarity inferences, dissimilation, segmental length, counter-freeness, asynchronous machines
    – garden-path sentences, deterministic parsing, expected parses, Markov chains
    – incremental parsing, uncertainty, reliability/variance in stochastic parsing, linear sequential machines

  3. Practices for building lexical transducers for the world's languages

    The topic accounts for usability of finite-state methods in NLP. It includes but is not restricted to:
    – required user training and consultation, learning curve of non-specialists
    – questionnaires, discovery methods, adaptive computer-aided glossing and interlinearization
    – example-based grammars, semi-automatic learning, user-driven learning (see topic 6 too)
    – low literacy level and restricted availability of training data, writing systems/phonology under development, new non-Roman scripts
    – linguist's workbenches, stealth-to-wealth parser development
    – endangered languages, experiences of using existing tools for computational morphology and phonology

  4. Specification and implementation of sets, relations and multiplicities in NLP using finite automata

    The topic includes but is not restricted to:
    – regular rule formalisms, grammar systems, expressions, operations, closure properties, complexities
    – algorithms for compilation, approximation, manipulation, optimization, and lazy evaluation of finite machines
    – finite string and tree automata, transducers, morphisms and bimorphisms
    – weights, registers, multiple tapes, alphabets, state covers and partitions, representations
    – locality, constraint propagation, star-free languages, data vs. query complexity
    – logical specification, MSO(SLR,matches), FO(Str,<), LTL, generalized restriction, local grammars

  5. Constraint-based grammars and k-ary regular relations

    With this more focused topic (inside 4) we invite researchers from related fields (computational linguists, mathematicians and computer scientists) into a discussion motivated by constraint-based, declarative approaches to morphology/phonology and the computational problems related to them. For example, regular relations in general are not closed under intersection, but restricted use of intersection of relations has proven useful in computational phonology and morphology and in implementations such as KIMMO, PC-KIMMO, TWOLC, SEMHE, AMAR, WFSC, etc. In the future, new useful approaches and implementations may emerge. The approaches may also propagate to other application areas in natural language processing, including finite-state syntax and query languages for parallel annotations in linguistic corpora.

    The topic includes but is not restricted to:
    – multi-tape automata, same-length relations and partition-based morphology, Semitic morphology
    – autosegmental phonology, shuffle, trajectories, synchronization, segmental anchoring, alignment constraints, syllable structure, partial-order reductions
    – problems related to auto-intersection of multi-tape automata e.g. marked Post Correspondence Problem
    – varieties of regular languages and relations, descriptive complexity of finite-state based grammars
    – automaton-based approaches to declarative constraint grammars, constraints in optimality theory
    – parallel corpus annotations, register automata, acyclic timed automata

  6. Machine learning of finite-state models of natural language

    This topic includes but is not restricted to:
    – learning regular rule systems, learning topologies of finite automata and transducers
    – parameter estimation and smoothing, lexical openness
    – computer-driven grammar writing, user-driven grammar learning, discovery procedures
    – data scarcity, realistic variations of Gold's model, learnability and cognitive science
    – incompletely specified finite-state networks
    – model-theoretic grammars, gradient well/ill-formedness

  7. Finite-state manipulation software (with relevance to the above themes)

    This topic includes but is not restricted to:
    – regular expression pre-compilers such as regexopt, xfst2fsa, standards and interfaces for finite-state based software components, conversion tools
    – tools such as LEXC, Lextools, Intex, XFST, FSM, GRM, WFSC, FIRE Engine, FADD, FSA/UTR, SRILM, FIRE Station and Grail
    – free software such as FSA Utilities, Unitex, OpenFIRE, Vaucanson, SFST, PCKIMMO, MONA, Hopskip, ASTL, UCFSM, HaLeX, SML, and WFST
    – results obtainable with such exploration tools as automata, Autographe, Amore, and TESTAS
    – visualization tools such as Graphviz and Vaucanson-G
    – language-specific resources and descriptions, freely available benchmarking resources

The descriptions of the topics above are not meant to be complete. The submitted papers or abstracts may fall in several categories.


Paper/poster submissions due: 25th April
Notifications sent out: 25th May
Deadline for early registration: 10th June
Abstracts for software demos due: 10th June
Camera-ready papers due: 20th June


We expect three kinds of submissions:

  1. full papers,
  2. interactive presentations (posters) and
  3. software demos.

Submissions are electronic and in PDF format. The link to the web-based submission server will be inserted here. Information about the author(s) should be omitted from the submitted papers. The authors' names, affiliation(s), address(es) and e-mail address(es) will be provided separately during submission.


Submissions have to be in English, which is the workshop language.

Full papers should be no more than 8 single-column, single-spaced pages in 11pt type. Interactive presentations should be submitted as an abstract of no more than 2 pages. Submission of demo abstracts is encouraged at the same time as full papers and posters.

The use of LaTeX for generating the PDF document is strongly encouraged because this facilitates preparation of the final version and possible later submission to journals. For the accepted papers and poster abstracts, a LaTeX style will be made available for the preparation of the final version. If you have any questions, please contact anssi.yli-jyra through firstname.lastname@ling.helsinki.fi, using the string "FSMNLP/inquiry" in the Subject field.


The final versions of papers and abstracts will be published online and/or on CD-ROM (with an official ISBN), and also as a technical report if enough participants would like to have printed proceedings.


In addition to the proceedings, extended versions of selected papers from the FSMNLP 2005 proceedings will be solicited for further publication in a collection of articles, published by an international publisher.


We are planning to reserve a journal issue so that selected papers can be invited for submission to a special issue of an international journal. After earlier FSMNLP and similar workshops, the following special issues have been published:

Currently, we cannot say whether a similar special issue will be realized this time or not.


Steven Bird (University of Melbourne, Australia) — Francisco Casacuberta (Universitat Politècnica de València, Spain) — Jean-Marc Champarnaud (Université de Rouen, France) — Jan Daciuk (Gdansk University of Technology, Poland) — Jason Eisner (Johns Hopkins University, USA) — Tero Harju (University of Turku, Finland) — Arvi Hurskainen (Institute for Asian and African Studies, University of Helsinki, Finland) — Juhani Karhumäki (University of Turku, Finland, co-chair) — Lauri Karttunen (PARC and Stanford University, USA, co-chair) — André Kempe (Xerox Research Centre Europe, France) — George Anton Kiraz (Beth Mardutho: The Syriac Institute, USA) — Andras Kornai (Budapest Institute of Technology, Hungary) — Terence Langendoen (University of Arizona, USA) — Eric Laporte (Université de Marne-la-Vallée, France) — Mike Maxwell (Linguistic Data Consortium, USA) — Mark-Jan Nederhof (University of Groningen, the Netherlands) — Gertjan van Noord (University of Groningen, the Netherlands) — Kemal Oflazer (Sabanci University, Turkey) — Jean-Eric Pin (CNRS/University Paris 7, France) — James Rogers (Earlham College, USA) — Giorgio Satta (University of Padua, Italy) — Jacques Sakarovitch (CNRS/ENST, France) — Richard Sproat (University of Illinois at Urbana-Champaign, USA) — Nathan Vaillette (University of Tübingen, Germany) — Atro Voutilainen (Connexor, Finland) — Bruce W. Watson (University of Pretoria, South Africa) — Shuly Wintner (University of Haifa, Israel) — Sheng Yu (University of Western Ontario, Canada) — Lynette van Zijl (Stellenbosch University, South Africa)


Information about the registration procedure will be provided later. The registration fee is normally 100 EUR per participant, but students will pay less.


The workshop will take place at the University of Helsinki. The organizing institution is the Department of General Linguistics at the University of Helsinki. The local committee is headed by Anssi Yli-Jyrä at CSC — Scientific Computing Ltd. (also a Ph.D. student at the University of Helsinki). Several institutions in Finland are co-operating with the organizing committee in order to make the event well received in the national and international research community. The official call for papers was sent out on 2 Feb 2005, but the homepage at http://www.ling.helsinki.fi/events/FSMNLP2005 was already indexed by Google somewhat earlier.

The workshop is a follow-up to some earlier workshops and continues their dynamic, changing tradition. FSMNLP workshops have traditionally had tutorial lessons and/or invited speakers. These workshops and courses have been held under different names and at varying intervals:


There are initial thoughts about a national, one-day workshop on Automata, Words and Languages (AWL) that would take place in Helsinki just before FSMNLP. An AWL workshop was arranged in 2002 at the University of Turku.


 CSC — Scientific Computing Ltd., Espoo, Finland

The complete list of sponsors will be announced later.