Two-Level Morphology Day (TWOL Day)


Applications and Problems of Finite-State Morphology in Less Studied Languages

August 31, 2005
Helsinki, Finland
http://www.ling.helsinki.fi/events/TWOLDAY2005

Venue

Address: Fabianinkatu 33, Helsinki. University Main Building ("Yliopiston Päärakennus" in Finnish), auditorium IV. Wednesday, 31st August, 12:00 - 18:00.

Call for Papers

The TWOLDAY Background and Call For Papers has been archived to this location.

Organizers of TWOL Day 2005

- Arvi Hurskainen (Institute for Asian and African Studies, University of Helsinki, Finland)
- Anssi Yli-Jyrä (CSC, the Finnish IT center for science, Finland)

Contact: twolday AT ling DOT helsinki DOT fi


The TWOL Day mini-workshop program

Wednesday, August 31, 2005

11:30-12:15 Registration
12:15-12:30 Opening
12:30-13:00 Building the Estonian Finite State Morphology
Presenter:
Heli Uibo, University of Tartu
Full Abstract: [click here]
Short Abstract:
Estonian morphology is complex: the number of different inflected forms is 28 for nouns and about 70 for verbs. The Estonian language is both agglutinative and flective, so it is natural to model its morphotactics by a network of lexicons and its stem flexion by rules. The Estonian finite state morphology has been implemented using the Xerox tools LEXC, TWOLC and XFST. A new solution has been proposed for modelling derivation. It is shown that the two-level representation is useful for describing Estonian stem-internal changes, especially because the stem flexion type no longer depends on the phonological shape of a stem in contemporary Estonian. The network of lexicons, combined with rules that take effect at morpheme boundaries, naturally describes the morphotactic processes. The lexicons are also useful for describing stem-final alternations that are not phonologically conditioned. Some open problems remain to be solved in the ongoing research on Estonian finite-state morphology.


13:00-13:30 Non-concatenative processes in Kabyle
Presenter:
Sinikka Loikkanen
Abstract:
We will describe non-concatenative processes in Kabyle (Berber). Berber belongs to the Afro-Asiatic language family (formerly known as Hamito-Semitic) and is spoken by about 12 million people in North Africa; Kabyle is a variety spoken in Algeria. Kabyle has a templatic morphology in which words inflect by internal changes as well as by prefixes and suffixes. On the one hand, Semitic-like stem interdigitation can be used in verbal morphology (with flag diacritics and the compile-replace algorithm). On the other hand, that kind of word formation does not always work, especially with nouns, since many roots can have more than one structure and sense; in those cases, stems need to be specified lexically. In some cases even plain concatenation of morphemes can be used. In this presentation, we will describe the problems related to word formation processes and morphological analysis in Kabyle.

13:30-14:00 A Two-Level Morphology of Malagasy
Presenters:
Mary Dalrymple and Lisa Mackie
Centre for Linguistics and Philology, Oxford University, Walton Street, Oxford OX1 2HG, UK
{mary.dalrymple,lisa.mackie}@ling-phil.ox.ac.uk
Full Abstract: [click here]
Short Abstract:
We present a two-level model of Malagasy nominal and verbal morphology, based primarily on the discussion of Malagasy morphology in Keenan and Polinsky (1998). Malagasy is an Austronesian language spoken by about six million people on the island of Madagascar. Together with Welsh, it is a focus of the Verb-Initial Grammars subproject within the PARGRAM initiative (http://users.ox.ac.uk/~cpgl0015/pargram/), a collaborative project to develop computational lexicons and grammars within the shared linguistic framework of Lexical Functional Grammar (Butt et al., 2002). Because of the complicated and productive patterns of Malagasy verbal and nominal morphology, the development of such a grammar relies heavily on a computational component for morphological analysis.


14:00-14:30 Welsh Initial Mutations: An FST Analysis
Presenters:
Ingo Mittendorf and Louisa Sadler, University of Essex
Full Abstract: [click here]
Short Abstract:
For the Welsh part of the project "Verb Initial Grammars: a Multilingual/Parallel Perspective" we have been developing a morphological analyser using the Xerox Finite State tools (xfst, lexc). The morphological analyser is integrated into the Xerox Linguistic Environment (XLE) platform, which we use to write our Lexical Functional Grammar (LFG) based grammar of Welsh. All of this is work in progress. One of our first challenges was the Welsh system of initial mutations: the initial phoneme of a word regularly alternates with other phonemes in several sets called "initial mutations". These different initial mutations appear in specific lexical or syntactic environments.
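
As a rough illustration of what "sets" of initial alternations means, the following Python sketch applies the three traditional Welsh mutation sets (soft, nasal, aspirate) to the written form of a word. It is not the presenters' analysis, which is written with xfst and lexc; the names MUTATIONS, onset and mutate are assumptions made for this example, the input is assumed to be lowercase, and the lexical and syntactic environments that trigger each mutation are not modelled.

```python
# Illustrative sketch only: orthographic Welsh initial mutations as lookup tables.
# Names are hypothetical; triggering environments are not modelled.

# Welsh digraphs count as single letters, so they are matched before single characters.
DIGRAPHS = ("ch", "dd", "ff", "ng", "ll", "ph", "rh", "th")

MUTATIONS = {
    "soft": {"p": "b", "t": "d", "c": "g", "b": "f", "d": "dd",
             "g": "", "ll": "l", "m": "f", "rh": "r"},
    "nasal": {"p": "mh", "t": "nh", "c": "ngh", "b": "m", "d": "n", "g": "ng"},
    "aspirate": {"p": "ph", "t": "th", "c": "ch"},
}


def onset(word):
    """Return the first orthographic letter of a lowercase word (digraph or single)."""
    for digraph in DIGRAPHS:
        if word.startswith(digraph):
            return digraph
    return word[:1]


def mutate(word, kind):
    """Apply the named mutation; words whose initial is not affected are returned unchanged."""
    first = onset(word)
    table = MUTATIONS[kind]
    if first not in table:
        return word
    return table[first] + word[len(first):]


if __name__ == "__main__":
    for kind in ("soft", "nasal", "aspirate"):
        print(kind, mutate("cath", kind))
```

In a finite-state setting the same alternations would typically be stated as rules or replace expressions conditioned on a trigger; the lookup tables here only show the shape of the data.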


14:30-15:00 Coffee Break
15:00-15:30 Implementing Ndonga Verbal Morphology with Finite State Tools
Presenter:
Minttu Hurme, Institute for Asian and African Studies, University of Helsinki
Full Abstract: [click here]
Short Abstract:
Ndonga, as is typical of Bantu languages, has a very rich verbal morphology, which also includes some non-concatenative phenomena. The Xerox Finite State Tools system, which was used to implement the Ndonga verbal morphology, has two features specifically designed to deal with non-concatenative morphology: flag diacritics and the compile-replace algorithm. Inflectional circumfixes and restrictions on the combinations of extensions were quite simple to program using flag diacritics. Reduplication proved to be more problematic. While the compile-replace algorithm worked very well when a limited test set of stems was used, it caused fatal memory problems in XFST when applied to an unlimited set of real stems.


15:30-16:00 Describing Non-Concatenative Processes in Bantu Languages
Presenters:
Arvi Hurskainen, University of Helsinki
George Poulos, University of South Africa
Louis Louwrens, University of South Africa
Full Abstract: [click here]
Short Abstract:
In this paper we discuss the problems pertaining to non-concatenative processes in Bantu languages. For example, verbs undergo processes of productive derivation, reduplication, and inflection, and there can be up to 15 morpheme slots. While derivation and inflection can be handled as concatenation of morphemes, reduplication cannot. We are particularly interested in how to handle verbs in disjoining writing systems, where some of the verb morphemes are written as separate words. We suggest that in disjoining writing systems verb structures are first identified with a specially constructed tokeniser and then analysed with a morphological analyser. In addition to identifying words, punctuation marks and diacritics, such a tokeniser must also identify sequences of such 'words' that are part of the verb. The test languages used in this study are Kwanyama (Hurskainen and Halme 2001) and Northern Sotho.


16:00-16:30 Morphological Parsing of Tone: An Experiment with Two-Level Morphology on the Ha language
Presenter:
Lotta Harjula, Institute for Asian and African Studies, University of Helsinki, Finland
Full Abstract: [click here]
Short Abstract:
Morphological parsers are typically developed for languages without contrastive tonal systems. Ha, a Bantu language of Western Tanzania, poses a challenge to these parsers with both lexical and grammatical pitch-accent (Harjula 2004), which would seem to require an approach with a separate level for the tones in order to describe the tonal phenomena. However, since Two-Level Morphology (Koskenniemi 1983) has proven successful with another Bantu language, Swahili, it is worth testing its possibilities with the tonally more challenging Bantu languages.


16:30-17:00 An Initiative for an Open and Extendible Finite-State Morphology Workbench
Presenter:
Anssi Yli-Jyrä
Abstract:
Less studied languages are typically of minor commercial interest, but still in need of better language technology support. Unfortunately, such languages tend to get a smaller share in lotteries for research and development funding. In order to develop essential language technology, the experts working on less studied languages need the most cost-efficient tools and practices at hand.

The purpose of this talk is to stimulate discussion on the future of TWOL and comparable finite-state formalisms. I propose a tentative model that allows the TWOLC language and comparable tools for linguists to become part of an open-ended and extendible collection of software. In this framework, it would be possible to add new features to the user interface, the formalism or the finite-state network layer by developing components that correspond to each contributor's expertise, as follows:
  1. linguists: macros, definitions, ideas for the user interface
  2. computational linguists: grammar formalisms and compilers
  3. computer scientists: manipulation algorithms, data structures
The framework would be based on an open source middleware that would glue together both open source and commercial libraries, modules and formalisms. The resulting interfaces promote reusability, identification of missing components, distributed software development, interchangeability, competition and technical evolution.