HFST-supported_language_project_Example-vot_twolc-lexc


The problem: How to code the declension of the Votic noun tüttö 'girl'

The Votic language is a Baltic-Finnic member of the Uralic language family. Like many other Baltic-Finnic languages, it has extensive regular inflection and exhibits vowel harmony and consonant gradation. Most inflection is expressed with suffixes.

Many solutions exist for coding these phenomena in closely related languages, so the object of this exercise is to find one that can readily be reused, as is common practice at Giellatekno.


One possible combinatory solution

  1. Vowel harmony in the suffixes is highly predictable. If we see a word with the front vowels ü or ö, we can be nearly certain that any vowels in subsequent inflections will be front vowels. Lexc continuation lexica alone will be used to implement vowel harmony.
  2. The stem is a building block whose form changes according to inflectional context. Some contexts are readily observed in the surface level of the language, whereas others are opaque. Twolc rules in combination with Lexc continuation lexica will be applied to produce stem variation.
  3. The interface between stem and suffix may include vowel changes. Vowel change will also be produced with the help of Twolc rules and Lexc continuation lexica.
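
The first point can be illustrated outside HFST with a minimal Python sketch. It is not part of the Votic source files: the function names are hypothetical, the BACK_ prefix is assumed by analogy with the FRONT_ lexica used below, and treating ä as a front vowel alongside ü and ö is an assumption.

```python
# Sketch only: choose a harmony class for a stem, as a lexicon writer would
# when deciding between FRONT_* and (hypothetical) BACK_* continuation lexica.
FRONT_VOWELS = set("üöä")  # ä is an assumption; the text names only ü and ö


def harmony_class(stem):
    """Return 'FRONT' if the stem contains a front vowel, else 'BACK'."""
    return "FRONT" if any(ch in FRONT_VOWELS for ch in stem.lower()) else "BACK"


def continuation_lexicon(stem, slot):
    """Build a continuation lexicon name such as FRONT_NMN_SG-NOM."""
    return f"{harmony_class(stem)}_{slot}"


print(continuation_lexicon("tüttö", "NMN_SG-NOM"))
```

In the lexc files themselves this choice is made by hand, once per lemma, by pointing the entry at the appropriate continuation lexicon.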


The LEXC

The noun tüttö 'girl' has two stems: strong grade, and weak grade.

With only one word, a simple enumeration of all its forms gives the general picture. The enumeration uses three lexica:

  1. Root
  2. NounRoot
  3. END
LEXICON Root
NounRoot ;

LEXICON NounRoot
!! Nominative singular, strong grade
tüttö+N+Sg+Nom:tüttö END ;
!! Genitive singular and associated comitative, weak grade
tüttö+N+Sg+Gen:tütö END ;    !!= * @CODE@ tütö
tüttö+N+Sg+Com:tütö%>ka END ; !!= * @CODE@ tütöka
!! Partitive, illative and terminative singular, strong grade
tüttö+N+Sg+Par:tüttö END ;  !!= * @CODE@ tüttö
tüttö+N+Sg+Ill:tüttö%>se END ;  !!= * @CODE@ tüttöse
tüttö+N+Use/NG+Sg+Ill:tüttö END ;  !!= * @CODE@ tüttö
tüttö+N+Sg+Ter:tüttö%>ssaa END ;  !!= * @CODE@ tüttössaa
!! Other oblique cases singular, weak grade
tüttö+N+Sg+Ine:tütö%>z END ;  !!= * @CODE@ tütöz
tüttö+N+Sg+Ela:tütö%>sse END ;  !!= * @CODE@ tütösse
tüttö+N+Sg+All:tütö%>lle END ;  !!= * @CODE@ tütölle
tüttö+N+Sg+Ade:tütö%>lle END ;  !!= * @CODE@ tütölle
tüttö+N+Sg+Abl:tütö%>lte END ;  !!= * @CODE@ tütölte
tüttö+N+Sg+Tra:tütö%>ssi END ;  !!= * @CODE@ tütössi
!! Nominative plural, weak grade
tüttö+N+Pl+Nom:tütö%>d END ;  !!= * @CODE@ tütöd
!! Oblique plural, strong grade
tüttö+N+Pl+Gen:tüttö%>je END ;  !!= * @CODE@ tüttöije
tüttö+N+Use/NG+Pl+Gen:tüttö%>i END ;  !!= * @CODE@ tüttöi
tüttö+N+Pl+Par:tüttö%>i END ;  !!= * @CODE@ tüttöi
tüttö+N+Use/NG+Pl+Par:tüttö%>i%>te END ;  !!= * @CODE@ tüttöite
tüttö+N+Pl+Ill:tüttö%>i%>se END ;  !!= * @CODE@ tüttöise
tüttö+N+Use/NG+Pl+Ill:tüttö%>i END ;  !!= * @CODE@ tüttöi
tüttö+N+Pl+Ine:tüttö%>i%>z END ;  !!= * @CODE@ tüttöiz
tüttö+N+Pl+Ela:tüttö%>i%>sse END ;  !!= * @CODE@ tüttöisse
tüttö+N+Pl+All:tüttö%>i%>lle END ;  !!= * @CODE@ tüttöille
tüttö+N+Pl+Ade:tüttö%>i%>lle END ;  !!= * @CODE@ tüttöille
tüttö+N+Pl+Abl:tüttö%>i%>lte END ;  !!= * @CODE@ tüttöilte
tüttö+N+Pl+Tra:tüttö%>i%>ssi END ;  !!= * @CODE@ tüttöissi
tüttö+N+Pl+Ter:tüttö%>i%>ssaa END ;  !!= * @CODE@ tüttöissaa
tüttö+N+Pl+Com:tüttö%>i%>ka END ;  !!= * @CODE@ tüttöika

LEXICON END
# ;

Here the two stems have been directed through five continuation lexica.

LEXICON Root
NounRoot ;

LEXICON NounRoot
tüttö+N:tüttö FRONT_NMN_SG-NOM ;  !!= * @CODE@ +Sg+Nom
tüttö+N:tütö FRONT_NMN_SG-GEN-STEM ;  !!= * @CODE@  +Sg+Gen, +Sg+Com
tüttö+N:tüttö FRONT_NMN_SG-PAR-STEM ;  !!= * @CODE@ +Sg+Par, +Sg+Ill, +Sg+Ter
tüttö+N:tütö FRONT_NMN_SG_INE-STEM ;  !!= * @CODE@ +Sg+Ine, +Sg+Ela, +Sg+All, +Sg+Ade, +Sg+Abl, +Sg+Tra, +Pl+Nom
tüttö+N:tüttö FRONT_NMN_PL-OBL ;   !!= * @CODE@ tüttö-

LEXICON FRONT_NMN_SG-NOM
+Sg+Nom: FRONT_K ;

LEXICON FRONT_NMN_SG-GEN-STEM
+Sg+Gen: FRONT_K ;    !!= * @CODE@ tütö
+Sg+Com:%>ka BACK_K ; !!= * @CODE@ tütöka

LEXICON FRONT_NMN_SG-PAR-STEM
+Sg+Par: FRONT_K ;  !!= * @CODE@ tüttö
+Sg+Ill:%>se FRONT_K ;  !!= * @CODE@ tüttöse
+Use/NG+Sg+Ill: FRONT_K ;  !!= * @CODE@ tüttö
+Sg+Ter:%>ssaa BACK_K ;  !!= * @CODE@ tüttössaa

LEXICON FRONT_NMN_SG_INE-STEM
+Sg+Ine:%>z FRONT_K ;  !!= * @CODE@ tütöz
+Sg+Ela:%>sse FRONT_K ;  !!= * @CODE@ tütösse
+Sg+All:%>lle FRONT_K ;  !!= * @CODE@ tütölle
+Sg+Ade:%>lle FRONT_K ;  !!= * @CODE@ tütölle
+Sg+Abl:%>lte FRONT_K ;  !!= * @CODE@ tütölte
+Sg+Tra:%>ssi FRONT_K ;  !!= * @CODE@ tütössi
+Pl+Nom:%>d FRONT_K ;  !!= * @CODE@ tütöd

LEXICON FRONT_NMN_PL-OBL
+Pl+Gen:%>je FRONT_K ;  !!= * @CODE@ tüttöije
+Use/NG+Pl+Gen:%>i FRONT_K ;  !!= * @CODE@ tüttöi
+Pl+Par:%>i FRONT_K ;  !!= * @CODE@ tüttöi
+Use/NG+Pl+Par:%>i%>te FRONT_K ;  !!= * @CODE@ tüttöite
+Pl+Ill:%>i%>se FRONT_K ;  !!= * @CODE@ tüttöise
+Use/NG+Pl+Ill:%>i FRONT_K ;  !!= * @CODE@ tüttöi
+Pl+Ine:%>i%>z FRONT_K ;  !!= * @CODE@ tüttöiz
+Pl+Ela:%>i%>sse FRONT_K ;  !!= * @CODE@ tüttöisse
+Pl+All:%>i%>lle FRONT_K ;  !!= * @CODE@ tüttöille
+Pl+Ade:%>i%>lle FRONT_K ;  !!= * @CODE@ tüttöille
+Pl+Abl:%>i%>lte FRONT_K ;  !!= * @CODE@ tüttöilte
+Pl+Tra:%>i%>ssi FRONT_K ;  !!= * @CODE@ tüttöissi
+Pl+Ter:%>i%>ssaa BACK_K ;  !!= * @CODE@ tüttöissaa
+Pl+Com:%>i%>ka BACK_K ;  !!= * @CODE@ tüttöika

LEXICON BACK_K
# ;
LEXICON FRONT_K
# ;
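
How continuation lexica concatenate can be mimicked with a toy Python walk over a small subset of the entries above. This is a sketch for illustration only, not how HFST compiles lexc. The LEXICA dictionary and the generate function are hypothetical names, and removing the morpheme boundary > from the output is an assumption based on the surface forms shown in the comments.

```python
# Sketch only: a toy walk through lexc-style continuation lexica.
# Each entry is (lexical side, surface side, continuation); "#" ends the word.
LEXICA = {
    "NounRoot": [
        ("tüttö+N", "tüttö", "FRONT_NMN_SG-NOM"),
        ("tüttö+N", "tütö", "FRONT_NMN_SG-GEN-STEM"),
    ],
    "FRONT_NMN_SG-NOM": [("+Sg+Nom", "", "FRONT_K")],
    "FRONT_NMN_SG-GEN-STEM": [
        ("+Sg+Gen", "", "FRONT_K"),
        ("+Sg+Com", ">ka", "BACK_K"),
    ],
    "FRONT_K": [("", "", "#")],
    "BACK_K": [("", "", "#")],
}


def generate(lexicon="NounRoot", lexical="", surface=""):
    """Yield (analysis, surface form) pairs by following continuations."""
    if lexicon == "#":
        # Assumption: the morpheme boundary is dropped from the final form.
        yield lexical, surface.replace(">", "")
        return
    for lex, surf, cont in LEXICA[lexicon]:
        yield from generate(cont, lexical + lex, surface + surf)


for analysis, form in generate():
    print(analysis, form)
```

Each path from NounRoot to # pairs one analysis string with one surface form, which is why the weak-grade stem needs its own entry in this version.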

One shortcoming of this purely lexc-based description is that the same lemma has to be entered several times, once for each stem and continuation lexicon. In OMorFi the solution is simply to shorten the stem to the syllable tüt and then provide continuation lexica beginning in tö or ö. Here, however, the solution will be sought in two-level rules.


The TWOLC

!! =================================== !
!! The Votic morphophonological/twolc rules file !
!! =================================== !

Alphabet
a b č d e f g h i j k l m n o p r s š z ž t u v õ ä ö ü 
A B Č D E F G H I J K L M N O P R S Š Z Ž T U V Õ Ä Ö Ü 

ʼ !! U+02BC MODIFIER LETTER APOSTROPHE

!! Triggers
%^WGStem:0

%> ;

Sets

 Vow = a e i o u õ ä ö ü 
       A E I O U Õ Ä Ö Ü ;
 Cns = b č d f g h j k l m n p r s š t v z ž 
       B Č D F G H J K L M N P R S Š T V Z Ž ;
 
Rules

!! CONSONANTS
"tt:t0, from strong grade to weak grade"
!! __@RULENAME@__
 t:0 <=> t _ Vow %^WGStem: ;
!! tüttö+N+Sg+Gen: __girl/tyttö__
!€ tüttö%^WGStem
!€ tüt0ö0
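
The effect of this rule on the test pair can be traced with a small Python sketch. It is a simplification under stated assumptions, not a two-level rule compiler: it hard-codes the single context t _ Vow ^WGStem, treats the trigger as always realised as 0, and writes deleted symbols as 0 in the same way as the test pair. The names tokenize and apply_gradation are hypothetical.

```python
# Sketch only: hand-coded simulation of the rule  t:0 <=> t _ Vow %^WGStem: ;
VOWELS = set("aeiouõäöüAEIOUÕÄÖÜ")
WG = "^WGStem"


def tokenize(lexical):
    """Split a string into symbols, keeping ^WGStem as one multichar symbol."""
    symbols, i = [], 0
    while i < len(lexical):
        if lexical.startswith(WG, i):
            symbols.append(WG)
            i += len(WG)
        else:
            symbols.append(lexical[i])
            i += 1
    return symbols


def apply_gradation(lexical):
    """Realise t as 0 in the context t _ Vow ^WGStem; the trigger is also 0."""
    syms = tokenize(lexical)
    out = []
    for i, s in enumerate(syms):
        weak_context = (
            s == "t"
            and i >= 1 and syms[i - 1] == "t"
            and i + 2 < len(syms)
            and syms[i + 1] in VOWELS
            and syms[i + 2] == WG
        )
        out.append("0" if weak_context or s == WG else s)
    return "".join(out)


print(apply_gradation("tüttö^WGStem"))
```

Without the trigger the string is returned unchanged, so the strong grade tüttö survives wherever the lexicon does not insert ^WGStem.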



A complete TWOLC and LEXC readout.

Multichar_Symbols  !!≈ !!!Definitions for @CODE@

!! The parts-of-speech are:

+N              !!= * @CODE@

!! The nominals are inflected in the following Case and Number

+Sg             !!= * @CODE@ singular
+Pl             !!= * @CODE@ plural

+Abl            !!= * @CODE@ ablative
+Ade            !!= * @CODE@ adessive
+All            !!= * @CODE@ allative
+Com            !!= * @CODE@ comitative
+Ela            !!= * @CODE@ elative
+Gen            !!= * @CODE@ genitive
+Ill            !!= * @CODE@ illative
+Ine            !!= * @CODE@ inessive
+Nom            !!= * @CODE@ nominative
+Par            !!= * @CODE@ partitive
+Ter            !!= * @CODE@ terminative
+Tra            !!= * @CODE@ translative

!! Usage tag and stem-grade trigger used in the lexica below

+Use/NG         !!= * @CODE@ not generated, analysis only
%^WGStem        !!= * @CODE@ weak-grade stem trigger


LEXICON Root
NounRoot ;

LEXICON NounRoot
tüttö:tüttö N_TÜTTÖ ;

LEXICON N_TÜTTÖ !!= * @CODE@ tüttö:tüttö
+N+Sg+Nom: FRONT_K ;  !!= * @CODE@ tüttö
+N:%^WGStem FRONT_NMN_SG-GEN-STEM ;  !!= * @CODE@ tütö
!! +Sg+Gen, +Sg+Com
+N: FRONT_NMN_SG-PAR-STEM ;  !!= * @CODE@ tüttö
!! +N+Sg+Par, +N+Sg+Ill, +N+Sg+Ter
+N:%^WGStem FRONT_NMN_SG_INE-STEM ;  !!= * @CODE@ tütö-
!! +N+Sg+Ine, +N+Sg+Ela, +N+Sg+All, +N+Sg+Ade, +N+Sg+Abl, +N+Sg+Tra, +N+Pl+Nom
+N: FRONT_NMN_PL-OBL ;   !!= * @CODE@ tüttö-

LEXICON FRONT_NMN_SG-NOM
+Sg+Nom: FRONT_K ;

LEXICON FRONT_NMN_SG-GEN-STEM
+Sg+Gen: FRONT_K ;    !!= * @CODE@ tütö
+Sg+Com:%>ka BACK_K ; !!= * @CODE@ tütöka

LEXICON FRONT_NMN_SG-PAR-STEM
+Sg+Par: FRONT_K ;  !!= * @CODE@ tüttö
+Sg+Ill:%>se FRONT_K ;  !!= * @CODE@ tüttöse
+Use/NG+Sg+Ill: FRONT_K ;  !!= * @CODE@ tüttö
+Sg+Ter:%>ssaa BACK_K ;  !!= * @CODE@ tüttössaa

LEXICON FRONT_NMN_SG_INE-STEM
+Sg+Ine:%>z FRONT_K ;  !!= * @CODE@ tütöz
+Sg+Ela:%>sse FRONT_K ;  !!= * @CODE@ tütösse
+Sg+All:%>lle FRONT_K ;  !!= * @CODE@ tütölle
+Sg+Ade:%>lle FRONT_K ;  !!= * @CODE@ tütölle
+Sg+Abl:%>lte FRONT_K ;  !!= * @CODE@ tütölte
+Sg+Tra:%>ssi FRONT_K ;  !!= * @CODE@ tütössi
+Pl+Nom:%>d FRONT_K ;  !!= * @CODE@ tütöd

LEXICON FRONT_NMN_PL-OBL
+Pl+Gen:%>je FRONT_K ;  !!= * @CODE@ tüttöije
+Use/NG+Pl+Gen:%>i FRONT_K ;  !!= * @CODE@ tüttöi
+Pl+Par:%>i FRONT_K ;  !!= * @CODE@ tüttöi
+Use/NG+Pl+Par:%>i%>te FRONT_K ;  !!= * @CODE@ tüttöite
+Pl+Ill:%>i%>se FRONT_K ;  !!= * @CODE@ tüttöise
+Use/NG+Pl+Ill:%>i FRONT_K ;  !!= * @CODE@ tüttöi
+Pl+Ine:%>i%>z FRONT_K ;  !!= * @CODE@ tüttöiz
+Pl+Ela:%>i%>sse FRONT_K ;  !!= * @CODE@ tüttöisse
+Pl+All:%>i%>lle FRONT_K ;  !!= * @CODE@ tüttöille
+Pl+Ade:%>i%>lle FRONT_K ;  !!= * @CODE@ tüttöille
+Pl+Abl:%>i%>lte FRONT_K ;  !!= * @CODE@ tüttöilte
+Pl+Tra:%>i%>ssi FRONT_K ;  !!= * @CODE@ tüttöissi
+Pl+Ter:%>i%>ssaa BACK_K ;  !!= * @CODE@ tüttöissaa
+Pl+Com:%>i%>ka BACK_K ;  !!= * @CODE@ tüttöika

LEXICON BACK_K
# ;
LEXICON FRONT_K
# ;


N_TÜTTÖ declension type with front-harmony continuation lexica can be found here
The morphophonological rules are located here



Contact Jack Rueter: first_name.surname(åt)helsinki.fi .


Last modified: Thu Jun 11 9:26:17 EEST 2015