The purpose of this page is to collect links and information on our methdo for compilation of replace rules, two-level rules and other finite-state based rule formalism with rewrite-rules.
In 2002, I discovered a bug in XFST when compiling compound context restrictions (CCR) and showed that the correct semantics of CCR is expressible with star-free regular operators. The relevant publications:
In 2003, I found a simpler formula that captured the correct semantics of CCR. This formula suggested generalizations and the operation of Generalized Restriction (GR) was introduced. Constraints with additional preconditions were introduced with applications to my co-author's idea about layerization of rules.
I found in 2006 that Generalized Restriction lends itself to compilation of individual Two-Level Rules. Kimmo Koskenniemi provided a lot of valuable critique and helped me to relate the ideas to the main formalisms of Two-Level Morphology. During the research, I found that various combinations of rules can be compiled with a single GR, if they are combined under union inside the operands of GR. This GR based approach to compilation of Two-Level Grammars provided a number of extensions, including more general context conditions, additional rule types, more conflict resolutions, rules with multi-character changes, and longest application principle. I like to call the new family of systems Generalized Two-Level Morphology (GTWOL), although at this point the semantics of GTWOL was not precisely coupled to a particular definition since the semantics of the ideal compilation formula was developed incrementally in the paper.
In December 2005, Kimmo Koskenniemi prompted me to look for a new approach for compilation of replace rules. I found the core ideas of method of already in December 2005. The approach was based on Two-Level Morphology. Kimmo contributed to the approach mostly when planning the FinTAL paper, since the existence of the longest application principle in GTWOL is crucial for the replace rule compilation method.
In 2007, I was an invited scholar in BUTE, Hungary. There I rewrote the paper that was originally planned for CIAA 2007. The paper was to describe the rich possibilities of the new compilation method for replace rules more in detail compared to the poster by Anssi Yli-Jyrä and Koskenniemi, 2007. This paper was still rewritten after its acceptance to FSMNLP 2007, and again for the final proceedings. The final proceedings is still in print. Nevertheless, I recommend the final version that substantially improves the presentation and takes the feedback in the FSMNLP 2007 meeting into account. In particular, the paper describes the GTWOL and Bracketed GTWOL systems that are used inside the compilation method, and discusses the semantics of the obligatory replacement.
In BUTE, I developed a branch of the SFST tool that includes functionalities for the GR operation and BGTWOL. I used my version, called AFST, when testing my compilation method for parallel replace rules. The system remained, however, problematic, since SFST and AFST did not contain full support for open alphabets, the "other" symbol in the transducers and unknown universal languages. Due to these technical problems, the software demo was not given as such, but a running system was still seen during the presentation of Anssi Yli-Jyrä, 2007b. The AFST code is available here. Currently, the HFST team in Helsinki is going to incorporate some of its ideas to HFST and add the missing support for open alphabets.
I promised in 2007 to give an invited talk in FSMNLP 2007. I knew already how important it is to understand the possibilities of the GR operator. Much in my current research stems from this paper and I consider the paper as an important landmark in the history of GR operator. The paper puts forward a number of open research questions but also gives a lot of structure to the landscape of GR applications. In addition, the paper presents dozens of new formulae that illustrate important uses of the operator for the first time. The title of the talk was altered in the final version which is also much more verbose, formally correct and readable. The final version has 24 pages and its second display formula (formula 32) in Section 6.1 contains a visible Greek epsilon symbol. The final FSMNLP 2007 proceedings will contain this final version or the last but one version which is similar with respect to all other sections. The proceedings is still in print as far as I know, but it will be an open-access publication when available.
My plan is that this page would eventually contain information also on the prior and current state of the art and the implementations. Including, for example, the methods of: