1. Background
The Research Unit for Computational Linguistics, henceforth abbreviated RUCL, was founded for a five-year period from the beginning of 1985. Its official names in English, Finnish, and Swedish were:
The Research Unit for Computational Linguistics at the University of Helsinki
Helsingin yliopiston tietokonelingvistiikan tutkimusyksikkö
Forskningsenheten för datalingvistik vid Helsingfors universitet.
RUCL has been working under the auspices of the Board of the University of Helsinki (Finn. "konsistorin alainen erillislaitos"). In practice, it has simultaneously worked as an integrated part of the Department of General Linguistics at the University of Helsinki.
RUCL was founded in 1984 by virtue of an agreement entered into by the Academy of Finland, the University of Helsinki, and Professor Fred Karlsson. The founding of RUCL was preceded by an international evaluation conducted by Professor Sture Allén (University of Gothenburg), Professor Martin Kay (Stanford University), and Professor Lauri Karttunen (Stanford University). The first five-year period of RUCL ended in 1989. The members of the Steering Group 1985-1989 were:
Professor Ossi Ihalainen, Chairman 1989, University of Helsinki
Professor Erik Andersson, Åbo Akademi University
Professor Benny Brodda, University of Stockholm
Professor Lauri Karttunen, Stanford University
Professor Pekka Sammallahti, University of Oulu, Chairman 1985-1988
After an evaluation conducted by the Steering Group of RUCL in 1989, a second five-year period was granted for 1990-1994. The agreement was signed on December 19, 1989, by Professor Seikko Eskola on behalf of the Council of the Humanities (Academy of Finland), by Rector Päiviö Tommila on behalf of the University of Helsinki, and by Fred Karlsson. On September 19, 1992, the Council of the Humanities (Valtion humanistinen toimikunta) nominated a partially new Steering Group for RUCL for the remaining period 1992-1994:
Professor Pekka Sammallahti, Chairman, University of Oulu
Professor Benny Brodda, University of Stockholm
Professor Lauri Karttunen, Stanford University
Professor Kimmo Koskenniemi, University of Helsinki
Professor Jouko Lindstedt, University of Helsinki
Planning Secretary Eija-Maija Kotilainen, Academy of Finland
The last meeting of the Steering Group was on May 8, 1993. The minutes of that meeting are appended to the present report (Appendix 1). Yearly reports have been submitted to the Council of the Humanities. In 1994 no Steering Group meeting was held. That year the main achievements of RUCL were evaluated twice, in connection with the searches for Centers of Excellence conducted by the Academy of Finland and by the University of Helsinki.
Professor Fred Karlsson has been Director of RUCL during the whole period 1985-1994, alongside ordinary duties as professor of General Linguistics at the University of Helsinki.
The activities of RUCL came to an end in December 1994. No further extensions were applied for under the name of RUCL. However, the work of RUCL continues at the Research Unit for Multilingual Language Technology, which was founded by the Board of the University of Helsinki as a Center of Excellence for the period 1995-1999 (Appendix 2). The written evaluation submitted in that connection by Professor Mark Liberman, Department of Linguistics, University of Pennsylvania, is enclosed as Appendix 3.
2. Budget
According to the original agreement, the following staff was assigned to RUCL for the period 1990-1994:
Academy of Finland
- 2 senior researchers (A25)
- two one-year stipends for advanced researchers ("varttuneen tieteenharjoittajan apuraha", 1992 and 1994)
University of Helsinki
- senior computer officer ("atk-alan erikoistutkija")
- computer officer ("pääsuunnittelija", plkr 720)
- researcher
The senior researcher positions were originally reserved for Dr. Kimmo Koskenniemi and Dr. Lauri Carlson. As both were appointed professors at the University of Helsinki (Koskenniemi in 1992, Carlson in 1993), the senior researcher positions were converted into research grants for hiring junior researchers (470,100 FIM on May 22, 1992, and 274,500 FIM on May 28, 1993). Formal agreements between the three parties (Academy of Finland, University of Helsinki, Fred Karlsson) were signed on July 13, 1992, and August 10, 1993, respectively.
In the original agreement, the Academy of Finland gave an additional five-year grant of 1,002,100 FIM (roughly 200,000 FIM a year) split into the following budgetary sections:
- secretary (20 h/w, plkr 334): 248,100 FIM
- research aid ("aputyövoima"): 369,000 FIM
- travel etc.: 250,000 FIM
- publications: 55,000 FIM
- computer equipment: 80,000 FIM
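As a quick arithmetic check, the five budget sections listed above add up to the stated grant total, and dividing by the five years of the grant gives the "roughly 200,000 FIM a year" figure. The amounts are copied from the list; only the variable names are ours.

```python
# Budget sections of the 1990-1994 Academy of Finland grant, in FIM,
# copied from the list above.
BUDGET_FIM = {
    "secretary (20 h/w, plkr 334)": 248_100,
    "research aid": 369_000,
    "travel etc.": 250_000,
    "publications": 55_000,
    "computer equipment": 80_000,
}

total = sum(BUDGET_FIM.values())
per_year = total / 5  # five-year grant period

print(f"total: {total:,} FIM")           # total: 1,002,100 FIM
print(f"per year: {per_year:,.0f} FIM")  # per year: 200,420 FIM
```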
All funds have been used except the two one-year stipends for advanced researchers, which were originally reserved for Fred Karlsson but were left unused because of his heavy external duties (Chairman of the Group for Evaluating the Humanities 1991-1993; Acting Dean of the Faculty of Humanities, University of Helsinki, in the academic year 1993-1994).
The funds provided by the Academy of Finland were mainly (roughly 90%) used for hiring researchers and research assistants. The personnel hired at RUCL during 1990-1994 are presented in section 3. Some 110,000 FIM (roughly 10%) was used for travel, i.e. participation in scientific meetings, seminars, and the like. The international meetings at which members of RUCL read papers are listed in section 6 below. In practice, RUCL spent none of its own funds on publications or computer equipment.
In addition, the University of Helsinki has provided RUCL with a yearly allowance for expenses, totalling some 80,000 FIM over the period 1990-1994. The University has also provided premises, telecommunications, and other indispensable infrastructure. The Faculty of Humanities has helped provide adequate computing facilities.
RUCL has also raised external funding. RUCL participated as an associated partner in the ESPRIT II project SIMPR ("Structured Information Management, Processing, and Retrieval", ESPRIT II project no. 2083) in 1989-1992, for which the Technology Development Centre of Finland (TEKES) provided RUCL with funding amounting to 3.5 million FIM. The work done under SIMPR and the structure of the SIMPR consortium are described in Appendix 4. The SIMPR project was evaluated every six months by a committee consisting of Professor Henry King, London, Professor Peter Ingwersen, Copenhagen, and Professor Gerard Kempen, Nijmegen. Fred Karlsson led the RUCL-connected activities of SIMPR; Atro Voutilainen, Juha Heikkilä, and Arto Anttila were the principal researchers.
In 1992, a contract worth 0.6 million FIM was made between RUCL and HarperCollins Publishers, Glasgow, under which RUCL undertook to tag and parse the whole "Bank of English", a large text repository containing 200 million words, using the computer programs developed at RUCL for analysing English. This is the largest tagging project so far undertaken in the world. The project was scheduled for the period February 1993 - February 1995 and was successfully completed as planned. Fred Karlsson led the project; Timo Järvinen carried out the actual work.
In 1987, the first of a series of contracts was made between IBM Finland and RUCL concerning basic research on machine translation. The contracts were renewed yearly in the period 1987-1992, totalling some 1.3 million FIM. From 1990 this subproject was led by Lauri Carlson; the principal researchers were Maria Vilkuna, Krister Lindén, and Tarja Heinonen.
3. Personnel
Here, the central members of RUCL during the period 1985-1994 are listed, including: (i) the rough duration of their affiliation with RUCL; (ii) the year of their most recent degree and the university where it was taken; (iii) their scientific profile in outline; and (iv) the subsequent careers of those who have left RUCL after graduation or after completing other duties at RUCL (FK = Master of Arts, FL = Licentiate of Philosophy, FT = Ph.D.):
Arto Anttila, FK 1990, Research Associate 1989-1992 (English Constraint Grammar syntax). Admitted to the Stanford University Ph.D. programme in Linguistics in 1992 with full tuition, which was awarded to 12 of 620 applicants.
Juhani Birn, FL 1989, Research Associate 1989-90, 1994 (1 1/2 years; Swedish morphology and syntax). Obtained a personal Research Assistantship from the Academy of Finland for the period 1990-1993. Submitted his Ph.D. thesis to the University of Helsinki in April, 1995.
Olli Blåberg, FT, University of Uppsala, Sweden, 1994, Research Associate 1985-1987 (Swedish morphology). Presently Managing Director of Lanser Data, Inc.
Lauri Carlson, Senior Researcher 1985-1993 (unification-based syntax, semantics, machine translation, dialogue games). Appointed Professor of Linguistic Theory and Translation at the University of Helsinki, Kouvola School of Translation Studies, from January 1, 1993 (took office on August 1, 1993).
Juha Heikkilä, FK 1990, Research Associate 1989-1994 (ENGTWOL morphology and lexicon, Ph.D. thesis on the description of idioms). Admitted to the Ph.D. programme in Linguistics at the University of Cambridge (UK) from October 1994.
Tarja Heinonen, FK 1992, Research Associate (Finnish syntax, bilingual translation lexicon). Awarded an ASLA grant in 1993 for postgraduate study at the University of Delaware.
Silja Huttunen, FK 1994, Research Assistant 1992-1994 (Finnish text corpus, morphological description of spoken Finnish). Employed as researcher at ValTer, the Multilingual Term Bank for Public Administration, from 1995.
Kristiina Jokinen, FK 1987, FL 1990, Research Associate 1986-1990 (derivational morphology, compounding, unification-based grammar formalisms). Affiliated with Manchester University, Department of Linguistics (in connection with an ESPRIT II project), 1990-.
Timo Järvinen, Candidate of Humanities 1994, Research Assistant 1993-1994 (tagging the "Bank of English"). Admitted as a Ph.D. student to the new Ph.D. programme "Kielellinen merkitys ja sen prosessointi" (Linguistic Meaning and its Processing) for the period 1995-1999.
Fred Karlsson, Director of RUCL 1985-1994 (alongside ordinary duties as Professor of General Linguistics, University of Helsinki; theory of robust language-independent parsing, description of Finnish and Swedish morphology and syntax).
Kimmo Koskenniemi, Senior Researcher 1985-1991 (two-level morphology, finite-state syntax, theory and practice of computing). Appointed Professor of Computational Linguistics at the University of Helsinki, from May 1, 1992.
Johan Lilius, programmer 1985-1987, Doctor of Technology 1994 (Helsinki University of Technology).
Krister Lindén, FK 1988, Senior Computer Officer 1986-1992 (machine translation, lexicon design). Appointed Managing Director of Lingsoft, Inc., 1992-.
Taina Rantala, HSO-secretary, secretary (20 h/w) 1985-1994.
Jussi Piitulainen, student, Research Assistant 1994 (categorial grammar, implementation of finite-state automata).
Kari Pitkänen, FK 1994, Research Assistant 1991-1994 (compiling corpora and updating morphological analyzers and lexicons, mainly Swedish and Finnish). Admitted as a Ph.D. student to the new Ph.D. programme "Kielellinen merkitys ja sen prosessointi" (Linguistic Meaning and its Processing) for the period 1995-1999.
Sari Salmisuo, FK 1993, Research Assistant 1993-1994 (compilation of Finnish text corpus). Research Assistant at the Research Unit for Multilingual Language Technology, University of Helsinki, 1995.
Pirkko Suihkonen, FT 1990, Research Associate 1994 (4 months; text bank design, corpus work on several languages). Postdoctoral studies at Department of Linguistics, University of California, Los Angeles, in 1993 and in the fall of 1994.
Pasi Tapanainen, FK 1991, FL 1992, Computer Officer 1987-1993 (theory and practice of computing, especially finite-state automata). Appointed Research Associate at the new Rank Xerox Research Centre, Grenoble, 1993-1995.
Ilkka Westman, Senior Computer Officer 1987-1994 (hardware and system administration, network support, microcomputer support, corpus server administration, for RUCL, the Department of General Linguistics, and several other departments).
Liisa Vilkki, FK 1991, Research Associate (Russian morphology). Obtained a personal Research Assistantship from the Academy of Finland for the period 1993-1995.
Maria Vilkuna, FT 1987, Research Associate 1987-1992 (lexical description for machine translation purposes). Obtained a personal Senior Researchership from the Academy of Finland for the period 1992-1995 with affiliation at RUCL. Permanent appointment as Special Researcher (Finn. erikoistutkija) at the Research Center for Domestic Languages, Helsinki, from April 1995.
Atro Voutilainen, FT 1994, Research Associate 1989-1994 (morphological disambiguation of English, parsing theory, especially finite-state syntax as applied to English).
In addition, a number of students and assistant personnel have been affiliated with RUCL and its external projects, mostly part-time and with tasks relating to corpus analysis, scanning, or correction (mainly English, Swedish, Finnish, and Russian): Melina Bister (Swedish corpus analysis), Hanna Graeffe (Swedish corpus analysis), Kea Kangas (Finnish corpus analysis), Sirkku Keskinen (optical scanning), Maarit Kinnunen (computer programming), Leena Korhonen (excerption of English dictionaries), Katinka von Kraemer (lexicon excerption), Mikko Lounela (corpus analysis), Pirkko Paljakka (syntactic analysis of English corpora), Tuula Peltonen (Russian corpus analysis), Aarne Ranta (relation between type theory and Constraint Grammar), Pekka Rantanen (English corpus analysis), Kirsi Rissa (technical assistance), Petri Riukula (corpus analysis, computer service), Leena Savolainen (syntactic analysis of English corpora), Outi Sihvola (optical scanning, corpus correction), Pia Taskinen (Finnish corpus analysis), and Katariina Tossavainen (Finnish corpus analysis).
4. Main scientific results 1990-1994 in outline
Three overriding themes were set as goals for the period 1990-1994: (i) elaboration of the theory of morphosyntactic parsing, especially finite-state syntax; (ii) a theory of the lexicon, especially its integration with morphology and syntax and its use for parsing purposes; and (iii) an extension of the computational morphology and syntax to the domain of texts.
Our starting point was the considerable experience we had accumulated during more than a decade of research and application in the domains of multilingual morphological analysis and surface syntactic analysis. From the outset we have had two overriding concerns. The first is to develop theories that are truly language-independent and can therefore be applied to any language. Of course, proper modesty must be exercised in evaluating the feasibility of this objective. We have worked in depth on English (Voutilainen, Heikkilä, Anttila, Järvinen, Paljakka, Rantanen), Finnish (Karlsson, Carlson, Koskenniemi, Vilkuna, Heinonen, Pitkänen, Salmisuo), Swedish (Karlsson, Birn, Pitkänen, Blåberg), German (Majorin), and Russian (Vilkki, Peltonen). We therefore have substantial evidence in favour of our claim that the theories and models we have developed so far are language-independent. Our morphological and syntactic theories have also been applied to many more languages than those mentioned, e.g. Arabic, Basque, French, and Swahili.
As our second basic aim, we have required that the individual grammars, parsers, and computer programs implementing our theories can be successfully applied to unrestricted text, i.e. to language corpora of any length. We have considerable experience in working with very large corpora drawn from several languages, especially in the subarea of grammatical analysis (tagging, parsing). Positive evidence in favour of this claim is easy to cite: RUCL did the basic analytic work for the largest tagging project undertaken (the Bank of English, 200 million words) and also for the largest Swedish tagging project (the Stockholm-Umeå project, tagging 1 million words of Swedish text).
The paper Karlsson (1990) formulates the basic principles of Constraint Grammar, a language-independent formalism for writing parsing grammars. A Constraint Grammar description of a particular language can be plugged into the general Constraint Grammar parser and used for two purposes: (i) morphological disambiguation of multiply ambiguous word-form tokens, and (ii) surface syntactic analysis of word-form tokens. Both tasks are performed using the same kind of constraints and the same reductionistic parsing methodology. A full-scale Constraint Grammar has so far been designed for English by Voutilainen, Heikkilä, and Anttila. The Constraint Grammar framework and its application to English are documented in detail in the 430-page book Karlsson - Voutilainen - Heikkilä - Anttila, Constraint Grammar: A Language-Independent System for Parsing Unrestricted Text, published by Mouton de Gruyter, Berlin and New York, in January 1995 (see Appendix 5). This book is our main publication so far.
In morpholexical analysis based upon the Constraint Grammar approach, several utilities are needed:
(i) A large morpholexical description (for English, ENGTWOL). The morphological description follows the Two-Level paradigm introduced by Koskenniemi. For each lexical entry, all legitimate inflections, as well as the most productive derivational endings, are provided as continuation classes. The present English lexicon (recently updated by Timo Järvinen in the course of the Bank of English work) contains about 90,000 entries, which represent more than 1,200,000 concrete word form types. For the purposes of grammatical analysis (parsing), the lexicon also uses a collection of morphosyntactic tags (grammatical descriptors) for parts of speech, inflection, derivation, and even syntactic subcategorization. The lexical analyser is based on the two-level program by Koskenniemi, and the speed of analysis is about 1,000 words per second on a Unix workstation.
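The continuation-class idea can be illustrated with a small sketch. It assumes a hypothetical toy lexicon and simplified tag strings rather than the actual ENGTWOL data, and it only concatenates stems and endings: the two-level rules that handle morphophonological alternations in the real analyser (e.g. `table` + `ed`) are deliberately left out, so the stems are chosen to need no such rules.

```python
from collections import defaultdict

# Hypothetical continuation classes: each maps to (ending, tag string) pairs.
CONTINUATION_CLASSES = {
    "N": [("", "N NOM SG"), ("s", "N NOM PL"), ("'s", "N GEN SG")],
    "V": [("", "V INF"), ("s", "V PRES SG3"), ("ed", "V PAST"),
          ("ed", "PCP2"), ("ing", "PCP1")],
}

# Hypothetical root lexicon: (stem, continuation class) entries.
ROOT_LEXICON = [("book", "N"), ("book", "V"), ("walk", "N"), ("walk", "V"), ("table", "N")]

def build_analyser(root, classes):
    """Expand every root entry through its continuation class into a
    word-form -> list-of-analyses index."""
    index = defaultdict(list)
    for stem, cls in root:
        for ending, tags in classes[cls]:
            index[stem + ending].append(f'"{stem}" {tags}')
    return dict(index)

analyser = build_analyser(ROOT_LEXICON, CONTINUATION_CLASSES)
print(analyser["books"])  # ['"book" N NOM PL', '"book" V PRES SG3']
```

Because `books` is reached both through the noun class and through the verb class, the sketch returns two readings, which is the kind of ambiguity the disambiguation grammar must later resolve.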
(ii) Tokeniser. This rule-based component contains about 8,000 linguistic rules for (a) the detection of punctuation marks, (b) identification of compounds, idioms and other multiword units that from the point of view of grammar "behave" like simple words, and (c) splitting enclitic words into grammatically motivated segments. The tokeniser is implemented as a simple and fast rewrite program.
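A rewrite-style tokeniser along these lines can be sketched briefly. The rule lists are a handful of hypothetical examples standing in for the roughly 8,000 rules of the real component; the sketch applies enclitic splitting (c), then punctuation detection (a), then longest-match merging of multiword units (b). It ignores abbreviations such as `e.g.`, which a real tokeniser must treat specially.

```python
import re

# Hypothetical multiword units that behave like single words (rule type b).
MULTIWORD_UNITS = ["in spite of", "as well as", "a priori"]

# Hypothetical enclitic-splitting rewrite rules (rule type c).
ENCLITIC_RULES = [
    (re.compile(r"\b(\w+)(n't)\b"), r"\1 \2"),                   # didn't -> did n't
    (re.compile(r"\b(\w+)('s|'re|'ll|'ve|'d|'m)\b"), r"\1 \2"),  # John's -> John 's
]

# Punctuation marks detected as separate tokens (rule type a).
PUNCTUATION = re.compile(r'([.,;:!?()"])')

def tokenise(text):
    # (c) split enclitic words into grammatically motivated segments
    for pattern, replacement in ENCLITIC_RULES:
        text = pattern.sub(replacement, text)
    # (a) detach punctuation marks
    words = PUNCTUATION.sub(r" \1 ", text).split()
    # (b) merge multiword units, trying longer units first
    units = sorted((u.split() for u in MULTIWORD_UNITS), key=len, reverse=True)
    tokens, i = [], 0
    while i < len(words):
        for unit in units:
            span = words[i:i + len(unit)]
            if [w.lower() for w in span] == unit:
                tokens.append(" ".join(span))
                i += len(unit)
                break
        else:
            tokens.append(words[i])
            i += 1
    return tokens

print(tokenise("In spite of the rain, John's dog didn't stop."))
# ['In spite of', 'the', 'rain', ',', 'John', "'s", 'dog', 'did', "n't", 'stop', '.']
```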
(iii) Lexicon update tools. Because the ENGTWOL (or any other) lexicon cannot represent all words in the language, tools are needed for dealing with the 1-5% of word tokens not represented in the ENGTWOL Master lexicon. One alternative is to update the ENGTWOL lexicon: for this purpose, Voutilainen designed a rule-based module that assigns the relevant lexical pointers to base form/part-of-speech pairs provided by the end-user. For each part of speech, a specific set of rewrite rules was implemented.
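Per-part-of-speech rewrite rules for lexicon updating might look roughly like the following sketch. The continuation-class names (`N_ES`, `V_E`, and so on), the spelling patterns, and the `base CLASS ;` output format are all hypothetical illustrations, not the actual ENGTWOL lexical pointers.

```python
import re

# Hypothetical rewrite rules per part of speech: the first pattern matching
# the end of the base form decides which continuation class is assigned.
UPDATE_RULES = {
    "N": [
        (r"(s|x|z|ch|sh)$", "N_ES"),  # box -> boxes
        (r"[^aeiou]y$", "N_IES"),     # city -> cities
        (r"", "N_S"),                 # default: book -> books
    ],
    "V": [
        (r"e$", "V_E"),               # bake -> baked, baking
        (r"(s|x|z|ch|sh)$", "V_ES"),  # push -> pushes
        (r"[^aeiou]y$", "V_IES"),     # carry -> carries
        (r"", "V_S"),                 # default: walk -> walks
    ],
}

def lexical_entry(base_form, pos):
    """Turn an end-user (base form, part of speech) pair into a lexicon
    line of the hypothetical form 'base CLASS ;'."""
    for pattern, cont_class in UPDATE_RULES[pos]:
        if re.search(pattern, base_form):
            return f"{base_form} {cont_class} ;"
    raise ValueError(f"no rule for {base_form!r} ({pos})")  # unreachable with a default rule

for base, pos in [("box", "N"), ("city", "N"), ("day", "N"), ("bake", "V")]:
    print(lexical_entry(base, pos))
```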
(iv) A component for the analysis of unknown words. Another possibility is to heuristically assign ENGTWOL-style morphological analyses to unknown word tokens in running text, before entering the lexically analysed (ambiguous) sentence into parsing. The rule-based component assigns one or more alternative morphological analyses to words not recognised by ENGTWOL.
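A heuristic guesser for unknown words could work from capitalisation and word endings, as in the sketch below. The suffix list, the three-letter minimum stem, and the tag strings are hypothetical; recovery of the base form is omitted, so the function returns tag strings only.

```python
# Hypothetical suffix heuristics: (suffix, tag strings to assign).
SUFFIX_RULES = [
    ("ness", ["N NOM SG"]),
    ("ing", ["PCP1", "N NOM SG"]),
    ("ly", ["ADV", "A ABS"]),
    ("ed", ["V PAST", "PCP2"]),
    ("s", ["N NOM PL", "V PRES SG3"]),
]
DEFAULT_TAGS = ["N NOM SG", "A ABS"]
MIN_STEM = 3  # hypothetical: the remaining stem must have at least 3 letters

def guess(word):
    """Assign one or more alternative analyses to a word unknown to the lexicon."""
    if word[:1].isupper():
        return ["<Proper> N NOM SG"]
    # try longer suffixes first so that e.g. -ness wins over -s
    for suffix, tags in sorted(SUFFIX_RULES, key=lambda r: len(r[0]), reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return list(tags)
    return list(DEFAULT_TAGS)
```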
After morphological analysis, 35-50% of all words in running English text emerge as ambiguous, i.e. up to half of all words receive more than one analysis. Usually only one alternative fits the context. To choose the correct alternative, Atro Voutilainen wrote a disambiguation grammar. The grammar contains linguistic constraints on the linear order of morphological tags. Whenever the context specification of an ambiguity-forming tag is satisfied, the tag is discarded as contextually illegitimate. In this way, what `survives' the grammar is (optimally) the unambiguous and desired analysis.
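The reductionistic principle can be illustrated with a minimal sketch of a single rule type, a REMOVE constraint with one context condition. It is a simplification, not the Constraint Grammar parser itself: readings are plain tag sets with base forms omitted, a context condition looks at only one relative position and requires every remaining reading there to carry the context tag, and the last remaining reading of a word is never discarded. Rules are applied repeatedly until nothing changes.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Remove:
    """Discard readings containing `target` if every remaining reading of the
    word at relative position `offset` contains the tag `context`."""
    target: str
    offset: int
    context: str

def disambiguate(sentence, rules):
    """sentence: one list of readings (frozensets of tags) per word."""
    cohorts = [list(readings) for readings in sentence]
    changed = True
    while changed:  # repeat until no rule can discard anything more
        changed = False
        for i in range(len(cohorts)):
            for rule in rules:
                j = i + rule.offset
                if not 0 <= j < len(cohorts):
                    continue
                if not all(rule.context in r for r in cohorts[j]):
                    continue  # context not unambiguously satisfied
                kept = [r for r in cohorts[i] if rule.target not in r]
                # never discard the last remaining reading of a word
                if kept and len(kept) < len(cohorts[i]):
                    cohorts[i] = kept
                    changed = True
    return cohorts

sentence = [
    [frozenset({"DET"})],                                      # the
    [frozenset({"N", "NOM", "SG"}), frozenset({"V", "INF"})],  # table
]
rules = [Remove(target="V", offset=-1, context="DET")]
print(disambiguate(sentence, rules))
```

With the unambiguous determiner on its left, the verb reading of `table` is discarded; had `table` carried only a verb reading, that reading would have been kept.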
The present grammar contains 1,200 `grammar-based' and 200 `heuristic' constraints. Together, they make about 97% of all words unambiguous, with an error rate of 0.5%. Compared with its best-known competitors (at least 15 statistical systems), the ENGCG morphological disambiguator has an error rate that is only a fraction of theirs.
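The two figures quoted above measure different things: the share of words left with exactly one reading, and the share of words whose correct reading has been discarded. A small sketch of how both could be computed against a hand-checked benchmark follows; the 200-word example is constructed only to reproduce the 97% and 0.5% figures and is not RUCL data.

```python
def evaluate(output, gold):
    """Return (share of unambiguous words, error rate).

    output: one set of surviving readings per word after disambiguation;
    gold:   the correct reading for each word (from a benchmark corpus).
    A word counts as an error when its correct reading no longer survives.
    """
    n = len(gold)
    unambiguous = sum(1 for readings in output if len(readings) == 1)
    errors = sum(1 for readings, correct in zip(output, gold) if correct not in readings)
    return unambiguous / n, errors / n

# Constructed example: 193 correctly resolved words, 1 wrongly resolved word,
# and 6 words left ambiguous (still containing the correct reading).
output = [{"A"}] * 193 + [{"B"}] + [{"A", "B"}] * 6
gold = ["A"] * 200
print(evaluate(output, gold))  # (0.97, 0.005)
```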
Kimmo Koskenniemi developed a theoretical framework for surface-oriented syntactic parsing using finite-state techniques. Practical experiments laid the foundation of "Finite-State Syntax" (sometimes called "Finite-State Intersection Grammar", FSIG), which was presented in the COLING-90 paper (Koskenniemi 1990). This work was followed by Pasi Tapanainen's Master of Science thesis on a compiler and parser for FSIG (1991) and his Licentiate of Philosophy thesis (1993), which elaborated further on the same subject.
Intensive work on Finite-State syntax started in 1991 as a collaboration between Kimmo Koskenniemi, Pasi Tapanainen, and Atro Voutilainen. Tapanainen was concerned with the algorithmic problems, while Voutilainen's main contribution lay (i) in the design of the appropriate grammatical representation, (ii) in the design of the methodology for constructing and evaluating linguistic parsing descriptions, and (iii) in research into the properties of the linguistic knowledge base (grammar and data-driven `heuristic' components) that is presupposed by an accurate parser of unrestricted English text.
A grammatical representation consists of (i) the set of grammatical descriptors and (ii) guidelines for their application. A grammatical representation is documented as a grammar definition corpus and a coding manual. A grammar definition corpus contains a systematic collection of sentences that represent the grammatical constructions of the language. These sentences were derived from a 2,000-page comprehensive descriptive grammar by Quirk, Greenbaum, Leech and Svartvik (1985). Each sentence was semi-automatically annotated using the grammatical descriptors, and documentation describing the underlying descriptive policies was also written. A grammar definition corpus should also include annotated running text, to represent phenomena that occur in real texts but are inadequately covered in descriptive grammars (punctuation etc.). In collaboration with a research assistant (Pirkko Paljakka), over 200,000 words of text from scientific journals, newspapers, novels, and technical manuals were annotated in 1993-1994.
A grammar definition corpus is not only a crucial part of the specification of the grammatical representation; it can also serve (i) for identifying and diagnosing errors in grammar rules and (ii) as a "learning corpus" for acquiring frequency-based lexico-syntactic preferences that can guide the parser when structural ambiguity remains.
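Acquiring frequency-based preferences from a learning corpus, and consulting them only when the grammar has left several analyses, might be sketched as follows. The tags and the tiny annotated corpus are hypothetical, and real preferences would be lexico-syntactic (conditioned on context as well as on the word) rather than the per-word counts used here.

```python
from collections import Counter

def learn_preferences(annotated_corpus):
    """Count how often each (word, analysis) pair occurs in a manually
    annotated learning corpus."""
    return Counter((word.lower(), analysis) for word, analysis in annotated_corpus)

def prefer(word, remaining, counts):
    """Choose among the analyses the grammar left for `word`, favouring the
    one seen most often in the learning corpus; on a tie, max() keeps the
    first candidate in `remaining`."""
    return max(remaining, key=lambda analysis: counts[(word.lower(), analysis)])

corpus = [("that", "CS"), ("that", "DET"), ("that", "CS"), ("That", "PRON")]
counts = learn_preferences(corpus)
print(prefer("That", ["DET", "CS"], counts))  # CS
```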
For the emerging English Finite-State grammar, Voutilainen designed a new grammatical representation. It extends the syntactic representation used in the English Constraint Grammar (ENGCG) in the following respects: (i) its grammatical coverage is wider (it accounts for certain infrequent constructions not adequately represented in ENGCG), (ii) clauses are identified more accurately, (iii) a more careful distinction is made between finite and nonfinite clauses, and (iv) the functional account is extended beyond the level of simple words, to various types of clause. The Finite-State representation assigns a functional dependency-oriented structure to sentences. It is designed to be expressive (i.e. the writer of the grammar rules can directly refer to the relevant grammatical categories) and resolvable (i.e. it avoids the introduction of certain grammatically genuine ambiguities).
In addition to two experimental grammars, Voutilainen has recently written the first part of an emerging comprehensive parsing grammar that uses the above-mentioned grammatical representation to analyse written Standard English of the British and American varieties. Most of the rules contain two parts: (i) the distributional category and (ii) all necessary context conditions for that category. Whenever a distributional category described in the grammar is encountered in the ambiguous input sentence, the Finite-State parser checks that a necessary context is present; if it is not, the reading is discarded as ungrammatical. In other words, the Finite-State framework is also based on a reductionistic (rather than additive) scheme.
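The intersection idea can be made concrete with a deliberately naive sketch: enumerate every combination of the readings of an ambiguous sentence and keep only the combinations accepted by every rule. The three rules, the tag names, and the use of Python regular expressions in place of compiled finite-state automata are assumptions made for illustration. An actual FSIG parser intersects the rule automata with the sentence automaton instead of enumerating candidates, which would be exponential for long sentences.

```python
import itertools
import re

# Hypothetical rules: each is a regular expression that the whole tag string
# of a candidate analysis must match (Python regexes stand in for automata).
RULES = {
    "exactly one main verb @MV": re.compile(r"((?!@MV)\S+ )*@MV( (?!@MV)\S+)*"),
    "@SUBJ only before the main verb": re.compile(r"(?!.*@MV.*@SUBJ).*"),
    "@OBJ only after the main verb": re.compile(r"(?!.*@OBJ.*@MV).*"),
}

def parse(ambiguous_sentence):
    """Return every combination of readings accepted by all rules."""
    words = [word for word, _ in ambiguous_sentence]
    survivors = []
    for tags in itertools.product(*(t for _, t in ambiguous_sentence)):
        candidate = " ".join(tags)
        if all(rule.fullmatch(candidate) for rule in RULES.values()):
            survivors.append(list(zip(words, tags)))
    return survivors

sentence = [("the", ["@>N"]), ("dog", ["@SUBJ", "@OBJ"]), ("barks", ["@MV", "@OBJ"])]
print(parse(sentence))
# [[('the', '@>N'), ('dog', '@SUBJ'), ('barks', '@MV')]]
```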
Atro Voutilainen defended his Ph.D. dissertation in Helsinki in March 1994. Its central themes are the disambiguation and syntactic parsing of English, as described above.
A preliminary heuristic component has also been written. Its function is to resolve grammatically genuine ambiguities in favour of the most typical analysis, i.e. in the case of structural uncertainty, the parser consults past language experience. In all, the present grammar covers all major syntactic constructions in present-day English. The current grammar needs further constraints to make it more restrictive, since the present version sometimes gives more analyses for a sentence than is desirable. It should also be noted that work on the parsing algorithm is still underway; e.g. Jussi Piitulainen is investigating techniques for optimising the parsing process. As a result of this work, the parser is expected to become capable of analysing very long and complex sentences, which the present version sometimes fails to do regardless of whether the sentence is grammatical.
In 1993, Voutilainen designed and implemented a system for the analysis and extraction of simple noun phrases from English texts. The system, known as NPtool, uses the ENGCG morphological disambiguator as the first module. On top of this module, a very shallow syntactic parser was designed. This syntactic representation uses only seven tags mainly for identifying modifiers, nominal heads, adverbials and verb chains. The representation is implemented as a hybrid set of Constraint Grammar and Finite-State rules that make 93-96% of all words syntactically unambiguous, with an error rate of less than 1%.
NPtool also employs a Finite-State mechanism for extracting noun phrases (or other low-level constructs) from the syntactically analysed sentences. The extractor also includes a mechanism for handling remaining ambiguity in the analysis of the extracted constructions, distinguishing between ambiguously and unambiguously analysed constructions. Overall, NPtool recognises 98.5-100% of all simple noun phrase tokens in running text, with an excess of `false positive hits' of some 3-5% (and even most of these false positives are flagged as suspect by the ambiguity-handling mechanism). The best-known competitors recognise up to 95% of all simple noun phrase tokens in texts, so NPtool appears to be the most reliable noun phrase recogniser known. The present speed of NPtool is somewhat over 100 words per second on a Unix workstation; further optimisation is possible.
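Extraction of noun phrases from the shallow syntactic analysis can be phrased as a finite-state pattern over the tag sequence. The sketch below assumes a hypothetical tag set in which `@>N` marks a premodifier and `@NH` a nominal head, and it looks only for premodifiers followed by a head; the ambiguity-handling part of the extractor is left out.

```python
import re

# Hypothetical one-character codes for the shallow syntactic tags;
# every other tag is coded as "o".
CODES = {"@>N": ">", "@NH": "H"}
# Finite-state pattern: any number of premodifiers followed by a nominal head.
NP_PATTERN = re.compile(r">*H")

def extract_nps(tagged):
    """Return the word strings of all simple noun phrases in a list of
    (word, syntactic tag) pairs."""
    codes = "".join(CODES.get(tag, "o") for _, tag in tagged)
    return [" ".join(word for word, _ in tagged[m.start():m.end()])
            for m in NP_PATTERN.finditer(codes)]

sentence = [("the", "@>N"), ("new", "@>N"), ("printer", "@>N"), ("driver", "@NH"),
            ("supports", "@V"), ("colour", "@>N"), ("graphics", "@NH"), (".", "@PUNCT")]
print(extract_nps(sentence))  # ['the new printer driver', 'colour graphics']
```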
In conjunction with his work on parsing descriptions, Voutilainen has investigated various heuristic techniques that can be used for resolving those ambiguities left pending by the strict grammar rules. Two deserve mention.
(i) Automatically extending the lexicon with multiword entries. Certain structural ambiguities can be resolved almost without errors if the lexicon contains a (very large) number of compound entries for word sequences like `tea time' that can be ambiguous in isolation. A technique based on surface grammatical analysis and corpus-based frequencies has been developed and evaluated. Using this technique, very large extensions can be made to the Master Lexicon with very little human intervention, and in this computationally inexpensive manner the hard disambiguation problem can be considerably eased. The generation of multiword entries presupposes large text corpora, e.g. the 200-million-word Bank of English Corpus available for research in Helsinki.
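One simple way to propose multiword entries from an analysed corpus is to count noun-noun sequences and keep the frequent ones as candidates for approval. The sketch assumes a corpus already tagged with a hypothetical `N` tag and an arbitrary frequency threshold; the actual technique combines surface grammatical analysis and corpus frequencies in ways not detailed here.

```python
from collections import Counter

def multiword_candidates(tagged_corpus, min_count=2):
    """Return noun-noun word pairs that occur at least `min_count` times in a
    tagged corpus of (word, tag) pairs, as candidate compound entries."""
    pairs = Counter()
    for (w1, t1), (w2, t2) in zip(tagged_corpus, tagged_corpus[1:]):
        if t1 == "N" and t2 == "N":
            pairs[(w1.lower(), w2.lower())] += 1
    return sorted(f"{a} {b}" for (a, b), n in pairs.items() if n >= min_count)

corpus = [("tea", "N"), ("time", "N"), ("at", "PREP"), ("Tea", "N"), ("time", "N"),
          ("the", "DET"), ("bus", "N"), ("stop", "N")]
print(multiword_candidates(corpus))  # ['tea time']
```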
(ii) Learning from partial analyses. Standard probabilistic techniques often base their predictions on statistics derived from manually tagged corpora. If this learning corpus differs from the analysed text, the probabilistic system becomes less reliable on the analysed text. An alternative is to base the statistics on the partially analysed text itself and to use them for resolving the remaining ambiguities. Experiments with this idea are reported in Voutilainen's doctoral dissertation. This approach can also be generalised to higher levels of structural analysis.
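As one possible instantiation of learning from partial analyses (not necessarily the method of the dissertation), the sketch below collects evidence from words that the grammar has already made unambiguous in the same text and uses it to resolve other occurrences of the same word. Words without clear evidence are left ambiguous; the data format is an assumption.

```python
from collections import Counter

def resolve_from_partial(text):
    """text: list of (word, set of remaining tags) after grammar-based
    disambiguation. Evidence is gathered from words the grammar already made
    unambiguous in the same text; an ambiguous word is resolved only when
    one of its remaining tags has strictly more evidence than the others."""
    evidence = Counter()
    for word, tags in text:
        if len(tags) == 1:
            (tag,) = tags
            evidence[(word.lower(), tag)] += 1
    resolved = []
    for word, tags in text:
        if len(tags) > 1:
            scores = {t: evidence[(word.lower(), t)] for t in tags}
            best = max(scores.values())
            winners = {t for t, s in scores.items() if s == best}
            if best > 0 and len(winners) == 1:
                tags = winners
        resolved.append((word, set(tags)))
    return resolved

text = [("run", {"N"}), ("run", {"N", "V"}), ("run", {"N"}), ("walk", {"N", "V"})]
print(resolve_from_partial(text))
```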
Karlsson (1992) designed SWETWOL, a comprehensive morphological analyzer for Swedish based on Svenska Akademiens ordlista and containing some 50,000 lexical entries. The work has been continued especially by Juhani Birn and Kari Pitkänen.
Within the framework of Swedish Constraint Grammar (SWECG), Birn has worked on the development of the SWETWOL. As SWETWOL is to be put to use in SWECG, the goal of the system being the assignment of syntactic functions to words in sentences, he has systematized some aspects of the SWETWOL lexical feature assignments for the purpose of optimally consistent syntactic analysis. In a report entitled 'A syntax-geared approach to determiners and pronouns in Swedish Constraint Grammar', Birn introduces into SWETWOL the distinction between determiners (DET) and pronouns (PRON), the latter category implying head status. The category of articles (ART) is dispensed with as a major part of speech, and so is the category of numerals (NUM). Categories not treated as major (syntactic) parts of speech may have the status of additional lexical features. The lexical description has been pruned of some unresolvable ambiguities, which makes for more efficient morphological disambiguation. The lexical treatment of participles, adverbs, verbs, and subordinators has also been under scrutiny. The set of syntactic labels to be used in SWECG has been evolving during the lexical work.
Birn is finalizing his Ph.D. thesis "En teori om funktionell satsledsanalys från vänster till höger" (A theory of functional analysis of clause elements from left to right), which is to be submitted in April 1995. In the thesis, Birn presents a theory of the functional analysis of sentential constituents (such as noun phrases). The theory consists of a formalism for the grammatical representations needed as the informational basis for this analysis and, built on that formalism, a set of formalized rules for identifying the functions of such constituents from left to right. The notion of a "partially specified syntactic label" plays an important role.
Lauri Carlson concentrated his work on unification-based machine translation. Originally unification was a method of matching terms in logical formulas for automated theorem proving. It has become generally known through its use in the programming language Prolog. Prolog's term unification has since evolved in linguistics into unification of labeled attribute-value graphs. Unification represents a reasonable compromise between the two central goals of computational linguistics: (i) linguistically correct and illuminating representations of linguistic facts and generalizations, and (ii) computationally efficient application of that information to the processing of natural language.
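Unification of attribute-value structures can be sketched with nested dictionaries. This is a simplified illustration: values are either atoms (strings) or nested structures, the result is a new structure, and reentrancy (shared substructures), which full graph unification supports, is not modelled.

```python
def unify(a, b):
    """Unify two feature structures: nested dicts whose leaves are atomic
    strings. Returns a new merged structure, or None if the two clash."""
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for attribute, value in b.items():
            if attribute in result:
                merged = unify(result[attribute], value)
                if merged is None:
                    return None  # clash somewhere below this attribute
                result[attribute] = merged
            else:
                result[attribute] = value
        return result
    # atoms (or an atom against a structure) unify only if identical
    return a if a == b else None

noun_phrase = {"cat": "NP", "agr": {"num": "sg"}}
verb_subject = {"agr": {"num": "sg", "per": "3"}}
print(unify(noun_phrase, verb_subject))  # {'cat': 'NP', 'agr': {'num': 'sg', 'per': '3'}}
print(unify(noun_phrase, {"agr": {"num": "pl"}}))  # None
```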
In a large monograph manuscript (contents enclosed as Appendix 6), Carlson showed how the structures of both languages in a language pair may be represented and processed in the same way, and how bidirectional translation is possible using the same descriptions and representations of the languages involved.
5. Corpus work
Considerable effort has been spent on collecting and tagging corpora in various languages, especially English, Finnish, Swedish, and Russian. The Bank of English project has already been mentioned above. Timo Järvinen's report on the BofE project is enclosed as Appendix 7, Kari Pitkänen's and Sari Salmisuo's reports on some of the Swedish and Finnish tagging work as Appendix 8, and Tuula Peltonen's report on the Russian corpus project (in collaboration with the Department of Slavic Languages, University of Helsinki) as Appendix 9. Pirkko Suihkonen has established corpora in several Finno-Ugric languages, e.g. Estonian, Karelian, Vepsian, Mari, Khanty, and Komi. The collection of corpora now at the disposal of linguistic researchers at the Department of General Linguistics in Helsinki is among the largest, at least in Europe. An overview of the total corpus resources is given in Appendix 10.
Over the years, RUCL has taken considerable pains to make its corpus resources publicly available. All the corpora are stored on one computer and are available over the Internet to every registered user. This service is called the University of Helsinki Language Corpus Server (UHLCS). At present, we have more than 300 registered users from most Finnish universities and several universities abroad. A description of the registered users is enclosed as Appendix 11.
RUCL has taken part in three major corpus tagging projects:
- the Bank of English (200 million words, tagged by our programs), in collaboration with the COBUILD group at the Department of English, University of Birmingham (UK);
- the Stockholm-Umeå Swedish text corpus project (1 million words, tagged by our SWETWOL program), in collaboration with the Departments of Linguistics at the Universities of Stockholm and Umeå;
- the FISC corpus (a Finland-Swedish text corpus of 2.5 million words), in collaboration with the Department of Nordic Languages, University of Helsinki.

Kari Pitkänen has been an important advisor to the FISC project, and Fred Karlsson has served on its Steering Group.
6. Participation in international meetings
During 1990-1995, RUCL work was presented in refereed papers at the following international conferences (RUCL authors in parentheses):
- ESPRIT'90 Conference, Brussels, 1990 (Atro Voutilainen)
- 13th International Conference on Computational Linguistics, University of Helsinki, 1990 (Fred Karlsson, Kimmo Koskenniemi, Lauri Carlson, Maria Vilkuna)
- The Second Nordic Conference on Text Comprehension in Man and Machine, Stockholm, 1990 (Fred Karlsson)
- Directions in Corpus Linguistics. Nobel Symposium 82, Stockholm, 1991 (Fred Karlsson)
- Svenskans beskrivning 18, Lund University, 1991 (Juhani Birn)
- Ninth National Conference on Artificial Intelligence (AAAI-91), Anaheim, California, 1991 (Fred Karlsson, Juha Heikkilä, Atro Voutilainen, Arto Anttila)
- 14th International Conference on Computational Linguistics, University of Nantes, 1992 (Kimmo Koskenniemi, Pasi Tapanainen, Atro Voutilainen)
- The 1993 Joint International Conference of the Association for Computers and the Humanities & the Association for Literary and Linguistic Computing, Georgetown University, Washington DC (Juha Heikkilä, Atro Voutilainen)
- The Sixth Conference of the European Chapter of the Association for Computational Linguistics, Utrecht, 1993 (Pasi Tapanainen, Atro Voutilainen)
- The Workshop on Very Large Corpora, Ohio State University, 1993 (Atro Voutilainen)
- 9:e Nordiska Datalingvistikdagarna, University of Stockholm, 1993 (Atro Voutilainen)
- ICAME 1993, Zürich, 1993 (Juha Heikkilä, Atro Voutilainen)
- Fourth ACL Conference on Applied Natural Language Processing, Stuttgart, 1994 (Atro Voutilainen, Pasi Tapanainen)
- 15th International Conference on Computational Linguistics, University of Kyoto, 1994 (Timo Järvinen, Pasi Tapanainen, Atro Voutilainen)
- The Seventh Conference of the European Chapter of the Association for Computational Linguistics, University of Dublin, 1995 (Timo Järvinen, Atro Voutilainen, Pasi Tapanainen)
7. Invited lectures
- University of Aarhus, Department of Linguistics, March 1991 (Fred Karlsson, 4 hours) and November 1993 (Fred Karlsson, 6 hours)
- University of Pennsylvania, Department of Linguistics and Phonetics, September 1992 (Atro Voutilainen)
- University of Oslo, Department of Linguistics and Information Sciences, October 1993 (Fred Karlsson, 6 hours)
- Universidad del País Vasco, Informatika Fakultatea, Donostia (Spain), October 1994 (Atro Voutilainen, 4 hours)
- Rank Xerox Research Centre, Grenoble Laboratories, Meylan, France, November 1994 (Atro Voutilainen, Kimmo Koskenniemi)
- University of Stockholm, Department of Linguistics, November 1994 (Atro Voutilainen, 4 hours)
- Universität München, Centrum für Informatik, February 1995 (Fred Karlsson, 4 hours)
8. Computer programs, algorithms, and lexicons
- a parser driver for Constraint Grammar in Common Lisp (Fred Karlsson)
- a parser driver for Constraint Grammar in C, Pasi Tapanainen
- a parser driver for finite-state intersection grammar in C (Kimmo Koskenniemi, Pasi Tapanainen)
- an extractor of noun phrases in Unix and flex (Atro Voutilainen)
- a machine-readable English lexicon (ENGTWOL) (Juha Heikkilä, Atro Voutilainen, updated by Timo Järvinen)
- a constraint set for morphological disambiguation of English (Atro Voutilainen)
- a constraint set for syntactic analysis of English (Arto Anttila, updated by Timo Järvinen)
- a constraint set for English noun phrase syntax (Atro Voutilainen)
- a machine-readable Swedish lexicon (SWETWOL) (Fred Karlsson)
- a constraint set for morphological disambiguation of Swedish (Fred Karlsson, updated by Kari Pitkänen)
- a constraint set for morphological disambiguation of Finnish (Fred Karlsson)
- preprocessors for English (Atro Voutilainen), Finnish (Kimmo Koskenniemi, Kari Pitkänen), and Swedish (Timo Järvinen, Kari Pitkänen)
9. Recognition
On August 16-25, 1990, RUCL and the Department of General Linguistics arranged the 13th International Conference on Computational Linguistics (COLING-90) in Helsinki. There were more than 550 participants from almost 40 countries. COLING is the major international conference series in the field of computational linguistics, and RUCL was selected in 1987 to arrange COLING-90. The programme of the conference is enclosed as Appendix 12. The Proceedings of the conference were edited by Hans Karlgren and published by RUCL in Helsinki (three volumes, more than 1,000 pages). They are distributed by the Association for Computational Linguistics, New Jersey, USA.
The Sixth Conference of the European Chapter of the Association for Computational Linguistics was held in Utrecht, The Netherlands, in May 1993. Atro Voutilainen and Pasi Tapanainen from RUCL received the first Don Walker Prize there, awarded for the best paper presented at the conference; 55 papers competed for the prize. Their paper was entitled "Ambiguity Resolution in a Reductionistic Parser".
In February 1994, the Central Board of the Academy of Finland (Tieteen keskustoimikunta) gave RUCL the status of Center of Excellence for 1994. In October 1994, the Academy of Finland renewed this status for 1995. On the basis of these two nominations, the University of Helsinki receives additional performance-based funding from the Ministry of Education ("tuloksellisuusmääräraha") for 1995, and probably also for 1996.
In March 1994, the Board of the University of Helsinki (konsistori) founded the Research Unit for Multilingual Language Technology as a Center of Excellence for a five-year period from August 1, 1994. This research unit continues the work of RUCL.
When Ph.D. programmes ("Graduate Schools") were planned in Finland in mid-1994, Fred Karlsson was one of the two principal planners of the programme "Linguistic Meaning and its Processing". When the decisions were made in late 1994, this programme was the largest one founded in the humanities in Finland. RUCL has participated in planning and implementing its initial curriculum for 1995.
10. Special international tasks 1990-1995
Lauri Carlson:
- Referee for the 1992 meeting of the EACL (European Chapter of the Association for Computational Linguistics).
- Member of the Editorial Board of the Nordic Journal of Linguistics, 1984-.
- Expert member of the LRE project DELIS ("Descriptive Lexical Specifications and Acquisition Tools for Lexicon Building"), 1991-.
- Participant in the EU project EAGLES ("Expert Advisory Groups on Language Engineering Standards"), 1992-.
Fred Karlsson:
- Elected member of Academia Europaea, 1988-.
- Chairman of the Organizing Committee of the 13th International Conference on Computational Linguistics, Helsinki 1990 (550 participants).
- Member of the Steering Group of the ESPRIT II project SIMPR (No. 2083), 1989-1992.
- Member (representing Finland) of the International Committee of Linguists.
- Member of the Editorial Boards of Linguistic Abstracts, Oxford, and the Journal of Natural Language Engineering, Cambridge (UK).
- Referee for IJCAI (International Joint Conference on Artificial Intelligence), 1994 and 1996.
Atro Voutilainen:
- Member of the program committee of the SIGDAT Workshop organised by the European Chapter of the Association for Computational Linguistics, March 27, 1995, Dublin, Ireland.
- Member of the syntactic annotation subgroup of Phase II of EAGLES (1 February - 31 July 1995).
- Member of the Organizing Committee of the 10:e Nordiska Datalingvistikdagarna, Helsinki 1995.
11. Central publications 1990-1995
The arrangement of this section follows the requirements of the Academy of Finland. Articles published in the Proceedings of the 13th International Conference on Computational Linguistics (COLING-90, arranged by RUCL in Helsinki in 1990) are listed under 11.1. This is because COLING is a strictly refereed forum and these Proceedings are distributed by the ACL (Association for Computational Linguistics).
11.1. Publications abroad
11.1.1. Monographs published abroad
Karlsson, Fred, Voutilainen, Atro, Heikkilä, Juha and Anttila, Arto
-- [1995]. Constraint Grammar: A Language-Independent System for Parsing Unrestricted Text. Berlin/New York: Mouton de Gruyter. 430 pp.
11.1.2. Articles in books published abroad (not "Proceedings")
Carlson, Lauri
-- [1994]. "Language as a Game". In R. E. Asher, ed., The Encyclopedia of Language and Linguistics, Oxford/New York/Seoul/Tokyo: Pergamon Press, pp. 1975-1977.
Karlsson, Fred
-- [1992]. "Comments on Professor John Sinclair's Paper 'The Automatic Analysis of Corpora'". In J. Svartvik, ed., Directions in Corpus Linguistics, Berlin/New York: Mouton de Gruyter, pp. 398-400.
-- [1992]. "Finnish". In W. Bright, ed., International Encyclopedia of Linguistics, New York/Oxford: Oxford University Press, Vol. 2, pp. 14-17.
-- [1993]. "Robust Parsing of Unconstrained Text". In P. de Haan and N. Oostdijk, eds., Corpus-based syntax, Amsterdam and Atlanta: Rodopi, pp. 89-113.
-- [1994]. "Computational Morphology". In R. E. Asher, ed., The Encyclopedia of Language and Linguistics, Oxford/New York/Seoul/Tokyo: Pergamon Press, pp. 2570-2573.
Koskenniemi, Kimmo
-- [1991]. "A Discovery Procedure for Two-Level Phonology". In L. Cignoni and C. Peters, eds., Computational Lexicology and Lexicography: A Special Issue Dedicated to Bernard Quemada, Vol. I, Pisa, pp. 451-46
-- [1992]. "Computational Morphology". In W. Bright, ed., International Encyclopedia of Linguistics, Vol. I, New York/Oxford: Oxford University Press, pp. 291-293.
Voutilainen, Atro and Heikkilä, Juha
-- [1994]. "An English Constraint Grammar (ENGCG): A Surface-Syntactic Parser of English". In Udo Fries, Gunnel Tottie and Peter Schneider, eds., Creating and Using English Language Corpora, Amsterdam and Atlanta: Rodopi, pp. 189-199.
11.1.3. Articles in "Proceedings" published abroad
Birn, Juhani
-- [1991]. "Diskontinuerlighet som strukturenlig företeelse". In Thelander M., Gunnarsson, B-L., Hammermo, O., Josephson, O., Liberg, C., Nordberg, B. and Östman, C., eds., Svenskans beskrivning 18, Lund: Lund University Press, pp. 75-86.
Carlson, Lauri and Vilkuna, Maria
-- [1990]. "Independent Transfer Using Graph Unification". In H. Karlgren, ed., Proceedings of the 13th International Conference on Computational Linguistics, Vol. 3, Helsinki, pp. 60-66.
Heikkilä, Juha and Voutilainen, Atro
-- [1993]. "ENGCG: An Efficient and Accurate Parser for English Texts". In Conference Abstracts. The 1993 Joint International Conference. The Association for Computers and the Humanities & The Association for Literary and Linguistic Computing, June 16-19, 1993, Georgetown University, Washington DC, pp. 67-70.
Järvinen, Timo
-- [1994]. "Annotating 200 Million Words: The Bank of English Project". In Proceedings of Coling-94, ed. COLING 94 Organizing Committee, Kyoto, Japan, pp. 565-568.
Karlsson, Fred
-- [1990]. "Constraint Grammar as a Framework for Parsing Unrestricted Text". In H. Karlgren, ed., Proceedings of the 13th International Conference on Computational Linguistics, Vol. 3, Helsinki, pp. 168-173.
-- [1990]. "Parsing Text in terms of Constraint Grammar". In Ö. Dahl and K. Fraurud, eds., Papers from the Second Nordic Conference on Text Comprehension in Man and Machine, Stockholm: University of Stockholm, Department of Linguistics, pp. 85-94.
Karlsson, Fred, Voutilainen, Atro, Heikkilä, Juha and Anttila, Arto
-- [1991]. "Constraint Grammar: A Language-Independent System for Parsing Unrestricted Text, with an Application to English." In Natural Language Text Retrieval: Workshop Notes from the Ninth National Conference on Artificial Intelligence (AAAI-91), Anaheim, California, 4 pp.
Koskenniemi, Kimmo
-- [1990]. "Finite-State Parsing and Disambiguation". In H. Karlgren, ed., Proceedings of the 13th International Conference on Computational Linguistics, Vol. 2, Helsinki, pp. 229-232.
Koskenniemi, Kimmo, Tapanainen, Pasi and Voutilainen, Atro
-- [1992]. "Compiling and Using Finite-State Syntactic Rules". In Proceedings of the 14th International Conference on Computational Linguistics, Vol. I, Nantes, pp. 156-162.
Smeaton, Alan, Voutilainen, Atro and Sheridan, Paraig
-- [1990]. "The Application of Morpho-Syntactic Language Processing to Effective Text Retrieval". In ESPRIT'90 Conference Proceedings, Commission of the European Communities, Amsterdam: Kluwer Academic Publishers, pp. 619-63
Tapanainen, Pasi and Järvinen, Timo
-- [1994]. "Syntactic Analysis of Natural Language Using Linguistic Rules and Corpus-based Patterns". In Proceedings of Coling-94, ed. COLING 94 Organizing Committee, Kyoto, Japan, pp. 629-634.
Tapanainen, Pasi and Voutilainen, Atro
-- [1994]. "Tagging accurately - Don't guess if you know". In Proceedings of the Fourth ACL Conference on Applied Natural Language Processing, ACL, Stuttgart, pp. 47-52.
Voutilainen, Atro
-- [1993]. "NPtool, a Detector of English Noun Phrases". In Proceedings of the Workshop on Very Large Corpora, Ohio State University, pp. 42-51.
-- [1994c]. "A Noun Phrase Parser of English". In R. Eklund, ed., Proceedings of "9:e Nordiska Datalingvistikdagarna", Dept. of Linguistics, Computational Linguistics, Stockholm University, pp. 301-310.
-- [1995]. "A Syntax-Based Part of Speech Analyser". In Proceedings of the Seventh Conference of the European Chapter of the Association for Computational Linguistics, Dublin: Association for Computational Linguistics, pp. 157-164.
Voutilainen, Atro and Heikkilä, Juha
-- [1994]. "An English Constraint Grammar (ENGCG): A Surface-Syntactic Parser of English". In U. Fries, G. Tottie and P. Schneider, eds., Creating and Using English Language Corpora, Amsterdam and Atlanta: Rodopi, pp. 189-199.
Voutilainen, Atro and Järvinen, Timo
-- [1995]. "Specifying a Shallow Grammatical Representation for Parsing Purposes". In Proceedings of the Seventh Conference of the European Chapter of the Association for Computational Linguistics, Dublin: Association for Computational Linguistics, pp. 210-214.
Voutilainen, Atro and Tapanainen, Pasi
-- [1993]. "Ambiguity Resolution in a Reductionistic Parser". In Proceedings of the Sixth Conference of the European Chapter of the Association for Computational Linguistics 1993, Utrecht, pp. 394-403.
11.1.4. Articles in refereed series published abroad
Karetnyk, David, Karlsson, Fred and Smart, Godfrey
-- [1991]. "Knowledge-Based Indexing of Morphosyntactically Analysed Language". Expert Systems for Information Management 4:1, pp. 1-30.
Karlsson, Fred
-- [1992]. "SWETWOL: A Comprehensive Morphological Analyzer for Swedish". Nordic Journal of Linguistics 15, pp. 1-4
Magnuson, Tina, Granström, Björn, Carlson, Rolf and Karlsson, Fred
-- [1990]. "Phonetic Transcription of a Swedish Morphological Analyzer". Phonum 1, pp. 58-61.
11.1.6. Text books published abroad
Karlsson, Fred
-- [1991]. Gramática básica del finés. Translated by Ursula Ojanen et al. Ediciones de la Universidad Autónoma de Madrid, Colección de Estudios. Madrid. 287 pp.
11.2. Publications published in Finland
11.2.2. Articles in books published in Finland
Carlson, Lauri
-- [1990]. "Kielitiede ja konekääntäminen". In S. Stark and K. Tyystjärvi, eds., Ajattelevatko koneet? - Tekoäly, tietotekniikka ja robotiikka, Helsinki: Suomen tekoälyseuran julkaisuja
Carlson, Lauri and Honkela, Timo
-- [1993]. "Luonnollisen kielen käsittely". In E. Hyvönen, M. Karanta and M. Syrjänen, eds., Tekoälyn ensyklopedia, Helsinki: Gaudeamus, pp. 233-243.
Karlsson, Fred
-- [1993]. "Kielitiede". In E. Hyvönen, M. Karanta and M. Syrjänen, eds., Tekoälyn ensyklopedia, Helsinki: Gaudeamus, pp. 47-52.
11.2.4. Articles in refereed series published in Finland
Anttila, Arto
-- [1990]. "Three Approaches to Describing Conversion". In K. Jokinen and J-O Östman, eds., SKY 1990. Largely Lexical. The 1990 Yearbook of the Linguistic Association of Finland, Helsinki: Suomen kielitieteellinen yhdistys, pp. 111-128.
Birn, Juhani
-- [1990]. "Strikt ytbaserad processyntax". In Andersson E. and Sundman M., eds., Svenskans beskrivning 17, Åbo: Svenska institutionen vid Åbo Akademi, pp. 73-83.
Carlson, Lauri
-- [1990]. "Design of Unification-Based Transfer Lexicon". In K. Jokinen and J-O Östman, eds., SKY 1990. Largely Lexical. The 1990 Yearbook of the Linguistic Association of Finland, Helsinki: Suomen kielitieteellinen yhdistys, pp. 65-76.
-- [1993]. "Dialogue Games with Finnish Clitics". In S. Shore and M. Vilkuna, eds., SKY 1993. The 1993 Yearbook of the Linguistic Association of Finland, Helsinki: Suomen kielitieteellinen yhdistys, pp. 73-96.
Jokinen, Kristiina
-- [1990]. "Derivation and the Two-Level Model". In K. Jokinen and J-O Östman, eds., SKY 1990. Largely Lexical. The 1990 Yearbook of the Linguistic Association of Finland, Helsinki: Suomen kielitieteellinen yhdistys, pp. 9-18.
Karlsson, Fred
-- [1993]. "Jaakon painia". Virittäjä 97:4, pp. 676-677.
Vilkuna, Maria
-- [1990]. "Unification-Based Lexical Transfer". In K. Jokinen and J-O Östman, eds., SKY 1990. Largely Lexical. The 1990 Yearbook of the Linguistic Association of Finland, Helsinki: Suomen kielitieteellinen yhdistys, pp. 49-64.
11.2.5. Dissertations published in Finland
Voutilainen, Atro
-- [1994]. Designing a Parsing Grammar. Publications of the Department of General Linguistics, University of Helsinki, No. 22, 79 pp.
-- [1994]. Three Studies of Grammar-Based Surface Parsing of Unrestricted English Text. Publications of the Department of General Linguistics, University of Helsinki, No. 24, 36 pp. [summary of Voutilainen (1994a) and the chapters written by him in the monograph listed under 11.1.1]
11.2.6. Text books published in Finland
Karlsson, Fred
-- [1994]. Yleinen kielitiede. Helsinki: Yliopistopaino. 302 pp.
-- [1994]. Fenlanyu yufaxue. Suomen peruskielioppi kiinaksi. Helsinki: Yliopistopaino. 348 pp.
Karlsson, Fred and Koskenniemi, Kimmo
-- [1990]. Beta-ohjelma kielentutkijan apuvälineenä. Helsinki: Yliopistopaino. 48 pp.
Karlsson, Fred, Koskenniemi, Kimmo and Kukkonen, Pirkko
-- [1992]. Kielitieteellisen analyysin harjoituksia. Helsinki: Yliopistopaino. 78 pp. [2nd edition 1993]
11.2.7. Non-refereed publications published in Finland
Karlsson, Fred
-- [1994]. Linguistics in the Light of Citation Analysis. Publications of the Department of General Linguistics, University of Helsinki, No. 23. 24 pp.
Voutilainen, Atro, Heikkilä, Juha and Anttila, Arto
-- [1992]. Constraint Grammar of English. A Performance-Oriented Introduction. Department of General Linguistics, University of Helsinki, Publications Number 21. 83 pp.
11.3. Publications in print
Birn, Juhani
-- [forthcoming 1995]. "A Syntax-Geared Approach to Determiners and Pronouns in Swedish Constraint Grammar". In Fred Karlsson, ed., Computational morphosyntax: Report on Research 1989-1994. 60 pp.
Karlsson, Fred and Karttunen, Lauri
-- [forthcoming 1995]. "Subsentential Parsing". In Annie Zaenen, ed., Survey of the State of the Art in Speech and Natural Language Processing. 4 pp.
-- [forthcoming 1995]. "Morphological Defectivity". In Laurie Bauer, ed., Handbook of Morphology. Berlin and New York: Mouton de Gruyter. 15 pp.
Kytö, Merja and Voutilainen, Atro
-- [forthcoming 1995]. "Applying the Constraint Grammar Parser of English to the Helsinki Corpus". ICAME Journal 19, Bergen: Norwegian Computing Centre for the Humanities. 25 pp.
-- [forthcoming 1995]. "Developing the English Constraint Grammar for the Analysis of English Historical Texts". In Abstracts for the 12th International Conference on Historical Linguistics 13-18.8.1995. ICHL, University of Manchester, 1 p.
Voutilainen, Atro
-- [forthcoming 1995]. "The Design of a (Finite-state) Parsing Grammar". In Emmanuel Roche and Yves Schabes, eds., Finite State Devices for Natural Language Processing. Cambridge, Mass.: The MIT Press. 30 pp.
-- [forthcoming 1995]. "A System for the Extraction of Noun Phrases from English Text". Journal of Natural Language Engineering 1:2. 35 pp.
-- [forthcoming 1995]. "Creating a Dependency-Oriented Parse Bank for English". In Fred Karlsson, ed., Computational morphosyntax: Report on Research 1989-1994. 30 pp.