[General Linguistics | Teaching 1999-2000]
Language Technology conducts research on and develops methods for processing natural language. People communicate, process and store knowledge and experience in natural language, as speech or as text. We tend to think of language as simple and transparent. The reality is different: although language has regularities, it is also extremely ambiguous. This becomes clear when one tries to teach or program a computer to understand or produce natural language.
The description of a language like Finnish or English for parsing or generation requires a good mastery of the language and its structure. Language technology also poses many challenges for computation. It is an interesting and inspiring interdisciplinary field for students interested in language from both humanistic and technical points of view.
Language technology is one of the two subjects of the Department of General Linguistics. Students in Language Technology obtain many-sided theoretical insights and skills in the field and its methods, and specialise in one or more of its many domains of application.
The subject leads to a Master of Arts degree in Language Technology. Due to the spread of information and communications technology in all fields, the mass of available information is growing explosively, and degrees in Language Technology are in great demand. The program aims at arranging for most students to do their Master's thesis as paid work for a company (a potential employer).
In addition to applications, one can also specialise in Language Technology research and development. A considerable number of research and development projects are under way within a national research program. Students can work on projects already at the undergraduate level. A student with an undergraduate degree can apply for graduate study, which is being organised as a joint undertaking with other Nordic countries.
To expand opportunities for study in Language Technology, a network of Studies in Language Technology has been founded as a cooperation between a number of departments in different universities. The network is nationwide and includes departments and laboratories from eight universities, including two technical universities. The network aims to expand and develop university studies in language technology to meet present and future demands. This is done by including language technology in degrees in neighboring disciplines as courses or as a minor subject, and by expanding the study of language technology as a major subject. The network receives special funding from the Ministry of Education for 2001-2003.
Units involved in the network include University of Helsinki (Department of General Linguistics, Department of Phonetics, Center for Media Education, Department of Computer Science), University of Joensuu (Department of Foreign Languages / General Linguistics and Phonetics), University of Jyväskylä (Department of Applied Linguistics and Department of Computer Science), University of Oulu (Department of English), Technical University of Tampere (Department of Digital and Computing Technology, Department of Software Technology and Signal Processing Laboratory), University of Tampere (Department of Philology I / German Philology, Department of Information Research, Department of Translation Studies, Department of Speech Sciences, Department of Finnish and General Linguistics, Department of Computer Science), Helsinki University of Technology (Laboratory of Acoustics and Sound Processing, Laboratory of Information Technology, Laboratory of Computing Technology) and Turku University (Department of Finnish and General Linguistics / Phonetics, Research Unit for Cognitive Neuroscience, Department of Classical and Romance Languages / Translation and Interpretation of French).
Studies in Language Technology consist of basic, intermediate, and advanced studies.
A minor subject in Computer Science is required as part of a Master's degree in Language Technology, taking up at least 15 credits, usually 25 credits (basic studies, or approbatur). Students with a background in languages can settle for a less extensive minor in computer science. In general, it is advisable to also include intermediate studies in computer science in the degree (cum laude, 35 credits, which includes the approbatur). The intended specialisation should be taken into account when choosing courses in computer science. Individual courses in computer science can also be included in Language Technology studies by agreement with the professor.
Students with basic studies in General Linguistics can directly continue into intermediate studies in Language Technology. A student who wants to specialise in both General Linguistics and Language Technology can take Language Technology as a minor or vice versa, as long as courses with identical content are not included in the two studies. Alternatively, students can include a good number of General Linguistics courses in Language Technology studies. Basic studies in Language Technology include a number of methods courses and an introduction to Language Technology, as specified below. A student continuing from basic studies in General Linguistics places these courses within intermediate studies in Language Technology, if they are not covered by courses in other studies.
Students transferring from other subjects to Language Technology majors should note that they have to include basic studies in Computer Science in their degree. The degree usually includes courses in General Linguistics as well. Students who start in Language Technology do well to include these subjects in their studies from the start.
Individual agreements on combining requirements are possible, especially at the advanced and intermediate levels. It is best to agree on combinations and changes in plans with the professor ahead of time. Combinations which have been accepted previously are recorded and available to students.
Language Technology as a minor subject is available within the Language Technology Studies network for students in many different fields, for instance, languages, translation, phonetics, cognitive science and computer science. Individual courses in Language Technology can also be included in those studies where applicable. Students majoring in Language Technology can include special courses in these or other subjects as specified below, or include them as minor subjects in their degree.
Language Technology offers courses in linguistic computing for all language students in the faculty. Such courses include Ctl130, Ctl160 and Ctl121. They can be included in the major subject (when allowed), or they can form a short minor subject in Language Technology comprising 10-14 credits.
There is an interim period of two academic years, during which students can complete their studies according to either the old or the new requirements.
The basic studies and short minor subjects are compiled by prof. Carlson, and the intermediate and advanced ones by prof. Koskenniemi. When applying for registration, the student must supply a transcript from the study register detailing the courses to be included. The teacher responsible for a course passes students on the basis of a test held in connection with the course or in one common repeat test. Any further queries concerning passing a course or requirement, or getting credit in other ways, should be directed to the person indicated as responsible for the requirement.
This section outlines the course of studies in Language Technology. Dependencies between different requirements are marked in italics.
The student must show for each section that they have acquired the relevant knowledge and skills to the extent indicated by the number of credits, either as courses in Language Technology or an equivalent. In the latter case, if the studies are included in some other degree requirement, the student must take a corresponding amount of other courses in Language Technology.
The model can be adapted individually by agreement. The requirements are detailed in the next section.
Ctl111 Short minor (10-14 credits) 490050-4
The short minor is intended for students of other departments who will not complete full-length basic studies. At a minimum, the short minor must include the knowledge and skills required under points 2-3 of basic studies (Ctl100).
Ctl101 BASIC LINGUISTICS FOR LANGUAGE TECHNOLOGY (1 credit)
A concise introduction to general linguistics geared to Language Technology applications. Can be replaced by an introductory course of general linguistics. Responsible: prof. Carlson.
Ctl102 BASIC PHONETICS FOR LANGUAGE TECHNOLOGY (1 credit)
A concise introduction to phonology geared to Speech Technology applications. Can be replaced by an introductory course in phonetics. Responsible: Martti Vainio.
Ctl103 EXERCISES IN MORPHOPHONOLOGY (1 credit)
Ctl104 EXERCISES IN MORPHOSYNTAX (1 credit)
Ctl105 EXERCISES IN SEMANTICS AND PRAGMATICS (1 credit)
The exercises can be replaced by the general linguistics courses Cyk130, Cyk140 and Cyk150, or taken as directed self-study through the language technology studies network. Responsible: prof. Carlson.
Cyk106 EXERCISES IN PHONETICS (1 credit)
The exercises can be replaced by a practical course in phonetics, or taken as directed self-study through the language technology studies network. Responsible: Martti Vainio.
Ctl130 INTRODUCTION TO UNIX OPERATING SYSTEM (1 credit) 490130-9
The course does not presuppose prior computing skills, and it gives basic skills in using the Unix operating system. Responsible: prof. Koskenniemi.
Ctl160 COMPUTER PROCESSING OF TEXT CORPORA (2 credits) 490160-0
A course and exercises for learning to produce and process computer text corpora and search them for linguistic data. Literature: B. Kernighan, R. Pike, The UNIX Programming Environment; A. Aho, B. Kernighan, P. Weinberger, The AWK Programming Language; D. Cameron, B. Rosenblatt, Learning GNU Emacs; R. Schwartz, Learning Perl. The course presupposes rudimentary programming skills and introduction to the Unix operating system (e.g. Ctl130). The course is suited for language students. Responsible: prof. Koskenniemi.
Ctl120 DISCRETE MATHEMATICS FOR LANGUAGE TECHNOLOGY (2 credits) 490120-2
Lectures and exercises. Literature: B. Partee, A. ter Meulen, R. Wall, Mathematical Methods in Linguistics; J. Merikoski, A. Virtanen, P. Koivisto, Diskreetti matematiikka I, Matematiikan, tilastotieteen ja filosofian laitos, Tampereen yliopisto, Nro B 42, 1998. The course does not presuppose skills beyond basic secondary school mathematics. It is useful as background for minor subject studies in computer science. To be taken at the outset of studies. Responsible: prof. Koskenniemi.
Ctl121 (2 credits) STATISTICS FOR CORPUS LINGUISTICS
Basic concepts and practical methods for statistical processing of text corpora. Literature: e.g. T. McEnery, A. Wilson, Corpus Linguistics; D. Biber, S. Conrad, R. Reppen, Corpus Linguistics; R. Garside, G. Leech, A. McEnery, Corpus Annotation; M. Oakes, Statistics for Corpus Linguistics. The course is suited for language students. Responsible: prof. Koskenniemi.
Ctl122 (2 credits) PROBABILITIES AND PATTERN RECOGNITION FOR LANGUAGE AND SPEECH TECHNOLOGY
Background for understanding statistics used in language and speech technology and machine learning. Literature: J. Tou, R. Gonzalez, Pattern Recognition Principles, Addison-Wesley 1974; L. Rabiner, B.-H. Juang, Fundamentals of Speech Recognition, Prentice Hall 1993; R. Schalkoff, Pattern Recognition: Statistical, Structural, and Neural Approaches, J. Wiley 1992; S. Theodoridis, K. Koutroumbas, Pattern Recognition, Academic Press 1999. Responsible: prof. Koskenniemi.
Ctl190 INTRODUCTORY COURSE OF LANGUAGE TECHNOLOGY (2-5 credits) 490190-1
The course surveys the possibilities, applications and methods of Language Technology. The course and final exam count for 2 credits; together with the exercises (Ctl191) and an essay (Ctl192), the total is 5 credits. Responsible: prof. Koskenniemi.
Ctl132 AUTOMATIC PHONOLOGICAL AND MORPHOLOGICAL ANALYSIS (2 credits)
Lectures or book exam. Literature: R. Sproat, Morphology and Computation; articles. Responsible: prof. Koskenniemi.
Ctl142 AUTOMATIC SYNTACTIC ANALYSIS (2 credits)
Lectures or book exam. Literature: e.g. F. Karlsson, A. Voutilainen, J. Heikkilä, A. Anttila, eds., Constraint Grammar, in part; articles.
Vfo121 (1 credit) INTRODUCTION TO THE ACOUSTICS OF SPEECH, see phonetics
VfoXXX (2 credits) COMPUTER PROCESSING OF SPEECH CORPORA, see phonetics
Vfo133 (2 credits) PRACTICAL COURSE OF PHONETICS, see phonetics
Ctl161 (1-2 credits) THEORY OF LANGUAGE LEARNING FOR LANGUAGE TECHNOLOGY, see UJy
Ctl162 (1-2 credits) INTRODUCTION TO THE ASSESSMENT OF LANGUAGE SKILLS, see UJy
Ctl163 (1-2 credits) PRODUCTION OF LANGUAGE LEARNING MATERIALS, see UJy
Ctl116 (1-2 credits) COMPUTER ASSISTED TRANSLATION
See MonAKO. Responsible: prof. Carlson.
Ctl118 (1-2 credits) COMPUTER ASSISTED TERMINOLOGY
See MonAKO. Responsible: prof. Carlson.
Ctl134 (2 credits) METHODS OF INFORMATION RETRIEVAL TaY Po4.
Gives an overall picture of the structure of documents and databases, information retrieval models, evaluation of information retrieval, and directions in IR research. Literature: Buckland, Information and Information Systems, Greenwood Press 1991; Kuhlthau, Seeking Meaning, Ablex 1993; Choo, Information Management for the Intelligent Organization, Information Today 1995. Responsible: prof. Koskenniemi.
Ctl210 (3 credits) DOCUMENT PROCESSING FOR LANGUAGE TECHNOLOGY
Topics include, among others, using the Emacs editor, managing large files, Perl scripts, using morphological analysis programs, producing LaTeX documents, producing SGML documents under Emacs PSGML mode, Unix Make, RCS version management under Emacs, HTML, SGML, XML. Responsible: prof. Koskenniemi.
Ctl211 (2 credits) BASIC PROGRAMMING FOR LANGUAGE TECHNOLOGY
The aim is to learn to use basic programming constructs (among others data types, control structures, loops, subprograms, input/output) and to learn good programming habits. Programming language: Perl. Responsible: prof. Koskenniemi.
Ctl270 PROGRAMMING LANGUAGE PROLOG (2 credits) 490170-7
Lectures and exercises. The course teaches the basics of Prolog. Literature: W.F. Clocksin, C.S. Mellish, Programming in Prolog; L. Sterling, E. Shapiro, The Art of Prolog; F. Pereira, S. Shieber, Prolog and Natural-Language Analysis. Presupposes basic discrete mathematics (Ctl120) and introduction to Unix (e.g. Ctl130). Can be replaced with the course of symbolic programming in Computer Science. Responsible: prof. Koskenniemi.
Ctl271 (1 credit) LISP
The course teaches the basics of Lisp. Can be replaced with the course of symbolic programming in Computer Science. Responsible: prof. Koskenniemi.
Ctl253 PARSING METHODS IN LANGUAGE TECHNOLOGY (2 credits) 490253-1
The course teaches the central parsing algorithms and processing methods in Language Technology. Lectures, exercises. Literature: parts of e.g. D. Jurafsky, J. Martin, Speech and Language Processing, Prentice-Hall 2000; G. Gazdar, C. Mellish, Natural Language Processing in Prolog; S. Shieber, An Introduction to Unification-based Approaches to Grammar; C. Pollard, I. Sag, Information-Based Syntax and Semantics; R. Sproat, Morphology and Computation. Presupposes basic discrete mathematics (Ctl120), Unix (Ctl130) and Prolog (Ctl170), introductory knowledge of computer science, and basics of syntax, morphology, and phonology (at least Cyk110, Cyk130, Cyk140). Responsible: prof. Koskenniemi.
Ctl254 STATISTICAL METHODS OF LANGUAGE TECHNOLOGY (2 credits)
Introduction to the use of statistical methods in Language Technology. Literature: C. Manning, H. Schütze, Foundations of Statistical Natural Language Processing, MIT Press 1999. Responsible: prof. Koskenniemi.
Ctl290 INTRODUCTORY SEMINAR (2 credits) 490290-0
Talk, poster, or web page. Majors: plan of candidate thesis and presentation of the plan. Responsible: prof. Koskenniemi.
Ctl290 WRITTEN ESSAY (2 credits) 490290-0
Independent work on e.g. processing of a text corpus and a written report of ca. 20 pages. For majors, the 2+2 credits for the introductory seminar and the written essay together make up the candidate's thesis, which constitutes the subject of the maturity exam. The proseminar is optional for minor-subject students. Responsible: prof. Koskenniemi.
Ctl285 PRACTICE (1-2 credits) 490285-8
The aim is to gain work experience, including teaching. Can be passed by acting as a course assistant under the guidance of a lecturer or in some other supervised teaching task, or through office practice, to be reported in writing. Responsible: prof. Koskenniemi.
Vfo235 AUDITORY PHONETICS AND SPEECH PERCEPTION (2-3 credits), see phonetics
Vfo242 ARTICULATORY PHONETICS (2-3 credits), see phonetics
Ctl281 BASICS OF SPEECH RECOGNITION (2 credits), see HUT. Responsible: Martti Vainio.
VfoXXX BASICS OF SPEECH SYNTHESIS (2-3 credits), see phonetics
Ctl263 WEB BASED PEDAGOGY FOR LANGUAGE LEARNING (2-4 credits), see UJy
Ctl211 MACHINE TRANSLATION (2-4 credits)
Introduction to theoretical models and practical solutions in machine translation and computer assisted translation. Responsible: prof. Carlson.
Ctl311 INTELLECTUAL PROPERTY RIGHTS (1-2 credits). Responsible: prof. Koskenniemi.
Ctl312 COMMERCIAL LANGUAGE TECHNOLOGY (1-2 credits)
The aim is to give a general picture of commercial activities based on language technology in Finland, the Nordic countries and globally, and to go into the special problems of turning linguistics and language technology into products and services. Responsible: prof. Koskenniemi.
Ctl335 WORK EXPERIENCE (1-2 credits)
Can be completed by holding a special course, e.g. on the topic of the Master's thesis, or through work experience in a language technology company, to be reported in writing. Responsible: prof. Koskenniemi.
Ctl330 SEMINAR (2 credits) 490380-2
The seminar is intended for students preparing Master's or PhD theses. Teaches presentation and academic discussion skills. Responsible: prof. Koskenniemi.
Ctl331 PRO GRADU THESIS (20 credits)
Ctl332 HISTORY OF LANGUAGE TECHNOLOGY (2 credits). Responsible: prof. Koskenniemi.
Ctl310 FINITE STATE AUTOMATA (2-5 credits)
Literature: e.g. E. Roche, Y. Schabes (eds.), Finite-State Language Processing; B. Watson, Taxonomies and Toolkits of Regular Language Algorithms; articles. Responsible: prof. Koskenniemi.
Ctl350 THEORY OF PARSING (2-5 credits) 490350-1
Lectures and exercises. Literature: parts of J. Hopcroft, J. Ullman, Introduction to Automata Theory, Languages, and Computation; A. Aho, R. Sethi, J. Ullman, Compilers: Principles, Techniques, and Tools; M. Tomita, Efficient Parsing for Natural Language. Presupposes knowledge of methods of Language Technology, e.g. Ctl253, and a solid background in computer science. Responsible: prof. Koskenniemi.
Ctl362 COMPUTATIONAL MODELS OF SYNTAX AND SEMANTICS (2-3 credits)
Unification based models of syntax and semantics, logical semantics, discourse representation theory. Responsible: prof. Carlson.
Ctl363 NATURAL LANGUAGE GENERATION (2-3 credits)
Literature: E. Reiter, R. Dale, Building Natural Language Generation Systems. Responsible: Wilcock.
Ctl364 DIALOGUE MANAGEMENT (2-3 credits)
Theory of dialogue modeling and practical applications. Literature:... Responsible: prof. Carlson.
Ctl365 MACHINE LEARNING (2-3 credits)
Inductive learning, learning of classifiers, applications. Literature: T. Mitchell, Machine Learning. Responsible: prof. Koskenniemi.
Ctl381 ADVANCED COURSE IN SPEECH RECOGNITION (2-4 credits), see HUT, TUT
VfoXXX ADVANCED COURSE IN SPEECH SYNTHESIS (2-4 credits), see phonetics
Ctl391 INFORMATION RETRIEVAL AND INFORMATION EXTRACTION (2-4 credits)
Basics of information retrieval and information extraction for Language Technology: algorithms, applications, user aspects. Responsible: prof. Koskenniemi.
Ctl392 PROCESSING OF STRUCTURED DOCUMENTS (3 credits) 581290-5
Models and languages for searching, modifying, and transforming structured (XML) documents. Presupposes courses in the XML metalanguage, formal language theory, and HTML, as well as sound programming skills. Responsible: prof. Koskenniemi.
Ctl393 (2-3 credits) COMPUTER AIDED LANGUAGE LEARNING (CALL), see UJy. Responsible: prof. Koskenniemi.