Research Seminar in Language Technology - Spring 2017

Course Information

This is the timetable of the general research seminar in language technology. Presentations will be given primarily by doctoral students, but also by other researchers in language technology, including external guests.

Place and Time:

  • Thursdays 14-16, U40, room 25

Registration and Announcements:

  • Please subscribe to our e-mail list lt-research by sending a message with the following text in the body of your e-mail:

    subscribe lt-research

Tentative Schedule:

January 19: Jouna Pyysalo
Title: Proto-Indo-European Lexicon: The generative etymological dictionary of Indo-European languages
February 2: Arvi Hurskainen
Title: Machine translation between structurally different languages
February 16: Anssi Yli-Jyrä
Title: Why is a derivational model of discontinuous parsing desirable?

Abstract: There are good reasons to study probabilistic parsing in the more general case where the trees are discontinuous, containing long-distance dependencies and dislocations that break the constraints of projective constituents or dependencies. But probabilistic models for discontinuous parsing are more challenging to design than those for continuous parsing. For example, shift-reduce parsing has no principled way to compute the probability of a parse.

We give the first model that defines a plausible probability distribution over the space of all discontinuous parses. A latent-variable probabilistic grammar is generalized directly to discontinuous parsing. The chosen formalization covers both constituent parsing and dependency parsing and can be used for practical probabilistic reranking of parses.

March 2: Timo Honkela
Title: Machine Learning and Language Technology in the Peace Machine Concept
March 16: Yves Scherrer
Title: Language technology for closely related language varieties - Case studies on normalization and tagging
March 30: Seppo Nyrkkö
Title: An R experiment with learning (and forgetting) a semantic model
April 20: Jörg Tiedemann
Title: What is neural MT and what can we expect from it? (organised as a joint event with Kites and the MT special interest group)
April 27: Geraint A. Wiggins (Computational Creativity Lab, Queen Mary University of London)
Title: Creativity, deep symbolic learning, and the Information Dynamics of Thinking
Abstract: I present a hypothetical theory of cognition based on the principle that minds/brains are information processors and compressors that are sensitive to certain measures of information content, as defined by Shannon (1948). The model is intended to help explicate processes of anticipatory and creative reasoning in humans and other higher animals. It is motivated by the evolutionary value of prediction in an information-overloaded world.

The Information Dynamics of Thinking (IDyOT) model brings together symbolic and non-symbolic cognitive architectures, by combining sequential modelling with hierarchical symbolic memory, in which symbols are grounded by reference to their perceptual correlates. This is achieved by a process of chunking, based on boundary entropy, in which each segment of an input signal is broken into chunks, each of which corresponds with a single symbol in a higher level model. Each chunk corresponds with a temporal trajectory in the complex Hilbert space given by a spectral transformation of its signal; each symbol above each chunk corresponds with a point in a higher space which is in turn a spectral representation of the lower space. Norms in the spaces admit measures of similarity, which allow grouping of categories of symbol, so that similar chunks are associated with the same symbol. This chunking process recurses “up” IDyOT’s memory, so that representations become more and more abstract.
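The chunking-by-boundary-entropy step can be sketched in a few lines. This is a toy illustration only, not the IDyOT implementation: the bigram estimator, the fixed threshold, and the symbol alphabet are all assumptions made for the sake of the example.

```python
from collections import Counter, defaultdict
import math

def boundary_entropy_chunks(seq, threshold):
    """Split a symbol sequence into chunks at points of high successor entropy.

    A bigram model is estimated from the sequence itself; a chunk boundary is
    placed after any symbol whose successor distribution has entropy at or
    above the threshold (high uncertainty about what comes next suggests the
    edge of a perceptual unit).
    """
    # Count successors of each symbol.
    succ = defaultdict(Counter)
    for a, b in zip(seq, seq[1:]):
        succ[a][b] += 1

    def entropy(sym):
        counts = succ[sym]
        total = sum(counts.values())
        return -sum((c / total) * math.log2(c / total)
                    for c in counts.values())

    chunks, current = [], [seq[0]]
    for i in range(1, len(seq)):
        if entropy(seq[i - 1]) >= threshold:   # uncertain continuation: cut here
            chunks.append(current)
            current = []
        current.append(seq[i])
    chunks.append(current)
    return chunks

# 'b' is the only symbol with uncertain successors (it is followed by both
# 'c' and 'd'), so boundaries fall after each 'b'.
chunks = boundary_entropy_chunks(list("abcabcabd"), 0.5)
```

Each resulting chunk would then, in IDyOT's terms, be mapped to a single symbol in the next layer up, and the procedure repeated on that more abstract sequence.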

It is possible to construct a Markov Model along a layer of this model, or up or down between layers. Thus, predictions may be made from any part of the structure, more or less abstract, and it is in this capacity that IDyOT is claimed to model creativity, at multiple levels, from the construction of sentences in everyday speech to the improvisation of musical melodies.

IDyOT’s learning process is a kind of deep learning, but it differs from the more familiar neural network formulation because it includes symbols that are explicitly grounded in the learned input, and its answers will therefore be explicable in these terms.

In this talk, I will explain and motivate the design of IDyOT with reference to various aspects of music, language and speech processing, and to animal behaviour.

May 11: dry-runs for NoDaLiDa presentations
Senka Drobac and Pekka Kauppinen: OCR and post-correction of historical Finnish texts
Tommi Jauhiainen, Krister Lindén and Heidi Jauhiainen: Evaluation of language identification methods using 285 languages
Anssi Yli-Jyrä: The Power of Constraint Grammars Revisited
June 1: Lidia Pivovarova
Title: Convolutional neural networks for text classification
Abstract: PULS is a news surveillance project targeting the business domain. Our news monitoring system collects about 6-8 thousand news items per day and tracks hundreds of thousands of entities and activities. Many of the tasks we face in the PULS project are text classification problems: classification by topic, sentiment, or event type. In this talk I will describe how these problems can be solved using convolutional neural networks and present a series of our most recent experiments.
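The kind of model the talk describes, a convolutional classifier with max-over-time pooling over word embeddings, can be sketched in plain NumPy. All dimensions and weights below are illustrative placeholders, not the PULS system:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (assumptions for illustration).
vocab, emb_dim, filt_width = 100, 16, 3
n_filters, n_classes, sent_len = 8, 4, 10

E = rng.normal(size=(vocab, emb_dim))                   # embedding table
W = rng.normal(size=(n_filters, filt_width * emb_dim))  # convolution filters
V = rng.normal(size=(n_classes, n_filters))             # classifier weights

def classify(token_ids):
    """Return class probabilities for one sentence of token ids."""
    x = E[token_ids]                                    # (sent_len, emb_dim)
    # Slide each filter over every window of filt_width consecutive words.
    windows = np.stack([x[i:i + filt_width].ravel()
                        for i in range(len(token_ids) - filt_width + 1)])
    conv = np.maximum(windows @ W.T, 0)                 # ReLU feature maps
    pooled = conv.max(axis=0)                           # max-over-time pooling
    logits = V @ pooled
    p = np.exp(logits - logits.max())                   # stable softmax
    return p / p.sum()

probs = classify(rng.integers(0, vocab, size=sent_len))
```

In a real system the filters and embeddings would be trained end-to-end on labelled news items; the sketch only shows the forward pass that turns a token sequence into a distribution over topic, sentiment, or event-type classes.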