Research Seminar in Language Technology - Spring 2018

Course Information

This is the timetable of the general research seminar in language technology. Presentations will be given primarily by doctoral students, but also by other researchers in language technology, including external guests.

Place and Time:

  • Thursdays 14:15-16, Topelia, B107
    (and sometimes Metsätalo B610)

Registration and Announcements:

  • Please subscribe to our e-mail list lt-research by sending a message to with the following text in the body of your e-mail:

    subscribe lt-research

Tentative Schedule:

January, 18: Johannes Bjerva
Title: One Model to Rule them all: Multitask and Multilingual Modelling for Lexical Analysis
Johannes is visiting us in Helsinki at the moment, and before he leaves he will give us some insights into multilingual and multitask models for various NLP tasks.
Abstract: When learning a new skill, you take advantage of your pre-existing skills and knowledge. For instance, if you are a skilled violinist, you will likely have an easier time learning to play cello. Similarly, when learning a new language, you take advantage of the languages you already speak. For instance, if your native language is Norwegian and you decide to learn Dutch, the shared vocabulary between these two languages will likely make it easier to learn the new language. In my thesis, I looked at learning multiple tasks, learning multiple languages, and the combination of the two, in the context of Natural Language Processing (NLP). Although these two types of learning may seem different on the surface, I show that they share similarities, and answer research questions related to when and why multitask or multilingual modelling is useful.
January, 25
No seminar today!
February, 8: Yves Scherrer and Jörg Tiedemann
Title: NLPL - The Nordic Language Processing Laboratory
In this presentation, we would like to introduce NLPL, the Nordic Language Processing Laboratory, which is a collaboration of academic research groups in Natural Language Processing (NLP) in Northern Europe. Our vision is to implement a virtual laboratory for large-scale NLP research by (a) piloting innovative ways to share high-performance computing and data resources across country borders, (b) pooling competencies within the user community and among expert support teams, and (c) enabling internationally competitive, data-intensive research and experimentation on a scale that would be difficult to sustain on commodity computing resources.
February, 15: Uliana Petrunina
Title: Ambiguity of participles in Russian and its resolution using weights and constraint grammar rules
Uliana is a visiting PhD student from the University of Tromsø. She will stay with us in Helsinki for the spring term.
March, 1: LT Group Meeting
Place: Topelia, B107
March, 8: No seminar (DHN instead)
Digital Humanities in the Nordic countries (DHN)
March, 15: Alessandro Raganato
Title: Building Multilingual Resources and Neural Models for Word Sense Disambiguation
March, 22: Elin Ehsani
Title: Constructing a WordNet for Turkish using manual and automatic annotation and graph-based text analysis using WordNet relations
See also KeNet, the Turkish WordNet.
April, 5: LT Group Meeting
Place: Metsätalo B610
April, 12: Timo Honkela
Title: From disambiguation to meaning negotiation: Mathematical framework and computational modeling
Abstract: In a classical attempt to formalize the relationship between syntax and semantics, it was assumed (notably by Richard Montague) that each word has only one meaning. Since then, a lot of effort has been put into disambiguation. In disambiguation, the basic assumption is that a word or phrase may have several meanings, one of which is the correct one in a particular context. Nowadays, this is treated within statistical machine learning as a classification task. Strictly speaking, the world is not as simple as that. The space of meanings is potentially continuous, or at least much more fine-grained than usually considered. In the Conceptual Space theory (Gärdenfors 2000), the space of meanings is spanned as a continuous space of quality dimensions. From the mathematical point of view, this representation is very different from the classical one, in which the meanings are discrete sets. Furthermore, to be realistic, considering meanings as continuous and contextual is not sufficient: the meaning of each word or expression needs to be considered to some degree subjective. In order to facilitate successful communication, the disambiguation process is necessary but not sufficient; we also need meaning negotiation. This includes consideration of the subjective interpretations as different positions in the continuous space, as well as differences in the spaces of each individual. One mathematical framework to address these phenomena is that of tensors. Contextual and subjective meanings can be modeled as Subject-Object-Context tensors (Honkela et al. 2012). As this approach suggests novel points of view into epistemology through “data-driven philosophy”, it also has potentially substantial practical applications in diverse areas of language technology and its applications, including but not limited to the concept of the Peace Machine (Honkela 2017).
April, 19: Aleksi Sahala
Title: PMI and the semantics of a dead language
April, 26: LT Group Meeting
Place: Metsätalo B610
May, 3: MeMAD UH project team
Title: Methods for managing audiovisual data - MeMAD
Abstract: The MeMAD project provides novel methods for efficient re-use and re-purposing of multilingual audiovisual content. These methodologies revolutionize video management and digital storytelling in broadcasting and media production. We go far beyond the state-of-the-art automatic video description methods by making the machine learn from the human. The resulting description is thus not only a time-aligned semantic extraction of objects but also makes use of the audio and recognizes action sequences. Read more about the project at our website.
In this presentation, we will provide a brief overview of the project and focus on multimodal machine translation, which is one of our work packages. We will discuss recent experiments in relation to the multimodal translation task at WMT 2018, and we are looking forward to getting further feedback and suggestions.
May, 17: No seminar / no group meeting today
LingDA conference
May, 18: Markus Saers (Note: different date and time!!)
Place: Metsätalo B610, Time: 13:00
Title: Transduction grammar induction and training
May, 24: Tuomo Hiipala
Place: Metsätalo B610
Title: Towards computational descriptions of multimodality in diagrams and information graphics
Abstract: Multimodality – or how multiple modes of expression, such as natural language, photographs, drawings and diagrammatic elements co-operate in communicative situations – is currently gaining interest among various fields of study. In this presentation, I discuss how linguistic theories and methods have been applied to the description of multimodality in diagrams and information graphics. I also outline the main challenges in their computational processing and present ongoing work on a new dataset for automatic diagram understanding, whose annotation has been informed by state-of-the-art theories of multimodality.
May, 31: 2 presentations by Emily Öhman and Andras Kornai
Place: Metsätalo B610
Emily Öhman: Sentimentator: Emotion Annotation and Detection
Andras Kornai: Structure Learning in Weighted Languages (joint work with Attila Zseder, Gabor Recski, and Gabor Borbely)
Abstract: We present Minimum Description Length (MDL) techniques for learning the structure of weighted languages. MDL is already widely used both for segmentation and classification tasks, and here we show it can be used to formalize further important tools in the descriptive linguists’ toolbox, including the distinction between accidental and systematic gaps in the data, the detection of ambiguity, the selective discarding of data, and the merging of categories. The talk is a continuation of the first three authors’ MOL13 paper.
August, 2: Adam Jardine, Rutgers University
Place: Metsätalo B610
Time: 3-4pm
Title: Comparing string representations to graph representations for phonological tone
Abstract: What role does representation play in phonological complexity? This talk introduces a notion of computationally restrictive autosegmental grammars, as well as a method for directly comparing the expressivity of these grammars to that of string grammars. Research into the computational nature of phonological patterns has found that the complexity of phonology is extremely restricted. Drawing on patterns from dialects of Japanese and Bantu languages, this talk shows that patterns in tonal phonology do not fit well into string-based complexity classes.