Research Seminar in Language Technology - Autumn 2017

Course Information

This is the timetable of the general research seminar in language technology. Presentations will be given primarily by doctoral students, but also by other researchers in language technology, including external guests.

Place and Time:

Registration and Announcements:

  • Please subscribe to our e-mail list lt-research by sending a message to with the following text in the body of your e-mail:

    subscribe lt-research

Tentative Schedule:

September 14
Title: Reports from conferences and on-going/coming projects
September 21: Anssi Yli-Jyrä
Title: A High Coverage (99.994%) Limit on Two-Sided Embedding in Universal Dependencies Treebanks
Abstract: A recently proposed encoding for noncrossing digraphs can be used to implement generic inference over families of these digraphs and to carry out first-order factored dependency parsing. It is now shown that the recent proposal can be substantially streamlined without information loss. The improved encoding is less dependent on hierarchical processing. The encoding gives rise to a high-coverage bounded-depth approximation of the space of noncrossing digraphs. This subset is presented elegantly by a finite-state machine that recognizes an infinite set of encoded graphs. The set includes more than 99.99% of the 0.6 million noncrossing graphs obtained from the UDv2 treebanks through planarisation. Rather than taking the low probability of the residual as a flat rate, it can be modelled with a joint probability distribution that is factorised into two underlying stochastic processes – the sentence length distribution and the related conditional distribution for deep nesting. This model points out that deep nesting in the streamlined code requires extreme sentence lengths. High depth is categorically excluded at common sentence lengths but emerges slowly at infrequent lengths, prompting further inquiry.
September 28: Mika Hämäläinen
Title: From Poem Generation to an Online Dictionary
October 5: Seppo Nyrkkö
Title: Ontology-related, learning tagging model on parsed text
October 19: Niklas Laxström
Title: Continuous machine translation improvement with feedback loops
Abstract: In order to decide what to research next, I have been studying the Wikipedia article translation process. The throughput of the process depends on many factors, such as the availability of source text, motivated translators, ease of use, and the quality of automatic machine translation. I will present the current system and how it has already been optimized. I will then explore, based on previous work, how the current use of machine translation could be changed to be more interactive in order to create a positive feedback loop that improves the system as it is being used.
November 1: FinMT (NOTE: Wednesday!)
Title: Workshop on Machine Translation
November 16: Mark Granroth-Wilding
Title: Unsupervised learning of cross-lingual representations with no prior linguistic knowledge
November 23: Aarne Talman
NOTE: unusual time: 14:45
NOTE: unusual place: U40, common room on the 6th floor (B610)
Title: Natural Language Inference - Another Triumph for Deep Learning?

Abstract: The main task of Natural Language Inference (NLI) research is to build computational systems that are able to recognise valid inferences as well as contradictions in text input. NLI is of central importance when studying natural language understanding, computational semantics, and artificial intelligence more generally, as humans are clearly capable of quite sophisticated natural language inferences. Traditionally, NLI researchers have focused on rule-based approaches; recently, however, after the publication of the Stanford NLI corpus containing more than 570K sentence pairs, there has been growing interest in deep learning approaches to NLI. These approaches offer significant benefits over the rule-based ones, but they are still in many ways far from solving the problem.

In this talk I explain the two approaches to NLI and outline their strengths and weaknesses. I argue that the current approaches are still highly limited and far from solving the problem of NLI. I then present some directions I plan to take in my new PhD research project, where I aim to combine the strengths of both of the above-mentioned approaches.

December 14: Jörg Tiedemann
NOTE: unusual place: Metsätalo, common room on the 6th floor (B610)
Title: Learning to understand languages with cross-lingual grounding
Abstract: Translated texts are semantic mirrors of the original text, and the significant variation that we can observe across languages can be used to disambiguate the meaning of a given expression using the linguistic signal that is grounded in translation. We are interested in massively parallel corpora covering hundreds to a thousand different languages, and in how they can be applied as implicit supervision to learn abstractions that could lead to significant improvements in natural language understanding. As a side effect, we can also see how multilingual models pick up relationships between languages, building a continuous space that represents reasonable language clusters. I will talk about some initial results and plans for the future, and I would like to get your feedback on those ideas.