This is the timetable of the general research seminar in language technology. Presentations are given primarily by doctoral students, but also by other researchers in language technology, including external guests.
Title: Reports from conferences and ongoing/upcoming projects
September, 21: Anssi Yli-Jyrä
Title: A High Coverage (99.994%) Limit on Two-Sided Embedding in
Universal Dependencies Treebanks
Abstract: A recently proposed encoding for noncrossing digraphs can be used to implement
generic inference over families of these digraphs and to carry out first-order
factored dependency parsing. It is now shown that the recent proposal can be
substantially streamlined without information loss. The improved encoding is
less dependent on hierarchical processing.
The encoding gives rise to a high-coverage bounded-depth approximation of the
space of noncrossing digraphs. This subset is elegantly represented by a
finite-state machine that recognizes an infinite set of encoded graphs. The set
includes more than 99.99% of the 0.6 million noncrossing graphs obtained from
the UDv2 treebanks through planarisation.
Rather than taking the low probability of the residual as a flat rate, it can be
modelled with a joint probability distribution that is factorised into two
underlying stochastic processes – the sentence length distribution and the
related conditional distribution for deep nesting. This model points out that
deep nesting in the streamlined code requires extreme sentence lengths. High
depth is categorically absent at common sentence lengths but emerges slowly at
infrequent lengths, which prompts further inquiry.
September, 28: Mika Hämäläinen
Title: From Poem Generation to an Online Dictionary
October, 5: Seppo Nyrkkö
Title: Ontology-related, learning tagging model on parsed text
October, 19: Niklas Laxström
Title: Continuous machine translation improvement with feedback loops
Abstract: In order to decide what to research next, I have been studying the Wikipedia article translation process. The throughput of the process depends on many factors, such as the availability of source text, motivated translators, ease of use, and the quality of automatic machine translation. I will present the current system and how it has already been optimized. I will then explore, based on previous work, how the current use of machine translation could be changed to be more interactive in order to create a positive feedback loop that improves the system as it is being used.
November, 1: FinMT (NOTE: Wednesday!)
Title: Workshop on Machine Translation
November, 16: Mark Granroth-Wilding
Title: Unsupervised learning of cross-lingual representations with no prior linguistic knowledge
November, 23: Aarne Talman
NOTE: unusual time: 14:45
NOTE: unusual place: U40, common room on the 6th floor (B610)
Title: Natural Language Inference - Another Triumph for Deep Learning?
Abstract: The main task of Natural Language Inference (NLI) research is to build computational systems that are able to recognise valid inferences as well as contradictions from text input. NLI is of central importance when studying natural language understanding, computational semantics, and artificial intelligence more generally, as humans are clearly capable of quite sophisticated natural language inferences. Traditionally, NLI researchers have focused on rule-based approaches; however, after the recent publication of the Stanford NLI corpus, containing more than 570K sentence pairs, there has been a growing interest in deep learning approaches to NLI. These approaches offer significant benefits over the rule-based approaches; however, they are still in many ways far from solving the problem.
In this talk, I explain the two approaches to NLI and outline their strengths and weaknesses. I argue that the current approaches are still highly limited and far from solving the problem of NLI. I then present some directions I plan to take in my new PhD research project, in which I aim to combine the strengths of both of the above-mentioned approaches.
December, 14: Jörg Tiedemann
NOTE: unusual place: Metsätalo, common room on the 6th floor (B610)
Title: Learning to understand languages with cross-lingual grounding
Abstract: Translated texts are semantic mirrors of the original text, and the
significant variations that we can observe across languages can be
used to disambiguate the meaning of a given expression using the
linguistic signal that is grounded in translation. We are interested
in massively parallel corpora consisting of hundreds up to a thousand
different languages and how they can be applied as implicit
supervision to learn abstractions that could lead to significant
improvements in natural language understanding. As a side-effect, we
can also see how multilingual models can pick up relationships between
languages, building a continuous space that represents reasonable language
clusters. I will talk about some initial results and plans for the
future and I would like to get your feedback about those ideas.