This is the timetable of the general research seminar in language technology. Presentations will be given primarily by doctoral students, but also by other researchers in language technology, including external guests.
September, 6: LT Group Meeting
September, 13: LT Reading Group
Phrase-Based & Neural Unsupervised Machine Translation, Lample et al., EMNLP 2018
September, 20: LT Reading Group
A Neural Interlingua for Multilingual Machine Translation; Learning Joint Multilingual Sentence Representations with Neural Machine Translation
September, 27: Stergios Chatzikyriakidis
Title: What kind of Natural Language Inference are NLP systems learning?
September, 28: FoTran 2018
October, 4: LT Seminar: Senka Drobac
Title: Improving OCR of historical newspapers and journals published in Finland by adding Swedish training data
October, 11: LT Group Meeting
October, 25: Aarne Talman and Miikka Silfverberg
Aarne Talman: State-of-the-art Natural Language Inference Systems Fail to Capture the Semantics of Inference
November, 8: LT Seminar and Reading Group
Presenter: Talha Çolakoğlu
Title: Tackling the Problem of Non-Canonical Text Normalization for Turkish Using Deep Learning Techniques
Abstract: Text normalization is a crucial aspect of natural language processing, since a large portion of today's text data is available only as noisy, user-generated content. For Turkish text normalization, the literature recognizes five main problems: letter case correction, diacritic restoration, vowel restoration, accent normalization, and spelling correction. The current state of the art takes a cascaded approach that combines various rule-based, lexical-lookup-based, and statistical techniques, and it does not generalize well. Previous work has shown that deep learning solutions to NLP tasks generalize much better. Deep learning also makes it possible to generate semantic data attached to words and sentences, which was unavailable to previous approaches, and to use it to solve text normalization problems.
November, 15: Raul Vazquez and Tommi Jauhiainen
Raul Vazquez: Multilingual NMT with a language-independent attention bridge
Tommi Jauhiainen: Language Identification of Digital Text
November, 22: Lyubov Nesterenko
Title: Passive-regressive: grammatical voice alternations modeling and feature analysis
Abstract: Voice is a complex phenomenon that is highly pragmatic in nature. Its function is to map the semantic structure of the verb (its semantic roles) onto the syntactic structure.
The following sentences describe the same situation but differ in their voice construction:
a. John bought this house for $250000 in 1980.
b. This house was bought by John for $250000 in 1980.
The linguistics literature offers many explanations (some of them controversial) for this phenomenon. The goal of the study is to determine which contextual and semantic factors influence the choice of voice construction (active vs. passive) and to test the hypotheses proposed by linguists. I model the problem with logistic regression and use parallel text data to compare voice alternations across languages in similar contexts.
November, 29: Ana V. Gonzalez Garduño
Title: Goal-Oriented Chit-Chat: Hybrid Approach for Improving Dialogue Generation
Abstract: This work uses query ranking and information retrieval to better inform hierarchical encoder-decoder models. The talk will also touch on reinforcement learning for dialogue management and on evaluation practices.
December, 13: Umut Sulubacak
NOTE: The seminar is in Metsätalo B610 (coffee room on the 6th floor)
Title: The Many Faces of Multimodal Machine Translation: A Budding Survey
December, 20: canceled!