Research Seminar in Language Technology - Autumn 2015

Course Information

This is the timetable of the general research seminar in language technology. Presentations will be given primarily by doctoral students, but also by other researchers in language technology, including external guests.

Place and Time:

  • Thursdays 14-16, various rooms, 10.9. - 10.12.2015.

Registration and Announcements:

  • Register for the mailing list lt-research.

Schedule:

10 September: Jörg Tiedemann
The Amazing Utility of Parallel Corpora
P724 (Porthania, Yliopistonkatu 3, 7th floor)
In this talk I would like to present a selection of my research interests in connection with parallel corpora. I intend to give a short introduction to OPUS before talking about the use of parallel corpora in language technology. I will briefly look at data-driven machine translation but also mention other useful applications that take advantage of the rich linguistic knowledge hidden in parallel corpora.
24 September: Miikka Silfverberg
Morphological Disambiguation using Probabilistic Sequence Models
P723 (Porthania, Yliopistonkatu 3, 7th floor)
The topic of my talk is data-driven morphological disambiguation for morphologically rich languages such as Finnish. I will first briefly present the task of morphological tagging and the probabilistic sequence models commonly applied to it. I will then talk about the distinction between generative and discriminative sequence models and the problems associated with applying both classes of models to morphologically rich languages. After this introduction, I will present my own work on morphological tagging:
(1) a novel weighted finite-state implementation for generative taggers, (2) experiments on faster estimation for taggers and smaller tagging models, and (3) experiments on improving tagging accuracy by utilizing sub-labels of complex morphological labels.
8 October: Seppo Nyrkkö
Statistical word vector space exploration
P723 (Porthania, Yliopistonkatu 3, 7th floor)
In this presentation, I will describe a computational, statistical model for spotting and identifying recurring terms and newly introduced concepts in a continuous flow of text analysed with a dependency parser. The key features of this model are incremental data-driven learning, restricted memory usage and term identification by ambiguous references. A further aim is to match and evaluate the terms found against a predefined set of semantic web ontology concepts, their subclasses and superclasses. I'll explain the vector space model involved in the term modelling, and the procedure for bootstrapping and training the concept vectors. To support the introduction, I'll give a demonstration of OntoR, a statistical ontology toolkit built on the R statistical computing language.
22 October: Kun Ji
Aggregation of Federated Terminology Service Data Using Ontology Matching
P723 (Porthania, Yliopistonkatu 3, 7th floor)
Nowadays, data are everywhere in our daily life, but most of them remain distributed or are organized in a way that hinders efficient information retrieval. Existing terminology integration systems are biased towards identifying equivalence relations through syntactic similarity calculation. Very few of them adopt semantic matching techniques, which are crucial for linking data in a meaningful sense. This talk introduces an approach to aggregating heterogeneous web-based terminology resources through ontology matching. The research will be conducted on TermFactory (TF), an environment with a collection of facilities based on semantic web technologies. Aggregation in TF aims to construct as many relation tags as possible in order to connect more data across databases, with duplicates identified and kept separately.
5 November: Robert Östling
P724 (Porthania, Yliopistonkatu 3, 7th floor)
I will give an overview of some of the projects I have been involved in during my time at Stockholm University, aiming both to disseminate previous results and to inspire future research cooperation. The following projects or topics will be covered, in varying detail:
  • Bayesian models for word alignment.
  • Typological investigations with parallel texts.
  • Mapping dialectal variations in the Swedish language area.
  • Word frequency and duration across signed, spoken and written modalities.
  • Entropy estimation in simple large-scale N-gram language models.
  • Part-of-speech tagging for Swedish and Icelandic.
  • Automated grading of Swedish essays.
19 November: Kun Ji
Aggregation of Federated Terminology Service Data Using Ontology Matching
PR sali 15 (Main building, room 15)
Nowadays, data are everywhere in our daily life, but most of them remain distributed or are organized in a way that hinders efficient information retrieval. Existing terminology integration systems are biased towards identifying equivalence relations through syntactic similarity calculation. Very few of them adopt semantic matching techniques, which are crucial for linking data in a meaningful sense. This talk introduces an approach to aggregating heterogeneous web-based terminology resources through ontology matching. The research will be conducted on TermFactory (TF), an environment with a collection of facilities based on semantic web technologies. Aggregation in TF aims to construct as many relation tags as possible in order to connect more data across databases, with duplicates identified and kept separately.
26 November: Maarit Koponen
Machine translation post-editing
Sali 9 (Main building, Fabianinkatu 33, 3rd floor)
3 December: Senka Drobac
Hyper-minimization of finite state transducers
U35 sh 113 (Unioninkatu 35, seminar room 113)
In this talk I will present my research on hyper-minimization of finite-state transducers. The presentation will mostly cover the following topics:
  • Finite-state transducers and their application in language technology
  • Hyper-minimization of finite-state transducers: why minimization isn't enough
  • Flag diacritics as a solution for long-distance dependencies
  • Hyper-minimization methods:
    • Automatic insertion of flag diacritics
    • Lossy hyper-minimization
  • Research results, conclusions
10 December: Shanshan Wang
P617 (Porthania, Yliopistonkatu 3, 6th floor)