University of Helsinki
clt236: XML - academic year 2009-2010


Department of General Linguistics

P.O. Box 24 (Unioninkatu 40)

Telephone +358 (09) 1911 (switchboard)
Fax +358 (09) 191 28313

6. Statistical Annotation Tools.

  • Lecture notes
    • The OpenNLP tools include a sentence boundary detector, a tokenizer, a POS tagger, a phrase chunker, a sentence parser and a name finder.
    • The tools are based on Maximum Entropy statistical models.
    • The OpenNLP tools are written in Java. They can be used by themselves, or as plugins for other Java frameworks such as WordFreak and UIMA.
    • When the tools are used by themselves, the output is plain text, not XML.
      When used with WordFreak, output is in WordFreak XML annotation format.
      When used with UIMA, output is in XML Metadata Interchange (XMI) format.
  • Further reading
  • Practical work: Statistical annotations
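
The lecture notes say the tools are based on Maximum Entropy statistical models. As a rough illustration of what such a model computes, the sketch below scores a sentence-boundary decision at a period: each (feature, label) pair has a weight, and the probability of a label is the exponential of its summed weights, normalised over all labels. The feature names, the hand-set weights and the helper functions are hypothetical and chosen only for illustration. This is not the OpenNLP API or its real feature set, and a real model would learn its weights from annotated training data.

```python
import math

LABELS = ("boundary", "no_boundary")

# Hypothetical hand-set weights for (feature, label) pairs.
# A trained model would estimate these from annotated text.
WEIGHTS = {
    ("next_is_capitalized", "boundary"): 1.5,
    ("prev_is_abbreviation", "no_boundary"): 2.0,
    ("next_is_lowercase", "no_boundary"): 1.8,
}


def features_for_period(prev_token, next_token):
    """Return the names of the features that fire for a period
    standing between prev_token and next_token."""
    feats = []
    if prev_token.lower() in {"dr", "mr", "prof", "etc"}:
        feats.append("prev_is_abbreviation")
    if next_token[:1].isupper():
        feats.append("next_is_capitalized")
    elif next_token[:1].islower():
        feats.append("next_is_lowercase")
    return feats


def maxent_probs(features, weights, labels):
    """Compute p(label | features) as exp(sum of weights) / Z."""
    scores = {
        label: sum(weights.get((feat, label), 0.0) for feat in features)
        for label in labels
    }
    # Subtract the largest score before exponentiating, for numerical stability.
    top = max(scores.values())
    exps = {label: math.exp(score - top) for label, score in scores.items()}
    z = sum(exps.values())
    return {label: value / z for label, value in exps.items()}


# "... a name finder. They can be used ..." : period after "finder", then "They"
probs = maxent_probs(features_for_period("finder", "They"), WEIGHTS, LABELS)
best = max(probs, key=probs.get)
print(best, probs)
```

With these illustrative weights, a period followed by a capitalised word is labelled as a boundary, while a period after an abbreviation such as "Dr" is not, because the abbreviation feature outweighs the capitalisation feature.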

Assignment 3.

© 2001-2009 Graham Wilcock