University of Helsinki
clt350: Statistical Parsing Methods (Tilastolliset jäsennysmenetelmät) - Autumn 2008

7. Further Topics: Named Entity Recognition

7.1. OpenNLP Name Finder

  • Practical work
    • The OpenNLP name finder recognizes several types of entities, using a separate maximum entropy model for each type. There are 7 ready-made models: person, location, organization, date, time, money, percentage.
    • Copy the script clt350-opennlp-namefinder to your directory and make it executable. The script runs the OpenNLP sentence detector followed by the OpenNLP name finder. It reads input from stdin and writes output to stdout.
    • Use it like this to find named entities in Sonnet 130 (the trailing & runs the job in the background, so wait for it to finish before opening names.txt):
      ./clt350-opennlp-namefinder <sonnet130.txt >names.txt &
    • Try named entity recognition with bigger texts and corpora: Jane Austen's Northanger Abbey, or half a million words in Jane Austen's six main novels.
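
With the larger texts, reading names.txt by eye becomes impractical, so it may help to tally the recognized entities by type. The following is a minimal Python sketch, not part of the course scripts. It assumes that entities in names.txt are marked inline as <START:type> ... <END>; that markup is an assumption about the output format, so check your own names.txt and adjust the ENTITY pattern if it differs. The names tally_entities and tally_file are hypothetical helpers introduced only for this illustration, and the sample sentences are invented.

```python
import re
from collections import Counter

# ASSUMPTION: each entity appears inline as "<START:type> some words <END>".
# If your names.txt uses a different markup, change this pattern to match it.
ENTITY = re.compile(r"<START:(\w+)>\s*(.*?)\s*<END>", re.DOTALL)


def tally_entities(text):
    """Count how often each (entity type, entity text) pair occurs in text."""
    counts = Counter()
    for entity_type, name in ENTITY.findall(text):
        # Collapse line breaks and repeated spaces inside an entity.
        counts[(entity_type, " ".join(name.split()))] += 1
    return counts


def tally_file(path="names.txt"):
    """Read a name finder output file and tally the entities in it."""
    with open(path, encoding="utf-8") as handle:
        return tally_entities(handle.read())


# Invented sample in the assumed format, used here instead of a real file.
sample = (
    "<START:person> Catherine Morland <END> travelled to "
    "<START:location> Bath <END> .\n"
    "<START:person> Catherine Morland <END> met "
    "<START:person> Henry Tilney <END> ."
)

# Print the entities, most frequent first.
for (entity_type, name), count in tally_entities(sample).most_common():
    print(count, entity_type, name)
```

To apply it to real output, call tally_file("names.txt") once the name finder job has finished, and print the result in the same way as the sample loop. Sorting by frequency makes it easier to spot entities that the ready-made models label inconsistently across a long novel.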

7.2. Training New Models

Assignment 3

© 2007-2008 Graham Wilcock