University of Helsinki
clt350: Statistical Parsing Methods - academic year 2009-2010


Department of Modern Languages


7. Training Named Entity Models

5.1. OpenNLP Name Finder

  • Practical work
    • The OpenNLP name finder recognizes several different types of entities, using a separate maximum entropy model for each type. There are 7 ready-made models: person, location, organization, date, time, money, percentage.
    • Copy the script clt350-opennlp-namefinder.sh to your directory and make it executable. This script runs the OpenNLP sentence detector followed by the OpenNLP name finder. It reads input from stdin and writes output to stdout.
    • Use it like this to find named entities in Sonnet 130 (the trailing & runs the job in the background, and the results are written to names.txt):
      ./clt350-opennlp-namefinder.sh <sonnet130.txt >names.txt &
    • Try named entity recognition with bigger texts and corpora: Jane Austen's Northanger Abbey, or half a million words in Jane Austen's six main novels.
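
Once the name finder has written its output file, a quick tally of entity types makes it easier to compare results across texts. The following is a minimal sketch and not part of the course materials. It assumes that each entity is marked with an opening tag of the form <START:type>, as in later OpenNLP releases; the version run by the course script may use different markup, in which case the pattern needs adjusting. The function name count_entities is hypothetical, and grep -o is assumed to be available (it is in GNU and BSD grep).

```shell
#!/bin/sh
# Tally entity types in name finder output.
# ASSUMPTION: entities are marked as <START:type> ... <END>; adjust the
# pattern if your OpenNLP version uses different tags.
count_entities() {
    # Print each opening tag on its own line, then count the distinct
    # tags and list the most frequent type first.
    grep -o '<START:[a-z]*>' "$1" | sort | uniq -c | sort -rn
}
```

After the background job has finished, calling count_entities names.txt should print one line per entity type found, each preceded by its count.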

5.2. Training New Models

Assignment 3

© 2007-2010 Graham Wilcock


Copyright © 2003-2005 University of Helsinki. All rights reserved.