Helsingin yliopisto
clt234: Natural Language Processing Applications - academic year 2009-2010

Contact information

Department of Modern Languages (Nykykielten laitos)

P.O. Box 24 (Unioninkatu 40)
00014 HELSINGIN YLIOPISTO

Telephone +358 (09) 1911 (switchboard)
Fax +358 (09) 191 28313

8. Named Entity Recognition.

  • Lecture notes
  • Further reading
  • Practical work
    • nltk.ne_chunk() is a classifier-based named entity recognizer, described at the end of section 7.5 of the NLTK book (NLTK 7.5).
    • Find named entities in the Penn Treebank corpus, using nltk.ne_chunk() on tagged sentences as in NLTK 7.5.
    • How can you find named entities in Northanger Abbey? This novel by Jane Austen is not in the NLTK Gutenberg corpus. Download the plain text file from Project Gutenberg.
    • You need to do sentence segmentation, tokenization and part-of-speech tagging before doing named entity recognition.
    • Make a linguistic processing pipeline, as shown in NLTK 7.1 Information Extraction Architecture, using
      nltk.sent_tokenize(),
      nltk.word_tokenize(),
      nltk.pos_tag(),
      nltk.ne_chunk().
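
The exercises above can be sketched roughly as follows. This is a minimal illustration, not a model answer: the function names collect_entities, entities_in_tagged_sents and entities_in_text are made up for this sketch, and it assumes a recent NLTK release in which trees expose a label() method (older releases used a node attribute). It also assumes the NLTK data packages needed by the tokenizers, the tagger and the chunker have already been installed with nltk.download(); the exact package names depend on the NLTK version.

```python
def collect_entities(tree):
    """Return (entity_type, text) pairs found in one nltk.ne_chunk() result."""
    entities = []
    for child in tree:
        # Named entities are subtrees; ordinary tokens are (word, tag) tuples.
        if hasattr(child, "label"):
            words = [word for word, _tag in child.leaves()]
            entities.append((child.label(), " ".join(words)))
    return entities


def entities_in_tagged_sents(tagged_sents):
    """Run the chunker over sentences that are already part-of-speech tagged."""
    import nltk  # imported here so collect_entities() is usable without NLTK

    found = []
    for tagged in tagged_sents:
        found.extend(collect_entities(nltk.ne_chunk(tagged)))
    return found


def entities_in_text(text):
    """Full pipeline for raw text: segment, tokenize, tag, then chunk."""
    import nltk

    tagged_sents = (
        nltk.pos_tag(nltk.word_tokenize(sentence))
        for sentence in nltk.sent_tokenize(text)
    )
    return entities_in_tagged_sents(tagged_sents)
```

For the Penn Treebank sample, which is already tagged, something like entities_in_tagged_sents(nltk.corpus.treebank.tagged_sents()[:10]) should be enough. For Northanger Abbey, read the downloaded file into a string and pass it to entities_in_text(); the file name is whatever you saved it as. Project Gutenberg files carry licence text at the start and end, so you may want to trim that before processing.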

Assignment 3.

© 2008-2010 Graham Wilcock


Copyright © 2003-2005 Helsingin yliopisto. All rights reserved.