Helsingin yliopisto
CLT231: Introduction to Natural Language Processing - 2010-2011

7. Word Categories.

  • Lecture notes
  • Further reading
  • Practical work
    • In IDLE, start the NLTK Concordance Search application by entering:
      >>> import nltk
      >>> nltk.app.concordance()
      
      1. Load "English: Brown Corpus (simplified)".
      2. Look at nouns by searching for N, adjectives by searching for ADJ, verbs by searching for V.
      3. Look for patterns like "DET ADJ student", "DET ADJ students".
      4. Compare past tense verbs VD and past participles VN.
      5. Compare "English" in English/ADJ and English/NP.
      6. How many consecutive nouns can you find?
        Search for N, then N N, then N N N, and so on.
    • Using Python statements in IDLE, as shown in the section "A Simplified Part-of-Speech Tagset", make a frequency distribution news_tags_fd for the news category of the Brown Corpus. What are the five most common tags in the news genre?
    • Make a frequency distribution romance_tags_fd for the romance category of the Brown Corpus. What are the five most common tags in the romance genre? Do the two genres differ? Suggest possible reasons.
    • Plot the cumulative tag frequencies for the news genre with news_tags_fd.plot(cumulative=True).
      From the plot, estimate what percentage of words is covered by the five most common tags.
    • Plot the cumulative tag frequencies for the romance genre with romance_tags_fd.plot(cumulative=True).
      From the plot, estimate what percentage of words is covered by the five most common tags.
    • Using the Brown Corpus news category, find the most common parts of speech that occur immediately before a noun, as shown in the section "Nouns".
    • Find the most common verbs in the Wall Street Journal using the Treebank Corpus, as shown in the section "Verbs".
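
The counting behind the frequency-distribution exercises above can be sketched without a corpus. This is a minimal illustration, not the course's reference solution: the helper names tag_distribution, coverage and tags_before are hypothetical, collections.Counter stands in for nltk.FreqDist, and the sample is hand-made rather than taken from the Brown Corpus.

```python
from collections import Counter


def tag_distribution(tagged_words):
    """Count how often each tag occurs in an iterable of (word, tag) pairs."""
    return Counter(tag for _, tag in tagged_words)


def coverage(tag_counts, n=5):
    """Return the percentage of all tokens covered by the n most common tags."""
    total = sum(tag_counts.values())
    top = sum(count for _, count in tag_counts.most_common(n))
    return 100.0 * top / total


def tags_before(tagged_words, target="N"):
    """Count the tags of words that immediately precede a word tagged `target`."""
    pairs = list(tagged_words)
    return Counter(
        prev_tag
        for (_, prev_tag), (_, tag) in zip(pairs, pairs[1:])
        if tag == target
    )


# Hand-made sample using the simplified tag names from the exercises
# (an assumption for illustration; this is not corpus data).
sample = [
    ("the", "DET"), ("old", "ADJ"), ("student", "N"), ("read", "VD"),
    ("a", "DET"), ("book", "N"), ("about", "P"),
    ("language", "N"), ("technology", "N"),
]

sample_fd = tag_distribution(sample)
print(sample_fd.most_common(2))        # the two most frequent tags
print(round(coverage(sample_fd, 2), 1))  # percentage covered by those two tags
print(dict(tags_before(sample)))       # tags seen directly before a noun
```

To run the same helpers on real data, pass them the tagged words from the corpus reader instead of the sample. In NLTK 2, which the exercises assume, that would be brown.tagged_words(categories='news', simplify_tags=True). In NLTK 3 the nearest equivalent is brown.tagged_words(categories='news', tagset='universal'), where the tag names differ (for example NOUN rather than N), so the target argument of tags_before would need adjusting. Using nltk.FreqDist in place of Counter additionally gives the plot(cumulative=True) method used in the exercises.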
© 2006-2010 Graham Wilcock
