University of Helsinki
clt231: Introduction to Natural Language Processing - academic year 2009-2010


Department of Modern Languages


3. Word Frequencies.

  • Lecture notes
  • Further reading
  • Practical work
    • Open Python IDLE from the Start menu and do:
      >>> import nltk
      >>> from nltk.book import *
    • Frequency Distributions
      Make a frequency distribution fdist2 for Sense and Sensibility (text2).
      What are the 50 most frequent words (tokens) in the novel?
      How many times does the word Mrs occur in the novel?
    • Fine-grained Selection of Words
      Find all the words from Sense and Sensibility that are longer than 10 letters.
      Find all the words longer than 10 letters that occur more than 10 times.
    • Counting Other Things
      In Moby Dick, 4-letter words are more frequent than 2-letter words.
      Using the method shown, check if this is true for Sense and Sensibility.
      Which functions defined for NLTK frequency distributions give the same results as Python functions that you already know? For example:
      Compare fdist2.N() and Python len(text2).
      Compare fdist2['very'] and Python text2.count('very').
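
As a rough sketch of the ideas behind the exercises above (not a model solution), the following uses Python's standard collections.Counter in place of an NLTK frequency distribution, so it runs without NLTK or its corpora. The token list is a made-up toy sample standing in for text2, the thresholds are lowered (length 5 instead of 10, count 1 instead of 10) to suit the small sample, and all variable names are illustrative assumptions.

```python
from collections import Counter

# Hypothetical toy sample standing in for text2 (Sense and Sensibility).
tokens = ("the family of Dashwood had long been settled in Sussex . "
          "Mrs Dashwood and Mrs Jennings had very little in common , "
          "and the family was very large .").split()

# Frequency distribution: maps each token to the number of times it occurs.
# Assumed NLTK counterpart: fdist2 = FreqDist(text2)
fdist = Counter(tokens)

# Most frequent tokens (the exercise asks for 50; 3 are enough here).
top_tokens = fdist.most_common(3)

# How often a given word occurs.
mrs_count = fdist['Mrs']

# Fine-grained selection: distinct words longer than 5 letters.
long_words = sorted(w for w in set(tokens) if len(w) > 5)

# Words that are both long and occur more than once.
long_frequent = sorted(w for w in fdist if len(w) > 5 and fdist[w] > 1)

# Counting other things: a frequency distribution over word lengths.
length_dist = Counter(len(w) for w in tokens)
four_beats_two = length_dist[4] > length_dist[2]

# Equivalences with plain Python:
# the total number of counted tokens plays the role of fdist2.N().
total = sum(fdist.values())
same_total = (total == len(tokens))
same_count = (fdist['very'] == tokens.count('very'))

print(top_tokens)
print(mrs_count, long_words, long_frequent)
print(four_beats_two, same_total, same_count)
```

With NLTK loaded as shown earlier, the same steps should carry over to text2 by building the distribution with FreqDist(text2) instead of Counter(tokens) and restoring the thresholds from the exercises.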

Assignment 1.

© 2006-2009 Graham Wilcock


Copyright © 2003-2005 University of Helsinki. All rights reserved.