University of Helsinki
clt231: Introduction to Natural Language Processing - academic year 2009-2010

Department of Modern Languages

6. Conditional Frequencies.

  • Lecture notes
  • Further reading
  • Practical work
    • Using IDLE as an editor, as shown in More Python: Reusing Code, write a Python program generate.py that does the following. The program needs import nltk at the top.
    • In Generating Random Text with Bigrams, a function generate_model() is defined. Copy this function definition exactly as shown.
    • Make a conditional frequency distribution of all the bigrams in Jane Austen's novel Emma, like this:
      emma_text = nltk.corpus.gutenberg.words('austen-emma.txt')
      emma_bigrams = nltk.bigrams(emma_text)
      emma_cfd = nltk.ConditionalFreqDist(emma_bigrams)
      
    • Try to generate 100 words of random Emma-like text:
      generate_model(emma_cfd, 'The', 100)
      
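    • To avoid getting stuck in a loop, the generation function needs to choose at random among the possible continuation words instead of always taking the most likely one. In generate_model(), replace the line that picks the next word (word = cfdist[word].max() in the book's version) with these two lines: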
      words = list(cfdist[word])
      word = random.choice(words)
      
    • Before it can use functions from the random module, your program needs this line near the top:
      import random
      
    • Now try again to generate 100 words of random Emma-like text:
      generate_model(emma_cfd, 'The', 100)
      
      Repeat this several times and check that the generated text differs from run to run.
    • Make a conditional frequency distribution of all the bigrams in Melville's novel Moby Dick, like this:
      moby_text = nltk.corpus.gutenberg.words('melville-moby_dick.txt')
      moby_bigrams = nltk.bigrams(moby_text)
      moby_cfd = nltk.ConditionalFreqDist(moby_bigrams)
      
    • Now generate 100 words of random Moby Dick-like text:
      generate_model(moby_cfd, 'The', 100)
      
      Repeat this several times and check that the generated text differs from run to run.
    • Can you observe stylistic differences between the two kinds of generated text?
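
The difference between always taking the most likely continuation and choosing at random can be seen on a very small example. The following is only a sketch, not the course's or the NLTK book's code: it uses the standard library in place of nltk.ConditionalFreqDist so that it runs without any corpus installed. The names build_cfd, generate_greedy, generate_random and toy_text are made up for this illustration.

```python
import random
from collections import Counter, defaultdict


def build_cfd(words):
    """Map each word to a Counter of the words that directly follow it.

    This is a stand-in for nltk.ConditionalFreqDist(nltk.bigrams(words)).
    """
    cfd = defaultdict(Counter)
    for first, second in zip(words, words[1:]):
        cfd[first][second] += 1
    return cfd


def generate_greedy(cfd, word, num=15):
    """Always follow the single most frequent continuation."""
    output = []
    for _ in range(num):
        output.append(word)
        followers = cfd.get(word)
        if not followers:  # the word was never followed by anything
            break
        word = followers.most_common(1)[0][0]
    return output


def generate_random(cfd, word, num=15, rng=random):
    """Pick uniformly among the distinct continuations seen in the text."""
    output = []
    for _ in range(num):
        output.append(word)
        followers = list(cfd.get(word, ()))
        if not followers:
            break
        word = rng.choice(followers)
    return output


# A made-up toy text, small enough to trace by hand.
toy_text = "the cat saw the dog and the cat saw the bird and the cat ran".split()
toy_cfd = build_cfd(toy_text)

# Greedy generation cycles: the cat saw the cat saw the cat saw the cat saw
print(' '.join(generate_greedy(toy_cfd, 'the', 12)))

# Random generation varies from run to run.
print(' '.join(generate_random(toy_cfd, 'the', 12)))
```

Note that choosing from list(cfdist[word]) treats every distinct continuation as equally likely, however often it occurred in the text. Rare continuations are therefore picked as often as common ones, which is worth keeping in mind when comparing the Emma and Moby Dick output.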

Assignment 2.

© 2006-2009 Graham Wilcock


Copyright © 2003-2005 University of Helsinki. All rights reserved.