University of Helsinki
CLT231: Introduction to Natural Language Processing - 2010-2011

Contact information

Department of Modern Languages

P.O. Box 24 (Unioninkatu 40)
00014 HELSINGIN YLIOPISTO

Telephone +358 (09) 1911 (switchboard)
Fax +358 (09) 191 28313

6. Text Genres and Word Frequencies.

  • Lecture notes
  • Further reading
  • Practical work
    • Using IDLE as an editor (as shown in More Python: Reusing Code), write a Python program generate.py that does the following.
    • The section Generating Random Text with Bigrams defines a function generate_model(). Copy this function definition into generate.py exactly as shown.
    • Make a conditional frequency distribution of all the bigrams in Jane Austen's novel Emma, like this:
      emma_text = nltk.corpus.gutenberg.words('austen-emma.txt')
      emma_bigrams = nltk.bigrams(emma_text)
      emma_cfd = nltk.ConditionalFreqDist(emma_bigrams)
      
    • Try to generate 100 words of random Emma-like text:
      generate_model(emma_cfd, 'The', 100)
      
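    • The output gets stuck in a loop because generate_model() always picks the single most frequent next word (cfdist[word].max()), so as soon as a word recurs, the same sequence recurs. To avoid this, the function should instead make a random choice among the words that have been seen following the current word. Modify the function by replacing the line that picks the next word with these two lines: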
      words = list(cfdist[word])
      word = random.choice(words)
      
    • To use the NLTK corpora and functions from the random module, your program needs these imports at the top:
      import nltk
      import random
      
    • Now try again to generate 100 words of random Emma-like text:
      generate_model(emma_cfd, 'The', 100)
      
      Repeat this several times to check that a different text is generated each time.
    • Make a conditional frequency distribution of all the bigrams in Melville's novel Moby Dick, like this:
      moby_text = nltk.corpus.gutenberg.words('melville-moby_dick.txt')
      moby_bigrams = nltk.bigrams(moby_text)
      moby_cfd = nltk.ConditionalFreqDist(moby_bigrams)
      
    • Now generate 100 words of random Moby Dick-like text:
      generate_model(moby_cfd, 'The', 100)
      
      Repeat this several times to check that a different text is generated each time.
    • Can you observe different styles in the texts generated by the two generation models? Show them to someone else and check if they can tell which texts were produced by which generation model.
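
The effect of the change to generate_model() can be seen without downloading the Gutenberg texts. The sketch below is only an illustration under stated assumptions: it targets Python 3, uses a dict of collections.Counter objects as a stand-in for nltk.ConditionalFreqDist, uses a made-up toy sentence instead of Emma, and defines two hypothetical functions, generate_max() (always follow the most frequent next word, as the unmodified function does with max()) and generate_random() (the random.choice version).

```python
import random
from collections import Counter, defaultdict


def bigram_cfd(tokens):
    """Map each word to a Counter of the words that follow it.

    A plain-Python stand-in for nltk.ConditionalFreqDist(nltk.bigrams(tokens)).
    """
    cfd = defaultdict(Counter)
    for w1, w2 in zip(tokens, tokens[1:]):
        cfd[w1][w2] += 1
    return cfd


def generate_max(cfd, word, num=15):
    """Unmodified approach: always follow the most frequent continuation."""
    out = []
    for _ in range(num):
        out.append(word)
        followers = cfd.get(word)
        if not followers:  # word only occurs at the very end of the text
            break
        # max() returns the first of several equally frequent words, so the
        # choice is deterministic: once a word recurs, the path repeats.
        word = max(followers, key=followers.get)
    return out


def generate_random(cfd, word, num=15, rng=random):
    """Modified approach: choose uniformly among the observed continuations."""
    out = []
    for _ in range(num):
        out.append(word)
        followers = cfd.get(word)
        if not followers:  # word only occurs at the very end of the text
            break
        words = list(followers)  # distinct next words, each with equal chance
        word = rng.choice(words)
    return out


# Hypothetical toy text, not taken from Emma or Moby Dick.
tokens = "the cat saw the dog and the dog saw the cat in the garden".split()
cfd = bigram_cfd(tokens)

print(" ".join(generate_max(cfd, "the", 8)))  # → the cat saw the cat saw the cat
print(" ".join(generate_random(cfd, "the", 8, random.Random(1))))
```

With max() the output falls into the cycle "the cat saw" after three words, the kind of loop the exercise describes; with random.choice() the path can branch differently on each run. As in the exercise, list(cfdist[word]) gives every distinct continuation the same chance, however often it occurred in the text. The early stop for a word with no continuation is an addition in this sketch, not part of the original function.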

Assignment 2.

© 2006-2010 Graham Wilcock

Copyright © 2003-2005 University of Helsinki. All rights reserved.