University of Helsinki
clt262: Open Source Language Technology (Avoimen koodin kieliteknologia) - Autumn 2005


Department of General Linguistics
P.O. Box 9
Siltavuorenpenger 20A
00014 University of Helsinki

Telephone: +358 (09) 1911 (switchboard)
Fax: +358 (09) 191 29307

Course materials

1. Installing open source software.

1.1. Installing Xerces.

1.2. Installing jEdit.

  • Lecture notes: jEdit.
  • Practical work:
    • Install jEdit in your own directory.
    • Learn about jEdit by studying the jEdit help menu.

2. Reusing published examples.

2.1. Basic XSLT processing with Java.

2.2. Transforming CSV spreadsheet files to HTML.

3. Building software with Ant.

3.1. Installing Ant

Apache Ant
  • Lecture notes: Apache Ant project
  • Practical work:
    • Install Ant in your own directory.
    • Set ANT_HOME to the location where you installed it, and add its bin directory to your PATH:
      export PATH=$PATH:$ANT_HOME/bin
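
The installation steps above can be sketched in shell roughly as follows. The install location $HOME/apache-ant is only an assumption for illustration; substitute the directory where you actually unpacked Ant.

```shell
# Assumption: Ant was unpacked into $HOME/apache-ant (hypothetical location).
export ANT_HOME="$HOME/apache-ant"

# Append Ant's bin directory so the shell can find the "ant" command.
export PATH="$PATH:$ANT_HOME/bin"

# Quick check: prints the Ant version if the installation is found.
if command -v ant >/dev/null 2>&1; then
  ant -version || echo "ant failed to start; check ANT_HOME ($ANT_HOME)"
else
  echo "ant not found; check ANT_HOME ($ANT_HOME)"
fi
```

To keep the settings across logins, the two export lines can be placed in your shell startup file (for bash, typically ~/.bashrc).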

3.2. Using Ant to compile and execute.

4. Open source speech technology.

4.1. Java Speech API.

4.2. Installing FreeTTS.

  • Lecture notes: FreeTTS
  • Practical work (on your own workstation using earphones):

4.3. Optional: Non-Java open source speech tools.

5. Open source language technology.

5.1. Open source NLP tools.


5.2. Installing OpenCCG.

  • Lecture notes: OpenCCG
  • Practical work:
    • Download OpenCCG version 0.8.5 (not 0.8.6) to your own directory.
    • Follow the instructions in README about setting environment variables, building OpenCCG using Ant, and finally "Trying it out".
    • Try: tccg> the teacher bought the policeman a book

6. Testing with JUnit.

6.1. Installing JUnit.

  • Lecture notes: JUnit.org
  • Practical work:
    • Install JUnit in your own directory and set JUNIT_HOME.
      Add junit.jar to your CLASSPATH, for example:
      export CLASSPATH=.:$JUNIT_HOME/junit.jar
    • Installation test: cd $JUNIT_HOME
      All on one line: java junit.swingui.TestRunner junit.samples.AllTests
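
If no graphical display is available, the same installation test can probably be run in text mode. The sketch below assumes a JUnit 3.x release (the junit.swingui runner above exists only in that series) unpacked into the hypothetical directory $HOME/junit, and swaps in junit.textui.TestRunner, the console runner shipped in the same jar.

```shell
# Assumption: JUnit 3.x was unpacked into $HOME/junit (hypothetical location).
export JUNIT_HOME="$HOME/junit"
export CLASSPATH=".:$JUNIT_HOME/junit.jar"

# The sample tests are class files under $JUNIT_HOME, so run from there;
# the "." entry in CLASSPATH then makes junit.samples visible.
if [ -f "$JUNIT_HOME/junit.jar" ]; then
  ( cd "$JUNIT_HOME" && java junit.textui.TestRunner junit.samples.AllTests ) \
    || echo "sample tests did not run cleanly"
else
  echo "junit.jar not found in $JUNIT_HOME"
fi
```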

6.2. Testing with JUnit.

6.3. Optional: Extreme Programming.

7. Version control: RCS and CVS.

7.1. RCS and CVS.

  • Lecture notes: CVS: Concurrent Versions System
  • Practical work:
    • In your Assignment 2 directory, make a new subdirectory "RCS".
      Check in your Ant buildfile, with description "Assignment 2 buildfile".
    • Check out the buildfile. Rename targets "prepare", "clean" to "init", "tidy".
      View the change by: rcsdiff build.xml
    • Check in the buildfile, with log message "Renamed targets".
      View the file's history by: rlog build.xml
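
The RCS cycle in this exercise could look roughly like the following sketch. It assumes the RCS command-line tools are installed and that build.xml is in the current directory; the editing step itself is left as a comment.

```shell
# Run inside the Assignment 2 directory; assumes build.xml exists there
# and that the RCS tools (ci, co, rcsdiff, rlog) are installed.
if command -v ci >/dev/null 2>&1 && [ -f build.xml ]; then
  mkdir -p RCS        # RCS keeps its ",v" history files here

  # Initial check-in: -t- supplies the description,
  # -u leaves a read-only working copy in place.
  ci -u -t-"Assignment 2 buildfile" build.xml

  # Check out with a lock (-l) so the working file becomes writable.
  co -l build.xml

  # ... rename the targets in your editor here ...

  # Compare the working file with the last checked-in revision.
  rcsdiff build.xml

  # Check in the change with a log message, then list the history.
  ci -u -m"Renamed targets" build.xml
  rlog build.xml
else
  echo "RCS tools or build.xml not found; nothing done"
fi
```

If the script is run unattended without the edit, the second ci should report that the file is unchanged and create no new revision.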

7.2. Downloading open source software using CVS.

  • Lecture notes: Midiki: Mitre Dialogue Kit
  • Practical work: Download the latest version of Midiki using CVS.
    • Execute this CVS command (all on one line): cvs -z3 -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/midiki checkout -P midiki
    • Use Ant to build Midiki: cd midiki; ant build-midiki
    • Run Midiki (all on one line): java -cp target/midiki-0.1.4.jar org.mitre.dm.qud.domain.diagnosis.diagnosis_dm

8. Developing with IDEs.

8.1. Integrated Development Environments (IDEs).

8.2. Avoiding problems when using IDEs.

8.3. Make your own IDE by extending jEdit.

9. Open source databases.

9.1. Relational database systems.


9.2. Structured Query Language (SQL).

10. Java and SQL.

10.1. Java Database Connectivity (JDBC).

10.2. Java, SQL and XML.

11. WordNet: a lexical database.

11.1. Princeton WordNet.

  • Lecture notes: WordNet
  • Practical work:
    • Start the WordNet browser on venus: venus$ wnb &
    • Nouns: Query hyponyms of "student": what kind of student are you?
      Compare hypernyms of "student" and "professor": what's the difference?
      Compare hyponyms of "professor" and "lecturer": is WordNet US English?
    • Adjectives: Compare "big", "large", "great". What are their antonyms?
      Which combinations of "big/large/great sister/uncle/toe" are collocations?

11.2. Other WordNets.

12. WordNet, SQL and Java.

12.1. WordNet SQL databases.

12.2. Java API for WordNet.

  • Practical work: Open source JWNL (Java WordNet Library)
    • Install JWNL in your own directory and set your CLASSPATH
      (add jwnl.jar, utilities.jar, commons-logging.jar).
    • Edit file_properties.xml to specify the location of WordNet:
      <param name="dictionary_path" value="/usr/share/wordnet"/>
    • Compile Examples.java and run it:
      java net.didion.jwnl.utilities.Examples file_properties.xml
  • Assignment 5: Accessing WordNet from Java.
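
The JWNL practical steps above might be carried out along the following lines. This is a sketch under assumptions: JWNL_HOME is a hypothetical variable name, and the three jars are assumed to sit directly in $HOME/jwnl, which may not match the layout of the distribution you download.

```shell
# Assumption: JWNL was unpacked into $HOME/jwnl with its jars directly
# inside that directory (hypothetical layout; adjust the paths as needed).
export JWNL_HOME="$HOME/jwnl"
export CLASSPATH=".:$JWNL_HOME/jwnl.jar:$JWNL_HOME/utilities.jar:$JWNL_HOME/commons-logging.jar"

# Compile Examples.java into package directories under "." (already on
# the CLASSPATH), then run it with the edited properties file.
if [ -f Examples.java ] && [ -f file_properties.xml ]; then
  { javac -d . Examples.java \
      && java net.didion.jwnl.utilities.Examples file_properties.xml; } \
    || echo "compile or run failed; check CLASSPATH and dictionary_path"
else
  echo "Examples.java or file_properties.xml not found in $(pwd)"
fi
```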

13. GATE: an IDE for NLP.

13.1. GATE.

13.2. ANNIE.

14. GATE, Java and WordNet.

14.1. GATE data stores.

  • Lecture notes: Serial data stores in GATE
  • Practical work:
    • Run GATE and create a serial data store in your directory. Load Sonnet130 and save it as XML in the data store.
    • Run ANNIE and save Sonnet130 as XML again (with a new name).
    • Problem solving: When I ran Sentence Splitter on Sonnet130, it went on and on without finishing. How did I find out what was wrong? (Hint and Answer)

14.2. GATE, Java and WordNet.

© 2004-2005 Graham Wilcock