clt236: XML - academic year 2009-2010


Department of General Linguistics

Until 14 August 2009:
P.O. Box 9 (Siltavuorenpenger 20 A)

From 17 August 2009:
P.O. Box 24 (Unioninkatu 40)

Telephone +358 (09) 1911 (switchboard)
Fax +358 (09) 191 29307

4. Introduction to Linguistic Annotation.

  • Lecture notes
  • Further reading
  • Practical work: In-line annotations
    • With in-line markup, the text and annotations are mixed together.
    • Edit the XML version of Sonnet 130 in jEdit. Annotate sentence boundaries by marking the start of each sentence with <s> and the end of each sentence with </s>. Delete the DOCTYPE line, because the DTD does not allow <s>. Save the file as MySonnet130.xml. Use the jEdit XML plugin to check that the file is well-formed XML.
    • Was this difficult? Is the file well-formed XML? Did you find a successful way to fit the <s> elements and the <line> elements together?
    • Edit the XML version of Sonnet 13 in the same way. Mark sentence boundaries with <s> and </s>. Delete the DOCTYPE line. Save the file as MySonnet13.xml. Is the file well-formed XML? Use the XML plugin to check.
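    • In Sonnet 13, several sentence boundaries fall in the middle of lines (for example in lines 1, 6, 13 and 14). A sentence that starts in one line and ends in another overlaps the <line> elements, but well-formed XML requires elements to nest properly. The <s> and <line> elements therefore cannot be combined satisfactorily in one well-formed XML file.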
  • Practical work: Stand-off annotations
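
The overlap problem from the in-line exercise, and the basic idea behind stand-off annotation, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the course's own material: the <sonnet> root element, the placeholder text and the helper names (is_well_formed, line_spans, sentence_spans) are all hypothetical, and the sentence splitter is deliberately naive. Only <line> and <s> come from the exercise itself. The sketch first checks a properly nested and an overlapping fragment for well-formedness, then records lines and sentences as two independent lists of character offsets over the plain text, so neither layer has to nest inside the other.

```python
import re
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as well-formed XML, else False."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Properly nested: every <s> opens and closes inside the same <line>.
# The <sonnet> root element is a hypothetical placeholder.
NESTED = "<sonnet><line><s>One.</s> <s>Two.</s></line></sonnet>"

# Overlapping: the second <s> opens in one <line> and closes in the next.
OVERLAPPING = (
    "<sonnet>"
    "<line><s>One.</s> <s>Two starts here</line>"
    "<line>and ends here.</s></line>"
    "</sonnet>"
)

# Stand-off idea: keep the text free of markup and describe each layer
# separately as (start, end) character offsets into the text.
TEXT = "One. Two starts here\nand ends here."


def line_spans(text):
    """Return (start, end) offsets of each line, newline excluded."""
    spans, pos = [], 0
    for line in text.split("\n"):
        spans.append((pos, pos + len(line)))
        pos += len(line) + 1  # step over the newline character
    return spans


def sentence_spans(text):
    """Naive splitter: a sentence runs up to and including '.', '!' or '?'."""
    return [(m.start(), m.end()) for m in re.finditer(r"\S[^.!?]*[.!?]", text)]


if __name__ == "__main__":
    print("nested well-formed:", is_well_formed(NESTED))
    print("overlapping well-formed:", is_well_formed(OVERLAPPING))
    print("lines:", line_spans(TEXT))
    print("sentences:", sentence_spans(TEXT))
```

In the offset lists, the second sentence span crosses the boundary between the two line spans. That is harmless here because the two layers are stored separately, whereas the same overlap written as in-line tags fails the well-formedness check.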
© 2001-2009 Graham Wilcock