Teaching Natural Language Generation in an XML Framework
--------------------------------------------------------

Graham Wilcock
University of Helsinki
00014 Helsinki, Finland
graham.wilcock@helsinki.fi

1. Introduction

XML-based techniques for natural language generation are described by Wilcock (2001), based on practical experience in developing an XML-based generation component for a spoken dialogue system (Jokinen and Wilcock, 2001). The basic approach is to construct a pipeline of XSLT transformations corresponding to the different NLG processing tasks. Wilcock (2002) gives a tutorial introduction to this approach and to the basic NLG tasks. A related software demo showing NLG integrated with XML web technology is described by Wilcock (2003).

Ongoing work aims at adapting this framework for teaching natural language generation courses to language technology students. The description here is largely taken from Wilcock (2003).

2. The Demonstration System

The demonstration system performs bilingual generation of responses, in Finnish and English, as part of a Helsinki bus timetable enquiry system. The responses depend on the dialogue context and can vary from full sentences to short elliptical phrases. The system demonstrates only generation, without speech recognition, language understanding or dialogue management.

2.1 Input Agenda

The starting point is an agenda, a set of concepts marked with Topic and NewInfo tags (Jokinen and Wilcock, 2001). A number of different starting agendas are provided, and their contents can be changed as desired.

    route new-info 81
    time new-info 11:37
    place depart topic herttoniemenranta

Figure 1: An Agenda

The agenda is represented as an annotation graph. Figure 1 shows an agenda for a response following the enquiry "When does the next bus leave from Herttoniemenranta?" The departure place is marked as Topic, and the route number and departure time are marked as NewInfo.
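The XML markup of Figure 1 did not survive, so the following Python sketch only guesses at a flattened encoding of the same agenda. The `agenda` and `concept` elements and the `name`, `type` and `status` attributes are hypothetical, and a flat list stands in for the annotation graph used by the actual system. The sketch shows a content-determination step that reads each concept together with its Topic or NewInfo status, using the standard library's `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Hypothetical encoding of the Figure 1 agenda: element and attribute names
# are assumptions, and a flat list replaces the original annotation graph.
AGENDA_XML = """
<agenda>
  <concept name="route" status="new-info">81</concept>
  <concept name="time" status="new-info">11:37</concept>
  <concept name="place" type="depart" status="topic">herttoniemenranta</concept>
</agenda>
"""

def read_agenda(xml_text):
    """Content determination: map each concept name to (status, value)."""
    root = ET.fromstring(xml_text.strip())
    return {
        c.get("name"): (c.get("status"), c.text.strip())
        for c in root.iter("concept")
    }

agenda = read_agenda(AGENDA_XML)
# Concepts marked as Topic are candidates for pronominalization later on.
topics = [name for name, (status, _) in agenda.items() if status == "topic"]
print(topics)  # ['place']
```

Keeping the information status next to each value is what later allows a referring expression stage to decide whether a concept can be pronominalized.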
The response, generated step by step in the following sections, will be "Number 81 leaves from there at 11:37."

2.2 Text Planning

In text planning, the content determination stage extracts the concepts from the annotation graph. The discourse structuring stage then creates a text plan tree (here called a response plan) using the form of template-based generation described by Wilcock (2001).

    NumFromDepMsg
      bus-number 81
      departure-time 11:37
      departure-place herttoniemenranta

Figure 2: A Text Plan

Text plans are XML tree structures containing variable slots, which are filled in later by the microplanning stages. In this example there is only one message, which is typical of spoken dialogue responses; multi-paragraph text generation, by contrast, involves large numbers of messages. Note that departure-place is Topic, while bus-number and departure-time are NewInfo. In the teaching system, tracing can be switched on so that the text plan is displayed.

2.3 Microplanning

The microplanning stages are a sequence of XSLT transformations (Wilcock, 2001). The text plan tree is replaced by a text specification tree, here called a response specification. At later stages of the pipeline, further information is added to the tree, or nodes in the tree are replaced by new nodes. In the referring expression stage of microplanning, domain concepts are replaced with linguistic referring expressions.

    leave
      number 81
      from-place from there
      at-time at 11:37

Figure 3: A Text Specification

In Figure 3 the concepts of Figure 2 have been replaced by linguistic specifications. In the lexicalization stage, the words are inserted with their dependents, using a form of head-dependency structure. In the referring expression stage, the departure-place concept, which was marked as Topic in Figure 2, has been pronominalized as "there". If the same concept were marked as NewInfo, it would instead be realized by its actual text value (here, Herttoniemenranta). In the teaching system, tracing can be switched on so that the text specification is displayed.
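The system implements these stages as XSLT transformations. As an illustration only, the following Python sketch imitates the same idea with `xml.etree.ElementTree`: each stage is a function that takes one tree and returns a new one, and a small driver applies the stages in order. Element names follow Figures 2 and 3; the `status` attribute, the function names and the use of `str.title()` to produce the place name are assumptions made for this example, not part of the original system.

```python
import xml.etree.ElementTree as ET

def make_plan(bus, time, place, place_status):
    """Discourse structuring: fill the slots of a one-message template.
    Slot names follow Figure 2; the 'status' attribute is an assumption."""
    msg = ET.Element("NumFromDepMsg")
    for tag, value, status in [
        ("bus-number", bus, "new-info"),
        ("departure-time", time, "new-info"),
        ("departure-place", place, place_status),
    ]:
        ET.SubElement(msg, tag, status=status).text = value
    return msg

def referring_expressions(plan):
    """Return a new tree in which concepts are replaced by referring expressions."""
    out = ET.Element(plan.tag)
    for slot in plan:
        if slot.tag == "departure-place":
            # Topic is pronominalized; NewInfo is realized by its text value.
            text = "there" if slot.get("status") == "topic" else slot.text.title()
        else:
            text = slot.text
        ET.SubElement(out, slot.tag).text = text
    return out

def lexicalize(refs):
    """Insert the head verb with its dependents, as in Figure 3."""
    head = ET.Element("leave")
    ET.SubElement(head, "number").text = "number " + refs.findtext("bus-number")
    ET.SubElement(head, "from-place").text = "from " + refs.findtext("departure-place")
    ET.SubElement(head, "at-time").text = "at " + refs.findtext("departure-time")
    return head

def run_pipeline(tree, stages):
    """Apply each tree-to-tree stage in order, like a chain of XSLT steps."""
    for stage in stages:
        tree = stage(tree)
    return tree

plan = make_plan("81", "11:37", "herttoniemenranta", "topic")
spec = run_pipeline(plan, [referring_expressions, lexicalize])
print(ET.tostring(spec, encoding="unicode"))
# <leave><number>number 81</number><from-place>from there</from-place><at-time>at 11:37</at-time></leave>
```

Calling `make_plan` with `"new-info"` as the place status yields `from Herttoniemenranta` instead of `from there`, which is the Topic/NewInfo contrast described above. Keeping each stage as a separate tree-to-tree function mirrors the pipeline design, in which individual stages can be traced or replaced independently.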
2.4 Realization

The realization stage produces Java Speech Markup Language (JSML).
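A minimal Python sketch of a realization step that linearizes a specification like Figure 3 into JSML-like markup of the kind shown in Figure 4. The tag names `JSML`, `SENT` and `SAYAS` and the `CLASS="number"` attribute are assumptions to be checked against the JSML specification and the subset accepted by FreeTTS, and the verb agreement rule is deliberately naive.

```python
import xml.etree.ElementTree as ET

# Assumed markup names; verify them against the JSML specification and the
# subset that FreeTTS accepts before relying on them.
SENT_TAG = "SENT"
SAYAS_TAG = "SAYAS"

def realize(spec):
    """Linearize a Figure 3 style specification into JSML-like markup."""
    root = ET.Element("JSML")
    sent = ET.SubElement(root, SENT_TAG)          # marks the sentence boundary
    word, num = spec.findtext("number").split()   # "number", "81"
    sent.text = word.capitalize() + " "
    # Ask for "81" to be read as a number ("eighty-one"), not digit by digit.
    sayas = ET.SubElement(sent, SAYAS_TAG, CLASS="number")
    sayas.text = num
    # Naive agreement: the singular subject takes the -s form of the head verb.
    rest = [spec.tag + "s", spec.findtext("from-place"), spec.findtext("at-time")]
    sayas.tail = " " + " ".join(rest) + "."
    return root

# Text specification corresponding to Figure 3.
spec = ET.Element("leave")
ET.SubElement(spec, "number").text = "number 81"
ET.SubElement(spec, "from-place").text = "from there"
ET.SubElement(spec, "at-time").text = "at 11:37"

jsml = realize(spec)
print(ET.tostring(jsml, encoding="unicode"))
# <JSML><SENT>Number <SAYAS CLASS="number">81</SAYAS> leaves from there at 11:37.</SENT></JSML>
```

Stripping the tags (for example with `"".join(jsml.itertext())`) recovers the plain sentence "Number 81 leaves from there at 11:37.", which is what a synthesizer ignoring the markup would read.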
    number 81 leaves from there at 11:37
Figure 4: Speech Markup

The words of Figure 3 provide the main content. In the JSML markup, one tag marks sentence boundaries and another tells the speech synthesizer that "81" should be pronounced "eighty-one", not "eight one". The JSML is passed to the FreeTTS speech synthesizer, which produces the spoken response, in this case "Number 81 leaves from there at 11:37."

References

K. Jokinen and G. Wilcock. 2001. Confidence-based adaptivity in response generation for a spoken dialogue system. SIGdial-2001, Aalborg, Denmark.

G. Wilcock. 2001. Pipelines, templates and transformations: XML for Natural Language Generation. 1st NLP and XML Workshop, Tokyo.

G. Wilcock. 2002. XML-based Natural Language Generation. XML Finland 2002, Helsinki.

G. Wilcock. 2003. Integrating Natural Language Generation with XML Web Technology. EACL-2003, Budapest.