Natural Language Generation (NLG) of discourse cues for different reading levels


The aim is to investigate how to design and build an NLG system that can tailor its output to the reading level of the reader. The NLG system, GIRL (Generator for Individual Reading Levels), is being evaluated with real users of known reading levels. That is, we vary the ways in which the system generates discourse relations and investigate whether the output documents are then easier or harder to read than before.

The problem

This figure shows three short paragraphs A, B, and C. Notice that all three contain the same three pieces of information. A puts each piece in a separate sentence. B puts everything in one sentence, separated by commas, and so does C. The three pieces of information are linked by two discourse relations, as shown in the tree structures beneath the paragraphs. Discourse cues (marked in green) tell the reader what the relations are.
A relation joins two pieces of information, each playing a semantic role in the relation. Semantic roles are shown marked in red (e.g. 'cause' and 'result'). Note that the order of the 'but' and 'though' semantic roles is 'though:but' in A and B, but is reversed to 'but:though' in C. In A and B, the cue phrases are short and are placed in front of the 'but' and 'result' semantic roles. However, in C, they are longer and are placed in front of the 'though' and 'cause' roles.
So, this is the problem in a nutshell. What are all the choices for generating cue phrases? Which of these choices are easier and which are harder to read and comprehend? How can we implement the choices in an NLG system? How can we implement a mechanism for choosing between them based on reading level? Will the resulting system produce text appropriate for different reading levels?

Summary of project activities (updated June 2002)

First, I carried out literature surveys.

Next, I analysed a small parallel corpus of documents (one version of each was the original, and the other version had been rewritten by an expert to make it more readable). This analysis showed differences in discourse structures between the readable and less readable documents. However, the study was too small for the findings to be generalised.

Another analysis looked at a small corpus of reports written by basic skills tutors for adult students who had just completed a literacy test. Again, the corpus was too small to generalise from the findings.

Evaluation Architecture

Next, I designed and implemented a web-based architecture as shown in this figure. This was used for evaluating the NLG system with real users. This implementation uses a Jakarta Tomcat server. The server modules are all Java servlets and packages. The client-based modules in the web browser are small JavaScript scripts. The application:
  • registers a user,
  • administers a skill-based literacy test,
  • generates an on-the-fly user-tailored report on the results of the literacy test,
  • measures the reading time for this report,
  • and finally administers some comprehension questions about the user's report.
After each interaction with the user, the application updates tables in a relational database using the SQL database query language. This happens after registration, after each literacy test question, after reading the generated report and after each comprehension question. Thus, the system is robust and can handle user exits or client machine crashes during any part of the evaluation (apart from during timed reading!). Here is a sample feedback report generated by that version of the NLG system.

Next, I used the above architecture in a pilot experiment to test an initial hypothesis: do discourse cues improve readability? That is, do they decrease reading times and increase comprehension? I reported on this experiment at CLUK2002; see publications.
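
The save-after-every-interaction idea in the evaluation architecture can be illustrated with a minimal Java sketch. This is not the project's code: the class names, the step names and the in-memory map standing in for the relational database are all assumptions made for illustration. It only shows how recording each completed step lets a returning user resume where they left off.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** In-memory stand-in for the database tables (an assumption for this sketch). */
final class SessionStore {
    private final Map<String, List<String>> completed = new HashMap<>();

    /** Called after every interaction, mirroring the save-after-each-step idea. */
    void record(String userId, String step) {
        completed.computeIfAbsent(userId, k -> new ArrayList<>()).add(step);
    }

    /** Steps already saved for this user, in the order they were completed. */
    List<String> completedSteps(String userId) {
        return completed.getOrDefault(userId, Collections.<String>emptyList());
    }
}

final class EvaluationPlan {
    // Hypothetical step names; the real numbers of questions are not given in the text.
    static final List<String> STEPS = Arrays.asList(
        "register", "test-q1", "test-q2", "read-report", "comprehension-q1");

    /** Next step for a returning user, or null when the evaluation is complete. */
    static String nextStep(SessionStore store, String userId) {
        int done = store.completedSteps(userId).size();
        return done < STEPS.size() ? STEPS.get(done) : null;
    }
}
```

In this sketch the "read-report" step is only recorded once reading has finished, so a crash during timed reading loses that timing, which matches the exception noted above.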

Knowledge structure for discourse relations

Next, I implemented a new version of GIRL with an object-oriented knowledge representation for discourse relations, as shown in this figure. Each relation contains data representing the type of relation, a cue phrase (or phrases), a model for generating the relation and pointers to the objects holding the semantic roles (these can be other relations, or messages). Messages are simple strings in the present implementation, but could also hold data for building syntactic and lexical structures.
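
To make the knowledge structure concrete, here is a minimal Java sketch of how such objects might look. It is an illustration only, not GIRL's actual classes: the names RoleFiller, Message and Relation, the relation type labels and the message strings used with it are all hypothetical. It shows a relation holding a type, optional cue phrases, an optional generation model, and two named semantic roles whose fillers may be messages or other relations.

```java
import java.util.List;

/** Anything that can fill a semantic role: a message or a nested relation. */
interface RoleFiller {
    String describe();
}

/** A message is a plain string, as in the implementation described above. */
final class Message implements RoleFiller {
    final String text;

    Message(String text) { this.text = text; }

    public String describe() { return "\"" + text + "\""; }
}

/** A discourse relation with two named semantic roles (hypothetical layout). */
final class Relation implements RoleFiller {
    final String type;            // relation type, e.g. "concession" (assumed label)
    final String firstRole;       // e.g. "though" or "cause"
    final RoleFiller firstFiller;
    final String secondRole;      // e.g. "but" or "result"
    final RoleFiller secondFiller;
    List<String> cuePhrases;      // null means "undefined" until a planner chooses
    String generationModel;       // null means "undefined" until a planner chooses

    Relation(String type, String firstRole, RoleFiller firstFiller,
             String secondRole, RoleFiller secondFiller) {
        this.type = type;
        this.firstRole = firstRole;
        this.firstFiller = firstFiller;
        this.secondRole = secondRole;
        this.secondFiller = secondFiller;
    }

    /** True once both the cue phrases and the generation model have been set. */
    boolean isPlanned() {
        return cuePhrases != null && generationModel != null;
    }

    /** Bracketed rendering of the tree, recursing into nested relations. */
    public String describe() {
        return type + "(" + firstRole + ": " + firstFiller.describe()
            + ", " + secondRole + ": " + secondFiller.describe() + ")";
    }
}
```

Because Relation itself implements RoleFiller, a 'result' or 'but' role can point at another relation, giving tree structures like those in the first figure.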
The cue phrases and discourse relation generation models, shown as undefined in the figure, are chosen by a Discourse Relation Planner module.

Next, I investigated more closely the choices involved in generating discourse cues: their placement (e.g. 1st or 2nd discourse segment), their positioning (before, mid-, or after the segment) and the punctuation occurring between segments. I carried out a corpus analysis using the British National Corpus (BNC2), analysing 12 discourse cue phrases (100 examples of each). When the RST Discourse Treebank, a corpus of Wall Street Journal articles marked up with Rhetorical Structure Theory (RST) relations, became available in 2002, I analysed the six rhetorical relations that were most used by tutors in our small corpus of tutor-written feedback reports.
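
The three choices named above (placement, positioning and punctuation) can be sketched as parameters of a small realisation function. The Java below is a simplified illustration under stated assumptions, not the Discourse Relation Planner itself: the names CueRealiser, Placement and Position are hypothetical, segments are plain strings, and mid-segment positioning is left out because it would need syntactic information that plain strings do not carry.

```java
/** Which discourse segment carries the cue phrase. */
enum Placement { FIRST_SEGMENT, SECOND_SEGMENT }

/** Where the cue sits relative to its segment (mid-position omitted in this sketch). */
enum Position { BEFORE, AFTER }

final class CueRealiser {
    /**
     * Joins two segments with one cue phrase.
     * punctuation is the mark between segments, e.g. "," or "."; a full stop
     * starts a new sentence, so the second segment is then capitalised.
     */
    static String realise(String first, String second, String cue,
                          Placement placement, Position position, String punctuation) {
        String a = first;
        String b = second;
        if (placement == Placement.FIRST_SEGMENT) {
            a = attach(first, cue, position);
        } else {
            b = attach(second, cue, position);
        }
        if (punctuation.equals(".")) {
            b = capitalise(b);
        }
        return capitalise(a) + punctuation + " " + b + ".";
    }

    /** Puts the cue before or after the segment, separated by a space. */
    private static String attach(String segment, String cue, Position position) {
        return position == Position.BEFORE ? cue + " " + segment : segment + " " + cue;
    }

    /** Upper-cases the first character of a non-empty string. */
    private static String capitalise(String s) {
        return s.isEmpty() ? s : Character.toUpperCase(s.charAt(0)) + s.substring(1);
    }
}
```

For example, with the invented segments "you got most questions right" and "you did well", the cue "so" placed before the second segment with a comma gives "You got most questions right, so you did well.", while "because" placed before the first segment gives "Because you got most questions right, you did well."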

Using the results of the corpus analyses in the new version of GIRL, I began further pilot experiments with adult literacy students. In October and November 2002, I will be conducting trials at Moray College, South Shields College, Southampton City College and Banff and Buchan College. This time, instead of timing reading automatically when the reader clicks on a button to say he/she has finished, I will be making digital recordings of readers reading aloud. These recordings are being manually analysed and annotated with word start and end times and reading errors. To do this I am using the SpeechViewer application from Oregon Graduate Institute's CSLU Toolkit. I am grateful to the Olson Reading Lab, University of Colorado, for allowing me access to their manual on coding oral reading errors.
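
As an illustration of what word-level annotations make possible, the following Java sketch derives two simple measures from a list of annotated words. Everything here is an assumption for illustration: the class names, the single true/false error flag (the real coding scheme for oral reading errors is richer) and the choice of words per minute and error rate as measures are not taken from the project.

```java
import java.util.List;

/** One annotated word from a recording (hypothetical layout). */
final class WordAnnotation {
    final String word;
    final double startSec;   // word start time in seconds
    final double endSec;     // word end time in seconds
    final boolean error;     // simplified flag: was a reading error coded here?

    WordAnnotation(String word, double startSec, double endSec, boolean error) {
        this.word = word;
        this.startSec = startSec;
        this.endSec = endSec;
        this.error = error;
    }
}

final class OralReadingStats {
    /** Words per minute from the first word's start to the last word's end. */
    static double wordsPerMinute(List<WordAnnotation> words) {
        if (words.isEmpty()) {
            return 0.0;
        }
        double duration = words.get(words.size() - 1).endSec - words.get(0).startSec;
        return duration <= 0 ? 0.0 : words.size() * 60.0 / duration;
    }

    /** Fraction of words flagged with a reading error (0.0 for an empty list). */
    static double errorRate(List<WordAnnotation> words) {
        if (words.isEmpty()) {
            return 0.0;
        }
        int errors = 0;
        for (WordAnnotation w : words) {
            if (w.error) {
                errors++;
            }
        }
        return (double) errors / words.size();
    }
}
```

Unlike a single button click at the end of reading, timings of this kind also allow reading speed to be examined around individual words, such as the discourse cues.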

Why am I doing this?

At present, NLG systems do not have the ability to tailor documents for different reading levels. We do not know enough about how to write documents for different reading levels. In fact, we suspect that very few writers are skilled enough to do this. Neither do we yet know technically how this can be accomplished by a computer system. It would be very useful for NLG systems to have this facility. The 1999 Moser Report found that: 'Some 7 million adults in England - one in five adults - if given the alphabetical index to the Yellow Pages, cannot locate the page reference for plumbers.' Easier reading, even for highly skilled readers, promotes better comprehension and saves time. Generating documents that are easier to read is not a frivolous aim. Imagine the consequences if safety information is difficult to read and understand. Most people have to read and process a huge number of informative texts every day and everyone would benefit if these were easier to read.


My PhD work is supported by the Engineering and Physical Sciences Research Council (EPSRC).
Page by Sandra Williams. Please email any comments or queries to: