Department of General Linguistics
October 2000
| Architecture | Formalism | Characteristics | Example |
| FSA | transition networks | menu driven | CSLUrp |
| CF | recursive transition networks | subdialogues | Nuance |
| ATN/UG | augmented transition networks | information states | Philips, Trindi |
| AI | planning, inference, optimisation | dialogue games | SRI Autoroute, Trains |
| SM | statistics, machine learning | best practice | ATT, Verbmobil |
A related classification distinguishes bottom-up (data- or goal-driven) from top-down (script- or plan-driven) architectures. In a bottom-up system, the system uses its partner as a source of data to fill in a form. In a top-down system, the system follows a script or plan for the dialogue. The difference is one of degree: a bottom-up system has a loose script and narrow success conditions, whereas a top-down system has a narrow script and loose success conditions. Bottom-up (client talks, system reacts) and top-down (system talks, client reacts) strategies have different applications, and the strategy can vary with the client and the listening conditions.
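The contrast between the two strategies can be made concrete with a minimal sketch. The slot names and function names below are hypothetical illustrations, not part of any system discussed here, and the parsed input is assumed to arrive already mapped to slots.

```python
from typing import Callable, Dict, Optional

# Hypothetical form for a timetable query (slot names are assumptions).
SLOTS = ["origin", "destination", "time"]


def bottom_up_turn(form: Dict[str, Optional[str]],
                   parsed: Dict[str, str]) -> Optional[str]:
    """Client talks, system reacts: accept whatever slots the client
    supplied, then ask about the first slot that is still empty.
    Returns None once the form is full."""
    for slot, value in parsed.items():
        if slot in form:
            form[slot] = value
    missing = [s for s in SLOTS if form[s] is None]
    return f"Please give the {missing[0]}." if missing else None


def top_down_dialogue(ask: Callable[[str], str]) -> Dict[str, Optional[str]]:
    """System talks, client reacts: walk a fixed script, one slot per
    question, and treat each reply as the answer to that question."""
    form: Dict[str, Optional[str]] = {}
    for slot in SLOTS:
        form[slot] = ask(f"What is the {slot}?")
    return form
```

In the bottom-up function the script is loose (any slot may be filled in any order) but success is narrow (every slot must end up filled); in the top-down function the order is fixed and each reply is accepted as given.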
The dialogue models described in http://cslu.cse.ogi.edu/HLTsurvey/ch6node5.html, namely dialogue grammars on the one hand and plan-based models or joint action theories of dialogue on the other, appear to exemplify the FSA/CF and AI architectures, respectively. While the first two seem too rigid, the drawback of the third approach is that plan recognition and planning are combinatorially intractable in the worst case and, in some cases, undecidable.
The TRINDI system appears to exemplify an intermediate type, where the dialogue is driven by the current and preferred information states of the participants. There is scant work to date on applying soft computing methods to dialogue management.
Studies of human-machine dialogue have historically followed two main theoretical guidelines traced by research on human-human dialogue. Discourse analysis, developed from studies on speech acts [Sea76], views dialogue as rational cooperation and assumes that the speakers' utterances are well-formed sentences. Conversational analysis, on the other hand, studies dialogue as a social interaction in which phenomena such as disfluencies, abrupt shifts of focus, etc., have to be considered [Lev83]. Both theories have contributed to the design of human-machine dialogue systems; in practice, freedom of design has to be constrained so as to find an adequate match with the other technologies the system rests on. For example, dialogue strategies for speech systems should recover from word recognition errors.
More generally, we may want to frame the entire circuit from speech recognition to speech synthesis in stochastic or quantifiable terms, allowing choice between symbolic and soft computing methods anywhere in the circuit.
In this section, the TRINDI checklist is organised along a model of idealised dialogue (Carlson 1983) to estimate what it takes in each case to achieve given behavior. The requirements are divided between speech technology (ST), language technology (LT), database design (DB), dialogue management (DM), and soft computing (SC).
Q2. Is utterance interpretation sensitive to deictic context?
Example: now = system date
ST: recognition of indexical words
LT: recognition of indexical phrases, reduction rules to basic indexicals here, now, I, you, this
DM: Access to and conversion to/from system date/other indexical features (location, players)
Q12. Can the system recognise noisy input?
Example: (traffic noise in background)
ST: assessment of speech recognition success
DM: change of strategy, e.g. from bottom up to top down strategy
Q19. Is it possible to get connected to a human operator?
Q20. Does the system explicitly make it clear that it is not a human?
Example: the bus = last mentioned bus (bus 10), other buses = not bus 10
ST: recognition of relevant words
LT: recognition of relevant phrases, interpretation rules
DM: Access to move history
Q6. Can the system deal with ambiguous designators?
Example: the downtown bus = bus to/from downtown
LT: recognition and representation of ambiguity
DM: resolution of ambiguity, possibly backtracking?
Q7. Can the system deal with negatively specified information?
Example: Not before Sunday = on or after Sunday
ST: recognition of negation words
LT: inference rules from negative to positive information
DB: negation in query language
Q9. Can the system deal with inconsistent information?
Example: today is Monday, tomorrow is Sunday
LT: interpretation rules check consistency
DM: Ask for correction, revise form accordingly
Q10. Can the system deal with belief revision?
Example: I want to go in the morning ... no, in the afternoon
DM: Acknowledge change, revise form
Q21. Is the domain adequately covered, i.e. are all aspects of the domain that the user might want information about covered?
LT: grammar of recognised fragment
SC: recognition of other domain related talk
Q23. Can the system keep track of several entities of the same type at the same time?
DM: representation of alternative threads of dialogue, backtracking
DM: representation of subdialogue hierarchy
SC: recognition of topic and topic shift
Q15. -concerning system functions and abilities?
SC: recognition of metadialogue
Q16. Is it possible to get a system tutorial concerning the system constraints?
DM: include help along with dialogue (type "please answer by number only")
Q22. Can more than one type of information be obtained from the system?
LT: form driven IE from answer
DM: bottom up dialogue strategy
Q4. -Different information-?
LT: form driven IE from answer
DM: bottom up dialogue strategy
SC: topic recognition as fallback
Q5. -Less information-?
LT: form driven IE from answer
DM: form driven dialogue planning
Q11. Can the system deal with no answer to a question at all?
DM: top-down dialogue strategy after timeout (system takes initiative)
Q13. Does the system give different feedback depending on the quality of recognised speech?
DM: top-down dialogue strategy at bad recognition levels
SC: topic tracking on basis of recognition rate
Q17. Can the system repeat an utterance on request?
DM: bottom-up dialogue strategy (system listens for user acknowledgement or requests for repetition)
Example: When do you want to go? - I want to go to B. - Where do you want to go?
DM: Bottom up dialogue strategy
SC: Topic tracking
Q18. Can the system reformulate an utterance on request?
LT: Nondeterministic generation
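Two of the checklist items above, Q7 (negatively specified information) and Q10 (belief revision), reduce to small operations on a query form. The following is a sketch under stated assumptions: the week is taken to run from Monday to Sunday, and the `Form` class, its fields, and the acknowledgement wording are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Assumption: "before" and "after" are interpreted within one Mon-Sun week.
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def not_before(day: str) -> List[str]:
    """Q7: read 'not before <day>' as 'on or after <day>'."""
    return DAYS[DAYS.index(day):]


@dataclass
class Form:
    """Hypothetical query form with a log of slot changes."""
    slots: Dict[str, str] = field(default_factory=dict)
    history: List[Tuple[str, Optional[str], str]] = field(default_factory=list)

    def revise(self, slot: str, value: str) -> str:
        """Q10: overwrite a slot, log the change, return an acknowledgement."""
        old = self.slots.get(slot)
        self.slots[slot] = value
        self.history.append((slot, old, value))
        if old is None or old == value:
            return f"OK, {slot}: {value}."
        return f"OK, changing {slot} from {old} to {value}."
```

Keeping the change log alongside the slots also gives the dialogue manager the move history that several other checklist items call for.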
This paper suggests that the project pay attention to the underlined desiderata. How to do so is discussed in the next section.
| Speech recognition | TaY/CS (speech recognizer), HY/ling (language model), TAIK, HUT (soft computing) |
| NL parser | HY/ling (off-the-shelf tagging/parsing + ad hoc IE) |
| Database | TaY/CS (off-the-shelf) |
| NL generator | HY (off-the-shelf + ad hoc) |
| Speech synthesis | HY (speech project?) |
| Dialogue manager | TaY/CS, HY (symbolic processing), TAIK, HUT (soft computing) |
Table 1
The Jaspis development architecture has been chosen as the common development platform (http://www.cs.uta.fi/hci/SUI/SUI2000/materiaali/jaspis.pdf).
http://www.itl.nist.gov/speech/tests/
http://www.itl.nist.gov/iaui/894.01/publications/darpa99/index.htm
at(Bus,Stop,Time)
saying that a given bus (possibly identified as bus(Line,Departure)) is scheduled to be at stop Stop at time Time. The facts can be more limited (e.g. only terminal stops are scheduled) or indirect (e.g. only the average time between departures is given), but in either case they can be reduced to the above.
Arbitrary routes can be built up by sequencing such facts. In this project, route scheduling will not be addressed, which means that we restrict attention at most to comparisons of individual pairs of facts.
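As an illustration of how an indirect timetable can be reduced to at(Bus,Stop,Time) facts and how a pair of such facts can be compared, consider the sketch below. The `At` record, the `line#n` bus naming, and both function names are assumptions introduced for the example; times are taken to be minutes after midnight.

```python
from typing import List, NamedTuple, Tuple


class At(NamedTuple):
    bus: str   # e.g. "10#1": line 10, first departure (hypothetical naming)
    stop: str
    time: int  # minutes after midnight (assumption)


def expand_headway(line: str, stops_offsets: List[Tuple[str, int]],
                   first: int, last: int, headway: int) -> List[At]:
    """Turn 'a departure every `headway` minutes between first and last'
    into explicit at() facts. stops_offsets pairs each stop with its
    travel time in minutes from the start of the run."""
    facts = []
    for n, dep in enumerate(range(first, last + 1, headway), start=1):
        for stop, offset in stops_offsets:
            facts.append(At(f"{line}#{n}", stop, dep + offset))
    return facts


def goes_from_to(facts: List[At], bus: str, a: str, b: str) -> bool:
    """Pairwise comparison: the bus is at stop a strictly before stop b."""
    return any(
        f.bus == bus and g.bus == bus
        and f.stop == a and g.stop == b and f.time < g.time
        for f in facts for g in facts
    )
```

For example, `expand_headway("10", [("station", 0), ("hervanta", 20)], 360, 390, 15)` produces three departures with two facts each.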
Here is a list of entities, queries and facts that can be expressed (in fake SQL):
| the bus stops at Hervanta | at(Bus,hervanta,T) |
| the bus goes via Hervanta | at(Bus,hervanta,T) |
| end station of the bus | select stop from at where bus = Bus and time = (select max(time) from at where bus = Bus) |
| the bus goes from station to Hervanta | at(Bus,station,T1) and at(Bus,hervanta,T2) and T1<T2 |
| what time is it? | now |
| where are we? | here |
| which bus is this? | this |
| the next bus | select bus from at where stop = here and time = (select min(time) from at where stop = here and time > now) |
| the next stop | select stop from at where bus = this and time =(select min(time) from at where bus = this and time > now) |
| how many buses? | select count(bus) from at |
Table 2
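A few of the queries in Table 2 can be tried out against an in-memory database. The sketch below uses Python's standard `sqlite3` module; the sample rows, the function names, and the encoding of times as minutes after midnight are assumptions made for the example, and the deictic values here, now and this are passed in as ordinary parameters.

```python
import sqlite3

# Hypothetical sample timetable; times are minutes after midnight.
ROWS = [
    ("10", "station", 600), ("10", "hervanta", 620),
    ("20", "station", 610), ("20", "keskusta", 625),
]


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # The table name is quoted defensively in case a dialect reserves "at".
    conn.execute('CREATE TABLE "at" (bus TEXT, stop TEXT, time INTEGER)')
    conn.executemany('INSERT INTO "at" VALUES (?, ?, ?)', ROWS)
    return conn


def one(conn, sql, params):
    """Return the first column of the first row, or None if no row matches."""
    row = conn.execute(sql, params).fetchone()
    return row[0] if row else None


def next_bus(conn, here, now):
    return one(conn,
               'SELECT bus FROM "at" WHERE stop = :here AND time = '
               '(SELECT min(time) FROM "at" WHERE stop = :here AND time > :now)',
               {"here": here, "now": now})


def next_stop(conn, this, now):
    return one(conn,
               'SELECT stop FROM "at" WHERE bus = :this AND time = '
               '(SELECT min(time) FROM "at" WHERE bus = :this AND time > :now)',
               {"this": this, "now": now})


def end_station(conn, bus):
    return one(conn,
               'SELECT stop FROM "at" WHERE bus = :bus AND time = '
               '(SELECT max(time) FROM "at" WHERE bus = :bus)',
               {"bus": bus})
```

Each outer query repeats the restriction of its subquery (on the stop or the bus), so that a row belonging to another bus or stop with the same time is not returned by accident.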
Query := Subject, Predicate, Source?, Goal?, Loc?, LocTime?, TimeFrame?, Duration?, AbsFreq?
Subject := Attr Bus Number
Attr := Place GEN | numeron Number | seuraava, viimeinen, kutosen, monesko, mikä, montako
Predicate := NEG Verb kO?
Verb := on, seisoo, odottaa, lähtee, tulee, pysähtyy, menee, ajaa, kulkee, liikennöi, …
Bus := bussi, linja-auto, auto, vuoro, nysse…
Number := 1,2,…
Source := Place (ELA|ABL)
Goal := Place (ILL|ALL)
Loc := Place (INE|ADE)| Place GEN kautta
Place := asema, yliopisto, Hervanta, Amur, Pispala, Mannerheiminkatu, …
LocTime := DateTime (INE|ADE|ESS) | ennen DateTime PTV | DateTime GEN jälkeen | Number TimeUnits GEN kuluttua | arkipäivisin, pyhinä, aattona, milloin, kuinka pian, miten pian …
DateTime := ClockTime | TimeofDay | DayofWeek | Date…
TimeFrame := Number TimeUnits INE | miten nopeasti, nopeimmin, kuinka pian, missä ajassa, mihin mennessä
Duration := Number TimeUnits | kuinka kauan, miten pitkään
AbsFreq := Number kertaa | Number STI | monestiko
RelFreq := AbsFreq TimeFrame | Number TimeUnits GEN välein | miten usein, miten tiheästi
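The place-related rules of the grammar above (Source, Goal, Loc) amount to a mapping from local case tags to form slots. A minimal sketch follows, assuming that a morphological analyser has already turned the input into (lemma, tag) pairs; the tag inventory, the token format and the function name are assumptions, and the place list simply copies the examples given in the Place rule.

```python
from typing import Dict, List, Tuple

# Case tags as used in the grammar: elative/ablative mark the source,
# illative/allative the goal, inessive/adessive a location.
CASE_TO_SLOT = {
    "ELA": "Source", "ABL": "Source",
    "ILL": "Goal", "ALL": "Goal",
    "INE": "Loc", "ADE": "Loc",
}
PLACES = {"asema", "yliopisto", "Hervanta", "Amur", "Pispala",
          "Mannerheiminkatu"}


def fill_place_slots(tokens: List[Tuple[str, str]]) -> Dict[str, str]:
    """Map analysed tokens (lemma, tag) to Source/Goal/Loc slots.
    The first filler found for a slot wins."""
    slots: Dict[str, str] = {}
    for i, (lemma, tag) in enumerate(tokens):
        if lemma not in PLACES:
            continue
        slot = CASE_TO_SLOT.get(tag)
        # Loc := Place GEN kautta ("via Place")
        if tag == "GEN" and i + 1 < len(tokens) and tokens[i + 1][0] == "kautta":
            slot = "Loc"
        if slot:
            slots.setdefault(slot, lemma)
    return slots
```

For instance, an analysis of "Mikä bussi menee asemalta Hervantaan?" ("Which bus goes from the station to Hervanta?") would contain ("asema", "ABL") and ("Hervanta", "ILL"), which fill Source and Goal respectively.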
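ehtiikö, pääseekö bussilla B -> Meneekö B ("is there time to get there / can one get there by bus B" -> "does B go")
on tulossa, menossa, matkalla -> tulee, menee nyt ("is coming, going, on its way" -> "comes, goes now")
haluaisin/pitäisi päästä -> meneekö bussia, milloin menee bussi, mikä bussi menee ("I would like to / need to get to" -> "is there a bus going, when does a bus go, which bus goes")
kauanko joutuu odottamaan bussia B -> milloin bussi B tulee ("how long does one have to wait for bus B" -> "when does bus B come")
ajaa A:n ja B:n väliä -> ajaa A:sta B:hen ("runs between A and B" -> "runs from A to B")
mikä on nopein keino päästä B:hen? -> mikä bussi on ensinnä B:ssä? ("what is the fastest way to get to B?" -> "which bus is first at B?")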
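Ei, vaan keskustasta. ("No, from downtown instead.")
Mitä muita busseja sinne menee? ("What other buses go there?")
Entä Hervannasta? ("What about from Hervanta?")
Seuraava bussi Hervantaan kiitos. ("The next bus to Hervanta, please.")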
In fact, the parsing or IE problem itself, i.e. the mapping of recognised bits of speech onto information states, can in its entirety be framed as a soft computing problem: finding out how given patterns of recognised input affect the probabilities of given information states.
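Mitä kautta/reittiä bussi B kulkee? Mikä on nopein yhteys Hervantaan? ("Via where / by which route does bus B run? What is the fastest connection to Hervanta?")
Onko linjalla matalalattiabussia? Mistä ovesta mahtuvat lastenvaunut? ("Is there a low-floor bus on the line? Through which door does a pram fit?")
Paljonko maksaa? Mistä lippuja saa ostaa? Onko alennuksia? Saako bussissa maksaa? Minkä yhtiön bussi B on? ("How much does it cost? Where can tickets be bought? Are there discounts? Can one pay on the bus? Which company's bus is bus B?")
Mistä (miltä laiturilta) lähtee seuraava bussi Hervantaan? ("From where (from which platform) does the next bus to Hervanta leave?")
Missä on taksiasema (neuvonta, vessa, kioski, ruokakauppa...)? ("Where is the taxi stand (information desk, toilet, kiosk, grocery store...)?")
Talar ni svenska? Do you speak English? Govorite po-russki? ("Do you speak Swedish? Do you speak English? Do you speak Russian?")
Mitäs tästä voi kysyä? No mitä mun nyt pitää sanoa? Maksaaks tää jotain? ("What can one ask this thing? So what am I supposed to say now? Does this cost something?")
Mitä? Miten se oli? Anteeksi, ei kuulu. Toistakko vielä. ("What? How was that? Sorry, I can't hear. Would you repeat that once more.")
Haloo? Kuuluuko? Onko siellä ketään? Tää meni ihan mykäks. Toimiiks tää? ("Hello? Can you hear me? Is anyone there? This went completely mute. Does this work?")
No niin, koetetaas uuelleen. Ainiin, mä en painanu tosta. ("All right, let's try again. Oh right, I didn't press that.")
Hei, älä mee, oota mä koklaan tätä. Hehe. (Älä viitti. Tuu jo!) ("Hey, don't go, wait, I'll try this. Hehe. (Come on, don't. Come along already!)")
Voi voi, eihän tästä mitään tule. Eikös tähän ole minkäänlaista käyttöohjetta? ("Oh dear, this is getting nowhere. Isn't there any kind of user manual for this?")
Vitsi tää ohjelma on syvältä. Turpa kiinni. Arvaa. (kirosanoja) ("Man, this program sucks. Shut up. Guess. (swear words)")
Mulla ois nyt seuraava ongelma. Sitten vielä toinen kysymys. ("I've got the following problem now. Then one more question.")
Hetkinen. Tota noin, ootas vähän. Annas olla. ("Just a moment. Well, um, wait a bit. Never mind.")
Kiitos, näkemiin. ("Thank you, goodbye.")
Apua! Poliisi! ("Help! Police!")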
At the same time, the system must be on the lookout for possible topic shifts in the answer: if the recognition rate is markedly low relative to the current vocabulary, a fallback strategy searches for a possible topic shift, with a corresponding change in recognition vocabulary and dialogue plan.
Thus one task is to design a feedback circuit from speech recognition success to the dialogue manager's choice of dialogue strategy.
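One simple form such a feedback circuit could take is a threshold rule with hysteresis over recent recognition confidence scores. The sketch below is only an illustration: the function name, the thresholds, the window size, and the assumption that the recogniser reports a confidence between 0 and 1 per utterance are all hypothetical.

```python
from typing import Sequence


def choose_strategy(confidences: Sequence[float],
                    current: str = "bottom-up",
                    low: float = 0.4, high: float = 0.7,
                    window: int = 3) -> str:
    """Pick a dialogue strategy from the mean of the last `window`
    recognition confidences. Below `low` the system takes the
    initiative (top-down); above `high` it lets the client lead
    (bottom-up); in between it keeps the current strategy."""
    recent = list(confidences[-window:])
    if not recent:
        return current
    mean = sum(recent) / len(recent)
    if mean < low:
        return "top-down"
    if mean > high:
        return "bottom-up"
    return current
```

The gap between the two thresholds keeps the system from flipping strategy on every utterance when recognition quality hovers around a single cut-off.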
Thus another task is to design a feedback circuit from the database manager to the dialogue manager's questioning strategy choice.
More generally, an information state, characterised as a choice from a set of partially filled query forms, can be treated as a random variable with a probability distribution over the set of such states.
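Under this view, each recognised input updates the distribution over candidate forms. A minimal sketch using Bayes' rule is given below; the state labels, the likelihood values in the usage example, and the function name are hypothetical, and how the likelihoods would be estimated is left open.

```python
from typing import Dict


def update_belief(prior: Dict[str, float],
                  likelihood: Dict[str, float]) -> Dict[str, float]:
    """Bayes update over information states:
    posterior(s) is proportional to prior(s) * P(observation | s).
    States missing from `likelihood` are treated as having likelihood 0.
    If the observation is impossible under every state, the prior is
    returned unchanged."""
    unnorm = {s: p * likelihood.get(s, 0.0) for s, p in prior.items()}
    total = sum(unnorm.values())
    if total == 0:
        return dict(prior)
    return {s: v / total for s, v in unnorm.items()}
```

For example, starting from equal belief in "goal=Hervanta" and "goal=Pispala", an input that is four times as likely under the first state shifts the belief to 0.8 versus 0.2.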
It is an open question whether the system needs to maintain a separate representation of the client's information state.
For instance, ellipsis and anaphora recognition requires access to both the syntax of a previous move and its semantics (in the form filling paradigm):
Q: Mitä muita busseja sinne menee? ("What other buses go there?")
Q: Entä takaisinpäin? ("What about in the return direction?")
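On the semantic side, resolving such elliptical follow-ups in the form-filling paradigm can be sketched as inheriting the previous query form and overriding or swapping slots. The slot names follow the grammar given earlier; the `reverse` flag and the function name are hypothetical, and the syntactic side of the problem is not covered.

```python
from typing import Dict, Optional


def resolve_ellipsis(previous: Dict[str, Optional[str]],
                     fragment: Dict[str, object]) -> Dict[str, Optional[str]]:
    """Complete an elliptical move from the previous query form.
    Slots named in the fragment override the inherited ones; the
    hypothetical flag 'reverse' swaps Source and Goal."""
    resolved = dict(previous)
    if fragment.get("reverse"):
        resolved["Source"] = previous.get("Goal")
        resolved["Goal"] = previous.get("Source")
    for slot, value in fragment.items():
        if slot != "reverse":
            resolved[slot] = value
    return resolved
```

A follow-up asking about the return direction would thus be represented as `{"reverse": True}`, and one that only names a new point of departure as `{"Source": ...}`.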
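A: Hervantaan lähtee maanantaina 31.2. kaksisataa vuoroa. Mihin aikaan päivästä haluat matkustaa? ("On Monday 31.2., two hundred services depart for Hervanta. At what time of day do you want to travel?")
Q: Tarkoitin huomenaamuna. ("I meant tomorrow morning.")
A: Hervantaan lähtee huomenna sunnuntaina 30.2. kello 6 ja 9 välillä 10 vuoroa 15 minuutin välein. Haluatko tarkempaa tietoa? ("Tomorrow, Sunday 30.2., 10 services depart for Hervanta between 6 and 9 o'clock at 15-minute intervals. Do you want more detailed information?")
A: Hervantaan lähtee maanantaina 31.2. kaksisataa vuoroa. Mihin aikaan päivästä haluat matkustaa? ("On Monday 31.2., two hundred services depart for Hervanta. At what time of day do you want to travel?")
Q: Aamulla. ("In the morning.")
A: Kello 6 ja 9 välillä lähtee Hervantaan 20 vuoroa 5 minuutin välein. Tarkennanko? ("Between 6 and 9 o'clock, 20 services depart for Hervanta at 5-minute intervals. Shall I be more specific?")
A: Paikallisvuoro 10 lähtee Hervantaan nyt ja on perillä tunnin kuluttua. Pikavuoro 10x Hervantaan lähtee 5 minuutin kuluttua ja on perillä 20 minuutin kuluttua. ("Local service 10 departs for Hervanta now and arrives in an hour. Express service 10x to Hervanta departs in 5 minutes and arrives in 20 minutes.")
Q: Mikä bussi menee nopeimmin Hervantaan? ("Which bus gets to Hervanta fastest?")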
revealing a supposed plan to get to Hervanta as soon as possible. It is best to leave such plan recognition implicit, as it is in natural human conversation.