Emerging Multidisciplinary Educational Issues in the Area of Spoken Dialogue Communication

  Michael F McTear  


  Georgios Kouroupetroglou 

University of Ulster,  School of Information and Software Engineering, Shore Road, Newtownabbey BT37 0QB Northern Ireland mf.mctear@ulst.ac.uk 


University of Athens, Department of Informatics, Panepistimioupolis, Ilisia,  GR-15784 Athens, Greece koupe@di.uoa.gr 

Spoken language interaction with and through computers has become a practical possibility, both scientifically and commercially, as a result of recent research and technological developments. Two types of system support this kind of interaction. Systems that enable humans to interact in spoken natural language with a computer application, such as a database or an interactive tutorial, in order to retrieve information or obtain tutorial support, are known as Spoken Dialogue Systems (SDS). Systems that support oral communication between users of different natural languages, or between speech-impaired and non-disabled persons, are usually described under the term Mediated Interpersonal Communication (MIC). To operate successfully, both types of system require the integration of various components of spoken language technology, including, among others, speech recognition, natural language processing, dialogue modelling, and speech synthesis.

The objective of this paper is to focus on the multidisciplinary aspects of spoken language interaction with and through computers and, in particular, to examine the extent to which the humanities are involved in current speech technology educational and research programmes in Europe. One view that has emerged from recent surveys of Spoken Language Engineering (SLE) teaching and professional practice across Europe is that, although disciplines from the humanities and informatics have common interests and co-operate in research and development in several overlapping sub-areas of spoken dialogue communication, this is not yet reflected in academic education [1,2]. By examining the nature of SDS and MIC systems and some of the methods used to develop them, this paper will discuss emerging multidisciplinary academic issues involving informatics and the humanities. In particular, the following issues, which lie at the interface between these disciplines, will be considered:

-methods for developing spoken dialogue systems;
-software to support speech technology education;
-studies of the analysis of interpersonal communication.

The speech technology components of spoken dialogue systems are usually developed by engineers specialising in speech recognition and speech synthesis. However, to develop a complete system it is necessary to draw on findings from a large number of disciplines, including Human-Computer Interaction, psychology, psycholinguistics, special education, conversation analysis, sociology and ethnomethodology. A curriculum for spoken dialogue systems needs to include the study of the various methods in common use for establishing system requirements, including: literature research; interviews with users to elicit the information required to construct the domain and task models; field-study observations or recordings of humans performing the tasks; field experiments, in which some parameters of the task are simulated; full-scale simulations; rapid prototyping; and analysis of corpora of human-human and human-computer dialogues. Corpus analysis provides information about the structure of the dialogues (for example, whether task-oriented dialogues consist mainly of a single topic) and about the range of vocabulary and language structures involved. A multidisciplinary perspective allows students from informatics and the humanities to appreciate each other's perspectives. For example, although it is desirable that spoken dialogue systems should incorporate features of human-human interaction, current technology is restricted by limited speech recognition capabilities, limited vocabulary and grammatical coverage, and limited ability to tolerate and recover from error. This will be shown in an example comparing a human-human dialogue for extracting information about train times with a human-computer dialogue for the same task.
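The vocabulary side of the corpus analysis mentioned above can be illustrated with a minimal Python sketch. It is not taken from the paper: the function name vocabulary_profile, the tokenisation rule and the two toy dialogues are all assumptions invented for illustration, and a real study would use transcribed corpora and more careful tokenisation.

```python
import re
from collections import Counter


def vocabulary_profile(utterances):
    """Summarise the vocabulary of a list of transcribed utterances.

    Hypothetical helper: tokens are lower-cased runs of letters and
    apostrophes, which is a deliberately crude tokenisation.
    """
    tokens = [
        token
        for utterance in utterances
        for token in re.findall(r"[a-z']+", utterance.lower())
    ]
    counts = Counter(tokens)
    return {
        "tokens": len(tokens),   # running words
        "types": len(counts),    # distinct words
        "most_common": counts.most_common(3),
    }


# Invented toy data, NOT real corpus material.
HUMAN_HUMAN = [
    "Hi, I was wondering what time the next train to Belfast leaves",
    "Let me see, there is one at ten past nine, would that suit you",
]
HUMAN_COMPUTER = [
    "Belfast",
    "tomorrow",
]

if __name__ == "__main__":
    for name, corpus in [("human-human", HUMAN_HUMAN),
                         ("human-computer", HUMAN_COMPUTER)]:
        print(name, vocabulary_profile(corpus))
```

Comparing the two profiles gives students a concrete, if rough, view of how much narrower the language accepted by a constrained system can be than that of an equivalent human-human exchange.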

Building a spoken dialogue system is an important educational experience that can be shared by students from different disciplines, thanks to toolkits such as the CSLU toolkit, which was developed by the Center for Spoken Language Understanding at the Oregon Graduate Institute of Science and Technology to support research and the development of applications in spoken language systems [3]. The toolkit can be used in a number of different ways. Students who have some knowledge of dialogue structure can use it to create simple state-based dialogues as well as sub-dialogues and repair dialogues. Speech recognition is provided automatically once the words to be recognised have been entered in a list. These words are automatically rendered in Worldbet labels. However, as the pronunciations are based on dictionaries created in the USA, they may differ considerably from those of speakers of English from other parts of the world. Students can create their own pronunciation models of the words to be recognised simply by altering the Worldbet labels. More advanced students can use the visualisation tools provided with the toolkit, such as the Speech Viewer, a program for displaying, playing, recording, and manipulating speech waveforms, as well as for viewing spectrograms. The paper will report on some experiences in using the toolkit to enable non-informatics students to gain an understanding of the nature of spoken dialogue technology.
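The idea of a state-based dialogue with a repair dialogue can be sketched in a few lines of Python. This is an illustrative assumption, not the CSLU toolkit's interface: the names State, run_dialogue and STATES, the prompts and the train-enquiry states are all hypothetical, and typed strings stand in for recognised speech. Each state carries a prompt and a list of recognisable words; an unrecognised word triggers a repair turn before the prompt is repeated.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class State:
    """One dialogue state: a system prompt plus the words it can recognise."""
    prompt: str
    # Maps each recognisable word to the name of the next state.
    transitions: Dict[str, str] = field(default_factory=dict)


def run_dialogue(states: Dict[str, State], start: str,
                 user_turns: List[str],
                 max_repairs: int = 2) -> List[Tuple[str, str]]:
    """Run a scripted dialogue and return (speaker, text) pairs."""
    transcript: List[Tuple[str, str]] = []
    current, repairs = start, 0
    turns = iter(user_turns)
    while True:
        state = states[current]
        transcript.append(("system", state.prompt))
        if not state.transitions:          # terminal state: nothing to ask
            break
        heard = next(turns, None)
        if heard is None:                  # scripted input exhausted
            break
        transcript.append(("user", heard))
        word = heard.strip().lower()
        if word in state.transitions:      # recognised: move on
            current, repairs = state.transitions[word], 0
        elif repairs < max_repairs:        # repair: restate the options
            repairs += 1
            options = ", ".join(sorted(state.transitions))
            transcript.append(
                ("system", f"Sorry, I did not understand. "
                           f"Please say one of: {options}."))
        else:                              # too many failures: give up
            transcript.append(("system", "Sorry, I am unable to help."))
            break
    return transcript


# Hypothetical train-enquiry dialogue, invented for illustration.
STATES = {
    "ask_destination": State("Where do you want to travel to?",
                             {"belfast": "ask_day", "dublin": "ask_day"}),
    "ask_day": State("Do you want to travel today or tomorrow?",
                     {"today": "done", "tomorrow": "done"}),
    "done": State("Thank you. Goodbye."),
}

if __name__ == "__main__":
    for speaker, text in run_dialogue(STATES, "ask_destination",
                                      ["London", "Belfast", "tomorrow"]):
        print(f"{speaker}: {text}")
```

In the scripted run, "London" is outside the word list of the first state, so the system issues one repair turn and repeats its prompt before the dialogue continues. The same mechanism shows students how a restricted vocabulary shapes the interaction.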

Interpersonal communication is a complex but essential element of social behaviour involving two or more individuals. It can be characterised in different ways: as face-to-face, distant or remote; as real-time or asynchronous; and as using natural, symbolic or iconic languages. A classification of interpersonal communication is attempted with respect to several variables, such as the direction of communication, the signalling system(s) used and the channels through which these signals are transmitted. A significant part of the population has difficulties in interpersonal communication due to alalia, dysarthria, aphasia, aglossia, deafness, hearing impairment, mental retardation, etc. For these people, improving interpersonal communication through MIC means a better quality of life, better prospects for employment, and social inclusion. Furthermore, such solutions are part of the universal service and universal accessibility principles of the information society. Findings from the theory of conversation and the much more important theory of speech acts are used in an account of the peculiarities of mediated communication. The paper will illustrate how a multidisciplinary approach to interpersonal communication is essential for the development of systems that support Mediated Interpersonal Communication and, in particular, how this work can benefit from contributions by students and researchers in the humanities.

The research reported in this abstract has been carried out with support from grant 25409-CP-2-97-1-NL-ERASMUS-ETN from the European Commission through the SOCRATES/ERASMUS programme for Thematic Networks.

[1] P. Green, C. Espain, G. Bloothooft, G. Chollet, V. Dermatas, A. Drygajlo, J. Hass, G. Kubin, J-P. Martens, P. McKevitt, M. McTear, G. Meyer, A. Peinado, V. Sanches, M. Tatham, "Spoken Language Engineering", in G. Bloothooft et al., editors, "The Landscape of Future Education", vol. 1, Utrecht: OTS Publications, 1997.
[2] C. Espain, G. Bloothooft, A. Bonafonte, K. Fellbaum, J. Hass, R. Hoffman, G. Kouroupetroglou, P. McKevitt, M. McTear, V. Sanches, K. Sgarbas, "Spoken Language Engineering", in G. Bloothooft et al., editors, "The Landscape of Future Education in Speech Communication Sciences, 2 Proposals", Utrecht: OTS Publications, 1998.
[3] http://www.cse.ogi.edu/CSLU/toolkit/