Our goal at CSLR is to create the next generation of conversational systems. We believe these systems will revolutionize society by enabling people to interact with machines using natural communication skills. Ordinary people will be able to access information through conversational interactions using telephones and other accessible and inexpensive devices.
Conversational systems are interactive and domain specific. They are built to interact conversationally with users about a specific domain and may serve as intelligent interfaces to applications or databases. Our goal is to produce the next generation of conversational capability. Such systems must collaborate with users to achieve their goals.
Our approach is to conduct spoken language research in the context of working systems. It is necessary to build working systems in order to evaluate the performance of new technologies in real-world applications, and to collect speech data in these tasks. Our plan is to deploy conversational systems for on-line information access over the public telephone network. These systems will use the CSLU Toolkit, which supports telephony applications. In addition to incorporating research advances into the toolkit, we will develop authoring tools to enable other researchers and developers to design and deploy conversational systems, and to share research advances and applications. This work is supported by grants from the National Science Foundation, the Office of Naval Research, and DARPA.
The development of conversational systems is a grand challenge, requiring advances in all areas of human language technology. In order to be useful, these systems must be robust, work in a variety of acoustic environments, adapt quickly to changing conditions, and degrade gracefully under adverse conditions. They must also deal with a wide range of speakers with different accents, dialects and idiosyncratic behaviors.
Our work focuses on a set of key research challenges and on testing technology advances. We are studying several research topics that are common to both types of systems:
- Pronunciation Variability - The phone sequences produced in fluent speech are often quite different from the sequences produced by concatenating canonical pronunciations from dictionaries. Word pronunciations change as a function of linguistic context, accent and dialect. Modeling pronunciation variation can reduce a major source of errors in systems recognizing spontaneous speech.
- Confidence Measures - Conversational systems, like human perceivers, must estimate the confidence of what was recognized. Estimating recognition confidence is necessary for rejecting extraneous speech and for determining the appropriate action to be taken by the system (e.g., accessing information or engaging in dialogue repair).
- Rapid Adaptation to speakers and channels - Because conversational systems will be used by many different people, they must adapt quickly to different speakers and acoustic environments. There are too many variations caused by speaker and channel differences to rely on pre-computed models for each task.
- Optimizing information from many different sources - There are many possible ways of extracting information from spoken input. Many features can be extracted at many different levels of granularity, from acoustic features to semantic features. We are experimenting with architectures to optimally combine heterogeneous features to improve speech understanding.
- Robustness to spontaneous speech - While there has been some progress in recent years in coping with the many disfluencies and other phenomena that characterize spontaneous speech, much research remains to be done.
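As an illustration of the role confidence measures play in choosing a system action, the following is a minimal sketch in Python. It is not taken from the CSLU Toolkit or from any CSLR system: the `Hypothesis` class, the `choose_action` function, the action names, and both threshold values are hypothetical, and the confidence score is assumed to lie between 0 and 1. In practice such thresholds would have to be tuned on data from the task.

```python
from dataclasses import dataclass

# Hypothetical thresholds for illustration only; real values would be
# tuned on held-out data for a specific task.
REJECT_BELOW = 0.3
ACCEPT_AT_OR_ABOVE = 0.8


@dataclass
class Hypothesis:
    """A recognition result with an assumed confidence score in [0, 1]."""
    text: str
    confidence: float


def choose_action(hyp: Hypothesis) -> str:
    """Map recognition confidence to one of three illustrative actions."""
    if not 0.0 <= hyp.confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    if hyp.confidence < REJECT_BELOW:
        # Likely extraneous speech or a misrecognition: discard it.
        return "reject"
    if hyp.confidence < ACCEPT_AT_OR_ABOVE:
        # Uncertain: ask the user to confirm or rephrase (dialogue repair).
        return "repair"
    # Confident enough to act on the request directly.
    return "access_information"


if __name__ == "__main__":
    for hyp in (
        Hypothesis("flights to denver", 0.92),
        Hypothesis("flights to denver", 0.55),
        Hypothesis("uh hmm", 0.10),
    ):
        print(f"{hyp.confidence:.2f} -> {choose_action(hyp)}")
```

The three-way split mirrors the distinction drawn in the list above between rejecting extraneous speech, engaging in dialogue repair, and accessing information; a real system would likely use more actions and richer evidence than a single score.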
Meeting gisting is a project underway at CSLR in collaboration with Raj Reddy at Carnegie Mellon University. The goal of this work is to monitor participants' speech during a meeting and produce a summary of the meeting. This is a challenging project because of the many errors produced during automatic transcription of speech, because of the need to detect topic changes and irrelevant material, and because of the difficulty of summarizing information that is represented across conversations.
CSLR has been awarded a grant from the National Science Foundation to develop accessible language resources to stimulate and support research and education in language technologies. This work is being conducted in collaboration with researchers and developers at CSLU OGI. The work aims to extend the capabilities of the CSLU Toolkit to support research and development of conversational systems, and to introduce interactive language technologies to researchers and students in educational environments. The work is also intended to make language resources generally available to interested users through the Internet, and to support sharing of technology and applications. Researchers at CSLR are now working with faculty and students at the University of Colorado, Boulder to introduce spoken language systems into laboratories and classrooms throughout the campus.