The Video shown here has been produced by Dr Tom Moir of Massey University in New Zealand. The Massey Speech Project is an ongoing research endeavour into the use of speech recognition in realistic noisy environments such as factories, open offices and homes. The speech recognition technology used here is Microsoft SAPI 5.3 (a significant improvement on the earlier Windows XP SAPI 5.1 version).
The Avatar used in this video demonstration is a Beta version of Denise - a creation from Guile 3D, a world leader in Virtual Human artwork and associated technologies. The Avatar is implemented as a Microsoft Agent on Windows Vista. The Avatar is linked in this context with Speech Synthesis and AIML (a natural language, case-based reasoning engine from the ALICE Foundation). The Video demonstrates the speech recognition of short sentences against background music, coupled with the ability of the Virtual Human to translate the commands into actions such as retrieving information from the internet.
If we look at EVIL Limited's Speech Recognition Landscape which is shown below, we can position the capability being demonstrated here by Denise as being in the top left quadrant near point A.
Dr. Moir has also demonstrated the ability of a Virtual Human like Denise to respond to speech commands to activate and deactivate electrical devices such as house lights. It is interesting to consider the potential for integrating her with sensor devices, so that Denise could, for example, determine that it was getting dark or cold and pro-actively suggest via voice synthesis that she turn on the lighting and/or heating!
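To make the idea concrete, the sensor-to-suggestion step described above might look something like the following minimal sketch. The sensor names, thresholds and phrasing are purely illustrative assumptions; they are not part of Dr. Moir's demonstrated system.

```python
# Hypothetical sketch: turning raw sensor readings into proactive spoken
# suggestions for an avatar. Thresholds are illustrative assumptions.

def suggest_actions(light_lux, temp_celsius, dark_below=50, cold_below=16):
    """Return a list of suggestions the avatar could speak aloud."""
    suggestions = []
    if light_lux < dark_below:
        suggestions.append("It is getting dark. Shall I turn on the lights?")
    if temp_celsius < cold_below:
        suggestions.append("It is getting cold. Shall I turn on the heating?")
    return suggestions
```

In a real deployment the returned strings would be passed to the speech synthesis engine, and a confirming voice command from the user would trigger the actual device switching.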
There are a number of related research areas shown by the different quadrants of the landscape. For example, speech recognition engines are also used in more conversational modes involving longer and more complicated sentences. In these conversational modes the speech engine often requires more training and faces the challenge of detecting when the user is posing a question as opposed to simply expressing a point of view or providing some information.
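A very simple illustration of the question-detection challenge: even a crude surface heuristic like the one sketched below (a hypothetical example, not how any particular engine does it) can separate many questions from statements, but it says nothing about intonation, which is where the real difficulty lies.

```python
# Hypothetical sketch: a crude surface-level question detector.
# Real conversational systems would also use prosody (rising pitch),
# which plain text transcription discards.

INTERROGATIVES = {"who", "what", "when", "where", "why", "how",
                  "is", "are", "do", "does", "can", "could", "would", "will"}

def is_question(utterance):
    """Guess whether a transcribed utterance is a question."""
    text = utterance.strip()
    if text.endswith("?"):
        return True
    words = text.lower().split()
    return bool(words) and words[0] in INTERROGATIVES
```

The heuristic fails on statements phrased declaratively but spoken with rising intonation, which is exactly why conversational-mode recognition needs more than the transcript alone.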
Some research in Speech Recognition is also looking at the detection of emotion in speech, by detecting changes in tones and word patterns, for example detecting laughter. Integration of the speech recognition engines with AIML provides a whole new world of possibilities. The natural language processing and case-based reasoning capabilities within AIML can be built on to provide quite sophisticated applications: activating internet searches or electronic devices not just in response to direct speech commands, but also by anticipating what might be required.
For example, if the speech recognition engine detects anger and passes that as an indicator to the AIML engine, the latter could be designed to respond with a placatory conversation strategy. If the AIML engine has built up a personalised profile of the individual talking to it, then it could even be designed to turn on that individual's favourite music as part of its placatory strategy. Of course this could have the opposite effect depending on the individual and their emotional state!
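The handover described above - an emotion indicator from the recogniser steering the response engine's strategy - could be sketched roughly as follows. The profile data, emotion labels and replies are hypothetical placeholders; a real system would hold this logic inside the AIML knowledge base rather than in application code.

```python
# Hypothetical sketch: an emotion indicator from the speech engine
# steering response selection. Profile contents and wording are
# illustrative assumptions, not the demonstrated system.

PROFILES = {"alex": {"favourite_music": "jazz"}}

def respond(user, utterance, emotion):
    """Choose a reply strategy based on the detected emotion."""
    if emotion == "anger":
        # Placatory strategy, personalised if a profile exists.
        music = PROFILES.get(user, {}).get("favourite_music")
        reply = "I can hear this is frustrating. Let's take it step by step."
        if music:
            reply += f" Would you like me to put on some {music}?"
        return reply
    return "How can I help?"
```

The same structure could be expressed natively in AIML, where the emotion indicator would be set as a predicate and matched by topic-specific categories.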
The demonstration video looks at Microsoft Agents deployed on a local computer; however, developments in Voice over IP (VoIP) mean that the deployment of web-based Virtual Human Speech Recognition is also an interesting area for research.
There are indeed many exciting ways in which this research into integrating speech recognition with AIML engines can enhance Virtual Human capabilities.
We would like to thank Dr. Tom Moir for his permission to include this work in our Show Case.
You can find out more about this work and the associated software by visiting the following site: