Voice Quality Enhancement for Robots

Voice Quality Enhancement (VQE) is an important software component in robotic applications that have an audio-based human-machine interface. The VQE modules for robots include sound source localization, beamforming, acoustic echo cancellation, and audio ducking.

The high-level audio requirements of a robot are:

1. Capture and react to the acoustic environment
2. Play back audio to the user as feedback on the status of an action item
3. Allow the user to interrupt audio playback

For the first requirement, the robot needs to capture the user's voice commands clearly. The robot may also need to physically turn toward the user. A microphone array captures the spatial information of the acoustic scene. Sound source localization software processes this information and tells the robot where the user is. Acoustic beamforming software then filters the audio arriving from that direction, improving the signal-to-noise ratio (SNR).
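As a rough illustration of the localization and beamforming steps, the Python sketch below uses only two microphones. It estimates the inter-microphone delay by brute-force cross-correlation, converts that delay to a far-field arrival angle, and then applies a delay-and-sum beamformer. The function names (estimate_delay, lag_to_angle, delay_and_sum), the 10 cm microphone spacing, and the synthetic noise signals are all assumptions made for this example. Production systems typically use more microphones, fractional delays, and more robust estimators.

```python
import math
import random

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def estimate_delay(ref, other, max_lag):
    """Return the lag (in samples) by which `other` trails `ref`,
    found by brute-force cross-correlation over [-max_lag, max_lag]."""
    n = len(ref)
    best_lag, best_score = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += ref[i] * other[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def lag_to_angle(lag, sample_rate, mic_spacing):
    """Convert a two-microphone lag to a far-field arrival angle in degrees.
    0 means broadside; positive means the sound reaches `ref` first."""
    s = lag / sample_rate * SPEED_OF_SOUND / mic_spacing
    s = max(-1.0, min(1.0, s))  # clamp against rounding and estimation error
    return math.degrees(math.asin(s))


def delay_and_sum(ref, other, lag):
    """Time-align `other` to `ref` by `lag` samples and average the channels.
    The aligned speech adds coherently; uncorrelated noise partly cancels."""
    n = len(ref)
    out = []
    for i in range(n):
        j = i + lag
        aligned = other[j] if 0 <= j < n else 0.0
        out.append(0.5 * (ref[i] + aligned))
    return out


# Synthetic scene (hypothetical values): one source, two noisy microphones,
# with the second microphone receiving the source 3 samples later.
random.seed(0)
sample_rate, mic_spacing, true_lag = 16000, 0.1, 3
source = [random.gauss(0, 1) for _ in range(2000)]
mic_a = [s + 0.5 * random.gauss(0, 1) for s in source]
mic_b = [(source[i - true_lag] if i >= true_lag else 0.0)
         + 0.5 * random.gauss(0, 1) for i in range(len(source))]

lag = estimate_delay(mic_a, mic_b, max_lag=4)
angle = lag_to_angle(lag, sample_rate, mic_spacing)
enhanced = delay_and_sum(mic_a, mic_b, lag)
print(f"estimated lag: {lag} samples, angle: {angle:.1f} degrees")
```

The search range of 4 samples reflects the largest delay that is physically possible for this spacing and sample rate (0.1 m / 343 m/s at 16 kHz is a little under 5 samples). With two microphones, averaging the aligned channels roughly halves the power of uncorrelated noise, and the benefit grows with the number of microphones.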

Requirement 2 interferes with requirement 1, hence the need for requirement 3. First, the robot must not react to audio that it is playing for the user. Second, the robot must be able to react to the user even during audio playback. Acoustic Echo Cancellation (AEC) software is needed to meet all of these requirements. In addition, it is common to add an audio ducking module to the system. Once the system determines that a user is trying to speak, the audio played through the loudspeaker can be temporarily attenuated to help the robot hear the user's commands.
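To make the echo cancellation and ducking ideas more concrete, here is a minimal Python sketch. It assumes a normalized least-mean-squares (NLMS) adaptive filter as the echo canceller, which is one common choice and not necessarily what any particular VQE product uses. The names nlms_echo_cancel and ducking_gains, the three-tap echo path, and all thresholds and rates are hypothetical. The sketch also simplifies two things a real system must handle: it does not freeze adaptation while the user is talking over the playback, and it computes the ducking gains offline instead of feeding them back into the loudspeaker signal.

```python
import random


def nlms_echo_cancel(far_end, mic, taps=8, mu=0.5, eps=1e-6):
    """Subtract an adaptively estimated echo of `far_end` (the loudspeaker
    signal) from `mic`. Returns the echo-reduced residual signal."""
    weights = [0.0] * taps
    history = [0.0] * taps  # recent loudspeaker samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - echo_estimate  # what remains after removing the echo
        norm = sum(h * h for h in history) + eps
        weights = [w + mu * e * h / norm for w, h in zip(weights, history)]
        residual.append(e)
    return residual


def ducking_gains(residual, threshold, ducked_gain=0.25,
                  attack=0.5, release=0.01):
    """Return a per-sample playback gain: move quickly toward `ducked_gain`
    while the smoothed residual level exceeds `threshold`, and recover
    slowly toward 1.0 otherwise."""
    gain, level, gains = 1.0, 0.0, []
    for e in residual:
        level = 0.95 * level + 0.05 * abs(e)  # simple envelope follower
        target = ducked_gain if level > threshold else 1.0
        rate = attack if target < gain else release
        gain += rate * (target - gain)
        gains.append(gain)
    return gains


# Synthetic example (hypothetical values): playback runs throughout,
# and the user starts talking at sample 2000.
random.seed(1)
far_end = [random.gauss(0, 1) for _ in range(3000)]
echo_path = [0.6, 0.3, -0.2]  # assumed loudspeaker-to-microphone response
echo = [sum(h * far_end[n - k] for k, h in enumerate(echo_path) if n - k >= 0)
        for n in range(len(far_end))]
near_end = [0.0] * 2000 + [random.gauss(0, 1) for _ in range(1000)]
mic = [e + s for e, s in zip(echo, near_end)]

residual = nlms_echo_cancel(far_end, mic)
gains = ducking_gains(residual, threshold=0.2)
print(f"gain before user speaks: {gains[1999]:.2f}, "
      f"while user speaks: {gains[-1]:.2f}")
```

Driving the ducking decision from the echo-cancelled residual, not from the raw microphone signal, is the point of the design: the robot's own playback should not trigger ducking, while the user's voice should.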

Additional VQE modules that can further improve the audio quality of the robot include noise reduction, parametric equalization, and dynamic range control.