Acoustic echo control in telecommunications terminals


Sound recording in telecommunications terminals

In this section, we present the problems caused by acoustic echo and noise.

Acoustic echo

In a phone conversation, the voice signal is transmitted through a communication network to a device equipped with at least one loudspeaker and one microphone. The loudspeaker plays the sound from the far-end speaker to the near-end speaker, while the microphone records the voice of the near-end speaker, which is then transmitted to the far-end speaker. In some cases, however, part of the sound emitted by the loudspeaker propagates in the near-end environment and is coupled into the microphone of the device. As a result, the far-end speaker receives not only the voice of the near-end speaker but also a delayed version of his or her own voice: this effect is referred to as acoustic echo. As illustrated in Figure 2.1, this coupling is composed of the direct path and of the reflected paths between the loudspeaker and the microphone.

Linear echo

The coupling between the transducers of the device, also referred to as the echo path, can be modeled by a finite impulse response filter. The echo signal can then be written as
d(n) = h(n) ⋆ x(n)    (2.1)
where h(n) represents the impulse response of the echo path, x(n) represents the loudspeaker signal and ⋆ denotes convolution. Figure 2.2 shows an example of echo paths measured with a mock-up handsfree mobile phone in an office environment. The mock-up phone consists simply of a plastic box equipped with a loudspeaker and two microphones. Details of its design are given in Annex 5.A. Using a mock-up phone instead of a real one makes it possible to focus solely on the acoustic interactions that occur in a device. The microphones are placed such that one of them is slightly closer to the loudspeaker than the other. We denote by h1(n) and h2(n) the impulse responses between the loudspeaker and the first and second microphones, respectively.
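As a minimal sketch of Equation (2.1), the echo can be simulated by convolving a loudspeaker signal with an impulse response. The echo path below is a hypothetical toy example (one direct-path tap and one weaker reflection), not one of the measured responses of Figure 2.2, and the function name simulate_echo is ours.

```python
import numpy as np

def simulate_echo(x, h):
    """Return d(n) = sum_k h(k) x(n - k), truncated to the length of x."""
    return np.convolve(x, h)[:len(x)]

# Hypothetical toy echo path (assumption, not measured data):
# a direct path at a delay of 2 samples and a weaker reflection at 5 samples.
h = np.zeros(8)
h[2] = 0.5
h[5] = 0.2

# A unit impulse played by the loudspeaker: the echo is then the echo path itself.
x = np.zeros(16)
x[0] = 1.0

d = simulate_echo(x, h)
```

With a unit impulse as loudspeaker signal, the first samples of d reproduce h, which is how an impulse response can be read directly from such a measurement.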
We see from Figure 2.2 (a) that the main delay (first peak) is not the same for each microphone. The echo path is composed of the direct path and indirect paths (reflections) between the loudspeaker and the microphone of interest. The main delay of each impulse response is related to the direct path (i.e. the distance) between the loudspeaker and the microphone considered [Kuttruff, 2000]. The closer the microphone is to the loudspeaker, the shorter the direct path. In our case, the main delays are 0.2 ms and 0.4 ms for the first and second microphones respectively. It is also of interest to note that the amplitude of this first peak is different for each microphone. This is because the amplitude of a propagating acoustic wave is inversely proportional to the distance between its source and the point at which it is measured. In our case, the closer the microphone is to the loudspeaker, the higher the amplitude of the main peak.
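The relation between main delay, distance and amplitude can be illustrated with a short calculation. This is only a sketch under two assumptions that are ours: the delays are taken to be pure acoustic propagation times (no processing latency), and the speed of sound is taken as 343 m/s. The function names are hypothetical.

```python
import math

SPEED_OF_SOUND_M_PER_S = 343.0  # assumption: air at about 20 degrees C

def direct_path_length(delay_s, c=SPEED_OF_SOUND_M_PER_S):
    """Distance travelled by the direct wave for a given propagation delay."""
    return c * delay_s

def level_difference_db(r_near, r_far):
    """Level difference between two points when amplitude decays as 1/r."""
    return 20.0 * math.log10(r_far / r_near)

# Main delays quoted in the text for the first and second microphones.
r1 = direct_path_length(0.2e-3)
r2 = direct_path_length(0.4e-3)

# Expected level gap of the first peak if only the 1/r law applied.
level_gap_db = level_difference_db(r1, r2)
```

Under these assumptions, doubling the delay doubles the direct-path length, and the 1/r law predicts a first peak roughly 6 dB lower at the farther microphone. The measured responses need not match this idealized figure exactly.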
The peaks that follow the main one are due to the reflections of the sound from the loudspeaker in the surrounding environment. We can see from Figure 2.2 (a) that the reflections are different for each microphone. The microphones are placed at different positions on the device and do not pick up the same reflections at the same time. The sound from the loudspeaker propagates in all directions. A reflection occurs when a wave encounters an obstacle: part of the incident wave then continues to propagate in a different direction (that of the reflected wave) before being picked up by the microphone. The frequency responses of the measured impulse responses are shown in Figure 2.2 (b). We can see that the acoustic path affects the spectral components of the loudspeaker signal: not all frequencies are equally attenuated.
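The frequency-dependent attenuation mentioned above can be reproduced on a toy example by taking the discrete Fourier transform of an impulse response. The two-tap echo path, the sampling rate of 8 kHz and the FFT size used below are assumptions chosen for illustration, not the measurement conditions of Figure 2.2.

```python
import numpy as np

def magnitude_response_db(h, fs, n_fft=512):
    """Return (frequencies in Hz, magnitude in dB) of an impulse response."""
    spectrum = np.fft.rfft(h, n=n_fft)            # zero-padded one-sided DFT
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)    # matching frequency axis
    magnitude = np.maximum(np.abs(spectrum), 1e-12)  # avoid log10(0)
    return freqs, 20.0 * np.log10(magnitude)

# Hypothetical echo path: a direct path plus one reflection 3 samples later.
h_toy = np.array([1.0, 0.0, 0.0, 0.5])
fs_toy = 8000.0  # assumed sampling rate in Hz

freqs, mag_db = magnitude_response_db(h_toy, fs_toy)
```

Even a single reflection is enough to produce alternating peaks and notches across frequency, which is consistent with the uneven attenuation observed for the measured echo paths.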

Mechanical coupling

The sound wave played by the loudspeaker results from the vibrations of the loudspeaker membrane, which are themselves generated by the electrical signal received from the network. Conversely, the microphone records sound by transforming acoustic waves into an electrical signal. In mobile terminals, the loudspeaker and the microphone are housed in the same enclosure. Part of the coupling between the transducers of the phone is therefore due to their physical proximity within this shared enclosure.

Speech enhancement algorithms

We have explained how ambient noise and acoustic echo degrade speech quality in mobile terminals. Solutions to tackle these disturbances have been widely investigated in the literature. In this section, we present some state-of-the-art echo control and noise reduction algorithms. Section 2.2.1 presents existing echo cancellation algorithms, while Section 2.2.2 presents noise reduction algorithms. Lastly, an example of speech enhancement is presented in Section 2.2.3.

Table of contents:

List of figures
List of notations
List of abbreviations
1 Introduction
1.1 State-of-the-art approaches to speech enhancement
1.2 Contributions
1.2.1 Single-microphone echo cancellation
1.2.2 Dual-microphone echo cancellation
2 Acoustic echo control in telecommunications terminals
2.1 Sound recording in telecommunications terminals
2.1.1 Acoustic echo
2.1.2 Ambient noise
2.2 Speech enhancement algorithms
2.2.1 Echo processing
2.2.2 Noise reduction
2.2.3 Summary of speech enhancement algorithms
2.3 Assessment tools
2.3.1 Objective metrics
2.3.2 Subjective tests
2.4 Conclusions
I Single microphone echo processing
3 Frequency and subband domains related filtering methods
3.1 Short time Fourier transform
3.1.1 From STFT to filter bank structure
3.1.2 From STFT to overlap add
3.2 Filter bank related filtering methods
3.2.1 Subband domain weighting
3.2.2 Time domain filtering
3.3 Discrete Fourier transform related filtering methods
3.3.1 Circular convolution
3.3.2 Linear convolution
3.3.3 Link between circular and linear convolution
3.3.4 Alternative interpretation of the linear convolution
3.4 Comparative assessment of the different filtering methods
3.4.1 Experimental setup
3.4.2 Results
3.4.3 Synthesis
3.5 Conclusion
3.A Time domain aliasing in circular convolution
3.A.1 Proof of aliasing in circular convolution
3.A.2 Illustration of time domain aliasing in circular convolution
3.B Properties of the proposed interpolation function
4 Synchronized adaptive echo cancellation and echo postfiltering
4.1 System overview
4.1.1 Adaptive echo cancellation
4.1.2 Echo postfiltering
4.2 System control
4.2.1 Synchronization approach
4.2.2 Enhanced variable stepsize
4.2.3 Summary
4.3 Experiments
4.3.1 System setup
4.3.2 Convergence of the AEC
4.3.3 Assessment of the global echo control scheme
4.4 Conclusion
II Dual microphone echo processing
Introduction
5 Echo cancellation for dual channel terminals
5.1 Echo problem in dual channel terminals
5.1.1 Signal model
5.1.2 Handsfree scenario analysis with mock-up phone
5.1.3 Handset devices analysis with handset scenario
5.2 Proposed echo processing scheme
5.2.1 Adaptive echo cancellation
5.2.2 Echo postfiltering
5.2.3 Synthesis
5.3 Power level difference double-talk detector
5.3.1 Double-talk detector
5.3.2 Usage within proposed echo processing scheme
5.4 Power level difference based echo suppression gain rule
5.4.1 Gain rule
5.4.2 Relative transfer function estimation
5.4.3 Analysis of the proposed gain rule
5.4.4 Summary about the gain rule computation
5.5 Performances of the proposed double-talk detector and gain rule
5.5.1 Experimental setup
5.5.2 Influence of the threshold value on the DTD
5.5.3 Assessment of the PLD based gain rule
5.5.4 Performance with data from mock-up phone
5.5.5 Assessment with handset recording
5.5.6 Informal listening tests
5.6 Conclusions
5.A Recording setup
5.A.1 Description of mock-up phone
5.A.2 Signal recording setup
6 Dual microphone based echo postfilter
6.1 Residual echo power spectrum estimate for dual microphone devices
6.1.1 Echo power spectrum estimate
6.1.2 Analysis of the proposed DM PSD estimate behavior
6.2 Non-linear echo suppression
6.2.1 Problem description
6.2.2 Limitations of the proposed DM PSD estimate
6.2.3 Residual echo PSD estimate in presence of loudspeaker nonlinearities
6.2.4 Summary of non-linear echo suppression
6.3 Experiments
6.3.1 Setup
6.3.2 Linear echo processing
6.3.3 Non-linear echo suppression performance
6.4 Conclusion
7 Conclusion and perspectives
7.1 Single microphone echo control
7.2 Dual-microphone echo control
7.3 Perspectives
A Acoustic echo cancellation for mobile terminals with one or two microphones
A.1 Introduction
A.2 Sound pickup in mobile telephony
A.2.1 What is acoustic echo?
A.2.2 What is noise?
A.3 Existing solutions
A.3.1 Acoustic echo cancellation
A.3.2 Noise reduction
A.3.3 Our objective
A.4 Echo cancellation for single-microphone terminals
A.4.1 Joint noise and echo reduction
A.4.2 Echo cancellation using the synchronized approach
A.5 Echo cancellation for dual-microphone terminals
A.5.1 Problem statement
A.5.2 Proposed solutions
A.5.3 Observations and conclusions
Bibliography