Human Hearing Attributes for Localization of Sound Sources


Chapter 3. HRTF Filter Design and Implementation

The Head-Related Transfer Function (HRTF) represents the natural filtering of a sound source as it interacts with our bodies on its way to the inner ear. This normal listening process is shown in the left illustration of Figure 3.1. The right illustration of Figure 3.1 shows the HRTFs being restored to the listener through a head-mounted system. Such systems immerse the listener in the environment, and the listener relies on the acoustic information relayed to them. The relayed response of the microphone array does not contain the correct HRTFs required to spatially perceive the location of a sound source. In preparation for restoring the missing HRTFs with the mapping process of Chapter 4, this chapter explores how the HRTFs are modeled and interpolated.
Figure 3.1) The natural filtering of a sound source by the HRTF on its way to the inner ear (Left). The restoring of the HRTFs to the listener with a head-mounted system (Right).

Introduction into HRTFs

The HRTFs describe how a given sound wave is transformed by the diffraction and reflection properties of the head, torso, and pinna (outer ear). This time invariant process is illustrated in Figure 3.1 and defined by:
The directional dependence of the HRTFs is defined relative to the spherical head-related coordinate system, whose origin is located halfway between the entrances of the two ear canals. The elevation angle has been omitted from the HRTFs because the scope of this work pertains only to locating the azimuth direction of a sound source. The elevation throughout this thesis will pertain to an elevation of unless otherwise stated.
The HRTF can be obtained using either a direct or an indirect approach. The direct approach is simply the actual measurement of the subject’s HRTF. The indirect approach uses some form of 3D imaging of the subject, and the resulting image is processed with finite element analysis of the head [15].
In (3.1), direct measurements of the HRTFs are obtained by recording the ear responses while playing a known broadband noise signal at specific points of azimuth and elevation. The measured input sound s( ) used in (3.1) varies depending on the approach. The most common approach is the free-field transfer function approach [16], [17]. With this approach, the HRTFs are obtained by relating a subject’s previous ear responses to the measured signal using the same.
This method is used to obtain the KEMAR HRTFs of [19], the HRTFs of the ‘Listen Group’ subjects [20], and those of Subject 3 in the listener test discussed in Chapter 5. The free-field transfer function assumes free-field conditions, which requires the measurement to be conducted in an anechoic environment and at a considerable distance (within the far-field region) from the sound source. Detailed schematics showing the procedure for obtaining the HRTFs of the ‘Listen Group’ are outlined in [20] and discussed in Appendix B for Subject 3.
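The free-field transfer function idea can be sketched numerically as a spectral division of the ear response by a reference response. The sketch below is illustrative only: it assumes the reference is the same excitation recorded without the subject present, and the function name `free_field_hrtf`, the regularization constant `eps`, and the synthetic signals are hypothetical choices rather than the measurement procedure of [20].

```python
import numpy as np

def free_field_hrtf(ear_response, reference_response, n_fft, eps=1e-8):
    """Estimate an HRTF as the ratio of ear spectrum to reference spectrum.

    eps is a small regularization term (an assumption of this sketch) that
    avoids division by near-zero reference bins.
    """
    ear_spectrum = np.fft.rfft(ear_response, n_fft)
    ref_spectrum = np.fft.rfft(reference_response, n_fft)
    return ear_spectrum * np.conj(ref_spectrum) / (np.abs(ref_spectrum) ** 2 + eps)

# Synthetic example: broadband noise filtered by a known, made-up impulse response.
rng = np.random.default_rng(0)
reference = rng.standard_normal(512)              # stands in for the reference recording
true_hrir = np.array([0.0, 0.0, 1.0, 0.6, -0.3, 0.1])
ear = np.convolve(reference, true_hrir)           # stands in for the ear recording

n_fft = 1024                                      # long enough to hold the full convolution
hrtf = free_field_hrtf(ear, reference, n_fft)
hrir = np.fft.irfft(hrtf, n_fft)[: len(true_hrir)]
print(np.round(hrir, 3))
```

With real recordings, the anechoic and far-field conditions described above are what justify treating the reference as a clean measure of the incident sound.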

Why HRTFs are used

The mapping process of Chapter 4 uses the HRTFs to map the microphone array’s response to match the characteristics of a human subject. HRTFs are used because they contain the interaural time difference (ITD), the interaural level difference (ILD), and the spatial modifications of the head, torso, and pinna. These spatial cues are needed to estimate the azimuth, elevation, and range of a sound source.
The ITD is the main spatial cue used in locating the azimuth of a sound source at frequencies below 800 Hz. The ITD is the measured difference in the time it takes for an incoming signal to reach one ear versus the other. It is seen in the time domain of the HRTF in Figure 3.2 as the delay between the start of the impulse responses of the individual left and right HRTFs. Theoretically, the ITD can be found by approximating the head as a sphere. For an infinitely distant source, the ITD is given by Woodworth’s simple formula [21]:
$$\mathrm{ITD} = \frac{a}{c}\left(\theta + \sin\theta\right) \qquad (3.4)$$
where $\theta$ is the azimuth angle (in radians), $a$ is the radius of the sphere, and $c$ is the speed of sound. Woodworth’s formula (3.4) assumes that the wavelength is much smaller than the diameter of the scatterer. This approximation has proved remarkably close to the exact solution.
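As a worked illustration, Woodworth’s formula can be evaluated directly as ITD = (a/c)(θ + sin θ), with θ the azimuth in radians, a the sphere radius, and c the speed of sound. The head radius of 0.0875 m and the speed of sound of 343 m/s used below are assumed typical values, not values taken from this thesis, and the function name is hypothetical.

```python
import math

def woodworth_itd(azimuth_rad, head_radius_m=0.0875, speed_of_sound=343.0):
    """ITD in seconds for a far-field source and a spherical head.

    Valid for azimuths between 0 (straight ahead) and pi/2 (directly to one side).
    The default radius and speed of sound are assumed typical values.
    """
    if not 0.0 <= azimuth_rad <= math.pi / 2:
        raise ValueError("azimuth must lie in [0, pi/2] radians")
    return (head_radius_m / speed_of_sound) * (azimuth_rad + math.sin(azimuth_rad))

itd_0 = woodworth_itd(0.0)
itd_45 = woodworth_itd(math.radians(45.0))
itd_90 = woodworth_itd(math.radians(90.0))
print(f"ITD at 45 deg: {itd_45 * 1e6:.0f} microseconds")
print(f"ITD at 90 deg: {itd_90 * 1e6:.0f} microseconds")
```

Under these assumed values the ITD grows monotonically from zero at the front to roughly 0.66 ms at the side.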
The spherical solution of (3.4) does not take the elevation characteristics of the ITD into account. The ITD change with elevation is attributed to the non-spherical shape of the head and to the ears being displaced behind and below the center of the head [21]. The ellipsoidal model in [21] shows that when the ears are offset, the ITD model yields the correct ITD for both azimuth and elevation. It should be noted that the Extended Woodworth equation provides an additional scaling factor of for elevation.
The ILD is the main spatial cue in locating the azimuth of a sound source at frequencies above 1.6 kHz. The ILD is the frequency-dependent attenuation caused by the diffraction of sound around a person’s head. The ILD can be seen in both the time domain and the frequency domain of the HRTF in Figure 3.2 as the difference in level between the individual left and right HRTFs.
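Both cues can be read off a pair of measured impulse responses. The sketch below is a simplified illustration, not the procedure used in this thesis: it estimates the ITD from the onset of each impulse response using an assumed threshold of 10% of the peak, and it reports a single broadband ILD from the energy ratio, whereas the ILD of a real HRTF varies with frequency. The impulse responses are synthetic.

```python
import numpy as np

def onset_index(h, threshold=0.1):
    """Index of the first sample whose magnitude reaches threshold * peak."""
    magnitude = np.abs(np.asarray(h, dtype=float))
    return int(np.argmax(magnitude >= threshold * magnitude.max()))

def itd_and_ild(h_left, h_right, fs):
    """Return (ITD in seconds, broadband ILD in dB) for a left/right pair.

    A positive ITD means the left ear leads; a positive ILD means the left
    ear is louder.
    """
    itd = (onset_index(h_right) - onset_index(h_left)) / fs
    energy_left = np.sum(np.square(h_left))
    energy_right = np.sum(np.square(h_right))
    ild_db = 10.0 * np.log10(energy_left / energy_right)
    return itd, ild_db

# Synthetic pair for a source on the left: the right ear is later and quieter.
fs = 44100
h_left = np.zeros(256)
h_left[20] = 1.0
h_right = np.zeros(256)
h_right[40] = 0.5

itd, ild_db = itd_and_ild(h_left, h_right, fs)
print(f"ITD = {itd * 1e6:.0f} microseconds, ILD = {ild_db:.2f} dB")
```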
There are two common methods that attempt to theoretically match the ILD to a human subject. These methods are the spherical head model [22] and the ‘snowman’ model [13], which adds a torso to the spherical head model. Both of these models can explain the major features of a pinna-less system.
The spatial modifications of the pinna are seen in the frequency domain of the HRTF in Figure 3.2 as peaks and dips. Since these spatial modifications are substantially more complex than those of the ILD and ITD, a good theoretical model has yet to be found. For this reason, and because a pinna-less system yields poor performance in the spatial perception of the listener [23], the filters developed to give the microphone array the missing spatial cues are discussed in terms of measured HRTFs instead of a mathematical model.
Figure 3.2) HRTF of the KEMAR manikin [19] in the time domain and frequency domain for a sound source originating at 45° Left and 20° above the manikin with the ITD and ILD emphasized.

Introduction into HRTF Filter Design

There is currently no closed form solution that compares to the measured HRTFs of the listener. For this reason, large datasets of measured HRTFs dominate the process of adding or restoring the spatial cues of a given sound. The filters based on the measured HRTFs are either the HRTFs themselves or a simplification of the HRTFs. The simplification of the subject’s HRTFs can be placed into two categories: an interpolation of the HRTFs and a generalization of the HRTFs. Both the generalization process and the interpolation process are usually viewed as ways of reducing the large computational and memory load that the HRTFs place on a system. This thesis does not discuss these processes in terms of lowering the complexity of the HRTFs, but instead in the context of adding a dynamic process that incorporates head movement.

Generalization of the HRTFs

There are numerous ways to generalize the HRTFs. This thesis focuses on the minimum phase generalization of the HRTFs. The minimum phase generalization comes from the observation that the excess phase, the difference between the original phase response and that of its minimum phase counterpart, has been found to be approximately linear [11]. The HRTF can then be modeled with a minimum phase filter along with a pure delay. Minimum phase HRTFs have been shown to present no perceptual consequences [24].
The minimum phase generalization of the HRTF starts with a decomposition of the HRTF into three separate parts: a minimum phase counterpart function $H_{min}(\omega)$, an all-pass function $H_{ap}(\omega)$, and a linear phase or pure delay $H_{lin}(\omega)$:
$$H(\omega) = H_{min}(\omega)\, H_{ap}(\omega)\, H_{lin}(\omega)$$
where $H(\omega)$ is the original HRTF [25]. The minimum phase counterpart function $H_{min}(\omega)$ is determined from the log magnitude spectrum of $H(\omega)$ through the Hilbert transform.
The Hilbert transform relates the real and imaginary parts of a complex signal. It was developed to solve a special case of Hilbert problems. The Hilbert transform is used here because, for a minimum phase system, the log magnitude spectrum and the phase spectrum are Hilbert transforms of each other [26], [27].
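One common numerical route to the minimum phase counterpart is the real cepstrum (homomorphic) method, which applies this Hilbert transform relationship implicitly by folding the cepstrum of the log magnitude spectrum onto positive time. The following is a minimal sketch under that assumption and is not necessarily the implementation used in this thesis; the function name, the FFT length, and the two-tap example filters are hypothetical.

```python
import numpy as np

def minimum_phase_spectrum(magnitude):
    """Build a minimum phase spectrum from a full-length FFT magnitude.

    magnitude must have even length n and be symmetric, as produced by
    np.abs(np.fft.fft(h, n)) for a real impulse response h.
    """
    n = len(magnitude)
    log_magnitude = np.log(np.maximum(magnitude, 1e-12))  # floor avoids log(0)
    cepstrum = np.fft.ifft(log_magnitude).real
    # Fold the even (real) cepstrum onto positive time so it becomes causal.
    fold = np.zeros(n)
    fold[0] = 1.0
    fold[1 : n // 2] = 2.0
    fold[n // 2] = 1.0
    return np.exp(np.fft.fft(fold * cepstrum))

n_fft = 512
h_original = np.array([0.5, 1.0])   # zero outside the unit circle: not minimum phase
h_expected = np.array([1.0, 0.5])   # same magnitude response, zero inside the circle

H_original = np.fft.fft(h_original, n_fft)
H_min = minimum_phase_spectrum(np.abs(H_original))
h_min = np.fft.ifft(H_min).real
print(np.round(h_min[:4], 6))
```

The magnitude response is unchanged by the procedure; only the phase is replaced by the minimum phase that corresponds to that magnitude.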
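The all-pass function $H_{ap}(\omega)$ is defined as the excess phase component. The minimum phase generalization assumes that the excess phase component does not have a bearing on the spatial awareness of the listener and hence can be neglected [28]. The minimum phase HRTF is then given as:
$$H_{mp}(\omega) = H_{min}(\omega)\, H_{lin}(\omega)$$
where $H_{min}(\omega)$ is the minimum phase counterpart function and $H_{lin}(\omega)$ is the linear phase or pure delay.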
The minimum phase HRTF does not reproduce all of the phase components of the HRTF; the higher frequency phase components are lost. That the minimum phase HRTFs present no perceptual consequences indicates that the loss of the higher frequency phase components is not of great perceptual importance [24], [29]. The absence of perceptual consequences may be attributed to the low frequency phase component of the HRTF being accurately represented by the minimum phase HRTF [29]. The phase components of the original HRTF and the minimum phase HRTF are shown in Figure 3.3 for low frequencies and in Figure 3.4 for high frequencies.
Figure 3.4) The phase components of the original HRTF and the minimum phase HRTF at relatively high frequencies.

Interpolation of HRTF Datasets

HRTFs are typically measured at specific increments of azimuth and elevation on a sphere around a subject. The resulting spatial dataset may not include the desired spatial location, in which case the HRTF dataset must be interpolated. These interpolation methods also lend themselves to a dynamic process, in which head movements are applied to the HRTFs to give a realistic impression of a moving sound with no noticeable discontinuity.
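A minimal sketch of one possible interpolation scheme follows. It assumes simple linear interpolation between the two nearest measured azimuths, applied separately to minimum phase impulse responses and to their pure delays; this is an illustrative choice and not necessarily the method developed in this thesis. The function names and the numeric values are hypothetical.

```python
import numpy as np

def interpolate_hrir(azimuth, az0, hrir0, delay0, az1, hrir1, delay1):
    """Linearly interpolate a minimum phase HRIR and its delay (in samples)
    between two measured azimuths az0 and az1."""
    weight = (azimuth - az0) / (az1 - az0)
    hrir = (1.0 - weight) * np.asarray(hrir0) + weight * np.asarray(hrir1)
    delay = (1.0 - weight) * delay0 + weight * delay1
    return hrir, delay

def apply_delay(hrir, delay_samples, n_out):
    """Apply a (possibly fractional) delay as a linear phase term.

    The delay is circular over n_out samples, so n_out should exceed the
    HRIR length plus the delay.
    """
    spectrum = np.fft.rfft(hrir, n_out)
    bins = np.arange(len(spectrum))
    spectrum = spectrum * np.exp(-2j * np.pi * bins * delay_samples / n_out)
    return np.fft.irfft(spectrum, n_out)

# Two made-up measurements 10 degrees apart, interpolated at the midpoint.
hrir_a = np.array([1.0, 0.5, 0.0, 0.0])
hrir_b = np.array([0.8, 0.3, 0.0, 0.0])
hrir_mid, delay_mid = interpolate_hrir(5.0, 0.0, hrir_a, 10.0, 10.0, hrir_b, 20.0)
delayed = apply_delay(hrir_mid, delay_mid, n_out=64)
print(delay_mid, np.round(delayed[14:18], 3))
```

Keeping the delay separate from the minimum phase part means the impulse responses being averaged are time-aligned, which avoids the comb filtering that arises when two responses with different onsets are summed directly.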

Chapter 1. Introduction
1.1 Thesis Motivation
1.2 Problem Investigation
1.3 Thesis Objectives and Overview
Chapter 2. Background
2.1 Microphone Array Fundamentals
2.2 Human Hearing Attributes for Localization of Sound Sources
Chapter 3. HRTF Filter Design and Implementation 
3.1 Introduction into HRTFs
3.2 Introduction into HRTF Filter Design
Chapter 4. HRTF Matching
4.1 Classic Approach of Beamforming
4.2 HRTF Matching
Chapter 5. Listener Test of Binaural HRTF Matching 
5.1 First Listener Test of the Binaural HRTF Matching with Fixed Head Position
5.2 Second Listener Test of the Binaural HRTF Matching with Head Movement
5.3 Third Listener Test: Validation of Front/Back Confusion
5.4 Additional comment on the Second Listener Test
Chapter 6. Discussion of Process Limitations and Future Work
6.1 Limitation of preprocessing control algorithm
6.2 Conclusion of Limitation
6.3 Future Work
Chapter 7. Conclusion
Binaural Hearing Effects of Mapping Microphone Array’s Responses to a Listener’s Head-Related Transfer Functions
