Statistical speech recognition

Applications of speech recognition

For multi-modal communication in cognitive neuroscience robotics, speech recognition is one of the most important and effective means of natural communication between humans and robots [84, 148]. Application areas of speech recognition technology include, but are not limited to, the following:
• Controlling phones, electronic devices, lighting, etc. for smart homes [185].
• Machine control for industrial use [172].
• Dictation, organisation of documents and database queries for office use [95].
• Stock jobbing and bank assignments in banking systems [75].
• Weather forecasts, event notes, timetable information and booking for public transport and information services [123].
• Surgical aids and diagnosis systems for medical services [191].
• Healthcare systems for elderly and disabled people [62].
Additionally, several speech recognition systems have been developed in projects from industry, research institutes, and universities. One of the large research projects is SmartKom [152], in which a spoken dialogue system is used for device control and for information seeking in databases. Speech recognition systems have also been implemented on many robotic systems. Examples include HERMES from the University of Munich [27] and CARL from the University of Aachen, with which the user can use spoken language to carry out simple tasks. Other robots, such as CARL from Aveiro University [114] and Armar from Karlsruhe University, can fulfil kitchen tasks. BIRON [182] at the University of Bielefeld can learn the names of objects that the user shows it. The Japanese robots HRP-2 [74] and JIJO-2 [118] are used in office buildings to tell people their room numbers and show them the path to their destinations. Current research at the University of Auckland also targets the development of a healthcare robot, HealthBot, which provides vital-sign measurements, medication scheduling and reminders, fall detection, and entertainment for older people [76].
When robots are developed to serve older people, it is essential to provide them with multi-modal interaction [56, 62]. Several interaction modalities can be used for this purpose, such as voice and graphical user interfaces, and the choice among them has to take into account the population for whom the robot is intended. The authors of [56] compared the interactions of older and younger users with a speech-based smart home system.
The results showed that older users were less likely to speak to the system in a way that it could easily understand. Consequently, they achieved lower task success, owing to age-related changes that usually affect many interrelated aspects of cognition, such as information processing speed, mental flexibility, fluid intelligence, and memory [167]. Therefore, for more effective human-robot interaction, the authors of [56, 62] recommend combining a graphical user interface with a voice user interface to guide older people through the functions supported by service robots.

Summary

In this chapter, the basic building blocks of ASR systems, as well as the methods used to build search spaces for speech decoding, have been discussed. In addition, the metrics used to evaluate the performance of ASR systems were described. Moreover, the common paradigms for integrating speech recognition systems with machines, along with some current applications of speech recognition, have been presented.

Acoustic modelling

Acoustic modelling is the process of estimating, for each phonetic unit, a set of acoustic model parameters that maximizes the likelihood of that unit's training data. The quality of an acoustic model is measured by its ability to discriminate the corresponding phonetic unit from competing phonetic units. However, this discrimination is challenging given the variability of the environment, the speaker, the context, and the speech signal itself. In this section, we review some elements that contribute to the quality of acoustic modelling.
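Under the usual maximum-likelihood criterion, "estimating the parameters" of a unit means choosing the values that make its training frames most probable. As a minimal sketch, the example below fits a single diagonal-covariance Gaussian to the feature frames of one phonetic unit, for which the maximum-likelihood mean and variance have a closed form. This is a deliberate simplification: HMM-based systems use Gaussian mixtures per state trained iteratively (e.g. with Baum-Welch). The function names, the variance floor, and the toy frames are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Frame = Sequence[float]


def fit_diag_gaussian(frames: Sequence[Frame], floor: float = 1e-6) -> Tuple[List[float], List[float]]:
    """Maximum-likelihood mean and per-dimension variance of a diagonal Gaussian.

    `floor` is a hypothetical variance floor that keeps the density finite
    when a dimension has (near-)zero spread in the training data.
    """
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    # The ML variance divides by n, not n - 1 (that would be the unbiased estimate).
    var = [max(sum((f[d] - mean[d]) ** 2 for f in frames) / n, floor) for d in range(dim)]
    return mean, var


def log_likelihood(frames: Sequence[Frame], mean: Sequence[float], var: Sequence[float]) -> float:
    """Total log-likelihood of all frames under the diagonal Gaussian (mean, var)."""
    total = 0.0
    for f in frames:
        for x, m, v in zip(f, mean, var):
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total


if __name__ == "__main__":
    # Toy 2-dimensional "feature frames" for one hypothetical phonetic unit.
    frames = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [1.0, 2.0]]
    mean, var = fit_diag_gaussian(frames)
    print("mean:", mean, "var:", var)
    print("log-likelihood at ML estimate:", log_likelihood(frames, mean, var))
    print("log-likelihood with shifted mean:", log_likelihood(frames, [m + 0.1 for m in mean], var))
```

Perturbing either the mean or the variance away from the fitted values can only lower the log-likelihood, which is what "maximum likelihood" means here. Note that a high likelihood for its own unit does not by itself guarantee good discrimination against competing units, which is the quality criterion discussed above.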

Hidden Markov model

Hidden Markov model (HMM)-based acoustic modelling is the common choice for state-of-the-art automatic speech recognition (ASR) systems [53]. As discussed in Chapter 2, an HMM is a stochastic finite-state machine consisting of a set of states and transitions, as shown in Fig. An HMM, denoted by , can be viewed as a doubly stochastic process [158]. In the first stochastic process, acoustic features are modelled probabilistically at each HMM state using a mixture of Gaussian components. The second stochastic process models the topological structure through state transition probabilities.
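The two stochastic processes can be made concrete with a small sketch, assuming one-dimensional features and a hypothetical three-state left-to-right topology (real systems use multi-dimensional feature vectors). Each state scores an observation with its own Gaussian mixture (the first process), the transition matrix governs how states follow one another (the second process), and the standard forward algorithm sums over all hidden state sequences to give the log-likelihood of the observations under the model. All names and parameter values are illustrative assumptions, not taken from the thesis.

```python
import math
from typing import List, Sequence, Tuple

# A state's mixture: (weights, means, variances) for 1-D Gaussian components.
GMM = Tuple[Sequence[float], Sequence[float], Sequence[float]]


def log_sum_exp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def safe_log(p: float) -> float:
    """log(p), mapping zero probabilities (forbidden transitions) to -inf."""
    return math.log(p) if p > 0 else -math.inf


def gmm_log_density(x: float, weights: Sequence[float], means: Sequence[float],
                    variances: Sequence[float]) -> float:
    """First stochastic process: log p(x | state) under the state's Gaussian mixture."""
    comps = [
        math.log(w) - 0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        for w, m, v in zip(weights, means, variances)
    ]
    return log_sum_exp(comps)


def forward_log_likelihood(obs: Sequence[float], init: Sequence[float],
                           trans: Sequence[Sequence[float]], gmms: Sequence[GMM]) -> float:
    """log P(obs | HMM), summing over every state sequence (forward algorithm)."""
    n = len(init)
    # alpha[j] = log P(o_1..o_t, state_t = j)
    alpha: List[float] = [safe_log(init[j]) + gmm_log_density(obs[0], *gmms[j]) for j in range(n)]
    for x in obs[1:]:
        # Second stochastic process: move probability mass along the transitions.
        alpha = [
            log_sum_exp([alpha[i] + safe_log(trans[i][j]) for i in range(n)])
            + gmm_log_density(x, *gmms[j])
            for j in range(n)
        ]
    return log_sum_exp(alpha)


if __name__ == "__main__":
    init = [1.0, 0.0, 0.0]                      # always start in the first state
    trans = [[0.6, 0.4, 0.0],                   # left-to-right: stay or move right
             [0.0, 0.7, 0.3],
             [0.0, 0.0, 1.0]]
    gmms = [([0.5, 0.5], [0.0, 0.5], [1.0, 1.0]),
            ([1.0], [3.0], [0.5]),
            ([0.3, 0.7], [6.0, 6.5], [1.0, 0.8])]
    obs = [0.2, 0.4, 3.1, 2.9, 6.2, 6.4]
    print("log P(obs):", forward_log_likelihood(obs, init, trans, gmms))
    print("log P(reversed obs):", forward_log_likelihood(obs[::-1], init, trans, gmms))
```

Working in the log domain with log-sum-exp avoids the numerical underflow that multiplying many small probabilities would cause on realistic utterance lengths. Because the topology only allows left-to-right moves, the time-reversed observation sequence receives a much lower likelihood than the original one.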

1 Introduction
1.1 Motivations
1.1.1 Continuous speech decoding
1.1.2 Speech decoding accuracy
1.1.3 Command decoding accuracy
1.2 Objectives of the thesis
1.3 Contributions of the thesis
1.4 Outline of the thesis
2 Speech Recognition Overview
2.1 Introduction
2.2 Statistical speech recognition
2.2.1 Signal analysis
2.2.2 Acoustic modelling
2.2.3 Language modelling
2.2.4 Model parameter estimation
2.2.5 Lexical modelling
2.2.6 Speech decoding
2.2.7 Search space representation
2.2.8 Fast decoding techniques
2.2.9 Performance metrics
2.4 Applications of speech recognition
2.5 Summary
3 Modelling Speech Knowledge Sources
3.1 Introduction
3.2 Acoustic modelling
3.3 Language modelling
3.4 Parameter estimation
3.5 Baseline models
3.6 Speech corpora
3.7 Summary
4 Search Space Construction
4.1 Introduction
4.2 Weighted finite-state transducers
4.2.1 Transducer operations
4.3 Knowledge sources representation
4.4 Search space construction
4.5 Baseline decoding graphs
4.6 Summary
5 Continuous Speech Decoder
6 Optimization on Decoding Graphs
8 UML-Based ASR Development
9 Conclusions and Future Perspectives
References