Table of contents
1 Introduction
1.1 Motivation
1.1.1 Audio source separation
1.1.2 Speech and music separation
1.1.3 Single-channel and multichannel separation
1.1.4 Deep neural networks (DNNs)
1.2 Objectives and scope
1.3 Contributions and organization of the thesis
2 Background
2.1 Audio source separation
2.1.1 Sources and mixture
2.1.2 Source separation
2.2 Automatic speech recognition (ASR)
2.3 Time-frequency representation
2.4 State-of-the-art single-channel audio source separation
2.4.1 Time-frequency masking
2.4.2 Non-negative matrix factorization (NMF)
2.4.3 DNN based single-channel audio source separation
2.4.3.1 Basics of DNNs
2.4.3.2 DNN based separation techniques
2.5 State-of-the-art multichannel audio source separation
2.5.1 Beamforming
2.5.2 Expectation-maximization (EM) based multichannel audio source separation framework
2.5.2.1 Multichannel Gaussian model
2.5.2.2 General iterative EM framework
2.5.3 DNN based multichannel audio source separation techniques
2.5.3.1 Utilizing multichannel features for estimating a single-channel mask
2.5.3.2 Estimating intermediate variables for deriving a multichannel filter
2.5.3.3 Directly estimating a multichannel filter
2.5.3.4 Summary
2.6 Positioning of our study
3 Estimation of spectral parameters with DNNs
3.1 Research questions
3.2 Iterative framework with spectral DNNs
3.3 Experimental settings
3.3.1 Task and dataset
3.3.2 An overview of the speech enhancement system
3.3.3 DNN spectral models
3.3.3.1 Architecture
3.3.3.2 Inputs and outputs
3.3.3.3 Training criterion
3.3.3.4 Training algorithm
3.3.3.5 Training data
3.4 Source spectra estimation
3.5 Impact of spatial parameter updates
3.6 Impact of spectral parameter updates
3.7 Comparison to NMF based iterative EM algorithm
3.7.1 Source separation performance
3.7.2 Speech recognition performance
3.8 Impact of environment mismatches
3.9 Summary
4 On improving DNN spectral models
4.1 Research questions
4.2 Cost functions for spectral DNN
4.2.1 General-purpose cost functions
4.2.2 Task-oriented cost functions
4.3 Impact of the cost function
4.3.1 Experimental settings
4.3.2 Source separation performance
4.3.3 Speech recognition performance
4.4 Impact of time-frequency representations, DNN architectures, and DNN training data
4.4.1 Experimental settings
4.4.1.1 Time-frequency representations
4.4.1.2 DNN architectures and inputs
4.4.1.3 DNN training criterion, algorithm, and data
4.4.1.4 Multichannel filtering
4.4.2 Discussions
4.5 Impact of a multichannel task-oriented cost function
4.5.1 Experimental settings
4.5.1.1 Task and dataset
4.5.1.2 An overview of the singing-voice separation system
4.5.1.3 DNN spectral models
4.5.2 Discussions
4.5.2.1 Task-oriented cost function
4.5.2.2 Comparison with the state of the art
4.5.2.3 Data augmentation
4.6 Summary
5 Estimation of spatial parameters with DNNs
5.1 Research questions
5.2 Weighted spatial parameter updates
5.3 Iterative framework with spectral and spatial DNN
5.4 Experimental settings
5.4.1 Task and dataset
5.4.2 An overview of the speech enhancement system
5.4.3 DNN spectral models
5.4.3.1 Architecture, inputs, and outputs
5.4.3.2 Training criterion, algorithm, and data
5.4.4 DNN spatial models
5.4.4.1 Architecture, input, and outputs
5.4.4.2 Training algorithm and data
5.4.5 Design choices for the DNN spatial models
5.4.5.1 Cost functions
5.4.5.2 Architectures and input variants
5.5 Estimation of the oracle source spatial covariance matrices
5.6 Spatial parameter estimation with DNN
5.7 Impact of different spatial DNN architectures
5.8 Impact of different spatial DNN cost functions
5.9 Comparison with GEV-BAN beamforming
5.10 Summary
6 Conclusions and perspectives
6.1 Conclusions
6.2 Perspectives
Bibliography
