Table of contents

Abstract
Acknowledgements
I Introduction
I.1 Speaker recognition terminology
I.2 Issues with current ASV status
I.3 Contributions and publications
I.4 Thesis structure
I.4.1 Part A
I.4.2 Part B
II A review of traditional speaker verification approaches
II.1 Speech as a biometric
II.2 Front-end: speaker features
II.2.1 Short-term features
II.2.2 Longer-term features
II.3 Back end: models and classifiers
II.3.1 Gaussian Mixture Models
II.3.2 GMM-UBM
II.3.3 Hidden Markov Models
II.3.4 Towards i-vectors
II.3.5 The HiLAM system
II.4 Performance Metrics
II.4.1 Receiver Operating Characteristic (ROC)
II.4.2 Equal Error Rate (EER)
II.4.3 Score normalisation
II.5 Challenges and Databases
II.5.1 NIST Speaker Recognition Evaluations
II.5.1.a The early years
II.5.1.b Broader scope and higher dimensionality
II.5.1.c Bi-annual big data challenges
II.5.1.d SRE16
II.5.2 RSR2015 corpus
II.5.2.a Training
II.5.2.b Testing
II.6 Summary
III Simplified HiLAM
III.1 HiLAM baseline implementation
III.1.1 Preprocessing and feature extraction
III.1.2 GMM optimisation
III.1.3 Relevance factor optimisation
III.1.4 Baseline performance
III.2 Protocols
III.3 Simplified HiLAM
III.3.1 Middle-layer training reduction
III.3.2 Middle layer removal
III.4 Evaluation Results
III.5 The Matlab demo
III.6 Conclusions
IV Spoken password strength
IV.1 The concept of spoken password strength
IV.2 Preliminary observations
IV.2.1 The text-dependent shift
IV.2.2 The text-dependent overlap
IV.3 Database and protocols
IV.4 Statistical analysis
IV.4.1 Variable strength command groups
IV.4.2 Sampling distribution of the EER
IV.4.3 Isolating the influence of overlap
IV.5 Results interpretation
IV.6 Conclusions
V A review of deep learning speaker verification approaches
V.1 Neural networks and deep learning
V.1.1 Deep Belief Networks
V.1.2 Deep Auto-encoders
V.1.3 Convolutional Neural Networks
V.1.4 Long short-term Memory Recurrent Neural Networks
V.2 Deep learning in ASV
V.2.1 Feature extraction
V.2.2 Applications to i-vector frameworks
V.2.3 Back-ends and classifiers
V.3 End-to-end
V.3.1 Middle-level representations vs. raw audio
V.3.2 Fixed topologies
V.4 Summary
VI Augmenting topologies applied to ASV
VI.1 Evolutionary strategies
VI.1.1 TWEANNs
VI.1.2 NEAT
VI.2 Application to raw audio classification
VI.3 Truly end-to-end automatic speaker verification
VI.3.1 Fitness function
VI.3.2 Mini-batching
VI.3.3 Training
VI.3.4 Network selection for evaluation
VI.4 Experiments
VI.4.1 Baseline systems
VI.4.2 NXP database and experimental protocols
VI.4.3 End-to-end system: augmentation and generalisation
VI.5 Further experiments: End-to-end system on NIST SRE16 data
VI.6 Conclusions
VII Augmenting topologies applied to anti-spoofing
VII.1 A brief overview of anti-spoofing
VII.2 NEAT setup
VII.2.1 Ease of classification
VII.2.2 Training
VII.2.3 Testing
VII.3 Experimental setup
VII.3.1 Database, protocol and metric
VII.3.2 Baseline systems
VII.3.3 End-to-end anti-spoofing
VII.4 Experimental results
VII.4.1 Evolutionary behaviour
VII.4.2 Spoofing detection performance
VII.5 Conclusions
VIII Conclusions
VIII.1 From the laboratory into the wild
VIII.2 Not all sentences are created equal
VIII.3 Truly end-to-end ASV
VIII.4 Truly end-to-end anti-spoofing
VIII.5 Closing thoughts and future work
Appendix A Published work
Bibliography
