Table of contents
1 Vietnamese Text-To-Speech: Current state and Issues
1.1 Introduction
1.2 Text-To-Speech (TTS)
1.2.1 Applications of speech synthesis
1.2.2 Basic architecture of TTS
1.2.3 Source/filter synthesizer
1.2.4 Concatenative synthesizer
1.3 Unit selection and statistical parametric synthesis
1.3.1 From concatenation to unit-selection synthesis
1.3.2 From vocoding to statistical parametric synthesis
1.3.3 Pros and cons
1.4 Vietnamese language
1.5 Current state of Vietnamese TTS
1.5.1 Unit selection Vietnamese TTS
1.5.2 HMM-based Vietnamese TTS
1.6 Main issues in Vietnamese TTS
1.6.1 Building phone and feature sets
1.6.2 Corpus availability and design
1.6.3 Building a complete TTS system
1.6.4 Prosodic phrasing modeling
1.6.5 Perceptual evaluations with respect to lexical tones
1.7 Proposition and structure of dissertation
2 Hanoi Vietnamese phonetics and phonology: Tonophone approach
2.1 Introduction
2.2 Vietnamese syllable structure
2.2.1 Syllable structure
2.2.2 Syllable types
2.3 Vietnamese phonological system
2.3.1 Initial consonants
2.3.2 Final consonants
2.3.3 Medials or pre-tonal sounds
2.3.4 Vowels and diphthongs
2.4 Vietnamese lexical tones
2.4.1 Tone system
2.4.2 Phonetics and phonology of tone
2.4.3 Tonal coarticulation
2.5 Grapheme-to-phoneme rules
2.5.1 X-SAMPA representation
2.5.2 Rules for consonants
2.5.3 Rules for vowels/diphthongs
2.6 Tonophone set
2.6.1 Tonophone
2.6.2 Tonophone set
2.6.3 Acoustic-phonetic tonophone set
2.7 PRO-SYLDIC, a pronounceable syllable dictionary
2.7.1 Syllable-orthographic rules
2.7.2 Pronounceable rhymes
2.7.3 PRO-SYLDIC
2.8 Conclusion
3 Corpus design, recording and pre-processing
3.1 Introduction
3.2 Raw text
3.2.1 Rich and balanced corpus
3.2.2 Raw text from different sources
3.3 Text pre-processing
3.3.1 Main tasks
3.3.2 Sentence segmentation
3.3.3 Tokenization into syllables and NSWs
3.3.4 Text cleaning
3.3.5 Text normalization
3.3.6 Text transcription
3.4 Phonemic distribution
3.4.1 Di-tonophone
3.4.2 Theoretical speech unit sets
3.4.3 Real speech unit sets
3.4.4 Distribution of speech units
3.5 Corpus design
3.5.1 Design process
3.5.2 The constraint of size
3.5.3 Full coverage of syllables and di-tonophones
3.5.4 VDTS corpus
3.6 Corpus recording
3.6.1 Recording environment
3.6.2 Quality control
3.7 Corpus preprocessing
3.7.1 Normalizing margin pauses
3.7.2 Automatic labeling
3.7.3 The VDTS speech corpus
3.8 Conclusion
4 Prosodic phrasing modeling
4.1 Introduction
4.2 Analysis corpora and performance evaluation
4.2.1 Analysis corpora
4.2.2 Precision, Recall and F-score
4.2.3 Syntactic parsing evaluation
4.2.4 Pause prediction evaluation
4.3 Vietnamese syntactic parsing
4.3.1 Syntax theory
4.3.2 Vietnamese syntax
4.3.3 Syntactic parsing techniques
4.3.4 Adoption of parsing model
4.3.5 VTParser, a Vietnamese syntactic parser for TTS
4.4 Preliminary proposal on syntactic rules and breaks
4.4.1 Proposal process
4.4.2 Proposal of syntactic rules
4.4.3 Rule application and analysis
4.4.4 Evaluation of pause detection
4.5 Simple prosodic phrasing model using syntactic blocks
4.5.1 Duration patterns of breath groups
4.5.2 Duration pattern of syllable ancestors
4.5.3 Proposal of syntactic blocks
4.5.4 Optimization of syntactic block size
4.5.5 Simple model for final lengthening and pause prediction
4.6 Single-syllable-block-grouping model for final lengthening
4.6.1 Issue with single syllable blocks
4.6.2 Combination of single syllable blocks
4.7 Syntactic-block+link+POS model for pause prediction
4.7.1 Proposal of syntactic link
4.7.2 Rule-based model
4.7.3 Predictive model with J48
4.8 Conclusion
5 VTED, a Vietnamese HMM-based TTS system
5.1 Introduction
5.2 Typical HMM-based speech synthesis
5.2.1 Hidden Markov Model
5.2.2 Speech parameter modeling
5.2.3 Contextual features
5.2.4 Speech parameter generation
5.2.5 Waveform reconstruction with vocoder
5.3 Proposed architecture
5.3.1 Natural Language Processing (NLP) part
5.3.2 Training part
5.3.3 Synthesis part
5.4 Vietnamese contextual features
5.4.1 Basic Vietnamese training feature set
5.4.2 ToBI-related features
5.4.3 Prosodic phrasing features
5.5 Development platform and configurations
5.5.1 Mary TTS, a multilingual platform for TTS
5.5.2 Mary TTS workflow for adding a new language
5.5.3 HMM-based voice training for VTED
5.6 Vietnamese NLP for TTS
5.6.1 Word segmentation
5.6.2 Text normalization (vted-normalizer)
5.6.3 Grapheme-to-phoneme conversion (vted-g2p)
5.6.4 Part-of-speech (POS) tagger
5.6.5 Prosody modeling
5.6.6 Feature processing
5.7 VTED training voices
5.8 Conclusion
6 Perceptual evaluations
6.1 Introduction
6.2 Evaluations of ToBI features
6.2.1 Subjective evaluation
6.2.2 Objective evaluation
6.3 Evaluations of general naturalness
6.3.1 Initial test
6.3.2 Final test
6.3.3 Discussion on the two tests
6.4 Evaluations of general intelligibility
6.4.1 Measurement
6.4.2 Preliminary test
6.4.3 Final test with Latin square
6.5 Evaluations of tone intelligibility
6.5.1 Stimuli and paradigm
6.5.2 Initial test
6.5.3 Final test
6.5.4 Confusion in tone intelligibility
6.6 Evaluations of prosodic phrasing model
6.6.1 Evaluations of model using syntactic rules
6.6.2 Evaluations of model using syntactic blocks
6.7 Conclusion
7 Conclusions and perspectives
7.1 Contributions and conclusions
7.1.1 Adopting technique and performing literature reviews
7.1.2 Proposing a new speech unit – tonophone
7.1.3 Designing and building a new corpus
7.1.4 Proposing a prosodic phrasing model
7.1.5 Designing and constructing VTED
7.1.6 Evaluating the TTS system
7.2 Perspectives
7.2.1 Improvement of synthetic voice quality
7.2.2 TTS for other Vietnamese dialects
7.2.3 Expressive speech synthesis
7.2.4 Voice reader
7.2.5 Reading machine



