Exploiting contextual information in Hidden Conditional Random Fields


Discrete and continuous state space

Except for a few nonparametric approaches (such as GPDM), models of sequential data are characterized by an internal structure. They are composed of unobserved (hidden) states that take their values in a discrete or continuous space.
For instance, Hidden Markov Models (HMM) and Hidden Conditional Random Fields (HCRF) use a set of hidden states. Each hidden state assigns either a probability (HMM) or a score (HCRF) to the observation at each time step. These states are discrete and mutually exclusive: the hidden state variable is a K-multinomial, meaning that out of K possible states, only one is active at each time step.
On the other hand, the internal structure of Recurrent Neural Networks (RNN [42], LSTM [39], ...) can be viewed as hidden states in a continuous space. The state (or activation) of each hidden unit is real-valued and influences the final output. Some successful approaches also combine discrete and continuous state space models, such as RNN-RBM [9], DBN-HMM [36] and LSTM-HMM [33]. In RNN-RBM for example, the Restricted Boltzmann Machine (RBM) can be viewed as a discrete state space model because its hidden units are binary-valued. Yet an RBM does not explicitly model the temporal nature of the data; this is the role of the RNN part of the model. Our work focuses mainly on the widely popular family of discrete state space Markovian approaches for modeling time series.

Synthesis, classification and recognition with HMMs

In all the subsequent tasks, the training set covers different labels, or classes. The first step is to assign each label y a set of hidden states S_y modeled by a separate HMM with parameters Λ_y. Each HMM is then trained on the observation sequences of its class, x ∈ X_y. As a result, the HMM with parameters Λ_y models p(X_y, y; Λ_y), the joint probability of the observation sequences and their label y (or class).
Synthesis: Once such an HMM is trained, it can be used directly for a basic form of synthesis. The principle is to sample from its distribution p(X_y, y; Λ_y) to synthesize a new observation sequence. One begins by choosing an initial hidden state h_1 by sampling from the initial distribution π. Then a first observation x_1 is sampled from the emission distribution p(x_1 | h_1; Λ). Next, one samples from p(h_2 | h_1; Λ) to choose a second hidden state h_2, then samples x_2 from p(x_2 | h_2; Λ), and so forth until an ending state or the desired number of observations is reached.
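The sampling procedure can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: emissions are taken to be one-dimensional Gaussians, the sequence stops after a fixed number of observations (no explicit ending state), and the function name `sample_hmm` and all parameter values are hypothetical.

```python
import random

def sample_hmm(pi, trans, means, stds, length, seed=None):
    """Draw (states, observations) of a given length from a Gaussian HMM.

    pi[k]       : initial probability of state k
    trans[j][k] : probability of moving from state j to state k
    means, stds : per-state parameters of 1-D Gaussian emissions (assumed)
    """
    rng = random.Random(seed)
    n_states = len(pi)
    states, observations = [], []
    # h_1 is sampled from the initial distribution pi
    h = rng.choices(range(n_states), weights=pi)[0]
    for _ in range(length):
        states.append(h)
        # x_t is sampled from the emission distribution p(x_t | h_t)
        observations.append(rng.gauss(means[h], stds[h]))
        # h_{t+1} is sampled from the transition distribution p(h_{t+1} | h_t)
        h = rng.choices(range(n_states), weights=trans[h])[0]
    return states, observations

# Toy left-to-right model with two states (illustrative values only).
pi = [1.0, 0.0]
trans = [[0.8, 0.2], [0.0, 1.0]]
states, obs = sample_hmm(pi, trans, means=[0.0, 5.0], stds=[1.0, 1.0],
                         length=10, seed=0)
print(states)
print([round(x, 2) for x in obs])
```

Because the emission parameters depend only on the current state, the sampled observations fluctuate around a fixed mean for as long as the chain stays in that state.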

Handling variability with HMMs

However, HMMs have several limitations, and many variants have been proposed to improve upon them. One particular shortcoming is that HMM probability distributions are stationary within a given state. Concretely, this means that an HMM models time series with piecewise constant distribution functions, which is a crude way of modeling the variability of observation sequences. In the following, we present several approaches that introduce nonstationary state distributions in Hidden Markov Models. Some of them rely on conditioning HMM distributions on external variables, which forms the basis of our work.

Hidden Conditional Random Fields

Hidden CRFs (HCRF) were initially proposed as an extension of CRFs for dealing with more complex and structured data [34]. Indeed, in CRF-based systems there is usually one state per class (e.g. a POS tag), whereas in HCRFs several states correspond to a given class, as in HMMs. The presence of several hidden states per label gives HCRFs a clear advantage over CRFs for modeling complex distributions.
Hence HCRFs have been applied to signals such as gestures and images [65], handwriting [80] [23], speech [74] [34] [68] and eye movements [22], for both signal labeling and classification tasks. Figure 2.7 gives an example of such a network.
As with HMMs used in sequence labeling problems, each label y is assigned a set of hidden states S_y. As a result, a sequence of labels y = (y_1, ..., y_T) corresponds to state sequences h = (h_1, ..., h_T) ∈ S^T (where S is the union of the S_y over all classes). We denote by s(y) the set of all possible state sequences that correspond to a particular label sequence y.
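To make the definition of s(y) concrete, the following Python sketch enumerates it for a toy problem. The label names, the state names and the helper `compatible_state_sequences` are hypothetical and serve only as an illustration; explicit enumeration is practical only for very short sequences, since the size of s(y) grows exponentially with T.

```python
from itertools import product

# Hypothetical label-to-states map: each label y owns its own set S_y.
STATES_PER_LABEL = {
    "a": ("a1", "a2"),
    "b": ("b1", "b2", "b3"),
}

def compatible_state_sequences(labels, states_per_label=STATES_PER_LABEL):
    """Return s(y): every state sequence h with h_t in S_{y_t} for all t."""
    return list(product(*(states_per_label[y] for y in labels)))

seqs = compatible_state_sequences(["a", "b", "a"])
print(len(seqs))  # 2 * 3 * 2 = 12 compatible state sequences
print(seqs[0])    # ('a1', 'b1', 'a1')
```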

CHMM relative to similar approaches

Handling variability is a major focus when dealing with sequences and signals. Variability may result from various effects, which may be combined. Consequently, one may distinguish between different kinds of variability. For instance, a speech signal differs fundamentally depending on whether the speaker is male or female, and two speakers utter the same word differently. This variability is usually modeled by multiplying models, e.g. by using one model for male speakers and one model for female speakers.
There is also a more fine-grained variability: a single speaker never utters a given word in exactly the same way, and a human never performs the same gesture in exactly the same way. Such variability depends on many factors that are usually unknown, such as emotion or physical state. It may be handled by increasing the number of Gaussians in Gaussian mixtures. Going further, there is another kind of variability related to noise, recording equipment, etc. This is usually handled through a preprocessing step that aims at removing it.
While there are historically standard ways to handle such kinds of variability, a number of other approaches have considered the benefit of modeling them explicitly within the framework of Markovian models. We introduce them here and discuss how they differ from CHMMs.


Variable Parametric HMMs

A first attempt at conditioning HMM parameters on environment variables seems to be the work of [84], who proposed Parametric HMMs (PHMMs) for gesture recognition; the context variables were related to the amplitude of the gestures. As already mentioned, our modeling framework includes PHMMs as a special case when the parameterization of covariance matrices and transitions is ignored. A very similar approach (Multiple Regression HMM, or MR-HMM) was proposed in [29] for speech recognition, using fundamental frequency as the context variable. Basically, an MR-HMM may be viewed as a PHMM with time-dependent context variables θ. These models are again embedded in our framework.
A second class of models, called Variable Parameter HMMs (VPHMM), is closely related to our approach. This type of model was introduced in [18], [17] in the context of speech recognition to improve robustness to noisy conditions. In this approach, the means as well as the (diagonal) covariance matrices are expressed as polynomial functions of a static scalar environment variable v.
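As a rough illustration of such a polynomial dependence, the Python sketch below evaluates the mean vector of one Gaussian state as a polynomial in v. It is not the formulation of [18], [17]: the function names, the polynomial order and the coefficient values are hypothetical, and the covariance parameterization (which would additionally have to keep variances positive) is left out.

```python
def poly_eval(coeffs, v):
    """Evaluate c_0 + c_1 v + ... + c_P v^P with Horner's scheme."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * v + c
    return result

def state_mean(mean_coeffs, v):
    """Mean vector of one state; mean_coeffs[d] holds the coefficients of dimension d."""
    return [poly_eval(c, v) for c in mean_coeffs]

# Hypothetical 2-D state whose mean depends quadratically on v.
mean_coeffs = [[1.0, 0.5, 0.0], [0.0, -1.0, 0.25]]
print(state_mean(mean_coeffs, 2.0))  # [2.0, -1.0]
```

With all higher-order coefficients set to zero, the mean no longer depends on v and the state behaves like that of a standard HMM.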

Table of contents:

1 Introduction
2 Statistical models for time series modeling
2.1 Statistical models
2.1.1 Notations
2.1.2 Supervised learning
2.1.3 Model types
2.2 Tasks and evaluation measures
2.2.1 Isolated classification
2.2.2 Recognition
2.2.3 Synthesis
2.3 Generative Markov models
2.3.1 Hidden Markov Models
2.3.2 Handling variability with HMMs
2.4 Discriminative Markov models
2.4.1 Conditional Random Fields
2.4.2 Hidden Conditional Random Fields
2.5 Conclusion
3 Contextual Hidden Markov Models
3.1 Introduction
3.2 Single Gaussian Contextual HMM
3.2.1 Mean parameterization
3.2.2 Covariance parameterization
3.2.3 Transitions parameterization
3.2.4 Bayesian perspective
3.3 Training
3.3.1 With covariances parameterized
3.3.2 With transitions parameterized
3.3.3 Dynamic context
3.3.4 Gaussian mixtures
3.3.5 Tuning the gradient step size
3.4 CHMM relative to similar approaches
3.4.1 Variable Parametric HMMs
3.4.2 Maximum Likelihood Linear Regression
3.4.3 Context dependent modeling
3.5 Application to the classification of handwritten characters
3.5.1 Dataset
3.5.2 Preliminary results
3.5.3 Extended results
3.6 Conclusion
4 Contextual Hidden Conditional Random Fields
4.1 Introduction
4.2 Discriminative training of Hidden Markov Models
4.2.1 MMI
4.2.2 MCE
4.2.3 MWE/MPE
4.2.4 Discussion
4.3 Exploiting contextual information in Hidden Conditional Random Fields
4.3.1 HCRF as a generalization of HMM
4.3.2 Contextual HCRFs
4.3.3 Training Contextual HCRFs
4.3.4 Experiments
4.4 Conclusion
5 Exploiting Contextual Markov Models for synthesis
5.1 Motivation
5.2 Using HMMs for synthesis
5.2.1 Improved synthesis using non stationary HMMs
5.2.2 Synthesis with constraints
5.3 Speech to motion synthesis, an application
5.3.1 Related work
5.4 Speech to motion synthesis using Contextual Markovian models
5.4.1 Parameterizations
5.4.2 Training
5.4.3 Synthesis
5.4.4 Experiments
5.5 Conclusion
6 Combining contextual variables
6.1 Introduction
6.2 Dropout regularization
6.2.1 Dropout in CHMMs
6.3 Multistream combination of variables
6.3.1 Experimental setup
6.3.2 Contextual variables
6.3.3 CHMMs
6.3.4 Multistream CHMMs
6.4 Conclusion
7 Toward Transfer Learning
7.1 Design of a global model
7.1.1 Using a class code as contextual variables
7.1.2 Task & dataset
7.1.3 Preliminary results with one-hot class coding
7.1.4 Using a distributed representation of class as contextual variables
7.1.5 Retraining discriminatively
7.2 Dynamic Factor Graphs
7.2.1 Continuous state space models
7.2.2 Analogy with Dynamic Factor Graphs
7.3 Conclusion
8 Conclusion & Perspectives
Appendices