Emotion elicitation and EEG acquisition
For a given subject, a trial is defined as the combination of one elementary emotion elicitation (using one stimulus) and the subject’s self-annotation of the emotion felt. For instance, as shown in Figure 2.2, the EMOEEG database protocol requires the participant to annotate his/her emotion right after each stimulus. This is also the case for the two other databases used in this thesis: HCI MAHNOB and DEAP.
In the case of audiovisual stimuli, an alternative to post-stimulus assessment is to have the participant assess his/her emotional state dynamically, while watching the stimulus, as is the case for the Feeltrace and Gtrace annotation methods.
Even if such methods enable dynamic annotation, that is to say annotation that takes into account emotion variation across time, they have two major drawbacks, as stated in :
— as the dynamic annotation has to be made while watching the video, it induces a lack of concentration that can only be tackled by watching each stimulus twice, which increases the duration of the experiment and the participant’s fatigue.
— watching each stimulus more than once may induce a habituation effect that would influence the participant’s annotation.
Emotional stimuli can be of different natures, depending on the focus. Some studies address emotion recognition during music listening [40, 41]. Others have used images or image blocks as stimuli [42–44], using pictures from databases such as the International Affective Picture System (IAPS). Musical stimuli present the disadvantage that « subjects are prone to misunderstand positive/negative valence as preferred/not preferred ». For instance, a piece of music can be appreciated by the listener even if it makes him/her sad. As for image stimuli, even if they are an efficient way of eliciting emotion, they do not offer dynamic emotional responses. Therefore, in this thesis, the focus is put on audio-visually stimulated emotions, in order to be closer to real-world stimulation.
During each trial, the EEG signal is acquired by means of an EEG headset, as shown in Figure 2.3. An EEG headset is usually composed of 20, 32, or 64 electrodes. The names and positions of the electrodes are defined by the international 10-20 system. Figure 2.4 shows the positions of 20 electrodes on the skull.
EEG-based affective datasets
Emotion recognition databases are numerous, but they mainly rely on modalities such as speech, facial expressions, or eye gaze. To the best of our knowledge, only a few EEG-based emotion recognition databases are publicly available. Tables 2.1 and 2.2 list those databases. In this thesis, the datasets used are HCI MAHNOB, DEAP, and EMOEEG.
EMOEEG, HCI MAHNOB and DEAP are multi-modal datasets where physiological responses to both visual and audiovisual stimuli are recorded, along with videos of the subjects, with a view to developing affective computing systems, especially automatic emotion recognition systems. The experimental setups involve various physiological sensors, including electroencephalographic, electrocardiographic, electromyographic and electro-oculographic sensors, in addition to skin conductance measurements.
Commonly used features for EEG-based emotion classification
Features used for EEG-based emotion classification can be divided into three categories: time domain features, frequency domain features, and time-frequency domain features. In some reviews, such as (Kim et al., 2013), these features are divided into only two categories, namely time domain and time-frequency domain features.
Time domain features versus time-frequency domain features
Classic time domain features, such as the mean, power, or standard deviation, can be extracted from the EEG signals. More complex features commonly used in time series analysis, such as first differences, second differences, kurtosis, or the Hurst exponent, have also been used. Finally, some time domain features were designed specifically for EEG analysis: for instance, the Hjorth features named activity, mobility and complexity. Table 2.3 lists previous works where EEG time domain features were used in image and video-elicited emotion classification tasks. The performances obtained using such features are also indicated.
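To make the Hjorth features concrete, the sketch below computes activity, mobility and complexity for a single-channel signal, following their usual definitions (variance of the signal, and ratios of the standard deviations of its successive differences). The function name `hjorth_parameters` and the synthetic test signal are illustrative assumptions, not taken from the thesis.

```python
import numpy as np

def hjorth_parameters(x):
    """Return (activity, mobility, complexity) of a 1-D signal."""
    x = np.asarray(x, dtype=float)
    dx = np.diff(x)    # first differences
    ddx = np.diff(dx)  # second differences
    var_x, var_dx, var_ddx = np.var(x), np.var(dx), np.var(ddx)
    activity = var_x                                    # signal power
    mobility = np.sqrt(var_dx / var_x)                  # proxy for mean frequency
    complexity = np.sqrt(var_ddx / var_dx) / mobility   # close to 1 for a pure sine
    return activity, mobility, complexity

# Synthetic example (illustrative values): a 10 Hz sine sampled at 256 Hz for 2 s.
fs = 256
t = np.arange(2 * fs) / fs
activity, mobility, complexity = hjorth_parameters(np.sin(2 * np.pi * 10 * t))
```

In a multi-channel setting, the same function would simply be applied to each electrode, giving three features per channel.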
As for time-frequency domain features, the features most commonly extracted for EEG-based emotion classification are the Power Spectral Density (PSD) of each considered electrode in specific frequency bands (theta, slow alpha, alpha, beta, gamma) that are well known for their role in emotional and cognitive processes [29, 30]. For instance, « EEG alpha bands reflect attentional processing and beta bands reflect emotional and cognitive processing in the brain », according to Rowland et al. and Klimesch et al. Spectral moments of different orders and heuristic spectral shape descriptors have also been used [13, 31]. In the multi-channel case, the spectral power asymmetry between specific pairs of electrodes can be computed in the frequency bands mentioned earlier. Other approaches, such as Common Spatial Patterns (CSP) [33–35], focus instead on the spatial aspect of the activity on the skull.
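As a rough illustration of band-wise PSD features and of the power asymmetry between a pair of electrodes, the following sketch uses a plain periodogram computed with NumPy. The band edges, the function names `band_powers` and `asymmetry`, and the synthetic signals are assumptions made for this example; the thesis does not specify them here, and an averaged estimator such as Welch's method would be a common alternative.

```python
import numpy as np

# Band edges in Hz: illustrative values, not taken from the thesis.
BANDS = {"theta": (4, 8), "slow_alpha": (8, 10), "alpha": (8, 12),
         "beta": (12, 30), "gamma": (30, 45)}

def band_powers(x, fs):
    """Mean periodogram PSD of a 1-D signal within each frequency band."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                                   # remove the DC offset
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    psd = np.abs(np.fft.rfft(x)) ** 2 / (fs * x.size)  # unnormalised one-sided estimate
    return {name: psd[(freqs >= lo) & (freqs < hi)].mean()
            for name, (lo, hi) in BANDS.items()}

def asymmetry(left, right, fs):
    """Per-band power difference (left minus right) for one electrode pair."""
    p_left, p_right = band_powers(left, fs), band_powers(right, fs)
    return {name: p_left[name] - p_right[name] for name in BANDS}

# Synthetic pair of channels: the same 10 Hz rhythm, weaker on the right side.
fs = 128
t = np.arange(4 * fs) / fs
left = np.sin(2 * np.pi * 10 * t)
right = 0.5 * np.sin(2 * np.pi * 10 * t)
powers = band_powers(left, fs)
asym = asymmetry(left, right, fs)
```

With these toy signals the alpha band dominates and the alpha asymmetry is positive, since the left channel carries more 10 Hz power than the right one.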
Influence of feature choice and other parameters on classification results
In this section, we present the results we obtained for intra-subject binary classification of audio-visually elicited emotions on HCI MAHNOB, DEAP and EMOEEG, studying the effects of different parameters on classification performance and using classical EEG-based features. In the case of EMOEEG, intra-session classification is performed. In other words, classification is carried out separately for each session (even though 3 subjects of this database participated in 2 sessions). Features are normalized by centering and scaling.
Unless otherwise specified, the results that are presented correspond to intra-subject (intra-session for EMOEEG) classification tasks, using a leave-one-out scheme. The scores presented are the mean across subjects (resp. sessions) of the subject-wise (resp. session-wise) F1-scores.
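The evaluation protocol described above can be sketched as follows: for each subject, every trial is held out in turn, features are centered and scaled, a classifier predicts the held-out label, and one F1-score is computed per subject before averaging across subjects. Several choices here are assumptions made for the example and are not stated in the text: the nearest-centroid classifier is only a placeholder, the normalization statistics are estimated on the training trials of each fold, and the F1-score is taken for the positive class only.

```python
import numpy as np

def f1_score_binary(y_true, y_pred):
    """F1-score of the positive class (label 1); 0.0 when undefined."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom else 0.0

def nearest_centroid_predict(X_train, y_train, x_test):
    """Placeholder classifier: pick the class whose mean feature vector is closest."""
    centroids = {c: X_train[y_train == c].mean(axis=0) for c in np.unique(y_train)}
    return min(centroids, key=lambda c: np.linalg.norm(x_test - centroids[c]))

def leave_one_out_f1(X, y):
    """Leave-one-trial-out predictions for one subject, then a single F1-score."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    preds = []
    for i in range(len(y)):
        train = np.arange(len(y)) != i
        mu, sigma = X[train].mean(axis=0), X[train].std(axis=0)
        sigma[sigma == 0] = 1.0                 # guard against constant features
        X_train = (X[train] - mu) / sigma       # centering and scaling
        x_test = (X[i] - mu) / sigma
        preds.append(nearest_centroid_predict(X_train, y[train], x_test))
    return f1_score_binary(y, preds)

def mean_f1_across_subjects(data):
    """Mean of the subject-wise F1-scores; `data` maps subject id to (X, y)."""
    return float(np.mean([leave_one_out_f1(X, y) for X, y in data.values()]))

# Toy data: two hypothetical subjects, 20 trials each, 3 well-separated features.
rng = np.random.default_rng(0)

def toy_subject():
    y = np.array([0] * 10 + [1] * 10)
    X = rng.normal(size=(20, 3)) + 5.0 * y[:, None]
    return X, y

data = {"s01": toy_subject(), "s02": toy_subject()}
score = mean_f1_across_subjects(data)
```

Because each subject contributes only a few tens of trials, a single misclassified trial changes the subject-wise F1-score noticeably, which is one reason the mean across subjects is reported.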
The results obtained for intra-subject classification tasks can be improved. Moreover, the fact that F1-scores are computed subject-wise (resp. session-wise) impairs their significance, as each subject (resp. session) corresponds to a limited number of stimuli (20, 30 or 50 depending on the database).
Therefore, even if inter-subject classification is more challenging, it offers two main advantages, in addition to the fact that it opens the way to more generalizable systems:
— more data is available to train our classifiers, which are no longer limited to one subject (resp. one session).
— the significance of F1-scores is increased, since classification is performed on a larger number of trials.
Inter-subject classification is performed in a leave-one-subject-out (resp. leave-one-session-out) fashion. Let us note that in the case of the HCI MAHNOB database, the results are better when emotional classes are determined using emotional keywords rather than valence and arousal levels. However, we consider these valence and arousal levels for the sake of comparison with the other databases.
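As an illustration of the leave-one-session-out scheme mentioned above, the sketch below enumerates the train/test splits obtained when all trials of one session (or subject) are held out in turn. The function name `leave_one_group_out` and the session labels are hypothetical and only serve the example.

```python
def leave_one_group_out(groups):
    """Yield (held_out, train_idx, test_idx) for each distinct group label."""
    for held_out in sorted(set(groups)):
        train_idx = [i for i, g in enumerate(groups) if g != held_out]
        test_idx = [i for i, g in enumerate(groups) if g == held_out]
        yield held_out, train_idx, test_idx

# Hypothetical session label of each trial, in recording order.
groups = ["s1", "s1", "s2", "s2", "s2", "s3"]
splits = list(leave_one_group_out(groups))
```

Each classifier is then trained on the trials of all other sessions and tested on the held-out one, so the test scores are computed over every trial of the database rather than over one session at a time.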
Results obtained with NMF and conclusions
In this section, we study the emotion classification performance of NMF on the HCI MAHNOB and EMOEEG databases. We did not use the DEAP database in this part, because of the different nature of stimuli, namely music videos. EMOEEG and HCI MAHNOB are two multi-modal datasets where physiological responses, among which EEG, to audiovisual stimuli were recorded.
A session is defined as the recording of a given subject at a given time of the day. In the case of EMOEEG, most subjects took part in one session whereas a few took part in two. As for HCI MAHNOB, each subject took part in exactly one session, which means that intra-session (resp. inter-session) classification is equivalent to intra-subject (resp. inter-subject) classification.
Therefore, we talk about intra/inter-session classification in the case of EMOEEG, and intra/inter-subject classification in the case of HCI MAHNOB. In the rest of the document, we will use the terms intra/inter-session in both cases: in the HCI MAHNOB case, they are to be understood as intra/inter-subject.
Table of contents:
1.1 Stimuli choice
1.2 Emotion annotation
1.3 Factors of variability for the EEG response
1.4 Objective and contributions
1.5 Organization of the document
2 Baseline EEG emotion classification
2.1 Emotion elicitation and EEG acquisition
2.1.1 Specific requirements
2.2 EEG-based affective datasets
2.3 Commonly used features for EEG-based emotion classification
2.3.1 Time domain features versus time-frequency domain features
2.3.2 Exploiting spatial information
2.4 Classifier training and evaluation metrics
2.5 Influence of feature choice and other parameters on classification results
2.5.1 Extending the observation window of the signal
2.5.2 Impact of feature choice
2.5.3 Choice of classifier
2.5.4 Inter-subject classification
2.5.5 Threshold choice for valence and arousal classes
3 Group Nonnegative Matrix Factorization for EEG-based emotion recognition
3.1 Nonnegative Matrix Factorization
3.1.1 General principle
3.1.2 Divergence minimization
3.1.3 Specific use to EEG
3.2 Results obtained with NMF and conclusions
3.2.1 Intra-session classification
3.2.2 Inter-session classification
3.3 Group NMF
3.3.1 General method
3.3.2 Specific use to EEG
3.4 Results obtained with GNMF and conclusions
4 EEG-based Inter-Subject Correlation Schemes in a Stimuli-Shared Framework: Interplay with Valence and Arousal
4.1 The ISC principle
4.1.1 ISC score computation
4.1.2 Averaging Rij to compute ISC eigenvectors
4.2 Different ISC computational schemes
4.2.1 Comparing subject signals globally vs pairwise
4.2.2 Choosing the data on which to compute the eigenvectors
4.3 Studying the effects of emotion on ISC
4.3.1 Assessing pairwise agreement
4.3.2 Assigning a subject pairwise annotation for a given stimulus when there is agreement
4.3.3 Effects of valence and arousal on ISC
4.4 Results on HCI MAHNOB
4.4.1 Results with Vall
4.4.2 Results with Vstim/pair
4.4.3 Linking the ISC level to the annotation agreement
4.5 Results on DEAP
4.5.1 Results with Vall
4.5.2 Results with Vstim/pair
4.6 Further discussion
4.6.1 Agreement is arbitrarily defined
4.6.2 ISC score variation from one scheme to another
4.6.3 Differences of ISC score variations along valence between HCI MAHNOB and DEAP
4.6.4 Effects of shrinkage
5 Towards an ISC-oriented Group Nonnegative Matrix Factorization for EEG-based emotion recognition
5.1 Multi-task GNMF-based feature learning
5.2 Results obtained with valence/arousal-based GNMF
5.3 Taking ISC into account explicitly
6.1 Conclusion and discussion