As mentioned in section 4.6, classifying an element in order to assign the corresponding class label consists in determining on which side of the decision surface it is located. For 2-class problems this is a relatively simple task that can be solved by assessing the sign returned by evaluating, over the input element, the function that defines the decision boundary. However, when the classification task involves more than 2 classes, the problem becomes more complicated: it is then necessary to generate multiple decision surfaces to separate all the considered classes, and to assign the corresponding labels according to the relative position of the input vectors with respect to all boundaries within the classification space.
There are many possible ways of separating classes in order to establish the decision boundaries. The approaches vary in the number of functions that, according to the number of classes, are required to divide the classification space into the associated regions. In this regard, the best approach is the one that satisfactorily adapts to the distribution of the data while demanding a reasonable number of computations. In the following sections we describe three of the most common methods, namely the one-versus-one, one-versus-all, and hierarchical approaches.
As the name suggests, the one-versus-one approach consists in generating a model to independently separate each pair of classes. To this end, it is necessary to train K(K-1)/2 different models, where K represents the number of classes, on all possible pairs of groups and then, at classification time, to assign the labels of the testing set according to the class receiving the highest number of "votes". This technique has the drawback of leading to ambiguities in the resulting classification. However, this issue is not its biggest inconvenience, since it can be tackled by generating convex regions, as described in section 4.6.1.2. The main problem is that, for a large K, this solution requires a considerable amount of time for fitting all the required models. Similarly, the evaluation of testing data demands significant computation.
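The voting scheme described above can be sketched as follows. This is a minimal illustration, not the thesis implementation: the pairwise models are hypothetical placeholders for any trained binary classifier, here toy nearest-prototype deciders.

```python
from itertools import combinations
from collections import Counter

def ovo_predict(x, classes, pairwise_models):
    """Predict by majority vote over the K(K-1)/2 pairwise binary models.

    pairwise_models[(a, b)] is assumed to be a callable returning the
    winning label (a or b) for input x.  Ties remain ambiguous; here we
    simply keep the first label reaching the maximum vote count.
    """
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[pairwise_models[(a, b)](x)] += 1
    return votes.most_common(1)[0][0]
```

For K = 3 this evaluates 3 models; for K = 8 (the 8-class database) it would already require 28, which illustrates the computational drawback noted above.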
Another common approach consists in constructing K separate models, where K represents the number of classes, in which the kth model is generated by grouping into one class all elements belonging to class Ck and, into the second class, all elements from the remaining K-1 classes. This is known as the one-versus-all approach. However, using the decisions of the individual classifiers can generate inconsistent results, in which an input is assigned to multiple classes simultaneously. This problem is sometimes addressed by choosing the label associated with the largest distance separating the input data from the decision boundary among all models. Unfortunately, this heuristic presents the inconvenience that the different models were trained on different tasks, and there is no guarantee that the distances generated by all models will have comparable scales. Another problem with the one-versus-all approach is that the training sets are unbalanced. For instance, if we have four classes, each with equal numbers of training elements, then the individual classifiers are trained on data sets comprising 75% of samples belonging to the group containing the K-1 classes and only 25% of samples belonging to the class Ck, which breaks the original symmetry of the task.
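The max-distance heuristic can be sketched in a few lines. The models here are assumed to be callables returning a signed distance to their decision boundary (positive meaning the input belongs to class k); as discussed above, nothing guarantees these distances share a common scale.

```python
import numpy as np

def ova_predict(x, models):
    """One-versus-all decision: each of the K models scores the input
    with a signed distance to its boundary; the class whose model
    returns the largest distance wins."""
    scores = [m(x) for m in models]
    return int(np.argmax(scores))
```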
Hierarchical methods solve multiclass problems by using a tree of binary classifiers, whose root discriminates between two groups, each containing half of the classes. Each succeeding node again includes only one half of the classes of the selected group, and the process is recursively repeated until each node contains a single class, from which the final decision can be inferred.
The hierarchical approach can be very convenient, since even though several models have to be generated during the training stage, only a few of them are used during the validation stage. However, there are two important drawbacks to consider: first, a single erroneous prediction anywhere along the classification chain makes the final result wrong. Second, given that from the second stage onward each model is selected according to the previous decision, all subsequent stages have to be applied consecutively, which can lead to long computations.
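The tree descent can be sketched as follows. The node layout and threshold models are illustrative assumptions, not the thesis code: internal nodes are (model, left, right) triples and leaves are class labels.

```python
def tree_predict(x, node):
    """Descend a binary tree of classifiers.  Each internal node is a
    (model, left, right) triple where model(x) returns False (go left)
    or True (go right); a leaf is the final class label.  Only about
    log2(K) models are evaluated per input, but a single wrong decision
    anywhere along the path makes the result wrong."""
    while not isinstance(node, int):  # descend until a leaf label is reached
        model, left, right = node
        node = right if model(x) else left
    return node
```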
Figure 5.3 shows a representation of the hierarchical approach for a classification problem involving 4 different classes. As can be observed, two stages are required: the first one consists of a model that selects 2 of the classes, which are subsequently evaluated by another model applied in the second and last stage in order to make the final decision.
Conventional single-label classification involves learning from a set of examples that are each associated with a single label l from a set of disjoint labels L, |L| > 1. As mentioned, if |L| = 2 the learning task is called binary classification, while if |L| > 2 it is called multiclass classification. In multilabel problems the examples are associated with a set of labels Y ⊆ L. In the past, multilabel classification was mainly motivated by medical diagnosis tasks. For example, it can be the case that a patient suffers from both hypertension and atrial fibrillation at the same time.
Nowadays, multilabel classification methods are increasingly required by modern applications, such as protein function classification and music categorization, among others. Essentially, there are two ways of addressing multilabel classification: problem transformation methods, which, as their name indicates, consist in transforming the multilabel task into one or more single-label classification problems; and algorithm adaptation methods, which extend specific learning algorithms in order to handle multilabel data directly.
In the case of motor imagery-based BCI systems, the multilabel approach arises when combined movements are included in the paradigm, which, as mentioned, allows drastically increasing the number of afforded commands. Now, if we consider the assumption that was justified by the analysis presented in sections 2.6.3 and 3.1.3 for the 4-class and 8-class databases respectively, it turns out that there is a particular way of grouping classes that allows defining a problem transformation method based on the manner in which the activity is distributed among the different sources. In simple terms, this activity appears as an ERD modulation over the regions whose associated limb is engaged in the motor imagery. In the case of combined imaginations, such modulation can be characterized as the superposition of the activity that is independently generated by each source during simple tasks. With this in mind, we have proposed to group the EEG data separately for each activity source, so that all motor imageries involving the use of the associated body part are gathered together into one class (hereinafter referred to as CERD), and the remaining conditions that do not include it into another class (hereinafter referred to as CIDLE). In this way, it is possible to reduce the entire problem to a series of binary tasks, which simplifies the classification and boosts feature extraction by allowing the use of the CSP algorithm, which is only suitable for discriminating between two classes. By using this type of partition we have implemented three new methods that vary in the manner in which the features extracted from each binary problem are classified, namely the One-step Multilabel (OsM) approach, the Hierarchical Multilabel (HM) approach, and the One-step Hierarchical Multilabel (OsHM) approach.
Before introducing the aforementioned methods, and in order to simplify their description, we present a convention for labeling each one of the different tasks according to the presence/absence of motor imagery-related activity at each one of the main sources. If a source is active, a value of "1" is used to represent it; otherwise, the value is "0". The number assigned to each source is used to generate a P-bit word, with P representing the number of body parts considered by the paradigm (2 for the 4-class database and 3 for the 8-class one), where the least significant bit corresponds to the area around C4 (right side) and the most significant bit to the area around C3 (left side). Hereafter, in order to describe the proposed methods, we will refer to each class by the decimal equivalent of the aforementioned convention (see figure for the complete association between motor tasks and labels).
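The labeling convention can be made concrete with a short sketch. Note that the text only fixes the most and least significant bits (left hand and right hand); placing the feet bit in the middle for the 8-class case is an assumption made here for illustration.

```python
def task_label(left_hand, right_hand, feet=None):
    """Encode a motor task as the decimal value of a P-bit word:
    most significant bit = left hand (area around C3), least
    significant bit = right hand (area around C4).  For the 8-class
    paradigm a feet bit is assumed (hypothetically) to sit in between.
    Each argument is 1 if the source is active, 0 otherwise."""
    bits = [left_hand] + ([feet] if feet is not None else []) + [right_hand]
    label = 0
    for b in bits:
        label = (label << 1) | int(b)  # shift in one bit at a time
    return label
```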
One-step Multilabel (OsM) approach
The One-step Multilabel (OsM) method considers, in the case of the 4-class problem formulated by the 4-class database, two feature extraction modules that independently determine whether the sources linked to the left hand and the right hand are engaged during a motor task (see Figure 5.5). Similarly, in the case of the 8-class problem given by the 8-class database, besides these two modules there is a third one that analyses the activity generated by the source associated with the feet (see Figure 5.6). Each module is defined as a binary problem where one class corresponds to the group of motor tasks showing ERD modulation at the corresponding source (i.e., motor imageries involving the use of the limb associated with that source), and the other class to the group of motor tasks that do not generate such activity and present patterns similar to the idle state (i.e., motor imageries that do not involve the use of that limb).
The whole procedure involves two feature extraction models in the case of the 4-class problem (compared to the six and four models that are required to solve the same problem by using the one-versus-one and one-versus-all approaches respectively), and three feature extraction models for the 8-class one (compared to the 28 and 8 models that are required to solve the same problem by using the one-versus-one and one-versus-all approaches respectively). In both cases, outputs are concatenated to form a single vector, so that only one classification model is required.
When applied in combination with the CSP method, and selecting 2 pairs of filters (4 in total) for each of the two binary problems devoted to the 4-class task, the OsM approach generates a total of 8 features per trial, whereas in the case of the 8-class problem the resulting vectors comprise 12 features per trial.
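The concatenation step can be sketched as follows, under the assumption that each binary module is represented by a hypothetical matrix W of 4 already-trained CSP spatial filters, and that features are the usual normalized log-variances of the spatially filtered signals (a common CSP feature choice, not necessarily the exact formulation used in the thesis).

```python
import numpy as np

def osm_features(trial, csp_modules):
    """Concatenate the features produced by each binary CSP module
    (one module per activity source) into a single vector.

    trial        : array of shape (n_channels, n_samples)
    csp_modules  : list of filter matrices W, each (4, n_channels)

    With 4 filters per module this yields 4 features per module:
    8 for the 4-class problem (2 modules), 12 for the 8-class one
    (3 modules)."""
    feats = []
    for W in csp_modules:
        Z = W @ trial                          # spatially filtered signals
        var = np.var(Z, axis=1)                # band power proxy per filter
        feats.append(np.log(var / var.sum()))  # normalized log-variance
    return np.concatenate(feats)
```

A single classification model is then trained on these concatenated vectors, as described above.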
Hierarchical Multilabel (HM) approach
The second method consists of a hierarchical decision process that comprises three feature extraction models distributed over two consecutive stages for solving the 4-class problem given by the 4-class database (see Figure 5.7), and seven feature extraction models placed within three consecutive stages for solving the 8-class problem formulated by the 8-class database (see Figure 5.8). These stages constitute a tree of binary classifiers whose root discriminates between two groups, each containing half of the classes. Each succeeding node includes only the half of the classes that was selected in the previous stage, and the process is recursively repeated until the last node contains only two classes, from which the final decision can be inferred. Data partitioning is carried out in a manner similar to the OsM method, so that the presence/absence of motor imagery-related activity associated with one of the hands is analyzed during the first stage in order to determine whether or not it is engaged in the motor task. To this end, two groups are formed by gathering the motor tasks involving the selected hand into the ERD-class, and the motor tasks that do not involve it into the IDLE-class. Note that the hand chosen to start the classification process is arbitrary: it is possible to start either with the left or the right hand. Once the decision is inferred, half of the classes are discarded. After this decision there are only two classes left in the case of the 4-class problem, whereas for the 8-class task four classes remain (i.e., in both cases, either the group of motor tasks involving the selected hand or the group that does not include it). Based on this criterion, two feature extraction and classification models are generated in the second stage, from which, according to the previous prediction, only one is applied during the validation phase.
In this part of the process the aim is to identify whether or not the source linked to the hand that has not yet been analyzed presents ERD activity, given that the hand selected in the first stage has already been recognized as active or inactive in terms of motor imagery-related activity. In the case of the 4-class problem the second stage is also the last one, given that each of the two models corresponds to a binary problem including only two classes, from which the final decision can be inferred. One of these models is formulated under the assumption that the hand analyzed in the previous stage is engaged, in which case the goal is to determine whether or not the other hand is also active: the predicted label corresponds to the use of both hands if the second hand is also found to be engaged, or only to the use of the hand detected in the previous step if the second hand is found to be inactive. In the same way, the second model is generated under the assumption that the hand analyzed during the first stage was labelled as inactive, so that the goal is to determine whether the other hand is also inactive, in which case the prediction corresponds to the rest state, or whether the second hand is active, in which case the predicted label is assigned according to the data partition proposed in the formulation of the problem. In the case of the 8-class problem, the second stage corresponds to an intermediate process that aims at determining the state of the second hand before analyzing the activity generated by the feet. Thus, if the inference of the first step selected the ERD-class, the corresponding model generates a new ERD-class by gathering the two motor tasks that involve the use of both hands, and a new IDLE-class from the two motor tasks that involve the hand selected by the previous model but not the hand under the current analysis.
Conversely, if the inference of the previous step selected the IDLE-class, the corresponding model creates a new ERD-class by putting together the two motor tasks that involve the hand under analysis but not the hand considered in the previous step, and a new IDLE-class by gathering the motor tasks that do not involve the use of either hand. After the second stage only 2 classes remain, from which the final decision can be inferred in the third and last step. To this end, there are four different instances, determined by the combination of decisions made in the preceding stages, each of them defining a binary problem that considers the activity generated by the source linked to the feet. Thus, if neither hand was found to generate ERD activity, the classification task consists of distinguishing between rest and feet. If only the left hand was found to be involved in the motor task, the classification aims at discriminating between left hand and left hand in combination with feet. In the same way, if only the right hand has generated ERD activity, the task consists of classifying right hand versus right hand in combination with feet and, finally, if both hands were found to be involved, the goal is to distinguish between both hands and both hands in combination with feet.
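For the 4-class case, the two-stage HM decision chain can be sketched as follows. The stage models are hypothetical binary classifiers returning True for the ERD-class, and the returned label follows the bit convention introduced earlier (left hand = most significant bit, right hand = least significant bit), starting the chain with the left hand by way of illustration.

```python
def hm_predict_4class(x, stage1, stage2_erd, stage2_idle):
    """Hierarchical Multilabel decision for the 4-class problem:
    stage 1 tests whether the first hand (here, left) is engaged;
    depending on that outcome, one of the two stage-2 models tests
    the other hand.  Only 2 of the 3 trained models are evaluated
    for any given trial."""
    left = stage1(x)                              # first hand active?
    model2 = stage2_erd if left else stage2_idle  # pick the matching branch
    right = model2(x)                             # second hand active?
    return (int(left) << 1) | int(right)          # decimal label
```

The 8-class chain extends this with a third stage testing the feet, selected among the four instances enumerated above.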
One-step Hierarchical Multilabel (OsHM) approach
The third method is a combination of the first two approaches: it implements the same binary problems related to the sources associated with the hands, as is done in the OsM method, and incorporates the four instances associated with the source linked to the feet that were formulated by the HM algorithm. The whole procedure involves 6 feature extraction models (compared to the 28 and 8 models required to solve the same problem using the one-versus-one and one-versus-all approaches respectively), whose outputs are concatenated to form a single vector, so that only one classification model is required. In this way, by considering all the possibilities involving the use of the feet, more discriminative features are generated than those obtained by the OsM approach. And, given that all features are considered together, it is not necessary to apply different classification stages as in the HM approach, which speeds up the process and helps to compensate for outliers.
Table of contents:
1.1 BCI definition
1.2 BCI architecture
1.2.1 Signal acquisition
1.2.2 Signal processing
1.2.2.1 Feature extraction
1.2.2.2 Feature classification
1.2.3 System output
1.3 BCI taxonomy
1.3.1 Dependent and independent BCIs
1.3.2 Exogenous and endogenous BCIs
1.3.3 Passive and active BCIs
1.3.4 Hybrid BCIs
1.3.5 BCI operating protocols
1.3.5.1 Synchronous protocols
1.3.5.2 Self-paced protocols
1.3.6 Continuous and discrete decoding
1.4 Brain Signals for BCI
1.4.1 Invasive recording techniques
1.4.1.1 Intracortical Recordings
1.4.2 Non-invasive recording techniques
1.4.2.1 Functional Magnetic Resonance Imaging (fMRI)
1.4.2.2 Functional Near Infrared
1.4.2.3 Magnetoencephalography (MEG)
1.4.2.4 Electroencephalography (EEG)
EEG signals and motor imagery
2.1 EEG signals
2.1.1 EEG brain rhythms
2.1.2 International 10-20 system
2.2 Primary motor cortex
2.3 Sensorimotor rhythms
2.3.1 Event-related desynchronization
2.3.2 Event-related synchronization
2.3.4 Spatial mapping of ERD/ERS
2.4 Event-related potentials
2.5 Time course of ERD/ERS
2.6 Combined movements
2.6.1 4-class database
2.6.1.1 Paradigm and time scheme
2.6.2 ERD/ERS% analysis
2.6.3 Statistical analysis
Robotic arm control
3.1 8-class database
3.1.1 Paradigm and time scheme
3.1.2 Oscillatory power analysis
3.1.3 Statistical analysis
4.1 Feature extraction
4.1.1 Covariance matrix
4.1.2 Normal distribution and eigenvalue decomposition
4.1.3 Common Spatial Patterns (CSP)
4.2 Analytical Common Spatial Patterns (ACSP)
4.3 Feature selection
4.3.1 Mutual information
4.4 Filter Bank Common Spatial Pattern (FBCSP)
4.4.1 Mutual Information-based Best Individual Feature (MIBIF) algorithm
4.5 Common Spatial Pattern by Joint Approximate Diagonalization (CSP by JAD)
4.5.1 Information theoretic feature extraction (ITFE)
4.6.1 Discriminant functions
4.6.1.1 Two classes
4.6.1.2 Multiple classes
4.6.1.3 Linear Discriminant Analysis (LDA)
4.6.1.4 Support Vector Machines (SVM)
4.6.2 Distance-based classification
4.6.2.1 Riemannian geometry
4.6.2.1.1 Classification in the Riemannian manifold
4.6.2.2 CSP and Riemannian geometry
Multiclass and multilabel approaches
5.1 Multiclass approaches
5.1.1 One-versus-one approach
5.1.2 One-versus-all approach
5.1.3 Hierarchical approach
5.2 Multilabel approaches
5.2.1 One-step Multilabel (OsM) approach
5.2.2 Hierarchical Multilabel (HM) approach
5.2.3 One-step Hierarchical Multilabel (OsHM) approach
6.1 Experimental parameters
6.1.1 Classification algorithms
6.2.1 Cross-validation on the 4-class database
6.2.2 Cross-validation on the 8-class database
6.3 Results: summary