Tackling the problem of clustering in mcda 


Modelling a decision aiding process

As part of the decision aiding process, several steps can be identified: the problem situation, the problem formulation, the selection of an evaluation model and the extraction of a final recommendation [Tsoukiàs 2007]. They may not all be explored during each decision aiding process, nor in any particular order.
In general, one of the first steps to be taken when considering a decision problem is to situate the problem. Some of the main questions that should be answered during this step are [Tsoukiàs 2007]:
• who has a problem?
• why is this a problem?
• who decides on this problem?
• what are the stakes?
• etc.
Although the dm may already consider some of these questions before committing to making a decision, their answers should also be explored when employing the aid of an analyst. During this step, the set of actors who take part in the decision process, the stakes each of them brings and the resources they commit to those stakes need to be identified. Exploring this step is also beneficial in helping to eliminate misunderstandings between the dm and the analyst.
While the previous step has a more descriptive purpose, the step of formulating the problem has a more constructive character. At this point the analyst needs to reduce the real problem to a formal one. If the model is accepted by the dm, then different methods can be used afterwards in order to solve the decision problem. The alternatives, or potential decisions, need to be identified, along with the set of points of view on them, i.e. how the alternatives are evaluated, and finally the objective, i.e. what the dm expects to find at the end of this process. Several decision aiding approaches stop after this step, as they consider that understanding the problem is equivalent to solving it. Other approaches consider the formulation of the problem as given and only address the next steps. Regardless of the approach, the step of proposing a problem formulation is very important in the decision aiding process and is used further to construct the evaluation model.
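As a purely illustrative sketch, and not part of the original formulation, the outcome of this step can be thought of as a small data structure grouping the alternatives, the points of view used to evaluate them and the type of result the dm expects; all names below are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ProblemFormulation:
    """Illustrative container for a problem formulation: the alternatives,
    the points of view on which they are evaluated, and the kind of result
    the dm expects (a choice, a ranking, a sorting, a clustering, ...)."""
    alternatives: list[str]
    points_of_view: list[str]
    expected_result: str

# Hypothetical car-selection example, in line with the one used later in the text.
formulation = ProblemFormulation(
    alternatives=["car A", "car B", "car C"],
    points_of_view=["price", "fuel consumption"],
    expected_result="selection of a single best car",
)
```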
The selection of an evaluation model is the step that most classical decision aiding approaches focus on. With the alternatives defined and the criteria, which express the features of the alternatives that are important to the dm, constructed, different models and approaches can be employed in order to find a formal solution. A model able to capture the preferences of the dm on the set of alternatives needs to be selected first and tuned accordingly. This step is very important, as the final result strongly depends on this choice. The model is then used to construct a global relation between the alternatives, which is then exploited by a selected method in order to provide a solution. An important aspect when selecting the evaluation model is to have it emerge naturally from the problem formulation, and not to first choose a specific method and then formulate the problem around it [Tsoukiàs 2007].
The final recommendation is constructed by translating the formal solution from the previous step into a clear and easily understandable recommendation from the point of view of the dm.

Outranking relations

Instead of reducing the evaluations of each alternative to a single score, another approach is to compare the evaluations between pairs of alternatives systematically and aggregate these comparisons into a binary relation called an outranking relation. There are several methods of constructing these relations, among the most commonly used being the electre methods [Keeney and Raiffa 1976, Roy 1993] and the promethee methods [Brans and Mareschal 2002]. We mention also the rubis method [Bisdorff et al. 2008], which is defined in a bipolar setting. We will detail below the principle on which all these methods build the outranking relation.
An alternative x is said to outrank another alternative y if [Roy 1993]:
1. there is a qualified majority of weighted criteria on which x performs at least as well as y;
2. there is no criterion on which y seriously outperforms x.
The first statement is modelled through local concordance degrees, which measure for each criterion i ∈ F whether the evaluation of x on i is at least as good as that of y. Generally, these degrees are valued on a scale from 0 to 1: if x is not at least as good as y on criterion i, the local concordance degree is 0, while if x is at least as good as y, the degree is 1. If there is some uncertainty with respect to these two statements, a value between 0 and 1 may be used, depending on the particular type of outranking relation being used.
The criteria on which x has an evaluation at least as good as y are then put into balance by additionally using a set of criteria importance weights and constructing a global concordance degree. The importance weights have, in this case, a different meaning than the weights used in value functions. As the outranking methods are inspired by voting procedures from social choice theory, each criterion can be seen as a group of voters sharing the same opinion, while the weight assigned to it represents their strength, i.e. their number. The weights used in value functions, on the other hand, are interpreted as trade-offs between criteria. The condition of having independent criteria is also required in the case of outranking relations.

Unlike in mavt, dealing with both qualitative and quantitative scales is much simpler in this case. We no longer need to convert a qualitative scale into a numerical representation, but only to define an operator for assessing whether an evaluation is at least as good as another or not. In order to deal with potential imprecision in the evaluations of the alternatives on the set of criteria, some of these operators take discrimination thresholds into account. In most cases, two such thresholds are defined on each criterion: an indifference threshold and a preference threshold, which define the intervals inside which a difference between evaluations is considered negligible or clearly significant, respectively. Discrimination thresholds are also used to deal with the subjective perception of the dm with respect to the criteria evaluations. For instance, consider a dm buying a car while focusing equally on only two criteria, the price and the fuel consumption. If one of the cars is more expensive but has a lower fuel consumption than the other, then the dm is faced with a dilemma. If, however, the first car is more expensive by only a small amount, which the dm considers negligible, then the initial dilemma disappears and he is able to select the slightly more expensive car.
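To make these notions concrete, the following sketch computes local concordance degrees with an indifference and a preference threshold, aggregates them with importance weights into a global concordance degree, and applies a simplified veto test, in the spirit of ELECTRE-like outranking methods. It is only a minimal illustration on criteria to be maximized; the exact definitions, the majority level of 0.6 and the invented data below all vary between methods and applications.

```python
def local_concordance(gx, gy, q, p):
    """Degree (between 0 and 1) to which evaluation gx is at least as good
    as gy on a criterion to be maximized, with indifference threshold q
    and preference threshold p (q <= p)."""
    diff = gy - gx          # how much y outperforms x
    if diff <= q:           # difference negligible, or x better: full concordance
        return 1.0
    if diff >= p:           # y clearly better: no concordance
        return 0.0
    return (p - diff) / (p - q)   # linear interpolation in between

def outranks(x, y, weights, q, p, veto):
    """Simplified outranking test: a weighted majority of criteria supports
    'x at least as good as y' and no criterion vetoes this assertion."""
    total = sum(weights.values())
    concordance = sum(
        weights[i] * local_concordance(x[i], y[i], q[i], p[i]) for i in weights
    ) / total
    no_veto = all(y[i] - x[i] < veto[i] for i in weights)
    return concordance >= 0.6 and no_veto   # 0.6 is an arbitrary majority level

# Hypothetical example with two criteria to be maximized.
weights = {"comfort": 2.0, "economy": 1.0}
q = {"comfort": 1.0, "economy": 1.0}
p = {"comfort": 3.0, "economy": 3.0}
veto = {"comfort": 8.0, "economy": 8.0}
car1 = {"comfort": 7.0, "economy": 5.0}
car2 = {"comfort": 6.0, "economy": 7.0}
print(outranks(car1, car2, weights, q, p, veto))   # True for these invented values
```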


Eliciting the model parameters

As we have seen so far, the two main approaches to modelling the preferences of a dm rely on certain parameters that make them reflect these preferences with fidelity. These can take the form of criteria weights, discrimination thresholds, cut levels, capacities or fuzzy measures, and need to be elicited first so that the relations between the alternatives are well modelled. Mousseau defines preference elicitation as the “process that goes through an interaction between the decision-maker and the analyst (or a software) and leads the decision-maker to express preference information within the framework of a selected multiple criteria aggregation procedure” [Mousseau 2005]. The elicitation of the model parameters can be input-oriented, also called an aggregation procedure or direct parameter elicitation, or output-oriented, also called a disaggregation procedure, indirect parameter elicitation, ordinal regression analysis or inverse analysis.
The input-oriented preferential information can be in the form of:
• an exact value for a parameter (e.g. the weight for criterion i is exactly 0.5);
• an interval for a parameter (e.g. the weight for criterion i should be between 0.2 and 0.3);
• a ratio or a trade-off between two criteria weights (e.g. criterion i is twice as important as criterion j);
• a relative importance between two criteria or two coalitions of criteria (e.g. criteria i and j are more important together than criterion k);
• etc.
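For illustration only, and assuming the weights are non-negative and normalized to sum to one over the set of criteria F, the first four statements above can be written as linear constraints on the weight vector (the numbers are those used in the examples):

```latex
w_i = 0.5, \qquad
0.2 \le w_i \le 0.3, \qquad
w_i = 2\,w_j, \qquad
w_i + w_j \ge w_k,
\qquad \text{with} \quad \sum_{l \in F} w_l = 1, \;\; w_l \ge 0 \;\; \forall l \in F.
```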
Asking for the model parameters directly implies that the dm is aware of his preferences and can express them with great precision, but also that he is completely familiar with the preference model being used and knows what each parameter means and how it affects the relation between the alternatives. For instance, the ahp method [Saaty 1980] determines the parameters by asking the dm for the relative importance between pairs of criteria.
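As a rough illustration of this direct approach, the sketch below derives criteria weights from a reciprocal pairwise comparison matrix using the row geometric mean, a common approximation of Saaty's principal eigenvector method; the judgement values are invented for the example.

```python
import math

def ahp_weights(comparisons):
    """Approximate criteria weights from a reciprocal pairwise comparison
    matrix (comparisons[i][j] = how many times criterion i is judged more
    important than criterion j) using the row geometric mean."""
    n = len(comparisons)
    geo_means = [math.prod(row) ** (1.0 / n) for row in comparisons]
    total = sum(geo_means)
    return [g / total for g in geo_means]

# Hypothetical judgements for three criteria: price, consumption, comfort.
matrix = [
    [1.0, 2.0, 4.0],    # price vs (price, consumption, comfort)
    [1/2, 1.0, 2.0],    # consumption
    [1/4, 1/2, 1.0],    # comfort
]
print(ahp_weights(matrix))   # roughly [0.57, 0.29, 0.14]
```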
The output-oriented preferential information can be in the form of:
• indifference between two alternatives;
• strict preference of one alternative over another;
• weak preference of one alternative over another;
• incomparability between two alternatives.
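As one possible illustration, not specific to the methods discussed here, ordinal-regression style disaggregation approaches based on an additive value function u typically translate the first three statements into constraints on the model parameters, with ∼, ≻ and ≿ denoting indifference, strict preference and weak preference, and ε > 0 a small separation threshold; incomparability cannot be represented by a single value function and calls for models such as the outranking relations above.

```latex
x \sim y \;\Rightarrow\; u(x) = u(y), \qquad
x \succ y \;\Rightarrow\; u(x) \ge u(y) + \varepsilon, \qquad
x \succsim y \;\Rightarrow\; u(x) \ge u(y).
```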

Table of contents:

Résumé
Introduction
I On the problem of clustering in MCDA 
1 On clustering 
1.1 An introduction to cluster analysis
1.1.1 Clustering and Classification
1.1.2 Fields of application
1.1.3 Clustering challenges
1.2 Measuring similarity
1.2.1 Data characteristics
1.2.2 Measures on quantitative scales
1.2.3 Measures on qualitative scales
1.3 Overview of clustering approaches
1.3.1 Partitioning clustering approaches
1.3.2 Hierarchical clustering approaches
1.3.3 Other approaches
2 Multiple criteria aid for decisions 
2.1 Multiple criteria decision aiding
2.1.1 A brief introduction
2.1.2 Decision aiding
2.1.3 Modelling a decision aiding process
2.2 Modelling preferences
2.2.1 Value functions
2.2.2 Outranking relations
2.2.3 Eliciting the model parameters
2.3 The classical decision problems
2.3.1 The choice problem
2.3.2 The sorting problem
2.3.3 The ranking problem
3 On the mcda clustering problem 
3.1 State-of-the-art of clustering in mcda
3.1.1 Overview
3.1.2 Multicriteria Relational Clustering
3.1.3 Multicriteria Ordered Clustering
3.2 From data analysis to mcda
3.2.1 General comparison
3.2.2 Illustrative examples
3.2.3 Detailed analysis of the relations
3.3 The problem of clustering in mcda
3.3.1 Defining clustering in mcda
3.3.2 Motivations for clustering
II Tackling the problem of clustering in mcda 
4 Modelling the clustering problem 
4.1 Working context
4.1.1 Credibility calculus
4.1.2 Modelling the preferential relations
4.2 Comparing sets of alternatives
4.2.1 Using representatives
4.2.2 Aggregating pairwise comparisons
4.2.3 Properties of the relations
4.3 Clustering objectives
4.3.1 Non-relational clustering
4.3.2 Relational and ordered clustering
4.3.3 Additional objectives
5 Solving the clustering problem 
5.1 Initial considerations
5.1.1 On solving the problem
5.1.2 Existing and new approaches
5.2 Self-organizing maps for mcda clustering
5.2.1 General description
5.2.2 K-means and K-medoids
5.2.3 New topologies and adaptivity
5.3 CLIP: CLustering using Indifference and Preference relations
5.3.1 First step: Partitioning using indifferences
5.3.2 Approximative approach for the first step
5.3.3 Second step: Refining using preferences
6 Validating the clustering approaches 
6.1 Generating the benchmarks
6.1.1 Benchmarks characteristics
6.1.2 Generating the evaluations
6.1.3 Analysing the benchmarks
6.2 SOM adaptations
6.2.1 The disconnected SOM on indifferences
6.2.2 The chained SOM on indifferences
6.2.3 The ordered SOM on indifferences
6.3 CLIP and other approaches
6.3.1 First step of CLIP
6.3.2 Second step of CLIP
6.3.3 Comparative analysis
III Exploring large preferential datasets 
7 Measures for characterising sets of alternatives 
7.1 The preference model
7.1.1 The bipolar valued outranking relation
7.1.2 Modelling the relation through linear constraints
7.1.3 Formulating a mixed-integer linear program
7.2 Descriptive measures for sets of alternatives
7.2.1 Central profiles
7.2.2 Bounding profiles
7.2.3 Separating profiles
7.3 Validating the approaches
7.3.1 Extracting central profiles
7.3.2 Extracting bounding profiles
7.3.3 Extracting separating profiles
8 Clustering for large mcda applications 
8.1 Clustering large datasets
8.1.1 General introduction
8.1.2 Limitations and challenges
8.1.3 Main approaches
8.2 mcda clustering algorithms for large datasets
8.2.1 A Divide and Conquer approach
8.2.2 Incremental clustering
8.2.3 Clustering through samples
8.3 Validating the approaches
8.3.1 Describing the experiments
8.3.2 Initial analysis
8.3.3 Comparative analysis
9 Case Study: Mining Toxics Release Inventory data 
9.1 Describing the available data
9.1.1 Selecting the data on chemicals toxicity
9.1.2 Selecting the data on facility practices
9.1.3 Analysing the selected data
9.2 Defining the problem
9.2.1 Formulating the problem
9.2.2 Structuring the problem
9.2.3 The preference model
9.3 Solving the problem
9.3.1 Finding the result
9.3.2 Analysis using representatives
9.3.3 Analysis using bounding profiles
Summary of the main achievements
Perspectives and future work
Concluding remarks
Bibliography
