The Art of Explaining to Humans
In one of his philosophical papers, David K. Lewis claims that explaining an event means “providing some information about its causal history. In an act of explaining, someone who is in possession of some information about the causal history of some event tries to convey it to someone else” (Lewis, 1986). In other words, an explanation is both a product and a process: on the one hand, it is a product because it can be seen as an answer to a why-question, given a presupposition such as “they did that. Why did they do it?”; on the other hand, it can also be considered as the cognitive process of deriving that answer. Philosophers and psychologists agree that explanations refer to a cause (Salmon, 1989; Woodward, 2004); in this work, we do not consider non-causal explanations, such as answers to questions like “what happened”. However, there are other important properties to consider when discussing explanations. Many explanations are contrastive: people do not usually ask why event P happened, but why event P happened instead of a different event Q. Answering these questions is generally easier than providing complete explanations. The contrast may also be implicit, which makes it harder (or even infeasible) to figure out the event Q the user is referring to. Explanations are selected: people do not ask for the complete cause of an event, but select one or two causes to be the explanation. Finally, explanations may be social: in this case, they should be seen as a conversation with the goal of transferring knowledge, hence they should take into account how people interact.
Interpretable explanations. Throughout this thesis, the terms interpretability and explainability are used interchangeably and denote a measure of how understandable an explanation is. More formally, Biran and Cotton (2017) define interpretability as the degree to which an observer can understand why a given decision has been taken. This means that a concept used to explain a given decision may be conveyed in different forms and at different levels of complexity, depending on the listener.
The reasons behind an explanation. An obvious reason to ask for explanations is to understand why something happened. In this case, explanations are seen as a human need to find meaning. Deeper motives involving social interactions, like creating a shared meaning of something, transferring knowledge, influencing people’s beliefs, and so on (Malle, 2004), are outside the scope of this work. Explanations might also be required to persuade someone that a given decision is right (Lombrozo, 2006): in that case, providing the true reason may become of secondary importance.
The structure of an explanation. Different questions require different explanations: for example, asking why a certain event happened is not the same as asking where it happened. In this work, we focus on why-questions only. One of the oldest and best-known explanation models in philosophy is Aristotle’s Four Causes model. According to the Greek philosopher, we know something (and we can explain it) when we are able to identify its causes, that is, when we know why it exists and why it exists as it is. This translates into four causes or modes: the first two, material and formal, refer to its composition; the last two, efficient and final, concern its origin and goal. For instance, the material cause of a statue is the marble, the formal cause is the shape given to the marble, the efficient cause is the sculptor’s chisel, and the final cause is the reason why the sculptor makes the statue. Each of these causes, either taken alone or all together, can be considered a reasonable explanation for a why-question.
How people explain behavior. Research on social attribution, which studies how people explain behavior to others, constitutes the foundation for much of the work on explanations in general, and it is of great interest for AI explainability. Heider and Simmel (1944) were the first to demonstrate that humans tend to attribute folk psychological concepts, such as desire and intention, to objects. In one of their experiments, they asked the participants to watch a video with animated shapes moving around the screen, and to describe the scene. The participants described the movements of the objects as if they were performed by humans. Heider then argued that the main difference between how humans perceive objects and how they perceive other humans is strongly related to the presence or lack of specific intentions. Many years later, in one of his most famous books (Malle, 2004), Malle proposed a conceptual framework for behavior explanations, formally distinguishing between intentional and unintentional behaviors: for unintentional behavior, people tend to offer only causes, while for intentional behavior people prefer more complex explanations, taking into account mental states, desires, background reasons and emotions. Norms and morals also have a huge impact on social attributions. When people explain immoral behaviors or behaviors that go against commonly accepted “unwritten rules”, they are likely to include their personal thoughts and judgements in the explanations. In other words, explanations are biased by social beliefs. Unfortunately, human behavior and personal and social opinions are hard to model and encode into AI explanations.
Selecting and Evaluating Explanations
Explanation can be seen as a cognitive process that guides how explanations are generated and received. Such a process can be split into three steps (Miller, 2019): i) causal connection, where we identify the main causes of an event; ii) explanation selection, where a small subset of the causes is selected as the explanation; iii) explanation evaluation, usually performed by the people to whom the explanation is addressed.
Causal connection. This is the process of identifying the causes of a fact, inferring them from observation and/or prior knowledge. People cannot mentally replay all possible past events to understand the associated causes. Instead, they use heuristics based on several criteria: abnormality of events (people tend to focus more on abnormal or unusual causes); intention of events (intentional events receive more consideration than unintentional ones); timing and controllability of events (recent and controllable events, rather than coincidences, receive most of the focus); and perspective (the causes associated with an event typically change when the perspective changes). In Section 1.3.2, we will see that most machine learning models learn associations rather than causal relationships.
Explanation selection. Even when all the causes of a fact can be established, a human would not be able to take them all in. Explanation selection is the process of selecting a subset of the causes identified in the previous step to provide an explanation. The work in this area shows that people usually select explanations according to criteria that are very similar to the ones used to identify them: explanations that take into account differences with respect to other events, abnormal conditions and intentional causes are more likely to be selected. In general, necessary causes are preferred to sufficient ones; goals are generally better explanations than preconditions, but preconditions and goals together are sometimes preferred.
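The selection step can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Cause` record, its three boolean flags (abnormal, intentional, contrastive), the equal weighting of the criteria and the limit of two selected causes are hypothetical choices, not a model proposed in the literature cited above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Cause:
    """Hypothetical record describing one candidate cause of an event."""
    description: str
    abnormal: bool      # deviates from what usually happens
    intentional: bool   # results from a deliberate action
    contrastive: bool   # separates the event from the alternative event


def score(cause: Cause) -> int:
    """Count how many selection criteria the cause meets (equal weights assumed)."""
    return int(cause.abnormal) + int(cause.intentional) + int(cause.contrastive)


def select_explanation(causes: List[Cause], max_causes: int = 2) -> List[Cause]:
    """Keep the few highest-scoring causes; ties keep their input order."""
    ranked = sorted(causes, key=score, reverse=True)
    return ranked[:max_causes]


# Toy causal history for the question "why was the train late?"
candidates = [
    Cause("trains run on a fixed timetable", False, False, False),
    Cause("a signal failed at the junction", True, False, True),
    Cause("the track was already in use", False, False, False),
    Cause("the driver waited for a connecting train", False, True, True),
]

selected = select_explanation(candidates)
for cause in selected:
    print(cause.description)
```

With these toy flags, the two background conditions are dropped and only the signal failure and the driver's decision are kept, which mirrors the preference for abnormal, intentional and contrastive causes.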
Explanation evaluation. When individuals receive an explanation, they determine its quality according to several criteria. In his Theory of Explanatory Coherence (Thagard, 1989), Thagard argues that humans judge positively explanations that are aligned and coherent with their prior beliefs. Moreover, high-quality explanations are simple (few causes) and general (they explain multiple events). This simplicity principle is followed by many interpretable machine learning models, including LIBRE (see Chapter 3 for more details). Surprisingly, although true and highly probable causes are part of good explanations, they are not always the ones that people find useful (Hilton, 1996). Vasilyeva et al. (2015) also notice that explanations where the explanatory mode (material, formal, efficient, final) is well aligned with the goal of the question are preferred.
Communicating Explanations
According to the conversational model of explanation (Hilton, 1990), explanations can be seen as a two-stage process: i) the diagnosis of causality, where the explanation is actually “crafted” by identifying the main causes, and ii) the conversation, where the explanation is conveyed to someone. Because of this second step, explanations are subject to the rules of conversation. Hence, they should contain causes that are relevant to the explainee and aligned both with the explainee’s prior knowledge and with the knowledge shared between explainer and explainee.
Grice’s maxims (Greaves et al., 1999). These are a set of basic rules that should be followed when presenting an explanation. Although they were explicitly conceived for spoken conversation, they naturally extend to any other form of communication. Grice identifies four classes of maxims, which we summarize as follows. i) Quality: a) do not say things that you believe to be false, b) do not say things without sufficient evidence. ii) Quantity: a) make your contribution as informative as is required, b) do not make it more informative than is required. iii) Relation: a) be relevant. iv) Manner: a) avoid obscurity of expression, b) avoid ambiguity, c) be concise, d) be orderly. In short, an explanation should only contain necessary and relevant information.
Argumentation. A study by Antaki and Leudar (1992) shows that a considerable number of statements in explanations are argumentative claim-backings. When explainers explicate or justify something, they have to be ready to defend their claims. Interestingly, argumentation depends on what the explainee already knows and focuses on abnormal factors as a way to strengthen the explanation. This confirms that good explanations must be relevant to both the question and the mental model of the explainee. Argumentative AI explanations are outside the scope of this thesis.
Interpretable Explanations in AI
When we talk about explainability and interpretability in machine learning, most of the concepts introduced in the previous section remain valid and should be regarded as necessary conditions for obtaining “really interpretable” artificially intelligent systems. Unfortunately, today we are still far from this ideal goal: although many works, maybe unknowingly, tackle some of the desiderata for effective explanations, there is currently no work that satisfies all the characteristics covered in the previous section. Such limitations can be traced back to the lack of a commonly accepted definition of interpretable machine learning: indeed, popular definitions are either conflicting or incomplete, as they only consider a subset of goals, and goals in isolation are not a sufficient condition to make a model interpretable.
Toward a General Definition of Interpretable Machine Learning
In the machine learning literature, there has been a proliferation of definitions of interpretable AI since 2016. It is no coincidence that, in the same year, the European Parliament adopted, for the first time, a set of regulations for the collection, storage and use of personal information, the General Data Protection Regulation (GDPR). In particular, Article 22 refers to automated individual decision-making, including profiling, and affirms a right to explanation, with a consequent strong impact on machine learning algorithms. With the GDPR, explainable AI also becomes a legal requirement.
Limitations of popular definitions. Below are some of the most popular definitions of explainable machine learning:
“By explaining a prediction, we mean presenting textual or visual artifacts that provide qualitative understanding of the relationship between the instance’s components and the model’s prediction, [...] as a solution to the trusting a prediction problem.” (Ribeiro et al., 2016).
“An interpretable explanation, or explanation, is a simple model, visualization, or text description that lies in an interpretable feature space and approximates a more complex model.” (Herman, 2017).
“In the context of machine learning models, we define interpretability as the ability to explain or to present in understandable terms to a human.” (Doshi-Velez and Kim, 2017).
“To intuitively understand a machine learning model, we need to visualize it, make it accessible to the senses.” (Offert, 2017).
“Explainable Artificial Intelligence will create a suite of machine learning techniques that enables human users to understand, appropriately trust, and effectively manage the emerging generation of artificially intelligent partners.” (Gunning, 2019).
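To make the idea of a simple model that approximates a more complex one (as in the definition by Herman, 2017) more concrete, the following is a minimal sketch under stated assumptions. The function `black_box` is a hypothetical stand-in for an opaque predictor, and fitting a straight line by ordinary least squares on a small grid around one input is only one possible choice of surrogate; it is not the procedure of any of the works cited above.

```python
from typing import Callable, Tuple


def black_box(x: float) -> float:
    """Hypothetical opaque model; stands in for any complex predictor."""
    return x ** 3 - 2.0 * x


def fit_local_linear_surrogate(
    model: Callable[[float], float],
    center: float,
    radius: float = 0.1,
    n_points: int = 201,
) -> Tuple[float, float]:
    """Fit y = slope * x + intercept to the model on a grid around `center`."""
    # Evenly spaced probe points in [center - radius, center + radius].
    step = 2.0 * radius / (n_points - 1)
    xs = [center - radius + step * i for i in range(n_points)]
    ys = [model(x) for x in xs]
    mean_x = sum(xs) / n_points
    mean_y = sum(ys) / n_points
    # Closed-form ordinary least squares for a single input feature.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_local_linear_surrogate(black_box, center=1.0)
surrogate_at_center = slope * 1.0 + intercept
print(f"surrogate near x=1: y = {slope:.3f} * x + {intercept:.3f}")
print(f"black box at x=1: {black_box(1.0):.3f}, surrogate: {surrogate_at_center:.3f}")
```

The fitted slope is close to the local rate of change of the black box around the chosen input, so the line can be read as "increasing x slightly increases the prediction by roughly the same amount". The surrogate is only faithful near that input, which illustrates why approximation alone does not guarantee a complete explanation.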
Challenges and Opportunities
Despite the high volume of work on rule learning, many historical problems are still challenging to solve and new ones have arisen due to interpretability constraints.
In particular, both heuristic and integer-optimization based lego approaches underestimate the complexity and importance of finding good candidate rules (or patterns), and become expensive when the input dimensionality increases, unless some constraints are imposed on the size and support of the rules. Although such constraints favour interpretability, they have a negative impact on the predictive performance of the model. Additionally, these methods do not explicitly consider class imbalance issues: they take the pattern discovery process for granted and offer no guarantee that the discovered patterns will be useful to generate rules that characterize the minority classes. Our novel solution to these historical problems is proposed in the next chapter.
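The interaction between support constraints and class imbalance can be illustrated with a toy sketch. The brute-force enumeration below, the `RECORDS` data set and the helper names are assumptions made for illustration only; they do not reproduce any of the methods discussed in this work. Support is expressed as an absolute number of covered records.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Record = Tuple[FrozenSet[str], int]  # (feature=value items, class label)

# Toy imbalanced data set: 8 records of class 0, 2 records of class 1.
RECORDS: List[Record] = (
    [(frozenset({"x=1", "y=0"}), 0)] * 5
    + [(frozenset({"x=1", "y=1"}), 0)] * 3
    + [(frozenset({"x=0", "y=1"}), 1)] * 2
)


def mine_patterns(
    records: List[Record], max_size: int, min_support: int
) -> Dict[FrozenSet[str], List[int]]:
    """Enumerate conjunctions of at most `max_size` items covering at least
    `min_support` records; map each pattern to the labels it covers."""
    items = sorted({item for features, _ in records for item in features})
    patterns: Dict[FrozenSet[str], List[int]] = {}
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            pattern = frozenset(combo)
            covered = [label for features, label in records if pattern <= features]
            if len(covered) >= min_support:
                patterns[pattern] = covered
    return patterns


def minority_only(
    patterns: Dict[FrozenSet[str], List[int]], minority_label: int
) -> List[FrozenSet[str]]:
    """Patterns whose covered records all belong to the minority class."""
    return [p for p, labels in patterns.items()
            if all(label == minority_label for label in labels)]


strict = mine_patterns(RECORDS, max_size=2, min_support=3)
loose = mine_patterns(RECORDS, max_size=2, min_support=2)
print("min_support=3:", len(strict), "patterns,",
      len(minority_only(strict, 1)), "pure minority patterns")
print("min_support=2:", len(loose), "patterns,",
      len(minority_only(loose, 1)), "pure minority patterns")
```

In this toy data set, requiring a pattern to cover at least three records removes every pattern that characterizes the minority class alone, while lowering the threshold to two recovers them at the cost of a larger candidate set. The number of candidate conjunctions also grows combinatorially with the number of items and with `max_size`, which is why such constraints are imposed in the first place.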
Other issues emerge when rule-learning models process streams of data that change with time. In this case, rules are continuously learned, removed and adapted according to several criteria, as done in STAGGER (Schlimmer and Granger, 1986), FLORA (Widmer and Kubat, 1996) and more modern systems such as RUDOLF (Milo et al., 2018) and GOLDRUSH (Jarovsky et al., 2018). A detailed discussion of incremental rule learning is outside the scope of this work, but we think it might become a trending topic in the next few years.
Table of contents:
1 Interpretable Machine Learning
1.2 The Social Importance of Explaining Why
1.2.1 The Art of Explaining to Humans
1.2.2 Selecting and Evaluating Explanations
1.2.3 Communicating Explanations
1.3 Interpretable Explanations in AI
1.3.1 Toward a General Definition of Interpretable Machine Learning
1.3.2 Desiderata of Interpretable Explanations
1.3.3 Properties of Interpretable Models
1.3.4 Evaluation Procedures
1.4 A review of Explainability Approaches
1.4.1 Transparent Models
1.4.2 Post-Hoc Interpretability
1.5.1 On the Reliability of Post-Hoc Explanations
1.5.2 Challenges and Opportunities
2 Rule Learning
2.2 Problem Definition
2.2.1 Data Description Language
2.2.2 Hypothesis Description Language
2.2.3 Coverage Function
2.2.4 Predictive Rule Learning
2.3 Rule Learning Process
2.3.1 Feature Construction
2.3.2 Rule Construction
2.3.3 Rule Evaluation
2.3.4 Hypothesis Construction
2.3.5 Overfitting and Pruning
2.4.1 A Critical Review of Rule Learning Methods
2.4.2 Challenges and Opportunities
3 Learning Interpretable Boolean Rule Ensembles
3.2 A real industrial use case
3.2.1 Context and objectives
3.2.2 Proposed solution
3.4 Boolean Rule Sets
3.4.1 Assumptions on the Input Data
3.4.2 The Base, Bottom-up Method
3.4.3 The LIBRE Method
3.4.4 Producing the Final Boundary
3.5.1 Experimental Settings
3.5.2 Experimental Results
4 Disentangled Representation Learning
4.2 Representation Learning in a Nutshell
4.2.1 What Makes a Representation Good
4.2.2 Disentangling Factors of Variation
4.3 Defining and Evaluating Disentangled Representations
4.3.1 Symmetry Transformations and Disentanglement
4.3.2 Disentanglement and Group Theory
4.3.3 Consistency, Restrictiveness, Disentanglement
4.3.4 Evaluating Disentanglement
4.4 Practical Applications
4.4.1 Simplifying Downstream Tasks
4.4.2 Transfer Learning
4.4.3 Increasing Fairness in Predictions
4.4.4 Higher Robustness against Adversarial Attacks
4.4.5 Other Applications
5 An Identifiable Double VAE For Disentangled Representations
5.2.1 Model Identifiability and Disentanglement
5.2.2 Connections with Independent Component Analysis (ICA)
5.3 VAE-based Disentanglement Methods
5.3.1 Unsupervised Disentanglement Learning
5.3.2 Auxiliary Variables and Disentanglement
5.4 IDVAE – Identifiable Double VAE
5.4.1 Identifiability Properties
5.4.2 Learning an Optimal Conditional Prior
5.4.3 A Semi-supervised Variant of IDVAE
5.5.1 Experimental Settings
5.5.2 Experimental Results
6.1 Themes and Contributions
6.2 Future Work