Abstract Interpretation of Numerical Properties

Get Complete Project Material File(s) Now! »

Abstract Interpretation of Numerical Properties

L’interpr´etation abstraite est une th´eorie g´en´erale de la construction et la preuve de correction d’approximations de s´emantiques de programmes. Nous rappelons ses principes de base et son application `a la conception d’analyseurs statiques pour l’inf´erence des propri´et´es des variables num´eriques d’un pro-gramme.
Ensuite, nous pr´esentons la syntaxe et la s´emantique formelle d’un petit lan-gage, nomm´e Simple, qui illustre la conception par interpr´etation abstraite d’un analyseur statique, param´etr´ par un domaine abstrait num´erique. Quelques exemples de domaines num´eriques abstraits classiques sont ´egalement rappe-l´es. Ce langage et cet analyseur seront utilis´es dans les chapitres suivants pour illustrer les propri´et´es des nouveaux domaines num´eriques abstraits introduits par cette th`ese.
Abstract Interpretation is a general framework for the construction and sound-ness proof of approximated semantics of programs. We recall here its core features and its applications to the design of static analysers for the automatic inference of the properties of the numerical variables of a program.
We then present a very small programming language, called Simple, together with its syntax and formal semantics, and exemplify the abstract interpretation framework by designing a small static analyser parametrised by a numerical abstract domain. We also recall some classical numerical abstract domains. This language and this analyser will be used throughout the following chapters to exemplify the new numerical abstract domains introduced in this thesis.

General Definitions

DEF DEF

Definitions, Properties. We will use the symbols = and ⇐⇒ to introduce new objects: the left object is defined to be equal or equivalent to the formula on the right. These should not be confused with the = and ⇐⇒ symbols that describe equality or equivalence properties on already defined objects. Orders. A partially ordered set (or poset ) (D, ⊑) is a non-empty set D together with a partial order ⊑, that is, a binary relation that is reflexive, transitive, and anti-symmetric. A reflexive, transitive, but not necessarily anti-symmetric relation is called a preorder and can be turned into a poset by identifying elements in D such that both a ⊑ b and b ⊑ a. d
When they exist, the greatest lower bound (or glb) of a set D D will be denoted by D, and its least upper bound (or lub) will be denoted by D. We will denote by ⊥ the least element and ⊤ the greatest element, if they exist. A poset with a least element will be called a pointed poset, and denoted by (D, ⊑, ⊥). Note that any poset can be transformed into a pointed poset by adding a new element that is smaller than everyone. A cpo is a poset that is complete, that is, every increasing chain of elements (Xi)i∈I , i j =⇒ Xi ⊑ Xj (where the ordinal I may be greater than ω) has a least upper bound, i∈I Xi, which is called the limit of the chain. Note that a cpo is always pointed as the least element can be defined DEF ∅.
Lattices. A lattice (D, ⊑, ⊔, ⊓) is a poset where each pair of elements a, b ∈ D has a least upper bound, denoted by a ⊔ b, and a greatest lower bound, denoted by a ⊓ b. A lattice is said to be complete if any set D D has a least upper bound. A complete lattice is always a cpo; it has both a least element ⊥ DEF ∅ and a greatest element ⊤ DEF = = D; also, each
set D D has a greatest lower bound d D DEF= { X ∈ D | ∀Y ∈ D, X ⊑ Y }. In this case, it is denoted by (D, ⊑, ⊔, ⊓, ⊥, ⊤). An important example of complete lattice is the power-set (P(S), , ∪, ∩, ∅, S) of any set S.
Structure Lifting. If (D, ⊑, ⊔, ⊓, ⊥, ⊤) is a complete lattice (resp. poset, cpo, lattice) and S is a set, then (S → D, ⊑˙, ⊔˙, ⊓˙, ⊥˙, ⊤˙) is also a complete lattice (resp. poset, cpo, lattice) the function that maps X to the value of the expression expr where X is a free variable. We will sometimes use the explicit notation [X1 expr 1, . . . , Xn expr n] to denote the application that associates the value of expr i to Xi. We will also denote by f [X expr ] the application equal to f , except that it maps X to the value of expr instead of f (X). Finally, DEF ∈D1→D2 we denote by Id the identity application: Id = .X . An application F between two posets (D1, ⊑1) and (D2, ⊑2) is said to be monotonic if X ⊑1 Y =⇒ F (X) ⊑2 F (Y ). It is said to be strict if F (⊥1) = ⊥2. An operator F ∈ D → D, that is, a function from a poset D to the same poset, is said to be extensive if ∀X, X ⊑ F (X). A monotonic application F ∈ D1 → D2 that preserves the existing limits of increasing chains (Xi)i∈I , i.e., F ( 1{ Xi | i ∈ I }) = 2{ F (Xi) | i ∈ I } whenever 1{ Xi | i ∈ I } exists, is said to be continuous. If F preserves existing least upper bounds (resp. greatest lower bounds), i.e., F ( 1 D) = 2 { F (X) | X ∈ D }, it is said to be a complete ⊔ morphism (resp. ⊓ morphism). Note that complete ⊔ morphisms and complete ⊓ morphisms are always monotonic. If F is an operator and i is an ordinal, we will denote by F i(X) F ’s
i th iterate on X which is defined by transfinite induction as F 0(X) = X, F +1(X) = DEF DEF
DEF F (X) when is a limit ordinal.
F (F (X)), and F (X) = Fixpoints. A fixpoint of an operator F is an element X such that F (X) = X. We denote by lfpX F the least fixpoint of F that is greater than X, if it exists, and by gfpX F the greatest fixpoint of F smaller than X. We also denote by lfp F and gfp F the least and DEF DEF greatest fixpoints of F , if they exist: lfp F = lfp⊥ F and gfp F = gfp⊤ F . A pre-fixpoint is such that F (X) ⊒ X, and a post-fixpoint X is such that F (X) ⊑ X. In particular, ⊥ is a pre-fixpoint for all operators, and ⊤ is a post-fixpoint for all operators. We now recall two fundamental theorems about fixpoints of operators in ordered structures:
Theorem 2.1.1. Tarskian fixpoints.
The set of fixpoints of a monotonic operator F in a complete lattice is a complete lattice. d Dually, gfpX F = { F ’s pre-fixpoints smaller than X }.
Theorem 2.1.2. Kleenian fixpoints. If F is a monotonic operator in a cpo and X is a pre-fixpoint for F , then F i(X) is stationary at some ordinal and lfpX F = F (X). 2. If, moreover, F is continuous, then lfpX F = F ω (X).
Antoine Mine´ Weakly Relational Numerical Abstract Domains

Abstract Interpretation Primer

A core principle in the Abstract Interpretation theory is that all kinds of semantics can be expressed as fixpoints of monotonic operators in partially ordered structures, would it be operational, denotational, rule-based, axiomatic, based on rewriting systems, transition systems, abstract machines, etc. Having all semantics formalised in a unified framework allows comparing them more easily. Beside comparing already existing semantics, Abstract Interpretation allows building new semantics by applying abstractions to existing ones. Abstractions are themselves first-class citizens that can be manipulated and composed at will.
A key property of the semantics designed by abstraction is that they are guaranteed to be sound, by construction. Thus, a sound and fully automatic static analyser can be designed by starting from the non-computable formal semantics of a programming language, and composing abstractions until the resulting semantics is computable.

READ Computational complexity of the shapelet discovery

Galois Connection Based Abstract Interpretation

Galois Connections. Let D♭ and D♯ be two posets1 used as semantic domains. A Galois connection, as introduced by Cousot and Cousot in [CC77], between D♭ and D♯ is a function pair (, ) such that:
Definition 2.2.1. Galois connection.
:D♭→D♯,
:D♯→D♭,
∀X♭, X♯, (X♭) ⊑♯ X♯ ⇐⇒ X♭ ⊑♭ (X♯).
This is often pictured as follows: D♭→ D♯ .
An important consequence of Def. 2.2.1 is that both and are monotonic [CC92a, § 4.2.2]. Also, we have ( )(X♯) ⊑♯ X♯ and X♭ ⊑♭ ( )(X♭). The fact that (X♭) ⊑♯ X♯, 1From now on, a poset (Dx , ⊑x ) will only be refereed to as Dx when there is no ambiguity, that is, when there is only one partial order of interest on Dx . The same also holds for cpo, lattices, and complete lattice: the same superscript X as the one of the set Dx is used when talking about its order ⊑x , lub ⊔x , glb ⊓x , least element ⊥x , and greatest element ⊤x , when they exist.
Domaines num´eriques abstraits faiblement relationnels Antoine Mine´ or equivalently that X♭ ⊑♭ (X♯), will formalise the fact that X♯ is a sound approximation (or sound abstraction) of X♭. D♭ will be called the concrete domain; D♯ will be called the abstract domain; will be called the abstraction function; will be called the concretisation
function. Moreover, (X♭) will be the best abstraction for X♭.
Galois Insertions. If the concretisation is one-to-one or, equivalently, is onto, or = Id , then (, ) is called a Galois insertion and pictured as follows: D♭→→ D♯ .
Designing an abstract domain linked to the concrete one through a Galois insertion corresponds to choosing, as abstract elements, a subset of the concrete ones and, as the abstract order, the order induced by the concrete domain. If we impose only the existence of a general Galois connection, then one concrete element can be represented by several, possibly incomparable, abstract elements.
Canonical Abstractions and Concretisations. Sometimes, it is not necessary to de-fine both the concretisation and the abstraction functions in a Galois connection as the missing function can be synthesised in a canonical way. We can use the following theorem:

Concretisation-Based Abstract Interpretation

Imposing the existence of a Galois connection between the concrete and the abstract do-mains is sometimes too strong a requirement and we will see several abstraction examples for which no function exists (see Sects. 2.4.6 and 2.4.7). In [CC92b], Cousot and Cousot explain how to relax the Galois connection framework in order to work only with a concreti-sation operator — or, dually, only an abstraction operator ; however, concretisation-based abstract interpretation is much more used in practice.
Concretisations. Let D♭ and D♯ be two posets. A concretisation is simply a monotonic function : D♯ → D♭. X♯ is said to be an abstraction for X♭ if (X♯) ⊒♭ X♭. Note that, if there is no simple way to compare two abstract elements, we can always fall back to the
“coarse” partial order defined by X♯ ⊑♯ Y ♯ DEF ⇐⇒ X♯ = Y ♯ which makes every function automatically monotonic.

Table of contents :

1 Introduction
1.1 Motivation
1.2 Key Concepts
1.3 Overview of the Thesis
1.4 Our Contribution
2 Abstract Interpretation of Numerical Properties
2.1 General Definitions
2.2 Abstract Interpretation Primer
2.2.1 Galois Connection Based Abstract Interpretation
2.2.2 Concretisation-Based Abstract Interpretation
2.2.3 Partial Galois Connections
2.2.4 Fixpoint Computation
2.2.5 Chaotic Iterations
2.2.6 Reduced Product
2.2.7 Related Work in Abstract Interpretation Theory
2.3 The Simple Language
2.3.1 Language Syntax
2.3.2 Concrete Semantics
2.3.3 A Note on Missing Features
2.4 Discovering Properties of Numerical Variables
2.4.1 Numerical Abstract Domains
2.4.2 Abstract Interpretor
2.4.3 Fall-Back Transfer Functions
2.4.4 Non-Relational Abstract Domains
2.4.5 Overview of Existing Numerical Abstract Domains
2.4.6 The Interval Abstract Domain
2.4.7 The Polyhedron Abstract Domain
2.5 The Need for Relational Domains
2.6 Other Applications of Numerical Abstract Domains
3 The Zone Abstract Domain
3.1 Introduction
3.2 Constraints and Their Representation
3.2.1 Constraints
3.2.2 Representations
3.2.3 Lattice Structure
3.3 Canonical Representation
3.3.1 Emptiness Testing
3.3.2 Closure Operator
3.3.3 Closure Algorithms
3.3.4 Incremental Closure
3.4 Set-Theoretic Operators
3.4.1 Equality Testing
3.4.2 Inclusion Testing
3.4.3 Union Abstraction
3.4.4 Intersection Abstraction
3.5 Conversion Operators
3.5.1 Conversion Between Zones and Intervals
3.5.2 Conversion Between Zones and Polyhedra
3.6 Transfer Functions
3.6.1 Forget Operators
3.6.2 Assignment Transfer Functions
3.6.3 Test Transfer Functions
3.6.4 Backwards Assignment Transfer Functions
3.7 Extrapolation Operators
3.7.1 Widenings
3.7.2 Interactions between the Closure and our Widenings
3.7.3 Narrowings
3.8 Cost Considerations
3.8.1 Ways to close
3.8.2 Hollow Representation
3.9 Conclusion
4 The Octagon Abstract Domain
4.1 Introduction
4.2 Modified Representation
4.2.1 Octagonal Constraints Encoding
4.2.2 Coherence
4.2.3 Lattice Structure
4.3 Modified Closure Algorithms
4.3.1 Emptiness Testing
4.3.2 Strong Closure
4.3.3 Floyd–Warshall Algorithm for Strong Closure
4.3.4 Incremental Strong Closure Algorithm
4.3.5 Integer Case
4.4 Operators and Transfer Functions
4.4.1 Adapted Set-Theoretic Operators
4.4.2 Adapted Forget Operator
4.4.3 Adapted Conversion Operators
4.4.4 Adapted Transfer Functions
4.4.5 Adapted Extrapolation Operators
4.5 Alternate Octagon Encodings
4.5.1 Efficient Representation
4.5.2 Adapted Hollow Form
4.6 Analysis Examples
4.6.1 Decreasing Loop
4.6.2 Absolute Value
4.6.3 Rate Limiter
4.7 Related Works and Extensions
4.8 Conclusion
5 A Family of Zone-Like Abstract Domains
5.1 Introduction
5.2 Constraint Matrices
5.2.1 Representing Constraints
5.2.2 Previous Work on Closed Half-Rings
5.2.3 Acceptable Bases
5.2.4 Adapted Closure Algorithm
5.2.5 Galois Connection
5.3 Operators and Transfer Functions
5.3.1 Set-Theoretic Operators
5.3.2 Forget and Projection Operators
5.3.3 Transfer Functions
5.3.4 Extrapolation Operators
5.4 Instance Examples
5.4.1 Translated Equalities
5.4.2 Retrieving the Zone Domain
5.4.3 Zones With Strict Constraints
5.4.4 Integer Congruences
5.4.5 Rational Congruences
5.4.6 Unacceptable Bases
5.5 Conclusion
6 Symbolic Enhancement Methods
6.1 Introduction
6.2 Linearisation
6.2.1 Interval Linear Forms
6.2.2 Interval Linear Form Operators
6.2.3 From Expressions to Interval Linear Forms
6.2.4 Multiplication Strategies
6.2.5 From Expressions to Quasi-Linear Forms
6.2.6 Extending Numerical Expressions
6.3 Symbolic Constant Propagation
6.3.1 Motivation
6.3.2 Symbolic Constant Propagation Domain
6.3.3 Interaction With a Numerical Abstract Domain
6.3.4 Substitution Strategies
6.3.5 Cost Considerations
6.3.6 Interval Linear Form Propagation Domain
6.3.7 Comparison with Relational Abstract Domains
6.4 Conclusion
7 Analysis of Machine-Integer and Floating-Point Variables
7.1 Introduction
7.2 Modeling Machine-Integers
7.2.1 Modified Syntax and Concrete Semantics
7.2.2 Adapted Abstract Semantics
7.2.3 Analysis Example
7.3 Using Machine-Integers in the Abstract
7.3.1 Using Regular Arithmetics
7.3.2 Using Saturated Arithmetics
7.4 Modeling IEEE 754 Floating-Point Numbers
7.4.1 IEEE 754 Representation
7.4.2 IEEE 754 Computation Model
7.4.3 Linearising Floating-Point Expressions
7.4.4 Analysis Example
7.5 Using Floating-Point Numbers in the Abstract
7.5.1 Floating-Point Interval Analysis
7.5.2 Floating-Point Linearisation
7.5.3 Floating-Point Zones and Octagons
7.5.4 Convergence Acceleration
7.6 Conclusion
8 Application to the Astr´ee Static Analyser
8.1 Introduction
8.2 Presentation of Astr´ee
8.2.1 Scope of the Analyser
8.2.2 History of the Analyser
8.2.3 Design by Refinement
8.3 Brief Description
8.3.1 Implementation
8.3.2 Analysis Steps
8.3.3 Abstract Domains
8.3.4 Partitioning Techniques
8.4 Integrating the Octagon Abstract Domain
8.4.1 Implementation Choices
8.4.2 Octagon Packing
8.4.3 Analysis Results
8.5 Integrating the Symbolic Constant Propagation
8.5.1 Implementation Choices
8.5.2 Analysis Results
8.6 Extrapolation Operators
8.7 Conclusion
9 Conclusion