Bridging the gap with partitioning: the ℘-CLIP abstraction

Get Complete Project Material File(s) Now! »

Improving precision of the abstract fixpoint computation

Several techniques can be employed to improve the precision of the analysis for Program 2.1, some of which will be presented in the remainder of this chapter.
Delayed widening. The fixed point computation presented above resorted to widening as soon as it started iteration computation. We could soundly and without loosing computability remove this widening computation for any finite number of iterations.
Loop unrolling. This improvement can be described as a syntactic code transformation. As implied by its name, this technique separates the first loop iterations from one another. As an example loop would be replaced by the code fragment from Program 2.2. Applying this pro-gram transformation, we would have obtained the following more precise invariant: S♯J K(S♯0) = (a 7→[0; 0], b 7→[0; ∞[, c 7→[0; 2]) at the end of the first loop.
For the presentation of the following two techniques we assume our language extended with some features so that Program 2.3 is part of our language. We provide neither the concrete nor the abstract semantics of added statements as it is straightforward. Widening with thresholds. One of the biggest source of imprecision for static analysis by abstract interpretation is the use of widening operations to converge to an over-approximation of the least fixed point. As we have seen in our examples, jumps are performed from abstract elements representing two concrete values [0; 1] to the unbounded abstract element [0; ∞[. We have seen that unrolling loops or delaying widening could prevent this when the number of loop iterations performed in the concrete are relatively low. Widening with thresholds consists in adding several stops (a finite number) before over-approximating to ∞. As an example consider a program incrementing an integer while it is lower that 123 (Program 2.3), and the set of thresholds {10, 100, 1000}. The first iterations of the abstract fixpoint computation would go as follow:
[0; 0]▽[0; 1] = [0; 10].
[0; 10]▽[0; 11] = [0; 100].
[0; 100]▽[0; 101] = [0; 1000].
[0; 1000] is then stable and can soundly be used as a loop invariant. Note that the number of iterations depends upon the number of thresholds and not on the actual value 123 from the program. Of course it is interesting to gather widening thresholds from the program syntax (e.g. 123 here) and even more interesting to gather them dynamically during the analysis.
Decreasing iterations. Once again consider Program 2.3. Assume that we start an analysis with the hypothesis that (i 7→[0; 100]), following the fixpoint computation introduced above we would obtain the following loop invariant: (i 7→[0; ∞[). This represents an over-approximation of the actual fixpoint, that may be refined. This improvement can be obtained by applying F one more time and intersecting the results with the abstract fixpoint. This can be seen as continuing the iterations of F without widening. This process can then be iterated. Note however that it might not converge. For this reason we define a narrowing operator that ensures convergence.
Definition 2.17 (Narrowing operator). Given a lattice (A, v, t, u, >, ⊥) we say that a binary operator △∈ A × A → A is a narrowing operator if: ∀x, y ∈ A, x u y v x △ y v x.

The octagon abstract domain [Min01b]

We have presented two numerical abstract domain: the interval domain and the polyhedra do-main. The interval abstraction provides a low-cost, low-precision abstraction, whereas the poly-hedra abstraction is more precise, but exponentially (in the number of dimensions of the concrete environment) more costly. In this subsection we present the octagon domain [Min01b], this do-main is more costly that the interval domain as it is relational, however it is not able to represent every linear relations between program variables as for the polyhedra domain. The octagon domain contains elements that are a conjunction of constraints of the form ±x ± y c. The concretization of such a set of constraints is defined as for the polyhedra abstraction. The num-ber of irredundant constraints in a polyhedra is unbounded, whereas octagons on n dimensions can be restricted to 2n2 constraints and retain their maximal level of precision (as subsequent constraints would necessarily be redundant). Abstract transformers with a O(n3) time complex-ity have been devised for the octagon abstract domain.
The octagon domain is able to express the most precise loop invariant of the first loop of Program 2.1: a + c 0 ∧ −a − c 0. However the complete program invariant can not be expressed by the octagon domain.
The octagon domain is strictly more expressive than the interval domain. Indeed x ∈ [a; b] can be expressed with the two octagonal constraints: x + x 2b and −x − x −2a. As a conclusion to the presentation of the three abstract domains: intervals, octagons, and polyhedra, Figure 2.5 provides the classical graphical comparison between these three domains. This figure underlines that polyhedra are more precise than octagons, themselves more precise than intervals.

Domain Composition

The global abstraction of an analysis instance is constructed at run-time by composing domains according to a user-given configuration file. An illustrative example is depicted in Figure 3.1 representing a simplified version of an analysis of C. The configuration starts with a long sequence of iterators, C program to C goto , i.e. state-less domains that handle individual parts of the C compound syntax by induction, including loops, switch, goto, etc. The configuration merges domains reused from the Universal toy-language (underlined) and C-specific domains. Following these iterators, the C analysis contains a composition of domains that handle atomic statements such as assignments and tests. Dynamic memory is handled by heap using recency abstraction [BR06]: each allocation site is associated with at most two abstract blocks, one representing the lastly allocated block at this site, and one representing all the blocks allocated before — this domain could be easily replaced with any domain that partitions the possibly unbounded set of allocated blocks into a bounded set of abstract blocks. Each variable or abstract heap block is then decomposed into a set of virtual variables, called cells, of scalar type, by C cells . In order to handle transparently union types and type-punning, we use the cell abstraction from [Min06a], where the decomposition is adapted dynamically according to the actual access pattern during the execution (rather than based on the static type, which can be deceiving), a more in-depth presentation of the cell abstraction will be provided in Chapter 4. The cell domain is composed, using a reduced product, with the C smash abstraction. This abstraction summarizes several
cells into one, this is particularly useful for the representation of arrays. Domain C pointers is then used to handle assignments and tests on pointers, while the numerical statements are delegated to an interval numerical abstraction boxes lifted to a semantics of machine integers.
Three types of generic composers are used to connect leaf domains:
Iterators. This composer iterates sequentially over a set of domains, which is materialized with in Figure 3.1. Lattice operators are defined pointwise, and transfer functions are called in sequence until one succeeds: domain i is invoked only if previous domains j < i returned None. This composition is useful to combine a set of iterators defining transfer functions for disjoint parts of the AST, such as loops, function calls, etc. In addition, it allows plugging new iteration schemes easily, e.g., replacing the induction-based iterators with CFG-based ones.
Reduced products. The classic method employed in most static analyzers (e.g., [BCC+10]) for creating reduced products is via cascading binary functors. MOPSA provides instead a generic n-ary reduced product composer that enables better propagation of reduction channels. Lattice operators are defined pointwise and transfer functions of argument domains are called in par-allel. All post-conditions are given to a user-defined reduction operator, that can access directly abstract elements of any domain in the pool and can exploit the published reduction channels to refine the final post-condition.
Stacks. Another novelty in MOPSA is a stacked reduced product of n functors sharing the same subordinate domain instance. This form of composition is useful when several domains depend on an external domain to delegate management of parts of their abstract state. Consider the case of cell and smashing abstractions from [Min06a, BCC+10] in Figure 3.1. Both domains need a pointer domain for address resolution and a numeric domain for abstracting numeric environ-ments. By sharing the same numeric domain instance, relations between cells and smashed variables can be inferred. Please note that for presentation purposes and as this was not cru-cial to the understanding of the results of subsequent sections, the signatures presented in this section have been simplified w.r.t. the actual implementation, we refer to [MOJ18] for a more complete presentation. The signature of exec from Program 3.1 does not actually permits the use of sharing under a reduced product. Indeed, when a domain C is shared by two domains A and B, combined in a reduced product, the handling of a statement by A and B might generate two different sets of statements, propagated to C, thus yielding two different elements from C. These are merged back together by a function merge defined by C, taking as arguments the two diverging abstract elements and the log of all statements propagated to C by A and by B.

READ Laboratory investigations of the foraging behaviour of New Zealand scampi

Galois connection with the Cell abstract domain

The String abstract domain is an abstraction of the Cell abstract domain. Indeed we forget information that do not help us track the position of the first ‘\0’ character. We define the Galois connection between the Cell domain and the String domain using two functions : to_cell and from_cell. The to_cell(s, S♯) function computes the range of sl in the numeric abstract domain, for each possible length value we set the cells placed before (resp. at) the length to [1; 255] (resp. 0), this yields an abstract element per possible value in the range, those are then joined. Conversely from_cell(s, S♯) computes the minimum length value (the index of the first cell whose range contains 0), and the maximum length value (the index of the first cell whose range is exactly {0}, it is the size of the buffer if no such index is found), finally those constraints are added to the numerical domain. If a string does not contain any ‘\0’ character, we define its length to be the allocated size of the buffer it is contained in. The cell abstract domain provides an add_cells, which given a set of cells and an abstract element adds the set of cells to the abstract element, by collecting potential constraints.

Table of contents :

1 Introduction
1.1 Program verification
1.1.1 Tests
1.1.2 Model Checking
1.1.3 Program proving
1.1.4 Abstract interpretation
1.2 Content of the thesis
1.3 Outline of the thesis
2 Background – Abstract interpretation
2.1 Notations
2.2 Order theory
2.2.1 Order and Galois connections
2.2.2 An example of Galois connection: intervals [CC76]
2.2.3 Operators, fixpoint theorems
2.2.4 Widening
2.3 Abstract Interpretation
2.3.1 By example
2.3.2 Improving precision of the abstract fixpoint computation
2.3.3 Composing abstract domains
2.3.4 Concretization only framework
2.4 Numerical Domains
2.4.1 Definitions
2.4.2 The polyhedra abstract domain [CH78]
2.4.3 The octagon abstract domain [Min01b]
2.4.4 Notations
2.5 Conclusion
3 MOPSA
3.1 Introduction
3.1.1 Motivations
3.1.2 Outline of the chapter
3.2 Unified Domains Signature
3.2.1 Lattice
3.2.2 Managers
3.2.3 Flows
3.2.4 Evaluations
3.2.5 Queries
3.3 Domain Composition
3.4 Experiments
3.5 Conclusion
4 String abstraction
4.1 Introduction
4.1.1 Motivations
4.1.2 Outline of the chapter
4.2 Syntax and concrete semantics
4.3 Background – Cell abstract domain
4.4 String abstract domain
4.4.1 Domain definition
4.4.2 Galois connection with the Cell abstract domain
4.4.3 Operators
4.4.4 Abstract evaluation
4.4.5 Abstract transformations
4.4.6 String declaration
4.4.7 Dynamic memory allocation
4.5 Related works
4.6 Conclusion
5 Modular analysis
5.1 Introduction
5.1.1 Motivations
5.1.2 Outline of the chapter
5.2 General remarks
5.3 Using relational domains
5.4 Building the summary functions
5.4.1 Summaries
5.4.2 Inferring and proving summaries
5.4.3 Improvements
5.5 Preliminary experimental evaluation
5.6 Related works
5.7 Conclusion
6 Numerical Abstraction
6.1 Introduction
6.1.1 Homogeneous definition sets
6.1.2 Fixed finite definition set
6.1.3 Outline of the chapter.
6.2 Concrete semantics
6.2.1 Definition
6.2.2 Example of use
6.3 State of the art
6.3.1 Partitioning
6.3.2 Encoding of optional variables via avatars
6.3.3 A folk technique for non relational domains
6.4 The CLIP abstraction
6.4.1 Abstraction definition
6.4.2 Composing inclusion and projection
6.4.3 Operator definitions
6.4.4 Abstract transformers
6.5 The integer case
6.5.1 Losing soundness
6.5.2 Losing parametricity
6.6 Bridging the gap with partitioning: the ℘-CLIP abstraction
6.6.1 Abstraction definition
6.6.2 Set difference between SVIs
6.6.3 Operator definitions
6.6.4 Widening operator
6.6.5 Abstract transformers
6.6.6 Conclusion
6.7 Unbounded maps
6.8 Conclusion
7 Tree abstraction
7.1 Introduction
7.1.1 Trees and terms
7.1.2 Outline of the chapter
7.2 Syntax and concrete semantics
7.2.1 Terms and numerical terms
7.2.2 Syntax of the tree manipulating language
7.2.3 Concrete semantics of the language
7.2.4 Example
7.3 Background – (Tree) regular languages
7.3.1 Regular languages
7.3.2 Tree regular languages
7.3.3 Implementation remarks
7.4 State of the art
7.4.1 Using (Tree) Regular languages
7.4.2 Tree languages
7.4.3 Numerical terms
7.5 Natural term abstraction by tree automata
7.5.1 Value abstraction
7.5.2 Abstract transformers
7.5.3 Environment abstraction
7.6 Natural term abstraction by numerical constraints
7.6.1 Value abstraction
7.6.2 Abstract semantics of operators
7.7 Building up the full abstraction
7.7.1 Product
7.7.2 Removing redundant information
7.7.3 Factoring away the numerical component
7.7.4 Functional evaluation of tree expressions
7.8 Implementation and example
7.9 Conclusion
8 Conclusion
8.1 Summary of the thesis
8.2 Concluding example
8.3 Future works