313
edits
Changes
no edit summary
This is a work in progress to explain the static program analysis technique of '''Abstract Interpretation''', which is basic background for using the Treehydra abstract interpretation libraries.
=Introduction= ==Background: Program Analysis==
'''Program analysis''' refers to algorithms for discovering facts about the behavior of programs. Examples include finding which variables are constant, whether null pointer errors can occur, or whether every "lock" operation is followed by exactly one "unlock". Program analysis is a basic technology with many applications, especially compiler optimizations, bug finding, and security auditing.
The goal of static analysis is to solve this problem by attaining complete coverage of the program: using special algorithms to analyze the behavior over all possible inputs. The price is that the results are only approximate. (Discovering exact behavior for all inputs would solve the halting problem.) But the results only have to be approximate in one direction. That is, you can create a bug-finding analysis that has false positives (sometimes reports a bug where there is none) but does discover all bugs. And you can also create a version that has no false positives but might miss some real bugs. There is a lot of design freedom in how to approximate, and designing good approximations is a key part of the art of static analysis. The goal is to be precise where it counts for your results, but approximate elsewhere, so the analysis runs fast.
=Introduction= Abstract interpretation is a basic static analysis technique. We will describe it informally, focusing on the ideas and practical applications, and leaving the mathematical underpinnings aside. We will focus on '''intraprocedural''' abstract interpretation, i.e., analyzing one function at a time, rather than analyzing the whole program together, because whole-program analysis is difficult to apply to a large codebase like Mozilla. We're going to explain abstract interpretation using an example analysis problem called ''constant propagation''. The goal is to discover at each point in the function whether any of the variables used there have a constant value, i.e., the variable has the same value no matter what inputs are provided and no matter what code path was followed. This analysis is used by compilers for the constant propagation optimization, which means replacing constant variables with a literal value. (This may save a load of the variable, and it also removes a use of the variable, which can enable other optimizations.) Here is an example C++ function: int foo(int a, int b) { int k = 1; // k is constant: 1 int m, n; if (a == 0) { ++k; // k is constant: 2 m = a; n = b; } else { k = 2; // k is constant: 2 m = 0; n = a + b; } return k + m + n; // k is constant: 2 } =Concrete Interpretation==