Abstract Interpretation
This is a work in progress to explain the static program analysis technique of Abstract Interpretation, which is basic background for using the Treehydra abstract interpretation libraries.
Background: Program Analysis
Program analysis refers to algorithms for discovering facts about the behavior of programs. Examples include finding which variables are constant, whether null pointer errors can occur, or whether every "lock" operation is followed by exactly one "unlock". Program analysis is a basic technology with many applications, especially compiler optimizations, bug finding, and security auditing.
Program analysis has two major divisions: static analysis and dynamic analysis. Dynamic analysis basically means testing: you run the program on one or more test cases and observe its behavior. For example, you can run the program and see if you get any null pointer errors. The primary advantage of dynamic analysis is that you can get exact information about the program behavior, because you are actually running it. The disadvantage is that you get information only for the test inputs you provide, and only for the code paths exercised by those test inputs. Thus, dynamic analysis cannot prove that an optimization is safe or that a program is free of some class of security flaws.
The goal of static analysis is to solve this problem by attaining complete coverage of the program: using special algorithms to analyze the behavior over all possible inputs. The price is that the results are only approximate. (Discovering exact behavior for all inputs would solve the halting problem.) But the results only have to be approximate in one direction. That is, you can create a bug-finding analysis that has false positives (sometimes reports a bug where there is none) but does discover all bugs. And you can also create a version that has no false positives but might miss some real bugs. There is a lot of design freedom in how to approximate, and designing good approximations is a key part of the art of static analysis. The goal is to be precise where it counts for your results, but approximate elsewhere, so the analysis runs fast.
Introduction
Abstract interpretation is a basic static analysis technique. We will describe it informally, focusing on the ideas and practical applications, and leaving the mathematical underpinnings aside. We will focus on intraprocedural abstract interpretation, i.e., analyzing one function at a time, rather than analyzing the whole program together, because whole-program analysis is difficult to apply to a large codebase like Mozilla.
We're going to explain abstract interpretation using an example analysis problem called constant propagation. The goal is to discover at each point in the function whether any of the variables used there have a constant value, i.e., the variable has the same value no matter what inputs are provided and no matter what code path was followed. This analysis is used by compilers for the constant propagation optimization, which means replacing constant variables with a literal value. (This may save a load of the variable, and it also removes a use of the variable, which can enable other optimizations.)
Here is an example C++ function we want to analyze, with some comments showing constant variables.
int foo(int a, int b) {
int k = 1; // k is constant: 1
int m, n;
if (a == 0) {
++k; // k is constant: 2
m = a;
n = b;
} else {
k = 2; // k is constant: 2
m = 0;
n = a + b;
}
return k + m + n; // k is constant: 2
}
Control Flow Graphs
Abstract interpretation operates on a flowchart representation of the function called a control flow graph or CFG. Here is the CFG for the example function: