Changes

Jump to: navigation, search

Abstract Interpretation

6,391 bytes added, 22:58, 15 May 2008
Concrete Interpretation
=Concrete Interpretation=
 
To explain abstract interpretation, we'll start with concrete interpretation, which is just the normal kind you are already familiar with. We'll use it to build up to abstract interpretation.
 
We could imagine trying to guess which variables are constant just by calling the function with different inputs, and finding which variables are constant over all those runs. Let's start by running with "a = 0, b = 7" and trace the run:
 
(instruction) (interpreter state after instruction)
ENTRY a = 0, b = 7
k := 1 a = 0, b = 7, k = 1
COND a == 0 (TRUE)
k := k + 1 a = 0, b = 7, k = 2
m := a a = 0, b = 7, k = 2, m = 0
n := b a = 0, b = 7, k = 2, m = 0, n = 7
t1 := k + m a = 0, b = 7, k = 2, m = 0, n = 7, t1 = 2
ret := t1 + n a = 0, b = 7, k = 2, m = 0, n = 7, t1 = 2, ret = 9
RETURN ret
EXIT
 
So k = 2 just before its use in 't1 := k + m'. We can run with many other inputs and get the same result. But testing can never tell us that k = 2 for all inputs. (Well, actually it can, but only if you can run <math>2^64</math> test cases.)
 
=Approaching Abstract Interpretation=
 
But if you look at the function, you can see that for k, there are really only two important cases: a == 0, and a != 0, and b doesn't matter. So let's (on paper) run two tests: one with input "a = 0, b = ?" and the other with "a = NZ, b = ?", where '''NZ''' means "any nonzero value" and '''?''' means "any value". First, a = 0:
 
(instruction) (interpreter state after instruction)
ENTRY a = 0, b = ?
k := 1 a = 0, b = ?, k = 1
COND a == 0 (TRUE)
k := k + 1 a = 0, b = ?, k = 2
m := a a = 0, b = ?, k = 2, m = 0
n := b a = 0, b = ?, k = 2, m = 0, n = ?
t1 := k + m a = 0, b = ?, k = 2, m = 0, n = ?, t1 = 2
ret := t1 + n a = 0, b = ?, k = 2, m = 0, n = ?, t1 = 2, ret = ?
RETURN ret
EXIT
 
It looks pretty much like the concrete case, except that we now have some abstract values, NZ and '?', that represent sets of concrete values.
 
We also have to know what operators do with abstract values. For example, the last step, "ret := t1 + n", evaluates to "ret := 2 + ?". To figure out what this means, we reason about sets of values: if '?' can be any value, then 2 + '?' can also be any value, so 2 + '?' -> '?'.
 
The other test is "a = NZ, b = ?":
 
(instruction) (interpreter state after instruction)
ENTRY a = NZ, b = ?
k := 1 a = NZ, b = ?, k = 1
COND a == 0 (FALSE)
k := 2 a = NZ, b = ?, k = 2
m := 0 a = NZ, b = ?, k = 2, m = 0
n := a + b a = NZ, b = ?, k = 2, m = 0, n = ?
t1 := k + m a = NZ, b = ?, k = 2, m = 0, n = ?, t1 = 2
ret := t1 + n a = NZ, b = ?, k = 2, m = 0, n = ?, t1 = 2, ret = ?
RETURN ret
EXIT
 
At this point, we've tested with every possible input, and we've tested every path, so our results are complete. This proves that k = 2 is in fact always true just before "t1 := k + m".
 
We're definitely getting somewhere, but the procedure we just followed doesn't generalize. We knew exactly what abstract values to use for the test cases, but that was only because this was a simple function and we could see that by hand. This won't work for complicated functions and it's not automatic. We need to make things simpler.
 
Another problem is that we are analyzing each path through the function individually. But a function with <math>k</math> if statements can have up to <math>2^k</math> paths, and a function with loops has an infinite number of paths. So on real programs, we'll experience path explosion and we won't get complete coverage.
 
=Abstract Interpretation by Example=
 
One problem with our first approach to abstract interpretation was that we don't know how to pick what sets abstract values to use for input test cases. So let's not. Instead, let's just do one test with any input at all: "a = ?, b = ?":
 
(instruction) (interpreter state after instruction)
ENTRY a = ?, b = ?
k := 1 a = ?, b = ?, k = 1
COND a == 0
 
Now what? We have no information about a, so we don't know which branch to take. So let's do both. First, the true case:
 
(instruction) (interpreter state after instruction)
a = ?, b = ?, k = 1
k := k + 1 a = ?, b = ?, k = 2
m := a a = ?, b = ?, k = 2, m = ?
n := b a = ?, b = ?, k = 2, m = ?, n = ?
 
Then, the false case:
 
(instruction) (interpreter state after instruction)
a = ?, b = ?, k = 1
k := 2 a = ?, b = ?, k = 2
m := 0 a = ?, b = ?, k = 2, m = 0
n := a + b a = ?, b = ?, k = 2, m = 0, n = ?
 
At this point, the two execution streams rejoin. We could just keep running the streams separately, but we know that will lead to path explosion in general. So instead, let's merge the streams by merging the states. We need a single interpreter state that covers both of these states:
 
a = ?, b = ?, k = 2, m = ?, n = ?
a = ?, b = ?, k = 2, m = 0, n = ?
 
We can get it by merging the variables individually. As before, we use set reasoning. For example, with k, we say, k is 2 on the left and on the right, so k = 2 in the merged state. With m, m could be anything if coming from the left, so even though it must be 0 if coming from the right, m could be anything in the merged state. The result:
 
a = ?, b = ?, k = 2, m = ?, n = ?
 
We can continue executing in one stream:
 
t1 := k + m a = ?, b = ?, k = 2, m = ?, n = ?, t1 = ?
ret := t1 + n a = ?, b = ?, k = 2, m = ?, n = ?, t1 = ?, ret = ?
RETURN ret
EXIT
 
And we get the answer we wanted, k = 2.
 
Notice that this time, we didn't have to divide up states at the start, we just started with no knowledge, which is perfectly general. Also, because we rejoin execution streams, in a loop-free method, we only have to interpret each instruction once, so we have completely solved the path explosion problem. (We haven't tried loops yet, and they are a bit harder--they require us to analyze statements more than once, but only finitely many times.)
313
edits

Navigation menu