Control-Flow Analysis

Control-Flow Analysis is a popular technique for performing static analysis of many different kinds of programming languages. It’s most often needed in cases where you have some kind of dynamic dispatch: either where you have first-class functions or when you have objects and you call one of their methods.

Imagine for a moment that you were given a program which you were asked to analyze manually. You might start by going to the top of the program, running through each branch, and keeping track of the values variables could take. If the program is too complex to hold entirely in your head, you might start by writing down some abstractions to simplifiy remembering. For example, instead of remembering that the variable x holds the value of 12, you might just remember that it is a number. Thus, when you saw some assignment to x, such as x := x + 1, you could skip over that and just think that x is still a number.

If you came across a loop, you probably wouldn’t trace each execution through the loop: just one or two passes would be enough to tell you some interesting facts. For example:

i := 0
while i < 10 {
  print "i is {i}"

For a simple loop like this, it’s easy to show how the program will always make progress and complete the loop. For more complicated loops, proving progress might be impossible. (It might also be wrong: we do get programs with infinite loops.)

In these cases, we can just check to see if we’ve returned to a state that’s identical to a state that we’ve seen before: if x was a number, it should still be a number, etc. If the variables' (abstract) values are the same, we can conclude that there might be a loop and move on. It’s not guaranteed to be accurate, but it is a strategy that’s guaranteed to terminate.1

There’s more to CFA than what I’ve outlined here, but this should give you an idea. Stay tuned for more!

  1. This is the classic completeness/consistency trade-off introduced by Kurt Gödel. A consistent evaluation of a program leaves in a state where there are programs that we cannot compute because we can’t tell if they halt. (The Halting Problem) A complete analysis, which is what we are interested in here, must sacrifice on consistency, and return inaccurate, though still useful results. ↩︎