This writeup is about the sneaky things compilers do.

There's no general solution to the halting problem, but there are certain special cases where you can sneak up on it and give it a wedgie, at least. When you're writing a compiler, this becomes interesting, because it lets you generate code that does certain things a lot less often than it might otherwise need to. Consider the following example:

    int i = 0;

    if ( i ) {
        /* Do something improbable */
    }

Assume that nothing has been elided there. i is a variable, but when execution reaches the if, there's only one value that i can possibly have. At that point, it is for all intents and purposes1 a constant with a known value. Therefore, you can skip compiling that conditional (thus saving a cycle or two at runtime) and everything in the block dependent on it (thus shrinking the code a bit). That example is nothing much, but it's a very simple example. Consider:

    for ( i = 0; i < 0x7fffffff; ++i ) {
        a[ i ].foo = i;
        a[ i ].bar = "baz";
        a[ i ].baz = strlen( a[ i ].bar );
        a[ i ].whatever = i & 1;
    }

A big chunk of what that loop is doing is pointer arithmetic and dereferencing pointers. A smart compiler should notice that a[i] appears five times in the body of the loop, during which nobody changes the value of i. The code it generates should add i to a once and then hang on to that. It should also notice that the argument being passed to strlen is always the same. If it knows with certainty what strlen does, it can get rid of 2,147,483,646 redundant calls to strlen — or better yet, it can generate code that never calls strlen at all: The length of the string "baz" is known to the compiler up front. The example is contrived, but the principle is valid. Naturally, this all becomes more problematic in late-binding languages where "strlen" might refer to damn near anything by the time you get there. That's the "certainty" part. One of the (lesser) reasons statically-typed languages can be more efficient than their free-and-easy brethren is that there's a lot more certainty for the compiler to work with. It's also generally true that function calls, especially into libraries, make life difficult because of potential side effects and unpredictable return values.

You might say that data flow analysis is the subtle art of recognizing the obvious when it bites you in the ass. If you can teach a compiler to do that, you're doing okay.




1 There is no such thing as an "intensive" purpose. Those who imagine otherwise will be shot.