Just in time compilation converts a
program in an interpreted language
to native machine code just before
the program is run, in order to make
it run faster. Currently a popular
way to increase the speed of java
bytecode programs.

We all know that Java performance is worse than natively compiled code, right? Well, even though I've found this to generally be the case, occasionally a Java Zealot will tell me how good the HotSpot JIT is and how it can make code even faster than natively compiled code. Likewise with C# and .NET. Don't get me wrong, I have nothing against Java or C#, especially from a productivity point of view. But I've remained skeptical about claims of "better than native" performance.

However, I've finally come across a concrete example of a program that Java executes faster: a simple recursive Fibonacci number calculator. On my computer, the C version calculates the 42nd Fibonacci in about 10 seconds, and the Java version takes about 6 seconds. Out of curiosity, I tried it again, with the JIT turned off (with the -Xint command line option). Unsurprisingly, this slows down the Java version significantly; it takes almost a minute and a half. Obviously, that JIT is doing some serious work!

Of course, this is a completely contrived example. Even if someone needed to generate Fibonacci numbers, no one would write his or her code this way, particularly if performance was an issue. But at least it demonstrates that the JIT can be effective. This program certainly plays right into the strengths of the JIT: it is a small piece of code, run over and over. It can be compiled at runtime, then further refined with feedback from each run.

As a side note, it is worth noting that the techniques used with Java's JIT can also be used with more traditional languages like C. In fact, some advanced optimizing compilers can take profiling data as input to generate better code, mirroring at compile-time what HotSpot does at runtime. I just happen to not have such a compiler.


fib.c, the C program:

#include <stdio.h>

static int fib(int i) {
    if (i == 0 || i == 1)
        return 1;
    return fib(i - 1) + fib(i - 2);
}

int main(int argc, char *argv[]) {
    printf("%d\n", fib(42));
    return 0;
}

Fib.java, the Java program:

public class Fib {
    private static final int fib(int i) {
        if (i == 0 || i == 1)
            return 1;
        return fib(i - 1) + fib(i - 2);
    }

    public static void main(String args[]) {
        System.out.println(fib(42));
    }
}

As stated in the other writeups, a Just-In-Time (JIT*) compilation is "done during execution of a program – at run time – rather than prior to execution"1, primarily in order to improve performance of Interpreted Languages. Instead of re-writing the previous writeups, I would instead like to describe some of the subtleties, in the hope that the reader may gain an appreciation for the complexities involved in speeding up an interpreted language via a JIT. A small disclaimer: everything below is a gross over-simplication of the true realities of implementing a JIT.

 

The easiest way of dealing with run-time compilations is to do them synchronously. That is, when a method foo calls another method bar, the JIT first compiles bar and then allows the Virtual Machine (VM) to execute bar. This can be done in two ways: the application thread can block on a dedicated compilation thread, or the application thread can also play the role of the compilation thread, in essence blocking on the compilation. However, this has the downside of not only the added overhead of run-time compliations, but also the fact that no application code can get executed while the compilation is happening.

To get around this problem, the other way of doing run-time compilations is to do them asynchronously. That is, a method bar will get queued for compilation at some point during the execution of the program. Until the compiled body is available for execution, the VM will continue interpreting bar. Once the compiled body is available for execution, on the next invocation of bar, the VM will transfer execution over into the compiled body.

If C was an interpreted language, JIT compilers wouldn't be all that complicated with this approach (for the most part). However, the complexities arise from so called "Dynamic Languages", like Java. These languages, in particular Java, support features like Garbage Collection (GC**) and Class Loading/Unloading, among other things. This means that the VM, the GC, and the JIT need to always be on the same page. It also means that just naively implementing the aformentioned approach will lead to disaster.

 

Loosely speaking, having a GC means that a developer does not need to worry about freeing memory they may have allocated. It is the GC's job is to periodically sweep through the memory space and collect the bits of memory that are no longer being used. The true details of a GC are more interesting; see the writeups on garbage collection if interested in learning more. This gets more complicated once you stick a JIT into the whole mix.

Why? Well in order for the GC to do its job, it needs to know which objects are alive (ie have a reference to them) and which objects are dead (ie have no reference to them). When a thread executes JIT'd code, the VM has no state information whatsoever. If bar uses an object myObject actively, the GC should not "kill" this object. However, if the JIT does not coordinate with the GC, then the GC could very well free myObject from the memory space.

The solution to this is something known as a Write Barrier. It "is a fragment of code emitted by the compiler immediately before every store operation"2 It ensures that the GC is aware of objects that are still in use by JIT'd code. Failing to do so could, in Java for example, result in Null Pointer Exceptions thrown non-deterministically.

 

Class Loading/Unloading is a completely different beast. It involves using a Class Loader to load or unload classes dynamically while the VM is running. However this usually happens at GC. This is similar to how a binary executable could load or unload DLLs at runtime. Consider the scenario involving a class named myParentClass. myParentClass has a child class called myChildClass, as well as a virtual method called myVirtualMethod that myChildClass reimplements. Additionally, myChildClass' implementation of myVirtualMethod gets JIT'd.

Life in the VM continues until myChildClass gets unloaded. At this point, in order to be functionally correct, any call to myVirtualMethod should go myParentClass's implementation of myVirtualMethod. The VM is aware of this and will do the correct thing. What about a JIT'd method, say aRandomMethod, that happens to call myVirtualMethod? Well, it sees that there is a JIT'd version of myVirtualMethod and happily calls that method. It does not know that it is the wrong version.

How do we deal with this problem? Guards and Assumptions. When JITing aRandomMethod, the JIT needs to place a guard before any call to myVirtualMethod. It then needs to register an assumption against that guard. The guard acts as it is named; it says "I am going to allow aRandomMethod to call this version of myVirtualMethod under the assumption that it is OK to do so".

When a class gets unloaded, that assumption is no longer valid. This triggers a mechanism that goes to every guard and updates it to change the code path to some other location. This location could be, for example, back in the interpreter since it is the safest path. The guard now says "The assumption that it is OK for aRandomMethod to call this version of myVirtualMethod is no longer true so I can't let you do that".

 

What I've explained so far is mainly about implementation. In other words, this stuff is part of what is needed to implement the language. The main purpose of a JIT is for performance; it should be low overhead and output high performance code. This is not trivial, but these are some of the things that can be (and are) implemented:

Profiling
There are different ways of profiling. The VM could profile the interpreter and give the JIT the information. This can be used to determine which methods should be JIT'd. There can also be software profiling, in which samples of JIT'd code are collected by a dedicated thread. Additionally, there could be JIT profiling, where special instructions are added to JIT'd code to call a snippet of code to give the method a "tick". Branch and value profiling could also be implemented. This information can then be used to better optimize methods.

Compiler Optimizations
A JIT is after all a compiler. As such, using the right optimizations is key to improving performance. An example is knowing when or when not to inline a method. A JIT often has the advantage of getting profiling information from various sources at run-time. It can use this information to dynamically determine the best way to optimize a method.

Recompilations
Not all methods have to compiled at the highest optimization level (like -O3 in gcc). Recompiling is a way of first compiling methods at fairly low optimization levels and then, depending on the profiling information, recompiling them at higher optimization levels. This reduces compilation overhead and memory footprint. However, it isn't that simple; any method that wants to call a recompiled method needs to have a way of calling the new body; otherwise the entire exercise is pointless.

PICs
A Polymorphic Inline Cache (PIC) is a way to reduce the overhead of determining at run-time which virtual method to call. To reuse the previous example, if aRandomMethod wants to call myChildClass' implementation of myVirtualMethod, the code needs to go the the Virtual Function Table (VFT) and find the address of the correct method to call. This can get expensive if aRandomMethod calls myVirtualMethod many times. The solution is to only go through the VFT once. After that, the address is cached in a PIC so that the next time aRandomMethod wants to call specifically myChildClass' implementation of myVirtualMethod, it can grab the address from the PIC. Only when it wants to call a different implementation does it need to go through the VFT. Again, complications from Class Unloading apply here too.

On Stack Replacement
Probabaly one of the coolest and complicated approaches to improve performance is On Stack Replacement (OSR). Let's say there exists a method named myMethod that is being interpreted. Let's also say that it is only invoked once. If myMethod is in a long running loop, it will only ever get interpreted. The JIT could already have a compiled body available for the next invocation. However, the VM has no way to use the faster compiled body for the current invocation. OSR is a way of "jumping" from the middle of interpreting a method directly into the middle of a JIT'd body. The compilcations arise because all of the state that needs to be maintained and transferred over when OSR takes place.

 

There is, of course, much more, but it would take an eternity to go through everything. Unfortunately, there also isn't a lot of material out there on JITs. However, Oracle JRockit: The Definitive Guide is a good source of information for learning about the (JRockit) JIT.


* "JIT" is sometimes used as a noun, referring to the JIT Compiler. "JIT" is also used as a verb, eg "To JIT a method" means to use the JIT to compile a method. It is also used as an adjective, eg "A JIT'd method" is a method that was compiled by the JIT.

** "GC" is used to refer to the concept or event of Garbage Collection, or the Garbage Collector, depending on the context in which its used.

1. Just-in-time Compilation

2. Write Barrier

Log in or register to write something here or to contact authors.