A race condition is a situation where running multiple concurrent processes (for these purposes, a thread is also modeled as a process, as are processes running on separate machines) will give differing results, depending on the (unspecified, and usually unspecifiable) details of the ordering of operations.

To break this up into understandable bits: Suppose multiple processes are running, sharing some resource. Almost all models of concurrent execution don't specify a precise order in which the processes will execute instructions (and almost all real situations don't exhibit a precise order, either, so the models are accurate here). It is therefore reasonable to assume the end result of the computation will be dependent also on the order. Code which exhibits this is said to have a race condition.

Example

This is a specific example of one way in which a race condition can arise. But it is fairly illustrative. Consider this code:

for i:=1 to 10 do
  s := s+1
Suppose it runs in 2 processes, with the variable i private to each process, and the variable s shared (and initialised to 0 before execution starts). What is the "computed" value of s when both processes finish?

If the operation s := s+1 is executed atomically (that is, always executed in full by one process before the other can touch s), then the result is 20. But on almost all machines, this operation is executed as a load, an increment, and a store. And the other process may access s between operations. For instance, on the first iteration, the 2 processes may interleave instructions as follows:

LOAD s
                      LOAD s
INCREMENT s
STORE s
                      INCREMENT s
                      STORE s
After both processes have finished the first iteration, s will contain 1, instead of 2 as intended!

What's happening is that the processes are racing to see who will access s first, and who will finish the entire access first. In this example, there are no winners.

The solution (in this case) is to add a synchronization mechanism to all accesses to s, e.g. by placing them inside a mutual exclusion zone.

"Benevolent" race conditions

Not all race conditions are horrible! Consider, for example, a multithreaded web server trying to deal with an incoming connection. Which of the threads picks up the connection is, in general, not wholly determined by the state of the web server! Strictly speaking, this is a race condition.

But of course we don't care. In this case, we do not regard the identity of the thread which performs the service as part of the "result" of the "computation".

If, on the other hand, more than one thread would attempt to answer the connection, trouble would ensue. This is no longer a harmless race condition...

Race conditions are one of the ugliest type of bug to detect and fix, mainly because they are not reproducible. Even if all processes involved are started in the same order and with the same time difference, they will most likely arrive at the critical section differently than before, due to interrupts.

So if the race condition only manifests with a timing scenario that occurs with a very low probability, it becomes very hard to observe when you are actually looking for it. If you absolutely have to find it because the results are catastrophic and must not occur even infrequently, you're pretty screwed.

The way to deal with race conditions is to not let them come into being in the first place, by paying the utmost attention to correctly dealing with shared resources in the design stage, not in the debugging stage.

Log in or register to write something here or to contact authors.