(I feel I need to write a rebuttal to Rollo's node here)
Branch prediction is hardly based on a "lucky guess" -- most programs consist of loops that are repeated several times. The looping construct will usually exhibit a strong bias to taken or not-taken, depending on the loop's sense, which means that a simple 2-bit predictor will capture most of the information about the branch and yield very high prediction rates (over 90%, typically).
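To make the loop-bias point concrete, here is a minimal sketch of a 2-bit saturating-counter predictor watching a single loop branch. The names (predictor_t, predict, update) and the 100-iteration loop are made up for illustration; a real predictor indexes a table of these counters by branch address, but the state machine is the same idea:

    #include <stdio.h>
    #include <stdbool.h>

    /* 2-bit saturating counter: states 0-1 predict not-taken, states 2-3
     * predict taken.  The counter moves one step toward each actual
     * outcome, so a single anomalous iteration (e.g. the loop exit)
     * can't flip a strongly biased prediction. */
    typedef struct {
        unsigned state; /* 0 = strongly not-taken ... 3 = strongly taken */
    } predictor_t;

    static bool predict(const predictor_t *p) {
        return p->state >= 2; /* predict taken in states 2 and 3 */
    }

    static void update(predictor_t *p, bool taken) {
        if (taken && p->state < 3)
            p->state++;
        else if (!taken && p->state > 0)
            p->state--;
    }

    int main(void) {
        predictor_t p = { .state = 0 };
        int correct = 0, total = 0;

        /* Simulate the branch of a loop that runs 100 iterations:
         * taken 99 times, not-taken once at exit. */
        for (int i = 0; i < 100; i++) {
            bool taken = (i < 99);
            if (predict(&p) == taken)
                correct++;
            total++;
            update(&p, taken);
        }
        printf("%d/%d predictions correct\n", correct, total);
        return 0;
    }

On that loop the counter gets 97 of 100 predictions right: two misses while warming up and one at the loop exit. Crucially, the exit mispredict only nudges the counter from "strongly taken" to "weakly taken", so the next run of the loop is predicted correctly from its first iteration.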
Branch prediction in any modern CPU yields extraordinary performance benefits. As pipeline lengths increase, the penalty for a misprediction grows accordingly. Furthermore, a modern out-of-order issue CPU needs a steady supply of instructions to dispatch to the functional units -- if you allowed the scheduler to block at every branch, no instruction-level parallelism could ever be exploited.
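Some rough arithmetic shows how quickly this bites. If a fraction b of instructions are branches, the predictor is right with probability p, and each miss flushes roughly d cycles of pipeline, the extra cost is about b * (1 - p) * d cycles per instruction. A small sketch with illustrative numbers (the 20% branch fraction and 20-cycle flush are assumptions, not measurements of any particular CPU):

    #include <stdio.h>

    /* Back-of-the-envelope model: extra cycles per instruction lost to
     * mispredictions = branch_fraction * (1 - accuracy) * flush_penalty.
     * All three inputs are illustrative assumptions. */
    int main(void) {
        double branch_fraction = 0.20; /* roughly 1 in 5 instructions branches */
        double flush_penalty = 20.0;   /* cycles to refill after a mispredict  */
        double accuracies[] = { 0.50, 0.90, 0.95, 0.99 };

        for (int i = 0; i < 4; i++) {
            double extra_cpi = branch_fraction * (1.0 - accuracies[i]) * flush_penalty;
            printf("accuracy %2.0f%% -> +%.2f cycles per instruction\n",
                   100.0 * accuracies[i], extra_cpi);
        }
        return 0;
    }

With those numbers, a coin-flip "predictor" adds two full cycles to every instruction, while a 99%-accurate one adds 0.04 -- and the gap widens linearly as pipelines get deeper, which is exactly the point above.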
Branch prediction is being folded directly into the caches in newer processors -- that's essentially what a trace cache is.
Finally, as far as multitasking and branch prediction go, cycles "risked" on meaningless work would otherwise be wasted anyway -- unless you are talking about cooperative multitasking; under preemptive multitasking it's pretty rare that a thread of execution voluntarily yields back to the scheduler. It makes no sense to claim that branch prediction is only helpful for single-process systems, since it was originally deployed on the large timesharing machines of the 1960s (IBM's System/360, to be specific).
(It sure feels good to flex some almost-atrophied computer architecture mental muscles...)