conditional execution (idea) by alisdair

Conditional execution is an instruction set architecture feature, most notably found on the ARM series of processors and also employed by Motorola's AltiVec unit, which predicates individual instructions on processor status flags. The importance of this feature is proportional to pipeline length, but even on the three-stage ARM7 and five-stage StrongARM and ARM8, it has the advantage that complicated branch prediction units are unnecessary.

Short Branches

The standard way of skipping over several instructions based on a test condition is to use a local branch. This has the unfortunate side-effect of introducing a pipeline bubble: a taken branch discards the instructions already fetched behind it, so each branch costs about as much as adding at least one no-op instruction to your code. Adding even one instruction to a tight loop can be a serious performance hit, so processor designers sought a way around the problem.

Their solution was to create a branch prediction unit: an initially simple piece of circuitry which guesses whether a conditional branch will be taken or not. The most straightforward units are surprisingly effective in the standard five-stage pipeline: simply predicting always-taken or never-taken can improve performance dramatically. However, once pipelines grow longer, more accurate prediction is required, and this section of the processor increases in complexity. Adding complicated logic to the pipeline increases not only processor cost, but power usage too.

Conditional Execution

ARM's solution is to avoid short conditional branches almost altogether. Instead of only allowing branches to be executed conditionally, the concept was extended to every instruction. Each standard opcode is executed unconditionally by default: to predicate it on a certain combination of processor flags, a condition mnemonic is appended as a suffix. For example:

ADD r0, r4, r5 ; store r4 + r5 in r0
ADDPL r0, r4, r5 ; as above, but only if the negative flag is clear
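
As a rough illustration in C rather than ARM code, the predicated ADDPL above can be modelled as a function that either performs the add or passes the old destination value through. The flags_t struct and the addpl name are assumptions made up for this sketch; they are not part of any real API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the four ARM status flags (illustrative only). */
typedef struct {
    bool n, z, c, v;
} flags_t;

/* Models ADDPL rd, rn, rm: the add happens only when N is clear.
   When N is set the instruction behaves as a no-op, so the old
   value of rd is returned unchanged. */
uint32_t addpl(flags_t f, uint32_t rd, uint32_t rn, uint32_t rm)
{
    return f.n ? rd : rn + rm;
}
```

Note that the model contains no jump in the instruction stream: the instruction is always fetched and decoded, and the flags only decide whether its result is written.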

Conditional execution has the significant advantage that performance-critical, tight loops need not have short branches to control minor program flow. This improves both pipeline utilisation and code density, which in turn raises instruction throughput. Additionally, no branch prediction units are needed to maintain high performance, and so the processor cells are smaller and use less power.

Setting the Processor Flags

Processor flags can be modified by adding the S (set flags) suffix to any data-processing instruction: this suffix causes the flags to be set according to the result of the operation. The fact that this is optional comes as a surprise to seasoned 6502 programmers, who are used to most operations affecting the processor flags; however, it is very useful in the context of conditional execution, as it allows several instructions to be predicated on a single setting of the status flags. For example:

SUBS r0, r0, r2 ; subtract r2 from r0, and set flags
ADDEQ r0, r0, r2 ; if the result was zero, add r2 to r0 again...
MOVEQ pc, lr ; ... and return from the function
MUL r0, r2, r0 ; otherwise, multiply r0 by r2 and store in r0 (Rd and Rm must differ on older ARMs)
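
For readers more used to a high-level language, the sequence above can be sketched in C roughly as follows. The function name sub_then_scale is hypothetical, and unsigned 32-bit arithmetic is assumed so that wrap-around matches the registers.

```c
#include <stdint.h>

/* C rendering of the SUBS / ADDEQ / MOVEQ / MUL sequence.
   r0 and r2 stand for the registers of the same name. */
uint32_t sub_then_scale(uint32_t r0, uint32_t r2)
{
    r0 = r0 - r2;         /* SUBS: subtract and record the Z flag */
    int z = (r0 == 0);

    if (z) {
        r0 = r0 + r2;     /* ADDEQ: no S suffix, so Z is left alone */
        return r0;        /* MOVEQ pc, lr: return early */
    }
    return r0 * r2;       /* MUL: only reached when Z was clear */
}
```

The single `if` corresponds to both EQ-suffixed instructions: because ADDEQ does not set the flags, MOVEQ still sees the Z flag left by SUBS.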

The other, more common way to set processor flags is to use one of the comparison instructions: CMP (compare), CMN (compare negated), TEQ (test equivalence), or TST (test bits). These instructions affect only the processor status: they do not modify registers, and they set the flags implicitly, as if the S suffix were present. There follows an example of using CMP and conditional execution, using the RSB (reverse subtract) instruction:

; abs() function: return the absolute value of the integer parameter.
; Pre-condition: r0 holds a signed integer
; Post-condition: r0 holds a non-negative signed integer

.abs
CMP r0, #0 ; compare r0 to constant zero
RSBLT r0, r0, #0 ; if r0 is less than zero, set r0 to 0 - r0
MOV pc, lr ; return from function
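
A minimal C sketch of the same routine follows, assuming 32-bit two's complement registers. The name abs_model is made up for illustration. Comparing against zero can never overflow, so V is clear and the LT condition reduces to a test of the sign bit.

```c
#include <stdint.h>

/* Hypothetical C counterpart of the .abs routine above. */
uint32_t abs_model(uint32_t r0)
{
    /* CMP r0, #0 followed by LT: V is clear, so LT is just N,
       i.e. bit 31 of r0. */
    if (r0 >> 31)
        r0 = 0u - r0;   /* RSBLT r0, r0, #0 */
    return r0;          /* MOV pc, lr */
}
```

One edge case is worth noting: the most negative value, 0x80000000, has no positive counterpart in 32 bits, so both this sketch and the assembly return it unchanged.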

Condition Mnemonics Reference

Sixteen possible conditions are available to the ARM assembly programmer, each with its own mnemonic extension to the opcode. Eight of these are direct tests of the four processor flags: zero, overflow, negative, and carry. Another six are combinations of flag settings for testing after comparison instructions. The remaining two are NV or "never", used for no-op instructions, and AL or "always", the default condition.

The flag testing condition mnemonics and their meanings are:

EQ (Equal): Z (zero) set
NE (Not Equal): Z clear
VS (oVerflow Set): V (overflow) set
VC (oVerflow Clear): V clear
MI (MInus): N (negative) set
PL (PLus): N clear
CS (Carry Set): C (carry) set
CC (Carry Clear): C clear

The more complicated comparison mnemonics and their meanings are:

HI (HIgher): C set AND Z clear: This condition is true if operand one was greater than operand two. Note that it is assumed that both operands are unsigned.
LS (Lower than or Same): C clear OR Z set: This is the logical inverse of HI.
GE (Greater than or Equal): N == V: This condition is true if operand one was greater than or equal to operand two. In this case, the numbers compared are assumed to be signed.
LT (Less Than): N XOR V: This is the logical inverse of GE.
GT (Greater Than): (N == V) AND Z clear: This condition is true if operand one was greater than operand two. Again, the numbers compared are assumed to be signed quantities.
LE (Less than or Equal): (N XOR V) OR Z set: This is the logical inverse of GT.
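
The two tables above can be checked against each other with a small C model. This is a sketch under stated assumptions, not ARM's own definition: the names flags_t, cond_t, cmp_flags and cond_holds are invented for illustration, and cmp_flags assumes the usual ARM convention that a subtraction sets C when no borrow occurs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the four status flags. */
typedef struct {
    bool n, z, c, v;
} flags_t;

typedef enum {
    COND_EQ, COND_NE, COND_CS, COND_CC, COND_MI, COND_PL,
    COND_VS, COND_VC, COND_HI, COND_LS, COND_GE, COND_LT,
    COND_GT, COND_LE, COND_AL, COND_NV
} cond_t;

/* Models CMP a, b: compute a - b and keep only the flags. */
flags_t cmp_flags(uint32_t a, uint32_t b)
{
    uint32_t r = a - b;
    flags_t f;
    f.n = (r >> 31) != 0;                    /* result is negative */
    f.z = (r == 0);                          /* result is zero */
    f.c = (a >= b);                          /* set when no borrow */
    f.v = (((a ^ b) & (a ^ r)) >> 31) != 0;  /* signed overflow */
    return f;
}

/* Returns true if an instruction with this condition would execute. */
bool cond_holds(cond_t cond, flags_t f)
{
    switch (cond) {
    case COND_EQ: return f.z;
    case COND_NE: return !f.z;
    case COND_CS: return f.c;
    case COND_CC: return !f.c;
    case COND_MI: return f.n;
    case COND_PL: return !f.n;
    case COND_VS: return f.v;
    case COND_VC: return !f.v;
    case COND_HI: return f.c && !f.z;
    case COND_LS: return !f.c || f.z;
    case COND_GE: return f.n == f.v;
    case COND_LT: return f.n != f.v;
    case COND_GT: return !f.z && (f.n == f.v);
    case COND_LE: return f.z || (f.n != f.v);
    case COND_AL: return true;
    case COND_NV: return false;
    }
    return false;
}
```

For example, after comparing 0xFFFFFFFF with 1, both HI and LT hold: the first operand is the larger as an unsigned number, but read as a signed number it is -1, which is the smaller.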

Sources:

ARM Assembly Language, Ginns, Dabs Press, 1988
