There are two distinct multiplication algorithms for integers, one for unsigned values and one for signed values. The unsigned one is easier, so I'll start with that one.

For the sake of these code snippets, the values of the numbers to be multiplied are called `x` [`m` bits wide] and `y` [`n` bits wide]. This means (0 ≤ `x` < (2 ^ `m`)) and (0 ≤ `y` < (2 ^ `n`)). [Typically, `m` = `n`, however, algorithms are more useful the more generic they are, so I choose not to restrict my writeup to that particular special case.]

Let `V` be an `m`-bit wide binary string containing the value 0 to start. Let `W` be an `n`-bit wide binary string containing the value of `y`. Initialise `i` to 0. Loop: First, make `V` one bit longer by adding a "0" to the left of the string [the side of the most significant bit]. If bit `i` of `W` is set, add (`x` * (2 ^ `i`)) to `V`. [The resulting value is guaranteed to still fit in `V`, because at the beginning of the loop it contains a value less than (2 ^ (`m` + `i`)), the value that gets added is less than (2 ^ (`m` + `i`)), and the string `V` is (`m` + `i` + 1) bits wide after the "0" gets added to the left.] Add one to `i` and loop until `i` equals `n`. In pseudo-code:

`V` := ("0" * `m`)

-- That's `m` copies of the character "0", not multiplication

`W` := `y`

-- the '`y`' in the line above really means 'the string representation of `y`'

`I` := 0

while (`I` < `n`)

`V` := ("0" prepended to `V`)

if ((`W` & (2 ^ `I`)) ≠ 0) then

-- That's the numeric, or bitwise, 'and', not the logical 'and'

`V` := (`V` + (`x` prepended to ("0" * I)))

-- the '`x`' in the line above really means 'the string representation of `x`'

-- the addition is on the values represented by the strings

-- the result of the addition is converted back into a string the same length as `V`

end if

`I` := (`I` + 1)

end while

At the end of the algorithm, the result of the multiplication is in `V` (`m` + `n` bits long).

Why did I write the algorithm, manipulating strings and converting back and forth between binary values and strings? It seems just stupid! The reasons will become clearer when the strings are subdivided, then put together differently. Let `W1` be the (`n` - `i`)-bit value floor(`W` / (2 ^ `i`)), let `W2` be the `i`-bit value (W mod (2 ^ `i`)); let `V1` be the `n`-bit value floor(`V` / (2 ^ i)), let `V2` be the `i`-bit value (`V` mod (2 ^ i)).

Several things can be immediately improved upon. Instead of checking (`W` & (2 ^ `I`)), check (`W1` & 1). Instead of adding (`x` prepended to ("0" * `I`)) to `V`, add `x` to `V1`. Since `I` is no longer used in calculatons, it can be chaged from a counter going up to `N` to a counter going down to 0, which can be implemented faster on some systems. The new code looks like this:

`V1` := ("0" * `m`)

`V2` := empty

`W1` := `y`

`W2` := empty

`I` := `n`

while (`I` > 0)

`V1` := ("0" prepended to `V1`)

if ((`W1` & 1) ≠ 0) then

`V1` := (`V1` + x)

end if

Take the rightmost bit off `V1`, add it to the left of `V2`

Take the rightmost bit off `W1`, add it to the left of `W2`

`I` := (`I` - 1)

end while

Each iteration through the loop, `V2` gets wider by one bit (the extra bit originally gets added to `V1`, but then it gets transferred to `V2` later). And, each iteration through the loop a bit gets transferred from `W1` to `W2` - but then `W2` is never used, except as a bit bucket, so it may as well not be there at all. So, one bit is added to `V2` and one bit is removed from `W1` each time through the loop - this is not the kind of thing a computer scientist lets sit unremarked.

If `A` were a register, fixed at `m` bits wide, and `B` were a register fixed at `n` bits wide, then `A` could contain `V1` and `B` could contain *both* the values `V2` and `W1`. In fact, it turns out easy to do this, storing `V2` in the `i` most significant bits of `B`. One extra bit is requred in the middle of the loop - a "0" gets added to `V1` before the bit gets taken off the end of `W1`, however, the carry flag can easily carry that burden. Rewritten, the code looks like this:

`A` := 0

`B` := `y`

`I` := `n`

while (`I` > 0)

Clear the carry flag

-- The carry flag temporarily acts as the extra bit added to V1.

if ((`B` & 1) ≠ 0) then

`A` := (`A` + `x`)

-- This addition may cause an overflow. The algorithm relies on the carry flag being set when this happens.

end if

shift right `A`

shift right `B`

-- The 'shift right' shifts the carry into the MSb, and shifts the LSb into the carry

-- The effect of the shifts is:

-- - to move the bit added to `V1` (the carry bit) into the register `A` (at the MSb)

-- - to move one bit from `V1` (the LSb of `A`) into `V2` (the MSb of `B`)

-- - to move the imaginary boundary in register `B` between `V2` and `W1`

-- - to remove one bit from `W1` (the LSb of `B`)

`I` := (`I` - 1)

end while

The algorithm for signed (2's complement) multiply is more involved, and it is called Booth's algorithm.

Again, I will use `x` and `y` [`m` bits and `n` bits wide respectively] to represent the two values to be multiplied. In this case the value `x` is between -(2 ^ `m`) and ((2 ^ `m`) - 1) inclusive and the value of `y` is between -(2 ^ `n`) and ((2 ^ `n`) - 1) inclusive.

The most interesting part of Booth's algorithm is the way the signed value of `y` is manipulated. Some math and some proof is required before truly understanding the mechanism behind the workings of the algorithm.

Let `P` be an `n`-bit wide binary string.

Let `sP`(`i`) be the character `i` places from the least significant end of the string `P`.

[Informally, `sP`(0) is the least significant bit of `P`, `sP`(1) is the next bit, etc..]

Let `bP`(`i`) be the value of `sP`(`i`).

Let `rP`(0) be the empty string, let `rP`(`i` + 1) be `sP`(`i`) prepended to `rP`(`i`).

[Informally, `rP`(`i`) is the rightmost `i` bits of `P`.]

In particular, `rP`(1) = `sP`(0), and `rP`(`n`) = `P`.

Let `vP`(`i`) be the sign-extended value of `rP`(`i`).

In particular `vP`(`n`) = the sign-extended value of `P`.

For the sake of this proof and algorithm, the value of the empty string is 0, i.e. `vP`(0) = 0.

Prove that:

**P1** - if ((`vP`(`i`) ≥ 0) and (`sP`(`i`) = "0")) then (`vP`(`i` + 1) = `vP`(`i`))

**P2** - if ((`vP`(`i`) ≥ 0) and (`sP`(`i`) = "1")) then (`vP`(`i` + 1) = (`vP`(`i`) - (2 ^ `i`)))

**P3** - if ((`vP`(`i`) < 0) and (`sP`(`i`) = "0")) then (`vP`(`i` + 1) = (`vP`(`i`) + (2 ^ `i`)))

**P4** - if ((`vP`(`i`) < 0) and (`sP`(`i`) = "1")) then (`vP`(`i` + 1) = `vP`(`i`))

[Why prove these things? Where did you come up with them? - Prove these things because they're what you need to prove in order to show the validity of Booth's algorithm. It's a bit like working backwards from the answer, but a quick justification goes like this: The string "…00" represents the same value as the string "…0", they both represent the value 0. The string "…10" represents the value -2, 2 less than "…0". The string "…11" represents the same value as the string"…1", they both represent the value -1. The string "…01" represents the value 1, 2 greater than "…1". As an aside, the sign-extended value of "0", often used for 'false', is 0, and the sign-extended value of "1", often used for 'true', is -1. This, among other reasons, is why some programming languages use the numeric values 0 and -1 to represent 'false' and 'true'.]

Proof by induction is clearly the way to go.

Take the base case (`i` = 0) first.

Prove **P1** with (`i` = 0):

if ((`vP`(`i`) ≥ 0) and (`sP`(`i`) = "0")) then (`vP`(`i` + 1) = `vP`(`i`))

if ((`vP`(0) ≥ 0) and (`sP`(0) = "0")) then (`vP`(0 + 1) = `vP`(0))

if ((0 ≥ 0) and (`sP`(0) = "0")) then (`vP`(1) = 0)

if ((true) and (`sP`(0) = "0")) then ((the value of `rP`(1)) = 0)

if (`sP`(0) = "0") then ((the value of `sP`(0)) = 0)

This is true.

Prove **P2** with (`i` = 0):

if ((`vP`(`i`) ≥ 0) and (`sP`(`i`) = "1")) then (`vP`(`i` + 1) = (`vP`(`i`) - (2 ^ `i`)))

if ((`vP`(0) ≥ 0) and (`sP`(0) = "1")) then (`vP`(0 + 1) = (`vP`(0) - (2 ^ 0)))

if ((0 ≥ 0) and (`sP`(0) = "1")) then (`vP`(1) = (0 - 1))

if ((true) and (`sP`(0) = "1")) then (`vP`(1) = -1)

if (`sP`(0) = "1") then ((the value of `rP`(1)) = -1)

if (`sP`(0) = "1") then ((the value of `sP`(0)) = -1)

This is true.

Prove **P3** with (`i` = 0):

if ((`vP`(`i`) < 0) and (`sP`(`i`) = "0")) then (`vP`(`i` + 1) = (`vP`(`i`) + (2 ^ `i`)))

if ((`vP`(0) < 0) and (`sP`(0) = "0")) then (`vP`(0 + 1) = (`vP`(0) + (2 ^ 0)))

if ((0 < 0) and (`sP`(0) = "0")) then (`vP`(1) = (0 + 1))

if ((false) and (`sP`(0) = "0")) then (`vP`(1) = 1)

if (false) then (`vP`(1) = 1)

In logic, 'if false then `X`' is always true.

Prove **P4** with (`i` = 0):

if ((`vP`(`i`) < 0) and (`sP`(`i`) = "1")) then (`vP`(`i` + 1) = `vP`(`i`))

if ((`vP`(0) < 0) and (`sP`(0) = "1")) then (`vP`(0 + 1) = `vP`(0))

if ((0 < 0) and (`sP`(0) = "1")) then (`vP`(1) = 0)

if ((false) and (`sP`(0) = "1")) then (`vP`(1) = 0)

if (false) then (`vP`(1) = 0)

This is true.

Now for the induction case (`i` > 0).

Prove **P1** with (`i` > 0):

if ((`vP`(`i`) ≥ 0) and (`sP`(`i`) = "0")) then (`vP`(`i` + 1) = `vP`(`i`))

if (`vP`(`i`) ≥ 0), then ((the value of `rP`(`i`)) ≥ 0). This can only be true if the leftmost digit of `rP`(`i`) is "0", i.e. (`sP`(`i` - 1) = "0"). In this case, (`rP`(`i`) = ("0" prepended to `S`)) where `S` is some string (`i` - 1) bits long [in particular, `S` is `rP`(`i` - 1)].

(`rP`(`i` + 1) = (`sP`(`i`) prepended to `rP`(`i`))), so if (`sP`(`i`) = "0") then (`rP`(`i` + 1) is ("0" prepended to ("0" prepended to `S`))), so (`rP`(`i` + 1) is ("00" prepended to `S`)).

(The value of ("0" prepended to `S`)) is the same as (the value of ("00" prepended to `S`)), so (`vP`(`i` + 1) = `vP`(`i`)).

Prove **P2** with (`i` > 0):

if ((`vP`(`i`) ≥ 0) and (`sP`(`i`) = "1")) then (`vP`(`i` + 1) = (`vP`(`i`) - (2 ^ `i`)))

if (`vP`(`i`) ≥ 0), as argued before, (`rP`(`i`) = ("0" prepended to `S`)) where `S` is `rP`(`i` - 1).

(`rP`(`i` + 1) = (`sP`(`i`) prepended to `rP`(`i`))), so if (`sP`(`i`) = "1") then (`rP`(`i` + 1) is ("1" prepended to ("0" prepended to `S`))), so (`rP`(`i` + 1) is ("10" prepended to `S`)).

`vP`(`i`) is (the value of ("0" prepended to `S`))

`vP`(`i` + 1) is (the value of ("10" prepended to `S`))

The value ...10`S` is the value ...10`Z` plus the value ...0`S`, where `Z` is a string of zeros the same length as `S`.

`vP`(`i` + 1) is ((the value of ("10" prepended to `Z`)) + (the value of ("0" prepended to `S`)))

`vP`(`i` + 1) is ((- (2 ^ `i`)) + `vP`(`i`))

`vP`(`i` + 1) = (`vP`(`i`) - (2 ^ `i`))

QED

Prove **P3** with (`i` > 0):

if ((`vP`(`i`) < 0) and (`sP`(`i`) = "0")) then (`vP`(`i` + 1) = (`vP`(`i`) + (2 ^ `i`)))

if (`vP`(`i`) < 0), then ((the value of `rP`(`i`)) < 0). This can only be true if the leftmost digit of `rP`(`i`) is "1", i.e. (`sP`(`i` - 1) = "1"). In this case, (`rP`(`i`) = ("1" prepended to `S`)) where `S` is some string (`i` - 1) bits long [in particular, `S` is `rP`(`i` - 1)].

(`rP`(`i` + 1) = (`sP`(`i`) prepended to `rP`(`i`))), so if (`sP`(`i`) = "0") then (`rP`(`i` + 1) is ("0" prepended to ("1" prepended to `S`))), so (`rP`(`i` + 1) is ("01" prepended to `S`)).

`vP`(`i`) is (the value of ("1" prepended to `S`))

`vP`(`i` + 1) is (the value of ("01" prepended to `S`))

The value ...01`S` is the value ...1`S` minus the value ...10`Z` where `Z` is a string of zeros the same length as `S`.

`vP`(`i` + 1) is ((the value of ("1" prepended to `S`)) - (the value of ("10" prepended to `Z`)))

`vP`(`i` + 1) is (`vP`(`i`) - (- (2 ^ `i`)))

`vP`(`i` + 1) = (`vP`(`i`) + (2 ^ `i`))

QED

Prove **P4** with (`i` > 0):

if ((`vP`(`i`) < 0) and (`sP`(`i`) = "1")) then (`vP`(`i` + 1) = `vP`(`i`))

if (`vP`(`i`) < 0), as argued before, then (`rQ`(`i`) = ("1" prepended to `S`)) where `S` is `rP`(`i` - 1).

(`rP`(`i` + 1) = (`sP`(`i`) prepended to `rP`(`i`))), so if (`sP`(`i`) = "1") then (`rP`(`i` + 1) is ("1" prepended to ("1" prepended to `S`))), so (rP(i + 1) is ("11" prepended to `S`)).

(The value of ("1" prepended to `S`)) is the same as (the value of ("11" prepended to `S`)), so (`vP`(`i` + 1) = `vP`(`i`)).

Exhaustive math, and what's it good for? It's good for understanding how Booth's Algorithm works.

The following is a piece of pseudocode which simply copies a value from one register (`P`, which is `n` bits wide) to another (`V`), using the proof above.

`P` := `y`

`I` := 0

`V` := 0

-- `V` is now `vP`(0) and `I` is now 0

while (`I` < `n`) do

-- `V` is now `vP`(`I`)

`B` := ((`P` >> `I`) & 1)

-- the >> operator is logic shift right [arithmetic shift right would work just as well]

-- `B` now is `bP`(`I`)

if ((`V` ≥ 0) and (`B` = 0)) then

-- `V` is now `vP`(`I`)

-- **P1** says in this case `vP`(`I` + 1) = `vP`(`I`)

-- Do nothing...

-- `V` is now `vP`(`I` + 1)

elseif ((`V` ≥ 0) and (`B` = 1)) then

-- `V` is now `vP`(`I`)

-- **P2** says in this case `vP`(`I` + 1) = (`vP`(`I`) - (2 ^ `I`))

`V` := (`V` - (1 << `I`))

-- the << operator is logic shift left [there is no arithmetic shift left, really]

-- `V` is now `vP`(`I` + 1)

elseif ((`V` < 0) and (`B` = 0)) then

-- `V` is now `vP`(`I`)

-- **P3** says in this case `vP`(`I` + 1) = (`vP`(`I`) + (2 ^ `I`))

`V` := (`V` + (1 << `I`))

-- `V` is now `vP`(`I` + 1)

elseif ((`V` < 0) and (`B` = 1)) then

-- `V` is now `vP`(`I`)

-- **P4** says in this case `vP`(`I` + 1) = `vP`(`I`)

-- Do nothing...

-- `V` is now `vP`(`I` + 1)

endif

-- `V` is now `vP`(`I` + 1)

`I` := (`I` + 1)

-- `V` is now `vP`(`I`)

end while

-- `V` is now `vP`(`I`) and `I` is now `n`, so `V` is `vP`(`n`), or `y`

The handling of the signed value `x` is easier by comparison. Let `W` be (`x` * `V`) - but, `x` may be positive or negative, so checking the sign of `V` in the previous pseudocode does not translate directly to checking the sign of `W`. Rather, it translates to checking the sign of (`W` / `x`). A seperate flag is made to keep track of the sign of what used to be `V`: Let `C` be 0 if ((`W` / `x`) ≥ 0) and 1 otherwise.

`P` := `y`

`I` := `n`

`W` := 0

`C` := 0

-- `W` is now (`x` * `vP`(0)) and `C` is now the sign bit of `vP`(0) and `I` is now 0

while (`I` > 0) do

-- `W` is now (`x` * `vP`(`n` - `I`)) and `C` is now the sign bit of `vP`(`n` - `I`)

`D` := ((`P` >> (`n` - `I`)) & 1)

-- `D` now is `bP`(`n` - `I`)

if ((`C` = 0) and (`D` = 0)) then

-- `W` is now (`x` * `vP`(`n` - `I`)) and `C` is now the sign bit of `vP`(`n` - `I`)

-- **P1** says in this case `vP`(`n` - `I` + 1) = `vP`(`n` - `I`)

-- Do nothing...

-- `W` is now (`x` * `vP`(`n` - `I` + 1)) and `C` is now the sign bit of `vP`(`n` - `I` + 1)

elseif ((`C` = 0) and (`D` = 1)) then

-- `W` is now (`x` * `vP`(`n` - `I`)) and `C` is now the sign bit of `vP`(`n` - `I`)

-- **P2** says in this case `vP`(`n` - `I` + 1) = (`vP`(`n` - `I`) - (2 ^ `I`))

`W` := (`W` - (`x` << (`n` - `I`)))

-- `vP`(`n` - `I`) was nonnegative, but `vP`(`n` - `I` + 1) is negative

`C` := 1

-- `W` is now (`x` * `vP`(`n` - `I` + 1)) and `C` is now the sign bit of `vP`(`n` - `I` + 1)

elseif ((`C` = 1) and (`D` = 0)) then

-- `W` is now (`x` * `vP`(`n` - `I`)) and `C` is now the sign bit of `vP`(`n` - `I`)

-- **P3** says in this case `vP`(`n` - `I` + 1) = (`vP`(`n` - `I`) + (2 ^ `I`))

`W` := (`W` + (`x` << (`n` - `I`)))

-- `vP`(`n` - `I`) was negative, but `vP`(`n` - `I` + 1) is nonnegative

`C` := 0

-- `W` is now (`x` * `vP`(`n` - `I` + 1)) and `C` is now the sign bit of `vP`(`n` - `I` + 1)

elseif ((`C` = 1) and (`D` = 1)) then

-- `W` is now (`x` * `vP`(`n` - `I`)) and `C` is now the sign bit of `vP`(`n` - `I`)

-- **P4** says in this case `vP`(`n` - `I` + 1) = `vP`(`n` - `I`)

-- Do nothing...

-- `W` is now (`x` * `vP`(`n` - `I` + 1)) and `C` is now the sign bit of `vP`(`n` - `I` + 1)

endif

-- `W` is now (`x` * `vP`(`n` - `I` + 1)) and `C` is now the sign bit of `vP`(`n` - `I` + 1)

`I` := (`I` - 1)

-- `W` is now (`x` * `vP`(`n` - `I`)) and `C` is now the sign bit of `vP`(`n` - `I`)

end while

-- `W` is now (`x` * `vP`(`n` - `I`)) and `I` is now 0, so `W` is (`x` * `vP`(`n`)), or (`x` * `y`)

The final stage to constructing Booth's algorithm is to squeeze all the bits together into as little space as possible. At the beginning of the first iteration of the loop, all `n` bits of `P` need to be kept, and `W` needs to be `m` bits wide. (`m` bits wide is enough to start with because the first iteration through the loop the biggest change that can happen is subtracting `M`, which is itself only `m` bits wide.) After each iteration, one bit (the LSb) can be dropped from `P` and one bit needs to be added to `W`. (Extending the sign when necessary, adding one bit to `W` each iteration will always keep it wide enough for the value it has to store because at the beginning of iteration `i`, `W` will be (`y` + `i`) bits wide, the biggest change that can happen is adding or subtracting (`M` * (2 ^ `i`)) (a (`y` + `i`) bit wide number), so the result will always fit in (`y` + `i` + 1) bits - one bit extra each iteration.)

The pair of registers `B` (`n` bits wide) and `E` (`m` bits wide) are given the shared duty of handling the values of `W` and `P`. At the beginning of the first iteration through the loop, `W` is in `E` and `P` is in `B`. After one iteration, `W` is in all of `E` and in one bit of `B`, and `P` is in `B` except for the one bit taken up by `W`. After two iterations, `W` is in `E` and in two bits of `B`, and `P` is in `B` except it's two bits shorter.

`B` := `y`

`I` := `n`

`E` := 0

carry flag := 0

-- `W` is now (`x` * `vP`(0)) and the carry flag is now the sign bit of `vP`(0) and `I` is now 0

while (`I` > 0) do

-- `W` is now (`x` * `vP`(`n` - `I`)) and the carry flag is now the sign bit of `vP`(`n` - `I`)

`D` := (`B` & 1)

-- `D` now is `bP`(`n` - `I`)

if ((carry flag = 0) and (`D` = 0)) then

-- `W` is now (`x` * `vP`(`n` - `I`)) and the carry flag is now the sign bit of `vP`(`n` - `I`)

-- **P1** says in this case `vP`(`n` - `I` + 1) = `vP`(`n` - `I`)

-- Do nothing...

elseif ((carry flag = 0) and (`D` = 1)) then

-- `W` is now (`x` * `vP`(`n` - `I`)) and the carry flag is now the sign bit of `vP`(`n` - `I`)

-- **P2** says in this case `vP`(`n` - `I` + 1) = (`vP`(`n` - `I`) - (2 ^ `I`))

`E` := (`E` - `x`)

elseif ((carry flag = 1) and (`D` = 0)) then

-- `W` is now (`x` * `vP`(`n` - `I`)) and the carry flag is now the sign bit of `vP`(`n` - `I`)

-- **P3** says in this case `vP`(`n` - `I` + 1) = (`vP`(`n` - `I`) + (2 ^ `I`))

`E` := (`E` + `x`)

elseif ((carry flag = 1) and (`D` = 1)) then

-- `W` is now (`x` * `vP`(`n` - `I`)) and the carry flag is now the sign bit of `vP`(`n` - `I`)

-- **P4** says in this case `vP`(`n` - `I` + 1) = `vP`(`n` - `I`)

-- Do nothing...

endif

Arithmetic shift right E:B

-- `W` is now (`x` * `vP`(`n` - `I` + 1)) and the carry flag is now the sign bit of `vP`(`n` - `I` + 1)

`I` := (`I` - 1)

-- `W` is now (`x` * `vP`(`n` - `I`)) and the carry flag is now the sign bit of `vP`(`n` - `I`)

end while

-- `W` is now (`x` * `vP`(`n` - `I`)) and `I` is now 0, so `W` is (`x` * `vP`(`n`)), or (`x` * `y`)

-- `E`:`B` contains the value `W`

Rolling the registers left the way the algorithm shows keeps the `m` most significant bits of `W` in register `E` - so, taking `x` and multiplying it by (2 ^ `I`) and adding it to or subtracting it from `W` can be accomplished by simply taking `x` and adding it to or subtracting it from the register `E`. Also, the value `C` fortuitiously falls into the carry flag after the appropriate rotation.

The practice of using one register to hold (parts of) two different values is used in other algorithms, such as the division algorithm. The big difference between the multiplication and division algorithms in this respect is that in division, the MSbs of the answer are calculated first so the shifts are to the left, and in multiplication the LSbs of the answer are calculated first so the shifts are to the right.