Montgomery multiplication

(Multiplication algorithms:)

What it does

The Montgomery multiplication algorithm, described in [1], is a fast method for computing

x × y mod M,

for large x, y and M (typically up to a few thousands of bits), when M is odd. Many important algorithms of asymmetric cryptography -- Diffie-Hellman algorithm is probably the most important example -- require many such operations, making Montgomery multiplication an important tool. Montgomery observed that, while multiplication is not too expensive, modular reduction is. So a variant of the plain old multiplication algorithm is used, but the modulo operation is eliminated totally.

Where it does it

In what follows, assume M is n bits long (and think e.g. n=1536). Compute

r = 2^n mod M = 2^n-M.

Additional required constants are r^-1 (the inverse of r) and r² -- both modulo M. If changing M is an infrequent operation (and for Diffie-Hellman, it is) then this precomputation is essentially free.

Montgomery multiplication defines a new representation of large numbers modulo M

a ⇒ R(a) = a × r mod M,

as well as a new multiplication operation

a ⊗ b = a × b × r^-1 mod M.

The neat property of all this is that

R(a × b) = a × a × r mod M = R(a) × R(b) × r = R(a) ⊗ R(b)

So R is a weird isomorphism of (Z/MZ)*, if you us the operator ⊗. If the preceding sentence means nothing to you, ignore it. It's just saying that ⊗ multiplies in the R-world the same way that × multiplies in the "real" modulo M world.

If we're going to be computing a sequence of multiplications, we don't need to convert to and from the R-world all the time, either. For instance, if computing g^x mod M (the basic operation for Diffie-Hellman), we just need to compute R(g) (which is a precomputed constant for computing a private key), then repeatedly use Montgomery multiplications (up to 2n-1 times), and convert the end result back to the real world.

Even nicer, R can be expressed in terms of ⊗ on bit-strings, using a precomputed constant. We have that

x ⊗ r² = x × r² × r^-1 mod M = x × r mod M = R(x),

and

R(x) ⊗ 1 = x × r ×1 × r^-1 = x,

so conversion to and from the R-world is just another application of our ⊗ operation.

A full Diffie-Hellman exchange is at most 2n ⊗ operations!

How it does it

So the trick is to show how to compute R(⋅) and ⊗ rapidly -- without recourse to modulo operations.

Well, the elementary algorithm for multiplying X=(x_n-1...x₀) by Y=(y_n-1...y₀) is just

Z = 0;
for i in 0..n-1:
  if (x_i is set) then Z = Z + (Y << i);
done;
output: X×Y = Z.

We want to perform the above multiplication, simultaneously dividing by 2ⁿ. Since there are n steps, we divide by 2 at each step.

And here's the big deal. Dividing Z by 2 is easy -- if Z is even: just right shift. It even works when computing modulo M. But if 0 ≤ Z < M is odd, it is true that Z / 2 mod M = (Z+M) / 2 (where the RHS is computed as normal base-2 arithmetic). And we can compute this last division using a shift.

So the algorithm is just

Z = 0;
for i in 0..n-1:
  if (x_i is set) then Z = Z + Y;
  if (Z is odd) then Z = Z + M;
  Z = Z >> 1;
done;
if (Z ≥ M) then Z = Z - M;
output: X⊗Y = Z

It gets even better...

Fewer branches are required in the loop than might appear. In a circuit, the first "if" operation can be performed directly: add x_i ⋅ Y (an AND gate will do) instead of the first if. The algorithm is particularly suitable to storing Z in a redundant notation: "carry save", where carry bits are stored separately from Z. And no large operands need to be kept lying around -- n bits each is enough.

The final reduction after the loop (if (Z ≥ M) then Z = Z - M;) is an annoyance: it's another branch delay, right at the end. Luckily, if a sequence of multiplications is required, it need only be performed after the sequence ends (the proof of this is based on showing that if one input is bounded by 2M-1 and the other by M, then the output is also bounded by 2M-1) -- this appeared in [2]

References

P. Montgomery, Modular multiplication without trial division, Mathematics of computation, vol. 44, 1985, pp. 519-521.
Colin D. Walter, Montgomery exponentiation needs no final subtractions, Electronics Letters, 35(21):1831--1832, October 1999.

irrational number	Fast 6502 multiplication	forty-two	Convert any number to decimal
Multiplication Is Vexation	Multiplication algorithm	Russian peasant multiplication	Multiplication using the Fast Fourier Transform
public key cryptography	carry save	isomorphism	modulo
Toom-Cook multiplication	Comba multiplication	Karatsuba multiplication	Egyptian multiplication
Multiple precision arithmetic	Barrett Reduction	multiplication table in Perl	Multiplication
Mod	algorithm

(Multiplication algorithms:)

What it does

Where it does it

How it does it

It gets even better...

References

Recommended Reading

About Everything2

User Picks

Editor Picks

New Writeups

Login
Password

Montgomery multiplication

(Multiplication algorithms:)

What it does

Where it does it

How it does it

It gets even better...

References

Sign In

Recommended Reading

About Everything2

User Picks

Editor Picks

New Writeups