'Good morning! My account number is 212-13670004, BBVA Bank. Nice
working with you! --dogganos'
One form of Message Authentication Codes (MACs) is
the Hashed Message Authentication Codes (HMACs) which use one-way hash
functions combined with a secret shared key in order for the
recipient of a message to be sure that what he received is intact and,
indeed, from the sender implied. It must be noted from the beginning
that it is not a way to encipher messages, so that they cannot be read
by people who do not possess the secret key. Although HMAC can be used
in conjunction with cryptography in order to reassure the recipient
of a message that the message was indeed sent by the sender it claims,
arrived intact (digital signatures) and, additionaly, make the message
unreadable to others, HMAC per se leaves the message as is, only adding
the HMAC digest at the end of the message or in a separate file
accompanying the message.
So, a text message sent for example over the
internet using HMAC, will remain readable to anybody who snoops the
data packets, but the recipient will be sure about the identity of
the sender and in addition, that the message arrived exactly as the
sender sent it (integrity check).
Let's examine the example provided above.
This message is transmitted (via e-mail for
example) to my employer so that he can put my payment in my bank
account. The message is not encrypted because there is no need to,
since if somebody intercepting the message also wishes to put some money
in my bank account, he is more than welcome. But it is
obvious that my employer must receive the correct, unaltered account
number, since an attacker would prefer his account number to
be credited instead of mine.
Since I share my secret key (which for
this example is 'DoGgAnOs') only with my employer, two things happen:
first, only the employer can verify that this message actually came from
me, and second, that the message is in no way altered.
These two things work because of the properties of cryptographically
strong hash functions , with the addition that the attacker has no
knowledge whatsoever of the secret key.
The Design of HMAC
We can now see how HMAC really works. First of all,
the input to the HMAC algorithm is a message of arbitrary length
(referred to as 'M'), and a secret key (referred to as 'K'). (For extra
information about the keys, see 'The Keys in HMAC' section). The output
is a n-bit (same 'n' as that of the hash function used, e.g. 128 for
MD5, 160 for SHA-1) digest.
Hash functions process data in blocks, usually 64
bytes each, and, consequently, so does the HMAC algorithm. But for the
sake of generality, we will refer to the byte size of each data-block as
'B'. We will also refer to the byte size of the hash function’s output
(which varies between differrent hash functions) as 'L'. For example L
is 16 bytes for MD5 and 20 bytes for SHA-1. L is also the size in bytes
of HMAC's output digest.
So, the HMAC mechanism is as follows: We first
define two fixed byte pads (ipad and opad) of size B, each repeating
the same byte value:
ipad = the byte 0x36 repeated B times
opad = the byte 0x5C repeated B times
To compute the HMAC of our message 'M', using our
secret key 'K' we perform ('H' is the hash function selected, e.g.
H( K XOR opad, H( K XOR ipad, M))
('i' and 'o' in ipad and opad respectively, are mnemonics
for 'inner' and 'outer')
As we can see, the HMAC calculation is performed in
four discrete steps:
1. we append zeros (if needed) to the key K so that it
ends up with a size of B bytes.
2. we XOR (bitwise exclusive-OR) the key obtained from the previous
step, with each of the ipad and opad, so that we get two new pads, say,
k_ipad and k_opad.
3. we append our message M to the end of k_ipad and we hash the
4. we append to k_opad the result of the previous step and we hash
What we get from step 4 is the HMAC digest.
The Design Idea of HMAC (or Why HMAC works)
OK, you'll say. Why all this fuss with these weird ipads and opads
and XORs etc?
At first sight, the HMAC computation algorithm may
seem a little bit redundant. One could argue that instead of using the
complicated scheme H( K XOR opad, H( K XOR ipad, M)), it would
be enough to use H( K, M ). That is, to apply the hash function
to the concatenation of the shared secret key and the message. Now it is
still impossible for an adversary to forge the message. But
actually there is a very serious problem with that method. Someone can
always add new information at the end of the original message, and still
be able to compute a valid MAC, without, of cource knowing the key.
Another method which thwarts this attack would be to add the message
length at the beginning of the message but this method has already
been successfully attacked. The same goes for the method of adding the
secret key at the end of the message.
A idea that is better than all of the above is to
put the key both at the beginning and end of message: H( K, M, K ).
Even better with different keys: H( K1, M, K2 ). But even for
these methods, various cryptanalysts have published papers indicating
vulnerabilities. A method which uses a two-time application of the hash
function on combinations of the key with the message is the following:
H( K, H( K, M)). This method is secure against the so-called extension
attacks that the previous ones are vulneruble to, because the outer
application of the hash function avoids the exposure of the intermediate
result of the inner application of the hash function.
This way we have now approached a construction
that is actually used in the HMAC algorithm and is considered secure
against all currently known cryptanalytic attacks. As we saw above, the
real algorithm used to calculate HMAC is: H( K XOR opad, H( K XOR
ipad, M)). Now, the reason that the "two keys" (K1 = K XOR opad, K2
= K XOR ipad) are derived pseudo-randomly by XOR-ing them with ipad and opad, is
that an attacker trying to learn about possible dependencies between K1
and K2, does not get to see directly the output of the pseudorandom
function on any input.
One last word about the specific values selected for ipad
and opad. So, the particular values were selected in order to
have a simple representation (to simplify the function's specification
and minimize the potential of implementation errors), and to provide
a high Hamming distance between the pads. The latter is intented to
exploit the mixing properties attributed to the compression
function underlying the hash schemes in use. These properties are
important in order to provide computational independence between the
two derived keys.
The Keys in HMAC
Now we should mention something about the keys. It
is well known that the security level of any system, is that of it's
weakest part. So, to achieve maximum security (i.e. the theoretical
maximum) we must be sure, first to use a secret key that is random
and second, a key that has a minimum size of L bytes (like the
corresponding hash function's output, which in case of SHA-1 is 160
bits, in case of MD5 is 128 bits). The key used in the HMAC algorithm
can have any size lower than B bytes (the block size of the
hash-function, which is 512 bits both for SHA-1 and MD5), and HMAC
also defines that in case of a larger key, it must be first hashed using
H (resulting to a new, L-byte key).
Practically, really random keys (produced with a
cryptographically strong pseudo-random number generator) are very
rarely used due to their increased management overhead (they have to
be stored somehow, which introduces more security issues). Most people
use the so-called 'passphrases' as keys, which are easier to remember
than a random key.
A recent presentation of mine for the lesson in my
university: 'Cryptography and Systems Security'.