Message Authentication Codes

'Good morning! My account number is 212-13670004, BBVA Bank. Nice working with you! --dogganos'
HMAC_SHA1:e2806fdefff275c719d63df99ff39a435bc0254d

One form of Message Authentication Codes (MACs) is the Hashed Message Authentication Codes (HMACs) which use one-way hash functions combined with a secret shared key in order for the recipient of a message to be sure that what he received is intact and, indeed, from the sender implied. It must be noted from the beginning that it is not a way to encipher messages, so that they cannot be read by people who do not possess the secret key. Although HMAC can be used in conjunction with cryptography in order to reassure the recipient of a message that the message was indeed sent by the sender it claims, arrived intact (digital signatures) and, additionaly, make the message unreadable to others, HMAC per se leaves the message as is, only adding the HMAC digest at the end of the message or in a separate file accompanying the message.

So, a text message sent for example over the internet using HMAC, will remain readable to anybody who snoops the data packets, but the recipient will be sure about the identity of the sender and in addition, that the message arrived exactly as the sender sent it (integrity check).

Let's examine the example provided above.

This message is transmitted (via e-mail for example) to my employer so that he can put my payment in my bank account. The message is not encrypted because there is no need to, since if somebody intercepting the message also wishes to put some money in my bank account, he is more than welcome. But it is obvious that my employer must receive the correct, unaltered account number, since an attacker would prefer his account number to be credited instead of mine.

Since I share my secret key (which for this example is 'DoGgAnOs') only with my employer, two things happen: first, only the employer can verify that this message actually came from me, and second, that the message is in no way altered. These two things work because of the properties of cryptographically strong hash functions , with the addition that the attacker has no knowledge whatsoever of the secret key.

The Design of HMAC

We can now see how HMAC really works. First of all, the input to the HMAC algorithm is a message of arbitrary length (referred to as 'M'), and a secret key (referred to as 'K'). (For extra information about the keys, see 'The Keys in HMAC' section). The output is a n-bit (same 'n' as that of the hash function used, e.g. 128 for MD5, 160 for SHA-1) digest.

Hash functions process data in blocks, usually 64 bytes each, and, consequently, so does the HMAC algorithm. But for the sake of generality, we will refer to the byte size of each data-block as 'B'. We will also refer to the byte size of the hash function’s output (which varies between differrent hash functions) as 'L'. For example L is 16 bytes for MD5 and 20 bytes for SHA-1. L is also the size in bytes of HMAC's output digest.

So, the HMAC mechanism is as follows: We first define two fixed byte pads (ipad and opad) of size B, each repeating the same byte value:

ipad = the byte 0x36 repeated B times
opad = the byte 0x5C repeated B times

To compute the HMAC of our message 'M', using our secret key 'K' we perform ('H' is the hash function selected, e.g. SHA-1):

H( K XOR opad, H( K XOR ipad, M))
('i' and 'o' in ipad and opad respectively, are mnemonics for 'inner' and 'outer')

As we can see, the HMAC calculation is performed in four discrete steps:

1. we append zeros (if needed) to the key K so that it ends up with a size of B bytes.
2. we XOR (bitwise exclusive-OR) the key obtained from the previous step, with each of the ipad and opad, so that we get two new pads, say, k_ipad and k_opad.
3. we append our message M to the end of k_ipad and we hash the result.
4. we append to k_opad the result of the previous step and we hash the result.

What we get from step 4 is the HMAC digest.

The Design Idea of HMAC (or Why HMAC works)

OK, you'll say. Why all this fuss with these weird ipads and opads and XORs etc?

At first sight, the HMAC computation algorithm may seem a little bit redundant. One could argue that instead of using the complicated scheme H( K XOR opad, H( K XOR ipad, M)), it would be enough to use H( K, M ). That is, to apply the hash function to the concatenation of the shared secret key and the message. Now it is still impossible for an adversary to forge the message. But actually there is a very serious problem with that method. Someone can always add new information at the end of the original message, and still be able to compute a valid MAC, without, of cource knowing the key. Another method which thwarts this attack would be to add the message length at the beginning of the message but this method has already been successfully attacked. The same goes for the method of adding the secret key at the end of the message.

A idea that is better than all of the above is to put the key both at the beginning and end of message: H( K, M, K ). Even better with different keys: H( K1, M, K2 ). But even for these methods, various cryptanalysts have published papers indicating vulnerabilities. A method which uses a two-time application of the hash function on combinations of the key with the message is the following: H( K, H( K, M)). This method is secure against the so-called extension attacks that the previous ones are vulneruble to, because the outer application of the hash function avoids the exposure of the intermediate result of the inner application of the hash function.

This way we have now approached a construction that is actually used in the HMAC algorithm and is considered secure against all currently known cryptanalytic attacks. As we saw above, the real algorithm used to calculate HMAC is: H( K XOR opad, H( K XOR ipad, M)). Now, the reason that the "two keys" (K1 = K XOR opad, K2 = K XOR ipad) are derived pseudo-randomly by XOR-ing them with ipad and opad, is that an attacker trying to learn about possible dependencies between K1 and K2, does not get to see directly the output of the pseudorandom function on any input.

One last word about the specific values selected for ipad and opad. So, the particular values were selected in order to have a simple representation (to simplify the function's specification and minimize the potential of implementation errors), and to provide a high Hamming distance between the pads. The latter is intented to exploit the mixing properties attributed to the compression function underlying the hash schemes in use. These properties are important in order to provide computational independence between the two derived keys.

The Keys in HMAC

Now we should mention something about the keys. It is well known that the security level of any system, is that of it's weakest part. So, to achieve maximum security (i.e. the theoretical maximum) we must be sure, first to use a secret key that is random and second, a key that has a minimum size of L bytes (like the corresponding hash function's output, which in case of SHA-1 is 160 bits, in case of MD5 is 128 bits). The key used in the HMAC algorithm can have any size lower than B bytes (the block size of the hash-function, which is 512 bits both for SHA-1 and MD5), and HMAC also defines that in case of a larger key, it must be first hashed using H (resulting to a new, L-byte key).

Practically, really random keys (produced with a cryptographically strong pseudo-random number generator) are very rarely used due to their increased management overhead (they have to be stored somehow, which introduces more security issues). Most people use the so-called 'passphrases' as keys, which are easier to remember than a random key.

Source:
A recent presentation of mine for the lesson in my university: 'Cryptography and Systems Security'.

MD5 hash function	luser	ANSI X9.19	Text Message
XOR	Cryptology	CBC MAC	pseudo-random function
OMAC	OFB	RMAC	Standard Cryptographic Algorithm Naming
hamming distance	RFC 1321	SHA-1	digest
mechanism	pseudo-random	hash	Coding for E2: Finding the Code
HMAC	passphrase	secure hash	Sender