It is widely agreed that each unique language should be assigned a unique and definitive code, so that when a language is referred to by different names, everybody knows what language is under discussion.

It is further agreed that this should be a two or three letter case insensitive alphabetic code, and EVERYBODY agrees that it needs to be an official standard.

So far, I have unearthed six such standards. I'm certain there are others.


MARC (MAchine Readable Cataloging) is an old bibliographic standard. The MARC Language Code has 434 three letter codes assigned.
ISO 639-1 is a two letter standard with 139 codes assigned.
ISO 639-2/B is a three letter standard assigning 464 different codes for bibliographic applications, which is almost the same as
ISO 639-2/T, a three letter standard for terminology applications, which differs from ISO 639-2/B in only 23 of the 464 assigned codes.
RFC 3066 (Tags for the Identification of Languages) specifies a multi-part tag. The first part is:
  • the ISO 639-1 code, if it exists
  • else the ISO 639-2/T code, if it exists
  • else "i" (for IANA-defined registrations) or "x" (for private use).
The second part, if two characters, is the ISO 3166 two character country code; otherwise it is a string specifically registered with the IANA.
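
The order of precedence for the first part can be sketched in a few lines of Python. This is only a rough illustration, not a conforming RFC 3066 implementation: the two lookup tables are tiny hand-picked samples, the function names are my own, and the fallback always picks "x" because an "i" tag would need an actual IANA registration.

```python
# Tiny illustrative tables (assumed for this sketch); a real implementation
# would load the complete ISO 639-1 and ISO 639-2/T code lists.
ISO_639_1 = {"English": "en", "German": "de", "French": "fr"}
ISO_639_2T = {"English": "eng", "German": "deu", "French": "fra", "Hawaiian": "haw"}


def primary_subtag(language_name):
    """Choose the first part of the tag, in the order of precedence listed above."""
    if language_name in ISO_639_1:
        return ISO_639_1[language_name]
    if language_name in ISO_639_2T:
        return ISO_639_2T[language_name]
    # No ISO code: fall back to private use. An "i" tag would instead need
    # an actual IANA registration, which this sketch does not model.
    return "x"


def make_tag(language_name, second_part=None):
    """Join the first part and an optional second part with a hyphen."""
    tag = primary_subtag(language_name)
    if second_part is None:
        return tag
    if len(second_part) == 2:
        # Two characters: treated as an ISO 3166 country code, shown in
        # uppercase by convention (the tag itself is case insensitive).
        second_part = second_part.upper()
    return tag + "-" + second_part


print(make_tag("English", "us"))      # en-US
print(make_tag("Hawaiian"))           # haw
print(make_tag("Elvish", "elvish"))   # x-elvish
```

Hawaiian shows the second rule in action: it has no two letter ISO 639-1 code, so the three letter ISO 639-2 code is used instead. "Elvish" is a made-up entry standing in for any language with no ISO code at all.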
Ethnologue has its own three letter encoding with 7,198 codes assigned, which already uses up about 41% of the 17,576 maximum you can have with three letters. Ethnologue is the only one that uses uppercase letters for the language code.
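
For anyone who wants to check the arithmetic on how much room a fixed-length code leaves, here is a small sketch. It assumes a plain 26 letter alphabet and uses the assigned counts quoted above; the `capacity` helper is my own name, not part of any standard.

```python
def capacity(letters, alphabet_size=26):
    """Number of distinct codes of a fixed length over the given alphabet."""
    return alphabet_size ** letters


# Assigned counts as quoted in the text above: (code length, codes assigned).
schemes = {
    "ISO 639-1": (2, 139),
    "Ethnologue": (3, 7198),
}

for name, (letters, assigned) in schemes.items():
    total = capacity(letters)
    print(f"{name}: {assigned:,} of {total:,} codes used ({assigned / total:.1%})")

# ISO 639-1: 139 of 676 codes used (20.6%)
# Ethnologue: 7,198 of 17,576 codes used (41.0%)
```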

Oddly enough, Linguasphere, another very elaborate taxonomy of the world's languages, doesn't use language codes. It has codes 00 to 99 for language families, but that's it.


Comparing the number of codes in each scheme isn't the whole story. For example, Ethnologue has a different code for each of 103 different sign languages, whereas ISO 639-2, and therefore RFC 3066, has only the single code sgn for sign language. However, RFC 3066 appends the two letter country code to get the same effect.
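
The sgn-plus-country convention is simple enough to sketch in code. The function below is a hypothetical helper of my own, and it only checks the shape of the country code, not whether it is a real ISO 3166 entry or whether the resulting tag has actually been registered. As far as I know, sgn-US is the tag used for American Sign Language.

```python
def sign_language_tag(country_code):
    """Build an RFC 3066 style tag from "sgn" plus a two letter country code."""
    is_two_letters = (
        len(country_code) == 2
        and country_code.isascii()
        and country_code.isalpha()
    )
    if not is_two_letters:
        raise ValueError("expected a two letter ISO 3166 country code")
    # Country codes are shown in uppercase by convention.
    return "sgn-" + country_code.upper()


print(sign_language_tag("us"))  # sgn-US
print(sign_language_tag("GB"))  # sgn-GB
```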

Be careful not to confuse the ISO 639 two and three letter language codes, with ISO 3166 two and three letter country codes.
