Compact disc data encoding scheme

When Philips introduced the compact disc in 1982, commercial laser technology wasn't nearly as well developed as it is nowadays. The people at Philips' NatLab research center had to develop various elaborate schemes to be able to read the data off the CD surface with one of those crude devices. Some of those schemes are still in use today for reasons of backward compatibility, even though they limit the storage capacity of a CD dramatically. The most elaborate space-wasting hack they had to pull is eight-to-fourteen modulation (EFM). It simply means that every byte on a CD gets stored as a 14-bit word. In other words, your CDs physically carry at least three quarters more bits than the data you get to store on them!

Why is this necessary? Actually, it isn't, but back then it was. The data on a compact disc surface are represented by microscopic indentations, known as pits. The absence of a pit is known as a land in CD jargon. Contrary to a common misconception, a land or a pit does not itself represent a bit on the CD. Instead, a transition between a land and a pit represents a one, and no transition (two consecutive lands or pits) represents a zero. They had to do that because a transition between two states during one clock cycle was far easier to detect than the value itself.
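The transition rule is easy to try out. Below is a minimal sketch in Python, not anything a real player runs: the function name, the L/P notation and the choice to flip the state at the start of a bit cell are all assumptions made for illustration.

```python
def channel_bits_to_surface(bits, start="L"):
    """Turn a string of bits into a string of lands (L) and pits (P).

    A '1' flips the surface state, a '0' leaves it alone.
    Assumption: the flip happens at the start of the bit cell.
    """
    state = start
    surface = []
    for bit in bits:
        if bit == "1":
            # A one is a transition: land becomes pit, pit becomes land.
            state = "P" if state == "L" else "L"
        surface.append(state)
    return "".join(surface)


print(channel_bits_to_surface("0100100001"))  # LPPPLLLLLP
```

Note that the same bits produce the mirror image (lands and pits swapped) if you start on a pit. That doesn't matter, since only the transitions carry information.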

This representation of data presented a new problem. What if you wanted to encode a byte with two consecutive ones? That would mean a transition from land to pit and straight back from pit to land (or vice versa) in an incredibly small amount of space, forming either a very small land or a very small pit. Either way, this would have been expensive to read and to manufacture, because it required greater precision from the equipment.
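To see how small such a land or pit gets, you can measure the distance between neighbouring ones, since every one is a transition. This is only a rough sketch; shortest_feature is a made-up helper name, and lengths are counted in bit cells rather than in any physical unit.

```python
def shortest_feature(bits):
    """Length, in bit cells, of the shortest land or pit enclosed by two
    transitions, or None if the bits contain fewer than two ones."""
    ones = [i for i, bit in enumerate(bits) if bit == "1"]
    # Each pair of neighbouring ones encloses exactly one land or pit.
    lengths = [right - left for left, right in zip(ones, ones[1:])]
    return min(lengths) if lengths else None


# A raw byte with two consecutive ones: a feature just one cell long.
print(shortest_feature(format(0b00110000, "08b")))  # 1

# A 14-bit pattern with the ones spread out: nothing shorter than three cells.
print(shortest_feature("01001000100000"))  # 3
```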

What they did instead was to replace each byte, before putting it on the CD, with a code in which the ones are kept well apart. Merely avoiding consecutive ones is not quite enough: in the scheme used on CDs, any two ones must have at least two zeros between them. When a player reads the data back, it has to decode it again, so each byte value is assigned its own 14-bit word. Anything less than 14 bits won't do, because there aren't enough words that satisfy the rule to cover all 256 byte values. You can count them if you like, but you can also prove it quite easily. I will leave that as an exercise to the reader, though.
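For readers who would rather count than prove, here is a brute-force sketch in Python. It assumes the rule actually used on CDs: at least two zeros between any two ones, and no more than ten zeros in a row. The function names and parameters are invented for this illustration.

```python
def is_valid(word, min_zeros, max_zeros=None):
    """Check one candidate word, given as a string of '0' and '1'."""
    runs = word.split("1")  # runs of zeros, including leading and trailing ones
    # Zeros enclosed by two ones must come in runs of at least min_zeros.
    if any(len(run) < min_zeros for run in runs[1:-1]):
        return False
    # Optionally, no run of zeros anywhere may exceed max_zeros.
    if max_zeros is not None and any(len(run) > max_zeros for run in runs):
        return False
    return True


def count_codewords(n, min_zeros, max_zeros=None):
    """Count the n-bit words that satisfy the rule, by trying all of them."""
    return sum(
        is_valid(format(value, "0{}b".format(n)), min_zeros, max_zeros)
        for value in range(2 ** n)
    )


print(count_codewords(13, 2))      # 189, too few for 256 byte values
print(count_codewords(14, 2))      # 277, enough
print(count_codewords(14, 2, 10))  # 267, still enough with the ten-zero limit
print(count_codewords(12, 1))      # 377
```

The last line shows why the two-zero rule matters for the argument: if the only requirement were "no consecutive ones", 12 bits would already be plenty.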