An Introduction to Data Compression, MPEG Video, and DVD Technology
At the heart of any digital video transmission or storage system, one will find compression algorithms. Because transmission bandwidth and storage space are scarce resources, it is important to fit the maximum amount of data possible into those time and space constraints. This is done by removing information that is redundant -- or appears to a human viewer to be redundant. The main goal of this paper is to address the principles behind the most common video compression and storage system in use today -- the Digital Versatile Disc, or DVD.
Since the birth of the digital computer, computing technology has advanced steadily toward greater processing speed and greater storage capacity, and with each advance comes the ability to process and store new kinds of media. The first computers were programmed, and reported their results, purely in binary; later machines worked with decimal numbers, and after that with alphanumeric text.
After that came the ability to display on screen vector graphics (stored by the computer as coordinates for line segments) and raster graphics (each dot on the screen, called a pixel, stored as a value representing its color). Storing and processing a screen's worth of graphics data represented a huge burden on early computers, but as technology advanced, so did the amount of data that could be stored (leading to higher-quality images being displayed), as did the ways in which images could be manipulated (leading to the first software programs for editing computer graphics).
Computer technology continued to advance, and with it the human interest in storing new media in computer memory to exploit the possibilities of computer processing. After still images, video became the next logical step as computing moved into the 1980s. However, storage capacity lagged behind the processing power needed to display still and moving pictures, and when it came to transmitting such data over electronic connections (such as the burgeoning Internet), the prospect of working with raw image or video data became daunting. The storage needed for a screen's worth of image data was an order of magnitude above what was actually available at the time. There needed to be some way of conveying the necessary information while reducing the amount of data actually stored in memory or transmitted to other systems. Luckily, computer science had a solution.
Compressing data means representing a piece of information in a smaller amount of space (space being bytes in memory) than it originally took up. There are two basic ways to compress data. One way is by removing redundancies in the data. For instance, I could make a table with a numbered row for each word that appears more than, say, twice in this document, and then replace each of those words in this document with its number from the table. Each time I used the word 'the', it might be replaced with the number '1', a savings of two bytes. I could send this new document along with the lookup table and instructions on how to replace the numbers with their original words to someone, and it would represent a very crude form of compression.
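This word-table scheme can be sketched in a few lines of Python. The function names and the '#n' placeholder tokens below are illustrative inventions, not part of any real compression format, and the sketch assumes no original word begins with '#':

```python
from collections import Counter

def compress(text):
    """Replace each word appearing more than twice with a '#n' token."""
    words = text.split()
    counts = Counter(words)
    table = {w: f"#{i}" for i, w in enumerate(
        sorted(w for w in counts if counts[w] > 2), start=1)}
    encoded = " ".join(table.get(w, w) for w in words)
    return encoded, table

def decompress(encoded, table):
    """Invert the table to restore the original words exactly."""
    reverse = {token: word for word, token in table.items()}
    return " ".join(reverse.get(w, w) for w in encoded.split())

text = "the cat sat on the mat and the dog sat by the cat"
encoded, table = compress(text)
assert decompress(encoded, table) == text  # nothing is lost
```

Sending the encoded text together with the lookup table is enough to reconstruct the original perfectly, which is exactly what makes this kind of compression lossless.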
A second way to compress data is to strt rmving pcs of info whch dn't dtract frm th ovrall legiblty of th documnt. This relies on the human brain to fill in the missing pieces of information. Clearly, this sort of compression would grow tiring if used on text, and would be disastrous if used on instructions for a computer (computers at this time having no powers of creativity or abstract thought). However, there are some media for which this type of compression is acceptable, namely, sound and images.
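The vowel-stripping trick above is easy to mechanize. This toy Python function is a made-up illustration, not any real codec, and unlike the previous scheme there is no function that can restore what it throws away:

```python
def lossy_squeeze(text):
    """Drop interior vowels from longer words; readers fill them back in."""
    def squeeze(word):
        if len(word) <= 3:
            return word  # short words become unreadable if squeezed
        inner = "".join(ch for ch in word[1:-1] if ch.lower() not in "aeiou")
        return word[0] + inner + word[-1]
    return " ".join(squeeze(w) for w in text.split())

assert lossy_squeeze("start removing pieces") == "strt rmvng pcs"
```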
The first type of compression is known as lossless compression, as no information is lost between compression and decompression. The second form is known as lossy compression, because some data is deemed redundant to the human viewer and is discarded. It is the second form of compression that is used in DVD videos, specifically a standard of compression known as MPEG.
The MPEG Video Compression Standard
There are several qualities needed for any video compression method:
- It must allow random access. A viewer must be able to view any section of the video without having to watch all the preceding video.
- It must allow fast-forward and reverse searches, to preserve the features present in other film and video technologies, as well as reverse playback.
- The audio and video must be synchronized to a single clock, so that the audio always matches up with the video.
- It must be robust in the face of errors. If frames are lost due to damaged storage media or gaps in digital transmission, these dropped frames must be handled gracefully.
- The format must be flexible, and handle video sizes from a small fraction of a computer screen to the entire size of a high-definition television.
- To top it all off, the encoding and decoding process must be done quickly, so that there is little lag noticeable to the viewer!
MPEG compression meets all these standards, and provides enough reduction in size to allow transmission of video over the Internet, as well as storage of video on limited storage media.
How does MPEG video compression work? It relies on two types of redundancy: temporal redundancy and spatial redundancy. Spatial redundancy is redundant information in a single frame. Temporal redundancy is redundant information spanning multiple frames.
Reducing Temporal Redundancies
To look for redundancies across frames in the video, MPEG video frames are categorized into three types. Intrapictures are the least compressed, and are used as reference frames for random access. Predicted pictures can only be decoded with knowledge of the previous frame's contents, and are more compressed than intrapictures. Bidirectional pictures refer to both the frame before and the frame after, and therefore can never be accessed directly: the decoder must have the other two frames in memory as well.
One technique with which redundancies can be found is motion compensation. This algorithm assumes that information in the current frame looks much like information in the previous frame, perhaps shifted slightly within the bounds of the frame. The MPEG standard examines 16x16-pixel sections of each frame, called macroblocks, and records the movement of each block. If a frame is compressed using information from both the previous frame and the next frame, it is said to be interpolated.
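A toy motion-compensation search can make this concrete. The sketch below uses illustrative names, a small 4x4 block, and a brute-force search; real encoders use 16x16 macroblocks and much faster search strategies. It finds, for one block of the current frame, the displacement into the previous frame with the lowest sum of absolute differences (SAD); only that vector, plus any residual difference, needs to be stored:

```python
BLOCK = 4  # toy block size; MPEG macroblocks are 16x16

def sad(prev, cur, bx, by, dx, dy):
    """Sum of absolute differences between the current frame's block at
    (bx, by) and the previous frame's block displaced by (dx, dy)."""
    return sum(abs(cur[by + y][bx + x] - prev[by + y + dy][bx + x + dx])
               for y in range(BLOCK) for x in range(BLOCK))

def best_motion_vector(prev, cur, bx, by, search=2):
    """Exhaustively test every offset in a small window; return the
    (dx, dy) displacement with the lowest SAD."""
    h, w = len(prev), len(prev[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            if (0 <= bx + dx and bx + dx + BLOCK <= w and
                    0 <= by + dy and by + dy + BLOCK <= h):
                cost = sad(prev, cur, bx, by, dx, dy)
                if best is None or cost < best[0]:
                    best = (cost, (dx, dy))
    return best[1]

# Previous frame: a bright square covering columns 1-4; current frame:
# the same square shifted right two pixels, to columns 3-6.
prev = [[9 if 1 <= x <= 4 and 1 <= y <= 4 else 0 for x in range(8)]
        for y in range(8)]
cur = [[9 if 3 <= x <= 6 and 1 <= y <= 4 else 0 for x in range(8)]
       for y in range(8)]
assert best_motion_vector(prev, cur, bx=3, by=1) == (-2, 0)
```

The block in the current frame is found two pixels to the left in the previous frame, so the encoder can store the vector (-2, 0) instead of the block's pixels.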
Reducing Spatial Redundancies
MPEG's principal method for reducing spatial redundancies (redundancies in a single frame) is the discrete cosine transform, or DCT. The precise mathematics behind this function are beyond the scope of this paper, but in essence the DCT treats the image data as a compound wave and decomposes that wave into its component frequencies. It has been found that by throwing away the higher frequencies, much data can be removed without significant loss in image quality.
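As a sketch of the idea (one-dimensional for clarity; MPEG applies a two-dimensional DCT to 8x8 pixel blocks), the Python code below transforms eight brightness samples into eight frequency coefficients, discards the four highest frequencies, and inverts the transform. The reconstruction stays close to the original even though half the coefficients are gone:

```python
import math

N = 8

def dct(samples):
    """Orthonormal 8-point DCT-II: samples -> frequency coefficients."""
    out = []
    for k in range(N):
        s = sum(samples[n] * math.cos(math.pi * (n + 0.5) * k / N)
                for n in range(N))
        scale = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
        out.append(scale * s)
    return out

def idct(coeffs):
    """Inverse transform (DCT-III): coefficients -> samples."""
    out = []
    for n in range(N):
        s = coeffs[0] * math.sqrt(1 / N)
        s += sum(coeffs[k] * math.sqrt(2 / N) *
                 math.cos(math.pi * (n + 0.5) * k / N) for k in range(1, N))
        out.append(s)
    return out

row = [52, 55, 61, 66, 70, 61, 64, 73]  # one row of pixel brightness
coeffs = dct(row)
coeffs[4:] = [0.0] * 4                  # throw away the high frequencies
approx = idct(coeffs)
rmse = (sum((a - b) ** 2 for a, b in zip(row, approx)) / N) ** 0.5
assert rmse < 7  # the reconstructed row changes only slightly
```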
Putting It All Together
The MPEG frames are written to the storage medium as streams of bits representing the MPEG data. Each key intrapicture frame heads a group of no more than 6 predicted or bidirectional pictures, about 1/5th of a second's worth of video. That way random access to the video can take place, down to the nearest 1/5th second. Also, if data for one frame were lost, no more than 1/5th of a second would be damaged before another intrapicture came along to restart the decoding. The bits can thus be safely transmitted over an electronic connection, or written to a physical storage device, such as a DVD. The DVD medium encodes each bit as a pit or land (raised area) in the aluminum or gold reflective layer of the disc, following a circular path. DVD players provide further data security through error-correction codes, which let the player reconstruct bits that are unreadable on the disc. Since a single frame will typically contain thousands of bits, quite a few can be corrupted before the image quality suffers noticeably.
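The random-access property can be sketched with a hypothetical stream layout. The 'IBBPBBP' pattern and the function name below are illustrative, and the sketch ignores the fact that bidirectional pictures also need the following reference frame: to show an arbitrary frame, a decoder rewinds to the most recent intrapicture and decodes forward from there.

```python
def decode_start(frames, target):
    """Index of the intrapicture where decoding must begin in order
    to display frame number `target`."""
    return frames.rfind("I", 0, target + 1)

# One intrapicture followed by six predicted/bidirectional pictures.
stream = "IBBPBBP" * 4
assert decode_start(stream, 10) == 7
assert decode_start(stream, 0) == 0
# Worst case: six extra frames decoded before the requested one appears.
assert max(t - decode_start(stream, t) for t in range(len(stream))) == 6
```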
Without video compression, working with digital video would remain nearly impossible to this day. The sheer amount of data involved is enormous even by today's standards, so compression techniques that remove redundant data remain a necessity.