While the theorem might seem esoterically mathematical, it's actually simple common sense (isn't most of science just that?).
To understand it, we need to know what sampling means. An audio signal from a microphone or other source is a continuous electrical signal whose level at any instant is proportional to the rapid pressure changes of the sound waves. When the signal is stored on a digital medium like a CD-ROM, it is measured at regular intervals (the sampling rate) and the levels of the samples are stored digitally. The more frequently the samples are taken, the more accurately the sound can be reconstructed when played back.
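To make "measured at regular intervals and stored digitally" concrete, here is a minimal sketch in Python. The function names and the CD-style parameters (a 16-bit sample depth) are my own illustrative choices, not anything prescribed by the theorem:

```python
import math

def quantize_16bit(x):
    """Map a sample level in [-1.0, 1.0] to a signed 16-bit integer,
    the sample depth used by CD audio."""
    x = max(-1.0, min(1.0, x))          # clip out-of-range levels
    return int(round(x * 32767))

def record(signal, sample_rate_hz, duration_s):
    """Measure a continuous signal (a function of time in seconds)
    at regular intervals and store the levels as integers."""
    n = int(duration_s * sample_rate_hz)
    return [quantize_16bit(signal(i / sample_rate_hz)) for i in range(n)]

# Example: a 440 Hz tone "recorded" for 10 ms at 44,100 samples per second.
tone = lambda t: math.sin(2 * math.pi * 440 * t)
samples = record(tone, 44100, 0.010)
```

Each entry in `samples` is one stored measurement; raising the sampling rate packs more of them into the same stretch of sound.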
Now, take the simplest sound wave, the kind produced by a tuning fork: a sine wave of a single frequency. To reproduce any semblance of the sine wave, you would have to sample it at least twice per cycle (once at the peak and once at the trough), which would give you a triangular wave of the same frequency. Of course, there is a chance that the samples might fall exactly at the points where the wave crosses zero, but the odds of that are slim. Sampling at a lower rate, however, would always produce a waveform that bears no resemblance, in either shape or frequency, to the original.
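You can see this failure mode, called aliasing, with a few lines of Python. This is an illustrative sketch (the function names are mine): a 7,000 Hz sine sampled at only 8,000 samples per second yields exactly the same stored values (up to sign) as a 1,000 Hz sine, so the original frequency is unrecoverable:

```python
import math

def sample_sine(freq_hz, sample_rate_hz, duration_s):
    """Sample a unit-amplitude sine wave at regular intervals."""
    n = int(duration_s * sample_rate_hz)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate_hz)
            for i in range(n)]

def apparent_freq(freq_hz, sample_rate_hz):
    """Frequency the sampled signal appears to have after aliasing."""
    f = freq_hz % sample_rate_hz
    return min(f, sample_rate_hz - f)

# A 7 kHz tone sampled at 8 kHz masquerades as a 1 kHz tone.
high = sample_sine(7000, 8000, 0.010)
low  = sample_sine(1000, 8000, 0.010)
```

Comparing `high` and `low` sample by sample shows they are mirror images; to a playback device they describe the same 1,000 Hz wave, which is why the samples must be taken at more than twice the highest frequency present.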

Simple! Right?