This is cool. Pure, unadulterated fun!

The requirement here is to take a long series of numbers between zero and 255, eight-digit base two numbers ("bytes"), and convert them into a long series of numbers between 33 and 95. Why start at 33 instead of zero? Because we're mailing the result, so we need printable characters, and to a computer (at least in the 1970s when this was invented), "printable characters" are the numbers from 33 through 126 (32 is the space, and 127 is a control character). We stop at 95 because the numbers 97 through 122 are the lower-case letters, and back in the dark ages there were some network hosts and/or mail systems which converted all letters to capitals. If you sent a message where the distinction between 'A' and 'a' was significant, you were screwed, so people avoided the problem by not using lower-case letters at all. That's no longer much of a concern, but uuencoding has been so widely used for so long that momentum has carried it along.

We've reduced the possible range of values for each byte that we send, from 256 (counting the zero) to 63. Using six bits out of each byte instead of all eight, you can express 64 different values; that's one more than the 63 we have room for, but we'll deal with the odd one out shortly. What this means now is that each byte we send can carry only six bits of the original, so we've got two bits in each source byte which we've got to put somewhere. We'll need them later, so we can't just discard them.

The way we do this is by spreading them out; here's an example of some bytes:

00101101 11010010 00100010

That's 24 bits. We can very easily divide up those three eight-bit chunks into four six-bit chunks:

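001011 | 01 1101 | 0010 00 | 100010

I've left the original spaces in place and marked the new six-bit boundaries with bars, in the hope of staying clear on what went where. Read without the old spaces, the four chunks are 001011, 011101, 001000 and 100010.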
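
Here is the same split as a few lines of Python, in case the bit-shuffling is easier to follow that way. It's only a sketch: the function name split_three_bytes is mine, not anything from a real uuencode source.

```python
def split_three_bytes(b0, b1, b2):
    """Split three 8-bit values into four 6-bit values."""
    # Glue the three bytes together into one 24-bit number.
    word = (b0 << 16) | (b1 << 8) | b2
    # Peel off six bits at a time, starting from the high end.
    return [(word >> shift) & 0x3F for shift in (18, 12, 6, 0)]


chunks = split_three_bytes(0b00101101, 0b11010010, 0b00100010)
print([format(c, "06b") for c in chunks])
# prints ['001011', '011101', '001000', '100010']
```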

We can do the above for each three bytes of the source, and end up with a new series of numbers, with four numbers for every three we started with. Of course, we've still got a problem here: For good and sufficient reason, we've decided not to use any values lower than 33. Our new numbers are all in the range zero to 63, so they'll have to change. What we do now is add 32 to each of our new numbers; the zeros will come to 32, which is too low for us, but we'll just arbitrarily make all of those equal to 96 and remember to take that into account on the other end, when we decode it all. Everything we send therefore ends up between 33 and 96.
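
The add-32-and-patch-the-zeros step might look like this in Python. Again a sketch, with a made-up function name (to_printable); the four input values are the six-bit chunks from the example bytes above (001011, 011101, 001000, 100010).

```python
def to_printable(value):
    """Map a 6-bit value (0..63) to its printable character."""
    if not 0 <= value <= 63:
        raise ValueError("expected a six-bit value")
    # Zero would land on 32 (a space), so it is sent as 96 instead.
    return chr(96) if value == 0 else chr(value + 32)


six_bit_values = [0b001011, 0b011101, 0b001000, 0b100010]  # 11, 29, 8, 34
print("".join(to_printable(v) for v in six_bit_values))
# prints +=(B
```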

So now there we are: We've taken our eight-bit crap which mailers would have macerated, and converted it into more constrained crap which will slide unharmed through the gut of any mailer we're aware of. There are a few other things that happen before we're really finished; we cut up our output into lines of 60 characters each (and one odd line at the end), and prefix each line with one more character indicating how many of the original bytes that line holds (45 for a full line), encoded the same way as everything else. Never mind that, it's just an implementation detail.
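
For the curious, here's a rough Python sketch of the whole encoding side, line-cutting included. The names (encode_value, uuencode_body) are invented for the example. It also makes one assumption the text above doesn't settle: when the last group is short of three bytes, it gets padded out with zero bytes, and the length character tells the decoder how much to throw away. It leaves out whatever header and trailer lines the real tool wraps around the body.

```python
def encode_value(value):
    """Map a 6-bit value to its character; zero becomes 96 rather than 32."""
    return chr(96) if value == 0 else chr(value + 32)


def uuencode_body(data, bytes_per_line=45):
    """Return the encoded lines for data (a bytes object)."""
    lines = []
    for start in range(0, len(data), bytes_per_line):
        chunk = data[start:start + bytes_per_line]
        # ASSUMPTION: pad a short final group with zero bytes.
        padded = chunk + b"\0" * (-len(chunk) % 3)
        # First character: how many real bytes this line carries.
        out = [encode_value(len(chunk))]
        for i in range(0, len(padded), 3):
            word = int.from_bytes(padded[i:i + 3], "big")
            out.extend(encode_value((word >> s) & 0x3F) for s in (18, 12, 6, 0))
        lines.append("".join(out))
    return lines


print(uuencode_body(b"\x2d\xd2\x22"))
# prints ['#+=(B']
```

45 bytes in gives 60 characters out, which is where the line length comes from; the length character for a full line is 45 + 32 = 77, a capital 'M'.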

When our mailer-proof crap reaches its destination, the receiver uses the uudecode algorithm to restore the original crap. There's nothing exciting there; uudecode is just the same steps in reverse.
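
"The same steps in reverse" for a single line could be sketched like so. The function names are hypothetical, and the sketch assumes a well-formed line: one length character followed by a multiple of four encoded characters.

```python
def decode_value(ch):
    """Undo the add-32 step; masking to six bits turns 96 back into zero."""
    return (ord(ch) - 32) & 0x3F


def uudecode_line(line):
    """Decode one encoded line back into the bytes it carries."""
    count = decode_value(line[0])   # number of real bytes on this line
    body = line[1:]
    out = bytearray()
    for i in range(0, len(body), 4):
        v = [decode_value(c) for c in body[i:i + 4]]
        # Reassemble four 6-bit values into one 24-bit number.
        word = (v[0] << 18) | (v[1] << 12) | (v[2] << 6) | v[3]
        out += word.to_bytes(3, "big")
    # Drop anything past the stated length (padding, if there was any).
    return bytes(out[:count])


print(uudecode_line("#+=(B"))
# prints b'-\xd2"'
```

The printed value is just Python's way of showing the three example bytes 00101101 11010010 00100010.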

If anybody has any suggestions to make this writeup more clear (or, god help me, more accurate), please let me know. Thanks.


(I'm checking the GNU uuencode source for their approach to dealing with "orphan" bytes if filesize mod 3 isn't zero; I haven't written my own uuencoder, otherwise I'd've found out the hard way and I'd already know :)