Lossless compression is any method of compressing information, such as text or images, so that decompressing it restores the original exactly, with no loss of information. Lossless compression works especially well on large blocks of text, which tend to contain many repeated patterns.
The need for compression is obvious, but how does it work? How can you take a chunk of information, compress it, then decompress it and receive the exact same information you had prior to compression?
One common approach is pretty simple: the compression program searches the information for repeated patterns and builds a dictionary containing all of them. It then produces a new string in which every repeated pattern is replaced with a pointer to its dictionary entry. Decompression reverses this by swapping each pointer back for the dictionary entry it points to, which restores the original string exactly.
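The three steps above (find patterns, build a dictionary, substitute pointers) can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how real compressors are built: the function names are hypothetical, only whole words count as patterns, a word is kept if it appears at least twice and is at least four characters long (so the pointer is shorter than the word), and pointers are written as a made-up marker such as <1>, which assumes the input never contains text of that form.

```python
import re
from collections import Counter

def find_patterns(text, min_count=2, min_length=4):
    """Step 1 (hypothetical helper): repeated whole words, in first-seen order."""
    words = re.findall(r"[A-Za-z']+", text)
    counts = Counter(words)
    patterns = []
    for word in words:
        if counts[word] >= min_count and len(word) >= min_length and word not in patterns:
            patterns.append(word)
    return patterns

def compress(text):
    """Steps 2 and 3: number the patterns, then swap each occurrence for a pointer."""
    dictionary = {i: word for i, word in enumerate(find_patterns(text), start=1)}
    encoded = text
    for index, word in dictionary.items():
        # \b stops a pattern from matching inside a longer word.
        encoded = re.sub(r"\b" + re.escape(word) + r"\b", f"<{index}>", encoded)
    return dictionary, encoded

def decompress(dictionary, encoded):
    """Swap every <n> pointer back for dictionary entry n."""
    return re.sub(r"<(\d+)>", lambda m: dictionary[int(m.group(1))], encoded)

text = "Don't know what this site's about? Don't understand what's going on?"
dictionary, encoded = compress(text)
print(dictionary)  # {1: "Don't"}
print(encoded)     # <1> know what this site's about? <1> understand what's going on?
assert decompress(dictionary, encoded) == text
```

Note that "what" and "what's" count as different words in this sketch, so only "Don't" qualifies here; a real compressor is not limited to whole words.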
Let's take an example text block and run it through this compression process.
Don't know what this site's about? Don't understand what's going on? Can't instinctively act on your primitive aggression? Check out the Everything FAQ (or) Everything University to learn more!
As you can see, some of the words are repeated; these repeats are patterns. Let's store the patterns in the dictionary and replace each occurrence in the text block with a pointer to its dictionary entry (here, the entry's number).
Our dictionary contains:
1. Don't
2. know
3. what
4. Everything
5. on
1 2 3 this site's about? 1 understand 3's going 5? Can't instinctively act 5 your primitive aggression? Check out the 4 FAQ (or) 4 University to learn more!
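The hand encoding above can be checked with a short Python sketch that applies the five-entry dictionary to the sample text and then decodes the result. The names encode and decode are hypothetical; the sketch assumes patterns are whole words (matched with regex word boundaries) and that the original text contains no digits of its own, so every run of digits in the encoded string can safely be read as a pointer.

```python
import re

TEXT = ("Don't know what this site's about? Don't understand what's going on? "
        "Can't instinctively act on your primitive aggression? "
        "Check out the Everything FAQ (or) Everything University to learn more!")

# The dictionary from the example, keyed by pointer number.
DICTIONARY = {1: "Don't", 2: "know", 3: "what", 4: "Everything", 5: "on"}

def encode(text, dictionary):
    for index, word in dictionary.items():
        # Whole words only: "on" must not match inside "aggression".
        text = re.sub(r"\b" + re.escape(word) + r"\b", str(index), text)
    return text

def decode(encoded, dictionary):
    # Safe only because the original text contains no digits.
    return re.sub(r"\d+", lambda m: dictionary[int(m.group())], encoded)

encoded = encode(TEXT, DICTIONARY)
print(encoded)
assert decode(encoded, DICTIONARY) == TEXT
```

The printed string is the encoded block shown above, and decoding it returns the sample text character for character.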
We looked only for whole repeated words, but a compressor will look for any repeating string, including parts of words, single letters and spaces. Let's look for better patterns in the text block. We'll mark the space character as _ (underscore).
Our dictionary contains:
1. Don't_
2. know_
3. what
4. _Everything_
5. _on
6. ou
7. ti
8. 's
9. or
123 this site8 ab6t? 1understand 38 going5? Can't ins7nc7vely act5 y6r primi7ve aggression? Check 6t the4FAQ (9)4University to learn m9e!
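A short Python sketch can confirm that the encoded block above follows from the nine-entry dictionary and decodes back exactly. It assumes, for illustration, that entries are applied in dictionary order with plain substring replacement, with _ written as a real space; encode and decode are hypothetical names. Because the pointers are single digits, "123" has to be read as three separate pointers, so the decoder treats each digit on its own, which only works with at most nine entries and a source text containing no digits.

```python
import re

TEXT = ("Don't know what this site's about? Don't understand what's going on? "
        "Can't instinctively act on your primitive aggression? "
        "Check out the Everything FAQ (or) Everything University to learn more!")

# Entries 1 to 9 from the example; "_" is written as a real space here.
DICTIONARY = ["Don't ", "know ", "what", " Everything ", " on",
              "ou", "ti", "'s", "or"]

def encode(text, dictionary):
    # Replace in dictionary order; pointer n stands for entry n (1-based).
    for index, pattern in enumerate(dictionary, start=1):
        text = text.replace(pattern, str(index))
    return text

def decode(encoded, dictionary):
    # Each digit is its own pointer, so "123" expands to entries 1, 2 and 3.
    return re.sub(r"\d", lambda m: dictionary[int(m.group()) - 1], encoded)

encoded = encode(TEXT, DICTIONARY)
print(encoded)
assert decode(encoded, DICTIONARY) == TEXT
```

Decoding is exact because no dictionary entry contains a digit: every replaced span consists purely of original characters, so expanding each pointer puts back exactly what was removed.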
This time we replaced many more patterns and removed many more repeated strings, which is bound to make the text block somewhat smaller. The compression ratio improves as the text block grows bigger, because more occurrences of each pattern are replaced with pointers while the dictionary, which has to be stored alongside the text, stays the same size.
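The effect of a fixed-size dictionary can be checked with a little arithmetic. This Python sketch assumes a simple, hypothetical cost model: the compressed size is the length of the encoded text plus the total length of the dictionary entries, ignoring the few extra bytes a real format would need to separate entries. It repeats the sample text several times, encodes it with the nine-entry dictionary from the example, and prints the ratio of compressed size to original size.

```python
TEXT = ("Don't know what this site's about? Don't understand what's going on? "
        "Can't instinctively act on your primitive aggression? "
        "Check out the Everything FAQ (or) Everything University to learn more!")

# Entries 1 to 9 from the example, with "_" written as a real space.
DICTIONARY = ["Don't ", "know ", "what", " Everything ", " on",
              "ou", "ti", "'s", "or"]

def encode(text, dictionary):
    for index, pattern in enumerate(dictionary, start=1):
        text = text.replace(pattern, str(index))
    return text

def ratio(copies):
    """Compressed size / original size for `copies` repeats of TEXT."""
    original = " ".join([TEXT] * copies)
    encoded = encode(original, DICTIONARY)
    # Hypothetical cost model: the dictionary is stored once, whatever the text length.
    dictionary_size = sum(len(entry) for entry in DICTIONARY)
    return (len(encoded) + dictionary_size) / len(original)

for copies in (1, 10, 100):
    print(copies, round(ratio(copies), 3))
```

The ratio falls as copies are added: the encoded text grows in step with the original, but the dictionary is paid for only once, so its share of the total keeps shrinking.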
Please note that this was a very simple demonstration of how lossless compression works; advanced compression programs can find far more complex patterns than we could ever spot by hand, calculate which patterns save the most space, and replace inferior patterns with better ones.
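Python's standard-library zlib module gives access to one such advanced compressor. It implements DEFLATE, which combines LZ77-style back-references (pointers to an earlier copy of the same bytes, so recently seen text acts as an implicit dictionary) with Huffman coding. The sketch below is only an illustration: it checks that decompression returns the input byte for byte and compares the ratio for one copy of the sample text against fifty copies. Exact sizes depend on the zlib version and settings, so no figures are claimed here.

```python
import zlib

TEXT = ("Don't know what this site's about? Don't understand what's going on? "
        "Can't instinctively act on your primitive aggression? "
        "Check out the Everything FAQ (or) Everything University to learn more!")

def ratio(data: bytes) -> float:
    compressed = zlib.compress(data, 9)  # level 9: slowest, best compression
    # Lossless: decompressing must give back every byte unchanged.
    assert zlib.decompress(compressed) == data
    return len(compressed) / len(data)

small = TEXT.encode("utf-8")
large = " ".join([TEXT] * 50).encode("utf-8")
print(f"1 copy:    {ratio(small):.3f}")
print(f"50 copies: {ratio(large):.3f}")
```

Fifty copies compress far better than one because every copy after the first can be expressed almost entirely as pointers back to the first.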