HTTP compression - Everything2.com

DON'T BE A TORTOISE

You're surfing along, and, unfortunately, unlike all your friends, you're still using a dialup modem. This means that pages, like the E2 home page, take a while longer for you to download. And sometimes the download is so big, like the list of every known organism on the National Center for Biotechnology Information's site, that it hardly matters whether you have DSL or cable:

Everything2.com             NCBI's Huge List
------------------         ------------------
Speed  | Load Time         Speed  | Load Time
-------|----------         -------|----------
14.4 K |  14 sec.          14.4 K |  98 sec.
28.8 K |   7 sec.          28.8 K |  49 sec.
56.0 K |   3 sec.          56.0 K |  25 sec.
128+ K |  <1 sec.          128+ K |  10 sec.

This is where HTTP compression comes in. Taking a file and making it smaller is known as compression, and HTTP is the protocol by which content is transferred over the web. So you can conclude that HTTP compression means compressing the content that is transferred over the web.

HTTP compression software is installed by the owner of a webserver, directly onto the server. Browsers identify themselves as accepting compressed content (by sending the request header Accept-Encoding: gzip, deflate) if they can, and most browsers can. The HTTP compression software on the webserver sends compressed, or smaller, content to browsers which can accept compressed content. The browser then decompresses the content, and shows it to you. This process takes barely any time. Say E2 and NCBI both had HTTP compression software installed. Then the new download times would be as follows:

Everything2.com             NCBI's Huge List
------------------         ------------------
Speed  | Load Time         Speed  | Load Time
-------|----------         -------|----------
14.4 K |   3 sec.          14.4 K |  11 sec.
28.8 K |   1 sec.          28.8 K |   5 sec.
56.0 K |  <1 sec.          56.0 K |   2 sec.
128+ K |  <1 sec.          128+ K |  <1 sec.
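
The exchange described above (the browser says what it accepts, the server compresses, the browser decompresses) can be sketched in a few lines of Python. This is only an illustration, not how any particular product works: the respond function and the sample page are made up for this example, and the header parsing is deliberately crude.

```python
import gzip

def respond(body: bytes, accept_encoding: str) -> tuple[dict, bytes]:
    """Return (headers, payload), gzip-compressing only if the client asked for it.

    Hypothetical helper for illustration; a real server does much more.
    """
    # Crude parse: split on commas and drop any ";q=..." parameters.
    accepted = {token.split(";")[0].strip().lower()
                for token in accept_encoding.split(",")}
    if "gzip" in accepted:
        payload = gzip.compress(body)
        headers = {"Content-Encoding": "gzip",
                   "Content-Length": str(len(payload))}
        return headers, payload
    # The client did not advertise gzip, so send the content untouched.
    return {"Content-Length": str(len(body))}, body

# A made-up, repetitive page standing in for real HTML.
page = b"<html>" + b"<p>Everything is connected.</p>" * 200 + b"</html>"

headers, payload = respond(page, "gzip, deflate")

# Browser side: undo the encoding named in Content-Encoding.
restored = gzip.decompress(payload)
print(len(page), "bytes before,", len(payload), "bytes on the wire")
```

A browser that sends no Accept-Encoding header at all gets the original bytes back, which is the behavior you want for clients that can't decompress.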

Comparing these figures to the original charts, you can see that there's a huge improvement in download time. E2 is about four times faster and the NCBI list is about ten times faster. You might ask why E2 isn't ten times faster, too.
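
If you want to check the comparison yourself, here is the arithmetic for the 14.4 K rows, using only the numbers from the two charts. Keep in mind that the charts round to whole seconds, so the factors that come out are rough.

```python
# Load times in seconds at 14.4 K, copied from the two charts.
before = {"Everything2.com": 14, "NCBI's Huge List": 98}
after = {"Everything2.com": 3, "NCBI's Huge List": 11}

# Speedup factor = uncompressed load time / compressed load time.
speedup = {site: before[site] / after[site] for site in before}

for site, factor in speedup.items():
    print(f"{site}: roughly {factor:.1f} times faster")
```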

First of all, there's the fact that the NCBI list was larger to begin with. Larger files tend to have higher compression ratios than smaller files. In fact, most HTTP compression software does not compress the smallest of files, since compression would actually make these files larger. But there's another major difference: the NCBI list is static content, while E2 is dynamic content. Static content is content that is the same for everybody. Dynamic content is content that the webserver changes depending on who's looking at it. For example, if you look at a text file on the Net, and I look at the same file, we will see the exact same thing. This means that a text file is static. Now, look at E2's homepage. You might see your username in the Epicenter nodelet, but I see mine. This means that E2's homepage is dynamic (as is the rest of the site).
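
You can see the size effect for yourself with Python's gzip module. The sample data here is invented for the example (a two-byte response and a highly repetitive list), so the exact ratios mean nothing; the point is only that the tiny input grows while the large one shrinks.

```python
import gzip

# Made-up inputs: a tiny response and a large, repetitive list.
tiny = b"OK"
large = b"Homo sapiens\nMus musculus\nDrosophila melanogaster\n" * 2000

def ratio(data: bytes) -> float:
    """Compressed size divided by original size (below 1.0 means it shrank)."""
    return len(gzip.compress(data)) / len(data)

# The gzip format adds a fixed header and trailer, which swamps a 2-byte input.
print(f"tiny:  {ratio(tiny):.2f}")
print(f"large: {ratio(large):.4f}")
```

This is why compression software usually has some minimum size below which it leaves files alone.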

Compression methods for static and dynamic files differ. Static files aren't going to be changing at all, so they can be precompressed and the compressed versions can be stored in a directory known as the compression cache. This is called caching. On some webservers, you will not be able to get a compressed version of static content unless a compressed version already exists in the compression cache. When a static file is compressed for a user because no compressed version is cached, it is known as on-demand compression. The compressed file that results from on-demand compression is usually deposited into the compression cache.
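
Here is a minimal sketch of a compression cache with on-demand compression, assuming gzip and a flat cache directory. The function name, the file names, and the "recompress if the original is newer" rule are all assumptions made for this example; real products have their own layouts and rules.

```python
import gzip
import tempfile
from pathlib import Path

def compressed_path(source: Path, cache_dir: Path) -> Path:
    """Return a gzip copy of `source`, creating it on demand if it is not cached.

    Simplification: the cache is keyed on the file name only, so two files
    with the same name in different directories would collide.
    """
    cached = cache_dir / (source.name + ".gz")
    # Cache hit: reuse the copy unless the original was modified after it.
    if cached.exists() and cached.stat().st_mtime >= source.stat().st_mtime:
        return cached
    # Cache miss: compress on demand and deposit the result in the cache.
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached.write_bytes(gzip.compress(source.read_bytes()))
    return cached

# Usage: the first request compresses on demand, the second hits the cache.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "organisms.txt"
    source.write_bytes(b"Homo sapiens\n" * 1000)
    cache = Path(tmp) / "compression_cache"

    first = compressed_path(source, cache)
    first_stamp = first.stat().st_mtime_ns
    second = compressed_path(source, cache)

    # If the cached file was reused, its modification time did not change.
    cache_hit = second.stat().st_mtime_ns == first_stamp
    round_trip_ok = gzip.decompress(second.read_bytes()) == source.read_bytes()
```

Precompressing is the same thing done ahead of time: run the function over every static file before any visitor asks for one.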

Compression of dynamic files works differently, since the webserver must first make changes to the file before the HTTP compression software can compress and send it. All requests for compressed dynamic files are on-demand requests (though that's not what they're called), since the webserver first does processing, and then the HTTP compression software compresses the file. A new file is generated, and therefore a new file must be compressed, for every request. Some websites have only static or only dynamic compression enabled, depending on what type of content most of the site is composed of. It is usually possible to treat dynamic content as static content, since most HTTP compression software allows you to change which file extensions are treated as which type of file.
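
To make the difference concrete, here is a rough sketch of both ideas: sorting files into static and dynamic by extension, and compressing a dynamic page fresh on every request. The extension sets, the classify function, and the fake homepage renderer are all hypothetical; they stand in for whatever configuration and processing a real server has.

```python
import gzip
import os

# Hypothetical configuration: which extensions count as which type.
STATIC_EXTENSIONS = {".html", ".txt", ".css"}
DYNAMIC_EXTENSIONS = {".php", ".pl", ".asp"}

def classify(filename: str,
             static_exts=STATIC_EXTENSIONS,
             dynamic_exts=DYNAMIC_EXTENSIONS) -> str:
    """Decide how a file is handled, based only on its extension."""
    ext = os.path.splitext(filename)[1].lower()
    if ext in static_exts:
        return "static"
    if ext in dynamic_exts:
        return "dynamic"
    return "uncompressed"

def render_homepage(username: str) -> bytes:
    """Stand-in for the server-side processing that builds a per-user page."""
    return f"<html><body>Epicenter: logged in as {username}</body></html>".encode()

def serve_dynamic(username: str) -> bytes:
    # The page differs per user, so there is nothing to reuse from a cache:
    # generate it, then compress it, on every single request.
    return gzip.compress(render_homepage(username))

print(classify("list.txt"), classify("index.pl"))
```

Treating dynamic content as static then amounts to moving an extension from one set to the other, for instance classify("index.pl", static_exts=STATIC_EXTENSIONS | {".pl"}).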

However, HTTP compression software varies in this, as it does in many other things. Some common packages are PipeBoost (http://pipeboost.com/), FuzzyCompress (http://fuzzelfish.com/fc/), XCompress (http://xcompress.com/), and the compression built into PHP 4+. Most of these utilities operate in the same way. Some (such as PipeBoost) offer an online service that allows you to estimate how much compression will be done on a certain webpage using that product. Fortunately, since these products are all similar, a single such service will let you estimate compression for the other products, too.

POSSIBLE PROBLEMS

This gets a bit more technical now. The browser identifies itself as accepting compression through the HTTP request header Accept-Encoding: gzip, deflate. Netscape and others may support only gzip, not both gzip and deflate. Problems occur when the HTTP compression software sends compressed content to incompatible browsers. This may happen if the software does not check for compatibility, or, much more likely, when the browser incorrectly identifies itself. Browsers that kids hack together in a few hours can have this problem. Also, sometimes a proxy server identifies itself as compression-compatible when, in fact, the browser itself is not. HTTP/1.1 defines the gzip and deflate content codings, so HTTP/1.1-compatible browsers are generally expected to support compression, but not all do. Further checking is also required, since some HTTP/1.0 browsers support HTTP compression.
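
The compatibility check itself might look something like the sketch below, which reads the Accept-Encoding request header, honors q-values (a coding listed with q=0 is refused), and falls back to sending uncompressed content. The function names are invented for this example, and it ignores the "*" wildcard for simplicity. Note that it can only trust what the header says, so it does nothing about a browser or proxy that identifies itself incorrectly.

```python
import gzip
import zlib

def parse_accept_encoding(header: str) -> dict:
    """Map each advertised coding to its q-value (1.0 when none is given)."""
    codings = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        name = parts[0].lower()
        if not name:
            continue
        q = 1.0
        for param in parts[1:]:
            if param.lower().startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0  # assumption: an unparsable weight means "refuse"
        codings[name] = q
    return codings

def choose_coding(header: str, supported=("gzip", "deflate")) -> str:
    """Pick the acceptable coding with the highest q-value, else 'identity'."""
    codings = parse_accept_encoding(header)
    candidates = [c for c in supported if codings.get(c, 0.0) > 0.0]
    if not candidates:
        return "identity"  # send the content uncompressed
    # On a tie, max() keeps the first candidate, so gzip wins over deflate.
    return max(candidates, key=lambda c: codings[c])

def encode(body: bytes, coding: str) -> bytes:
    if coding == "gzip":
        return gzip.compress(body)
    if coding == "deflate":
        return zlib.compress(body)  # HTTP "deflate" is the zlib format
    return body

for header in ("gzip, deflate", "gzip", "gzip;q=0, deflate", ""):
    print(repr(header), "->", choose_coding(header))
```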

Overall, HTTP compression is a very useful technology. It's only implemented on a few websites, but I think we will see that number increase over time.

And that's all, folks…
