This is a paper written by Michael Lesk; you can get it at http://www.lesk.com/mlesk/ksg97/ksg.html. It has some very interesting numbers (they are a little outdated now, but still interesting nonetheless).

The first thing that Lesk mentions is how much information is in the LoC (Library of Congress). The figure generally given is 20 terabytes, but this only takes text into account. The LoC also has about 13 million photographs, which even if compressed to a 1 megabyte JPG each would be 13 terabytes. The 4 million maps in the Geography Division might scan to 200 terabytes. Its more than five hundred thousand movies, at 1 gigabyte each, would be 500 terabytes (most are not full-length color features), and its 3.5 million sound recordings, at one audio CD each, would be almost 2,000 terabytes. This makes the size of the LoC more like 3 petabytes. The problem is that the LoC holds mostly published material, and in a very limited test Lesk found that only 28% of web sites contained published material.
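
The LoC total is just a sum of the per-collection figures quoted above. A minimal sketch of that sum, assuming decimal units (1 terabyte = 1,000 gigabytes = 1,000,000 megabytes); the variable names are made up for illustration:

```python
# Assumption: decimal units (1 TB = 1,000 GB = 1,000,000 MB).
MB_PER_TB = 1_000_000
GB_PER_TB = 1_000

# Per-collection sizes in terabytes, using the figures quoted in the text.
loc_tb = {
    "text": 20,
    "photographs": 13_000_000 * 1 / MB_PER_TB,  # 13 million at 1 MB each
    "maps": 200,                                # quoted directly
    "movies": 500_000 * 1 / GB_PER_TB,          # 500,000 at 1 GB each
    "sound recordings": 2_000,                  # quoted directly ("almost 2,000")
}

total_tb = sum(loc_tb.values())
print(f"{total_tb:,.0f} TB, or about {total_tb / 1000:.1f} PB")
```

The sum comes to a bit over 2,700 terabytes, which is where the "more like 3 petabytes" figure comes from.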

So how much information is out there that isn't in the LoC? Well, if the United States manufactures 38 million tons of paper a year, each ton contains 220 sheets of paper, and each sheet has 5,000 bytes, that is 8,000 terabytes of text each year. Now, some of this paper isn't used to write on, or is used to make copies of pre-existing information. So if half the pages have new information and each page is copied an average of 100 times, that still leaves 40 terabytes a year. The US makes up about 1/4 of the world's GDP, so if we multiply the US's information generation by 4 we get an average information generation for the world of about 160 terabytes a year.
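
The reduction from raw printed text to original text can be sketched in a few lines. This starts from the 8,000 terabyte figure quoted above rather than re-deriving it, and the variable names are only illustrative:

```python
# Figures as quoted in the text; names are illustrative only.
raw_text_tb = 8_000           # US printed text per year, in terabytes
new_fraction = 0.5            # half the pages carry new information
copies_per_page = 100         # each page is copied about 100 times
us_share_of_world_gdp = 0.25  # the US is about 1/4 of world GDP

# Original (non-duplicated) text produced in the US each year.
us_original_tb = raw_text_tb * new_fraction / copies_per_page

# Scale up to the world by dividing by the US share of GDP.
world_original_tb = us_original_tb / us_share_of_world_gdp

print(us_original_tb, world_original_tb)
```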

What about non-text media? There were about 4,615 films made worldwide in 1989; if we take the average movie to be 7,200 seconds long at 5 megabytes/second, that makes about 166 terabytes. There are about 52 billion photos taken each year (1996); if each of these is compressed using JPG to 10 kilobytes, that is 520 petabytes. This number doesn't include NASA's earth observing project, which captures about 11,000 terabytes (1996).

Then there is broadcasting. In the US there are about 1,593 TV stations; if we use 5 megabytes/second compression and there are 30 million seconds in a year, that is 200 petabytes. But only about 1/10 of the programming is actually new (like we didn't already know this), so we end up with 20 petabytes. If we extrapolate this number to the entire world, we have about 80 petabytes. Radio takes up relatively little space: there are about 6,956 radio stations in the US, and if we use 8 kilobytes/second compression, it only takes up about 1.7 terabytes, or about 6.8 terabytes worldwide.

Then there are CDs (407 million), cassettes (336 million), and vinyl (20 million!) (all stats from 1992). Assuming 550 megabytes per CD and cassette, that would be 400 petabytes, but if we discount duplicates (30,000 for every 1 original) we end up with 15 terabytes for the US and 60 terabytes worldwide.

Last, and most surprising, is telephony. In the US (1994) there were 500 billion toll call-minutes and about 20 times as many local calls. If compressed at 56 kilobits/second, it would take up about 4,000 petabytes.
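
Several of these estimates follow the same pattern: count times duration times data rate, then a discount for duplicates and a factor of 4 to go from the US to the world. A rough sketch of four of them, assuming decimal units, assuming "20 times as many local calls" means 20 times as many call-minutes, and assuming the telephony rate is 56 kilobits per second (7,000 bytes per second), which is the reading that reproduces the roughly 4,000 petabyte figure. All names are illustrative:

```python
# Assumption: decimal units (1 MB = 1e6 bytes, 1 TB = 1e12, 1 PB = 1e15).
MB, TB, PB = 1e6, 1e12, 1e15
SECONDS_PER_YEAR = 30e6  # rounded figure used in the text
WORLD_FACTOR = 4         # US taken as about 1/4 of the world

# Film: 4,615 films, 7,200 seconds each, 5 MB per second.
films_bytes = 4_615 * 7_200 * 5 * MB

# Television: 1,593 US stations at 5 MB per second, 1/10 new, scaled to world.
tv_us_bytes = 1_593 * SECONDS_PER_YEAR * 5 * MB
tv_world_new_bytes = tv_us_bytes / 10 * WORLD_FACTOR

# Recordings: CDs plus cassettes at 550 MB each, 30,000 copies per original.
recordings_us_bytes = (407e6 + 336e6) * 550 * MB
recordings_world_original_bytes = recordings_us_bytes / 30_000 * WORLD_FACTOR

# Telephony (US): assumed 56 kilobits/second = 7,000 bytes/second.
toll_minutes = 500e9
local_minutes = 20 * toll_minutes  # assumption: 20x as many call-minutes
phone_bytes = (toll_minutes + local_minutes) * 60 * 7_000

print(f"films:      {films_bytes / TB:,.0f} TB")
print(f"television: {tv_world_new_bytes / PB:,.0f} PB")
print(f"recordings: {recordings_world_original_bytes / TB:,.0f} TB")
print(f"telephony:  {phone_bytes / PB:,.0f} PB")
```

Run without rounding, this gives roughly 166 terabytes for film, 96 petabytes for new television worldwide, 54 terabytes for original recordings, and 4,400 petabytes for US telephony. The text rounds the intermediate values (239 petabytes of TV down to 200, for example), which is why it arrives at 80 petabytes and 60 terabytes.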

(Here comes the scary part!!)

Now human memory. It is suggested that the brain holds about 200 megabytes of information; this number takes into account the rate at which information is forgotten and the amount of information needed to do normal activities. (This number comes from T. K. Landauer's "How much do people remember? Some estimates of the quantity of learned information in long-term memory".) That means if there are about 6 billion people in the world, the memory of every living person takes up about 1,200 petabytes.
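
The memory total is a single multiplication. A quick check, assuming decimal units and a world population of about 6 billion, which is the figure that yields 1,200 petabytes:

```python
# Assumption: decimal units (1 MB = 1e6 bytes, 1 PB = 1e15 bytes).
MB, PB = 1e6, 1e15

memory_per_person_bytes = 200 * MB  # Landauer's estimate, as quoted
world_population = 6e9              # assumed: about 6 billion people

total_memory_pb = memory_per_person_bytes * world_population / PB
print(f"{total_memory_pb:,.0f} PB")
```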

But what about the words we forget? In one year the average American (all stats from Census 1995) spends 1,578 hours watching TV and 12 hours watching movies; at an average of 120 words/minute, that is about 11 million words, or about 50 megabytes of ASCII. Another 354 hours a year are spent reading newspapers, magazines, and books; at about 300 words/minute, that is about 32 megabytes of ASCII. This means that in seventy years you are exposed to only about 6 gigabytes of ASCII.
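
The lifetime exposure figure can be rebuilt from the hours and word rates. The sketch below assumes about 5 bytes of ASCII per word, a value not stated in the text but roughly what the quoted megabyte figures imply, and uses decimal units:

```python
# Assumption: about 5 bytes of ASCII per word (not stated in the text).
BYTES_PER_WORD = 5
MB, GB = 1e6, 1e9

# Listening: 1,578 hours of TV plus 12 hours of movies at 120 words/minute.
heard_words = (1_578 + 12) * 60 * 120

# Reading: 354 hours at 300 words/minute.
read_words = 354 * 60 * 300

heard_mb = heard_words * BYTES_PER_WORD / MB
read_mb = read_words * BYTES_PER_WORD / MB

# Seventy years of exposure at the same yearly rate.
lifetime_gb = (heard_words + read_words) * BYTES_PER_WORD * 70 / GB

print(f"heard: {heard_words:,} words, {heard_mb:.0f} MB per year")
print(f"read:  {read_words:,} words, {read_mb:.0f} MB per year")
print(f"lifetime: {lifetime_gb:.1f} GB")
```

Under these assumptions the yearly totals land within rounding distance of the 50 and 32 megabyte figures, and seventy years comes to a little over 6 gigabytes.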

In conclusion:

There are about 12 exabytes of information out there right now, and the total is growing at about 4 exabytes a year! This means that if we want to write down all there is, we had better get cracking. And get some big hard drives!
