(Note: This is not a technical writeup; it's intended to clear up some things about distributed networking and Gnutella for the general public.)
What is distributed computing?
Distributed computing is where a task is broken up into multiple pieces and shared with other machines. Each machine does its part, and then the results are merged together, netting a result for a problem that no individual machine could solve by itself. Distributed computing has been brought to the public consciousness by such efforts as distributed.net. People see these efforts and join for the fun of seeing their computers work towards a common goal. For the uninitiated, distributed.net is a public effort to crack various encryption methods posited by the government as a standard. The goal is to show that "uncrackable" codes can be broken, and they've had several large successes. The SETI@home project has more imagination, however: it allows computer owners to use their spare CPU cycles to scan data from outer space for signs of life. This appeals to the sci-fi fan in all of us, and has been very popular.
Nothing could compare to the popularity of Napster, though. Napster has brought distributed networking to the public eye. The lure of free music, regardless of the legality or lack thereof, is what launched distributed networking. Nullsoft, makers of the hugely popular WinAmp, wanted to get in on this market. Gnutella, Nullsoft's truly distributed file sharing program, was released to an eager public, and then pulled from their page within a day. Apparently Nullsoft's owners, America Online, didn't like the controversy surrounding Napster, and didn't want to be touched by it at all. Understandable.
Why do we still hear about Gnutella?
Gnutella was released and downloaded by many people. They shared it with their friends. The strength of a distributed file sharing network is that the more people who use it, the more capacity it has. So everyone who had Gnutella wanted to share it with their friends. Unfortunately, Gnutella was not to be updated, as it was no longer a product supported by Nullsoft. So, users of Gnutella were left with an early version of a program that would always be stuck incomplete.
Why did Gnutella not die off in the face of the much more polished and popular Napster?
Napster uses a centralized database. Every time you start up Napster, you hook in to Napster's master database and tell it what files you have available for download. Then, to search for that Metallica MP3 file you want, you send the request to Napster, and they look through the list of all of the people who have connected, and give you a list of people to try getting the file from. Napster knows all, sees all.
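The centralized setup described above can be sketched in a few lines. This is only a toy illustration, not Napster's actual design: the `CentralIndex` class, its method names, and the user names are all made up for the example, and matching is assumed to be a simple case-insensitive substring test.

```python
from collections import defaultdict


class CentralIndex:
    """Toy stand-in for a central server that knows every user's file list.

    Hypothetical illustration only; not how Napster was really built.
    """

    def __init__(self):
        # Maps a lowercased filename to the set of users sharing it.
        self._files = defaultdict(set)

    def register(self, user, filenames):
        """Called when a user starts up and reports what they share."""
        for name in filenames:
            self._files[name.lower()].add(user)

    def search(self, term):
        """Return {filename: sorted list of users} for names containing term."""
        term = term.lower()
        return {
            name: sorted(users)
            for name, users in self._files.items()
            if term in name
        }


index = CentralIndex()
index.register("alice", ["Metallica - One.mp3", "notes.txt"])
index.register("bob", ["metallica - one.mp3"])

# One lookup against one server reveals everyone who has the file.
hits = index.search("metallica")
print(hits)
```

The point of the sketch is that a single party holds the whole map of who shares what, which is exactly what makes monitoring easy.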
Many internet users value privacy, and the recent monitoring of Napster users by people such as Metallica is made easier by this setup. Gnutella, on the other hand, has no centralized database. You connect to a friend on the Gnutella network, and your IP number (a location on the internet) gets shared with everybody else on the Gnutella network within a certain distance of you (more on this later). The files you have are not broadcast, only the fact that you are present.
Another reason for the emerging popularity of Gnutella is the fact that any kind of file can be shared on Gnutella. Napster disallows any filetypes except for MP3 audio. On Gnutella, you can commonly find MP3 audio, as well as images and videos (commonly of pornography). The versatility is appealing to current users, and very important for the future. We may very well see a distributed system such as Gnutella start to fulfill many of the roles currently played by the HTTP protocol (web browsing).
Finally, Gnutella has not been frozen with the version released by Nullsoft. Many people reverse engineered the Gnutella protocol. The Gnutella protocol is simple to use and understand, and, as a result, many Gnutella clones have been developed rapidly. Some of these are starting to rival Napster in terms of polish and quality.
What is the Gnutella protocol?
Please note that this is not intended to replace a real education on protocols, networking, or anything.
A protocol is a networking language. The Gnutella protocol consists of five packet types. Basically, this means there are five "verbs" that can be understood by Gnutella currently. These are Ping, Pong, Query, Query Result, and Push Request:
- Ping is used to announce your presence on the network. When you send a Ping packet, you are asking who's out there.
- Pong gets returned by every machine that receives a Ping. Included in the Pong packet is information on the sender, including how many connections they have open, and how many files they are sharing.
- Query is sent by your client whenever you want to do a search for a filename. The Query gets broadcast, just like a Ping, to all machines connected to you, which pass it along as well. Every client that receives a Query packet performs a search through its file database. Every client can implement this search differently, so your mileage may vary according to who you're connected to. For example, if you search for ".JPG", some clients will return any file that has ".JPG" in the filename. Other clients may treat that as a "regular expression", which may net you completely different results. And other clients may even do a search through the files for JPG as a keyword. You don't know how the Query results are achieved when you receive a...
- Query Result packets are sent by clients who have successfully found at least one match to the Query. Each one gets routed back along the path the Query came from. There is no way for a client to know directly who a Query came from. This allows anonymity on the Gnutella network, and prevents people from collecting a list of everybody who searches for "KKK", for example. When a Query Result packet is returned, information on the files found is sent along with the IP of the client where the files were found. Then, you can open a simple HTTP request to the machine and download your files. Unless, of course, you or your target is behind a firewall, leading us to the last packet type...
- Push Request is sent when a direct connection is unavailable. The Push Request gets routed through the network just like a Query or Ping packet. It is targeted, however, at the machine that sent the Query Result you are looking for. When a Push Request is received, the client will try to open a connection itself to the requester, hopefully bypassing the firewall. Unfortunately, in the current implementation, this rarely works.
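To make the five "verbs" and the Query matching problem a little more concrete, here is a rough sketch. It is not taken from any real Gnutella client: the `PacketType` enum simply names the packet types listed above, and `substring_match` and `regex_match` are hypothetical helpers showing how two clients could answer the same ".JPG" search differently.

```python
import re
from enum import Enum


class PacketType(Enum):
    """The five packet types named above (labels only, not wire codes)."""
    PING = "Ping"
    PONG = "Pong"
    QUERY = "Query"
    QUERY_RESULT = "Query Result"
    PUSH_REQUEST = "Push Request"


def substring_match(query, filenames):
    """Hypothetical client A: the query must appear literally in the name."""
    q = query.lower()
    return [name for name in filenames if q in name.lower()]


def regex_match(query, filenames):
    """Hypothetical client B: the query is treated as a regular expression."""
    pattern = re.compile(query, re.IGNORECASE)
    return [name for name in filenames if pattern.search(name)]


shared = ["cat.jpg", "ajpg_notes.txt", "song.mp3"]

# Same Query, two different ideas of what counts as a match.
literal_hits = substring_match(".JPG", shared)
regex_hits = regex_match(".JPG", shared)
print(literal_hits)
print(regex_hits)
```

In a regular expression the dot matches any character, so the second client also returns "ajpg_notes.txt", which the first client skips. The person searching only sees the Query Result and has no way to tell which rule produced it.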
As machines connect to the Gnutella network, they send out a Ping packet and record the Pong responses. They can then connect to any machine that returns a Pong, and have multiple connections out on to the mesh of Gnutella. As others connect to your own machine, you can end up connected to multiple spots spanning the mesh. This is important because every packet you send out has a Time To Live, or TTL. If every Ping packet sent was passed on continually, the Ping would bounce back and forth over the network until the entire Gnutella network collapsed from the traffic. So Gnutella borrows the IP concept of TTL here. As a packet gets forwarded, the TTL gets decremented. When you receive a packet with a TTL of 0, you don't forward it any more. This gives each packet a life span. So the more connections you have on the mesh, the more areas your packets will reach, and the more likely you will find a valid Query Result.
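The effect of the TTL can be tried out on a tiny made-up mesh. The sketch below is a simulation under stated assumptions, not real client code: the `flood` function and the `mesh` dictionary are invented for the example, a TTL of n is taken to mean the packet travels at most n hops, and each machine is assumed to ignore a packet it has already seen (the text above does not say how real clients handle repeats).

```python
from collections import deque


def flood(mesh, origin, ttl):
    """Return the set of machines a packet from origin reaches.

    mesh maps each machine to the list of machines it is connected to.
    Assumption: a packet sent with TTL n travels at most n hops.
    """
    reached = set()
    seen = {origin}  # assumption: machines drop packets they already saw
    queue = deque([(origin, ttl)])
    while queue:
        node, remaining = queue.popleft()
        if remaining == 0:
            continue  # TTL ran out: this machine does not forward
        for neighbor in mesh[node]:
            if neighbor in seen:
                continue
            seen.add(neighbor)
            reached.add(neighbor)
            # The TTL is decremented each time the packet is forwarded.
            queue.append((neighbor, remaining - 1))
    return reached


# A small imaginary mesh: a chain a-b-c-d-e, plus f hanging off a.
mesh = {
    "a": ["b", "f"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
    "f": ["a"],
}

print(sorted(flood(mesh, "a", 1)))
print(sorted(flood(mesh, "a", 2)))
print(sorted(flood(mesh, "e", 2)))
```

Machine "a" has two connections and reaches more of the mesh with the same TTL than machine "e", which has only one. That is the "more connections, more reach" point in miniature.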
So what makes Gnutella interesting?
As Gnutella clients evolve, it is very likely we will see more interesting content available. This will likely include web pages in the future. Imagine a web search engine that wasn't constrained by an index maintained by who knows who. Here's a scenario. I'm looking for a used car, so I perform a search on a Honda dealer's web page for a particular model. He searches all of his own pages for information that is relevant, and passes on the search to ten other people he knows who might have information as well. Each of these ten sites does the same thing. If your search has a TTL of 5, you have effectively searched roughly 100,000 clients. Nobody was particularly impacted by your search individually; the load was spread out. And each client does not translate to a single web page. Each of those clients will likely be scanning 50-50,000 web pages hosted on the same machine.
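The arithmetic behind that figure is easy to check. The sketch below assumes the best case, which a real network would not quite reach: every client passes the search to exactly ten clients that have not seen it yet, and a TTL of 5 means five hops past the dealer. The function name `clients_reached` is made up for the example.

```python
def clients_reached(fanout, ttl):
    """Count new clients reached at each hop, plus the running total.

    Assumes every client forwards to `fanout` clients nobody else reached.
    """
    per_hop = [fanout ** hop for hop in range(1, ttl + 1)]
    return per_hop, sum(per_hop)


per_hop, total = clients_reached(10, 5)
print(per_hop)  # clients newly reached at hops 1 through 5
print(total)    # all clients reached beyond the dealer
```

Under these assumptions the fifth hop alone is 100,000 clients, and all five hops together come to 111,110, so "roughly 100,000" is a fair round number. Overlapping connections in a real mesh would bring the count down.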
Have you ever performed a web search, found something that had exactly what you were looking for, and clicked on the link only to find the dreaded "404"? Gnutella searches can be much fresher. There won't be the same need for massive databases that are rarely updated. Information will become much more dynamic and accessible.
The development of the Gnutella protocol has been unique so far. A dropped program, reverse engineered by a number of programmers in their spare time, is definitely not a typical process. Gnutella's development so far has been a queer survival of the fittest, and it's been a fascinating ride. The Gnutella movement needs clear thinkers to help it out, if it is to survive. And the goal is very worthwhile. A successful Gnutella network could force the government to reevaluate such critical cases as DeCSS right now. We're on a roller-coaster that's just starting down the hill. Intellectual property will change in this decade. For better, or for worse.