How to maintain privacy when using the web

There are several pieces of information that can potentially be used to identify a user to any web service. These include the cookies a web site sets, your computer's IP address, and the user agent string your web browser sends when it visits a web site.

Cookies

A cookie is a small amount of data that a web site stores on your computer. If your web browser has cookies turned on, then whenever you visit a web site, the browser checks whether that site set a cookie on a previous visit. If it finds a cookie that hasn't expired, it sends that cookie along with the request for a page. There are three typical uses for a cookie: storing preference information, storing a unique identifier per user, and maintaining session information. Most modern web browsers allow you to configure cookie settings on a per-site basis, so if you don't want to accept cookies in general but one site requires them to work, you can allow cookies for just that site.

Every time you visit a web site that tries to associate a unique ID with you, it will set a cookie in your browser. If you have cookies turned off, what essentially happens is that every time you visit the site, your browser says, "I've never been here before," and the site sends you a new ID.
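
To make that exchange concrete, here is a minimal sketch of what the server side of it might look like. The cookie name visitor_id, the one-year lifetime, and the handle_request function are all made up for illustration; a real site will do this its own way.

```python
import uuid
from http.cookies import SimpleCookie

def handle_request(cookie_header):
    """Return (visitor_id, set_cookie_value_or_None) for one request.

    cookie_header is the raw Cookie header the browser sent, or None
    if the browser sent no cookies (for example, cookies are turned off).
    """
    cookie = SimpleCookie()
    if cookie_header:
        cookie.load(cookie_header)
    if "visitor_id" in cookie:
        # Returning visitor: the browser handed the ID back to us.
        return cookie["visitor_id"].value, None
    # "I've never been here before": issue a brand new ID.
    new_id = uuid.uuid4().hex
    reply = SimpleCookie()
    reply["visitor_id"] = new_id
    reply["visitor_id"]["max-age"] = 60 * 60 * 24 * 365  # hypothetical lifetime
    return new_id, reply["visitor_id"].OutputString()

# Browser with cookies on: the second visit carries the ID from the first.
first_id, set_cookie = handle_request(None)
second_id, _ = handle_request(f"visitor_id={first_id}")

# Browser with cookies off: every visit looks like a first visit.
third_id, _ = handle_request(None)
```

With cookies on, second_id matches first_id, so the site can link the two visits. With cookies off, third_id is a fresh value and the visits can't be tied together this way.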

Unfortunately, if the site also stores your preferences in a cookie, you can't save any preferences with cookies turned off. If the site relies on cookies to maintain sessions, you won't be able to use it at all with cookies turned off (E2 fits into this category). So in order to balance privacy against usability, you'll need to figure out how your browser's cookie management works.

To maximize privacy, turn off cookies in your browser. Unfortunately, E2 requires cookies to be turned on. So if you want to both use E2 and secure your privacy, you'll need to accept cookies on some sites and not on others.

Your IP address

It's very difficult to surf the web for a long time without running into the ad which states "Your computer is broadcasting an IP address!" In order for the computers you're visiting on the web to send data back to you (like this web page), the computer on the other side needs to know where to send it, and this is your IP address. For many people, IP addresses aren't a big deal privacy-wise: every time you dial in to an ISP over a modem, your IP address changes. If you use DSL or cable modem, it's still possible that your IP address changes periodically.

There are two ways you can potentially gain some privacy here:

  1. Use a dynamic IP address. If your IP keeps changing, your identity can't be assigned to any one IP address. Keep in mind, though, that the ISP you're logging into still has the capability to figure out which traffic is coming from your account.

  2. Use a proxy. A proxy is a server that's set up somewhere on the net that acts as a middleman for any connections you want to make. Your IP address doesn't get sent out, only the proxy's does. If you're using a popular enough proxy, there's a lot of traffic coming out of it, and odds are it'll be difficult or impossible to separate out your traffic. Additionally, if the proxy uses encrypted connections (HTTPS), your ISP won't be able to tell what you're doing, only that you're connecting to a proxy somewhere. You're still in danger if the proxy logs what connections you make, but since the point of many of these proxies is to give people privacy, many have policies that they don't keep track of where users are going.
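
For the proxy option, here is a rough sketch of how a program (rather than a browser) might be pointed at a proxy, using Python's standard urllib. The proxy address is a placeholder I made up, not a real service, and the sketch only builds the opener without making any request.

```python
import urllib.request

# Hypothetical proxy address; substitute one you actually trust.
PROXY_URL = "http://proxy.example.net:8080"

def build_proxied_opener(proxy_url):
    """Return an opener that routes http and https requests via proxy_url."""
    handler = urllib.request.ProxyHandler({
        "http": proxy_url,
        "https": proxy_url,
    })
    return urllib.request.build_opener(handler)

opener = build_proxied_opener(PROXY_URL)
# opener.open("http://example.com/") would now go through the proxy,
# so the destination sees the proxy's IP address rather than yours.
```

Browsers have their own proxy settings that do the same job; the point is only that the destination server talks to the middleman, not to you.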

Browser user agent strings

Whenever your browser connects to a web site, it tells that server what it is. Sometimes these descriptions can get pretty specific: Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en-US; rv:1.0.1) Gecko/20021104 Chimera/0.6. Not only do browsers send what version of the browser they're running, but they'll often send what operating system you're running on, which language it was compiled for, and various other details. Many proxies will strip these off for you, so a proxy can be the answer here. Additionally, if you're using the same user agent as many other people, this can't be used to positively identify you.

Some browsers let you change the string that gets sent. If you're using Mozilla, there's a toolbar at http://xulplanet.com/downloads/prefbar/ which allows you to change the user agent. Opera also has a preference that allows you to change the user agent string.
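
The same idea applies to any program that makes web requests. As a small sketch, this is how a request with a replacement user agent string could be built with Python's urllib. The generic string and the make_request helper are my own choices for illustration, and nothing is actually sent over the network here.

```python
import urllib.request

# A deliberately vague user agent string (an arbitrary choice for this sketch).
GENERIC_UA = "Mozilla/5.0"

def make_request(url, user_agent=GENERIC_UA):
    """Build a request that carries user_agent instead of the default string."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})

req = make_request("http://example.com/")
# urllib stores header names capitalized as "User-agent".
sent_user_agent = req.get_header("User-agent")
```

The less detail the string carries, and the more people who send the same one, the less it says about you in particular.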

Other HTTP headers

Last-Modified

This particular method is a bit more difficult to detect. When a web server sends a page, it can include a Last-Modified header giving the date the page last changed. Your browser stores that date along with its cached copy, and the next time it requests the page, it sends the date back (in an If-Modified-Since header) so the server can tell it whether the cached copy is still good. The web server can send an arbitrary date instead of the real one, thereby handing each browser a date that identifies it on later visits. Since this header is required to be a date, the web server has much less flexibility in assigning unique identifiers than if using cookies, but most people will have no way of detecting that such a thing is happening.
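
Here is one way a server could hide an identifier in that date, as a sketch only. The base date, the "one second per ID" encoding, and the function names are assumptions of mine, not a description of how any particular site does it.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime

# Arbitrary base date for this sketch; visitor N gets BASE plus N seconds.
BASE = datetime(2000, 1, 1, tzinfo=timezone.utc)

def id_to_last_modified(visitor_id):
    """Encode a numeric visitor ID as an HTTP date for Last-Modified."""
    return format_datetime(BASE + timedelta(seconds=visitor_id), usegmt=True)

def id_from_if_modified_since(header_value):
    """Recover the visitor ID from the date the browser echoes back."""
    when = parsedate_to_datetime(header_value)
    return int((when - BASE).total_seconds())

# First visit: the server sends this as the Last-Modified header.
header = id_to_last_modified(123456)

# Later visit: the browser sends the same date back as If-Modified-Since.
recovered = id_from_if_modified_since(header)
```

To the person browsing, the header just looks like an ordinary modification date, which is why this is hard to notice.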

As far as I know, the only people using this technique are people trying to demonstrate this as a potential problem (see, for example, http://zork.net/~mbp/meantime/). There are two possible ways to avoid this problem:

  1. Turn off the cache. This potentially uses more bandwidth, and since you won't be able to go back to a page without retrieving it again, you'll make it easier for the server operator to reconstruct your path through the web site.
  2. Use a proxy that strips off cache information. I don't know how common such proxies are.

Referrers

Referrers (or referers, as they're spelled in the HTTP specification) get sent to a web server whenever you follow a link to one of its pages: your browser tells the server the URL of the page you came from. These can be used to determine the path that you took through a web site. They're not a very reliable signal, though: any time you click on the back button, your browser usually has the previous page cached, so the web server can't track your every move, only the times you load a new page.

It's possible to get some browsers to not send this information. Additionally, some proxies will strip this information out. If you're ultra-paranoid, you can type every URL in the location bar, since requests for typed-in URLs are sent without a referrer.
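
The stripping a proxy does can be pictured as a simple filter over the request headers. This is a toy sketch with made-up names (BLOCKED, strip_headers), not the code of any real proxy.

```python
# Header names to drop, compared case-insensitively (a choice for this sketch).
BLOCKED = {"referer"}

def strip_headers(headers, blocked=BLOCKED):
    """Return a copy of headers without any of the blocked names."""
    return {
        name: value
        for name, value in headers.items()
        if name.lower() not in blocked
    }

outgoing = strip_headers({
    "Host": "example.com",
    "Referer": "http://example.com/previous-page",
    "Accept": "text/html",
})
```

The web server receiving outgoing sees the Host and Accept headers but learns nothing about which page you came from. Passing a different set of names as blocked would strip other headers the same way.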


Thanks to Jetifi for reminding me about the ability to change UA strings.