A type of proxy server that doesn't look like it's there.

With ordinary WWW/HTTP caching proxies, the user normally has to tell the web browser where the web proxy is. ("My web proxy is at cache.isp.fi, port 8080. Don't pass requests to 'localhost' and '*.fi' through it.")
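For comparison, here's roughly the decision a browser makes with an explicit proxy setting like the one quoted above. This is a minimal sketch that assumes shell-style wildcard matching for the exceptions (real browsers each have their own rules). `choose_route`, `PROXY` and `NO_PROXY_PATTERNS` are hypothetical names.

```python
from fnmatch import fnmatch

# Hypothetical settings mirroring the example: proxy at cache.isp.fi:8080,
# but 'localhost' and anything under '*.fi' are fetched directly.
PROXY = ("cache.isp.fi", 8080)
NO_PROXY_PATTERNS = ["localhost", "*.fi"]


def choose_route(host: str):
    """Return the (host, port) to connect to, or None for a direct connection."""
    host = host.lower()
    for pattern in NO_PROXY_PATTERNS:
        if fnmatch(host, pattern):
            return None  # matches an exception: bypass the proxy
    return PROXY  # everything else goes through the configured proxy


if __name__ == "__main__":
    for h in ["localhost", "www.helsinki.fi", "everything2.com"]:
        print(h, "->", choose_route(h) or "direct")
```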

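With a transparent proxy, the user doesn't need to specify anything: all web traffic always goes through the proxy server. Technically, this is done with some clever packet forwarding, so that connections the browser thinks it is making directly to web servers end up at the proxy instead.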

However, the problem with transparent caching proxies is that the client (often) doesn't know it's talking through a proxy and thinks it has a direct connection to the server, so very strange things can happen. For example, while my ISP's proxy was misconfigured, I kept getting the E2 and Kuro5hin front pages of a couple of other users who also used E2/K5. Also, not everyone on the web runs their server on port 80, and (thanks to SOAP-ish innovations from Microsoft) not everyone uses port 80 specifically for HTTP...
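Part of why the proxy has to work harder: a browser configured for a proxy sends the full URL (`GET http://everything2.com/ HTTP/1.1`), but one that thinks it's talking to the server directly sends only the path (`GET / HTTP/1.1`). An intercepting proxy therefore has to rebuild the URL from the `Host` header or, failing that, from the address and port the packet was heading for. A minimal sketch, assuming plain HTTP and IPv4 only; `reconstruct_url` is a hypothetical helper, not Squid's code. A cache that gets this step wrong (say, by ignoring `Host` or assuming port 80) is one plausible way pages get served for the wrong site.

```python
def reconstruct_url(request_line: str, headers: dict, dst_ip: str, dst_port: int) -> str:
    """Rebuild the absolute URL for a request caught by an intercepting proxy.

    Simplified sketch: plain HTTP, IPv4 addresses, case-sensitive header names.
    """
    _method, target, _version = request_line.split(" ", 2)
    if target.startswith("http://"):
        # Explicitly configured proxy: the browser already sent the full URL.
        return target
    host = headers.get("Host")
    if host is None:
        # Old HTTP/1.0 clients may omit Host; all we know is where the packet was going.
        host = dst_ip if dst_port == 80 else f"{dst_ip}:{dst_port}"
    return f"http://{host}{target}"


if __name__ == "__main__":
    # Two different sites, identical request lines: only Host tells them apart.
    print(reconstruct_url("GET / HTTP/1.1", {"Host": "everything2.com"}, "192.0.2.1", 80))
    print(reconstruct_url("GET / HTTP/1.1", {"Host": "www.kuro5hin.org"}, "192.0.2.2", 80))
```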

Often, such problems only occur when caching is involved. In many other protocols, proxied traffic isn't cached and works fine as a result. (I'm sending this writeup, for example, through a NAT machine: all traffic from the office goes through a "masquerading" computer that makes the whole office network look like a single computer to the network outside the office. This system only translates internal network addresses into an external one and keeps track of the connections of several computers; it does not cache the data that passes through.)
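To illustrate how little a masquerading box keeps, here's a toy translation table: it remembers which internal address and port each outgoing connection came from, and nothing about the data itself. A simplified sketch with made-up addresses and a hypothetical port-allocation scheme; real NAT implementations also track protocol, destination and timeouts.

```python
import itertools


class Masquerade:
    """Toy model of NAT masquerading: many internal hosts share one public address.

    Only addresses and ports are rewritten; payloads pass through untouched (no caching).
    """

    def __init__(self, public_ip: str, first_port: int = 40000):
        self.public_ip = public_ip
        self._ports = itertools.count(first_port)  # hypothetical allocation: just count up
        self._out = {}   # (inside_ip, inside_port) -> public_port
        self._back = {}  # public_port -> (inside_ip, inside_port)

    def outbound(self, src_ip: str, src_port: int):
        """Rewrite an outgoing packet's source; reuse the mapping for a known connection."""
        key = (src_ip, src_port)
        if key not in self._out:
            port = next(self._ports)
            self._out[key] = port
            self._back[port] = key
        return self.public_ip, self._out[key]

    def inbound(self, dst_port: int):
        """Map a reply arriving at the public address back to the internal host, or None."""
        return self._back.get(dst_port)


if __name__ == "__main__":
    nat = Masquerade("203.0.113.5")
    print(nat.outbound("192.168.0.10", 1025))
    print(nat.outbound("192.168.0.11", 1025))  # same inside port, different public port
    print(nat.inbound(40001))
```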

Transparent proxying (formally known as interception proxying/caching) is gaining popularity with more and more ISPs. It involves intercepting packets that normally would go directly to web servers, and redirecting them to go through your proxy first.

This can be accomplished by a number of methods:

  • Installing Squid, MS Proxy Server, or similar software along with firewalling/IP filtering software on a router that your web traffic normally passes through. Note that this router now has to bear the burden of the proxy software, along with its normal routing/filtering functions.
  • If the proxy is not on the traffic path, you can configure a router to redirect packets to your proxy (Cisco routers using the route-map function, Linux routers with iptables, ipchains, or ipfwadm).
  • An access router (Cisco again, or others) can be configured to redirect traffic arriving from dialups or other interfaces to your proxy.
  • Newer Cisco IOS versions (11.x and later) support WCCP (Web Cache Coordination Protocol), which can be used not only to redirect packets to a proxy but also to load-balance them among several proxies.
  • Layer-4 switches like the Alteon ACE-Director or the Foundry Networks ServerIron can not only redirect by destination port (the methods above redirect only by port, typically port 80) but can also detect which packets carry web traffic, which lets them redirect web traffic that isn't on port 80.
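The difference between the router methods and the switches can be sketched as two rules: one that diverts by destination port only, and one that peeks at the first bytes of a connection for an HTTP request line. Both functions are hypothetical illustrations; how those switches actually detect web traffic isn't described here, and presumably a device has to accept the TCP handshake before it can see those bytes at all.

```python
# Request-line prefixes of common HTTP methods (assumed list, not exhaustive).
HTTP_METHODS = (b"GET ", b"HEAD ", b"POST ", b"PUT ", b"DELETE ",
                b"OPTIONS ", b"TRACE ", b"CONNECT ")


def redirect_by_port(dst_port: int) -> bool:
    """Router-style rule (route-map, iptables redirect, WCCP): only port 80 is diverted."""
    return dst_port == 80


def looks_like_http(first_payload: bytes) -> bool:
    """Content-style rule: does the connection open with an HTTP request line?"""
    return first_payload.startswith(HTTP_METHODS)


if __name__ == "__main__":
    samples = [
        (80, b"GET / HTTP/1.1\r\n"),         # ordinary web traffic
        (8080, b"GET /app HTTP/1.1\r\n"),    # web server on a non-standard port
        (80, b"SSH-2.0-OpenSSH_8.9\r\n"),    # something that isn't HTTP on port 80
    ]
    for port, payload in samples:
        print(port, redirect_by_port(port), looks_like_http(payload))
```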
Transparent proxies have a number of benefits:
  • You can cache incoming web data, gaining most of the benefits of a proxy cache, without requiring users to point their browsers specifically to your proxy.
  • You can force users to use your proxy, allowing password checking, web filtering, and other functions.
  • You can send your traffic out through gateways other than your default one, without needing load-balancing software or BGP.
Of course, I've found a few drawbacks in practice:
  • Without a layer-4 switch, you can only redirect port 80, which means you don't intercept non-http traffic, such as ftp, https and other potentially cacheable or proxyable applications.
  • When your proxy goes down, it's a bear to set up backups (some of the methods outlined above don't allow for alternate routes). As WWWWolf mentioned, it's even worse when the cache goes wonky without cutting out completely.
  • You get complaints from users trying non-standard stuff; also, some websites don't cache well, and you can run into problems trying to access these sites.

All in all, transparent proxying is easier if you can afford the bigger hardware, which has fewer problems and more advantages than running transparent proxying on cheaper or older hardware. For fewer than a hundred users, I've found it easier to get them to fix their browsers from time to time.

Some details taken from the Squid FAQ (www.squid-cache.org) as well as from noding my homework.
JerboaKolinowski: Most modern proxy servers handle cookies fine if they are set up properly. Your problem most likely stems from an overly aggressive cache setup (i.e. caching everything regardless of Expires date or Content-Type). Complain to your admin.
With regards to proxy logs - most targeted advertising schemes work via spyware; if you're on an ISP that stoops low enough to use a transparent proxy to insert ads, by all means, switch.
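To make "regardless of Expires" concrete, here's a simplified freshness check of the sort a well-behaved cache performs before serving a stored page without asking the origin server again. It's a sketch loosely following the HTTP/1.1 caching rules (max-age wins over Expires; no-store and no-cache block reuse without revalidation) and, unlike real caches, it treats pages with no freshness information as stale instead of guessing a heuristic lifetime. `is_fresh` is a hypothetical name.

```python
from email.utils import parsedate_to_datetime


def is_fresh(headers: dict, age_seconds: float) -> bool:
    """May a stored response of this age be served without contacting the origin?"""
    cc = [d.strip().lower() for d in headers.get("Cache-Control", "").split(",") if d.strip()]
    if "no-store" in cc or "no-cache" in cc:
        return False
    for directive in cc:
        if directive.startswith("max-age="):
            try:
                return age_seconds < int(directive.split("=", 1)[1])
            except ValueError:
                return False  # unparseable max-age: play safe
    if "Expires" in headers and "Date" in headers:
        try:
            lifetime = (parsedate_to_datetime(headers["Expires"])
                        - parsedate_to_datetime(headers["Date"])).total_seconds()
        except (TypeError, ValueError):
            return False  # malformed dates (e.g. "Expires: 0") count as already expired
        return age_seconds < lifetime
    return False  # no explicit freshness information: treat as stale
```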

The appeal of these beasts to an ISP, and the pitch made most heavily by the salesman, is that the caching will reduce demands on Internet bandwidth, which is generally the ISP's largest revenue expenditure - Internet bandwidth is expensive, and salesmen promise reductions of as much as a half or more.

The problem for the customer is that you are just at the mercy of the proxy for your web connectivity. In effect, the ISP is introducing a single point of failure into the most-used part of their network - if the proxy goes down then their support desk will be flooded by customers who have lost all web access.

(If this happens to you, your best recourse is to have an external free public proxy ready, running on a port other than 80, and to configure your browser to use it. Your traffic will then bypass the transparent proxy, or its layer-4 switch, since that only intercepts packets bound for port 80. It's a handy trick I've used more than once: for various reasons I'm stuck for the present with an ISP that forces the indignity of a transparent proxy on me, and it often fails.)
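The same trick in code: a client pointed at an explicit proxy on port 8080 opens its TCP connection to that port, so a rule that only diverts port 80 never sees it. A sketch using Python's standard `urllib.request`; `proxy.example.net` is a placeholder rather than a real public proxy, `build_bypass_opener` is a hypothetical name, and HTTPS URLs would need their own entry in the mapping.

```python
import urllib.request

# Placeholder for an external proxy listening on a port other than 80.
EXTERNAL_PROXY = "http://proxy.example.net:8080"


def build_bypass_opener() -> urllib.request.OpenerDirector:
    """Return an opener that sends plain-HTTP requests via the explicit proxy.

    Connections go to port 8080 on the proxy, not port 80 on the web server,
    so a port-80-only interception rule does not match them.
    """
    handler = urllib.request.ProxyHandler({"http": EXTERNAL_PROXY})
    return urllib.request.build_opener(handler)


# Usage (needs a working proxy, so not run here):
#   opener = build_bypass_opener()
#   page = opener.open("http://www.everything2.com/").read()
```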

The advantage, in terms of bandwidth-saving, is real enough, on the supposition that a lot of customers will be visiting the same few sites. However, with the advent of more and more interactive web sites (which should be flagged as uncacheable by the proxy configuration), the evolution of various peer-to-peer protocols (which consume an ever-increasing proportion of an ISP's bandwidth) and the continuing rise in streaming Internet media, this advantage will diminish over the years to come.
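A rough way to see why the saving shrinks: the cache can only save bytes that are cacheable in the first place, so upstream traffic is the total minus (total × cacheable fraction × byte hit ratio). The figures in the usage below are made up for illustration, not measurements, and `upstream_gb` is a hypothetical helper.

```python
def upstream_gb(total_gb: float, cacheable_fraction: float, byte_hit_ratio: float) -> float:
    """Traffic still fetched from the Internet with a cache in front of the customers.

    cacheable_fraction: share of all bytes the proxy is allowed to cache at all
    byte_hit_ratio: share of those cacheable bytes actually served from the cache
    """
    saved = total_gb * cacheable_fraction * byte_hit_ratio
    return total_gb - saved


if __name__ == "__main__":
    # Hypothetical month: mostly static pages vs. mostly P2P, streaming and interactive sites.
    print(upstream_gb(1000, 0.8, 0.5))  # about 600 GB upstream, a 40% saving
    print(upstream_gb(1000, 0.3, 0.5))  # about 850 GB upstream, only 15% saved
```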

One other problem, not mentioned above, is a security one. If you visit a site whose authentication is by cookie, and whose default page is authenticated, then you may either see a page left by a previous user at your ISP, or you may leave yours in the cache. For example, I've sometimes gone to http://www.everything2.com/ and seen pages that have 'belonged' to other users, from my ISP's proxy cache, and I've been told other E2 users have seen mine, including 'private' /msgs.

This is because the proxy uses the browser request and the url to tell whether a site is interactive, and shouldn't be cached, or static and should be (for example the presence of a question-mark in the url, or POST-data in the request, tells the proxy not to cache.) But if you just go to www.everything2.com, there's neither POST-data nor CGI variables after a question mark, so to the proxy the page looks like it should be cached. The way to avoid this happening (on E2) is to bookmark a node (your homenode, whatever - I use /index.pl?node=plebeian) and enter E2 there instead of the front page - the presence of the question-mark in the url is enough to tell the proxy not to cache it (assuming the proxy is configured correctly). Once you're logged in, clicking on Welcome to Everything is safe, because it will produce a url with CGI variables in it, and it won't get cached.
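A minimal sketch of the heuristic just described, and of how it leaks: a cache keyed on the URL alone, consulted before the request reaches the site, so the login cookie never enters into it. `naive_cacheable`, `fetch` and the fake `origin` function are hypothetical; this illustrates the failure mode, not how any particular proxy is written.

```python
def naive_cacheable(method: str, url: str) -> bool:
    """The heuristic above: POST data or a '?' in the URL means 'interactive'."""
    if method.upper() == "POST":
        return False  # form submission: don't cache
    if "?" in url:
        return False  # CGI variables after a question mark: don't cache
    return True  # everything else looks static, cookies or no cookies


def fetch(cache: dict, url: str, user: str, origin) -> str:
    """GET through a cache keyed on the URL only; `user` stands in for the login cookie."""
    if naive_cacheable("GET", url) and url in cache:
        return cache[url]  # served from cache: the cookie is never consulted
    body = origin(url, user)
    if naive_cacheable("GET", url):
        cache[url] = body
    return body


if __name__ == "__main__":
    def origin(url, user):
        # Fake origin server that personalises every page for the logged-in user.
        return f"{url} as seen by {user}"

    shared = {}
    print(fetch(shared, "http://www.everything2.com/", "alice", origin))
    print(fetch(shared, "http://www.everything2.com/", "bob", origin))  # bob gets alice's page
    print(fetch(shared, "http://www.everything2.com/index.pl?node=plebeian", "bob", origin))
```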

In fact, I've seen this problem with two separate ISPs. It may well be that the proxy's cache settings are wrong, and that it would deal with cookies correctly if coaxed. (Of course, such delving into the details of the HTTP headers is an ugly piece of design, necessitated by an ugly technology. It is somewhat akin to the way a NAT router is forced to delve into the wrong layer of the network model to handle FTP transactions properly, though since HTTP and its associates are much more of a moving target, it's likely to go wrong far more often.) Also, such technicalities may be difficult to bring to the attention of the sysadmins of a large ISP such as British Telecom's BTOpenworld, where the first-line support staff tend to hang up on you if you raise an issue not covered by their script system.
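If the proxy can be coaxed, the fix amounts to a stricter storage rule for a cache shared between customers. A simplified sketch: refusing `Cache-Control: private` or `no-store` responses, and authenticated ones not explicitly marked shareable, follows the HTTP/1.1 caching rules, while refusing anything that carries cookies is a conservative policy choice rather than a protocol requirement. `shared_cache_may_store` is a hypothetical name.

```python
def shared_cache_may_store(request_headers: dict, response_headers: dict) -> bool:
    """Conservative storage policy for a cache shared by many customers (sketch)."""
    # Directive names only, e.g. "private, max-age=600" -> {"private", "max-age"}.
    cc = {d.strip().lower().split("=", 1)[0]
          for d in response_headers.get("Cache-Control", "").split(",") if d.strip()}
    if "private" in cc or "no-store" in cc:
        return False  # HTTP forbids shared caches from storing these
    if "Authorization" in request_headers and "public" not in cc:
        # Simplified: the spec also accepts s-maxage or must-revalidate here.
        return False
    if "Cookie" in request_headers or "Set-Cookie" in response_headers:
        return False  # policy choice: cookie-bearing traffic is probably per-user
    return True
```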

Further concerns, for the more paranoid: the proxy's logs may give your ISP a very easy way of tracking your web consumption, and (a separate issue) you can actually get transparent proxies that insert advertising directly into the web pages you retrieve. One example I've heard of recently, which may have been down to this, was an E2 user who found that whenever the word 'slots' came up on an E2 page, it was a link to another (unaffiliated) site offering online gambling. Fortunately my ISP hasn't yet stooped so low.

Of course, transparent proxies may also be used in order to censor what webpages are available for the user to see, and I believe some organisations are already using them in this way.

Overall, and this, really, is my point, it's a technology which puts more control over the user in the hands of the ISP - and have no doubts, unless you are the executive director or a shareholder in said ISP, this is a Bad Thing - which is another reason it appeals to the larger and more commercially minded companies in the field.

By all means provide a caching proxy, make it the default, even, in the script or CD that sets up the users' web browsers, but making it 'transparent' (compulsory) is a step too far, in my view, symptomatic of the way the freedoms of the home computer user are being eroded by the concerns of big business, and I'd strongly advise anyone who has the choice to use an ISP that doesn't do this.
