name based virtual hosts (thing) by benjya

Name-based virtual hosts could be said to have revolutionised the way websites are published. But to understand how and why they work, we need to have at least a basic understanding of the underlying DNS system.

The DNS system is what translates a name, say "www.everything2.com", into an IP address (141.211.29.11 in this case). But it's not necessarily a one-to-one mapping, as you can define as many names as you like that all point to the same address. However, the name-to-address translation is done on the machine that's establishing the connection, and the connection itself is purely established to a remote IP address, with the remote end (in this case, the web server) not knowing what name was originally entered.
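
As a rough illustration of the many-to-one mapping, here is a minimal Python sketch. The lookup table is only a stand-in for real DNS, and every name in it apart from www.everything2.com is made up for the example.

```python
# A toy stand-in for DNS: several names, one address.
# Only www.everything2.com and its address come from the writeup;
# the other two names are hypothetical.
TOY_DNS = {
    "www.everything2.com": "141.211.29.11",
    "everything2.com": "141.211.29.11",
    "another-site.example": "141.211.29.11",
}


def resolve(name):
    """Return the address for a name, as the connecting machine would."""
    return TOY_DNS[name.lower()]


def names_for(address):
    """Reverse view: every name in the table that points at one address."""
    return sorted(n for n, a in TOY_DNS.items() if a == address)


print(resolve("www.everything2.com"))
print(names_for("141.211.29.11"))
```

For a real lookup, Python's socket.gethostbyname does the same job as resolve here, though the address it returns today may well differ from the one quoted above.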

So, when you enter a URL into your web browser, you enter something like this.

http://www.everything2.com/index.pl?node_id=124

This can be translated into three parts.

http - this is the protocol that we're using, establishing it as a normal request from a web browser.
www.everything2.com - this is the remote server that we're going to talk to.
index.pl?node_id=124 - this is the page we're requesting from the remote server. In this case it happens to be a dynamic page, but for the purposes of this writeup, the actual type of page isn't relevant.
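
The same three-way split can be sketched with Python's standard urllib.parse module. The variable names are just labels chosen for this example.

```python
from urllib.parse import urlsplit

url = "http://www.everything2.com/index.pl?node_id=124"
parts = urlsplit(url)

protocol = parts.scheme            # the protocol
server = parts.hostname            # the remote server
page = parts.path.lstrip("/")      # the page, without the leading slash
if parts.query:
    page += "?" + parts.query      # put the query string back on the page

print(protocol, server, page)
```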

So, before the days of Name-based virtual hosts, a browser would perform the following steps.

It resolves "www.everything2.com" to its IP address - 141.211.29.11.
It checks the protocol - http - which travels over TCP port 80, so it opens a TCP connection to 141.211.29.11 on port 80.
It then knows it's talking to the correct remote machine, so it simply sends the line "GET /index.pl?node_id=124" to the remote server. It may also send other pieces of information, but this is the key line that tells the server what data it wants.
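
The three steps above can be sketched in Python as follows. This is an illustration only: it builds the request instead of opening a socket, and toy_resolve is a hypothetical stand-in for a real DNS lookup.

```python
from urllib.parse import urlsplit


def old_style_request(url, resolve):
    """Return (address, port, request_bytes) as an early browser might."""
    parts = urlsplit(url)
    address = resolve(parts.hostname)     # step 1: name -> IP address
    port = parts.port or 80               # step 2: http means TCP port 80
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # step 3: the request line carries the page only, never the hostname
    request = ("GET " + target + "\r\n").encode("ascii")
    return address, port, request


def toy_resolve(name):
    """Hypothetical resolver holding the one address quoted in the writeup."""
    return {"www.everything2.com": "141.211.29.11"}[name]


address, port, request = old_style_request(
    "http://www.everything2.com/index.pl?node_id=124", toy_resolve
)
print(address, port, request)
```

Note that the hostname is used once, to find the address, and then dropped: nothing in the bytes sent to the server mentions it.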

The problem, which should be clear here, is that the remote server has no idea what name it was originally called under. Bring on the Host header, which browsers began sending as an unofficial extension to HTTP 1.0, and which HTTP 1.1 (the level supported by all modern browsers) requires in every request.

The only difference is that rather than just asking for the appropriate filename, the browser also sends the remote server name, on a "Host:" line that follows the "GET" (or "POST") line. So although the remote server doesn't know what hostname the user typed in when the connection is first established, the hostname then arrives as part of the request, so the web server knows which site's page to return.
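
In practice the hostname travels on a "Host:" header line sent along with the request. A minimal Python sketch of both ends follows; build_request and host_from_request are names invented for this example, not part of any library.

```python
def build_request(host, target):
    """Browser side: a bare-bones HTTP/1.1 request that names the host."""
    lines = ["GET " + target + " HTTP/1.1", "Host: " + host, "", ""]
    return "\r\n".join(lines).encode("ascii")


def host_from_request(raw):
    """Server side: pull the hostname out of the request headers.

    Returns None when no Host line is present, as with an old-style request.
    Any ":port" suffix is left in place.
    """
    head = raw.split(b"\r\n\r\n", 1)[0].decode("ascii")
    for line in head.split("\r\n")[1:]:       # skip the request line itself
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            return value.strip().lower()
    return None


raw = build_request("www.everything2.com", "/index.pl?node_id=124")
print(raw.decode("ascii"))
print(host_from_request(raw))
```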

This has made a huge difference to anybody who hosted multiple websites on a single server's IP address - and with the global shortage of IP addresses, this is essential. In the past, if you had multiple websites (which could have been owned by users of an ISP or even of a business), they would have to look like this.

http://www.ispuserserver.com/username1
http://www.ispuserserver.com/username2
http://www.ispuserserver.com/username3
http://www.ispuserserver.com/username4

Now, though, with name-based virtual hosts, we can have

http://username1.ispuserserver.com
http://username2.ispuserserver.com
http://username3.ispuserserver.com
http://username4.ispuserserver.com

Furthermore, if the user or business concerned has their own domain, they could have "www.companyname.com" pointing to the same IP address as dozens (hundreds, thousands even) of other names, but the web server will always know which page to return.
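
On the server side this boils down to a lookup table keyed by hostname. Here is a minimal Python sketch; the directory paths and the fallback site are assumptions made up for the example, and a real web server would express the same thing in its own configuration format.

```python
# Hostname -> directory holding that site's pages (paths are hypothetical).
VHOSTS = {
    "username1.ispuserserver.com": "/home/username1/public_html",
    "username2.ispuserserver.com": "/home/username2/public_html",
    "www.companyname.com": "/srv/companyname",
}
DEFAULT_ROOT = "/srv/default"   # served when the name is missing or unknown


def document_root(host_header):
    """Choose a site from the hostname the browser sent.

    Ignores case and any ":port" suffix; does not handle IPv6 literals.
    """
    host = (host_header or "").strip().lower()
    host = host.split(":", 1)[0]
    return VHOSTS.get(host, DEFAULT_ROOT)


print(document_root("www.companyname.com"))
print(document_root("unknown.example"))
```

Every name in the table can resolve to the same IP address; only the hostname sent by the browser tells the sites apart.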

The one historical flaw with this plan is HTTPS - SSL secured browser connections. When you establish an HTTPS connection (which normally uses port 443), the entire data session is encrypted, including the GET line and the hostname sent with it. The encryption has to be set up before any request is sent, and to set it up the server has to use the certificate and key belonging to one particular site. But a server using name-based virtual hosts over SSL wouldn't know which site's key to use, because the name it needs is inside the request it can't yet read! The traditional workarounds are to give each secure site its own IP address, or to run multiple SSL web servers on different ports on the same machine, which is somewhat less transparent (and, of course, isn't a name-based virtual host!). The later Server Name Indication (SNI) extension to TLS closes this gap by having the browser send the hostname unencrypted at the start of the handshake.
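
The port-based workaround can be sketched in Python as a table keyed by port, since the port is known as soon as the connection arrives and before anything is decrypted. The port numbers other than 443 and the site assignments are assumptions for the example.

```python
# One secure site per port (assignments are hypothetical).
SSL_SITES_BY_PORT = {
    443: "www.companyname.com",
    8443: "username1.ispuserserver.com",
    8444: "username2.ispuserserver.com",
}


def site_for_port(port):
    """Pick the site, and so its key, from the port alone."""
    return SSL_SITES_BY_PORT.get(port)


def https_url(site):
    """The address users must type; non-default ports show up in the URL."""
    for port, name in SSL_SITES_BY_PORT.items():
        if name == site:
            suffix = "" if port == 443 else ":" + str(port)
            return "https://" + name + suffix
    raise KeyError(site)


print(site_for_port(8443))
print(https_url("username2.ispuserserver.com"))
```

The visible port number in the second URL is what makes this approach less transparent than a true name-based virtual host.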

Chances are that most websites you access, with the exception of large companies who have enough traffic (and money!) to justify a private server, are hosted on the same server as plenty of other sites, using name-based virtual hosts.
