Apache2 is now out. I wonder if it should be its own node, as it is so different.

Preliminary design on Apache2 began (IIRC) in 1997. Code writing began in 1999, and now we have reached stability, albeit with some warts and misfeatures, primarily the build system.

Apache 1.3.x was a very conservative design: it followed the ancient Unix idiom of 'fork to handle a request', with the additional feature of forking subprocesses in advance to spread the initialization load over time. The benefits to that approach are the simplicity of implementation, and assurances of stability as processes are being killed off and restarted all the time (so memory leaks can't get too far along before the whole process is collected). The disadvantages in the 1.3 design are lack of execution speed, high memory requirements, and no easy way to unify memory access between the multiple processes that comprise a single server.

Enter threads. Start-up and context-switching time for threads is less than 1/4 of the time for processes. More importantly, threads operate within the same memory space, so (1) your memory costs are lower, and (2) you can do new things with shared memory, for example:

  • database connection pooling: no nasty db connection startup times! Persistent, shareable handles for the life of your server (and only like 5-10 for the whole server, instead of 3-5 * N processes, so big memory savings.)
  • memoized functions: LISP-type languages frequently have a macro of some kind that "memoizes" a function, i.e., saves the return value for a given set of arguments and hands it back on later calls, using the same syntax as calling the original function; useful with expensive function calls. This trick is a single application of the more general:
  • central caching: one cache lives in the shared memory space and all interpreters can read from it, so you only need one copy.
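
To make the list above concrete, here is a minimal Python sketch of a small connection pool and a central memoizing cache shared by many threads. It is not Apache code: FakeConnection, ConnectionPool, SharedCache and serve are all hypothetical names made up for illustration, and the "database" is a stub that just echoes the query.

```python
import queue
import threading


class FakeConnection:
    """Hypothetical stand-in for an expensive database handle."""

    def query(self, sql):
        return f"result of {sql}"


class ConnectionPool:
    """A fixed set of handles shared by every thread in the process."""

    def __init__(self, size):
        self.opened = 0
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(FakeConnection())
            self.opened += 1

    def run(self, sql):
        conn = self._idle.get()  # blocks until some handle is free
        try:
            return conn.query(sql)
        finally:
            self._idle.put(conn)  # hand the handle back for reuse


class SharedCache:
    """One memo table for the whole process, guarded by a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._values = {}
        self.misses = 0

    def get_or_compute(self, key, compute):
        # Holding the lock while computing is the simplest correct choice:
        # it guarantees each key is computed once, at the cost of
        # serializing cache misses.
        with self._lock:
            if key not in self._values:
                self.misses += 1
                self._values[key] = compute(key)
            return self._values[key]


def serve(n_threads=20, pool_size=5):
    """Have n_threads 'requests' ask the same question through one cache."""
    pool = ConnectionPool(pool_size)
    cache = SharedCache()
    results = []
    results_lock = threading.Lock()

    def worker():
        value = cache.get_or_compute("SELECT 1", pool.run)
        with results_lock:
            results.append(value)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, cache.misses, pool.opened
```

Calling serve() runs 20 threads against 5 handles and one cache; the expensive call happens once and everybody else reads the stored copy. With separate processes, each one would need its own handles and its own copy of the cache.
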
Of course, some things have to break. Namely, most of the mod_{perl/php/etc} interpreters. Their interpreters are not what we call thread-safe: they rely on a variety of global data structures for their operation. For example, let's say the list of function symbols in one perl namespace is a single list in C; the functions to access and modify this list presume that only the one interpreter is storing and accessing it. Now imagine embedding this interpreter in a threaded application: multiple threads are running multiple interpreter states in the same process memory space, which means the interpreters are tripping over each other's changes. All the interpreter bookkeeping needs to be changed so that the data is scoped to a particular interpreter and each one can operate independently. There is another way to modify an interpreter, but it's hairy, so I'll leave it for now.
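
The global-state problem can be shown in a few lines. This is a toy Python sketch, not how perl or any real interpreter is laid out: GLOBAL_SYMBOLS, GlobalStateInterpreter, ScopedInterpreter and run_two are hypothetical names. Two threads each create an interpreter, define a symbol called "handler", wait for each other, and then read it back.

```python
import threading

# Thread-unsafe style: a single symbol table shared by every interpreter,
# like the single C list described above.
GLOBAL_SYMBOLS = {}


class GlobalStateInterpreter:
    def define(self, name, value):
        GLOBAL_SYMBOLS[name] = value

    def lookup(self, name):
        return GLOBAL_SYMBOLS[name]


class ScopedInterpreter:
    """The fix: bookkeeping is scoped to one interpreter instance."""

    def __init__(self):
        self.symbols = {}

    def define(self, name, value):
        self.symbols[name] = value

    def lookup(self, name):
        return self.symbols[name]


def run_two(interpreter_cls):
    """Run two interpreters on two threads and report what each one sees."""
    barrier = threading.Barrier(2)
    seen = {}

    def worker(tag):
        interp = interpreter_cls()
        interp.define("handler", tag)
        barrier.wait()  # both definitions are done before either lookup
        seen[tag] = interp.lookup("handler")

    threads = [threading.Thread(target=worker, args=(tag,)) for tag in ("a", "b")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen
```

With ScopedInterpreter each thread reads back its own value. With GlobalStateInterpreter both threads read back the same value, whichever was written last, so one of the two interpreters has had its definition clobbered by the other.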

Right now I know of only 2 language modules that are ready for apache2: mod_tcl and mod_scheme. Python 2.2 should be fine, since it's thread-safe in PyWX, but I haven't seen the release notes on that yet.

Oh yeah, there's a bunch of other fun stuff in apache2:

  • The old handler mechanism is toast; now every module can choose to 'handle' a request. It's risky, as two modules could munge each other's responses, but you can have several modules doing several distinct things on a single request.
  • Everything and its mother is a filter. Every module can have its output filtered or can filter others. The module system is a big chain of filters, all modifying the data stream.
  • It's really fscking fast. Numbers to come.
  • The threaded runtime has been abstracted, so you can use a runtime targeted to the platform. This is a consequence of APR (the Apache Portable Runtime) growing into an independent delivery platform.
  • IPv6
  • I18N
  • and a bunch of new modules.
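
The "chain of filters" idea from the list above can be sketched with Python generators. Apache's real filters are written in C against its own API and none of that is modeled here; content_handler, uppercase_filter, footer_filter and run_chain are hypothetical names, and this only shows the shape of the idea: each filter takes a stream of chunks and yields a modified stream.

```python
def content_handler():
    """Stand-in for a module that generates the response body in chunks."""
    yield b"hello, "
    yield b"world"


def uppercase_filter(chunks):
    """Rewrite every chunk as it passes through."""
    for chunk in chunks:
        yield chunk.upper()


def footer_filter(chunks):
    """Pass everything through untouched, then append a trailer."""
    yield from chunks
    yield b"\n<!-- filtered -->"


def run_chain(source, filters):
    """Wrap the source stream in each filter, in order, and collect the output."""
    stream = source
    for f in filters:
        stream = f(stream)
    return b"".join(stream)
```

Order matters: run_chain(content_handler(), [uppercase_filter, footer_filter]) leaves the footer in lower case, while putting footer_filter first means the footer gets upper-cased along with everything else. That is the same risk as modules munging each other's output, just made visible.
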
More as my code develops.