Hurd: Hird of UNIX Replacing Daemons
Hird: Hurd of Interfaces Representing Depth
The Hurd is the GNU Project's replacement for the UNIX kernel, but it is not itself an operating system kernel. There is some confusion on this point: some people call the Hurd a 'kernel', others an 'operating system'. The Hurd is actually a framework of system services running on top of a microkernel, which makes it conceptually closer to an operating system than to a kernel. Currently there is one implementation of this framework, maintained by the Debian people and named Debian GNU/Hurd, which uses a modified version of the Mach 4.0 microkernel, named GNU Mach, as its nucleus. There is also an ongoing project to port the Hurd to the University of Karlsruhe's L4Ka microkernel.
The Hurd project was started early in 1991, after the University of Utah released their Mach kernel under a license friendly to the GPL, allowing it to be extended and tailored. But embedding a microkernel into the GNU system is not simply a matter of "implanting" it there. Microkernels implement only basic services such as IPC, basic hardware support and memory management, and since the aim of the Hurd is to replace the UNIX kernel, it must also deliver more general services such as network protocols, POSIX-compatible system calls and file systems, to name but a few. To provide them, a microkernel-based O.S. requires either a single super-task on top of the microkernel that delivers and directs all these services (the "single-server approach", used by L4-Linux) or a set of simpler, specialized servers, one for each type of service (the "multi-server approach"). The FSF chose the latter, and started to build up a herd of UNIX-replacing daemons, which they labeled 'the Hurd' (note that the article 'the' is part of the name).
Shortly after it was launched, the Hurd project became somewhat irrelevant (at least to the urgent needs of the then-surging free software community) when, in 1991, Linus Torvalds released his Linux kernel under the GPL. Because it is a monolithic, all-in-one kernel, Linux was quickly integrated into the GNU system, and GNU/Linux distributions (popularly known as just "Linux") started to pop up everywhere. The Hurd people continued their efforts, but not with the intensity that such a project requires. In 1994 the Hurd reportedly booted for the first time, and in 1997 version 0.2 of the system was released. In 1998, Marcus Brinkmann started to work on a Hurd-based Debian distribution, whose ISO images are available by FTP at ftp.gnu.org/iso.
A short description of the Hurd
Three components form the basic architecture of a Hurd system: a microkernel, system servers, and the C library. As said before, the microkernel provides only basic hardware services, inter-process communication and resource (memory and processor) management. The Hurd servers are user-space programs that implement mid-level services such as file systems, network protocols and process control (not to be confused with task and thread control; more on that below). By providing a set of POSIX-compatible system calls, the C library (usually GLIBC) completes the Hurd's structure.
The Hurd follows the philosophy that only the most basic, strictly hardware-dependent services belong in the kernel. Conversely, it expects all the hardware-dependent parts of the system to live in the microkernel; this frees the servers to do more general, abstract work and ensures that they can be ported to different architectures with minimal code changes.
Since the very beginning of the project, the Hurd's "official" microkernel has been Mach. In theory, however, the Hurd should be microkernel independent: it should be designed so that switching from one kernel to another is easy when the need arises. GNU Mach is, according to Marcus Brinkmann, not the "sexiest kernel in the world, but more like an 'old mamma'". L4Ka has a number of advantages over Mach, including speed and a superior device driver interface, and has the potential to replace Mach in the not-so-distant future. Whatever microkernel the Hurd uses, it will have to provide the following basic services:
- General hardware abstraction: Any operating system kernel must establish a scheme that shelters user programs from the details of hardware access. This includes interrupt servicing and device drivers' direct access to hardware, among other services. GNU Mach takes its device drivers from the 2.0 branch of the Linux kernel.
- Memory management: Memory management is handled by the microkernel because it is inherently hardware dependent: in order to control the use of system memory, a process must deal directly with the MMU (Memory Management Unit: the hardware unit in charge of address translation and paging in the PC architecture). That is, if a user program is to be executed, the microkernel is the piece responsible for obtaining the memory it needs.
- Task management: The microkernel is also responsible for organizing the processes that share the computer's resources into tasks and threads. Context (task) switching manipulates the processor's registers directly, so it is very hardware dependent and belongs in the microkernel.
- Message passing (IPC):
In multitasking, multi-user operating systems, inter-process communication is an essential service. It is responsible for passing messages from one process to another, for synchronizing hardware access, and for making RPC feasible. Its IPC mechanism (Mach ports) is one of the Mach microkernel's highlights. A monolithic kernel doesn't have to bother too much about IPC (at least not for in-kernel processes), but a microkernel-based system, and moreover one that uses a multi-server approach, relies heavily on IPC to remain stable and even to perform basic operations such as memory access. As opposed to Linux, where all kernel functions have direct access to any data structure within the kernel code, the Hurd servers' calls for external services must be performed via IPC or RPC; this raises the need for a fast and reliable communication protocol for the servers to interact with each other and with the kernel. In the current implementation of the Hurd, the Mach ports mechanism provides such a protocol, and the Mach Interface Generator (MiG) can be used to produce the code used in the dialogues between servers.
At code level, GNU Mach's ports are much like UNIX's file descriptors, except that they are used as one-way message queues to pass information between tasks. For example, every open file in a Hurd system has a port associated with it in the file system server. In order to perform write or read operations on that file, a process must request a port to the file and then send the data through that port. The original Mach also provided network-transparent RPC through the ports mechanism, but this feature is not present in GNU Mach.
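The idea of a port as a one-way message queue can be illustrated with a small model. This is only a sketch in Python, not Mach's actual API: the `Port` and `FileServer` classes, the task names and the `("write", payload)` message shape are all assumptions made up for the illustration. It shows a client enqueuing a write request on the port associated with an open file, and the server draining that queue later.

```python
from collections import deque


class Port:
    """Toy model of a port: a one-way message queue with a single receiver."""

    def __init__(self, receiver):
        self.receiver = receiver      # name of the task allowed to dequeue
        self._queue = deque()         # pending messages, oldest first

    def send(self, message):
        # Senders only enqueue; nothing flows back on this port.
        self._queue.append(message)

    def receive(self, task):
        # Only the designated receiver may take messages off the queue.
        if task != self.receiver:
            raise PermissionError(f"{task} may not receive on this port")
        return self._queue.popleft() if self._queue else None


class FileServer:
    """Hypothetical file system server handing out one port per open file."""

    def __init__(self):
        self.files = {}               # port -> bytearray with the file contents

    def open(self, initial=b""):
        port = Port(receiver="fileserver")
        self.files[port] = bytearray(initial)
        return port

    def serve_pending(self):
        # Drain every port and apply the queued write requests in order.
        for port, data in self.files.items():
            while (msg := port.receive("fileserver")) is not None:
                op, payload = msg
                if op == "write":
                    data.extend(payload)


server = FileServer()
port = server.open(b"hello")
port.send(("write", b", hurd"))       # the client just enqueues a request
server.serve_pending()
print(bytes(server.files[port]))      # b'hello, hurd'
```

Because the queue is one-way, a real request/reply exchange needs a second port for the answer; the sketch leaves that out to keep the queue behaviour visible.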
The Hurd Servers:
The key point about the Hurd is delivering freedom to the user. Common computing environments restrict users' actions in order to enforce safety. For example, regular users cannot install software that deals with kernel-level services, nor are they allowed to shut down parts of the system incorporated into the kernel, because normal systems have a monolithic kernel, and shutting down a file system module (on Linux) would shut it down for all the users on the system. With the Hurd, however, non-administrative users can easily turn parts of the system on and off as they feel the need, without disturbing the other users' environment. That is, if I turn the SMTP server off, this only affects me; other users can still send their e-mails. Of course, such an audacious arrangement could not be achieved without rigorous isolation between servers and a clean, well-defined set of interfaces among them.
Some of the already implemented Hurd servers are:
- File system server:
File system servers are "the most important" Hurd servers, because they not only handle file name resolution but also map Mach ports to file descriptors. File system servers across the system are arranged in a tree-like structure. There is a root server, which initially answers all file requests, and there are subordinate, localized file servers, which hold ports for specific regions of the file system structure. Filename resolution works like this:
(...) pathname resolution is used to traverse through a tree of servers. In fact, filesystems themselves are implemented by servers (let us ignore the chicken and egg problem here). So all the C library can do is to ask the root filesystem server about the filename provided by the user (assuming that the user wants to resolve an absolute path), using the dir_lookup RPC. If the filename refers to a regular file or directory on the filesystem, the root filesystem server just returns a port to itself and records that this port corresponds to the file or directory in question. But if a prefix of the full path matches the path of a server the root filesystem knows about, it returns to the C library a port to this server and the remaining part of the pathname that couldn't be resolved. The C library then has to retry and query the other server about the remaining path component. Eventually, the C library will either know that the remaining path can't be resolved by the last server in the list, or get a valid port to the server in question.
--- Marcus Brinkmann
Now, if you think that sounds like DNS resolution, then you've got the point.
- Process server
A Hurd system has a default process server, whose function is to do system-wide process bookkeeping. Task and thread switching is done in the microkernel, but if POSIX-style information about a Mach task is needed, the process server is the place to look for it (provided that the task is registered with the process server, which is optional). The process server assigns every registered task a PID, and can hold information such as environment variables, argument vectors, and more general information such as the hostname and system version of the machine running the process.
- Authentication server
Authentication servers handle user identities. They do not do password checking; they only provide trustworthy information about system users. Other servers may choose whether or not to trust a given authentication server. Users can have more than one id, and can dynamically 'mount' new ids by merging two or more identities (perhaps ending up with a wider range of permissions).
- Translators
Translators are pieces of software responsible for translating generic resource requests into more specific requests for the physical resource being requested. For example, a request for ftp://ftp.gnu.org/iso may reach the file system server in the form 'cd ftp/gnu/org/iso'. The file system translator is responsible for translating this generic request into a series of requests capable of finding and fetching a port to that directory, regardless of the subsequent operations that such a search entails (in this case, a call to the network I/O server, then to the FTP protocol server, and so on).
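The retry loop that Brinkmann describes for pathname resolution can be sketched as follows. This is a simplified model in Python rather than the Hurd's real interface: only the name `dir_lookup` comes from the quotation, while the `FsServer` class, the `resolve` helper, the server names and the sample paths are assumptions chosen for the example. Each server either resolves the path itself or hands back another server together with the part of the path it could not resolve, and the client keeps asking until nothing remains.

```python
class FsServer:
    """Toy file system server: knows some local names and some child servers."""

    def __init__(self, name, local=(), mounts=None):
        self.name = name
        self.local = set(local)        # paths this server resolves by itself
        self.mounts = mounts or {}     # path prefix -> server responsible for it

    def dir_lookup(self, path):
        """Return (server, remaining); remaining is '' once fully resolved."""
        if path in self.local:
            return self, ""            # a "port to itself", nothing left over
        for prefix, child in self.mounts.items():
            if path == prefix or path.startswith(prefix + "/"):
                # Hand the client over to the child with the unresolved rest.
                return child, path[len(prefix):].lstrip("/")
        raise FileNotFoundError(f"{self.name}: cannot resolve {path!r}")


def resolve(root, path):
    """Client-side loop: query, and retry against whichever server comes back."""
    server, remaining = root, path.strip("/")
    hops = [root.name]
    while remaining:
        nxt, remaining = server.dir_lookup(remaining)
        if nxt is not server:
            hops.append(nxt.name)
            server = nxt
    return server, hops


ftp = FsServer("ftpfs", local=["gnu/org/iso"])
root = FsServer("rootfs", local=["etc/passwd"], mounts={"ftp": ftp})

server, hops = resolve(root, "/ftp/gnu/org/iso")
print(server.name, hops)               # ftpfs ['rootfs', 'ftpfs']
```

In this model the client never needs to know in advance which server owns a name; it only follows the referrals, which is the property that makes the comparison with DNS apt.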
The C Library:
The Hurd's C library is basically a collection of C functions that provide a POSIX-compatible interface to the microkernel. In late 2002, the POSIX threads library was integrated into the Hurd's GLIBC, replacing the old Mach cthreads library implementation there. The Hurd servers make intense use of threads, taking advantage of Mach's excellent IPC interface.
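As a rough illustration of what "a POSIX-compatible interface" on top of ports means, the sketch below models a library-level `read` that looks up a descriptor and forwards the request to whatever object stands behind it. Everything here is hypothetical: the `FilePort` and `CLibrary` classes, the `rpc_read` method and the choice to keep the descriptor table inside the library are assumptions made for the example, not a description of GLIBC's internals.

```python
import errno


class FilePort:
    """Stand-in for a port to an open file, answered by some file server."""

    def __init__(self, data):
        self._data = bytes(data)
        self._offset = 0               # read position kept on the server side

    def rpc_read(self, amount):
        chunk = self._data[self._offset:self._offset + amount]
        self._offset += len(chunk)
        return chunk


class CLibrary:
    """Toy library layer: POSIX-style calls on top of a descriptor-to-port table."""

    def __init__(self):
        self._fd_table = {}            # small integer -> port
        self._next_fd = 3              # 0, 1 and 2 are left for the standard streams

    def install(self, port):
        fd = self._next_fd
        self._next_fd += 1
        self._fd_table[fd] = port
        return fd

    def read(self, fd, count):
        port = self._fd_table.get(fd)
        if port is None:
            raise OSError(errno.EBADF, "Bad file descriptor")
        return port.rpc_read(count)    # the "system call" is really a message


libc = CLibrary()
fd = libc.install(FilePort(b"GNU Hurd"))
print(fd, libc.read(fd, 3), libc.read(fd, 10))   # 3 b'GNU' b' Hurd'
```

The caller sees only a small integer and a familiar `read` call; the translation into a request to a server stays hidden in the library.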
Where the Herd is going
It's undisputed that the Hurd is very powerful and that its approach to system design is well suited to the present needs of O.S. development. Personally, however, I think that the lack of dedication to the project and the insistence on using Mach may have led an advanced operating system framework to become excessively dependent on a microkernel that still uses ancient and incomplete device drivers and has known performance deficiencies. Conceptually, the Hurd is said to be microkernel independent, but the fact that it has so far been used only with Mach implies that some of Mach's mechanisms are so deeply rooted in the system that a port to any other microkernel will require a lot of work to remove these specific 'hooks' from the Hurd servers.
And just to end it well, some words of advice for young people:
Just say NO to drugs, and perhaps you won't end up like the Hurd people. ---Linus Torvalds.