A One-Paragraph Explanation
The INITial RAM Disk (initrd) is a feature of the Linux kernel, introduced circa 1996 by Werner Almesberger and Hans Lermen. The initrd mechanism allows an image of a filesystem to be loaded into memory right after the kernel, during the initial steps of the bootstrap process. Once the kernel has finished initializing, it mounts this filesystem as the root filesystem. A filesystem mounted as an initial ramdisk differs from a plain root filesystem in that it can be unmounted and replaced by the "real" root filesystem.
The motivation
This ability of the Linux kernel can add a great deal of flexibility to the boot process. Once the initial ramdisk is mounted and until the initrd stage is over, this filesystem acts as a full-fledged root filesystem from which programs and scripts can be launched. Kernel modules can be loaded and various tests can be executed, all before the "true" root filesystem has been determined and the kernel has finished its part of the boot process.
To demonstrate this flexibility, imagine you'd like to prepare a live Linux system burnt to a CD-ROM. Bootable CD-ROM technicalities aside, once your kernel is loaded from the CD-ROM, how would it know which drive holds the CD-ROM that should be mounted as the root filesystem (hda, hdb, sr0, sr1, ...)? In theory, the boot loader could know which device is booting the system and patch the kernel on-the-fly (like rdev does on-the-not-fly) to mount the root filesystem from it. Even if you had such a boot loader (I don't know of any), asking it to patch the kernel assumes a lot of logic on the boot loader's part, which is probably a Bad Thing.
Using initrd, all the boot loader needs to do is load a bit more than just the kernel into memory (it merely loads the initrd image right after the kernel). Once the initrd is mounted as root and a process is run from it, you can use convenient userspace tools (did anyone say 'Perl script'?) to find the device from which the real root filesystem should be mounted. Tell the kernel to mount that, and bravo - problem solved. The same goes for booting from other exotic media (USB storage devices, NFS, you name it). Creating an initrd image is relatively easy (assuming you're a reasonable UNIX administrator, which is a reasonable assumption if you've read this far).
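As an illustration, here is a minimal sketch of that probing step as it might appear in an initrd script. It assumes a kernel that lists its CD-ROM drives on the "drive name:" line of /proc/sys/dev/cdrom/info, that matching /dev nodes exist on the initrd, and that the disc carries a marker file so it can be recognized; the marker name LIVECD, the mount point /mnt/cdrom and the function names are all hypothetical.

```shell
#!/bin/sh
# Print the CD-ROM drive names (e.g. sr0, hdc) listed by the kernel.
# Argument: the cdrom info file, normally /proc/sys/dev/cdrom/info.
cdrom_drives() {
    # The line looks like "drive name:<TAB><TAB>sr0<TAB>hdc"; with awk's
    # default whitespace splitting, fields 3..NF are the drive names.
    awk '/^drive name:/ { for (i = 3; i <= NF; i++) print $i }' "$1"
}

# Try each drive in turn and print the first /dev node whose disc carries
# the (hypothetical) marker file; return non-zero if none does.
# Arguments (optional): marker file name, temporary mount point.
find_live_cd() {
    marker=${1:-LIVECD}
    mnt=${2:-/mnt/cdrom}
    for d in $(cdrom_drives /proc/sys/dev/cdrom/info); do
        if mount -r -t iso9660 "/dev/$d" "$mnt" 2>/dev/null; then
            if [ -e "$mnt/$marker" ]; then
                umount "$mnt"
                echo "/dev/$d"
                return 0
            fi
            umount "$mnt"
        fi
    done
    return 1
}
```

A linuxrc could then run something like `dev=$(find_live_cd) && mount -r -t iso9660 "$dev" /new_root` and carry on from there. Looking for a marker file rather than trusting the first readable disc keeps the script from picking up some unrelated CD that happens to be in another drive.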
It's very common these days to ship general-purpose Linux distributions (from Red Hat to Knoppix) with a highly modular kernel that loads, from an initrd, the kernel modules it needs to mount the root filesystem.
Boot sequence with initrd (taken from the Linux 2.4.20 kernel documentation):
- The boot loader loads the kernel and the initial RAM disk.
- The kernel converts initrd into a "normal" RAM disk and frees the memory used by initrd.
- initrd is mounted read-write as root.
- /linuxrc is executed (this can be any valid executable, including shell scripts; it is run with uid 0 and can do basically everything init can do).
- linuxrc mounts the "real" root file system.
- linuxrc places the root file system at the root directory using the pivot_root system call.
- The usual boot sequence (e.g. invocation of /sbin/init) is performed on the root file system.
- The initrd file system is removed (by some startup script for instance).
What might have raised an eyebrow or two is the reference to the little-known pivot_root(2) system call, which replaces the root filesystem with another mounted filesystem and moves the old root filesystem to a mount point on the new one. pivot_root is a relatively new mechanism. Before it existed, one had to write the device number (major and minor combined into a single dword) of the filesystem to be used as root after the initrd stage into a file in /proc (/proc/sys/kernel/real-root-dev, if you must know). The old, deprecated mechanism is commonly called "change_root", while the new, supported mechanism is called "pivot_root".
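To make the "dword" concrete, here is a small sketch of how a change_root-era script might compute that value. It assumes the old 16-bit device encoding used by 2.4-era kernels, with the major number in the high byte and the minor number in the low byte, so /dev/hda1 (major 3, minor 1) becomes 0x301. The helper name root_dev_number is hypothetical.

```shell
#!/bin/sh
# Combine a major and a minor device number into the value written to
# /proc/sys/kernel/real-root-dev: (major << 8) | minor, printed in hex.
# Both numbers must fit in 8 bits for this old encoding.
root_dev_number() {
    major=$1
    minor=$2
    if [ "$major" -gt 255 ] || [ "$minor" -gt 255 ]; then
        echo "root_dev_number: major/minor out of range" >&2
        return 1
    fi
    printf '0x%x\n' $(( (major << 8) | minor ))
}
```

From a change_root-style linuxrc, `root_dev_number 3 1 > /proc/sys/kernel/real-root-dev` would ask the kernel to mount /dev/hda1 as root once linuxrc exits.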
For the sake of completeness, here's how change_root used to work:
- The boot loader loads the kernel and the initial RAM disk.
- The kernel converts initrd into a "normal" RAM disk and frees the memory used by initrd.
- initrd is mounted read-write as root.
- /linuxrc is executed (this can be any valid executable, including shell scripts; it is run with uid 0 and can do basically everything init can do).
- When linuxrc terminates, the "real" root file system is mounted.
- If a directory /initrd exists, the initrd is moved there; otherwise, the initrd is unmounted.
- The usual boot sequence (e.g. invocation of /sbin/init) is performed on the root file system.
How to do it - High-speed crash course
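# sash and busybox should be statically linked: the image holds no shared libraries.
dd if=/dev/zero of=initrd bs=1k count=4096 ; mke2fs -F -q initrd ; mkdir -p /mnt/initrd
mount -o loop initrd /mnt/initrd ; cp `which sash` /mnt/initrd ; mkdir /mnt/initrd/sbin ;
ln -s /sash /mnt/initrd/linuxrc ; cp `which busybox` /mnt/initrd/sbin ; umount /mnt/initrd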
Next, you should probably compress your initrd image and configure your boot loader to load it (LoadLin, SysLinux and LILO all support initrd; see your boot loader's documentation).
gzip initrd
Once your executable is running, it can do whatever it wants (insmod kernel modules, print stuff to the console, run various tests, launch X11, download pr0n, you name it). Once you have the real root filesystem mounted, you can use pivot_root to make it the final root filesystem. Change into the directory where the real root is mounted (it must contain an empty directory to receive the old root, old_root here) and do the pivot from there. There are many wrappers for the pivot_root system call; one is built into the excellent super-utility busybox.
/sbin/busybox pivot_root . old_root ; exec /sbin/busybox chroot . /sbin/init < dev/console > dev/console 2>&1
It may seem like you need a dead chicken to make it work, but it's simpler than you think. Take a look at initrd.txt in the kernel's source Documentation directory, and you'll be fine after a while.
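Putting the pieces together, here is a sketch of a helper that writes a pivot_root-style /linuxrc into the initrd tree while it is loop-mounted (for example on /mnt/initrd, as in the crash course). Everything about the target system is an assumption: the real root is taken to be an ext2 filesystem on /dev/hda1, the initrd is assumed to have /bin/sh and an empty /new_root directory, the real root an empty /initrd directory, and mount, pivot_root and chroot are assumed to be available as commands (for instance busybox applets). The function name write_linuxrc is hypothetical.

```shell
#!/bin/sh
# Write a minimal pivot_root-style linuxrc into the initrd tree at $1.
write_linuxrc() {
    target=$1
    # Remove any existing linuxrc first: if it is a symlink (e.g. to /sash),
    # redirecting into it would write through the link instead of replacing it.
    rm -f "$target/linuxrc"
    cat > "$target/linuxrc" <<'EOF'
#!/bin/sh
# Hypothetical setup: real root is ext2 on /dev/hda1, the initrd has an
# empty /new_root, and the real root has an empty /initrd directory.
# Mount the real root read-only; drop to a rescue shell if that fails.
mount -t ext2 -o ro /dev/hda1 /new_root || exec /bin/sh
cd /new_root
# Make /new_root the new root and move the initrd to /initrd beneath it.
pivot_root . initrd
# Replace this shell with init on the new root. Per pivot_root(8), chroot
# must be available at the same path under both the old and the new root.
exec chroot . /sbin/init < dev/console > dev/console 2>&1
EOF
    chmod 755 "$target/linuxrc"
}
```

Run `write_linuxrc /mnt/initrd` before unmounting the image. The relative `dev/console` paths are deliberate: the shell's root may or may not have changed after pivot_root, but its working directory is the new root either way. The sketch also assumes /linuxrc ends up running as process 1, so that exec'ing /sbin/init hands over PID 1; initrd.txt describes how to arrange that with your kernel's boot options. Freeing the old initrd (the last step of the boot sequence) is left to a startup script on the real root.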
A few last words on cleanliness
Those of you who have been paying close attention might ask if pivot_root can be used more than once. The answer is yes, it can. This means that the special filesystem mechanisms of initrd are no longer needed - initrd is just a regular root filesystem that is loaded by the boot loader and then replaced by another (larger?) root filesystem, which may or may not be replaced again by yet another root filesystem. pivot_root is a well-documented, clean interface for changing root directories, and other than pivot_root(2) itself, there are no peculiarities left in the initrd process (for instance, there's no need to separate real-root-dev from the NFS-specific entries nfs-root-name and nfs-root-addrs for non-NFS and NFS root directories - actually, there's no need for special /proc hacks at all). This is a fine example of a shameless hack (a very useful one, but still, change_root is a hack) that was redesigned and made elegant (pivot_root). Next time you teach a newbie what elegance is, tell them about this.
To those of you who think pivot_root isn't elegant: show me something that does such deep magic and yet is so clean.