NachOS

Introduction

Nachos is an instructional, monolithic-kernel operating system. When I took an operating systems class, NACHOS was portrayed as an acronym for "Not Another Completely Heuristic Operating System". However, I don't see this documented at what is apparently the Nachos page.1 Perhaps this is for good reason because NachOS is not entirely an operating system; it is a user space program which simulates a simple MIPS machine and an operating system on top of that. Since the name is unclear, I'll write "Nachos" with minimal capitalization.

Nachos was initially developed at Berkeley by Wayne A. Christopher, Steven J. Procter, and Thomas E. Anderson from 1991 through the Spring of 1992.2 Up through version 3.4, Nachos restricted itself to a subset of C++. A 23 page paper is included with Nachos 3.4 and 4.0 which introduces this subset to those with moderate proficiency in C. Version 4.0 is written in a slightly broader subset, utilizing templates to reduce repeated code. Judging by comments in its source files, 4.0 was finished in 1996. Further, a Java version, 5.0j, has been developed at Berkeley by a Dan Hettena and Rick Cox.3 5.0j is a nearly total rewrite, with a similar structure to 4.0. It fixes many old bugs and adds an autograder under which everything runs. 5.0j was developed in 2001 judging from its license.

Nachos lacks many important components of an operating system including synchronization, system calls, support for multiple, simultaneous user programs, networking, a proper filesystem, and virtual memory. Instead, a framework for the introduction of these facilities exists. The job of a student learning about operating systems with Nachos is to actually code most of the kernel of this operating system. Many classes which teach Nachos 3.4 utilize the Nachos roadmap by Thomas Narten4,5 which is a guide to the underlying functions and behavior of Nachos with valuable hints. Since Nachos 4.0 is a large rewrite, Narten's roadmap does not accurately reflect its classes, and so 4.0's author recommends continued use of 3.4 for pedagogical purposes.

License

Not surprisingly, Nachos is licensed under a variant of the BSD license. It is produced below both to be informative and to be in accordance with its terms so that I may produce its code below without concerns for fair use. Nachos 4.0 doesn't extend the date of copyright even though it clearly was developed beyond 1993. Nachos 5.0j's license extends its copyright date through 2001 but is otherwise identical.

Copyright (c) 1992-1993 The Regents of the University of California. All rights reserved.

Permission to use, copy, modify, and distribute this software and its documentation for any purpose, without fee, and without written agreement is hereby granted, provided that the above copyright notice and the following two paragraphs appear in all copies of this software.

IN NO EVENT SHALL THE UNIVERSITY OF CALIFORNIA BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS DOCUMENTATION, EVEN IF THE UNIVERSITY OF CALIFORNIA HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

THE UNIVERSITY OF CALIFORNIA SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS ON AN "AS IS" BASIS, AND THE UNIVERSITY OF CALIFORNIA HAS NO OBLIGATION TO PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.

Personal Experience

I took an operating systems class wherein we utilized Nachos. We used Nachos 3.4 but my commentary attempts to cover all material relating to 4.0 and 5.0j as well. My view of Nachos is somewhat biased because those who worked on it in my class elected to do so in lieu of doing the (generally easier) projects most people did. Therefore, the material in class did not track with Nachos and much research had to be done to even understand the assignments. Case in point, the first thing we needed to implement were condition variables. None of our group of three had heard of these previously, and they weren't covered in class; that was one small part of the first project. From there, things got worse. See Issues below for elaboration.

Design

Multiple Threads, One at a Time

All versions of Nachos feature a multi-threaded kernel which is constricted so it only runs a single thread at once. In 3.4/4.0, this is done using purely user mode threads; switching is done with a small trick of assembly. In the Java version, native threads (from java.lang.Thread) are used, but Nachos insures that only one of its threads is run at once. Each thread stores its registers and process and simulated machine information before it switches out and restores these when it begins executing again. This setup makes debugging relatively easy with a few caveats. Most notably, in 3.4/4.0 one should never use until in gdb since there is a fair chance that a thread switch will occur before the current block returns. If this happens, gdb loses track of the block, and Nachos runs to completion.

Interrupt Queue

Hardware and software interrupts are all handled by an interrupt queue. The queue is necessary since Nachos is a simulation so interrupts can not be generate in real time. Interrupts are scheduled to occur at some time from the present. Whenever the simulation time is advanced, all necessary interrupt handlers are called before control is returned to the previously executing code. If no threads are ready, Nachos continually advances time to the time of the first interrupt in the queue and executes its handler.

This configuration has a small problem with realism. Simulated machine time is only advanced after executing a user program instruction and each time interrupts are enabled. Further, context switches only occur when time is advanced. Due to these restrictions, some incorrect code will never fail. It is possible to make unlocked access to shared data, and it will never fail so long as no synchronization is done within the code. Further, it's possible to make non-reentrant exception handlers without Nachos ever having a problem. A dilligent teacher will hopefully check for such shenanigans; mine didn't.

Dual Mode Operation

As with all modern day OSes, Nachos utilizes dual mode operation. Some code runs in kernel mode wherein all of the machine's assets can be directly accessed. Some code runs in user mode which can only see its personal portion of memory and can only access hardware by calling functions in the kernel through syscalls.

This division is greater than normal because user programs are actually interpreted by Nachos and presented with an emulated MIPS machine. Memory of the machine is a large array of bytes named mainMemory (Machine::mainMemory in C++; Processor.mainMemory in Java). When an exception or syscall occurs, it is handled by kernel code running natively.

To start executing a user program, the path to an executable file is passed with the '-x' command line option. The file is loaded into memory, and then a never-ending loop starts. This loop loads the next instruction, executes it, advances the simulated clock a small amount, and then repeats. This is done in Processor.run() in 5.0j and Machine::Run() in 3.4/4.0. Aside from the fact that a thread running a user program is always running in this simple loop, it is no different from any other kernel thread.

Self-Testing

In versions 3.4 and prior, there were no self test functions. This was a little unfortunate for my group. Luckily, Nachos 4.0 provides a SelfTest member function for all of its classes. Nachos 5.0j continues this tradition, but makes sure it is special by naming these functions selfTest. Such functions can be invaluable in regression testing, as any software engineer could tell you.

Deterministic

To ease debugging, Nachos always behaves the same if called with the same arguments. Different context switching behavior can be generated with the '-rs' flag. When Nachos is started with the '-rs' flag an interrupt is scheduled ever few ticks — chosen randomly between 1 and 200 each interrupt — to force a context switch. The random number generator is seeded with the number following the flag so that this variation of behavior is repeatable.

Without this flag, context switches occur only when executing user programs or when a thread explicitly yields. As such, race conditions may never be detected without the use of '-rs'. With this flag, whenever interrupts are enabled or disabled, which happens whenever a synchronization occurs, a switch may occur. This feature can uncover lurking synchronization problems when utilized in automated testing. We always used the following shell script to test for deadlocks and other weird behavior:

#!/bin/ksh

if [ "$1" -eq "" ]
then
 runTimes=10;
else
 runTimes=$1;
 shift;
 args=$*;
fi

echo "Running nachos $runTimes times.\n\n";

while [ $runTimes -gt 0 ]
do
  myRand=$RANDOM;
  echo "nachos -rs $myRand $args";
  runTimes=`expr $runTimes - 1`;
  nachos -rs $myRand $args;
done

Special Executable Format

Nachos 3.4/4.0 has its own, simplified, executable file format: Nachos Object File Format (NOFF). Obviously, compilers don't output this sort of file format. So one uses gcc to (cross)compile his user program to a MIPS COFF executable. Then the coff2noff program, a simple C program whose source comes with Nachos, converts the COFF file to NOFF. Unfortunately, NOFF is really limited. It contains exactly three sections named .code, .initData, and .uninitData, and its entry point is always at address 0. Therefore, a COFF file with any sections beyond .(r)data, .(s)bss, and .text can not be converted to NOFF.

Nachos 5.0j avoids NOFF entirely. It directly uses the MIPS COFF executables.

User Program Debugger

Nachos 3.4/4.0 provide a built-in debugger for debugging user programs. It can single step through a program, displaying the MIPS assembly of the next instruction and current state of all registers. It can also skip forward till a given machine time. Unfortunately, this debugger is completely ignorant of symbols and any high level programming concepts. Further, it does not support breakpoints. My group found it mildly useful for solving a problem where we corrupted a register during syscalls. Otherwise, it is not terribly useful, especially because user programs themselves are severely limited. See below under Issues:Toy Soldiers.

Nachos 5.0j did not see fit to include a user program debugger at all despite its greater ability to handle complex user programs.

Projects

Nachos is structured to allow a series of largely independent projects which can be completed in somewhat arbitrary order, once the first is completed. However, the actual recommended projects vary a little with each version of Nachos. Further, Nachos is open-ended enough that professors may design projects of almost any type. Therefore, consider the below to be a sampling of possible projects.

Nachos 3.4
  1. Threads (Synchronization)

    This project is necessary for all future projects since they all require synchronization to some degree.

    First, implement locks, semaphores, and condition variables. Second, use these mechanisms to solve concurrency problems possibly including the consumer producer problem and the elevator problem.

  2. Filesystem

    A stub filesystem is included with Nachos which maps file functions to the underlying OS file functions. It does not provide syscalls dealing with directories, but this is more of an issue with the next project. This project is not necessary to do any of the others.

    One is provided with a simple filesystem which utilizes a single file on the host system as its "disk". First, make access to the disk threadsafe. Next allow for larger files, allow for files which can grow after being created, improve disk performance, and add a directory structure.

  3. Multiprogramming (User Programs)

    Nachos can already load a single program and run it. First, add the ability to load multiple user programs at once and implement handlers for all exceptions and most defined syscalls. Add a process scheduling policy. Secondly, write useful user programs. Thirdly, extend the exec syscall to allow passing of command line arguments; add the fork and yield syscalls.

  4. Virtual Memory

    This assignment requires Multiprogramming to be completed first.

    First, manage the TLB appropriately and add virtual memory. Second, evaluate performance of your system and write some user programs which test it.

  5. Networking

    This assignment requires Multiprogramming to be completed first.

    A simple 'network' exists which allows multiple instances of Nachos running on one machine to pass messages between each other and a flag exists which lets one specify what percent of packets should be dropped. First, implement reliable message passing with no size limitations over the unreliable network. Secondly, implement a distributed application over this network.

Nachos 4.0

Nachos 4.0 notably drops the file system project from its recommended list, even though support for the project remains. On the other hand, the authors of Nachos 5.0j did not port the filesystem project. The projects are largely similar to those for Nachos 3.4 and I will reference those in the below descriptions.

  1. Threads (Synchronization)
  2. First, re-implement condition variables and use them to solve the consumer producer problem. Secondly, implement Thread::Join(), an alarm clock class, and a preemptive priority scheduling. Finally, use the finished pieces to solve the elevator problem.

  3. Multiprogramming (User Programs)

    Identical to the assignment in 3.4, but requires using either multilevel feedback or lottery scheduling.

  4. Virtual Memory

    First, do same as in 3.4. Secondly, implement memory mapped files like mmap supplies. Thirdly, implement a sbrk-like system call which allows increasing size of the heap and mmaped files.

  5. Distributed Systems

    First, do same as in 3.4's Networking project. Secondly, implement an n-way chat application. Thirdly, implement distributed shared memory.

Nachos 5.0j

Nachos 5.0j inherits all of the same basic projects from Nachos 4.0, but naturally has variation in the specific structure.

  1. Threads (Synchronization)

    This project is nearly identical to 4.0's version save the final portion and slight order differences. Instead of an elevator problem at its end, a problem involving Hawaiins and boats is to be solved.

  2. Multiprogramming (User Programs)

    Again, this project is the same as 4.0's version save two differences: User programs need not be written since a variety are already available, and multilevel feedback scheduling is not an option.

  3. Virtual Memory

    First, do as in 3.4. Secondly, implement lazy loading of user programs.

  4. Distributed Systems

    Identical to 4.0 in assignment, though the methods are different.

Issues

When working with Nachos my group repeatedly found flaws which made it much more difficult to work with than it should have been. Some might argue it is more educational to retain incidental flaws for the educational value of solving them. You may review the problems I list below and determine if this is accurate in all cases. Regardless, review of these flaws should be instructional to anyone working an operating system or machine simulator, including Nachos.

Toy Soldiers

See also: "Issues: Strings? What're those?" below which was resolved before this issue was discovered.

The greatest annoyance I had with Nachos wasn't even a design flaw or unfixable. My problem was that my group cumulatively spent hundreds of hours working to get it to resemble a somewhat realistic operating system, and, after all of that effort, it couldn't even handle a trivial clone of expr that I wrote. The problem? We had yet to implement virtual memory, so the largest program Nachos would accept was 3k! You're not misreading, that's 3072 bytes.

After spending days of blood, sweat, and tears, we had developed a massive, futuristic arena complete with chainsaws, rabid monkeys, and a sliding window network protocol for our wargames, and we were left to play with toy soldiers. This happened because, by default, Nachos 3.4 and 4.0 provide a grand total of 32 pages of memory, each 128 bytes in size. Further, 4 of these pages are used for each program's stack, and one is used for the stub that starts the executable and maps functions to syscalls. Luckily, Nachos 5j greatly expands memory available for projects prior to the implementation of virtual memory. The hard statistics are below:

  Nachos user program size limitations
             3.4/4.0   5j
 ..........................
 Page size::  128B    1024B
     Pages::    32    varies for each project (2:64, 3:16, 4:16)
StackPages:: 4 8 TLB Size:: 4 4

Memory restriction problems can be fixed in 3.4/4.0 by changing the #define for NumPhysPages found in machine.h. Unfortunately, this necessitates recompiling nearly all of Nachos. It would be wonderful if this were a global const int, so that memory size could be adjusted without going through a full build cycle. Nachos 5.0j simply makes it a variable read from a configuring file, an even more refreshing solution.

Also neglected for Nachos 3.4 and 4.0 is a standard library. Simple string functions such as strlen(), strcat(), atoi() are strangely absent. Although it is a small pain in the side to write these, it is survivable. In fact, it can be seen as a learning experience. But implementing one's own standard library without dynamic link libraries only exacerbates problems with limited memory. Nachos 5.0j saw fit to provide a relatively complete standard library including an implementation of printf().

Of course, after fixing all of this, my expr still wouldn't actually add because Nachos 3.4/4.0 doesn't completely emulate the LWR instruction. Yet again, Nachos 5.0j fixes this.

Done Already?

A sample run of Nachos may produce the following output:

HHHHHeeeeellllllllllooooo     WWWWWooooorrrrrlllllddddd!!!!!
No threads ready or runnable, and no pending interrupts.
Assuming the program completed.
Machine halting!

Ticks: total 8330, idle 0, system 8330, user 0
Disk I/O: reads 0, writes 0
Console I/O: reads 0, writes 0
Paging: faults 0
Network I/O: packets received 0, sent 0

Cleaning up...

Note how it detects that it is finished and stops, printing valuable statistics -- important statistics you often need to know when debugging. Unfortunately, Nachos sometimes predicts the end of execution prematurely. Nachos determines it is finished when a thread sleeps, there is zero or one pending interrupt, and all other threads are sleeping. This decision is made in Interrupt::CheckIfDue() found in code/machine/interrupt.cc called from Interrupt::Idle() called from Thread::Sleep().

The reason for "zero or one" interrupts being the end is because of a "clock" interrupt that is often present. See Design:Deterministic above for an explanation.

Unfortunately, the "zero or one" behavior causes problems if one runs Nachos without the -rs switch. In our case, we were solving a problem involving an elevator and passengers. The elevator would be called at a time in the future via a scheduled interrupt...er, passenger. Unfortunately, the elevator would never pick up the last passenger. The elevator process would sleep, waiting to be awoken by an interrupt, the next-event interrupt handler would see only one remaining interrupt and our kernel would terminate. Whoops.

Ultimately, this problem can be fixed, but two things need to be done. First, hardware and software interrupts must be given separate queues. Secondly, a blocked state for threads must be added which threads will enter when waiting on an interrupt. The end of execution would then occur when all threads are sleeping and the software interrupt queue is empty. That solution is overkill for this one case — we solved it by always running with the -rs switch. But it came back to bite us.

As with most issues, this problem did not carry over to the Java version of Nachos. In 5.0j, the machine is always explicitly terminated when it needs to be finished.

Never Ending Execution

Normally Nachos prints out useful statistics when it finishes execution. Unfortunately, it only prints these statistics on a normal exit. If one presses Ctrl-C to terminate Nachos, it will simply print:

Cleaning up...

Then it exits. This is an issue because of the way in which Nachos detects that it has finished execution. Simulation end is detected when there is zero or one pending interrupt and no ready threads. But each simulated piece of hardware adds an interrupt. For the first project, there is just the hardware timer, so this works fine. Once a console, network, or filesystem is added, a new issue arises. Nachos never detects that it should stop. Therefore, one has to explicitly call Machine::Halt() to end the simulation. This, combined with the "Done Already?" issue, makes Nachos' detection of its finish mostly useless.

Since Nachos 5.0j doesn't try to detect the end of simulation, this is not an issue with it at all.

Have the Time?

A small amount of statistics are kept about Nachos' operation to aid in debugging purposes and performance comparisons. One of these statistics is totalTicks in the Stats class in Java and in the Statistics class in C++. This statistic is also used as the current machine time. In Nachos 3.4/4.0 it a signed int while it is a Long in 5.0j. On a 32-bit machine, this allows the value to hold values around 2.15 million. However, if a signed number is near its maximum value and something is added to it, it will overflow and become a very large negative number.

All versions of Nachos eventually lock up in programs which accept user input given enough time. The problem is that the interrupt simulator continually advances to the next interrupt. When user input is being waited for and all threads are idle, time continually advances to the next interrupt which is often the interrupt which checks for console input. Since interrupts are scheduled to occur at the time totalTicks + fromNow, once totalTicks is advanced far enough, interrupts will inevitably be scheduled in the past. Once this happens, Interrupt::Idle() in C++ or Interrupt.checkIfDue() in Java will enter an infinite loop.

The recurring console read interrupt will be run because its time to run is less than the current time. The interrupt handler will then schedule the interrupt again, to occur at some time from the current time. Due to overflow, that scheduled time will also be negative. Therefore, that interrupt handler will then be run as well, rescheduling itself at the exact same time. This will continue ad infinitum and Nachos will have to be forcefully terminated.

The situation can be improved by using unsigned long longs. As it stands, it takes approximately two minutes on a reasonably fast Pentium 4 for Nachos 3.4 to lock up in this fashion; with an unsigned long long, this time would extend to about 32,000 years. This is likely a complete enough solution, but an ultimate solution to this problem would be to make all of the tick values be BigNum objects which dynamically resize themselves to prevent overflow.

Strings? What're those?

Nachos 3.4 and 4.0 utilize NOFF executable files, described above in Design:Special Executable Format. To make it easier to convert COFF to NOFF, when user programs are linked, a script is passed which tells ld to place nearly everything into the .data and .text sections. Unfortunately, some things are missed by this script. For instance, constant strings and jump tables for case statements. gcc places these symbols in a section called .rodata which stands for read only data. The result is that the following program can not be used in stock Nachos:

#include "syscall.h"

int main()
{
   Write("Hello World!\n", 13, ConsoleOutput);
   Exit(0);
}

The solution to this problem is to add a single line in the script file, reproduced below in unified diff format.

   .data  . : {
     *(.data)
+    *(.rodata)
     CONSTRUCTORS
   }

Because Nachos 5.0j uses COFF files directly, it does not have this issue.

Sorting?

Nachos provides a sample user program to use for testing. It sorts an array of 1024 integers using a bubble sort. This is designed to test the effectiveness of one's virtual memory system. Aside from the fact that it runs incredibly slowly, it also is incorrect in versions 3.4 and 4.0 of Nachos. The code for main() is below:

int A[1024];    /* size of physical memory; with code, we'll run out of space!*/

int
main()
{
    int i, j, tmp;

    /* first initialize the array, in reverse sorted order */
    for (i = 0; i < 1024; i++)		
        A[i] = 1024 - i;

    /* then sort! */
    for (i = 0; i < 1023; i++)
        for (j = i; j < (1023 - i); j++)
	   if (A[j] > A[j + 1]) {	/* out of order -> need to swap ! */
	      tmp = A[j];
	      A[j] = A[j + 1];
	      A[j + 1] = tmp;
    	   }
    Exit(A[0]);		/* and then we're done -- should be 0! */
}

The observant reader will note there are several errors with this code. One, the array is filled with values from 1024 to 1, so if it properly sorted, the program would return 1, not 0. Two, it oversteps the bounds of the array A by comparing A[1023] with A[1024] the first time through the first for loop. Three, it doesn't sort properly because it shortens its sorting by two elements instead of one each pass through the first for loop. The line:

        for (j = i; j < (1023 - i); j++)
should read:
        for (j = 0; j < (1022 - i); j++)

Nachos 5.0j not only solves these problems in its provided sort.c, but it also verifies whether the sort was correct. Then it returns either 0 to indicate success or 1 to indicate failure.

End of File

While debugging an interactive program, I frequently find it convenient to use a file for input rather than typing. This helps insure that tests remain consistent and saves me from much odious typing. A small side effect of doing this is that when the end of the input file is reached, an EOF is sent to the running program. One can manually send an EOF by pressing Ctrl-D, but this is normally not useful.

Unfortunately, either way of sending an EOF will cause Nachos 3.4 and 4.0 to die with an assertion failure. The problem is that select() is used to check whether input is available to be read. When select() returns a positive value, it indicates that an attempt to read input will not block. This means that there is either input available or the pipe is closed — an end of file. Nachos does not account for the possibility that an EOF was received; it attempts to read a single byte from input, finds zero bytes were read, and causes an assertion failure.

A solution to this problem for Nachos 3.4 would be to alter Console::CheckCharAvail() in console.cc as follows:

    // do nothing if character is already buffered, or none to be read
    if ((incoming != EOF) || !PollFile(readFileNo))
        return;   

    // otherwise, read character and tell user about it
-    Read(readFileNo, &c, sizeof(char));
+    int rBytes = ReadPartial(readFileNo, &c, sizeof(char));
+    // On EOF, return.
+    if (rBytes == 0)
+       return;

    incoming = c ;

Not having worked with Nachos 4.0, I have not developed a solution for this problem for that version. Since its code is very similar to 3.4, it should require a similar fix in ConsoleInput::CallBack() in console.cc.

As with most issues, Nachos 5.0j gracefully handles this one. EOFs are silently discarded already.

References

  1. Dr. Thomas E. Anderson's Nachos Page
    http://www.cs.washington.edu/homes/tom/nachos/
    previously located at: http://www.cs.berkeley.edu/~tea/nachos/
  2. Early Overview Paper on Nachos
    http://www.cs.washington.edu/homes/tom/nachos/nachos.ps
  3. "A Guide to Nachos 5.0j"
    Previously at http://www-inst.eecs.berkeley.edu/~cs162/projects/walk/
    http://www-inst.eecs.berkeley.edu/~cs162/Nachos/walk/walk.html
    As of 2005 November 17
  4. Thomas Narten's Dated Homepage at Duke University
    http://www.cs.duke.edu/~narten/
  5. "A Road Map Through Nachos", Narten, Thomas
    http://www.cs.duke.edu/~narten/110/nachos/main/main.html