disk scheduling - Everything2.com

Disk scheduling is the problem of deciding which particular request for data by your computer from your hard drive (or other storage medium) should be serviced first. In other words, when several programs are trying to pull data from your hard drive at once, which one gets the data first? It is a fundamental problem in operating system scheduling in terms of minimizing the wait for the user when he or she simply wants to do some things that require information from the hard drive.

To understand the discussion to follow, the basics of how information is stored on a hard drive must be understood. It's easiest to think of a hard drive as a bunch of circles, one inside the other; each circle is called a cylinder or a track (depending on context that we don't need to worry about right now). Each of these circles is actually a line of 1's and 0's; in other words, binary information. For this example, we will refer to each cylinder by a number, indicating how far it is from the center of the circle; you can think of this as perhaps the length of the radius of any cylinder. The reason that solving this problem is important is that it takes a certain amount of time for the head of the hard drive to move from cylinder to cylinder, so significant time is saved by figuring out ways to minimize the movement of the head.

There are several approaches for solving this problem. The simplest and most naive solution is the first-come first-served algorithm. With this scheme, requests are processed in the order that they arrive. This is very easily implemented with a FIFO queue; when tasks come in, they are put at the end of the queue, and when the hard drive is ready to deliver more information, it pulls the request from the start of the queue.

The average waiting time for this technique, though, is often very long, and that is the big drawback. Let's take, for example, four requests that arrive in this order:

Request   Cylinder
-------   --------
  R-1       100
  R-2        20
  R-3       112
  R-4        18

Now, if we serve these requests up in order, we get these wait times.

Moving from R-1 to R-2 requires movement over 80 cylinders.
Moving from R-2 to R-3 requires movement over 92 cylinders.
Moving from R-3 to R-4 requires movement over 94 cylinders.

Thus, the average distance travelled is (80+92+94)/3, or 89 cylinders. Now, if we switched R-2 and R-3 in the order, then we could have a different movement arrangement.

Moving from R-1 to R-3 requires movement over 12 cylinders.
Moving from R-3 to R-2 requires movement over 92 cylinders.
Moving from R-2 to R-4 requires movement over 2 cylinders.

With just one simple change, the average distance travelled by the head is (12+92+2)/3, or 35 cylinders, a savings of 54 cylinders of movement on average. Clearly, this is not the best scheme to use.

The next idea that comes to mind is the shortest-seek-time-first algorithm, where the queue is sorted based on which has the shortest seek time in comparison to the request before it. In this case, we would sort the requests in the example above in this order (assuming that we started with R-1):

Request   Cylinder
-------   --------
  R-1       100
  R-3       112
  R-2        20
  R-4        18

This results in the average movement of 35 cylinders per request figure we calculated earlier, a significant improvement over the original. However, with this algorithm, we run into the problem of one request being ignored because every other request is nearer to the drive head. For example, if virtually all requests were between cylinders 16 and 24, but one request came in that was at cylinder 100, the latter request would constantly be pre-empted and would have to wait a long time to be serviced, if ever.

This algorithm also isn't optimal. Say, for instance, that we swapped R-1 and R-2; this would increase our savings by roughly 4 cylinders per request on average. The big advantage of this scheme is that it is quite quick and is also easy to implement.

Another algorithm is the SCAN algorithm, where the head starts at one end of the disk and moves to the other, going back and forth. If that were the case, the requests above could be sorted from least to most and be accessed in that fashion, meaning that the specific example above would happen quite quickly.

But what happens if, say, as cylinder 20 is being accessed, a request for cylinder 18 pops up? The drive head has to travel all the way to the edge of the disk (which might be 200 cylinders wide) and back again to 18, meaning that request will come slowly. If you visualize an even distribution of requests, it will become clear that when a drive head is at one edge of the disk, the majority of the requests are near the other edge of the disc. So why not go there first?

The C-SCAN (or circular SCAN) algorithm does just that. When the edge of the disc is reached, it returns to the opposite edge without dealing with any requests, and then starts again from there. This provides a slight speedup over the SCAN algorithm, and is thus preferable to it.

Another issue is worth looking at here: why go to the edge of the cylinders if there is no data near the edges? The LOOK and C-LOOK algorithms act the same as SCAN and C-SCAN, except they stop if there are no requests waiting that need cylinders between the just-processed request and the edge of the disc. If most of the data is in the middle, the LOOK algorithms provide another notable speedup.

Which algorithm is best? It depends mostly on the data use on your hard drive. Most operating systems select one of these depending on the shape that your hard drive is in in terms of data distribution. Together, however, these algorithms provide many robust ways for serving the data request needs of your operating system.

Low-level formatting an IDE hard drive	Trace Cache	CPU scheduling	process scheduling
Hard disk vibrations and how you can stop them	GUI Performance	block scheduling	FIFO
cylinder	queue	hard drive	radius
circle	circular	distribution	binary
algorithm	Operating system