Overview

In the UNIX operating system, process scheduling is done through a technique called round-robin with multi-level feedback [Bach86]. At every time unit, the process having highest priority is picked to run. After the completion of its alloted time cycle, it is context-switched out (with an adjusted priority value), and the cycle repeats again. Processes are maintained in a dynamic priority queue. Therefore, a basic scheduler might implement the following algorithm:

  while (no process is chosen to execute)
  {
      pick highest priority process from queue;
      if ( no such process exists)
	 idle until interrupt is received;
  }
  remove chosen process from queue;
  re-insert currently running process into priority queue with
      adjusted priority.
  context switch to chosen process;

In addition, the scheduler would also execute the following algorithm once every few clock ticks:

  for each process in the queue
  {
      new priority <- f(old priority, recent CPU usage)
  }

This is used to implement a notion called aging, where the priority of an older process gradually increases with time, to increase its chances of being selected for execution.

Note: This model is a simplified version of the actual scheduler implementation in UNIX, but it captures some essential details.

The goal of the programming project is to implement and document a a simple UNIX scheduler that is built over a dynamic priority queue data structure. What we will provide you with is the following:

A sample implementation of a priority queue using sorted, doubly-linked circular lists.
A template for a scheduler driver as described above.
Sample test inputs to allow you to check your code.
A library of routines to manage the comparison and update of priorities.
Various other scripts and utilities to assist in code development.

For more details, see the Code Details section.

What you will hand in is the following:

Routines for manipulating a priority queue
A simple scheduler driver that implements the schematic described earlier.
Documentation for the above.

We will run your scheduler on data similar to the test data that will be available to you. The course staff will evaluate your work on the basis of several factors; the primary ones will be:

correctness,
speed, as measured relative to our implementation,
code readability and in-line documentation (embedded comments),
write-up quality.

You may work on your project alone or in a group of up to three partners; only one submission per group is required. You must submit your work by 9:30 am on Tuesday, 4 June, 2002. No late submissions will be accepted under any conditions. A competitive implementation of a known data structure should take a few evenings of work.

If you have any questions, please contact the TAs or the instructor. To help us help you efficiently, make sure that:

you first contact Distributed Computing Consulting for help with maintaining your leland account or using computing resources available at Sweet Hall, and
you stay tuned to the announcements and project FAQ WWW pages, where corrections, clarifications, and answers to common questions will be posted. Important announcements will also be mailed to the class.

Data Structures

Many data structures can support the priority queue operations defined in Section 7.5 of CLR or Section 6.5 of CLRS, or even the extended set of mergeable heap operations listed in the beggining of Chapter 20 in CLR, Chapter 19 in CLRS. CLR(S) describes several implementations of priority queues in varying levels of detail. Pseudocode is provided for some, while for others only a brief description is given. Many other ideas for priority queues and variants have appeared in the literature. Here is a list of some relevant data structures:

Doubly linked lists. CLR, Section 11.2, CLRS, Section 10.2.
Heaps, CLR, Chapter 7, CLRS, Chapter 6.
Binomial Queues
- Brown, M. Implementation and analysis of binomial queue algorithms, SIAM Jour. Computing 7,3 (Aug. 1978), 298-319.
- Vuillemin, J. A data structure for manipulating priority queues, Comm. ACM 21,4 (April 1978), 309-315.
- CLR, Chapter 20, CLRS, Chapter 19.
Leftist Trees
- Crane, C. A. Linear lists and priority queues as balanced binary trees, Technical Report STAN-CS-72-259, Computer Science Dept., Stanford U., Stanford, CA, 1972.
- Knuth, D. The Art of Computer Programming, Vol. 3: Sorting and Searching, Addison-Wesley, Reading, MA, 1973.
- Tarjan, R. Data Structures and Network Algorithms, Series: CBMD-NSF regional conference in applied mathematics, vol. 44, Philadelphia, 1983.
Pagodas
- Francon, J., Viennot, G., and Vuillemin, J. Description and analysis of an efficient priority queue representation, Proc. 19th Annual Symp. on Foundations of Computer Science, IEEE, Piscataway, NJ, 1978, 1-7.
Fibonacci Heaps
- Fredman, M. L., and Tarjan, R. E. Fibonacci heaps and their uses in improved network optimization algorithms, Jour. ACM 34,3 (July 1987), 596-615.
- CLR, Chapter 21, CLRS, Chapter 20.

You may use any of the above data structures, use data structures described elsewhere, or invent your own, possibly by combining multiple familiar ones. The Union (or merge) operation will not be used in any of our test data, and need not be implemented. Throughout the sequel we will use the term heap to refer to your priority queue structure, no matter how you have chosen to really implement it.

We have provided a sample implementation of the scheduler that implements the priority queue as a sorted, doubly linked, circular list, which you can use to compare your data structure against.

Structure of the problem

Our heaps are dynamic (i.e. of varying size) collections of elements, each comprising two fields:

A unique element identifier (id), stored as an unsigned int
a field that we will refer to as the priority of the element. All structures to be considered should provide immediate access to the present element of highest priority, so the top of the heap always contains the element having the maximum priority value.

In an implementation, elements are stored in nodes. A node may comprise other, implementation-specific fields in addition to a priority and an id. Also, a node may contain the special field SENTINEL if it does not represent an element, in which case the id field can be used to define alternate varieties of sentinel nodes.

Your heaps must support the operations listed below; this is the list of page 400 of CLR, minus the Union (merge) operation. All operations, unless stated otherwise, may assume that

their element argument E is a non-zero pointer to a non-sentinel node whose associated element is in the heap, and
their priority argument K is not the special field SENTINEL.

Operation Description

Insert(K,I) Inserts the element with priority K and id I, into the heap. If an item with the same priority is already in the heap, the one with the higher id value is treated as if it had the higher priority.

Delete(E) Removes *E (i.e. the element represented by the node E points to) from the heap.

Increase-Key(E,K) Resets the priority of the node pointed to by E in the heap to the new value K, which is assumed to be greater than the current priority of that node.

Maximum() Returns the element (i.e. a pointer to the node) with the highest priority in the heap, or NULL if the set is empty.

Extract_Max() Deletes from the heap the element containing the highest priority, and returns a pointer to that node.

Operation	Description
`Insert(K,I)`	Inserts the element with priority `K` and id `I`, into the heap. If an item with the same priority is already in the heap, the one with the higher id value is treated as if it had the higher priority.
`Delete(E)`	Removes `*E` (i.e. the element represented by the node `E` points to) from the heap.
`Increase-Key(E,K)`	Resets the priority of the node pointed to by `E` in the heap to the new value `K`, which is assumed to be greater than the current priority of that node.
`Maximum()`	Returns the element (i.e. a pointer to the node) with the highest priority in the heap, or `NULL` if the set is empty.
`Extract_Max()`	Deletes from the heap the element containing the highest priority, and returns a pointer to that node.

Optimizing performance

A major factor in deciding which data structure will perform best is the expected mixture of operations that will be performed on it. A fairly realistic assumption to make is that process arrival times are governed by a Poisson distribution of mean R, and process durations are drawn from an exponential distribution of mean aR. We will supply you with sample data sets drawn from such (a,R) distributions.

We will compile and run your code on more elaborate tests than the provided ones. These tests will have the same distributions of operations as described, but there is no guarantee that they will occur in the same order; hence caching schemes you may implement are not guaranteed to exhibit the same behavior. Also, we will test you code multiple times on the same inputs, allowing randomized algorithms to do well.

Almost any data structure listed above, with the exception of our lists, will make a competitive submission. Cleverness in coding, however, can make a difference: you may find it helpful to browse the short book by Jon L. Bentley called Writing Efficient Code, available on reserve in the Math library; an extensive summary is available on the WWW. Inline assembly, although allowed, is probably a waste of time since

modern compilers produce much better machine code that humans,
trying to maintain and debug inline assembly is a pain, and
your choice of data structure and algoritm, as well as the quality of your writeup, matter much much much more than low level coding.

Still, providing reasonable hints to the compiler, e.g. via clever loop reordering or using the register qualifier (see Bentley's book), is a good compromise between the competing attitudes "the compiler knows it all" and "the compiler knows scrap".

Your code will be evaluated using the provided 161make script, which compiles your code using our Makefile. No other optimization flags - besides those in our Makefile - will be used for compiling and linking your code during grading.

Turning in your work

Your project has to be submitted by 9:30 am, Tuesday, 4 June, 2002. Here are detailed submission instructions, including a description of the write-up we require.

Platforms

Your code must be written in C. The speed will be judged on a Sun Sparc computer, so your code should run on such a machine, although you may wish to program and debug it on another. If you choose to use another platform for development, make sure that you optimize its performance on a Sun Sparc; also, before submitting your work, make sure your code compiles and executes - producing correct results - on one of the Sun Sparcs in Sweet Hall, such as elaine1 through elaine43. For further information on these machines, see Distributed Computing Consulting.

If you don't yet have an account on these machines, consult the Distributed Computing Group Accounts Information WWW page for information on how to obtain one.

If you are not familiar with software development on a UNIX environment, don't fret. Our provided sample code and development environment should ease your task.

References

[Bach86 ]
M. J. Bach.
The Design of the Unix Operating System.
Prentice Hall, 1986.

Happy Hacking!

last updated by Guibas on May 6, 2002