CS
161 Programming Project Handout
Spring
2002
Overview
In the UNIX operating system, process scheduling is done through a
technique called round-robin with multi-level feedback
[Bach86]. At every time unit, the process having highest priority is
picked to run. After the completion of its alloted time cycle, it is
context-switched out (with an adjusted priority value), and the cycle
repeats again. Processes are maintained in a dynamic priority queue.
Therefore, a basic scheduler might implement the following algorithm:
while (no process is chosen to execute)
{
pick highest priority process from queue;
if ( no such process exists)
idle until interrupt is received;
}
remove chosen process from queue;
re-insert currently running process into priority queue with
adjusted priority.
context switch to chosen process;
In addition, the scheduler would also execute the following algorithm once
every few clock ticks:
for each process in the queue
{
new priority <- f(old priority, recent CPU usage)
}
This is used to implement a notion called aging, where the
priority of an older process gradually increases with time, to
increase its chances of being selected for execution.
Note: This model is a simplified version of the actual scheduler
implementation in UNIX, but it captures some essential details.
The goal of the programming project is to implement and document a
a simple UNIX scheduler that is built over a dynamic priority queue data
structure. What we will provide you with is the following:
- A sample implementation of a priority queue using sorted,
doubly-linked circular lists.
- A template for a scheduler driver as described above.
- Sample test inputs to allow you to check your code.
- A library of routines to manage the comparison and update of
priorities.
- Various other scripts and utilities to assist in code development.
For more details, see the Code Details section.
What you will hand in is the following:
- Routines for manipulating a priority queue
- A simple scheduler driver that implements the schematic described
earlier.
- Documentation for the above.
We will run your scheduler on data similar to the test data that will be available
to you. The course staff will evaluate your work on the basis of several factors;
the primary ones will be:
- correctness,
- speed, as measured relative to our implementation,
- code readability and in-line documentation (embedded comments),
- write-up quality.
You may work on your project alone or in a group of up to three
partners; only one submission per group is required. You must submit your work
by 9:30 am on Tuesday, 4 June, 2002. No late submissions will be accepted under
any conditions. A competitive implementation of a known data structure
should take a few evenings of work.
If you have any questions, please contact the TAs or the
instructor. To help us help you efficiently, make sure that:
- you first contact Distributed Computing
Consulting for help with maintaining your leland account or using computing
resources available at Sweet Hall, and
- you stay tuned to the announcements and project FAQ WWW pages, where corrections, clarifications,
and answers to common questions will be posted. Important announcements will
also be mailed to the class.
Data Structures
Many data structures can support the priority queue operations defined
in Section 7.5 of CLR or Section 6.5 of CLRS, or even the extended set of mergeable
heap operations listed in the beggining of Chapter 20 in CLR, Chapter 19
in CLRS. CLR(S) describes several implementations of priority queues in varying
levels of detail. Pseudocode is provided for some, while for others only a brief
description is given. Many other ideas for priority queues and variants have appeared
in the literature. Here is a list of some relevant data structures:
- Doubly linked lists. CLR, Section 11.2, CLRS, Section 10.2.
- Heaps, CLR, Chapter 7, CLRS, Chapter 6.
- Binomial Queues
- Brown, M. Implementation and analysis of binomial queue algorithms, SIAM
Jour. Computing 7,3 (Aug. 1978), 298-319.
-
Vuillemin, J. A data structure for manipulating priority queues,
Comm. ACM 21,4 (April 1978), 309-315.
- CLR, Chapter 20, CLRS, Chapter 19.
- Leftist Trees
-
Crane, C. A.
Linear lists and priority queues as balanced binary trees,
Technical Report STAN-CS-72-259, Computer Science Dept., Stanford
U., Stanford, CA, 1972.
-
Knuth, D.
The Art of Computer Programming, Vol. 3: Sorting and Searching,
Addison-Wesley, Reading, MA, 1973.
- Tarjan, R. Data Structures and Network Algorithms, Series: CBMD-NSF
regional conference in applied mathematics, vol. 44, Philadelphia, 1983.
- Pagodas
- Francon, J., Viennot, G., and Vuillemin, J. Description and analysis of
an efficient priority queue representation, Proc. 19th Annual Symp.
on Foundations of Computer Science, IEEE, Piscataway, NJ, 1978, 1-7.
- Fibonacci Heaps
-
Fredman, M. L., and Tarjan, R. E.
Fibonacci heaps and their uses in improved network
optimization algorithms, Jour. ACM 34,3 (July 1987), 596-615.
- CLR, Chapter 21, CLRS, Chapter 20.
You may use any of the above data structures, use data structures
described elsewhere, or invent your own, possibly by combining
multiple familiar ones. The Union (or merge) operation will
not be used in any of our test data, and need not be
implemented. Throughout the sequel we will use the term heap
to refer to your priority queue structure, no matter how you have
chosen to really implement it.
We have provided a sample implementation of the scheduler that
implements the priority queue as a sorted, doubly linked, circular
list, which you can use to compare your data structure against.
Structure of the problem
Our heaps are dynamic (i.e. of varying size) collections of
elements, each comprising two fields:
- A unique element identifier (id), stored as an unsigned
int
- a field that we will refer to as the priority of the
element. All structures to be considered should provide immediate
access to the present element of highest priority, so the top of the
heap always contains the element having the maximum priority
value.
In an implementation, elements are stored in nodes. A node
may comprise other, implementation-specific fields in addition to a
priority and an id. Also, a node may contain the special field
SENTINEL if it does not represent an element, in which case
the id field can be used to define alternate varieties of
sentinel nodes.
Your heaps must support the operations listed below; this is the list
of page 400 of CLR, minus the Union (merge) operation. All
operations, unless stated otherwise, may assume that
- their element argument E is a non-zero pointer to a
non-sentinel node whose associated element is in the heap, and
- their priority argument K is not the special field
SENTINEL.
Operation
|
Description
|
Insert(K,I)
|
Inserts the element with priority K and id I,
into the heap. If an item with the same priority is already in the
heap, the one with the higher id value is treated as if it had the
higher priority.
|
Delete(E)
|
Removes *E (i.e. the element represented by the node
E points to) from the heap.
|
Increase-Key(E,K)
|
Resets the priority of the node pointed to by E in the heap
to the new value K, which is assumed to be greater than the
current priority of that node.
|
Maximum()
|
Returns the element (i.e. a pointer to the node) with the highest priority
in the heap, or NULL if the set is empty.
|
Extract_Max()
|
Deletes from the heap the element containing the highest priority, and
returns a pointer to that node.
|
Optimizing performance
A major factor in deciding which data structure will perform best
is the expected mixture of operations that will be performed on
it. A fairly realistic assumption to make is that process arrival
times are governed by a Poisson distribution of mean R, and
process durations are drawn from an exponential distribution of mean
aR. We will supply you with sample data sets drawn from such (a,R)
distributions.
We will compile and run your code on more elaborate tests than the
provided ones. These tests will have the same distributions of
operations as described, but there is no guarantee that they
will occur in the same order; hence caching schemes you may implement
are not guaranteed to exhibit the same behavior. Also, we will test
you code multiple times on the same inputs, allowing randomized
algorithms to do well.
Almost any data structure listed above, with the exception of our
lists, will make a competitive submission. Cleverness in coding,
however, can make a difference: you may find it helpful to browse the
short book by Jon L. Bentley called Writing Efficient
Code, available on reserve in the Math library; an extensive
summary is available on the WWW. Inline assembly, although
allowed, is probably a waste of time since
- modern compilers produce much better machine code that humans,
- trying to maintain and debug inline assembly is a pain, and
- your choice of data structure and algoritm, as well as the
quality of your writeup, matter much much much more than low level
coding.
Still, providing reasonable hints to the compiler, e.g. via clever
loop reordering or using the register qualifier (see
Bentley's book), is a good compromise between the competing attitudes
"the compiler knows it all" and "the compiler knows scrap".
Your code will be evaluated using the provided 161make script,
which compiles your code using our Makefile. No other
optimization flags - besides those in our Makefile - will
be used for compiling and linking your code during grading.
Turning in your work
Your project has to be submitted by 9:30 am, Tuesday, 4 June, 2002. Here are detailed
submission instructions, including a description
of the write-up we require.
Platforms
Your code must be written in C. The speed will be judged on a Sun Sparc
computer, so your code should run on such a machine, although you may wish to
program and debug it on another. If you choose to use another platform for development,
make sure that you optimize its performance on a Sun Sparc; also, before submitting
your work, make sure your code compiles and executes - producing correct results
- on one of the Sun Sparcs in Sweet Hall, such as elaine1 through elaine43.
For further information on these machines, see Distributed Computing Consulting.
If you don't yet have an account on these machines, consult the Distributed Computing Group
Accounts Information WWW page for information on how to obtain
one.
If you are not familiar with software development on a UNIX environment, don't
fret. Our provided sample code and development environment should ease your
task.
References
[Bach86 ]
M. J. Bach.
The Design of the Unix Operating System.
Prentice Hall, 1986.
Happy Hacking!
last updated by Guibas on May 6, 2002