From: Bill Dally [billd@csl.stanford.edu]
Sent: Friday, October 12, 2001 7:33 AM
To: Ben Serebrin; Ian Buck
Cc: William R Mark; hanrahan@cs.stanford.edu; Ian Andrew Buck;
billd@cva.stanford.edu
Subject: RE: Conditional code?

Ian,

We're all agreed that on the software side you should be able to express
any "C" control construct in a kernel: if, for, do, while, subroutine calls.

How to efficiently implement this is an interesting research 
question.  Instruction bandwidth is *very* expensive - it costs more to 
deliver an instruction to an ALU than the ALU costs (even a 64-bit FP ALU).

Ben is looking at a bunch of alternatives ranging from full SIMD to full 
MIMD.  To evaluate them he needs some benchmarks.  If all we have to do is 
Bresenham, then SIMD+conditional streams will work just fine - we run a lot 
of kernels just like this on Imagine with very high efficiency.  However, I 
had the impression from Pat that there were applications with more complex 
conditional structures in the inner loop.
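
To make the Bresenham case concrete, here is a minimal sketch of lockstep
execution with a per-lane active mask. It illustrates plain predication only
and is not Imagine's conditional-stream mechanism. The function name
rasterize_lockstep and the lane model are assumptions made for illustration.
Each trip through the outer loop stands for one broadcast of the loop body to
all lanes, so lanes that finish early hold idle slots until the longest line
in the batch is done.

```python
def rasterize_lockstep(lines):
    """Rasterize one line per lane under a single shared instruction stream.

    lines: list of (x0, y0, x1, y1) integer endpoints, one per lane.
    Returns (pixels per lane, useful lane-slots, issued lane-slots).
    Hypothetical model: one outer-loop trip = one broadcast of the loop body.
    """
    n = len(lines)
    x = [ln[0] for ln in lines]
    y = [ln[1] for ln in lines]
    x1 = [ln[2] for ln in lines]
    y1 = [ln[3] for ln in lines]
    dx = [abs(x1[i] - x[i]) for i in range(n)]
    dy = [-abs(y1[i] - y[i]) for i in range(n)]
    sx = [1 if x[i] < x1[i] else -1 for i in range(n)]
    sy = [1 if y[i] < y1[i] else -1 for i in range(n)]
    err = [dx[i] + dy[i] for i in range(n)]
    active = [True] * n
    pixels = [[] for _ in range(n)]
    useful = issued = 0

    while any(active):
        issued += n                      # every lane occupies a slot
        for i in range(n):
            if not active[i]:            # masked-off lane: slot is wasted
                continue
            useful += 1
            pixels[i].append((x[i], y[i]))
            if x[i] == x1[i] and y[i] == y1[i]:
                active[i] = False        # this lane's line is complete
                continue
            e2 = 2 * err[i]              # standard all-octant Bresenham step
            if e2 >= dy[i]:
                err[i] += dy[i]
                x[i] += sx[i]
            if e2 <= dx[i]:
                err[i] += dx[i]
                y[i] += sy[i]
    return pixels, useful, issued


if __name__ == "__main__":
    batch = [(0, 0, 7, 3), (0, 0, 2, 2), (5, 5, 5, 5), (0, 0, 0, 4)]
    _, useful, issued = rasterize_lockstep(batch)
    print(f"lane utilization: {useful}/{issued}")
```

With lines of similar length the mask costs little, which matches the
experience described above. The cost grows as per-lane trip counts diverge,
and that is the case a benchmark set would need to exercise.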

If you know of such programs, it would help a lot to put a good benchmark 
set together.

----Bill

PS.  This comparison of control alternatives will also make a great ISCA 
paper and perhaps a chapter of Ben's thesis.


At 12:41 AM 10/12/01 -0700, Ben Serebrin wrote:
>Ian,
>
>I think I understand from your mail that you mean each cluster should
>independently be able to do all these things that C can do.  I agree that
>we have plenty of logic gates for MIMD properties, but as soon as you
>start spending gates on memory, you run into trouble.  As soon as we need
>to have separate instruction memories for each of N clusters, we have
>N*(instruction memory size) area used up for instruction memory. It's
>likely that programs are big, so we'd need to provide a lot of memory.
>
>Since streams benefit from data parallelism, it's likely that the work
>being performed on streams will often be the same or similar in each
>cluster, so complete independence of all the clusters may not be
>necessary.
>
>I'm working on a document now based on a discussion Bill Dally and I had
>at lunch today, looking at the various shapes MIMD can take.  I'll send
>the completed version to you tomorrow, hopefully.  But as a preview, the
>most interesting form to me is the following:
>
>As in Imagine, there is a large, shared instruction memory that broadcasts
>instructions to all clusters.  Each cluster will have its own PC and its
>own small instruction memory.  The PC can address either the current
>instruction word coming from the shared memory, or the PC can address the
>small cluster-local instruction memory.  At a conditional branch, a
>cluster may branch into (or out of) its local memory.
>
>This solves the problem of having N large instruction memories, and takes
>advantage of the fact that any deviations from the main program are likely
>to be small variances followed by a merge back into the main program.
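
As a rough check on the area argument, the sketch below compares total
instruction storage for N full private memories against one shared memory plus
N small local ones. The function names and the sizes in the example (8
clusters, a 2048-word program, 64-word local memories) are hypothetical
numbers picked for illustration, not figures from Imagine or from this thread.

```python
def replicated_words(n_clusters, program_words):
    """Storage if every cluster holds its own full copy of the program."""
    return n_clusters * program_words


def hybrid_words(n_clusters, program_words, local_words):
    """Storage for one shared program memory plus a small memory per cluster."""
    return program_words + n_clusters * local_words


if __name__ == "__main__":
    n, program, local = 8, 2048, 64      # hypothetical sizes
    full = replicated_words(n, program)
    hybrid = hybrid_words(n, program, local)
    print(f"replicated: {full} words, hybrid: {hybrid} words")
    print(f"ratio: {full / hybrid:.1f}x")
```

The advantage holds only while the local memories stay small relative to the
program, which is the assumption about short deviations stated above.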
>This could, for example, allow a cluster that is rasterizing a large
>triangle to work continuously while other clusters fetch the next
>triangle.  In current Imagine code, the clusters that are loading share
>instructions with the clusters that are still working, so the working
>clusters must stall each time the finished clusters require a data load.
>
>Synchronization issues need to be addressed.  Also, if we keep the SRF
>form, the independently-running clusters will still need to coordinate
>their SRF accesses (since the SRF is ganged--there's only one address).
>An alternative to this may be N side-by-side RAMs with separate addresses
>that can act either as an SRF, or as an independent set of RAMs.  There
>seems to be some problem with this last notion, but I can't see it yet.
>
>Ben
>
>
>On Thu, 11 Oct 2001, Ian Buck wrote:
>
> >
> >   Basically, you should be able to do all that C can do: loops, data
> > conditionals, branches, perhaps even recursion.  You should look at the
> > line rasterizer in the Brook document, which has the first three.  I
> > think we've reached a point in chip design where you're so limited by
> > the pinout (the pad ring determines die size) that the extra gates
> > needed for MIMD should be free.
> >
> >   http://graphics.stanford.edu/streamlang/brook_v0.1.pdf
> >
> >   Ian.
> >
> > > -----Original Message-----
> > > From: Ben Serebrin [mailto:serebrin@Stanford.EDU]
> > > Sent: Thursday, October 11, 2001 4:01 PM
> > > To: William R Mark; hanrahan@cs.stanford.edu; Ian Andrew Buck
> > > Subject: Conditional code?
> > >
> > >
> > > Hi, all,
> > >
> > > Bill Dally and I were talking today about the degree of MIMDism in the
> > > SSS; there's an interesting spectrum between SIMD and MIMD, and some of
> > > the midpoints are rather interesting.  I'm working on a memo comparing
> > > the architectures.
> > >
> > > One of the things that would beneficially drive my thinking is a few good
> > > examples of what kinds of conditional program code we might be looking
> > > at.  Any suggestions?
> > >
> > > Thanks very much!
> > > Ben
> > >
> >
>
>
>

--------------------------------------------------------------------------
Bill Dally                billd@csl.stanford.edu             (650)725-8945
Professor of Electrical Engineering and Computer Science  FAX(650)725-6949
Computer Systems Laboratory, Stanford University
Gates Room 301
Stanford, CA  94305                         http://csl.stanford.edu/~billd
--------------------------------------------------------------------------