Brook Language

Brook is an extension of standard ANSI C and is designed to incorporate the ideas of data parallel computing and arithmetic intensity into a familiar, efficient language. The general computational model, referred to as streaming, provides two main benefits over conventional languages:

  • Data Parallelism: Allows the programmer to specify how to perform the same operations in parallel on different data.
  • Arithmetic Intensity: Encourages programmers to specify operations on data which minimize global communication and maximize localized computation.
More about Brook can be found at the Merrimac web site, which contains complete specifications for the language. BrookGPU implements a subset of the Brook specification for use on the GPU.

A Brook program consists of legal C code plus syntactic extensions to declare streams and denote given functions as kernels.

Streams
A stream is a new data type that represents a collection of data which can be operated on in parallel. Streams are declared with an angle-bracket syntax similar to arrays. A sample stream declaration looks like:
    float s<10, 10>;
to denote a two-dimensional stream of floats. Each stream is made up of stream elements; in this example, s is a stream consisting of 100 stream elements of type float. The shape of a stream refers to its dimensionality; here, s is a stream of shape 10 by 10. Streams are represented in row-major order, like C arrays. Therefore, a stream of shape <100> is equivalent to streams of shape <1, 100> and <1, 1, 100>.
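As a plain-C illustration (not Brook syntax), the sketch below computes where an element lands in the row-major layout, which is why shapes <100>, <1, 100>, and <1, 1, 100> describe the same sequence of elements. The helper name linear_index is hypothetical and not part of Brook or its runtime.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: row-major (C-style) linear offset of the element at
 * idx[0..ndims-1] in a stream of the given shape. The last dimension varies
 * fastest, so in shape <10, 10> the element [3][2] sits at 3 * 10 + 2 = 32. */
size_t linear_index(const size_t *shape, const size_t *idx, size_t ndims)
{
    size_t offset = 0;
    for (size_t d = 0; d < ndims; d++) {
        assert(idx[d] < shape[d]);   /* each index must lie inside the shape */
        offset = offset * shape[d] + idx[d];
    }
    return offset;
}
```

Leading dimensions of size 1 contribute nothing to the offset, so element [32] of <100>, [0][32] of <1, 100>, and [0][0][32] of <1, 1, 100> all map to offset 32.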

Although similar to C arrays, streams in BrookGPU have the following differences:

  • Indexing to gain access to stream elements (e.g., s[3][2]) is not allowed outside of kernels (discussed below).
  • No static initializers are allowed; for example, the following is illegal:
      float s<100> = {1.0f, 2.0f, ...
  • Streams must be local (stack) variables.
  • Streams can only be read and written inside kernel functions or through special operators that fill streams from regular pointers and vice versa:
      streamRead(s, data_s);    /* Fill stream s<> with the data at *data_s */
      streamWrite(s, data_s);   /* Fill the data at *data_s from stream s<> */
    Each of these operations effectively performs a copy into or out of the stream, though the copy may be optimized away by the compiler.
For GPU compiled code, streams correspond to regions of texture memory. The compiler may perform code flow analysis to better determine when and where to allocate texture memory, or to eliminate the need for temporary storage of a stream variable entirely.

For convenience, Brook also extends C to include float2, float3, and float4 as basic types. These types are equivalent to typedef'ed structs; for example, float4 corresponds to:

typedef struct {
  float x;
  float y;
  float z;
  float w;
} float4;
In addition, these types can be created using a constructor syntax:
float4 a (0.2f, 1.0f, 3.2f, 1.0f);  // x = 0.2f, y = 1.0f ...

Kernels
Kernels are special functions that operate on streams. A kernel is a parallel function applied to every element of its input streams. Calling a kernel function on a set of input streams performs an implicit for loop over the elements of the streams, invoking the body of the kernel for each element. With GPU-hosted kernels, the streams are pushed into video memory and the kernels are compiled down to fragment programs that 'render' from them. Kernel definitions closely resemble function definitions, except that the keyword kernel precedes the declaration, the return type is always void, and one of the stream parameters is marked with the type qualifier 'out'. Global memory and static variables are not accessible inside kernels. A sample kernel declaration looks like:

  kernel void k(float s<>, float3 f, float a[10][10], out float o<>) {...
In this example, the body of kernel k will be called on each element of input stream s. Inside the kernel body, the variable s has type float and holds a different element of s for each implicit invocation. The variable f is a constant parameter of the kernel, which keeps the same value for every iteration. Writes to either stream inputs or constant parameters are not permitted inside kernels.

The variable a is a 2D array which may be accessed via normal C-syntax array access. The dimension sizes of a do not need to be specified; however, the compiler may be able to generate faster code if they are provided. Any stream may be passed as an array input to a kernel.

The variable o is an output stream. The output stream is a write-only parameter which is assigned a value by the body of the kernel function. Inside the kernel body, o has the native type float. The kernel body is implicitly executed on each element of s, producing the elements of o.
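To make the implicit loop concrete, the following plain-C sketch shows what a call to a kernel with the same parameter list as k amounts to on a CPU. It illustrates the semantics described above and is not code produced by the Brook compiler; the function names k_body and run_k are hypothetical, and so is the kernel body (o = s * f.x + a[3][2]), since the declaration above elides it.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y, z; } float3;   /* plain-C stand-in for Brook's float3 */

/* Hypothetical body of kernel k for ONE element: s_elem plays the role of
 * s inside the kernel, f is the constant parameter, and a is a read-only
 * array that may be indexed freely. The return value becomes o. */
float k_body(float s_elem, float3 f, float a[10][10])
{
    return s_elem * f.x + a[3][2];
}

/* What calling k(s, f, a, o) amounts to: an implicit loop over every
 * element. Iteration i reads only s[i] and writes only o[i], so the
 * iterations are independent and could run in any order or in parallel. */
void run_k(const float *s, float3 f, float a[10][10], float *o, size_t n)
{
    assert(s != NULL && o != NULL);
    for (size_t i = 0; i < n; i++)
        o[i] = k_body(s[i], f, a);
}
```

Because no iteration depends on another, a GPU implementation is free to evaluate all of them at once, which is the data parallelism the kernel abstraction is meant to expose.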

Note that both kernel prototypes and bodies are limited to the subset of C/C++ supported by Cg / HLSL. This includes the vector float types, matrix types, and standard library functions. For more information, see the Cg language specification and the HLSL documentation.

Calling a kernel function is similar to calling any C function:

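  kernel void k(float s<>, float3 f, float a[10][10], out float o<>);

  float data_a[100], data_c[100], result[100];  // host arrays; data_a and data_c filled elsewhere
  float a<100>;
  float b<100>;
  float c<10,10>;
  float3 f (3.2f, 3.2f, 3.2f);

  streamRead(a, data_a);
  streamRead(c, data_c);

  // Call kernel "k": a is the input stream, c is passed as the array
  // argument, and b receives the output stream
  k (a, f, c, b);

  streamWrite(b, result);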

Reductions
Brook provides support for parallel reductions on streams. A reduction is an operation from one stream to another stream of smaller dimensions or to a single value. The operation can be defined by a single, two-input operator that is both associative and commutative. Given this property, the elements of the stream can be treated as an unordered set and the operator is applied to combine those elements in any order until it yields the reduced values.

Reduction functions only support reductions which are both associative and commutative, because the compiler must be free to compute the reduction in any order. For example, computing the sum of a stream is both commutative and associative:

  a+b+c+d = (a+b)+(c+d) = c+a+d+b

Examples of legal reductions are sum, product, min/max, and the bitwise OR, AND, and XOR operations. Examples of invalid reductions include subtraction and division. Note that the compiler does not necessarily validate that a reduction function is legal: the programmer is asserting to the compiler that the reduction function is valid. Specifying an invalid reduction results in undefined behavior.

Reduction functions are specified similarly to kernel functions, with a few additional restrictions. Below is a sample reduction function which computes the sum of a float stream.

  reduce void sum (float a<>,
                   reduce float result<>) {
    result = result + a;
  }
Reduction functions can take only two stream arguments: one input stream and one output stream labeled with the reduce keyword. The remaining arguments cannot be streams. In addition, the types of the two streams must match. The reduction function is allowed both to read and to write the reduce parameter in order to combine two values.

The reduce parameter passed to the reduction kernel can either be a scalar value or a stream. If the reduce parameter is a scalar, the reduction function is called on all of the stream elements to produce a single value which is placed in the reduce variable. The initial value of the reduction is defined to be the first value of the input stream.
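The plain-C sketch below shows why the operator must be associative and commutative: a parallel implementation may combine elements pairwise in a tree rather than strictly left to right. The function names (sum_op, sub_op, reduce_seq, reduce_tree) and the particular tree order are illustrative assumptions, not a description of how BrookGPU actually schedules a reduction. reduce_seq seeds the result with the first element of the input, as described above.

```c
#include <assert.h>
#include <stddef.h>

/* The operator of the sum reduction above: result = result + a. */
float sum_op(float result, float a) { return result + a; }

/* Subtraction: NOT associative or commutative, hence not a valid reduction. */
float sub_op(float result, float a) { return result - a; }

/* Left-to-right order, seeded with the first element of the input. */
float reduce_seq(const float *in, size_t n, float (*op)(float, float))
{
    assert(n >= 1);
    float result = in[0];
    for (size_t i = 1; i < n; i++)
        result = op(result, in[i]);
    return result;
}

/* Pairwise tree order, as a parallel implementation might use. Works in
 * place: after each pass the first half of buf holds the partial results,
 * and an unpaired last element is carried over to the next pass. */
float reduce_tree(float *buf, size_t n, float (*op)(float, float))
{
    assert(n >= 1);
    while (n > 1) {
        size_t half = n / 2;
        for (size_t i = 0; i < half; i++)
            buf[i] = op(buf[2 * i], buf[2 * i + 1]);
        if (n % 2 != 0)
            buf[half] = buf[n - 1];
        n = half + n % 2;
    }
    return buf[0];
}
```

With sum_op, both orders return 28 for the inputs 1 through 7; with sub_op, the sequential order gives -26 and the tree order gives 8, which is exactly the order-dependence that makes subtraction an invalid reduction. With floating-point data, even a legal reduction such as a sum can differ in the last bits between orders because of rounding.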

Multi-dimensional Reductions

If the reduce parameter is a stream, the relative shapes of the input stream and the reduction stream determine how the reduction is performed:
   float s<100,200>;

   float t;
   sum(s, t);   // Sum all elements of s, placing the result in t

   float u<100, 1>;
   sum(s, u);   // Sum across the "200" dimension of s

   float v<1, 200>;
   sum(s, v);   // Sum across the "100" dimension of s
The relative difference in the input dimensions and the reduce stream dimensions is resolved by calling the reduction kernel to "reduce" the elements. For example:
   float s<100,200>;

   float t<50, 20>;
   sum(s, t);
In this example, t differs by a factor of 2 in the "y" direction and a factor of 10 in the "x" direction. The reduction kernel is called on 2x10 tiles of s to produce a single element of t.

If the number of dimensions does not match, a compiler error is generated. A compiler error is also generated if a dimension of the reduction stream is larger than the corresponding dimension of the input stream, or does not divide it evenly.
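A plain-C sketch of the tiled case for 2D streams: each output element is the sum of one tile whose size is the ratio of the input and output dimensions, and the size checks described above are reported with a return value of -1 instead of a compiler error. The function name reduce_tiles_sum and the error convention are hypothetical; the dimension-count check does not arise here because both arrays are 2D.

```c
#include <assert.h>
#include <stddef.h>

/* Sum-reduce a row-major in_rows x in_cols array into out_rows x out_cols:
 * each output element is the sum of one (in_rows/out_rows) x
 * (in_cols/out_cols) tile of the input. */
int reduce_tiles_sum(const float *in, size_t in_rows, size_t in_cols,
                     float *out, size_t out_rows, size_t out_cols)
{
    assert(in != NULL && out != NULL);
    /* Rejected shapes: an output dimension that is zero, larger than the
     * input dimension, or that does not divide it evenly. */
    if (out_rows == 0 || out_cols == 0 ||
        out_rows > in_rows || out_cols > in_cols ||
        in_rows % out_rows != 0 || in_cols % out_cols != 0)
        return -1;

    size_t th = in_rows / out_rows;   /* tile height: the "y" factor */
    size_t tw = in_cols / out_cols;   /* tile width: the "x" factor  */

    for (size_t r = 0; r < out_rows; r++) {
        for (size_t c = 0; c < out_cols; c++) {
            const float *tile = in + (r * th) * in_cols + c * tw;
            float acc = tile[0];      /* first element of the tile seeds the sum */
            for (size_t i = 0; i < th; i++)
                for (size_t j = 0; j < tw; j++)
                    if (i != 0 || j != 0)
                        acc += tile[i * in_cols + j];
            out[r * out_cols + c] = acc;
        }
    }
    return 0;
}
```

Called with a 100 x 200 input and a 50 x 20 output, it sums 2 x 10 tiles; a 100 x 1 output sums each full row of 200 elements, matching the earlier u<100, 1> example.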

Using Stream Shape

In order to generate code which performs efficiently on today's graphics hardware, Brook for GPUs does not implement the majority of the stream operators present in the official Brook specification. Instead, the language uses the relative shapes of the input and output streams of a kernel call to perform these stream operators implicitly.

The operators streamStride and streamRepeat are combined and called implicitly based on stream shapes. For example:

  kernel void foo (float a<>, out float b<>);

  float a<10,20>;
  float b<50,10>;

In this example, the "y" dimension of the output stream b is 5x larger than the input stream a. However, the "x" dimension of b is half as large as the input stream.

The kernel foo is called once for each stream element in b. The stream a is implicitly resized to match the size of the output. In this example, we grow in the y direction by repeating elements (1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,...) and shrink in the x direction by eliminating every other element (1,3,5,7,9,...). The same resizing is applied to every stream input; array parameters are not affected.
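The following plain-C sketch spells out the repeat/stride mapping for the foo example, using 0-based indices (so the text's 1, 3, 5, ... appears as 0, 2, 4, ...). It assumes that along each dimension the output size is an integer multiple or divisor of the input size, as in the examples; the names source_index and resample are hypothetical, and the sketch makes no claim about BrookGPU's behavior for other size ratios.

```c
#include <assert.h>
#include <stddef.h>

/* Index of the input element (in a dimension of size in_n) that feeds
 * output position i (in a dimension of size out_n). Growing repeats
 * elements; shrinking skips them. */
size_t source_index(size_t i, size_t in_n, size_t out_n)
{
    assert(i < out_n);
    return i * in_n / out_n;
}

/* Resample a row-major in_rows x in_cols stream to out_rows x out_cols,
 * i.e. what a kernel input sees when its shape differs from the output. */
void resample(const float *in, size_t in_rows, size_t in_cols,
              float *out, size_t out_rows, size_t out_cols)
{
    for (size_t r = 0; r < out_rows; r++)
        for (size_t c = 0; c < out_cols; c++)
            out[r * out_cols + c] =
                in[source_index(r, in_rows, out_rows) * in_cols +
                   source_index(c, in_cols, out_cols)];
}
```

For a<10,20> and b<50,10>, rows 0 to 4 of b all read row 0 of a (repetition by 5), while column c of b reads column 2c of a (every other element).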

Iterator Streams

Iterator streams are a special stream type which are preinitialized with sequential values (1,2,3,4,...). Constructing an iterator stream uses the iter keyword and operator:
  iter float s<100> = iter(0.0f, 100.0f);
  // s initialized with 0.0, 1.0, 2.0, ..., 99.0
The first argument of the iter operator is always the initial value of the stream. The second argument is the upper bound of the stream, producing a uniform distribution over [initial, upper bound). The step size between stream elements is equal to (upper bound - initial) / number of elements in the stream.
  iter float s<100> = iter(0.0f, 1.0f);
  // s: 0.00, 0.01, 0.02, 0.03, ..., 0.99

  iter float s<10> = iter(2.0f, 7.0f);
  // s: 2.0, 2.5, 3.0, 3.5, ..., 6.5
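A plain-C sketch of the values a one-dimensional iterator stream holds, following the step-size rule above. The helper name iter_fill is hypothetical and not part of Brook or its runtime.

```c
#include <assert.h>
#include <stddef.h>

/* Fill out[0..n-1] with the values of  iter float s<n> = iter(initial, upper);
 * using step = (upper - initial) / n, so the values cover [initial, upper). */
void iter_fill(float *out, size_t n, float initial, float upper)
{
    assert(n > 0);
    float step = (upper - initial) / (float)n;
    for (size_t i = 0; i < n; i++)
        out[i] = initial + (float)i * step;
}
```

For n = 10, initial = 2.0f, and upper = 7.0f this produces 2.0, 2.5, ..., 6.5; the upper bound itself is never reached.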
One-dimensional iterator streams also work with the float2, float3, and float4 types. Each component is interpolated separately:
  iter float2 s<10> = iter(float2(0, 0), float2(10, 5));
  // s: (0,0), (1,0.5), (2,1), ..., (9,4.5).
The current compiler only supports 1D and 2D iterator streams. If the shape of the stream is 2D, the stream element type must be float2. In the 2D case, the interpolator works slightly differently:
  iter float2 s<4, 4> = iter(float2(0,0), float2(4, 10));
  // s:
  //   (0,   0)  (1,   0)  (2,   0)  (3,   0)
  //   (0, 2.5)  (1, 2.5)  (2, 2.5)  (3, 2.5)
  //   (0,   5)  (1,   5)  (2,   5)  (3,   5)
  //   (0, 7.5)  (1, 7.5)  (2, 7.5)  (3, 7.5)
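Each component of the stream element is interpolated between its initial value and upper bound along the corresponding dimension: the x component varies across each row (from 0 up to, but not including, 4) and the y component varies down each column (from 0 up to, but not including, 10).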
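For the 2D case, here is a plain-C sketch in which x is interpolated across the columns and y down the rows, each over its own half-open range; it reproduces the 4 x 4 table above. The float2 struct is a plain-C stand-in for Brook's built-in type, and iter_fill_2d is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y; } float2;   /* plain-C stand-in for Brook's float2 */

/* Fill a row-major rows x cols stream as described for
 *   iter float2 s<rows, cols> = iter(lo, hi);
 * x steps by (hi.x - lo.x) / cols across columns,
 * y steps by (hi.y - lo.y) / rows down rows. */
void iter_fill_2d(float2 *out, size_t rows, size_t cols, float2 lo, float2 hi)
{
    assert(rows > 0 && cols > 0);
    float step_x = (hi.x - lo.x) / (float)cols;
    float step_y = (hi.y - lo.y) / (float)rows;
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++) {
            out[r * cols + c].x = lo.x + (float)c * step_x;
            out[r * cols + c].y = lo.y + (float)r * step_y;
        }
}
```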

Currently the BrookGPU compiler and runtime do not support iterator streams of shape larger than 2D. Furthermore, if an iter stream is passed to a kernel, its shape must match the output stream shape. These restrictions may be lifted in future releases.

Iterator streams are currently a different type than data streams. Therefore, kernel parameters that expect iterator streams must be declared with the iter keyword. Passing an iterator stream as an output to a kernel is not allowed. This may change in future releases.


indexof( stream );
The indexof operator can be used within a kernel to return the index of the current element of the given stream for the current kernel invocation.


streamScatterOp(s, index_stream, array, STREAM_SCATTER_ASSIGN);
The streamScatterOp operator performs an indirect write. It takes as input the stream s containing the data to scatter, the index stream containing the offsets within the array at which to write the data, and the destination array. The fourth argument is the operation used to combine the incoming data with the data already stored in the array. This argument can be either a reduction function or a built-in enumerated operation, such as STREAM_SCATTER_ASSIGN, which performs a straightforward write of the stream data to the array.
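A plain-C sketch of the scatter semantics: element i of s goes to array[index[i]], and an optional combining function merges the incoming value with the value already stored there. Passing NULL stands in for STREAM_SCATTER_ASSIGN; the names scatter and add_combine, and the use of integer offsets, are assumptions for illustration. When several elements target the same location, this sketch applies them in stream order, which is a property of the loop and not a guarantee made by Brook.

```c
#include <assert.h>
#include <stddef.h>

/* Element i of s is written to array[index[i]]. With combine == NULL the
 * write is a plain assignment (the role of STREAM_SCATTER_ASSIGN); otherwise
 * combine(old, incoming) is stored, merging with data already in the array. */
void scatter(const float *s, const size_t *index, size_t n,
             float *array, size_t array_len,
             float (*combine)(float old_value, float incoming))
{
    for (size_t i = 0; i < n; i++) {
        assert(index[i] < array_len);   /* offsets must stay inside the array */
        float *dst = &array[index[i]];
        *dst = (combine != NULL) ? combine(*dst, s[i]) : s[i];
    }
}

/* Example combining function: accumulate, like a sum reduction. */
float add_combine(float old_value, float incoming)
{
    return old_value + incoming;
}
```

Using an associative, commutative combiner such as add_combine makes the result independent of the order in which colliding writes are applied.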


streamGatherOp(t, index_stream, array, STREAM_GATHER_FETCH);
streamGatherOp performs an indirect read on the array, using the index stream to produce a stream of fetched values (t). If the array is multidimensional, the index stream provides a linear offset into the array (based on C row-major array layout). The fourth argument can be any kernel function with a single output stream, or a predefined operator. The STREAM_GATHER_FETCH parameter indicates that the gather operation should simply read the value from the array and place it into the stream.
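A plain-C sketch of a fetch-style gather, including the row-major linear offset used to address a multidimensional array. The name gather_fetch and the integer offsets are illustrative assumptions, not part of the Brook API.

```c
#include <assert.h>
#include <stddef.h>

/* t[i] = array[index[i]], the role of STREAM_GATHER_FETCH. A multidimensional
 * array is addressed through its row-major linear offset, so element [r][c]
 * of an R x C array is fetched with index r * C + c. */
void gather_fetch(float *t, const size_t *index, size_t n,
                  const float *array, size_t array_len)
{
    for (size_t i = 0; i < n; i++) {
        assert(index[i] < array_len);   /* offsets must stay inside the array */
        t[i] = array[index[i]];
    }
}
```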