Video Driver Information

Adapter and Driver Info...

Card Description: ATI Radeon HD 2900 XT

Vendor Id: 4098

Renderer Id:

Pixel shader caps:

Pixel Shader Version: 4294902528

Max PS30 Instruction slots: 32768

Max Dynamic Flow Control Depth: 24

Num temporary registers: 32

Max Static Flow Control Depth: 4

Driver and Version: ati2dvag.dll 393230.662075
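
Several of the values above look like packed integers. The sketch below decodes them. It assumes the pixel shader version uses the Direct3D 9 D3DPS_VERSION token layout (0xFFFF0000 | major << 8 | minor). It also assumes "393230.662075" is the high and low 32-bit halves of a 64-bit driver version, each half holding two 16-bit fields (the usual Windows driver version encoding). Under those assumptions 4294902528 (0xFFFF0300) reads as pixel shader 3.0, and the driver as 6.14.10.6715. Vendor ID 4098 is 0x1002, the PCI vendor ID assigned to ATI.

```python
def decode_ps_version(v: int) -> tuple:
    """Split a D3D9-style pixel shader version token into (major, minor)."""
    # Assumed layout: 0xFFFF0000 | (major << 8) | minor
    if v >> 16 != 0xFFFF:
        raise ValueError("not a pixel shader version token")
    return (v >> 8) & 0xFF, v & 0xFF


def decode_driver_version(high: int, low: int) -> str:
    """Format the two 32-bit halves of a driver version as a.b.c.d."""
    # Assumption: each half packs two 16-bit fields (HIWORD.LOWORD).
    return f"{high >> 16}.{high & 0xFFFF}.{low >> 16}.{low & 0xFFFF}"


print(hex(4098))                               # vendor id in hex
print(decode_ps_version(4294902528))           # pixel shader version
print(decode_driver_version(393230, 662075))   # driver version
```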

Streaming: Basic Throughput

A comparison of pixel throughput when executing a simple shader
program that fetches once from a floatN texture, performs a few ADD
operations, and outputs the result to a floatN buffer.

fpfilltest -p -n -c 1

fpfilltest -p -n -c 2

fpfilltest -p -n -c 3

fpfilltest -p -n -c 4

fpfilltest -b -n -c 1

fpfilltest -b -n -c 2

fpfilltest -b -n -c 3

fpfilltest -b -n -c 4

**Note:** Both graphs show the same test; one reports MPix/sec and the
other GB/sec.
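
Converting between the two units only needs the number of bytes moved per pixel. The page does not say what fpfilltest counts, so the sketch below is a minimal conversion under stated assumptions: 32-bit float components, 1 GB = 1e9 bytes, and a `transfers_per_pixel` parameter (hypothetical) that is 1 to count only the texel read or 2 to count the read plus the output write.

```python
def mpix_to_gbps(mpix_per_s: float, components: int,
                 bytes_per_component: int = 4,
                 transfers_per_pixel: int = 1) -> float:
    """Convert a fill rate in MPix/s to bandwidth in GB/s (1 GB = 1e9 bytes).

    Assumptions: float32 components; transfers_per_pixel = 1 counts only the
    texture read, 2 counts the read plus the render-target write.
    """
    bytes_per_pixel = components * bytes_per_component * transfers_per_pixel
    return mpix_per_s * 1e6 * bytes_per_pixel / 1e9


# Illustrative input only, not a measured result.
print(mpix_to_gbps(1000, 4))
```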

Streaming: Triangle vs. Quad

A comparison of the performance obtained when drawing the shader with
either a single screen-covering triangle or a large quad. The shader
reads from a 1- or 4-component float texture, performs a few ADD
instructions, and writes the result to a 1- or 4-component
pbuffer.

fpfilltest.exe -n -c 1 -r triangle

fpfilltest.exe -n -c 1 -r quad

fpfilltest.exe -n -c 4 -r triangle

fpfilltest.exe -n -c 4 -r quad
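
The two geometries can be compared with a small rasterization sketch. The clip-space vertices below are assumptions (a common way to build each case), not taken from fpfilltest's source. Both cover every pixel center of the viewport, but the quad is two triangles that share a diagonal edge, so pixels along that seam are split between two primitives; one commonly cited reason the single triangle can be faster is that it has no such seam.

```python
# Clip-space (x, y) vertices; assumed geometry, not fpfilltest's own.
FULLSCREEN_TRIANGLE = [(-1.0, -1.0), (3.0, -1.0), (-1.0, 3.0)]
QUAD_AS_TRIANGLES = [
    [(-1.0, -1.0), (1.0, -1.0), (1.0, 1.0)],
    [(-1.0, -1.0), (1.0, 1.0), (-1.0, 1.0)],
]


def inside(p, tri):
    """Edge-function test: True if p lies inside or on the triangle."""
    def edge(a, b, c):
        return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    a, b, c = tri
    w0, w1, w2 = edge(b, c, p), edge(c, a, p), edge(a, b, p)
    return (w0 >= 0 and w1 >= 0 and w2 >= 0) or (w0 <= 0 and w1 <= 0 and w2 <= 0)


def pixel_centers(n):
    """Pixel centers of an n x n viewport mapped to clip space [-1, 1]."""
    return [(-1 + (2 * i + 1) / n, -1 + (2 * j + 1) / n)
            for j in range(n) for i in range(n)]


n = 64
centers = pixel_centers(n)
tri_cover = sum(inside(p, FULLSCREEN_TRIANGLE) for p in centers)
# This inclusive test counts seam pixels for both quad triangles; real
# rasterizers apply a tie-breaking rule so each pixel is shaded once.
quad_cover = sum(any(inside(p, t) for t in QUAD_AS_TRIANGLES) for p in centers)
print(tri_cover, quad_cover)
```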

4-Component Floating Point Input Bandwidth

Floating-point input bandwidth test (4-component texels).
The single access test (SGL) measures bandwidth when repeatedly fetching
texel (0,0). Sequential access (SEQ) is a 1-to-1 copy of input texels
to output texels. Random access (DEP-RAND) uses dependent texturing:
coordinates read from an index texture select random texels to fetch.
Each shader performs a total of 4 fetches, each from a different input
texture; the dependent texturing case performs an additional fetch from
the index texture, which is factored into the bandwidth computation.

floatbandwidth -n -c 4 -f 4 -a single

floatbandwidth -n -c 4 -f 4 -a seq

floatbandwidth -n -c 4 -f 4 -a random -d
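
The three access patterns and the bandwidth arithmetic can be sketched as follows. The function names are hypothetical, the "random" case models the index texture as a list of random coordinates, and the size of the extra index fetch (`index_bytes`) is an assumption because the page does not state it. Each fragment reads 4 textures x 4 components x 4 bytes = 64 bytes before the index fetch.

```python
import random


def fetch_coords(pattern, width, height, seed=0):
    """Texel coordinates fetched by each fragment, in raster order."""
    if pattern == "single":            # SGL: every fetch hits texel (0, 0)
        return [(0, 0)] * (width * height)
    if pattern == "seq":               # SEQ: 1-to-1 copy
        return [(x, y) for y in range(height) for x in range(width)]
    if pattern == "random":            # DEP-RAND: hypothetical index texture
        rng = random.Random(seed)
        return [(rng.randrange(width), rng.randrange(height))
                for _ in range(width * height)]
    raise ValueError(f"unknown pattern: {pattern}")


def input_bandwidth_gbps(width, height, seconds, fetches=4, components=4,
                         bytes_per_component=4, index_bytes=0):
    """Input bandwidth in GB/s (1 GB = 1e9 bytes) for one full-screen pass.

    index_bytes is the assumed size of the extra index-texture fetch.
    """
    per_fragment = fetches * components * bytes_per_component + index_bytes
    return width * height * per_fragment / seconds / 1e9
```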

Bandwidth: MRT Output bandwidth

Measures output bandwidth when storing results into multiple
4-component floating point render targets. Output into 1 to 4 render
targets is tested.

outputbandwidth -n -o 1

outputbandwidth -n -o 2

outputbandwidth -n -o 3

outputbandwidth -n -o 4
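
Each additional 4-component float target adds 16 bytes of output per fragment, so if the test is output-bound the bytes written, and roughly the run time, should grow linearly from 1 to 4 targets. A minimal sketch of that arithmetic, assuming 32-bit components and 1 GB = 1e9 bytes (function names are hypothetical):

```python
def mrt_bytes_per_fragment(targets, components=4, bytes_per_component=4):
    """Bytes each fragment writes when it outputs to `targets` float targets."""
    return targets * components * bytes_per_component


def mrt_output_gbps(width, height, seconds, targets):
    """Output bandwidth in GB/s (1 GB = 1e9 bytes) for one full-screen pass."""
    return width * height * mrt_bytes_per_fragment(targets) / seconds / 1e9


for n in range(1, 5):
    print(n, mrt_bytes_per_fragment(n))
```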

Cache Hit Fetch Cost

Measures the time taken to execute a shader containing a fixed number
of texture fetches, using the single-texel access pattern (-a single) so
that fetches hit in the texture cache, followed by a varying number of
MAD instructions. The number of instructions following the fetches is
increased (x-axis) until the shader becomes compute bound. Above this
threshold, running time grows linearly with the length of the program.

fetchcosts -n -m 1 -x 60 -f 1 -a single -i 2

fetchcosts -n -m 1 -x 60 -f 2 -a single -i 2

fetchcosts -n -m 1 -x 60 -f 3 -a single -i 2

fetchcosts -n -m 1 -x 60 -f 4 -a single -i 2

fetchcosts -n -m 1 -x 60 -f 5 -a single -i 2

fetchcosts -n -m 1 -x 60 -f 6 -a single -i 2
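
The shape of these curves follows from a simple bound model: if fetches and arithmetic overlap perfectly, run time is the larger of the fetch time and the MAD time, so the curve is flat until the MADs take longer than the fetches and then rises linearly. The knee gives the fetch cost in "MAD equivalents". This is a minimal sketch under that perfect-overlap assumption; the cycle counts are illustrative, not measured.

```python
import math


def predicted_time(n_mads, fetch_cycles, cycles_per_mad=1.0):
    """Per-fragment time under a perfect-overlap model: fetches and MADs run
    concurrently, so whichever takes longer sets the time."""
    return max(fetch_cycles, n_mads * cycles_per_mad)


def crossover(fetch_cycles, cycles_per_mad=1.0):
    """Smallest MAD count at which the shader becomes compute bound."""
    return math.ceil(fetch_cycles / cycles_per_mad)


# Illustrative numbers only: fetches costing 12 cycles hide 12 MADs.
print([predicted_time(n, 12) for n in (0, 6, 12, 18, 24)])
print(crossover(12))
```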

Streaming Access Fetch Cost

Measures the time taken to execute a shader containing a fixed number
of texture fetches, using the sequential access pattern (-a seq),
followed by a varying number of MAD instructions. The number of
instructions following the fetches is increased (x-axis) until the
shader becomes compute bound. Above this threshold, running time grows
linearly with the length of the program.

fetchcosts -n -m 1 -x 60 -f 1 -a seq -i 2

fetchcosts -n -m 1 -x 60 -f 2 -a seq -i 2

fetchcosts -n -m 1 -x 60 -f 3 -a seq -i 2

fetchcosts -n -m 1 -x 60 -f 4 -a seq -i 2

fetchcosts -n -m 1 -x 60 -f 5 -a seq -i 2

fetchcosts -n -m 1 -x 60 -f 6 -a seq -i 2
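
Reading the threshold off a measured curve can be automated. The sketch below takes the flat region's first sample as a baseline, reports the first instruction count whose time exceeds it by more than a tolerance, and fits a least-squares slope to the points above that knee. The 5% tolerance and the synthetic samples are assumptions for illustration, not fetchcosts output.

```python
def find_knee(samples, tol=0.05):
    """First instruction count whose time exceeds the flat (fetch-bound)
    baseline by more than tol, or None if the curve never rises."""
    baseline = samples[0][1]
    for n, t in samples:
        if t > baseline * (1 + tol):
            return n
    return None


def tail_slope(samples, knee):
    """Least-squares slope of time vs. instruction count at and above the knee."""
    pts = [(n, t) for n, t in samples if n >= knee]
    mean_n = sum(n for n, _ in pts) / len(pts)
    mean_t = sum(t for _, t in pts) / len(pts)
    num = sum((n - mean_n) * (t - mean_t) for n, t in pts)
    den = sum((n - mean_n) ** 2 for n, _ in pts)
    return num / den


# Synthetic (instruction count, time) pairs: flat at 20, then 1 unit per MAD.
samples = [(n, max(20.0, float(n))) for n in range(0, 61, 2)]
knee = find_knee(samples)
print(knee, tail_slope(samples, knee))
```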

Bandwidth: Readback

Measures the readback performance of glReadPixels(0, 0, 512, 512,
FORMAT, TYPE, ptr) from a single-buffered window and from a
4-component float pbuffer.

readback.exe -x -r

readback.exe -f -a
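
The amount of data each glReadPixels call returns depends on the FORMAT/TYPE pair. The page does not list which pairs readback.exe uses, so the table below gives standard sizes for a few common pairs as examples. A 512x512 region is 1 MiB as 8-bit RGBA and 4 MiB as float RGBA; 512-pixel rows are already a multiple of the default 4-byte GL_PACK_ALIGNMENT, so no row padding is added.

```python
# Standard bytes per pixel for a few FORMAT/TYPE pairs (examples only).
BYTES_PER_PIXEL = {
    ("GL_RGBA", "GL_UNSIGNED_BYTE"): 4,
    ("GL_BGRA", "GL_UNSIGNED_BYTE"): 4,
    ("GL_RGB", "GL_FLOAT"): 12,
    ("GL_RGBA", "GL_FLOAT"): 16,
}


def readback_bytes(fmt, typ, width=512, height=512):
    """Size of the buffer glReadPixels fills for a width x height region."""
    return width * height * BYTES_PER_PIXEL[(fmt, typ)]


def readback_mb_per_s(fmt, typ, seconds_per_call):
    """Readback rate in MB/s (1 MB = 1e6 bytes) given the time per call."""
    return readback_bytes(fmt, typ) / seconds_per_call / 1e6
```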

Bandwidth: Download

Measures the rate at which texture data can be loaded onto the card
using multiple calls to glTexSubImage2D(). The texture is a 512x512
n-component float texture.

download.exe -n -c 1

download.exe -n -c 2

download.exe -n -c 3

download.exe -n -c 4
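
A minimal sketch of the rate arithmetic, assuming each glTexSubImage2D call replaces the whole 512x512 texture with 32-bit float data (the page does not say how large each call's region is). Each call then moves 1 MiB per component, 1 to 4 MiB for the four runs. When timing such uploads, the elapsed time should only be taken after the driver has finished the transfers (for example after glFinish), otherwise the rate mostly measures the driver's copy into its own buffers.

```python
def upload_bytes(components, width=512, height=512, bytes_per_component=4):
    """Bytes transferred by one glTexSubImage2D call covering the whole
    texture (assumed full-texture updates with float32 data)."""
    return width * height * components * bytes_per_component


def download_rate_gbps(components, calls, seconds):
    """Average host-to-card rate in GB/s (1 GB = 1e9 bytes) over `calls`
    full-texture uploads that took `seconds` in total."""
    return upload_bytes(components) * calls / seconds / 1e9


for c in range(1, 5):
    print(c, upload_bytes(c))
```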

Instruction Issue

Measures the rates at which various shader instructions can be executed.
Vector instructions are executed with 4-component vector operands.

instrissue -n -a -l 64 -m
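
An instruction rate is the number of fragments times the instructions each executes, divided by the elapsed time; counting per-component operations multiplies by the operand width. A minimal sketch of that arithmetic follows. `shader_length` is a hypothetical parameter for the number of instructions under test in the shader; the page does not define what instrissue's `-l` flag controls.

```python
def instructions_per_second(width, height, shader_length, seconds):
    """Aggregate instruction rate for one full-screen pass of a shader
    containing `shader_length` instances of the instruction under test."""
    return width * height * shader_length / seconds


def component_ops_per_second(instr_rate, components):
    """Per-component rate: a 4-component vector op counts as 4 operations."""
    return instr_rate * components
```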

Scalar vs Vector Instruction Issue

Compares the rate at which the card can perform ADD, SUB, MUL, and MAD
instructions on scalar and on 4-component vector operands. Any
dual-issue capability in the hardware should be apparent in the
single-component versions of the instructions.

instrissue -n -c 1 -i add -l 40 -m

instrissue -n -c 4 -i add -l 40 -m

instrissue -n -c 1 -i sub -l 40 -m

instrissue -n -c 4 -i sub -l 40 -m

instrissue -n -c 1 -i mul -l 40 -m

instrissue -n -c 4 -i mul -l 40 -m

instrissue -n -c 1 -i mad -l 40 -m

instrissue -n -c 4 -i mad -l 40 -m
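
One way to read each pair of results is the ratio of scalar to vector instruction rates. A ratio near 1 means a scalar op occupies the same issue slot as a 4-component op; a ratio clearly above 1 means the hardware issues more than one scalar op in the time of a vector op, which is how dual issue would show up. The sketch below is a rough classifier; its 10% tolerance is an arbitrary assumption.

```python
def issue_ratio(scalar_instr_rate, vector_instr_rate):
    """Scalar-to-vector instruction rate ratio (instructions per second each)."""
    return scalar_instr_rate / vector_instr_rate


def interpret(ratio, tol=0.1):
    """Rough reading of the ratio; the 10% tolerance is an arbitrary choice."""
    if ratio > 1 + tol:
        return "scalar ops issue faster: suggests dual/co-issue"
    if ratio >= 1 - tol:
        return "same rate: a scalar op occupies a full vector slot"
    return "scalar slower than vector: unexpected, check the measurement"
```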

Branching: Early-Z

Measures how effectively the early-z test saves work. The test
generates a computation mask, loads it into the depth buffer, and uses
it to reject all fragments whose mask value is below a given threshold
while a large shader is run. The test measures a uniformly distributed
set of thresholds, a uniformly distributed set of 4x4 blocks, and a
wavefront pattern.

branching --step 5 --block --earlyz

branching --step 5 --blockN 4 --earlyz

branching --step 5 --rand --earlyz
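
The three mask patterns, and the fraction of fragments that survive a given threshold, can be sketched as below. These generators are hypothetical illustrations of "uniformly distributed values", "uniformly distributed blocks", and "a wavefront", not the branching tool's own code. For the random masks the surviving fraction is close to 1 minus the threshold; ideal early-z would make run time proportional to that fraction.

```python
import random


def random_mask(w, h, seed=0):
    """One uniform random value per pixel."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(w)] for _ in range(h)]


def block_mask(w, h, block, seed=0):
    """One uniform random value per block x block tile (w, h divisible by block)."""
    rng = random.Random(seed)
    tiles = [[rng.random() for _ in range(w // block)] for _ in range(h // block)]
    return [[tiles[y // block][x // block] for x in range(w)] for y in range(h)]


def wavefront_mask(w, h):
    """Values rise along the diagonal, so raising the threshold sweeps a front."""
    return [[(x + y) / (w + h - 2) for x in range(w)] for y in range(h)]


def surviving_fraction(mask, threshold):
    """Fraction of fragments not rejected (mask value >= threshold)."""
    vals = [v for row in mask for v in row]
    return sum(v >= threshold for v in vals) / len(vals)
```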

Branching: PS30

Measures how effectively pixel shader 3.0 branching saves work.
The test generates a computation mask and then runs a pixel shader 3.0
shader that fetches the mask value and conditionally executes a
large shader. The test measures a uniformly distributed set of
thresholds, uniformly distributed sets of NxN blocks (N from 2 to 32
in the runs below), and a wavefront pattern.

branching --step 5 --block --ps30

branching --step 5 --blockN 2 --ps30

branching --step 5 --blockN 4 --ps30

branching --step 5 --blockN 8 --ps30

branching --step 5 --blockN 16 --ps30

branching --step 5 --blockN 32 --ps30

branching --step 5 --rand --ps30
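
Why the block size matters can be shown with a simple coherence model. GPUs evaluate branches for groups of pixels together, so if any pixel in a group takes the branch, the whole group pays for the large shader. The sketch assumes a hypothetical 8x8-pixel branch granularity (not the measured granularity of this card). With per-pixel random masks nearly every group runs the large shader, while blocks at least as large as the group recover the ideal savings.

```python
import random


def block_mask(w, h, block, seed=0):
    """Hypothetical mask: one uniform random value per block x block tile."""
    rng = random.Random(seed)
    tiles = [[rng.random() for _ in range(w // block)] for _ in range(h // block)]
    return [[tiles[y // block][x // block] for x in range(w)] for y in range(h)]


def ideal_fraction(mask, threshold):
    """Fraction of pixels that actually need the large shader."""
    vals = [v for row in mask for v in row]
    return sum(v >= threshold for v in vals) / len(vals)


def executed_fraction(mask, threshold, batch):
    """Fraction of pixels that pay for the large shader when branching is
    decided per batch x batch group: a group runs it if any pixel needs it."""
    h, w = len(mask), len(mask[0])
    executed = 0
    for ty in range(0, h, batch):
        for tx in range(0, w, batch):
            group = [mask[y][x] for y in range(ty, min(ty + batch, h))
                     for x in range(tx, min(tx + batch, w))]
            if any(v >= threshold for v in group):
                executed += len(group)
    return executed / (w * h)


for n in (1, 2, 4, 8, 16, 32):
    m = block_mask(64, 64, n)
    print(n, ideal_fraction(m, 0.5), executed_fraction(m, 0.5, 8))
```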

Instruction Precision

Plots the average and maximum error of complex arithmetic instructions,
compared against double-precision CPU calculations. The first graph
samples inputs over the range 0.001 to PI/2. The second graph samples
over 10^-12 to 0.001 and only tests functions that are well behaved
near the origin.

precision.exe -m 0.001 -x 1.5707963267949 -b

precision.exe -m 1.0e-12 -x 0.001 -bsce
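
The error statistics can be reproduced on the CPU for any approximation. The sketch samples single-precision inputs evenly over a range and compares an approximate function against the double-precision result. As a stand-in for a GPU instruction it uses `math.sin` rounded to float32, which is an assumption for illustration, not the card's SIN implementation. That stand-in's maximum absolute error on 0.001 to PI/2 is bounded by half a float32 ulp of values below 1, that is 2^-25.

```python
import math
import struct


def to_f32(x):
    """Round a Python float (double) to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def error_stats(approx, exact, lo, hi, samples=10000):
    """Average and maximum absolute error of approx vs. exact on [lo, hi],
    using evenly spaced float32 inputs."""
    errs = []
    for i in range(samples):
        x = to_f32(lo + (hi - lo) * i / (samples - 1))
        errs.append(abs(approx(x) - exact(x)))
    return sum(errs) / len(errs), max(errs)


# Stand-in for a GPU instruction: double-precision sin rounded to float32.
avg, worst = error_stats(lambda x: to_f32(math.sin(x)), math.sin,
                         0.001, math.pi / 2)
print(f"avg={avg:.3e} max={worst:.3e}")
```

For the 10^-12 to 0.001 range, logarithmic spacing of the inputs would sample the small end far better than the even spacing used here.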

GPUBench is made possible by the letter E, the number 37, and the Stanford University Graphics Lab.

*Id: GPUBench.pl,v 1.1 2007/06/01 22:11:47 mikehouston Exp *