GPUBench Report: Radeon X1800 Series x86/SSE2

Video Driver Information

GL_VENDOR: ATI Technologies Inc.

GL_RENDERER: Radeon X1800 Series x86/SSE2

GL_VERSION: 2.0.5340 WinXP Release

Driver Version: 5340

ALU Instructions: 1024

TEX Instructions: 512

TEX Indirections: 511

MAX_TEXTURE_IMAGE_UNITS: 16

MAX_TEXTURE_COORDS: 8

GL_EXTENSIONS:

GL_ARB_multitexture GL_EXT_texture_env_add GL_EXT_compiled_vertex_array GL_S3_s3tc GL_ARB_depth_texture GL_ARB_fragment_program GL_ARB_fragment_program_shadow GL_ARB_fragment_shader GL_ARB_multisample GL_ARB_occlusion_query GL_ARB_point_parameters GL_ARB_point_sprite GL_ARB_shader_objects GL_ARB_shading_language_100 GL_ARB_shadow GL_ARB_shadow_ambient GL_ARB_texture_border_clamp GL_ARB_texture_compression GL_ARB_texture_cube_map GL_ARB_texture_env_add GL_ARB_texture_env_combine GL_ARB_texture_env_crossbar GL_ARB_texture_env_dot3 GL_ARB_texture_mirrored_repeat GL_ARB_transpose_matrix GL_ARB_vertex_blend GL_ARB_vertex_buffer_object GL_ARB_vertex_program GL_ARB_vertex_shader GL_ARB_window_pos GL_ARB_draw_buffers GL_ATI_draw_buffers GL_ATI_element_array GL_ATI_envmap_bumpmap GL_ATI_fragment_shader GL_ATI_map_object_buffer GL_ATI_separate_stencil GL_ATI_texture_compression_3dc GL_ATI_texture_env_combine3 GL_ATI_texture_float GL_ATI_texture_mirror_once GL_ATI_vertex_array_object GL_ATI_vertex_attrib_array_object GL_ATI_vertex_streams GL_ATIX_texture_env_combine3 GL_ATIX_texture_env_route GL_ATIX_vertex_shader_output_point_size GL_EXT_abgr GL_EXT_bgra GL_EXT_blend_color GL_EXT_blend_func_separate GL_EXT_blend_minmax GL_EXT_blend_subtract GL_EXT_clip_volume_hint GL_EXT_draw_range_elements GL_EXT_fog_coord GL_EXT_framebuffer_object GL_EXT_multi_draw_arrays GL_EXT_packed_pixels GL_EXT_point_parameters GL_EXT_rescale_normal GL_EXT_secondary_color GL_EXT_separate_specular_color GL_EXT_shadow_funcs GL_EXT_stencil_wrap GL_EXT_texgen_reflection GL_EXT_texture3D GL_EXT_texture_compression_s3tc GL_EXT_texture_cube_map GL_EXT_texture_edge_clamp GL_EXT_texture_env_combine GL_EXT_texture_env_dot3 GL_EXT_texture_filter_anisotropic GL_EXT_texture_lod_bias GL_EXT_texture_mirror_clamp GL_EXT_texture_object GL_EXT_texture_rectangle GL_EXT_vertex_array GL_EXT_vertex_shader GL_HP_occlusion_test GL_NV_blend_square GL_NV_occlusion_query GL_NV_texgen_reflection GL_SGI_color_matrix 
GL_SGIS_generate_mipmap GL_SGIS_multitexture GL_SGIS_texture_border_clamp GL_SGIS_texture_edge_clamp GL_SGIS_texture_lod GL_SUN_multi_draw_arrays GL_WIN_swap_hint WGL_EXT_extensions_string WGL_EXT_swap_control

Streaming: Basic Throughput

A comparison of pixel throughput when executing a simple shader program that fetches once from a floatN texture, performs a few ADD operations, and outputs the result to a floatN buffer.

fpfilltest -p -n -c 1
fpfilltest -p -n -c 2
fpfilltest -p -n -c 3
fpfilltest -p -n -c 4

fpfilltest -b -n -c 1
fpfilltest -b -n -c 2
fpfilltest -b -n -c 3
fpfilltest -b -n -c 4

Note: Both graphs represent the same test; one reports MPix/sec and the other GB/sec.

Graph: streamthroughput (.pdf, .eps, .jgr)

Graph: streamthroughputbytes (.pdf, .eps, .jgr)
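For a given component count, the two graphs differ only by a bytes-per-pixel factor. The Python sketch below shows that conversion. It assumes each pixel reads one floatN texel and writes one floatN value of 4-byte floats, and that 1 GB = 10^9 bytes. These are illustrative assumptions, not GPUBench's documented accounting.

```python
# Sketch: convert fpfilltest pixel throughput (MPix/sec) to GB/sec.
# ASSUMPTIONS (not taken from GPUBench): every pixel reads one floatN texel
# and writes one floatN value, each component is a 4-byte float, and
# 1 GB = 10**9 bytes.

BYTES_PER_FLOAT = 4


def bytes_per_pixel(components: int) -> int:
    """Bytes moved per pixel: one floatN read plus one floatN write (assumed)."""
    if components not in (1, 2, 3, 4):
        raise ValueError("components must be between 1 and 4")
    return 2 * components * BYTES_PER_FLOAT


def mpix_to_gb_per_sec(mpix_per_sec: float, components: int) -> float:
    """Convert MPix/sec to GB/sec under the assumptions above."""
    return mpix_per_sec * 1e6 * bytes_per_pixel(components) / 1e9
```

For example, under these assumptions `mpix_to_gb_per_sec(1000.0, 4)` gives 32.0. The 1000 MPix/sec input is made up for illustration and is not a measured result.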

Streaming: Triangle vs. Quad

A comparison of the performance obtained when issuing a shader over either a single screen-covering triangle or a large quad. The shader reads from a 1- or 4-component float texture, performs a few ADD instructions, and writes the result to a 1- or 4-component pbuffer.

fpfilltest.exe -n -c 1 -r triangle
fpfilltest.exe -n -c 1 -r quad
fpfilltest.exe -n -c 4 -r triangle
fpfilltest.exe -n -c 4 -r quad

Graph: streamtriquad (.pdf, .eps, .jgr)
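A quad is usually rasterized as two triangles that share a diagonal, whereas one oversized triangle can cover the whole viewport by itself and have its excess clipped away. The sketch below uses one common construction in normalized device coordinates and checks that both layouts cover the [-1, 1]^2 viewport. The vertex positions are an assumption; they are not necessarily the geometry fpfilltest emits.

```python
# Sketch: two ways to cover the viewport in normalized device coordinates.
# ASSUMPTION: a common construction, not necessarily fpfilltest's geometry.

# One oversized, counter-clockwise triangle whose interior contains the
# whole [-1, 1]^2 square; the parts outside are clipped before rasterization.
FULLSCREEN_TRIANGLE = [(-1.0, -1.0), (3.0, -1.0), (-1.0, 3.0)]

# A quad drawn as two counter-clockwise triangles sharing the diagonal
# from (-1, -1) to (1, 1).
FULLSCREEN_QUAD = [
    [(-1.0, -1.0), (1.0, -1.0), (1.0, 1.0)],
    [(-1.0, -1.0), (1.0, 1.0), (-1.0, 1.0)],
]


def _edge(a, b, p):
    """Cross product test: >= 0 when p lies left of, or on, edge a->b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def inside(tri, p):
    """True if p lies inside or on the counter-clockwise triangle tri."""
    a, b, c = tri
    return _edge(a, b, p) >= 0 and _edge(b, c, p) >= 0 and _edge(c, a, p) >= 0


def covers_viewport(tris, samples=9):
    """Check that every point of a grid over [-1, 1]^2 is covered by some triangle."""
    for i in range(samples):
        for j in range(samples):
            p = (-1 + 2 * i / (samples - 1), -1 + 2 * j / (samples - 1))
            if not any(inside(t, p) for t in tris):
                return False
    return True
```

Both `covers_viewport([FULLSCREEN_TRIANGLE])` and `covers_viewport(FULLSCREEN_QUAD)` return True. Either half of the quad on its own does not cover the viewport.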

4-Component Floating Point Input Bandwidth

Floating point input bandwidth test (4-component texels). The single access test (SGL) measures bandwidth when repeatedly fetching texel (0,0). Sequential access (SEQ) is a 1-to-1 copy of input texels to output texels. Random access (DEP-RAND) uses dependent texturing: coordinates fetched from an index texture are used to read the input textures at random locations. Each shader performs a total of 4 fetches, each from a unique input texture; the dependent texturing case performs one additional fetch from the index texture, which is factored into the bandwidth computation.

floatbandwidth -n -c 4 -f 4 -a single
floatbandwidth -n -c 4 -f 4 -a seq
floatbandwidth -n -c 4 -f 4 -a random -d

Graph: inputfloatbandwidth (.pdf, .eps, .jgr)
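The bandwidth figure can be sketched as bytes fetched per pixel times pixels shaded per second. The Python below makes three assumptions: 16-byte texels (4 components x 4-byte float), 1 GB = 10^9 bytes, and a hypothetical `index_texel_bytes` parameter for the extra index-texture fetch in the DEP-RAND case, because the report does not state the index texture's format.

```python
# Sketch of an input-bandwidth figure for a floatbandwidth-style test.
# ASSUMPTIONS: 4-component 32-bit float texels (16 bytes each); the extra
# DEP-RAND index fetch is sized by the hypothetical index_texel_bytes;
# 1 GB = 10**9 bytes.

TEXEL_BYTES = 4 * 4  # 4 components x 4-byte float


def input_bandwidth_gb_per_sec(width, height, seconds, fetches=4,
                               dependent=False, index_texel_bytes=TEXEL_BYTES):
    """GB/sec fetched during one full-screen pass of width x height pixels."""
    per_pixel = fetches * TEXEL_BYTES
    if dependent:
        # The additional index-texture fetch is counted in the total.
        per_pixel += index_texel_bytes
    return width * height * per_pixel / seconds / 1e9
```

With these defaults, a DEP-RAND pass counts 80 bytes per pixel instead of 64, so for equal run times it reports 1.25x the SGL/SEQ figure.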

Bandwidth: MRT Output Bandwidth

Measures output bandwidth when storing results into multiple 4-component floating point render targets. Output into 1 to 4 render targets is tested.

outputbandwidth -n -o 1
outputbandwidth -n -o 2
outputbandwidth -n -o 3
outputbandwidth -n -o 4

Graph: bandwidthMRT (.pdf, .eps, .jgr)
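Output bandwidth scales with the number of render targets written per pixel. This minimal sketch assumes each target receives one 4-component 32-bit float value (16 bytes) per pixel and that 1 GB = 10^9 bytes. The 1 to 4 range check only mirrors the configurations tested here; it is not a hardware limit.

```python
# Sketch: bytes written per pass when storing into 1-4 float4 render targets.
# ASSUMPTIONS: 16 bytes (4 components x 4-byte float) per target per pixel;
# 1 GB = 10**9 bytes.

BYTES_PER_TARGET = 4 * 4


def mrt_output_bytes(width, height, targets):
    """Total bytes written for one full-screen pass."""
    if not 1 <= targets <= 4:
        raise ValueError("this sketch only models the 1-4 targets tested")
    return width * height * targets * BYTES_PER_TARGET


def mrt_bandwidth_gb_per_sec(width, height, targets, seconds):
    """Output bandwidth in GB/sec for one pass taking `seconds`."""
    return mrt_output_bytes(width, height, targets) / seconds / 1e9
```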

Cache Hit Fetch Cost

Measures the time taken to execute a shader containing a fixed number of texture fetches followed by a varying number of MAD instructions. Every fetch reads the same texel (single access), so fetches hit in the texture cache. The number of instructions following the fetches is increased (x-axis) until the shader becomes compute bound; beyond this threshold, running time grows linearly with program length.

fetchcosts -n -m 1 -x 30 -f 1 -a single -i 2
fetchcosts -n -m 1 -x 30 -f 2 -a single -i 2
fetchcosts -n -m 1 -x 30 -f 3 -a single -i 2
fetchcosts -n -m 1 -x 30 -f 4 -a single -i 2
fetchcosts -n -m 1 -x 30 -f 5 -a single -i 2
fetchcosts -n -m 1 -x 30 -f 6 -a single -i 2

Graph: fetchcost_single (.pdf, .eps, .jgr)
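The compute-bound threshold can be read off such a curve by fitting a line to the linear tail and intersecting it with the flat, fetch-bound time at the start. The sketch assumes the last `tail` samples are already compute bound and that the first sample is fetch bound. `compute_bound_threshold` is a hypothetical helper, not part of fetchcosts. The same approach applies to the streaming-access curves.

```python
# Sketch: estimate where a fetch-bound shader becomes compute bound.
# ASSUMPTIONS: the last `tail` points lie on the linear (compute-bound)
# part of the curve, and times[0] is the flat, fetch-bound running time.


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (intercept, slope)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def compute_bound_threshold(instr_counts, times, tail=5):
    """MAD count at which the fitted compute line meets the fetch-bound floor."""
    if tail < 2 or len(instr_counts) < tail:
        raise ValueError("need at least `tail` >= 2 samples")
    intercept, slope = fit_line(instr_counts[-tail:], times[-tail:])
    fetch_floor = times[0]  # fewest MADs: assumed fetch dominated
    return (fetch_floor - intercept) / slope
```

With synthetic times `max(10.0, 2.0 + 0.5 * n)` for n = 0..30, the estimate is 16.0.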

Streaming Access Fetch Cost

Measures the time taken to execute a shader containing a fixed number of texture fetches followed by a varying number of MAD instructions. Fetches use sequential (streaming) access rather than repeatedly reading one texel. The number of instructions following the fetches is increased (x-axis) until the shader becomes compute bound; beyond this threshold, running time grows linearly with program length.

fetchcosts -n -m 1 -x 60 -f 1 -a seq -i 2
fetchcosts -n -m 1 -x 60 -f 2 -a seq -i 2
fetchcosts -n -m 1 -x 60 -f 3 -a seq -i 2
fetchcosts -n -m 1 -x 60 -f 4 -a seq -i 2
fetchcosts -n -m 1 -x 60 -f 5 -a seq -i 2
fetchcosts -n -m 1 -x 60 -f 6 -a seq -i 2

Graph: fetchcost_seq (.pdf, .eps, .jgr)

Bandwidth: Readback

Measures readback performance of glReadPixels(0, 0, 512, 512, FORMAT, TYPE, ptr) from a single-buffered window and from a 4-component float pbuffer.

readback.exe -x -r
readback.exe -x -b
readback.exe -f -r
readback.exe -f -b

Graph: readback (.pdf, .eps, .jgr)
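A readback rate is the bytes per glReadPixels call times the number of calls, divided by elapsed time. The sketch below times a hypothetical zero-argument `transfer` callable that would wrap one glReadPixels call in a real harness. The payload sizes assume RGBA data, with either 4-byte GL_FLOAT or 1-byte GL_UNSIGNED_BYTE components. The report does not say which FORMAT/TYPE pairs the `-x`/`-f` flags select.

```python
import time

# ASSUMED payloads for a 512x512 RGBA readback.
FLOAT_RGBA_READBACK_BYTES = 512 * 512 * 4 * 4  # GL_FLOAT components
UBYTE_RGBA_READBACK_BYTES = 512 * 512 * 4      # GL_UNSIGNED_BYTE components


def measure_mb_per_sec(transfer, bytes_per_call, iterations=10):
    """Time repeated calls to transfer() and return MB/sec (10**6 bytes).

    `transfer` is a hypothetical stand-in for one synchronous glReadPixels call.
    """
    transfer()  # warm-up call, excluded from the timing
    start = time.perf_counter()
    for _ in range(iterations):
        transfer()
    elapsed = time.perf_counter() - start
    return bytes_per_call * iterations / elapsed / 1e6
```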

Bandwidth: Download

Measures the rate at which texture data can be loaded onto the card using repeated calls to glTexSubImage2D(). The texture is a 512x512 n-component float texture.

download.exe -n -c 1
download.exe -n -c 2
download.exe -n -c 3
download.exe -n -c 4

Graph: download (.pdf, .eps, .jgr)
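The payload of each upload grows linearly with the component count, from 1 MiB for one component to 4 MiB for four at 512x512. This sketch assumes every glTexSubImage2D call uploads the full texture as GL_FLOAT data (4 bytes per component) and reports the aggregate rate with 1 GB = 10^9 bytes.

```python
# ASSUMPTION: each glTexSubImage2D call uploads the full 512x512 texture
# as GL_FLOAT data (4 bytes per component).

FLOAT_BYTES = 4


def subimage_bytes(components, width=512, height=512):
    """Bytes supplied to one upload of an n-component float texture."""
    return width * height * components * FLOAT_BYTES


def download_gb_per_sec(components, calls, seconds, width=512, height=512):
    """Aggregate upload rate for `calls` uploads taking `seconds` in total."""
    return subimage_bytes(components, width, height) * calls / seconds / 1e9
```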

Instruction Issue

Measures the rates at which various shader instructions can be executed. Vector instructions are executed with 4-component vector operands.

instrissue -n -a -l 64 -m

Graph: instrissue_all (.pdf, .eps, .jgr)
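An issue rate can be derived by dividing the instructions executed by the elapsed time. This sketch assumes every pixel of a width x height pass runs a shader of `instrs_per_shader` instructions. It also shows the per-component view, in which a 4-component vector instruction counts as four scalar operations. The report does not say how instrissue normalizes its results, so both helpers are hypothetical.

```python
# Sketch: instruction issue rates from a timed full-screen pass.
# ASSUMPTION: every pixel executes exactly instrs_per_shader instructions.


def instructions_per_sec(width, height, instrs_per_shader, seconds):
    """Shader instructions executed per second."""
    return width * height * instrs_per_shader / seconds


def component_ops_per_sec(instr_rate, components):
    """Scalar component operations per second (a vec4 instruction counts as 4)."""
    return instr_rate * components
```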

Scalar vs Vector Instruction Issue

Compares the rate at which the card can perform ADD, SUB, MUL, and MAD instructions on both scalar and 4-component vector operands. Any dual-issue capability in the hardware should be apparent when performing the single-component versions of the instructions.

instrissue -n -c 1 -i ADD -l 40 -m
instrissue -n -c 4 -i ADD -l 40 -m
instrissue -n -c 1 -i SUB -l 40 -m
instrissue -n -c 4 -i SUB -l 40 -m
instrissue -n -c 1 -i MUL -l 40 -m
instrissue -n -c 4 -i MUL -l 40 -m
instrissue -n -c 1 -i MAD -l 40 -m
instrissue -n -c 4 -i MAD -l 40 -m

Graph: instrissue_scalar (.pdf, .eps, .jgr)
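One way to read this comparison is through the ratio of scalar to vector issue rate for each instruction. A ratio near 1.0 suggests one instruction issues per slot regardless of width. A ratio approaching 2.0 would be consistent with two scalar operations issuing together. The sketch below flags instructions whose ratio clears a cut-off. The 1.5 threshold is an arbitrary, hypothetical choice, not a GPUBench criterion.

```python
# Sketch: flag instructions whose scalar form issues clearly faster than
# the vector form. HEURISTIC: the threshold is an arbitrary assumption.


def issue_ratios(rates):
    """Map name -> scalar_rate / vector_rate.

    `rates` maps an instruction name such as "ADD" to a tuple
    (scalar_rate, vector_rate), both in instructions per second.
    """
    return {name: s / v for name, (s, v) in rates.items()}


def likely_multi_issue(rates, threshold=1.5):
    """Sorted names whose scalar/vector ratio is at least `threshold`."""
    return sorted(n for n, r in issue_ratios(rates).items() if r >= threshold)
```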

Branching: Early-Z

Measures how effectively the early-z test saves work. The test generates a computation mask, loads it into the depth buffer, and uses it to kill all fragments below a given threshold while a big shader is run. The test measures a uniformly distributed set of thresholds, a uniformly distributed set of 4x4 blocks, and a wavefront pattern.

branching --step 5 --block --earlyz
branching --step 5 --blockN 4 --earlyz
branching --step 5 --rand --earlyz

Graph: earlyz (.pdf, .eps, .jgr)
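The fraction of fragments that survive the mask test bounds how much work early-z can save. Ideally, run time scales with that fraction. The sketch below generates two of the mask styles named above: independent per-pixel random values, and values held constant over 4x4 blocks. It then computes the surviving fraction at a threshold. The mask generators are hypothetical, fragments with mask values at or above the threshold are assumed to survive, and the wavefront pattern is not modeled.

```python
import random

# Sketch: hypothetical computation masks and the fraction of fragments
# that survive a threshold test (values >= threshold survive).


def random_mask(width, height, seed=0):
    """Independent uniform value in [0, 1) per pixel."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(width)] for _ in range(height)]


def block_mask(width, height, block=4, seed=0):
    """Uniform value in [0, 1) shared by every pixel of each block x block tile."""
    rng = random.Random(seed)
    bw = (width + block - 1) // block
    bh = (height + block - 1) // block
    vals = [[rng.random() for _ in range(bw)] for _ in range(bh)]
    return [[vals[y // block][x // block] for x in range(width)]
            for y in range(height)]


def surviving_fraction(mask, threshold):
    """Fraction of fragments not killed at this threshold."""
    total = sum(len(row) for row in mask)
    alive = sum(1 for row in mask for v in row if v >= threshold)
    return alive / total
```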

Branching: PS30

Measures how effectively pixel shader 3.0 branching saves work. The test generates a computation mask and then runs a pixel shader 3.0 program that fetches the mask value and conditionally executes a big shader. The test measures a uniformly distributed set of thresholds, a uniformly distributed set of 4x4 blocks, and a wavefront pattern.

branching --step 5 --block --ps30
branching --step 5 --blockN 4 --ps30
branching --step 5 --rand --ps30

Benchmark Failed

 GLError: GL_INVALID_OPERATION at branching.cpp:429
Does your hardware support NV_Fragment_program/2 programs?
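This benchmark did not run on this driver. The reason it compares block and random masks can still be illustrated with a hypothetical cost model. In this model, pixels are shaded in fixed-size tiles that execute in lockstep, so a tile pays the big shader's cost if any pixel in it takes the branch. The tile size, cost values, and `batch_cost` helper are assumptions, not measured properties of the Radeon X1800.

```python
# HYPOTHETICAL cost model for branch coherence: pixels run in
# batch x batch tiles in lockstep, so a tile costs `big` per pixel if any
# pixel in it takes the branch (mask value >= threshold), else `small`.


def batch_cost(mask, threshold, batch=4, big=100, small=1):
    """Total cost of shading `mask` (a list of rows) under the model above."""
    h, w = len(mask), len(mask[0])
    cost = 0
    for ty in range(0, h, batch):
        for tx in range(0, w, batch):
            tile = [mask[y][x]
                    for y in range(ty, min(ty + batch, h))
                    for x in range(tx, min(tx + batch, w))]
            taken = any(v >= threshold for v in tile)
            cost += len(tile) * (big if taken else small)
    return cost
```

Under this model, a single branching pixel makes a whole 4x4 tile pay the big-shader cost. Masks that are coherent over 4x4 blocks therefore save far more work than per-pixel random masks at the same surviving fraction.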

Instruction Precision

Plots the average and maximum error of complex arithmetic instructions, compared against double-precision CPU calculations. The first graph samples inputs over the range 0.001 to PI/2. The second graph samples over 10^-12 to 0.001 and only tests functions that behave nicely near the origin.

precision.exe -m 0.001 -x 1.5707963267949 -b
precision.exe -m 1.0e-12 -x 0.001 -bsce

Graph: precision_0 (.pdf, .eps, .jgr)

Graph: precision_1 (.pdf, .eps, .jgr)
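The average and maximum error over a sampled range can be sketched as follows. The sketch assumes absolute error, evenly spaced or log-spaced sample points, and Python's double-precision `math` functions as the reference. `to_float32` only rounds a double result to single precision as a stand-in for GPU output; it does not model any real GPU instruction. The report does not state which error metric or spacing precision.exe uses.

```python
import math
import struct


def to_float32(x):
    """Round a Python float (double) to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def error_stats(approx, reference, lo, hi, samples=1000, log_spacing=False):
    """Return (mean, max) absolute error of approx vs reference on [lo, hi]."""
    if samples < 2:
        raise ValueError("need at least two samples")
    errs = []
    for i in range(samples):
        t = i / (samples - 1)
        # Log spacing suits ranges such as 1e-12 .. 1e-3 (assumed choice).
        x = lo * (hi / lo) ** t if log_spacing else lo + (hi - lo) * t
        errs.append(abs(approx(x) - reference(x)))
    return sum(errs) / len(errs), max(errs)
```

For example, `error_stats(lambda x: to_float32(math.sin(x)), math.sin, 0.001, math.pi / 2)` measures only single-precision rounding of the result, which stays below 1e-7 on that range.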

GPUBench is made possible by the letter E, the number 37, and the Stanford University Graphics Lab.

Id: GPUBench.pl,v 1.20 2005/03/29 22:42:12 jeremy_sugerman Exp