Video Driver Information
GL_VENDOR: 3Dlabs

GL_RENDERER: Wildcat Realizm

GL_VERSION: 2.0

Driver Version: 4.05-0484

ALU Instructions: 4096

TEX Instructions: 4096

TEX Indirections: 4096

MAX_TEXTURE_IMAGE_UNITS: 8

MAX_TEXTURE_COORDS: 8

GL_EXTENSIONS:

ARB_imaging GL_ARB_depth_texture GL_ARB_fragment_program GL_ARB_fragment_shader GL_ARB_half_float_pixel GL_ARB_imaging GL_ARB_multisample GL_ARB_multitexture GL_ARB_occlusion_query GL_ARB_point_parameters GL_ARB_point_sprite GL_ARB_shader_objects GL_ARB_shading_language_100 GL_ARB_shadow GL_ARB_texture_border_clamp GL_ARB_texture_compression GL_ARB_texture_cube_map GL_ARB_texture_env_add GL_ARB_texture_env_combine GL_ARB_texture_env_crossbar GL_ARB_texture_env_dot3 GL_ARB_texture_float GL_ARB_texture_mirrored_repeat GL_ARB_texture_non_power_of_two GL_ARB_transpose_matrix GL_ARB_vertex_buffer_object GL_ARB_vertex_program GL_ARB_vertex_shader GL_ARB_window_pos GL_EXT_422_pixels GL_EXT_bgra GL_EXT_blend_color GL_EXT_blend_func_separate GL_EXT_blend_minmax GL_EXT_blend_subtract GL_EXT_color_matrix GL_EXT_color_subtable GL_EXT_color_table GL_EXT_convolution GL_EXT_convolution_border_modes GL_EXT_draw_range_elements GL_EXT_fog_coord GL_EXT_fog_function GL_EXT_fog_offset GL_EXT_generate_mipmap GL_EXT_histogram GL_EXT_interlace GL_EXT_multi_draw_arrays GL_EXT_multisample GL_EXT_packed_pixels GL_EXT_rescale_normal GL_EXT_secondary_color GL_EXT_separate_specular_color GL_EXT_shadow_funcs GL_EXT_stencil_wrap GL_EXT_stencil_two_side GL_EXT_texture3D GL_EXT_texture_border_clamp GL_EXT_texture_color_table GL_EXT_texture_compression_s3tc GL_EXT_texture_edge_clamp GL_EXT_texture_env_combine GL_EXT_texture_filter_anisotropic GL_EXT_texture_lod GL_EXT_texture_lod_bias GL_EXT_texture_object GL_EXT_vertex_array GL_Autodesk_valid_back_buffer_hint GL_HP_occlusion_test GL_I3D_argb GL_I3D_interlace_read GL_IBM_cull_vertex GL_IBM_rescale_normal GL_KTX_buffer_region GL_NV_blend_square GL_NV_texgen_reflection GL_OML_interlace GL_SGI_color_matrix GL_SGI_color_table GL_SGI_texture_color_table GL_SGIS_generate_mipmap GL_SGIS_multisample GL_SGIS_texture_border_clamp GL_SGIS_texture_color_mask GL_SGIX_interlace GL_SGIX_fog_offset GL_3DL_direct_texture_access2 GL_ARB_color_buffer_float

Streaming: Basic Throughput
A comparison of pixel throughput when executing a simple shader program that fetches once from a floatN texture, performs a few ADD operations, and outputs the result to a floatN buffer.

fpfilltest -n -c 1
fpfilltest -n -c 2
fpfilltest -n -c 3
fpfilltest -n -c 4

Streaming: Triangle vs. Quad
A comparison of the performance obtained when issuing a shader using either a screen-covering triangle or a large quad. The shader reads from a (1 or 4-component) float texture, performs a few ADD instructions, and outputs the result to a (1 or 4-component) pbuffer.

fpfilltest.exe -n -c 1 -r triangle
fpfilltest.exe -n -c 1 -r quad
fpfilltest.exe -n -c 4 -r triangle
fpfilltest.exe -n -c 4 -r quad

4-Component Floating Point Input Bandwidth
Floating point input bandwidth test (4-component texels). The single access test (SGL) measures bandwidth when repeatedly accessing texel (0,0). Sequential access (SEQ) is a 1-to-1 copy of input texels to output texels. Random access (DEP-RAND) uses dependent texturing to randomly fetch from the dependent texture. The shaders perform a total of 4 fetches, each from a unique input texture (the dependent texturing case performs an additional fetch from an index texture, which is factored into the bandwidth computation).

floatbandwidth -n -c 4 -f 4 -a single
floatbandwidth -n -c 4 -f 4 -a seq
floatbandwidth -n -c 4 -f 4 -a random -d
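
The bandwidth figure described above can be sketched as bytes fetched divided by elapsed time. This is an illustrative calculation only, not GPUBench's own code: the function name, the 512x512 render size, the 32-bit component size, and the 4-component index texel are all assumptions.

```python
BYTES_PER_FLOAT = 4  # assumption: 32-bit float components


def input_bandwidth_gb_s(width, height, fetches, components, seconds,
                         index_fetches=0, index_components=4):
    """Estimate input bandwidth in GB/s for one full-screen shader pass.

    index_fetches counts the extra index-texture fetch used by the
    dependent (random access) case; index_components is an assumed size.
    """
    pixels = width * height
    data_bytes = pixels * fetches * components * BYTES_PER_FLOAT
    index_bytes = pixels * index_fetches * index_components * BYTES_PER_FLOAT
    return (data_bytes + index_bytes) / seconds / 1e9


# Hypothetical timing of 1 ms per pass, for illustration only.
sequential = input_bandwidth_gb_s(512, 512, fetches=4, components=4, seconds=0.001)
dependent = input_bandwidth_gb_s(512, 512, fetches=4, components=4, seconds=0.001,
                                 index_fetches=1)
print(sequential, dependent)
```

Counting the index fetch keeps the dependent case comparable with the other two, since that traffic also crosses the memory interface.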

Bandwidth: MRT Output Bandwidth
Measures output bandwidth when storing results into multiple 4-component floating point render targets. Output into 1 to 4 render targets is tested.

outputbandwidth -n -o 1
outputbandwidth -n -o 2
outputbandwidth -n -o 3
outputbandwidth -n -o 4

Benchmark Failed

 Could Not Load glDrawBuffersATI
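
glDrawBuffersATI is the entry point defined by the GL_ATI_draw_buffers extension, and the GL_EXTENSIONS string reported above lists neither GL_ATI_draw_buffers nor GL_ARB_draw_buffers, which is consistent with this failure. A minimal sketch of checking an extension string for a name follows; the helper name is hypothetical and the string is a short excerpt of the list above, not the full list.

```python
def has_extension(extension_string, name):
    """Return True if name appears as a whole token in a space-separated list."""
    return name in extension_string.split()


# Short excerpt of the GL_EXTENSIONS string reported above (not the full list).
excerpt = ("GL_ARB_fragment_program GL_ARB_texture_float "
           "GL_EXT_texture3D GL_ARB_color_buffer_float")

print(has_extension(excerpt, "GL_ARB_texture_float"))
print(has_extension(excerpt, "GL_ATI_draw_buffers"))
# A plain substring test would wrongly match the prefix of GL_EXT_texture3D.
print(has_extension(excerpt, "GL_EXT_texture"), "GL_EXT_texture" in excerpt)
```

Matching whole tokens rather than substrings avoids false positives when one extension name is a prefix of another.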
 
Cache Hit Fetch Cost
Measures the time taken to execute a shader containing a fixed number of texture fetches followed by various numbers of MAD instructions. The number of instructions following the fetches is increased (x-axis) until the shader becomes compute bound. Above this threshold, running time is a linear function of the length of the program.


fetchcosts -n -m 1 -x 30 -f 1 -a single -i 2
fetchcosts -n -m 1 -x 30 -f 2 -a single -i 2
fetchcosts -n -m 1 -x 30 -f 3 -a single -i 2
fetchcosts -n -m 1 -x 30 -f 4 -a single -i 2
fetchcosts -n -m 1 -x 30 -f 5 -a single -i 2
fetchcosts -n -m 1 -x 30 -f 6 -a single -i 2
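
The behavior described above (flat running time while fetch bound, then linear growth once compute bound) can be approximated by a simple max() model. This is a sketch under stated assumptions, not the benchmark's analysis code: the function names, the 5% tolerance, and the cost values are hypothetical, and real hardware may overlap fetch and ALU work less cleanly.

```python
def modeled_time(n_alu, fetch_time, alu_cost):
    """Idealized model: fetches and ALU work overlap, so the slower one dominates."""
    return max(fetch_time, n_alu * alu_cost)


def compute_bound_threshold(samples, tolerance=0.05):
    """Return the first instruction count whose time exceeds the initial plateau.

    samples is a list of (instruction_count, time) pairs in increasing order.
    """
    plateau = samples[0][1]
    for n_alu, elapsed in samples:
        if elapsed > plateau * (1 + tolerance):
            return n_alu
    return None  # never became compute bound in the sampled range


# Hypothetical costs: fetches take 10 time units, each MAD takes 1.
samples = [(n, modeled_time(n, fetch_time=10.0, alu_cost=1.0)) for n in range(1, 31)]
print(compute_bound_threshold(samples))
```

Under this model, the instruction count at the threshold multiplied by the per-instruction cost gives a rough estimate of how much ALU work the fetches can hide.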

Streaming Access Fetch Cost
Measures the time taken to execute a shader containing a fixed number of texture fetches followed by various numbers of MAD instructions. The number of instructions following the fetches is increased (x-axis) until the shader becomes compute bound. Above this threshold, running time is a linear function of the length of the program.


fetchcosts -n -m 1 -x 60 -f 1 -a seq -i 2
fetchcosts -n -m 1 -x 60 -f 2 -a seq -i 2
fetchcosts -n -m 1 -x 60 -f 3 -a seq -i 2
fetchcosts -n -m 1 -x 60 -f 4 -a seq -i 2
fetchcosts -n -m 1 -x 60 -f 5 -a seq -i 2
fetchcosts -n -m 1 -x 60 -f 6 -a seq -i 2

Bandwidth: Readback
Measures readback performance of glReadPixels(0, 0, 512, 512, FORMAT, TYPE, ptr) from a single-buffered window and a 4-component float pbuffer.

readback.exe -x -r
readback.exe -x -b
readback.exe -f -r
readback.exe -f -b
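
As a rough illustration of how a readback rate follows from the 512x512 transfer described above, the sketch below converts a measured time into MB/s. The function name and the timing value are hypothetical, and the bytes-per-pixel figures (16 for 4-component 32-bit float, 4 for 8-bit RGBA) are assumptions about FORMAT and TYPE, not values taken from the benchmark.

```python
def readback_mb_per_s(seconds, bytes_per_pixel, width=512, height=512):
    """Convert the time of one glReadPixels-sized transfer into MB/s (10^6 bytes)."""
    return width * height * bytes_per_pixel / seconds / 1e6


# Hypothetical 10 ms transfer times, for illustration only.
float_rate = readback_mb_per_s(0.01, bytes_per_pixel=16)  # assumed RGBA float32
fixed_rate = readback_mb_per_s(0.01, bytes_per_pixel=4)   # assumed RGBA 8-bit
print(float_rate, fixed_rate)
```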

Bandwidth: Download
Measures the rate at which texture data can be loaded onto the card using multiple calls to glTexSubImage2D(). The texture is a 512x512 4-component float texture.

download.exe -n -c 1
download.exe -n -c 2
download.exe -n -c 3
download.exe -n -c 4

Instruction Issue
Measures rates at which various shader instructions can be executed. Vector instructions are executed with 4-component vector operands.

instrissue -n -a -l 64 -m

Scalar vs Vector Instruction Issue
Compares the rate at which the card can perform ADD, SUB, MUL, and MAD instructions when operating on both scalar and 4-component vector operands. Any dual issue capability in the hardware should be apparent when performing the single component version of the instructions.


instrissue -n -c 1 -i ADD -l 40 -m
instrissue -n -c 4 -i ADD -l 40 -m
instrissue -n -c 1 -i SUB -l 40 -m
instrissue -n -c 4 -i SUB -l 40 -m
instrissue -n -c 1 -i MUL -l 40 -m
instrissue -n -c 4 -i MUL -l 40 -m
instrissue -n -c 1 -i MAD -l 40 -m
instrissue -n -c 4 -i MAD -l 40 -m

Instruction Precision
Plots average and maximum error of complex arithmetic instructions. Comparison is performed against double precision CPU calculations. The first graph samples inputs over the range of 0.001 to PI/2. The second graph samples over 10^-12 to 0.001 (only tests functions which behave nicely at the origin).

precision.exe -m 0.001 -x 1.5707963267949 -b
precision.exe -m 1.0e-12 -x 0.001 -bsce
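
The comparison described above can be sketched on the CPU by measuring average and maximum error against a double precision reference. This is not the benchmark's implementation: the helper names are hypothetical, absolute error and uniform sampling are assumptions, and single-precision rounding stands in for a GPU result.

```python
import math
import struct


def to_float32(x):
    """Round a Python float (double) to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def error_stats(approx, reference, lo, hi, samples=1000):
    """Return (average, maximum) absolute error over evenly spaced inputs."""
    total = 0.0
    worst = 0.0
    for i in range(samples):
        x = lo + (hi - lo) * i / (samples - 1)
        err = abs(approx(x) - reference(x))
        total += err
        worst = max(worst, err)
    return total / samples, worst


# Stand-in for a GPU sin(): single-precision input and output.
def sin_f32(x):
    return to_float32(math.sin(to_float32(x)))


avg_err, max_err = error_stats(sin_f32, math.sin, 0.001, math.pi / 2)
print(avg_err, max_err)
```

Replacing sin_f32 with values read back from the GPU would give the same two statistics for real hardware.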


GPUBench is made possible by the letter E, the number 37, and the Stanford University Graphics Lab.