GPUBench : How much does your GPU bench?

Fetchcosts is yet another bandwidth analysis test. It generates shaders that reveal the cost (in terms of number of MAD instructions) of texture fetches when accessing data in a variety of patterns. The goal of this test is to let the user specify an access pattern similar to that of a hypothetical or real-life shader program, and to determine roughly how many instructions a shader would need to execute in order to be compute limited. Thus, given a texture access type and a shader's ratio of texturing instructions to arithmetic ones, it becomes possible to estimate the performance of a shader.
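As a rough illustration of that last point, the sketch below classifies a shader from its fetch and arithmetic counts. It assumes a simple linear model in which every fetch costs the same number of MAD instructions; the function name and the mads_per_fetch value are hypothetical and are not produced by fetchcosts itself. A real value would have to be read off a fetchcosts run for the matching access pattern.

```python
def classify_shader(num_fetches, num_arith, mads_per_fetch):
    """Classify a shader under an assumed linear cost model.

    num_fetches    -- texture fetches in the shader
    num_arith      -- arithmetic (MAD-like) instructions in the shader
    mads_per_fetch -- assumed cost of one fetch, in MAD instructions
                      (hypothetical input, taken from a measured run)
    """
    # Number of arithmetic instructions needed to hide all fetches.
    break_even = num_fetches * mads_per_fetch
    if num_arith >= break_even:
        return "compute limited"
    return "bandwidth limited"


if __name__ == "__main__":
    # Example with made-up numbers: 4 fetches at 8 MADs each.
    print(classify_shader(4, 40, 8.0))
    print(classify_shader(4, 16, 8.0))
```

The linear model is only a first approximation; caching and dependent fetches can make the per-fetch cost vary with the access pattern.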

The --size option specifies the size of the framebuffer and input textures. At this time only 4-component textures and buffers are supported. The --render option specifies whether a screen-covering quad or a large triangle is rasterized to generate fragments.

To reduce timing noise, specify that the test should be repeated a large number of times using --iters.

Fetchcosts shaders begin by performing a number of texture fetches. It is easy to specify complex dependent texturing scenarios. The --dependentlevels option specifies how many levels of dependent texturing to use. The --fetches option specifies the number of fetches to perform at each dependent level, so (dependentlevels+1)*fetches texture lookups are done in the shader. The value obtained from the first lookup on each dependent level is used as the coordinate for lookups into textures at the next level. Using --access, texture access can be set to single access (texel 0,0), seq (sequential reading), or random. The values stored in the textures are set so that these access patterns apply to dependent texture fetches as well (e.g. in random access mode, random values are stored in the textures). However, random access will never occur in the first level of texturing. When --nomultitexture is specified, each level performs 'fetches' accesses to the same texture instead of accessing 'fetches' unique textures.
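The lookup count and dependency chain described above can be sketched as follows. This is a minimal illustration, not the generator used by fetchcosts; the function names are hypothetical, and the description of the level 0 coordinate as coming from the fragment is an assumption.

```python
def total_texture_lookups(dependent_levels, fetches):
    """Number of lookups in the shader: (dependentlevels+1)*fetches."""
    if dependent_levels < 0 or fetches < 1:
        raise ValueError("need dependent_levels >= 0 and fetches >= 1")
    return (dependent_levels + 1) * fetches


def lookup_plan(dependent_levels, fetches):
    """List (level, fetch_index, coordinate_source) for every lookup."""
    plan = []
    for level in range(dependent_levels + 1):
        if level == 0:
            # Assumption: level 0 uses the fragment's own coordinate.
            source = "fragment coordinate"
        else:
            # Each later level is addressed by the first lookup
            # of the level before it.
            source = f"first lookup of level {level - 1}"
        for fetch in range(fetches):
            plan.append((level, fetch, source))
    return plan


if __name__ == "__main__":
    for entry in lookup_plan(dependent_levels=2, fetches=3):
        print(entry)
    print("total lookups:", total_texture_lookups(2, 3))
```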

The texture accesses form the first part of the generated shader. The texturing instructions are followed by a series of MAD instructions to create a shader whose total length matches the specified instruction count. --mininstr and --maxinstr specify the range of shader lengths to test in a particular test run. When the number of MAD instructions grows large enough, the shader becomes compute limited, and running time is a function of the number of instructions. When the instruction count is small enough that the shader is bandwidth limited, execution time remains largely dependent on the type and number of texturing operations performed. As an example, in the graph below, it is very clear where the shader crosses the threshold from being bandwidth limited to being compute limited. Each of the 4 lines corresponds to a different number of texture accesses. As expected, it takes more instructions to become compute limited when more textures are accessed at the beginning of the shader.
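One way to locate that threshold from timing data is sketched below. It assumes the results are available as (instruction count, time) pairs, which is a hypothetical format and not the actual output of fetchcosts, and it treats the time at the shortest shader as the bandwidth-limited plateau. The function name and the 5% tolerance are likewise illustrative choices.

```python
def find_compute_limited_threshold(samples, tolerance=0.05):
    """Return the first instruction count whose time rises above the plateau.

    samples   -- iterable of (instruction_count, seconds) pairs
    tolerance -- relative rise over the plateau that counts as leaving it
    Returns None if the time never leaves the plateau.
    """
    ordered = sorted(samples)
    if not ordered:
        return None
    # Assumption: the shortest shader is bandwidth limited, so its
    # time approximates the flat part of the curve.
    plateau = ordered[0][1]
    for count, seconds in ordered:
        if seconds > plateau * (1.0 + tolerance):
            return count
    return None


if __name__ == "__main__":
    # Synthetic data: flat at 10.0 until 0.5 * count exceeds it.
    data = [(n, max(10.0, 0.5 * n)) for n in range(4, 41, 4)]
    print(find_compute_limited_threshold(data))
```

With noisy measurements a larger tolerance, or more repetitions via --iters, may be needed before the reported threshold is stable.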