Profiling Of 3 Games Running On The S3 ViRGE Chip

This web site documents a project done for CS348c, the Computer Graphics Architectures course at Stanford University.

Project Goals

In studying graphics architectures during the course, it became apparent that the distribution of triangle sizes sent into the graphics pipeline can have a profound effect on its performance. Depending on triangle size, the factor that limits performance is either fill rate (large triangles) or triangle setup (small triangles). Knowing approximately what distribution of triangle sizes to expect in a given environment lets hardware be designed to address whichever of the two is the critical bottleneck. Since PCs are an emerging market for accelerated 3D, we decided to look at the triangle content of 3D DOS games.
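The trade-off between setup and fill can be made concrete with a small sketch. It assumes a deliberately simple model in which setup costs a fixed number of cycles per triangle and fill costs a fixed number of cycles per pixel; the type names, function names, and any cycle counts plugged in are hypothetical and are not ViRGE figures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cost model; the cycle counts are illustrative, not ViRGE figures. */
typedef enum { SETUP_BOUND, FILL_BOUND } Bottleneck;

typedef struct {
    double setup_cycles_per_triangle; /* assumed fixed cost to set up one triangle */
    double fill_cycles_per_pixel;     /* assumed cost to rasterize one pixel */
} PipelineModel;

/* Report which stage takes longer for a triangle covering area_pixels pixels. */
Bottleneck classify_triangle(const PipelineModel *model, double area_pixels)
{
    assert(model != NULL && area_pixels >= 0.0);
    return area_pixels * model->fill_cycles_per_pixel > model->setup_cycles_per_triangle
               ? FILL_BOUND
               : SETUP_BOUND;
}

/* Triangle area (in pixels) at which setup and fill take equally long. */
double break_even_area(const PipelineModel *model)
{
    assert(model != NULL && model->fill_cycles_per_pixel > 0.0);
    return model->setup_cycles_per_triangle / model->fill_cycles_per_pixel;
}
```

Under this model, a measured histogram of triangle areas can be compared against the break-even area to estimate what fraction of a game's triangles would be limited by each stage.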

Unfortunately, there is no easy way to get at the internals of 3D DOS games, which usually have highly optimized software engines. S3 Inc. recently released a new graphics accelerator chip with a 3D sub-system, known as ViRGE. We arranged with S3 to have access to their driver code, so that we could log all triangles being sent to the on-board accelerator and analyze the triangle content and texture use of Descent II, Terminal Velocity, and Actua Soccer, three games that were specially ported to ship bundled with graphics boards that use ViRGE.

There is a copy of our initial project proposal here.

Project Phases

This project consisted of three phases: modifying the driver to do logging, creating a parsing program to generate the statistics from the log files, and synthesizing everything into this web site.

Phase 1
Although the initial driver had been constructed with logging in mind, our first main task was to get the driver to successfully log all of the statistics we were interested in. S3 ships a toolkit library to developers who want to support their chip. The library can be linked into the game developer's program. The game code makes calls into the library to set the state of the graphics sub-system and to command it to render triangles. An interesting feature is that the library was designed so that a modified version could be loaded before the game, rerouting the game's subroutine calls into the modified driver. This was the key to our being able to trap the calls from the game applications.
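As an illustration of the general idea of rerouting calls through a logging layer, here is a minimal sketch. It assumes the game reaches the renderer through a table of function pointers; that mechanism, and every type and function name below, is hypothetical and is not taken from the S3 toolkit.

```c
#include <stddef.h>

/* Hypothetical types and names; the real S3 toolkit interface is not shown here. */
typedef struct { float x, y, w; } Vertex;
typedef void (*RenderTriangleFn)(const Vertex *a, const Vertex *b, const Vertex *c);

/* Assumed entry-point table that the game calls through. */
typedef struct { RenderTriangleFn render_triangle; } DriverTable;

static RenderTriangleFn original_render_triangle = NULL;
static unsigned long logged_triangles = 0;
static unsigned long hardware_triangles = 0;

/* Stand-in for the routine that would hand the triangle to the hardware. */
void stub_render_triangle(const Vertex *a, const Vertex *b, const Vertex *c)
{
    (void)a; (void)b; (void)c;
    ++hardware_triangles;
}

/* Wrapper: record the call, then forward it unchanged to the saved routine. */
static void logging_render_triangle(const Vertex *a, const Vertex *b, const Vertex *c)
{
    ++logged_triangles; /* a real logger would also write the vertices to a file */
    if (original_render_triangle != NULL)
        original_render_triangle(a, b, c);
}

/* Save the routine currently in the table and put the wrapper in its place. */
void install_logging(DriverTable *table)
{
    original_render_triangle = table->render_triangle;
    table->render_triangle = logging_render_triangle;
}

unsigned long logged_triangle_count(void)
{
    return logged_triangles;
}
```

The game keeps calling through the table exactly as before, so it needs no changes; only the table entry differs once the logging layer is installed.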

Once we had a good understanding of the architecture of the driver and library, we had to place the logging points at the appropriate locations in the code. Unfortunately, the S3 toolkit has multiple points of entry, so it was somewhat tricky to find the best location for the logging code. Placed in too many locations, it produced redundant information; placed in too few, it missed some of the data from the game. Eventually we were able to instrument the code to reliably dump all of the information we needed (see the Graph Descriptions for exactly what was logged).

In looking at the driver code, we also found a way to take a screen dump of the graphics buffer every time the application signaled that it was ready for the next frame, which it does by waiting for the vertical synchronization signal. This allowed us to create MPEGs of the sequences we eventually used for our statistics calculation.

The one problem we ran into at this stage was with Descent II, which did not seem to signal any frame boundaries to the driver. Not only did this eliminate the possibility of generating frame-based statistics for Descent II, but it also made it impossible to get frame dumps, since we could not determine when to take them. We were unable to solve this problem, but did determine that Descent II was checking the standard VGA vertical synchronization port on its own to determine frame boundaries. On account of this problem, the Descent II statistics are not as complete as those for the other two games.
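For readers unfamiliar with how a program can find frame boundaries by itself, the sketch below shows the usual test. On standard VGA hardware the vertical retrace flag is bit 3 of Input Status Register 1, at port 0x3DA in color modes. To stay portable, the code works on status bytes that have already been sampled rather than reading the port; the function names are hypothetical, and this is not a reconstruction of what Descent II actually does.

```c
#include <stddef.h>

#define VGA_INPUT_STATUS_1 0x3DA /* Input Status Register 1, color modes (not read here) */
#define VGA_VRETRACE_BIT   0x08  /* bit 3: set while vertical retrace is active */

/* Nonzero if the sampled status byte was taken during vertical retrace. */
int in_vertical_retrace(unsigned char status)
{
    return (status & VGA_VRETRACE_BIT) != 0;
}

/* Count retrace starts (0 -> 1 transitions of bit 3) in a series of samples. */
size_t count_retrace_starts(const unsigned char *samples, size_t n)
{
    size_t i, starts = 0;
    int previous = 1; /* assume we begin mid-retrace so a partial one is not counted */

    for (i = 0; i < n; ++i) {
        int current = in_vertical_retrace(samples[i]);
        if (current && !previous)
            ++starts;
        previous = current;
    }
    return starts;
}
```

A program that polls the port this way never has to tell the driver that a frame has ended, which is consistent with the logging driver seeing no frame boundaries.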

Phase 2
Once logging was working fairly well, we began to work on the parser program to generate the statistics from the log file. This was straightforward brute-force work: crank through the output logs and generate the required data. All of the graphs that are created are documented on the Graph Description page. The complete source code is also available and should compile on most systems (it has been tested on SunOS, Linux, and DOS using Watcom C/C++ 10.5). The complete log files are available for downloading from the sections for each game, and the parser program can be used to generate statistics for sequences other than the ones we analyzed.
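The kind of per-triangle work such a parser does can be sketched as follows. This is an illustrative example only, not the project's parser: it assumes each log record yields three screen-space vertices, and the type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { double x, y; } Point2; /* hypothetical screen-space vertex */

/* Area of a triangle in pixels, from the 2D cross product of two edges. */
double triangle_area(Point2 a, Point2 b, Point2 c)
{
    double twice = (b.x - a.x) * (c.y - a.y) - (c.x - a.x) * (b.y - a.y);
    return (twice < 0.0 ? -twice : twice) / 2.0;
}

/* Add one triangle to a histogram with fixed-width bins.
   Areas beyond the last bin are counted in the last bin. */
void histogram_add(unsigned long *bins, size_t bin_count, double bin_width, double area)
{
    double scaled;
    size_t index;

    assert(bins != NULL && bin_count > 0 && bin_width > 0.0 && area >= 0.0);
    scaled = area / bin_width;
    index = scaled >= (double)bin_count ? bin_count - 1 : (size_t)scaled;
    bins[index]++;
}
```

Calling histogram_add once per logged triangle produces the raw counts behind a triangle-size distribution graph.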

The only difficulty we ran into with the parsing had to do with texture statistics. The S3 toolkit requires the individual application to manage the movement of textures between main memory and the graphics board, and only requires that a pointer be passed so that it knows which texture to use in rasterizing the triangles. On account of this, we could see when the game switched to a new texture address, but could not determine whether the texture was the same one that had been at that address before. In Actua Soccer, it is clear that the game is doing extensive texture management, since it only uses three distinct texture addresses.
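A minimal sketch of address-based texture bookkeeping makes the limitation visible. It assumes the log yields one texture address per texture selection; the structure, function names, and the cap on tracked addresses are hypothetical.

```c
#include <stddef.h>

#define MAX_TRACKED_TEXTURES 64 /* arbitrary cap for this sketch */

typedef struct {
    unsigned long seen[MAX_TRACKED_TEXTURES]; /* distinct addresses seen so far */
    size_t distinct;                          /* number of entries in seen[] */
    unsigned long switches;                   /* times the active address changed */
    unsigned long current;                    /* currently selected address */
    int has_current;                          /* zero until the first selection */
} TextureStats;

void texture_stats_init(TextureStats *stats)
{
    stats->distinct = 0;
    stats->switches = 0;
    stats->current = 0;
    stats->has_current = 0;
}

/* Record that the game selected the texture at the given address. */
void texture_stats_select(TextureStats *stats, unsigned long address)
{
    size_t i;

    if (stats->has_current && stats->current == address)
        return; /* same address: not counted as a switch */
    if (stats->has_current)
        stats->switches++;
    stats->current = address;
    stats->has_current = 1;

    for (i = 0; i < stats->distinct; ++i)
        if (stats->seen[i] == address)
            return; /* address already recorded */
    if (stats->distinct < MAX_TRACKED_TEXTURES)
        stats->seen[stats->distinct++] = address;
}
```

Because only the address is compared, a game that uploads different texture data to the same address looks identical to one that reuses the same texture, which is exactly the ambiguity described above.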

Phase 3 - Synthesis
The final step was using the tools completed in the previous phases to generate the real data. We selected the demo sequences from the games as our test sequences, since they tend to have a lot of action and are fairly easily repeatable. Each game runs at a resolution of 640x480 in 16-bit color (ARGB 1555). The textures are all in the same color format and range in size from 256x256 in Actua Soccer down to 32x32 in Terminal Velocity. All of the games also use the 4TPP filtering mode to smooth out pixels which are magnified at close distances. Each game sequence was a few minutes long: 1 min 50 s for Descent II, 2 min 15 s for Terminal Velocity, and 4 min 00 s for Actua Soccer. For Actua Soccer and Terminal Velocity we created the full set of graphs, including "dot statistics" for frames 40 through 50, and an MPEG movie of the trace sequence. As mentioned earlier, for Descent II we could not find frame boundaries, and thus were not able to do per-frame statistics, dot statistics, or generate an MPEG. We have videotaped the Descent II sequence and hope to have it digitized soon.
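For reference, a 16-bit ARGB 1555 pixel can be taken apart as in the sketch below. It assumes the conventional layout, with alpha in bit 15 followed by five bits each of red, green, and blue; the type and function names are hypothetical.

```c
/* Components of one pixel: alpha is 0..1, the color channels are 0..31. */
typedef struct { unsigned alpha, red, green, blue; } Argb1555;

/* Split a 16-bit pixel into its components (assumed layout: A RRRRR GGGGG BBBBB). */
Argb1555 unpack_argb1555(unsigned short pixel)
{
    Argb1555 c;
    c.alpha = (pixel >> 15) & 0x01u;
    c.red   = (pixel >> 10) & 0x1Fu;
    c.green = (pixel >> 5)  & 0x1Fu;
    c.blue  = pixel & 0x1Fu;
    return c;
}

/* Combine components back into a 16-bit pixel; out-of-range bits are masked off. */
unsigned short pack_argb1555(Argb1555 c)
{
    return (unsigned short)(((c.alpha & 0x01u) << 15) |
                            ((c.red   & 0x1Fu) << 10) |
                            ((c.green & 0x1Fu) << 5)  |
                             (c.blue  & 0x1Fu));
}

/* Bytes needed for one frame at two bytes per 16-bit pixel. */
unsigned long frame_buffer_bytes(unsigned long width, unsigned long height)
{
    return width * height * 2UL;
}
```

At 640x480, frame_buffer_bytes gives 614,400 bytes per frame, which is the amount of data each screen dump has to copy.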

Since our goal was generating the statistics, we have not had much chance to analyze the results. Some things are very noticeable, however. One example is the use of the painter's algorithm, which draws triangles that are farther away at the beginning of the frame. This is visible in these W-Ave dot statistics from both Actua Soccer and Terminal Velocity (recall that the dot statistics were taken over 11 frames and note that the pattern repeats about 11 times). Another thing that is evident is the typical exponential distribution of triangle sizes, which can be seen in these graphs from Descent II, Actua Soccer, and Terminal Velocity. One final thing of note was the dot statistics for types of triangles rendered. Here they are for Actua Soccer and Terminal Velocity. These clearly show that within one frame textured triangles are drawn in a separate batch from Gouraud-shaded triangles (notice there are exactly 11 red bands for the 11 frames in this sequence).
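The ordering that the painter's algorithm produces can be sketched in a few lines. The example assumes each triangle carries a single depth value where larger means farther from the viewer; that convention and all names are hypothetical, and the games' actual sorting code is not available to us.

```c
#include <stdlib.h>

typedef struct {
    int id;       /* hypothetical triangle identifier */
    double depth; /* assumed convention: larger value = farther away */
} DepthTriangle;

/* qsort comparator that places farther triangles first. */
static int farther_first(const void *lhs, const void *rhs)
{
    const DepthTriangle *a = (const DepthTriangle *)lhs;
    const DepthTriangle *b = (const DepthTriangle *)rhs;
    return (a->depth < b->depth) - (a->depth > b->depth);
}

/* Reorder triangles back to front, so nearer ones are drawn over farther ones. */
void sort_back_to_front(DepthTriangle *triangles, size_t count)
{
    qsort(triangles, count, sizeof *triangles, farther_first);
}
```

When triangles are submitted in this order, depth plotted against submission order falls steadily within each frame and jumps back up at the start of the next, giving one sawtooth per frame.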

Conclusions and Future Directions

Overall we are pleased with the amount of information we were able to extract from these games. With the groundwork done, it should be possible to create a driver in the future that will allow statistics to be collected from games that adhere to the soon-to-be-standard Direct3D interface. Hopefully the statistics, trace files, and tools we have collected on this web site will also serve as the basis for some more in-depth analysis of the nature of current 3D DOS games.