Profiling Of 3 Games Running On The S3 ViRGE Chip
This web site documents a project done for
CS348c, the Computer
Graphics Architectures Course at Stanford
University.
Project Goals
In studying graphics architectures during the course, it became apparent that
the distribution of triangles being sent into the graphics pipeline could have
a profound effect on its performance. Depending on triangle sizes, the system
can be either fill rate bound or triangle setup bound. By knowing approximately
what distribution of triangle sizes to expect in a given environment,
hardware can be designed to address fill rate or triangle setup, depending on
which is the critical bottleneck. Since PCs are an emerging market for
accelerated 3D, we decided to look at the triangle content of 3D DOS games.
Unfortunately, there is no easy way to get at the internals of 3D DOS games,
which usually have highly optimized software engines. S3 Inc. recently
released a new graphics accelerator chip with a 3D sub-system known as ViRGE.
We arranged with S3 to have access to their driver code, so that we
could log all triangles being sent to the on-board accelerator and analyze
the triangle content and texture use of Descent II, Terminal Velocity, and
Actua Soccer, three games which were specially ported to ship bundled
with graphics boards which use ViRGE.
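The fill rate versus triangle setup trade-off described above can be sketched with a very simple cost model. This is only an illustration: the function names, the assumption that setup and fill overlap perfectly, and the cycle counts are all hypothetical and are not ViRGE figures.

```python
def triangle_time(area_pixels, setup_cycles, fill_pixels_per_cycle=1.0):
    """Cycles spent on one triangle, assuming setup and fill fully overlap.

    Hypothetical model: the slower of the two stages sets the pace.
    """
    fill_cycles = area_pixels / fill_pixels_per_cycle
    return max(setup_cycles, fill_cycles)


def bottleneck(area_pixels, setup_cycles, fill_pixels_per_cycle=1.0):
    """Name the stage that limits throughput for a triangle of this size."""
    fill_cycles = area_pixels / fill_pixels_per_cycle
    return "fill" if fill_cycles > setup_cycles else "setup"


# Made-up setup cost of 50 cycles per triangle, purely for illustration.
small = bottleneck(area_pixels=10, setup_cycles=50)
large = bottleneck(area_pixels=500, setup_cycles=50)
```

Under this model a stream dominated by small triangles is setup bound and a stream dominated by large ones is fill bound, which is why the size distribution matters.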
There is a copy of our initial project proposal
here.
Project Phases
This project consisted of three phases: modifying the driver to
do logging, creating a parsing program to generate all of the statistics from
the log files, and synthesizing everything into this web site.
Phase 1
Although the initial driver had been constructed with logging in mind, our
first main task was to get the driver to successfully log all of the
statistics we were interested in. S3 ships a toolkit library to developers
who want to support their chip. The library can be linked in with the
game developer's program. The game code makes calls into the library to set
the state of the graphics sub-system and to command it to render triangles. An
interesting feature is that the library was designed so that a modified
version of the library could be loaded before the game, and sub-routine calls
could thereby be rerouted into the modified driver. This was the key to our
being able to trap the calls from the game applications.
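The rerouting idea can be illustrated with a small sketch. It is not the S3 toolkit mechanism itself: `draw_triangle`, the `toolkit` dispatch table, and `make_logged` are hypothetical names, and the sketch only shows the general pattern of replacing an entry point with a version that logs and then forwards the call.

```python
import functools


def make_logged(fn, log):
    """Wrap fn so every call is recorded in log before being forwarded."""
    @functools.wraps(fn)
    def wrapper(*args):
        log.append((fn.__name__, args))
        return fn(*args)
    return wrapper


def draw_triangle(v0, v1, v2):
    """Hypothetical stand-in for a toolkit entry point."""
    return "drawn"


call_log = []
# Hypothetical dispatch table that the "game" calls through.
toolkit = {"draw_triangle": draw_triangle}
# Swap in the logging version; the caller is unchanged.
toolkit["draw_triangle"] = make_logged(toolkit["draw_triangle"], call_log)

result = toolkit["draw_triangle"]((0, 0), (4, 0), (0, 3))
```

The caller still receives the original return value, while `call_log` now holds one entry per call.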
Once we had a good understanding of the architecture of the driver and library,
we had to place the logging points at the appropriate locations in the code.
Unfortunately, the S3 toolkit has multiple points of entry, so it was
somewhat tricky to find the best location to place the logging code. If it
was placed in too many locations, we got redundant information, and if it was
placed in too few locations, we did not receive all of the data from the game.
Eventually we were able to suitably instrument the code to reliably dump
all of the information we needed (see the Graph
Descriptions for exactly what was logged).
In looking at the driver code, we were also able to figure out a way
to take a screen dump out of the graphics buffer every time the
application signaled it was ready for the next frame, which it does by
waiting for the vertical synchronization signal. This allowed
us to create MPEGs of the sequences we eventually used for our
statistics calculation.
The one problem we ran into at this stage was with Descent II, which did not
seem to have any frame boundaries. Not only did this eliminate the
possibility of generating frame-based statistics for Descent II, but it also
made it impossible to get frame dumps, since we could not determine when to
take them. We were unable to solve this problem, but did determine that
Descent II was checking the standard VGA vertical synchronization port
on its own to determine frame boundaries. On account of this problem, the
Descent II statistics are not as complete as for the other two games.
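To show why frame boundaries matter, here is a minimal sketch of splitting a logged event stream into frames at vertical synchronization waits. The event tuples and the `"vsync"` marker are assumptions made for illustration; they are not the actual log format.

```python
def split_frames(events):
    """Group logged events into frames, ending a frame at each vsync marker.

    events: list of tuples whose first element is the event kind
    (hypothetical format: "tri" for a triangle, "vsync" for a sync wait).
    """
    frames, current = [], []
    for event in events:
        if event[0] == "vsync":
            frames.append(current)
            current = []
        else:
            current.append(event)
    if current:
        # Trailing events with no closing marker form one final group.
        frames.append(current)
    return frames


with_sync = split_frames(
    [("tri", 1), ("tri", 2), ("vsync",), ("tri", 3), ("vsync",)]
)
without_sync = split_frames([("tri", 1), ("tri", 2), ("tri", 3)])
```

When no markers appear in the stream, as happened with Descent II, every triangle falls into a single undivided group and per-frame statistics cannot be computed.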
Phase 2
Once logging was working fairly well, we began to work on the parser program
to generate the statistics from the log file. This was fairly
straightforward, brute-force work to crank through the output logs and
generate the required data. All of the graphs that are created are documented
on the Graph Description page. The complete
source code is also available and should be compilable
on most systems (it has been tested on SunOS, Linux and DOS using Watcom C/C++
10.5). The complete log files are available for downloading from the sections
for each game, and the parser program can be used to generate statistics for
sequences other than the ones we analyzed.
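As a rough idea of the kind of computation a parser like this performs, the sketch below computes screen-space triangle areas and bins them into a histogram. It is not the project's parser: the in-memory triangle representation, the function names, and the bin width are all assumptions.

```python
from collections import Counter


def triangle_area(v0, v1, v2):
    """Area in pixels of a screen-space triangle, from the 2D cross product."""
    (x0, y0), (x1, y1), (x2, y2) = v0, v1, v2
    return abs((x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)) / 2.0


def area_histogram(triangles, bin_width=10):
    """Count triangles per area bin; keys are the lower edge of each bin."""
    hist = Counter()
    for tri in triangles:
        bin_start = int(triangle_area(*tri) // bin_width) * bin_width
        hist[bin_start] += 1
    return dict(sorted(hist.items()))


# Two made-up triangles: areas of 6 and 50 pixels.
sample = [
    ((0, 0), (4, 0), (0, 3)),
    ((0, 0), (10, 0), (0, 10)),
]
histogram = area_histogram(sample)
```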
The only difficulty that we ran into with the parsing had to do with texture
statistics. The S3 toolkit requires the individual application to manage the
movement of textures between main memory and the graphics board, and only
requires that a pointer be passed so that it knows what texture to use in
rasterizing the triangles. On account of this, we were only able to see
when the game switched to a new texture address, but could not determine
whether the texture at that address was the same one that had been there
before. In Actua Soccer, it is clear
that the game is doing extensive texture management, since it only uses three
distinct texture addresses.
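The limitation can be made concrete with a small sketch that derives the only texture statistics available from a stream of addresses. The input list and the function name are hypothetical.

```python
def texture_stats(addresses):
    """Summarize a sequence of texture addresses, one per triangle.

    Only addresses are visible, so a reused address may hold different
    texture data; distinct_addresses is therefore a lower bound on the
    number of distinct textures.
    """
    switches = sum(
        1 for prev, cur in zip(addresses, addresses[1:]) if cur != prev
    )
    return {"distinct_addresses": len(set(addresses)), "switches": switches}


# Made-up address stream for illustration.
stats = texture_stats([0x1000, 0x1000, 0x2000, 0x1000, 0x3000])
```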
Phase 3 - Synthesis
The final step was using the tools completed in the previous phases to
generate the real data. We selected the demo sequences from the games to
use as our test sequences, since they tend to have a lot of action, and
are fairly easily repeatable. Each game runs at a resolution of 640x480 in
16-bit color (ARGB 1555). The textures are all in the same color format and
range in size from 256x256 in Actua Soccer down to 32x32 in Terminal
Velocity. All of the games also use the 4TPP filtering mode to smooth out
pixels which are magnified at close distances. Each game sequence was
several minutes long: 1'50" for Descent II, 2'15" for Terminal Velocity,
and 4'00" for Actua Soccer. For Actua Soccer and Terminal Velocity we
created the full set of graphs, including "dot statistics" for frames 40
through 50, and an MPEG movie of the trace sequence. As mentioned earlier,
for Descent II we could not find frame boundaries, and thus were not able
to do per-frame statistics or dot statistics, or to generate an MPEG. We have
videotaped the Descent II sequence and hope to have it digitized soon.
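For readers unfamiliar with the 16-bit ARGB 1555 format mentioned above, the sketch below unpacks one pixel and works out the size of a single 640x480 frame. It assumes the conventional bit layout (alpha in the top bit, then 5 bits each of red, green, and blue); the function name is illustrative.

```python
def unpack_argb1555(pixel):
    """Split a 16-bit ARGB 1555 value into (alpha, red, green, blue)."""
    alpha = (pixel >> 15) & 0x1
    red = (pixel >> 10) & 0x1F
    green = (pixel >> 5) & 0x1F
    blue = pixel & 0x1F
    return alpha, red, green, blue


# One 640x480 frame at 2 bytes per pixel.
FRAME_BYTES = 640 * 480 * 2
```

At 614,400 bytes per frame, dumping every frame of a multi-minute sequence produces a substantial amount of data.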
Since our goal was generating the statistics, we have not had much chance
to analyze the results. Some things are very noticeable, however. One
example is the use of the Painter's algorithm, which draws the triangles that
are further away at the beginning of the frame. This is visible in these W-Ave
dot statistics from both Actua
Soccer and Terminal Velocity
(recall that the dot statistics were taken over 11 frames and note that
the periodicity is close to 11).
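A minimal sketch of the Painter's algorithm ordering follows. It assumes each triangle carries three per-vertex depth values and that a larger value means farther from the viewer; how the logged W values map to distance is not covered here, and the names are hypothetical.

```python
def painters_order(triangles):
    """Return triangles sorted farthest first by average vertex depth.

    triangles: list of (name, [d0, d1, d2]) pairs, where larger depth
    is assumed to mean farther away.
    """
    return sorted(
        triangles,
        key=lambda tri: sum(tri[1]) / len(tri[1]),
        reverse=True,
    )


# Made-up scene for illustration.
scene = [
    ("near", [1.0, 2.0, 1.5]),
    ("far", [9.0, 8.0, 10.0]),
    ("mid", [5.0, 5.0, 4.0]),
]
draw_order = [name for name, _ in painters_order(scene)]
```

Drawing in this order lets nearer triangles overwrite farther ones, so the average depth falls steadily over the course of each frame and jumps back up when the next frame starts.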
Another thing that is evident is the typical exponential distribution of
triangle sizes which can be seen in these graphs from
Descent II,
Actua Soccer,
and Terminal Velocity. One final
thing of note was the dot statistics for types of triangles rendered. Here
they are for Actua Soccer and
Terminal Velocity. These
clearly show that within one frame textured triangles are drawn in a separate
batch from Gouraud-shaded triangles (notice there are exactly 11 red bands for
the 11 frames in this sequence).
Conclusions and Future Directions
Overall we are pleased with the amount of information we were able to extract
from these games. With the groundwork done, it should be possible to
create a driver in the future that will allow statistics to be collected from
games that adhere to the soon-to-be-standard Direct3D interface. Hopefully the
statistics, trace files, and tools we have collected here in this web site
will also serve as the basis for some more in-depth analysis of the nature
of current 3D DOS games.