CS348C Final Report
Visualizing Memory Hierarchies

John Gerth
Nolan Glantz
Kekoa Proudfoot

Abstract

We have created several visualizations for studying cache and memory hierarchies, including a number of cache visualizations, a conflict visualization, and a number of main memory visualizations. The final result of this work was a composite visualization consisting of five subviews. Based primarily on our initial visualizations, this final visualization can offer a great deal of insight into a program's memory access behavior during its execution.

There are quite a few reasons to visualize memory hierarchies. We chose to focus on creating views which would aid in program analysis, though these views also have other benefits.


Introduction

There are several potential benefits of a system for visualizing memory hierarchies. The primary uses of such a system fall into three main categories: education, program analysis, and system design.

Education

Visualization of memory hierarchies can be used in an introductory computer architecture class as a way to show students how caches work. Most computer architecture textbooks use a static block diagram approach to show cache and memory structure (for example, [Hennessy and Patterson 1994]). It would be much easier to see how associativity works and why cache conflicts happen with an animated visualization.

It would also be easier for students to see how a particular program uses memory with a visualization. For example, with our system you can easily see the memory access patterns of matrix multiply algorithms--it is easy to notice the benefits of using a blocking algorithm. The heavy penalties of vertically accessing a 2D array are quite obvious from our visualization, whereas they may not be obvious to inexperienced programmers.
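The penalty for vertical access can be illustrated with a small calculation. The following is a minimal sketch, not part of our system: it assumes a direct-mapped 8KB cache with 32-byte lines and a 64x64 matrix of 8-byte elements stored row by row, and the function name and parameters are hypothetical.

```python
def count_misses(addresses, cache_bytes=8192, line_bytes=32):
    """Count misses in a direct-mapped cache for a sequence of byte addresses."""
    num_lines = cache_bytes // line_bytes
    resident = {}  # cache index -> memory line currently stored there
    misses = 0
    for address in addresses:
        line = address // line_bytes
        index = line % num_lines
        if resident.get(index) != line:
            misses += 1
            resident[index] = line
    return misses

N, ELEM = 64, 8  # assumed: 64x64 matrix of 8-byte elements, row-major storage
row_order = [(i * N + j) * ELEM for i in range(N) for j in range(N)]
column_order = [(i * N + j) * ELEM for j in range(N) for i in range(N)]

print(count_misses(row_order), count_misses(column_order))
```

Under these assumptions the horizontal traversal misses once per cache line, while the vertical traversal misses on every access, because elements 16 rows apart in the same column map to the same cache index and keep evicting one another.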

Program Analysis

Seeing how a program accesses its memory can be extremely useful when writing a program, both for understanding the behavior of the program and for tuning its performance. Some things you may want to know about the program are:

Not only can a visualization help you identify such problems, but by seeing a picture of the memory you can more easily determine how to correct any problems you find.

System Design

Visualizations can also be useful to designers of cache systems and memory hierarchies. They can let you see a particular program load on a particular cache, for example, and compare it side-by-side with a cache that has different parameters. This can let you see the benefits of different design choices, such as adding associativity.


Logistical Issues

Data Collection and Cache Simulator

We originally intended to use SimOS to get our memory trace data. We wanted a history of all the memory accesses of a program, including type of access, address, and result (such as, "Data Read Miss in the L1 Cache"). Eventually SimOS will be the ideal way to get trace data, because the traces would be the result of realistic simulations and could be obtained from virtually any program. However, because we were not able to get precisely the data we wanted from SimOS, and because of the time it would take to get the system working the way we wanted, we tried a different approach.

We wrote our own cache simulator, which can simulate instruction, data, and/or unified caches; most cache parameters are adjustable. With this simulator, all we needed was a set of addresses (the memory accesses of a program) with the type of access for each (read, write, instruction fetch). We run these traces through the simulator which approximates the cache and memory behavior of the program. To simplify the simulation, we ignored such second order effects as page tables and context switches.
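The core of such a simulator can be sketched in a few lines. This is an illustration only, not our actual simulator: the class name, the default parameters, and the choice of LRU replacement are all assumptions made for the example.

```python
from collections import OrderedDict

class Cache:
    """Set-associative cache with LRU replacement (illustrative sketch)."""

    def __init__(self, size_bytes=8192, line_bytes=32, ways=1):
        self.line_bytes = line_bytes
        self.ways = ways
        self.num_sets = size_bytes // (line_bytes * ways)
        # One OrderedDict per set, holding resident tags, oldest first.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]

    def access(self, address):
        """Return (hit, evicted_line_address_or_None) for one reference."""
        line = address // self.line_bytes
        index = line % self.num_sets
        tag = line // self.num_sets
        entries = self.sets[index]
        if tag in entries:
            entries.move_to_end(tag)  # mark as most recently used
            return True, None
        evicted = None
        if len(entries) >= self.ways:
            old_tag, _ = entries.popitem(last=False)  # drop least recently used
            evicted = (old_tag * self.num_sets + index) * self.line_bytes
        entries[tag] = True
        return False, evicted
```

Returning the address of the evicted line along with the hit/miss result is a design choice for the sketch: a display that wants to show what was knocked out of the cache needs that information at the moment of the miss.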

We used the .din file format, which is used with dinero as provided with [Hennessy and Patterson 1994]. It is a human-readable listing of addresses and their access types. We wrote a Perl script which converts .din files into .t files, a more compact binary file format we defined to contain the necessary information for the cache simulator. We can run this conversion program on any .din file to create a binary .t file for the simulator.
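The conversion step might look roughly like the sketch below. It is written in Python rather than Perl, it assumes the usual din convention of one "label address" pair per line (label 0 for a data read, 1 for a data write, 2 for an instruction fetch, address in hexadecimal), and the 5-byte record layout is hypothetical, since the layout of the .t format is not described here.

```python
import struct

# Access-type labels, assuming the usual din convention.
READ, WRITE, IFETCH = 0, 1, 2

# Hypothetical record layout: 1-byte access type + 4-byte address, little-endian.
RECORD = struct.Struct("<BI")

def parse_din_line(line):
    """Parse one '<label> <hex address>' line into (access_type, address)."""
    fields = line.split()
    return int(fields[0]), int(fields[1], 16)

def din_to_binary(lines):
    """Pack every non-blank trace line into fixed-size binary records."""
    return b"".join(RECORD.pack(*parse_din_line(l)) for l in lines if l.strip())

def binary_to_records(blob):
    """Unpack the binary form back into (access_type, address) tuples."""
    return [RECORD.unpack_from(blob, i) for i in range(0, len(blob), RECORD.size)]
```

For example, `binary_to_records(din_to_binary(["0 10", "1 ff"]))` gives back the two references as integer pairs. The sketch does not handle any other record labels a trace might contain.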

We obtained dinero traces for such programs as spice, cc, and tex, which were provided with Hennessy and Patterson's book. We wrote, and therefore intimately understood, some sample programs; understanding the structure of the programs and their memory reference behavior was helpful in creating the visualizations, and in tying their outputs back to the underlying programs. These sample programs include the CS248 rasterizer and a blocking and non-blocking matrix multiply program. We annotated the key inner loops and memory intensive sections of the programs to produce trace files, in the dinero file format, as their output.

Address Compaction

Most of our visualizations include a view of a program's entire memory space in main memory. Since the entire memory space is normally too big to fit on the screen, we decided to compact the addresses referenced by each program to eliminate the blocks of memory which the program did not use.

We divide a program's virtual main memory space into blocks of 64KB. All references in the first such block which is actually used get mapped to have a zero in their high 16 bits, references in the second used block of 64KB get mapped to have a 1 in the high 16 bits, etc. In this way, any block of 64KB which the program does not use will not be part of the address space for purposes of our visualizations. As long as the cache is less than or equal to 64KB, the simulation is not affected. The only effect is that the visualization looks more compact.
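The mapping can be sketched as follows. This is a minimal illustration with hypothetical function names, and it assumes that "first used block" means first in ascending address order.

```python
BLOCK_BITS = 16  # 64KB blocks

def build_block_map(addresses):
    """Number the used 64KB blocks 0, 1, 2, ... in ascending address order."""
    used = sorted({a >> BLOCK_BITS for a in addresses})
    return {block: i for i, block in enumerate(used)}

def compact(address, block_map):
    """Replace the high bits with the block's new number; keep the low 16 bits."""
    return (block_map[address >> BLOCK_BITS] << BLOCK_BITS) | (address & 0xFFFF)
```

Because the low 16 bits are untouched, an address keeps its offset within its block, which is why a cache index computed from those bits does not change.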


Preliminary Visualizations

Cache Visualizer ("Cachevis")

This was an animated display, showing where memory references occurred in the cache. We laid out the cache on screen as a square 2D array, wrapping line by line. Each memory word was a cell of the display. We lit up a cell when it got accessed, using red for a miss and blue for a hit. The colors would stay lit briefly and then gradually fade to black over time, to avoid having them flash which we thought would be distracting. Also, with fading, the intensity of a cell's color corresponds to the length of time since it was last accessed. See Figure 1 for a snapshot of Cachevis.
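One way to model the fading is sketched below. The hold time, the fade time, and the linear ramp are assumptions made for illustration; they are not the values or the curve used in Cachevis.

```python
def intensity(age, hold=10, fade=50):
    """Brightness in [0, 1] for a cell last accessed `age` steps ago."""
    if age <= hold:
        return 1.0  # stay fully lit for a short while
    return max(0.0, 1.0 - (age - hold) / fade)  # then ramp down to black

def cell_color(hit, age):
    """RGB triple: blue for a hit, red for a miss, scaled by intensity."""
    level = int(255 * intensity(age))
    return (0, 0, level) if hit else (level, 0, 0)
```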

Figure 1: A snapshot of Cachevis, running on the trace file for the program tex.

With this display we could clearly see periods of linear accesses, as well as sections of the cache which did not get used much. However, although we were showing misses, it was difficult to see conflicts, and there was no way to associate the displayed accesses back to where they came from in memory. Also, when a miss is followed by a hit in the same location, the blue for the hit adds to the red for the miss. This creates pinkish-purple hues, which do not have a useful meaning--they only show that a location had some combination of hits and misses. Overall, Cachevis was not that beneficial.

Conflict Visualizer ("Conflicts")

This was a static x-y scatter plot of memory addresses, with each axis representing all of main memory (aggregated into uniform chunks of addresses). The display showed which addresses got knocked out of the cache by which addresses during the course of the running of a program. For each point plotted, the x-value was the address of a memory block which got knocked out of the cache, and the y-value was the address of the access which knocked it out. Pixels were lit up if there was a cache conflict somewhere within the particular aggregated chunk of main memory they represent. See Figure 2 for a snapshot of Conflicts.
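The aggregation behind such a plot can be sketched as below, assuming the cache simulation supplies a list of (evicted address, evicting address) pairs. The function name and parameters are hypothetical.

```python
def conflict_pixels(evictions, memory_bytes, resolution):
    """Map (evicted_address, evicting_address) pairs to a set of lit (x, y) pixels."""
    chunk = -(-memory_bytes // resolution)  # ceiling division: bytes per pixel
    return {(evicted // chunk, evictor // chunk) for evicted, evictor in evictions}
```

With 16KB of memory drawn at a resolution of 4 pixels per axis, each pixel covers 4KB, so two distinct conflicts inside the same pair of chunks light the same pixel. This is the aggregation effect discussed below.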

Figure 2: A snapshot of Conflicts, running on the trace file for the CS248 rasterizer program.

This display was not terribly useful--it was difficult to glean much information from it. The intent was to highlight areas of conflict, but this was often not clear from the display. One problem was that there was too much aggregation; perhaps zooming in would make it more useful. But even more of a problem was that it was hard to relate the conflicts shown in the display back to the program at hand.

Memory Visualizer - Fading ("Memvis1")

We took the idea of Cachevis, and used it to show all of main memory instead of just the cache. As in Cachevis, we animated memory references, this time showing at once both where they occurred in main memory and where they occurred in the cache.

We laid out main memory as a 2D array, wrapping line by line. The width of the display was the size of the cache, so one row represented one entire cacheful. Each vertical column corresponded to a location in the cache--cells in the same column would conflict in the cache. The display was a fixed width (depending only on the size of the cache), and the height was determined by how much memory the program in question used. This was the first display which used the trace files with compacted addresses; we only showed the rows of the 64KB blocks which actually got accessed. As in Cachevis, we lit up a cell when it got accessed, using red for a miss and this time green for a hit (green is easier than blue to distinguish from the black background, and yellow is easier than purple to distinguish from the red misses). As in Cachevis, the colors would stay lit briefly and then gradually fade to black over time, so that intensity of a lit cell corresponds to length of time since last access. See Figure 3 for a snapshot of Memvis1.
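The layout rule can be written down directly. The sketch below assumes a direct-mapped cache and one display cell per 4-byte word; both are assumptions for illustration, and the function name is hypothetical.

```python
WORD_BYTES = 4  # assumed cell granularity

def cell_for(address, cache_bytes):
    """Return (row, column) of the display cell holding `address`."""
    words_per_row = cache_bytes // WORD_BYTES
    word = address // WORD_BYTES
    return word // words_per_row, word % words_per_row
```

Two addresses exactly one cache size apart land in the same column on adjacent rows, which is what makes a vertical stack of lit cells a sign of potential conflicts.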

Figure 3: A snapshot of Memvis1 at one-half its original height and width, running on the trace file for the 64x64 non-blocking matrix multiplier program.

This visualization clearly showed many patterns in the memory references, including linearity and loops. The key was that you could see the cache behavior within the context of main memory. It was also possible in many cases to discern the underlying data structures, such as arrays, matrices (2D arrays), and the frame buffer and image textures (in the case of the CS248 rasterizer). You could also pick up areas of concentrated misses and conflicting structures; quite a bit of red shows up in the matrix multiply program when one matrix is being accessed vertically. Overall, this display was fairly useful.

Memory Visualizer - Graying ("Memvis2")

This was exactly the same as Memvis1, with a minor adjustment--lit cells no longer simply fade over time. When a cell is accessed it gets lit as before, but as it is lit any cell in the same column which happens to be lit gets turned to a dull gray. In this way, there is at most one row lit up for each column in the display. The result is that the current state of the cache is shown in color. When a piece of memory gets knocked out of the cache, it is turned to gray. So, the gray shows the history of the memory accesses of the program. The gray cells also fade to black over time, showing progression. See Figure 4 for a snapshot of Memvis2.
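The graying rule amounts to a small state update per access. The sketch below keeps the display state in a dictionary keyed by (row, column); the names are hypothetical, and the fading of gray cells over time is left out.

```python
LIT, GRAY = "lit", "gray"

def touch(state, row, col):
    """Light cell (row, col) and gray any other lit cell in the same column."""
    for (r, c), value in list(state.items()):
        if c == col and r != row and value == LIT:
            state[(r, c)] = GRAY  # this cell was just knocked out of the cache
    state[(row, col)] = LIT
```

After any sequence of calls, each column holds at most one lit cell, matching the invariant described above.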

Figure 4: A snapshot of Memvis2 at one-half its original height and width, running on the trace file for the CS248 rasterizer program.

In addition to the benefits described above for Memvis1, you now get a sense for what is in the cache right now, as well as what was in the cache at some point but got knocked out. You can also tell when it got knocked out. This display is really helpful for seeing conflicts, since your eye notices when things turn from colored to gray. It is also useful for seeing how efficiently you are using memory, since when you access something you can see if it had been accessed before or not (depending on if it was black or gray when it gets lit up). This display was quite useful.


Final Visualization

As a culmination of our previous efforts and accumulated ideas, we designed a final visualization which merged the best of these with some additional ideas we had thought of but hadn't implemented yet. We made one combined display out of our main memory visualization, a revamped cache visualizer, and three new displays: a cache activity strip chart, a memory activity bar graph, and a memory activity strip chart. Together, the five displays provided us with a global view of main memory, a summary view of current cache activity, a history view of cache activity, a summary view of current main memory activity, and a history view of main memory activity. We found that these displays, when combined with a modest number of controls to change their behavior, allowed us to generate a number of interesting and revealing views of our memory traces. See Figure 5 for a snapshot of our final visualization.

Figure 5: A snapshot of our final visualization at one-half of its original height and width. The Main Memory Display is located in the upper-left corner. Below the Main Memory Display is the Cache Summary Bar Graph, and below that is the Cache Summary Strip Chart. To the right of the Main Memory Display is the Memory Summary Bar Graph, and to the right of that is the Memory Summary Strip Chart. The alignment of the two bar graph displays and the two strip chart displays relative to the Main Memory Display is significant: each column in the Main Memory Display is related to the corresponding column in the two cache summary displays, and each row of the Main Memory Display is related to the corresponding row in the two main memory summary displays. The operation depicted is a matrix multiplication, the details of which will be discussed in the following sections.

Having provided an overview of our final visualization, we proceed to describe each of the five displays in detail.

Main Memory Display

The primary component of our final visualization is the Main Memory Display, which provides a global view of the memory behavior of the traced program. Essentially the same as the main memory visualizations discussed previously (Memvis1 and Memvis2), the Main Memory Display contains all of memory laid out as a 2D array, with width proportional to the size of the cache. Memory events (accesses, hits, misses, reads, writes, or memory delay) are plotted as bright bars which fade over time, allowing one to observe memory behavior trends over an adjustable period of time. The primary difference between the Main Memory Display in our final visualization and the views in Memvis1 and Memvis2 is a simplified color scheme, with the frequency of a single type of event displayed using a decaying shade of white. (Optionally, one may specify an additional type of event, which will be plotted in color on top of the white events.) See Figure 6 for an example of the Main Memory Display for a matrix multiplier.

Figure 6: The Main Memory Display of our final visualization, which corresponds to the complete display shown in Figure 5. Depicted is a matrix multiplication operation, where the first source matrix is accessed in row major order and the second source matrix is accessed in column major order. The white bar at the top of the display shows accesses to the first source matrix, while the vertical red bars show accesses to the second source matrix. The latter are red because of the conflict misses caused by accessing the matrix in column major order. The bright red mark on the white bar representing the first source matrix is due to conflict misses with the second source matrix. The short, bright white mark toward the bottom of the display is the result matrix, which is being mistakenly read and written for every pair of multiplies during the computation.

Cache Summary Bar Graph

The second most important view of our final visualization is the Cache Summary Bar Graph, which is actually a modified version of Cachevis. The Cache Summary Bar Graph can be thought of as a summary of each column of the Main Memory Display, and because each column of the Main Memory Display represents one set in the cache, columns of the Cache Summary Bar Graph actually summarize information about specific sets in the cache.

The Cache Summary Bar Graph displays the same type of events as the Main Memory Display, namely memory accesses, reads, writes, cache hits, and cache misses, and plots frequency of occurrence against cache location as a bar graph. When a memory event occurs, the height of the bar in the corresponding column is increased, and as time passes, the height of the bar decays at a user-adjustable rate. As with the Main Memory Display, the user has the option of displaying a single event type in white or a single event type as a color on top of the total number of accesses displayed in white. In the latter case, colored bars reflect the percentage of the total number of accesses which are of the specified type. See Figure 7 for a snapshot of the Cache Summary Bar Graph.
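The bar behavior can be sketched as a pair of decaying counters per column. The multiplicative decay used here is an assumption (the text only says the rate is adjustable), and the class and method names are hypothetical.

```python
class DecayingBars:
    """Per-column event counts that shrink by a constant factor each tick."""

    def __init__(self, columns, decay=0.9):
        self.decay = decay
        self.total = [0.0] * columns      # all accesses (drawn in white)
        self.selected = [0.0] * columns   # chosen event type, e.g. misses (drawn in color)

    def record(self, column, is_selected):
        """Register one access in `column`; flag it if it is the chosen event type."""
        self.total[column] += 1.0
        if is_selected:
            self.selected[column] += 1.0

    def tick(self):
        """Advance time by one step, shrinking every bar."""
        self.total = [h * self.decay for h in self.total]
        self.selected = [h * self.decay for h in self.selected]

    def colored_fraction(self, column):
        """Share of the bar drawn in color; 0.0 for an empty column."""
        return self.selected[column] / self.total[column] if self.total[column] else 0.0
```

Decaying both counters by the same factor keeps the colored fraction of a bar stable between events, so the color continues to reflect the recent mix of event types while the bar shrinks.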

Figure 7: The Cache Summary Bar Graph display of our final visualization, which corresponds to the displays of Figures 5 and 6. The red bars correspond to recent misses caused by column major accesses to the second source matrix. The wide white peak which tapers off to the left corresponds to row major accesses to the first source matrix. These accesses primarily hit in the cache. The narrow white peak which leaves the top of the display corresponds to result matrix accesses.

Cache Summary Strip Chart

The Cache Summary Strip Chart attempts to assign the temporal dimension to a spatial axis, with the effect of preserving a history of recent cache access patterns. The display is derived by counting events during a specified and adjustable time interval and outputting a row of pixels after the time interval has elapsed. Each column in the row of pixels is aligned to the corresponding column in the Main Memory Display and Cache Summary Bar Graph above. The display scrolls downward as each new row is added to the top of the display. The intensity of each pixel is set to be proportional to the number of events counted during the time interval. An alternative display can be derived by sampling the cache access bar graph after the specified time interval has elapsed. This display tends to smear columns of the graph by an amount proportional to the number of events which occur in those columns.
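The counting variant of the strip chart can be sketched as follows, assuming events arrive as time-ordered (time, column) pairs. The function name, the `max_rows` limit, and the flushing of a final partial interval are choices made for the example.

```python
from collections import deque

def strip_chart(events, columns, interval, max_rows=100):
    """Turn time-ordered (time, column) events into rows of counts, newest first."""
    rows = deque(maxlen=max_rows)  # oldest rows fall off the bottom
    counts = [0] * columns
    window_end = interval
    for time, column in events:
        while time >= window_end:  # interval elapsed: emit a row at the top
            rows.appendleft(counts)
            counts = [0] * columns
            window_end += interval
        counts[column] += 1
    rows.appendleft(counts)  # flush the final, possibly partial, interval
    return list(rows)
```

Each returned row would be drawn as one line of pixels, with intensity proportional to the count in each column.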

As with the Main Memory Display and the Cache Summary Bar Graph displays, the Cache Summary Strip Chart allows one to choose to show a variety of event types using either a gray scale for a single event type, or a mixture of color and gray for a single event type paired with the total number of accesses. See Figure 8 for a snapshot of the Cache Summary Strip Chart.

Figure 8: The Cache Summary Strip Chart display of our final visualization, which corresponds to the displays in Figures 5, 6, and 7. Note that time moves top to bottom. The red lines represent misses caused by column major accesses to the second source matrix, the parallel white lines correspond to row major accesses to the first source matrix, and the single white line parallel to the red lines corresponds to the reading and writing of the destination matrix with every multiplication during the computation. A subtle feature to note is the faint pink lines which appear when the computation moves from one row of the first source matrix to the next. Also, the slopes and spacings of the lines reflect the memory access strides used during the computations.

Memory Summary Bar Graph and Memory Summary Strip Chart

The Memory Summary Bar Graph and Memory Summary Strip Chart displays are similar to the Cache Summary Bar Graph and the Cache Summary Strip Chart displays, except that the memory summary displays summarize events which occur in the rows of the Main Memory Display rather than the columns. Effectively, this means that the displays summarize information about particular blocks of main memory rather than information about particular sets in the cache. Other than this one crucial difference and a ninety-degree rotation, the two memory summary displays operate in a manner identical to the operation of the two cache summary displays. See Figures 9 and 10 for snapshots of the Memory Summary Bar Graph and Memory Summary Strip Chart.
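The relationship between the two families of summary displays is a pair of projections of the same events, which can be sketched as below. The function name is hypothetical, and events are assumed to be given as (row, column) cells of the Main Memory Display.

```python
def summarize(events, rows, columns):
    """Count (row, column) events per cache column and per memory row."""
    per_column = [0] * columns  # feeds the cache summary displays
    per_row = [0] * rows        # feeds the memory summary displays
    for row, column in events:
        per_column[column] += 1
        per_row[row] += 1
    return per_column, per_row
```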

Figure 9: The Memory Summary Bar Graph display of our final visualization, which corresponds to Figures 5 through 8. Misses are plotted in red, while all other accesses are plotted in white. At the top of the display is a white bar corresponding to accesses to a single row of the first source matrix, which is being accessed in row major order. These accesses are primarily hits, with an occasional miss caused by conflicts with the second source matrix and the result matrix. The band of red bars just below the middle of the display corresponds to conflict misses which occur when the second source matrix is accessed in column major order. The lowest bar, whose length extends off the left side of the display, represents the result matrix, which is being accessed mistakenly after every multiplication during the computation.

Figure 10: The Memory Summary Strip Chart, which corresponds to Figures 5 through 9. Note that time moves left to right. The gray bar at the top of the display represents row major accesses to the first source matrix. The red lines near the middle of the display correspond to column major accesses to the second source matrix. Finally, the bright white band just below the red lines corresponds to accesses to the result matrix. Note the faint red bar within the gray band at the top of the display. This bar corresponds to a new row being accessed in the first source matrix; a similarly shaded band corresponding to the same event can be seen in Figure 8. Also, note the steep slope of the red lines indicating the column major traversal of the second source matrix.


Future Work

While the display just described can reveal an enormous amount about the memory behavior of a particular program, some additional features might greatly increase the usefulness of the visualization. These features are a better use of color, the ability to use brushing to highlight various parts of the display, the ability to zoom in and zoom out of a display, and the ability to relate the displayed information back to something which a programmer can understand. We will now turn to a brief discussion of each of these possible extensions.

Improved Color

The current color scheme has a number of problems. First, the black background, while allowing an easy implementation of the decaying intensities in the Main Memory Display, needs to be made more gray. Second, the current color scheme calls for colors to be mixed by mixing their red, green, and blue components; this is somewhat inappropriate and fails in a number of ways. The solution to both of these problems is to perform a color mapping step sometime during display update.

Brushing

The addition of brushing to the current system will allow one to select a region of memory to be highlighted. Most useful will be the ability to select a range of addresses on the memory summary displays or a range of cache sets on the cache summary displays and have accesses to the locations appear highlighted in the other displays; however, other useful ways of brushing the data can also be imagined. One possible use of brushing will be to track down conflicts. In the current system, if a series of conflicts occurs, it is difficult to determine which memory addresses are conflicting because projections make the answer ambiguous. With brushing, however, it is expected that these ambiguities will be resolved.

Zooming

The ability to zoom will be required if the current display is to scale to programs which use large amounts of memory. Current plans for zooming are to use the empty area in the lower right of the display for a high-level, zoomed-out view of the memory occupied by the program. The display might compress unused addresses, or it might leave unused address spaces blank. It may or may not be scrollable or zoomable. Whatever the high-level display shows, it will allow regions of memory to be quickly selected for display in the detail windows, which will be composed of the five displays currently making up the visualization.

Relating Trace Data Back To Program Structure

The last important feature to be added (at least in the next round of feature additions) will be the ability to relate the displayed information back to something which a programmer can understand, such as a symbol name to identify a data structure or a function name to identify a particular phase of the program's execution. This feature will allow a programmer to better understand the visualization as it progresses.


See Also

For related information, please see the following:

"Animation of Cache Systems," by Nolan Glantz, and
"Visualizing Memory Hierarchies," by Kekoa Proudfoot.

You might also find our project proposal interesting:

"CS348C Project Proposal: Visualizing Memory Hierarchies."

[Added 9/27/98 by Kekoa: If you've read this far, you will almost certainly find the Rivet project of interest to you. It expands on many of the ideas discussed in this report.]


Kekoa Proudfoot .... kekoa@graphics.stanford.edu
Nolan Glantz .... nolan@cs.stanford.edu
December 9, 1996.