Mike Houston

Contact Info
Email: mhouston at graphics dot stanford dot edu
I graduated from Rio Americano High School in 1997 and then moved on to receive my BS in Computer Science with honors from the University of California, San Diego (UCSD) in 2001. During my time at UCSD, I worked at the San Diego Supercomputer Center in the Visualization Group. I came to the Stanford University Computer Science department in the fall of 2001 and began pursuing my MS. I received my MS in 2003 and transitioned to the PhD program. My time at Stanford was spent in the Stanford Graphics Lab under the advisement of Pat Hanrahan. My research for 2006 and 2007 was supported by a generous Intel PhD Fellowship. I completed the defense of my dissertation in October 2007, and handed in my dissertation in March 2008. I spent several years at ATI and then as an AMD Fellow in the Advanced Technology Development Group working on hardware and software for heterogeneous computing. I am currently a Principal Engineer for mobile and cloud computing at Nvidia.

Former Research Projects


Sequoia is a programming language designed to facilitate the development of memory hierarchy aware parallel programs that remain portable across modern machines with different memory hierarchy configurations. Sequoia abstractly exposes hierarchical memory in the programming model and provides language mechanisms to describe communication vertically through the machine and to localize computation to particular memory locations within the machine. We have implemented a complete programming system, including a compiler and runtime systems for Cell processor-based blade systems and distributed memory clusters, and have demonstrated efficient performance running Sequoia programs on both of these platforms.

Folding@Home - GPGPU

I've been lucky enough to help the Folding@Home folks get the first GPU-based client out. This has been a long time coming, and I'm glad they had the patience to keep at it and keep working with Brook and GPGPU programming. I especially want to thank Vishal for the hard work adapting algorithms to get the code running well. I also want to thank ATI for all the support getting things up and running. The current estimate is that an R580 delivers 20-40X the performance of a single-core 2.8GHz Pentium 4. At these rates, we can now do in close to a month what used to take years of simulation. See current stats here.


BrookGPU is a compiler and runtime implementation of the Brook stream programming language, which provides an easy, C-like programming environment for today's GPUs. As the programmability and performance of modern GPUs continue to increase, many researchers are looking to graphics hardware to solve problems previously performed on general purpose CPUs. In many cases, performing general purpose computation on graphics hardware can provide a significant advantage over implementations on traditional CPUs.


GPUBench is a benchmark suite designed to analyze the performance of programmable graphics processors in areas of particular importance to general purpose computation. More so than being a measuring stick for GPU performance, GPUBench is intended to be a tool for developers of GPU-accelerated applications, since a more complete understanding of machine capabilities will dictate better software design decisions. Information included in GPUBench reports is useful in accurately estimating the performance of shader programs as well as determining if a particular GPU is a viable target platform for a particular computation.


Chromium is a system for interactive rendering on clusters of workstations. It is a completely extensible architecture, so that parallel rendering algorithms can be implemented on clusters with ease. Various parallel rendering techniques such as sort-first and sort-last may be implemented with Chromium. Furthermore, Chromium allows filtering and manipulation of OpenGL command streams for non-invasive rendering algorithms. Chromium runs on Microsoft Windows and several types of Unix such as Linux and IRIX.

Graphics Clusters

I've spent a great deal of time researching, developing, and building graphics clusters, including helping design several research and production clusters. I have also given many talks on building graphics clusters.


Raptor is a parallel volume renderer written for use with Chromium. Raptor provides an example of a sort-last parallel application utilizing Chromium's distributed event model, CRUT, and compositing systems, such as the binaryswap SPU.

NPACI VisTools

While at SDSC, I worked on the NPACI VisTools. NPACI's VisTools are a collection of software libraries and applications that allow users to create high-resolution visualizations of large, complex scientific data. Current applications let users interactively navigate large volumes on their desktop and do full-color perspective volume rendering. The toolkits use out-of-core techniques to work with data sets that are too large to fit in memory.


OpenMM 4: A Reusable, Extensible, Hardware Independent Library for High Performance Molecular Simulation

Peter Kenneth Eastman, Mark S. Friedrichs, John Damon Chodera, Randall J. Radmer, Christopher M. Bruns, Joy P. Ku, Kyle A. Beauchamp, Thomas J. Lane, Lee-Ping Wang, Diwakar Shukla, Tony Tye, Michael Houston, Timo Stich, Christoph Klein, Michael R. Shirts, and Vijay S. Pande

Journal of Chemical Theory and Computation (October 2012) DOI: 10.1021/ct300857j

A practical visualization strategy for large-scale supernovae CFD simulations

Derek K. Gerstmann, Toby Potter, Michael Houston, Paul Bourke, Kwan-Liu Ma, Andreas Wicenec

SIGGRAPH Asia 2011 Sketches

Twin peaks: a software platform for heterogeneous computing on general-purpose and graphics processors

Jayanth Gummaraju, Laurent Morichetti, Mike Houston, Ben Sander, Benedict R. Gaster, Bixia Zheng

PACT 2010

Accelerating Molecular Dynamic Simulation on Graphics Processing Units

Mark Friedrichs, Peter Eastman, Vishal Vaidyanathan, Mike Houston, Scott LeGrand, Adam Beberg, Daniel Ensign, Christopher Bruns, Vijay Pande

Journal of Computational Chemistry (February 2009) DOI: 10.1002/jcc.21209

Accelerating Molecular Dynamic Simulation On The Cell Processor And Playstation 3

Edgar Luttmann, Daniel L. Ensign, Vishal Vaidyanathan, Mike Houston, Noam Rimon, Jeppe Øland, Guha Jayachandran, Mark Friedrichs, Vijay S. Pande

Journal of Computational Chemistry (January 2009) Volume 30, Issue 2, Pages 268 - 274

A Closer Look at GPUs

Kayvon Fatahalian and Mike Houston

Communications of the ACM. Vol. 51, No. 10 (October 2008)

A Tuning Framework for Software-Managed Memory Hierarchies

Manman Ren, Ji Young Park, Mike Houston, Alex Aiken and William Dally

PACT 2008

A Portable Runtime Interface For Multi-Level Memory Hierarchies

Michael C. Houston

Stanford Computer Science Ph.D. Dissertation

GPUs: A Closer Look

Kayvon Fatahalian and Mike Houston

ACM Queue. March/April. 2008

GPU Computing

John D. Owens, Mike Houston, David Luebke, Simon Green, John E. Stone, and James C. Phillips

Proceedings of the IEEE, 96(3), March 2008

A Portable Runtime Interface For Multi-Level Memory Hierarchies

Mike Houston, Ji-Young Park, Manman Ren, Timothy Knight, Kayvon Fatahalian, Alex Aiken, Bill Dally, and Pat Hanrahan

PPoPP 2008

N-Body Simulations on GPUs

Erich Elsen, V. Vishal, Mike Houston, Vijay Pande, Pat Hanrahan, Eric Darve

Tech Report - arXiv:0706.3060

Interactive k-D Tree GPU Raytracing

Daniel Horn, Jeremy Sugerman, Mike Houston, and Pat Hanrahan

I3D 2007

Compilation for Explicitly Managed Memory Hierarchies

Timothy Knight, Ji-Young Park, Manman Ren, Mike Houston, Mattan Erez, Kayvon Fatahalian, Alex Aiken, Bill Dally, and Pat Hanrahan

PPoPP 2007

Sequoia: Programming The Memory Hierarchy

Kayvon Fatahalian, Timothy Knight, Mike Houston, Mattan Erez, Daniel Horn, Lark-Hoon Leem, Ji-Young Park, Manman Ren, Alex Aiken, Bill Dally, and Pat Hanrahan

Supercomputing 2006

N-Body Simulation on GPUs

Erich Elsen, Mike Houston, V. Vishal, Eric Darve, Pat Hanrahan, Vijay Pande

Supercomputing 2006

ClawHMMer: A Streaming HMMer-Search Implementation

Daniel Horn, Mike Houston, and Pat Hanrahan

Supercomputing 2005

Efficient Partitioning of Fragment Shaders for Multiple-Output Hardware

Tim Foley, Mike Houston, and Pat Hanrahan

Graphics Hardware 2004

Visualizing Dynamic Architectural Environments

Mike Houston, Chris Niederauer, Maneesh Agrawala, Greg Humphreys

Communications of the ACM, August 2004

Brook for GPUs: Stream Computing on Graphics Hardware

Ian Buck, Tim Foley, Daniel Horn, Jeremy Sugerman, Kayvon Fatahalian, Mike Houston, and Pat Hanrahan


Interactively Guided Volumetric Segmentation Using Programmable Graphics Hardware

Anthony Sherbondy, Mike Houston, and Sandy Napel

Radiological Society of North America 89th Scientific Sessions, November 2003

Fast Volume Segmentation With Simultaneous Visualization Using Programmable Graphics Hardware

Anthony Sherbondy, Mike Houston, and Sandy Napel

IEEE Visualization 2003

Non-Invasive Interactive Visualization of Dynamic Architectural Environments

Christopher Niederauer, Mike Houston, Maneesh Agrawala, and Greg Humphreys

ACM SIGGRAPH 2003 Symposium on Interactive 3D Graphics

Chromium: A Stream Processing Framework for Interactive Rendering on Clusters

Greg Humphreys, Mike Houston, Yi-Ren Ng, Randall Frank, Sean Ahern, Peter Kirchner, and James T. Klosowski






Standard Portable Intermediate Representation for OpenCL

Workgroup Chair



OpenCL D3D sharing

10 and 11 sharing extensions

OpenCL AMD Media Operations




Beyond Programmable Shading



Beyond Programmable Shading



CS448s: Beyond Programmable Shading


Stanford Spring 2011

Beyond Programmable Shading





CS448s: Beyond Programmable Shading


Stanford Spring 2010

Beyond Programmable Shading



Beyond Programmable Shading



GPGPU: General-Purpose Computation on Graphics Hardware



CS448 - Real-Time Graphics Architectures

Teaching Assistant

Stanford, Spring Quarter 2007

GPGPU: General-Purpose Computation on Graphics Hardware


Supercomputing 2006

CS315a - Parallel Architecture and Programming

Teaching Assistant

Stanford, Spring Quarter 2006

CS240 - Advanced Topics in Operating Systems

Teaching Assistant

Stanford, Winter Quarter 2005


Workshop on Parallel Visualization and Graphics 2003

IEEE Visualization 2003

Workshop on Parallel Visualization Architectures and Chromium 2004

IEEE Visualization 2004

Public Talks

Advanced Programming (GPGPU)

Stanford CS448: Real-Time Graphics Architectures (2007)

General Purpose Computation on Graphics Processors (GPGPU)

ATI HD 2000 Series Launch, Tunis, Tunisia (2007)

High-Level GPGPU Languages

IEEE Supercomputing 2006

Understanding GPUs Through Benchmarking

IEEE Supercomputing 2006

GPGPU Cluster Computing

IEEE Supercomputing 2006

Sequoia: Programming the Memory Hierarchy

ORNL/Summit on Software and Algorithms for the Cell Processor (2006)

ATI X1k Series

UC Davis EEC 277 (Graphics Architecture) (2006)

General Purpose Computation on Graphics Processors (GPGPU)

ATI X1000 Series Launch, Ibiza, Spain (2005)

Designing Graphics Clusters

IEEE Visualization 2004 - Parallel Visualization Workshop

Class Projects

Aperture Based High Dynamic Range Imaging

CS448 - Digital Photography

Realistic Indoor Daylight Illumination

CS348B - Computer Graphics: Image Synthesis Techniques

Compression in the Graphics Pipeline

CS448a - Real-Time Graphics Architectures