This page list topics in computer science that interest me and that I've written up in more detail. It also hosts my code library and all the major projects I've worked on over the years. There are several projects on the publications page that are not listed here.
Provincial: A kingdom-adaptive AI for Dominion
This is an AI for the popular card game Dominion. By specializing its strategy to a specific set of supply cards, Provincial is able to play at a very high skill level and develops strategies comparable with experienced Dominion players. The resulting strategies can also be visualized to make it easy to see how the AI will play a given kingdom. On the downside, however, the AI must train specifically on each new kingdom which can take several minutes before a good strategy is developed. To alleviate this problem, the AI comes with a large number of pre-computed kingdoms you can play against.
General Game Learning
AIs have been written that can play a variety of specific games such as Chess or Tetris. However all these AIs start from the assumption that a human has already done the very challenging task of taking a game and encoding its rules into a format that can be parsed by a computer and solved by a standard algorithm such as a minimax tree. General game learning looks at the problem of developing a general AI that can learn to play a game just by interacting with it. The AI must break the game down into meaningful components, learn rules for how these components interact with each other and evolve through time, learn the goal of the game, and develop a strategy for achieving the goal by playing the game. This idea applies to all types of games including both real-time (Mario) and turn-based (Freecell), discrete (Tetris) and continuous (Space Invaders), single player (Zelda) and competitive (Chess).
All of the code I write by myself uses a common framework. It supports basic operations like vector arithmetic and string manipulation, as well as more complex things like abstracting a bunch of linear solvers, compressible file streams, audio playback, KD-trees, image processing, etc. It also abstracts OpenGL and Direct3D and can make graphics programming much simpler.
Direct3D 9 API Hooks & Interceptor
Direct3D is the API that is used by virtually all 3D games on Windows. This project stands as an intermediary between any application that uses D3D9 and the operating system. It intercepts all D3D9 calls and sends them off to a separate DLL that can process and react to the command stream. This can be used to accomplish a wide variety of tasks ranging from the very simple (turn all players bright blue or make all walls translucent) to the very complex (insert objective waypoints in 3D, capture all 3D models and textures in the program, or fully automate playing the game.) I have iterated on its design several times and the current system is very good at facilitating reverse-engineering of the host application's graphics pipeline. This API interceptor is the foundation for the Starcraft 2 automated player (below,) and the content capture system that is part of my graduate thesis.
Starcraft 2 Automated Player
This is an AI I wrote to play Starcraft 2. The AI works by intercepting the Direct3D command stream using the Direct3D 9 API Interceptor. It then parses and interprets this command stream to convert it to higher level events, such as "a hydralisk belonging to the Red player was rendered at (431, 509)". At the end of each frame all of this information is aggregated and eventually the AI decides on an action to take which is sent back to the game by simulating keyboard and mouse commands. Initially I wrote this AI to play Warcraft 3, Defense of the Ancients, and (for practice with D3D9) Spider Solitaire, and have significantly revamped it for Starcraft 2. The AI is quite complicated, and provides a way of looking at the details of the Starcraft 2 graphics engine. For example, you can easily break a complete frame down into a set of render calls. This is by a decent margin my favorite project and I had to design a lot of complex machinery during its development, such as the API interceptor and a HLSL bytecode to C++ translator (which produces results like this.)
This is a puzzle game I made with some friends. It uses DirectX (so is Windows only.) It has some pretty challenging and fun levels, and a cool soundtrack and artwork. Feel free to send feedback to my email.
The Kinect is an attachment for the Xbox 360 that combines a depth and color camera. With a little tweaking it can also be attached to PCs to let you develop all sorts of interesting applications like hands-free computing and cheap but highly reactive robots. I have started a few projects on the Kinect to get a feel for the accuracy and resolution of the depth sensor, and am currently working on a painting application that lets you use an arbitrary region of space as a canvas.
Context-Based Search for 3D Models
At present, the task of designing a large 3D scene containing a realistic density of models can be extremely time consuming because of the sheer number of models that have to be inserted. Rather than having the user issue a large number of independent queries for models, the goal of this work is to return a good set of models based on what the user has already modeled. Using the supporting context to guide the search, the algorithm can return reasonable models even if the user doesn't specify keywords for what kinds of models they are looking for. In order to train this search engine, we have to learn what constitutes a good scene composition from an existing database of scenes, such as Google 3D Warehouse, World of Warcraft, or Second Life. The algorithm then learns what kinds of relationships are found in these scenes and recommends models that mimic the design of its training database. Our group has published two papers in this area, one at SIGGRAPH Asia 2010 and the other to appear at SIGGRAPH 2011.
GPUView is a program I wrote at Microsoft to help facilitate the development of drivers and investigate and improve game performance on Windows. It is used to understand the performance interaction between the operating system, the graphics applications on your computer, the graphics kernel, and the graphics driver. It is still widely used today to debug performance bottlenecks in the graphics pipeline and offers a considerably different view than standard profiling or the Direct3D analysis tool, PIX. Its development has been taken over by full-time employees, and since Windows 7 it was been officially released as part of the Windows SDK.
This program is used to capture audio and video to a file from a window (or the entire desktop.) It uses the H.264 codec and is very high quality (although not lossless.) Unfortunately, it uses a codec that is only available on Windows 7 or later. It is extremely lightweight: less than 200k, two files, and does not need to be installed. It also serves as a good demonstration of audio capture in Windows, how to configure the H.264 codec in the Windows Media Foundation API, and how to integrate this with the audio capture routines.
Scene reconstruction is the problem of reconstructing a 3D mesh from a set of 2D images. This was my attempt at solving this problem using several variants on state-of-the-art techniques. It even ran on a GPU so it takes several minutes to generate the mesh instead of several days. My approach did relatively well on the standard Multi-View Stereo Dataset but I have yet to formally publish my approach.
Image Segmentation and Classification (Results)
Image segmentation algorithms break up an image into a set of regions that are more amenable to analysis than the original pixel grid. These regions can be used by an image classifier to place the model into a set of categories --- for example, we might identify whether an image contains a human face or not, and what the gender of the face is, if present. This page documents my approach to image segmentation which is based on varitational shape approximation. This segmentation is then used in an image classification algorithm based on graph kernels that is described in this CVPR paper. This code is also part of my SIGGRAPH 2011 paper.
Micropolygon Rendering (Paper)
This is the first project my group worked on at Stanford. Current graphics hardware is optimized for the case where most polygons span a large number of pixels on the screen; the goal of this project is to look at alternative algorithms that are efficient even when most polygons are at most a few pixels in size. We are working on changes that we hope will facilitate achieving photorealistic quality renderings in real-time. We feel this field will receive even more attention as many-core hardware like Larrabee becomes available. Our group has published three papers in this area; my focus is on improving the quality of micropolygon tessellations and was published in a paper that I presented at SIGGRAPH Asia 2009.
Variance Shadow Maps
Shadows add a significant amount of realism to a scene by conveying information about the geometry of objects and the location and characteristics of light. Many methods for rendering shadows, such as ray tracing, are not well suited for modern graphics hardware and require a lot of computation to produce soft shadow boundaries. Shadow mapping is a very intuitive approach to light mapping that can be efficiently implemented on current hardware. However, classic shadow mapping suffers from severe aliasing, resulting in blocky images, and the shadows are unrealistically sharp. Percentage Closer Filtering is an approach that can make antialiased soft shadows, however the filtering must be done in screen space and making it look good is extremely expensive. Variance Shadow Maps is an easily implemented extension to classic shadow mapping that allows you to prefilter the shadow maps. An efficient, separable kernel can be used and the final render with the shadow map requires only a single texture fetch per shadow map. This lets you get large filters radii that would be much more expensive to achieve with traditional percentage closer filtering.
Marching cubes is an algorithm for extracting a polygonal mesh of an isosurface from a three-dimensional scalar field. Although the meshes it generates have extremely poor aspect ratios, it is very easy to implement and run.
This page provides my supervised learning C++ library, which implements several of the more useful binary and multiclass classifiers. Classifiers include support vector machines, decision trees, and nearest neighbor, along with several meta-classifiers such as AdaBoost and bagging. I mostly use this library for convenience; my implementations are not nearly as polished and optimized as the implementations found in Weka or LIBSVM.
Japanese and Chinese Text Reader for Tourists
This is a program I wrote after visiting Japan for SIGGRAPH Asia 2009. Its goal is to let non-Japanese speakers read Japanese signs and menus and is ultimately targeted at mobile platforms (although currently is just an app for Windows PCs.) The user takes a picture of Japanese text with a camera, selects the region they are interested in, and the program uses machine learning to transcribe the text into Unicode. The program then either does a piece-wise dictionary lookup on the text or runs it through Google Translate. The dictionary uses a probability distribution over the set of possible characters and words to make it robust to characters that are misclassified.
Simulating cloth, or any other thin-shell material, is a rich topic in which a wide variety of concepts come together.
Subdivision is a powerful algorithm for constructing smooth surfaces.
Web Page Preprocessor
This is the HTML preprocessor I used to make this webpage. It helps avoid duplicate code between website pages, and greatly simplifies putting code online. See the about page for more information on this specific webpage.
Texture synthesis attempts to generate an infinite amount of texture given a small input sample of the texture (acquired, for example, from a digital camera.)
This page documents my entry for the CS348b rendering competition. I modeled chatoyancy, an interesting visual effect defined by a characteristic band of reflected light due to aligned grooves in the gem. It is most often seen in the tiger's eye gemstone. This page is not representative of my best work in raytracing but I've kept it around for reference.Final Image
Video - Spin
Video - Gamma Ramp