:: Rendering Space Art ::
"Small" Final Image (3-hour render on a GTX 560). Click for full version.
Space has inspired mankind for ages. In the last century or so, a great deal of inspirational artwork has featured space and the exotic potential that it holds for bustling empires, unknown riches, and devastating warfare. Personally, I love space artwork, and am constantly inspired by the amazing concept art on sites like ConceptShips, SolarVoyager, and even the classic DeviantArt.
Unfortunately, like all artwork, space artwork poses a real challenge for rendering. In particular, artists often portray beautiful spacescapes rich with futuristic planets, hulking ships, and beautifully-lit dust clouds. The problem, from a rendering point-of-view, is that space is not like this. Space is sparse, dark, and overwhelmingly empty. How boring!
Luckily, there's still at least one aspect of space that is quite awe-inspiring: nebulae. Nebulae display some of the most beautiful lighting effects conceivable. Powered by millions of billions of billions of watts of energy (no, seriously...the sun has a luminosity of about 400 septillion watts...), nebulae light up as inconceivable numbers of photons scatter within their dusty innards. Now that...that would be a cool thing to render!
From the beginning, my goal with this project was ill-defined. I knew I wanted to render a nebula. I knew I wanted to build a spaceship. I knew that, if I could create something that would pass as a piece of space concept art (rendered concept art, that is), then I would be happy. Without a reference image, I struggled and wandered in the dark for quite a while, but finally settled on a scene of a ship dramatically emerging from a cloud of dust within a dense nebula. And, of course, I would have to include asteroids for dramatic effect.
Although I could have downloaded models to produce the scene above, I decided early on that it would be much more exciting (and challenging!) to attempt procedural construction of the scene geometry. To that end, I developed a simple (but powerful) procedural modeling language that I used to generate my final scene. The language was an invaluable resource in toying with geometric ideas and, ultimately, in producing high-quality ships and asteroids!
Welder: A Language for Procedural Geometry
Welder is the name of the procedural modeling language that I designed for the project. It was intended to be the absolute fastest way to prototype procedural algorithms. Although I didn't write up a full specification for the language (as it's actually quite large and complex in terms of features), I'll highlight a few of the most important features below.
Cylindrical extrusion was one of the main ways I constructed geometry while developing my algorithms. It creates a primitive that is topologically equivalent to a capped cylinder, but takes as input a function F(t, h) that determines the radius of the cylinder at height h and angle t.
In Welder, the callback function is simply given as the block of code enclosed by the extrusion call, making it incredibly easy to produce geometry this way.
For example, the following code creates a cylinder of radius 1, made of 10 'stacks' and 10 'slices' of triangles:
extrude 10 10 return 1
While the following creates a cone:
extrude 10 10 return h
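As an illustrative sketch, here is what the extrusion primitive computes, written in Python rather than Welder (the `extrude` helper below is a hypothetical stand-in for the real implementation; triangulation and end caps are omitted for brevity):

```python
import math

def extrude(stacks, slices, radius_fn):
    """Generate vertices for a cylindrical extrusion: radius_fn(t, h)
    gives the radius at angle t (radians) and normalized height h."""
    vertices = []
    for i in range(stacks + 1):
        h = i / stacks
        for j in range(slices):
            t = 2.0 * math.pi * j / slices
            r = radius_fn(t, h)
            vertices.append((r * math.cos(t), r * math.sin(t), h))
    return vertices

# `extrude 10 10 return 1` -> a unit cylinder
cylinder = extrude(10, 10, lambda t, h: 1.0)
# `extrude 10 10 return h` -> a cone (radius grows with height)
cone = extrude(10, 10, lambda t, h: h)
```

The callback is free to use both t and h, which is what makes the primitive expressive: any radial profile, including lobed or rippled ones, falls out of a one-line function.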
Being able to displace all the vertices in a mesh is a very useful operation. With it, one can easily perturb a sphere into many interesting shapes without needing a closed-form description of those shapes (like the radius function that extrusion requires).
Again, displacement is performed by providing a callback function in the block within the displacement call. Rather than returning a value, the callback is expected to modify the variables x, y, and z, which are set for each vertex; after the callback runs, the vertex is moved to the location given by the updated x, y, and z.
For example, the following code snippet squares each coordinate of whatever mesh is currently active:
displace
    set x x^2
    set y y^2
    set z z^2
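A rough Python equivalent of the displacement mechanism (the `displace` helper here is hypothetical, standing in for Welder's block syntax):

```python
def displace(mesh, callback):
    """Apply a per-vertex displacement callback: the callback receives
    x, y, z for each vertex and returns the moved vertex."""
    return [callback(x, y, z) for (x, y, z) in mesh]

# Equivalent of the squaring example above
mesh = [(1.0, 2.0, 3.0), (-2.0, 0.5, 1.0)]
squared = displace(mesh, lambda x, y, z: (x**2, y**2, z**2))
# squared == [(1.0, 4.0, 9.0), (4.0, 0.25, 1.0)]
```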
To make the language as powerful as possible, it was absolutely necessary to include the ability to parse arbitrarily complex expressions. This was definitely the trickiest part of writing the Welder compiler, but it paid off in the end, because it enabled rapid and natural coding of advanced math operations.
The language supports the following operators: +, -, *, /, ^ (exponentiation), (), and function calls.
Anywhere that a numeric value is expected, it is legal to write an expression.
As a practical example of the language, here's my implementation of the asteroid field algorithm that was used to generate the field for the final rendering. You can see in this code some more advanced usage of mathematical expressions, functions, loops, variable usage, etc.
// Inputs:
//   SEED  : seed value
//   QMULT : quality multiplier

set AF_ROOMX 15
set AF_ROOMY 10
set AF_ROOMZ 10

material asteroid_rock
    texture rock3.jpg
    type 1
    tile .5 .5 .5

function crater maxsize maxdepth
    set csx rand(-1, 1)
    set csy rand(-1, 1)
    set csz rand(-1, 1)
    set mag sqrt(csx*csx + csy*csy + csz*csz)
    set csx csx / mag
    set csy csy / mag
    set csz csz / mag
    set thresh rand(0, sqrt(maxsize))^4
    set depth rand(maxdepth, 1)
    set power rand(5, 30)
    displace
        set dist (x-csx)^2 + (y-csy)^2 + (z-csz)^2
        if dist < thresh
            set mult lerp(depth, 1, smoothstep(dist/thresh)^power)
            set x x*mult
            set y y*mult
            set z z*mult

function asteroid ast_qual craters
    sphere ast_qual ast_qual
    loop i 0 craters-1
        call crater .4 .98
    displace
        set total 1
        set total total + .8*perlin(x, y, z, seed)^2 + .8*perlin(x, y, z, seed+52)^2
        loop i 0 5
            set freq 2^(i)
            set amp 1 / (freq^1.7)
            set power lerp(.1, 2, noise1D(i))
            set total total + amp*worley(freq*x, freq*y, freq*z, seed + 5*i)^power
        set x x*total
        set y y*total
        set z z*total
    apply asteroid_rock

function asteroidField asteroid_count asteroid_size
    srand seed
    loop as 0 asteroid_count
        set sc 1 + abs(gauss(0, 1))^asteroid_size
        set tx rand(-1, 1)*AF_ROOMX
        set ty rand(-1, AF_ROOMY)
        set tz rand(-1, 1)*AF_ROOMZ
        translate tx ty tz
        rotate rand(0,pi) rand(0,pi) rand(0,pi)
        scale sc*rand(.02,.03) sc*rand(.02,.03) sc*rand(.02,.03)
        call asteroid [max(min(6*sc, 50), 15)]*QMULT 0
        set seed rand(0, 100000)
In the end, my final ship algorithm was about 300 lines of Welder code. It was by far the most complicated piece of the scene. However, I was pleased with the algorithm - it consistently produced good-looking ships, which meant that I had an infinite number from which to choose the final candidate! Here's a closeup of one of my favorites (though not the one chosen for the final render):
Not surprisingly, asteroids were much easier to make. I gave the Welder code for them above. They're just spheres that are displaced with various types of noise, and then subjected to a 'cratering' algorithm that puts dents in them.
By far the most challenging technical aspect of the rendering is the nebula. Rendering a nebula requires implementing a volumetric light transport simulation that supports a thick, high-frequency, heterogeneous medium. In addition, a convincing rendering requires wavelength-dependent scattering (which real nebulae exhibit), a good phase function, and, of course, multiple scattering (most of the light that diffuses through a nebula scatters many times before exiting).
Given these technical challenges, a few solutions come to mind. Volumetric photon mapping is a classic biased solution for volumetric effects that require multiple scattering. Irradiance caching is another biased solution. However, both have limitations.
Volumetric photon mapping is not well-suited to nebula rendering, particularly because the nebula is thick and high-frequency. The number of photons required to fully map the light transport through the nebula would be enormous. For evidence, see the 2008 project on the horsehead nebula: despite using 4 million photons, the result is still low-frequency and splotchy, even though the dust there is quite thin! Photon mapping is simply unable to capture high-frequency detail across such a large volume; it is better suited to small regions of concentrated detail, like caustics, or large regions of low-frequency detail, like indirect illumination in a room.
Irradiance caching would probably be the best choice of a biased method for rendering a nebula. However, I chose to pursue an unbiased method.
The Unbiased Solution
The classic, unbiased solution to the problem of a full light-transport simulation is Monte Carlo path tracing. In the context of rendering nebula gas, path tracing would simply shoot enormous numbers of light rays through the gas, simulating scattering events along the way. This solution is no doubt the most accurate solution, and would capture every detail of the entire nebula. However, path tracing is very slow, especially when participating media is involved.
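To make the scattering simulation concrete, here is a minimal Python sketch of the core step a volumetric path tracer performs in a homogeneous medium: sampling the distance to the next scattering event by inverting the Beer-Lambert transmittance. This is a textbook illustration under simplifying assumptions (constant extinction), not code from the renderer:

```python
import math, random

def sample_free_path(sigma_t, rng=random.random):
    """Sample the distance to the next scattering event in a homogeneous
    medium with extinction coefficient sigma_t, by inverting the
    Beer-Lambert transmittance T(d) = exp(-sigma_t * d)."""
    u = rng()
    return -math.log(1.0 - u) / sigma_t

# The expected free path is 1/sigma_t; check by averaging many samples.
random.seed(42)
sigma_t = 2.0
mean = sum(sample_free_path(sigma_t) for _ in range(200_000)) / 200_000
# mean ≈ 0.5 (= 1/sigma_t)
```

In a heterogeneous medium like the nebula, the extinction varies through the volume, so the tracer must raymarch (or otherwise integrate) the density along the ray instead of using this closed form, but the sampling principle is the same.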
Enter the GPU: Getting the Best of Both
Modern-day GPUs are capable of enormous amounts of computation. Why are we still using CPUs to perform rendering tasks that are perfectly suited to vastly more powerful GPUs? Driven by this question, I decided to explore a GPU solution to path tracing the nebula. If I could perform the work on the GPU, I knew I would have enough computational power to render the nebula with direct path tracing.
Building a GPU Path-Tracer
Scene Representation / Acceleration Structure
Perhaps the most fundamental challenge in any raytracer is finding an efficient representation of the scene, and building a so-called "acceleration structure" to enable fast queries of scene information (ray-scene intersections usually being the most important). One of the most frequently used such structures is the k-d tree, which enables very fast ray-scene intersections while remaining relatively easy to build.
To build a scene representation, I take the Welder triangle mesh output and use the surface area heuristic (SAH) algorithm to find splitting planes, as described in [1]. I also perform an iterative refinement pass on top of SAH that dramatically increases the accuracy of the optimal splitting plane computation. This runs on the CPU, as a pre-processing phase, before any data is sent to the GPU.
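For illustration, here is a simplified Python sketch of an SAH split search. It treats triangles as centroid points and scans uniformly spaced candidate planes along one axis; the cost constants and these simplifications are mine, not the renderer's:

```python
def sah_cost(left_area, right_area, total_area, n_left, n_right,
             traverse_cost=1.0, intersect_cost=1.5):
    """Surface area heuristic: expected cost of splitting a node, where
    the probability of a ray hitting a child is proportional to its
    surface area. Cost constants here are illustrative."""
    p_left = left_area / total_area
    p_right = right_area / total_area
    return traverse_cost + intersect_cost * (p_left * n_left + p_right * n_right)

def best_axis_split(bounds_min, bounds_max, centroids, axis, candidates=16):
    """Return the (cost, plane) pair minimizing the SAH over uniformly
    spaced candidate planes along one axis of an axis-aligned box."""
    lo, hi = bounds_min[axis], bounds_max[axis]
    extent = [bounds_max[i] - bounds_min[i] for i in range(3)]

    def box_area(a, b):
        # surface area of the child box spanning [a, b] along `axis`
        e = extent[:]
        e[axis] = b - a
        return 2.0 * (e[0] * e[1] + e[1] * e[2] + e[0] * e[2])

    best = None
    for k in range(1, candidates):
        plane = lo + (hi - lo) * k / candidates
        n_left = sum(1 for c in centroids if c[axis] <= plane)
        n_right = len(centroids) - n_left
        cost = sah_cost(box_area(lo, plane), box_area(plane, hi),
                        box_area(lo, hi), n_left, n_right)
        if best is None or cost < best[0]:
            best = (cost, plane)
    return best
```

A production builder bins actual triangle bounds and sweeps all three axes (and, as in [1], can do so in O(N log N)), but the cost model is the same.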
Traversing a k-d tree is straightforward on a CPU. The standard algorithm, however, uses recursion (or, equivalently, a stack bounded by the maximum depth of the tree) to keep track of state during traversal. Unfortunately, such memory usage is not practical on current GPU architectures, and would completely destroy any hope of reasonable performance in a GPU path tracer. Instead, a memory-aware algorithm must be used to traverse k-d trees on the GPU.
To do so, I implement the ropes++ algorithm as described in [2]. The algorithm requires baking a bit of extra data into the k-d tree, such that each leaf node stores six pointers ("ropes") to surrounding cells. The gain, however, is that the traversal algorithm requires no stack, nor does it perform redundant work (as other algorithms like kd-restart do). The literature suggests that this is the fastest known algorithm for k-d tree traversal on the GPU.
Below is a visualization of the relative amount of work performed during the computation of each pixel of a simple scene (two floating asteroids of roughly 100k triangles each). With the k-d tree accelerator, most pixels require almost no computation; the regions in and around the asteroids, however, require more. Interestingly, you can see outlines of the tree's splitting planes if you look closely at the asteroids.
To store the volumetric data, I directly use a CUDA 3D texture. Setting the addressing mode to wrap (cudaAddressModeWrap) makes the texture tile, allowing for the illusion of an "infinite" volume.
Rendering the volumetric data requires raymarching the volume and simulating scattering events in between surface intersections. To do so, I implement the Monte Carlo technique given in [3]. Furthermore, I implement wavelength-dependent scattering, as suggested by [4], to simulate the real behavior of light inside a nebula. Finally, I use the Henyey-Greenstein phase function to weight scattered light by direction, again following [4].
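As an illustration of the phase function, here is a Python sketch of evaluating and importance-sampling the Henyey-Greenstein distribution using its standard closed-form inversion (an illustrative sketch, not the CUDA code itself):

```python
import math, random

def hg_phase(cos_theta, g):
    """Henyey-Greenstein phase function value for scattering angle
    theta; g in (-1, 1) is the mean cosine (anisotropy) parameter."""
    denom = 1.0 + g * g - 2.0 * g * cos_theta
    return (1.0 - g * g) / (4.0 * math.pi * denom ** 1.5)

def sample_hg_cos_theta(g, u):
    """Importance-sample cos(theta) from the HG distribution using a
    uniform random number u in [0, 1)."""
    if abs(g) < 1e-3:
        return 1.0 - 2.0 * u  # isotropic limit
    s = (1.0 - g * g) / (1.0 - g + 2.0 * g * u)
    return (1.0 + g * g - s * s) / (2.0 * g)

# The mean cosine of sampled scattering angles should approach g.
random.seed(1)
g = 0.6
mean_cos = sum(sample_hg_cos_theta(g, random.random())
               for _ in range(100_000)) / 100_000
# mean_cos ≈ 0.6
```

Positive g gives forward scattering, which is why a bright star can visibly glow through dust that lies between it and the camera.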
Some work-in-progress images of the participating media implementation during construction:
Metropolis Light Transport
Unfortunately, even the power of a GPU isn't enough to combat the fact that forward path tracing through a thick participating medium is slow. Most of the light in the scene gets scattered away or attenuated too much to matter. Without a smarter strategy, even a GPU would be unable to render a nebula scene in a reasonable amount of time.
The Metropolis Light Transport algorithm is the perfect solution to accelerating rendering of a thick participating medium on the GPU. MLT evaluates the "quality" of previous samples to help determine the next sample. In doing so, it explores the space of all possible paths in an intelligent way, rather than doing so in a purely-random fashion, as forward path tracing would do.
I implement the MLT algorithm given in [5]. However, this algorithm alone is not well-suited to the GPU, because it uses a stack to store random numbers. My contribution to the GPU implementation is as follows: rather than using a stack, I use a circular buffer (also known as a ring buffer) to represent the sample space. The buffer can be set to any size, but the key is that its memory requirements are fixed! For example, with a size-8 buffer, the 9th random number used to construct a path is the same as the 1st. In theory, this introduces bias, because it correlates elements of the sample space. In practice, however, the correlation does not produce artifacts unless the buffer is very small (fewer than 8 numbers). My implementation uses a 16-element buffer, and no artifacts are visible! This approach is vastly more efficient than a stack and, of course, is well-suited to the GPU. I could not find this method described in any paper, so I consider it my "novel" contribution to rendering on the GPU.
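A minimal Python sketch of the ring-buffer idea (illustrative only; the actual implementation lives in CUDA, and the mutation strategy here is a simplified stand-in):

```python
import random

class RingBufferSampler:
    """Primary-sample-space random numbers stored in a fixed-size ring
    buffer instead of an unbounded stack: the (size+1)-th number
    requested for a path wraps around and reuses slot 0."""

    def __init__(self, size=16, seed=0):
        self.rng = random.Random(seed)
        self.buffer = [self.rng.random() for _ in range(size)]
        self.cursor = 0

    def next(self):
        u = self.buffer[self.cursor % len(self.buffer)]
        self.cursor += 1
        return u

    def restart(self):
        # begin reading the sample vector from the start for a new path
        self.cursor = 0

    def mutate(self, step=0.05):
        """Propose a mutated sample point by perturbing every slot,
        wrapping the result back into [0, 1)."""
        for i in range(len(self.buffer)):
            self.buffer[i] = (self.buffer[i]
                              + self.rng.uniform(-step, step)) % 1.0
        self.restart()

sampler = RingBufferSampler(size=8)
first = [sampler.next() for _ in range(8)]
ninth = sampler.next()  # wraps: equals the 1st number
```

Because the buffer never grows, every GPU thread needs only a small, statically sized chunk of memory, regardless of how many scattering events its path accumulates.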
It is also worth noting that most papers on MLT are quite dense; [6], although not dealing directly with MLT, was an invaluable resource in helping me understand and implement the Metropolis algorithm.
5-minute render of a scene with no media, using MLT (left), and standard path tracing (right)
2-minute render of a scene with thin media, using MLT (left), and standard path tracing (right)
2-minute render of a scene with moderately-dense media, using MLT (left), and standard path tracing (right)
From the above comparisons, it is evident that, as the density of the participating medium increases, MLT gains more and more of an advantage over standard path tracing. In a thick medium like a nebula, MLT is several orders of magnitude more efficient.
To achieve a dramatic and artistic composition, I implement a post-process kernel in CUDA that applies several effects to the final image. To convert the image from HDR to LDR, I apply the filmic tonemapping formula given in [7]. In addition, I implement a bloom effect that lets the brightness of the central star bleed into surrounding pixels, giving the effect of a very bright light source. Together, these effects significantly boost the visual appeal of the final image!
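For reference, the filmic curve from the Uncharted 2 talk can be sketched in Python as follows (constants A through F and the white point 11.2 are the talk's published values; the `exposure` parameter is my illustrative addition):

```python
def hable(x, A=0.15, B=0.50, C=0.10, D=0.20, E=0.02, F=0.30):
    """John Hable's filmic tonemapping curve (Uncharted 2 talk)."""
    return ((x * (A * x + C * B) + D * E) /
            (x * (A * x + B) + D * F)) - E / F

def tonemap(hdr, exposure=2.0, white_point=11.2):
    """Map a linear HDR value to [0, 1]: apply the curve after an
    exposure scale, then normalize by the curve's white-point value."""
    return hable(exposure * hdr) / hable(white_point)
```

The curve compresses highlights smoothly while keeping a near-linear toe, which is what gives the bright nebula core a film-like rolloff instead of a hard clip.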
Because you made it all the way to the end! :P
References

[1] "On building fast kd-Trees for Ray Tracing, and on doing that in O(N log N)". http://dcgi.felk.cvut.cz/home/havran/ARTICLES/ingo06rtKdtree.pdf
[2] "kD-Tree Traversal Implementations for Ray Tracing on Massive Multiprocessors: a Comparative Study". https://www.gprt.ufpe.br/grvm/Publication/FullPapers/2009/SBACPAD2009_Santosetal.pdf
[3] "Unbiased Global Illumination with Participating Media". http://www.uni-ulm.de/fileadmin/website_uni_ulm/iui.inst.100/institut/Papers/ugiwpm.pdf
[4] "Reflection Nebula Visualization". https://www.mpi-inf.mpg.de/~lintu/papers/vis05.pdf
[5] "A Simple and Robust Mutation Strategy for the Metropolis Light Transport Algorithm". http://sirkan.iit.bme.hu/~szirmay/paper50_electronic.pdf
[6] "Implementing Energy Redistribution Path Tracing". http://www.cs.columbia.edu/~batty/misc/ERPT-report.pdf
[7] "Uncharted 2: HDR Lighting". http://www.slideshare.net/ozlael/hable-john-uncharted2-hdr-lighting