
:: Rendering Space Art ::

Josh Parnell

"Small" Final Image (3 hour render on GTX 560). Click for full version.

final_sat.png

Motivation

Space has inspired mankind for ages. In the last century or so, a great deal of inspirational artwork has featured space and the exotic potential that it holds for bustling empires, unknown riches, and devastating warfare. Personally, I love space artwork, and am constantly inspired by the amazing concept art on sites like ConceptShips, SolarVoyager, and even the classic DeviantArt.

Unfortunately, like all artwork, space artwork poses a real challenge for rendering. In particular, artists often portray beautiful spacescapes rich with futuristic planets, hulking ships, and beautifully-lit dust clouds. The problem, from a rendering point of view, is that space is not like this. Space is sparse, dark, and overwhelmingly empty. How boring!

Luckily, there's still at least one aspect of space that is quite awe-inspiring: nebulae. Nebulae display some of the most beautiful lighting effects conceivable. Powered by millions of billions of billions of watts of energy (no, seriously...the sun has a luminosity of about 400 septillion watts...), nebulae light up as inconceivable numbers of photons scatter within their dusty innards. Now that...that would be a cool thing to render!

From the beginning, my goal with this project was ill-defined. I knew I wanted to render a nebula. I knew I wanted to build a spaceship. I knew that, if I could create something that would pass as a piece of space concept art (rendered concept art, that is), then I would be happy. Without a reference image, I struggled and wandered in the dark for quite a while, but finally settled on a scene of a ship dramatically emerging from a cloud of dust within a dense nebula. And, of course, I would have to include asteroids for dramatic effect :)

Scene Construction

Although I could have downloaded models to produce the scene above, I decided early on that it would be much more exciting (and challenging!) to attempt procedural construction of the scene geometry. To that end, I developed a simple (but powerful) procedural modeling language that I used to generate my final scene. The language was an invaluable resource in toying with geometric ideas and, ultimately, in producing high-quality ships and asteroids!

Welder: A Language for Procedural Geometry

Welder is the name of the procedural modeling language that I designed for the project. It was intended to be the absolute fastest way to prototype procedural algorithms. Although I didn't write up a full specification for the language (as it's actually quite large and complex), I'll highlight a few of the most important features below.

Cylindrical Extrusion

This was one of the main ways of constructing geometry in the development of my algorithms. Cylindrical extrusion creates a primitive that is topologically equivalent to a capped cylinder, but takes as input a function F(t, h) that determines the radius at each point on the cylinder with height h and angle t.

In Welder, the callback function is simply given as the block of code enclosed by the extrusion call, making it incredibly easy to produce geometry in this way.

For example, the following code creates a cylinder of radius 1, made of 10 'stacks' and 10 'slices' of triangles:

extrude 10 10 
   return 1

While the following creates a cone:

extrude 10 10 
   return h

Displacement

Being able to displace all the vertices in a mesh is a very useful operation. Using this, one can, for example, easily perturb a sphere into many cool shapes without needing a closed-form description of those shapes (like the radius function that extrusion requires).

Again, displacement is performed by providing a callback function in the block within the displacement call. Rather than returning a value, the callback is expected to modify the variables x, y, and z, which are set for each vertex; after the callback runs, the vertex is moved to the location given by the updated x, y, and z.

For example, the following code snippet squares each coordinate of whatever mesh is currently active:

displace 
   set x x^2
   set y y^2
   set z z^2

Expressions

To make the language as powerful as possible, it was absolutely necessary to include the ability to parse arbitrarily complex expressions. This was definitely the trickiest part of writing the Welder compiler, but it paid off in the end, because it enabled rapid and natural coding of advanced math operations.

The language supports the following operators: +, -, *, /, ^ (exponentiation), (), and function calls.

Anywhere that a numeric value is expected, it is legal to write an expression.
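
Welder's parser itself isn't reproduced here, but the standard technique for this kind of grammar is precedence climbing. Below is a minimal sketch in C++ (illustrative only, not Welder's actual internals) that evaluates the operator set listed above:

// Minimal precedence-climbing evaluator (an illustrative sketch, NOT
// Welder's actual parser). Handles +, -, *, /, ^, and parentheses.
#include <ctype.h>
#include <math.h>
#include <stdlib.h>

static const char* s;    // cursor into the expression string

double parseExpr(int minPrec);

// Parse a number, a unary minus, or a parenthesized subexpression.
double parseAtom() {
    while (isspace(*s)) ++s;
    if (*s == '(') {
        ++s;                              // consume '('
        double v = parseExpr(0);
        ++s;                              // consume ')'
        return v;
    }
    if (*s == '-') { ++s; return -parseAtom(); }
    return strtod(s, const_cast<char**>(&s));
}

// Fold in operators whose precedence is at least minPrec.
double parseExpr(int minPrec) {
    double lhs = parseAtom();
    for (;;) {
        while (isspace(*s)) ++s;
        char op = *s;
        int prec = (op == '+' || op == '-') ? 1
                 : (op == '*' || op == '/') ? 2
                 : (op == '^')              ? 3 : 0;
        if (prec == 0 || prec < minPrec) return lhs;
        ++s;
        // '^' is right-associative: recurse at the SAME precedence.
        double rhs = parseExpr(op == '^' ? prec : prec + 1);
        if      (op == '+') lhs += rhs;
        else if (op == '-') lhs -= rhs;
        else if (op == '*') lhs *= rhs;
        else if (op == '/') lhs /= rhs;
        else                lhs = pow(lhs, rhs);
    }
}

With s pointing at "1 + 2*3^2", parseExpr(0) returns 19, with ^ binding tightest and associating to the right; function calls and variables would slot in as additional cases of parseAtom.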

Example Program

As a practical example of the language, here's my implementation of the asteroid field algorithm that was used to generate the field for the final rendering. This code shows some of the more advanced features: mathematical expressions, functions, loops, variables, etc.

// Inputs:
//   SEED  : seed value
//   QMULT : quality multiplier

set AF_ROOMX 15
set AF_ROOMY 10
set AF_ROOMZ 10

material asteroid_rock
        texture rock3.jpg
        type 1
        tile .5 .5 .5

// Dent the active mesh with one crater: pick a random direction on the
// sphere, then pull nearby vertices inward with a smooth falloff.
function crater maxsize maxdepth
        set csx rand(-1, 1)
        set csy rand(-1, 1)
        set csz rand(-1, 1)
        set mag sqrt(csx*csx + csy*csy + csz*csz)
        set csx csx / mag
        set csy csy / mag
        set csz csz / mag
        set thresh rand(0, sqrt(maxsize))^4
        set depth rand(maxdepth, 1)
        set power rand(5, 30)
        displace
                set dist (x-csx)^2 + (y-csy)^2 + (z-csz)^2
                if dist < thresh
                        set mult lerp(depth, 1, smoothstep(dist/thresh)^power)
                        set x x*mult
                        set y y*mult
                        set z z*mult

// Build one asteroid: start from a sphere, apply craters, displace with
// Perlin and Worley noise, then apply the rock material.
function asteroid ast_qual craters
        sphere ast_qual ast_qual
        loop i 0 craters-1
                call crater .4 .98
        displace
                set total 1
                set total total + .8*perlin(x, y, z, seed)^2 + .8*perlin(x, y, z, seed+52)^2
                loop i 0 5
                        set freq 2^(i)
                        set amp 1 / (freq^1.7)
                        set power lerp(.1, 2, noise1D(i))
                        set total total + amp*worley(freq*x, freq*y, freq*z, seed + 5*i)^power
                set x x*total
                set y y*total
                set z z*total
        apply asteroid_rock

// Scatter asteroid_count asteroids through the room volume with random
// positions, orientations, and Gaussian-tailed scales.
function asteroidField asteroid_count asteroid_size
        srand seed      
        loop as 0 asteroid_count
                set sc 1 + abs(gauss(0, 1))^asteroid_size
                set tx rand(-1, 1)*AF_ROOMX
                set ty rand(-1, AF_ROOMY)
                set tz rand(-1, 1)*AF_ROOMZ
                translate tx ty tz
                        rotate rand(0,pi) rand(0,pi) rand(0,pi)
                                scale sc*rand(.02,.03) sc*rand(.02,.03) sc*rand(.02,.03)
                                        call asteroid [max(min(6*sc, 50), 15)]*QMULT 0
                set seed rand(0, 100000)

Results

Ships

In the end, my final ship algorithm was about 300 lines of Welder code. It was by far the most complicated piece of the scene. However, I was pleased with the algorithm: it consistently produced good-looking ships, which meant that I had an infinite number from which to choose the final candidate! Here's a closeup of one of my favorites (though not the one chosen for the final render):

144.png

Note that I attached the emblem manually. It is the emblem of Flanders, the Northern part of Belgium, where I grew up. I've always thought it was a cool emblem!

Asteroids

Not surprisingly, asteroids were much easier to make. I gave the Welder code for them above. They're just spheres that are displaced with various types of noise, and then subjected to a 'cratering' algorithm that puts dents in them.

108.png 110.png 138.png

Rendering

By far the most challenging technical aspect of the rendering is the nebula. Rendering a nebula requires implementing a volumetric light transport simulation that supports a thick, high-frequency, heterogeneous medium. In addition, a convincing rendering requires wavelength-dependent scattering (which real nebulae exhibit), a good phase function, and, of course, multiple scattering (most of the light that diffuses through a nebula scatters many times before exiting).

Biased Solutions

Given these technical challenges, a few solutions come to mind. Volumetric photon mapping is a classic biased solution for volumetric effects that require multiple scattering. Irradiance caching is another biased solution. However, both have limitations.

Volumetric photon mapping is not well-suited to nebula rendering, particularly because the nebula is thick and high-frequency. The number of photons required to fully map the light transport through the nebula would be enormous. For evidence, see the 2008 project on the horsehead nebula: even though they used 4 million photons, the result is still low-frequency and splotchy, despite the dust being quite thin! Photon mapping is simply unable to capture high-frequency detail across such a large volume; it is better suited to small regions of concentrated detail like caustics, or large regions of low-frequency detail like indirect illumination in a room.

Irradiance caching would probably be the best choice of a biased method for rendering a nebula. However, I chose to pursue an unbiased method.

The Unbiased Solution

The classic, unbiased solution to the problem of a full light-transport simulation is Monte Carlo path tracing. In the context of rendering nebula gas, path tracing would simply shoot enormous numbers of light rays through the gas, simulating scattering events along the way. This is no doubt the most accurate solution, and would capture every detail of the entire nebula. However, path tracing is very slow, especially when a participating medium is involved.
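
Concretely, the core Monte Carlo step is distance sampling: the tracer randomly decides how far a ray travels before its next scattering event. A minimal sketch for the homogeneous case (the textbook baseline; the heterogeneous nebula volume needs the more involved technique of [3], discussed later):

// Sketch of free-path sampling in a homogeneous medium with extinction
// coefficient sigmaT (textbook baseline only, not the heterogeneous
// technique used for the nebula). xi is uniform in [0,1).
__device__ float sampleFreePath(float xi, float sigmaT) {
    return -logf(1.0f - xi) / sigmaT;
}
// If this distance is shorter than the distance to the nearest surface,
// the ray scatters inside the medium; otherwise it hits the surface.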

Enter the GPU: Getting the Best of Both

Modern-day GPUs are capable of enormous computation. Why are we still using CPUs to perform rendering tasks that are perfectly suited to the vastly more powerful GPU? Driven by this question, I decided to explore a GPU solution to path tracing the nebula. If I could perform the work on the GPU, I knew that I would have enough computational power to render the nebula with direct path tracing.

Building a GPU Path-Tracer

Scene Representation / Acceleration Structure

Perhaps the most fundamental challenge in any raytracer is finding an efficient representation of the scene, and building a so-called "acceleration structure" to enable fast queries of scene information (in particular, ray-scene intersections are usually of greatest importance). One of the most frequently used such structures is the k-d tree. A k-d tree enables very fast ray-scene intersections, but is also relatively easy to build.

Building

To build a scene representation, I take the Welder triangle mesh output and use the surface area heuristic (SAH) algorithm to find splitting planes, as described in [1]. I also perform an iterative refinement pass on top of SAH that dramatically increases the accuracy of the optimal splitting plane computation. I implement this on the CPU, as a pre-processing phase before sending any data to the GPU.
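
For reference, the SAH itself is a simple cost model: a split is good if it puts many triangles into children with small surface area. A sketch of the cost of one candidate plane (the cost constants are assumed values; this is not the project's actual builder code):

// Illustrative SAH cost for one candidate splitting plane (sketch, not
// the project's actual builder). Ct and Ci are assumed cost constants.
struct Aabb { float3 lo, hi; };

float surfaceArea(const Aabb& b) {
    float dx = b.hi.x - b.lo.x, dy = b.hi.y - b.lo.y, dz = b.hi.z - b.lo.z;
    return 2.0f * (dx*dy + dy*dz + dz*dx);
}

// Cost of splitting 'node' into 'left'/'right' with nL/nR triangles each;
// the builder keeps the plane that minimizes this value.
float sahCost(const Aabb& node, const Aabb& left, const Aabb& right,
              int nL, int nR) {
    const float Ct = 1.0f;   // assumed cost of one traversal step
    const float Ci = 4.0f;   // assumed cost of one triangle intersection
    float invSA = 1.0f / surfaceArea(node);
    return Ct + Ci * invSA * (surfaceArea(left)  * nL +
                              surfaceArea(right) * nR);
}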

Traversing

Traversing a k-d tree is straightforward on a CPU. The algorithm, however, uses recursion (or, equivalently, a stack bounded by the maximal depth of the tree) to keep track of state during traversal. Unfortunately, such memory usage is not practical for current GPU architectures, and would completely destroy any hopes of reasonable performance of a GPU path tracer. Instead, a memory-aware algorithm must be used to traverse k-d trees on the GPU.

To do so, I implement the ropes++ algorithm as described in [2]. The algorithm requires baking a bit of extra data into the k-d tree, such that leaf nodes have six pointers to surrounding cells. The gain, however, is that the traversal algorithm does not require a stack, nor does it perform redundant work (as other algorithms like k-d-restart do). The literature suggests that this is the fastest-known algorithm for k-d tree traversal on the GPU.
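
A simplified sketch of what rope traversal looks like (structure and helper names here are illustrative, not the exact implementation): each iteration intersects the triangles in the current leaf, finds the face through which the ray exits, and follows that face's rope to the next leaf, with no stack anywhere.

// Simplified sketch of stackless rope traversal (after [2]; field names
// are illustrative). intersectTriangles is an assumed helper, and the
// caller is assumed to initialize hit->t to tMax.
struct Hit { float t; int tri; };
__device__ void intersectTriangles(int first, int count,
                                   float3 org, float3 dir, Hit* hit);

struct KdLeaf {
    int   firstTri, triCount;  // triangles stored in this leaf
    int   ropes[6];            // neighbor per face: -x,+x,-y,+y,-z,+z
    float boxLo[3], boxHi[3];  // leaf bounds
};

__device__ bool traverse(const KdLeaf* leaves, int leaf,
                         float3 org, float3 dir, float tMax, Hit* hit) {
    float t = 0.0f;                     // entry distance along the ray
    while (leaf >= 0 && t < tMax) {
        const KdLeaf& L = leaves[leaf];
        intersectTriangles(L.firstTri, L.triCount, org, dir, hit);

        // Find the face through which the ray exits this leaf.
        float tExit = tMax; int exitFace = -1;
        float o[3] = {org.x, org.y, org.z}, d[3] = {dir.x, dir.y, dir.z};
        for (int a = 0; a < 3; ++a) {
            if (d[a] == 0.0f) continue;
            float plane = (d[a] > 0.0f) ? L.boxHi[a] : L.boxLo[a];
            float tp = (plane - o[a]) / d[a];
            if (tp < tExit) { tExit = tp; exitFace = 2*a + (d[a] > 0.0f); }
        }
        if (hit->t <= tExit) return true;  // hit lies inside this leaf
        t = tExit;
        leaf = L.ropes[exitFace];          // follow the rope (-1 = outside)
    }
    return false;
}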

Below is a visualization of the relative amount of work performed during the computation of each pixel of a simple scene (two floating asteroids of roughly 100k triangles each). Using a k-d tree accelerator, most of the pixels require almost no computation. The regions in and around the asteroids, however, require more. Interestingly, you can see outlines of the tree's splitting planes if you look closely at the asteroids.

58.png

Participating Media

To store the volumetric data, I directly use a CUDA 3D texture. Setting the addressing mode to wrap (CUDA's cudaAddressModeWrap) makes the texture tile seamlessly, allowing for the illusion of an "infinite" volume.
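
For illustration, here is roughly what that setup looks like using CUDA's modern texture-object API (the original project, targeting a GTX 560, would have used the older texture-reference API); the host density array and its size N are assumed:

// Sketch: uploading an N*N*N float density grid into a CUDA 3D texture
// with wrap addressing so the volume tiles seamlessly.
cudaExtent extent = make_cudaExtent(N, N, N);
cudaChannelFormatDesc fmt = cudaCreateChannelDesc<float>();

cudaArray_t volArray;
cudaMalloc3DArray(&volArray, &fmt, extent);

cudaMemcpy3DParms copy = {};
copy.srcPtr   = make_cudaPitchedPtr(density, N * sizeof(float), N, N);
copy.dstArray = volArray;
copy.extent   = extent;
copy.kind     = cudaMemcpyHostToDevice;
cudaMemcpy3D(&copy);

cudaResourceDesc res = {};
res.resType = cudaResourceTypeArray;
res.res.array.array = volArray;

cudaTextureDesc tex = {};
tex.addressMode[0] = cudaAddressModeWrap;   // tile in x
tex.addressMode[1] = cudaAddressModeWrap;   // tile in y
tex.addressMode[2] = cudaAddressModeWrap;   // tile in z
tex.filterMode     = cudaFilterModeLinear;  // trilinear lookups
tex.readMode       = cudaReadModeElementType;
tex.normalizedCoords = 1;                   // required for wrap mode

cudaTextureObject_t volume;
cudaCreateTextureObject(&volume, &res, &tex, NULL);
// In a kernel: float rho = tex3D<float>(volume, p.x, p.y, p.z);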

Rendering the volumetric data requires raymarching the volume and simulating scattering events in between surface intersections. To do so, I implement the Monte Carlo technique given in [3]. Furthermore, I implement wavelength-dependent scattering, as suggested by [4], to simulate the real behavior of light inside of a nebula. Finally, I use the Henyey-Greenstein phase function to attenuate scattered light, again as suggested by [4].
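
As one concrete piece of that machinery, the Henyey-Greenstein phase function and its standard importance-sampling formula are sketched below (g is the asymmetry parameter, xi a uniform random number); wavelength dependence can then be layered on top by giving each color channel its own scattering coefficient.

// Henyey-Greenstein phase function and the standard closed-form sampling
// of cos(theta) (sketch). g in (-1,1) controls forward/back scattering.
__device__ float hgPhase(float cosTheta, float g) {
    float d = 1.0f + g*g - 2.0f*g*cosTheta;
    return (1.0f - g*g) / (4.0f * 3.14159265f * d * sqrtf(d));
}

// Importance-sample cos(theta) from the HG distribution; xi in [0,1).
__device__ float hgSampleCosTheta(float xi, float g) {
    if (fabsf(g) < 1e-3f)              // near-isotropic: avoid dividing by g
        return 1.0f - 2.0f * xi;
    float sq = (1.0f - g*g) / (1.0f - g + 2.0f*g*xi);
    return (1.0f + g*g - sq*sq) / (2.0f * g);
}
// Wavelength-dependent scattering (as in [4]) amounts to simulating
// events with a separate scattering coefficient per color channel.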

Some work-in-progress images of the participating media implementation during construction:

152.png 161.png 165.png

Metropolis Light Transport

Unfortunately, even the power of a GPU isn't enough to combat the fact that forward path tracing with a thick participating medium is slow. Most of the light in the scene gets scattered away or attenuated too much to matter. Without a smarter strategy, even a GPU would be unable to render a nebula scene in a reasonable amount of time.

The Metropolis Light Transport algorithm is the perfect solution to accelerating rendering of a thick participating medium on the GPU. MLT evaluates the "quality" of previous samples to help determine the next sample. In doing so, it explores the space of all possible paths in an intelligent way, rather than doing so in a purely-random fashion, as forward path tracing would do.

I implement the algorithm given in [5] for MLT. However, this algorithm alone is not well-suited to the GPU, because it uses a stack to store random numbers.

My contribution to the implementation of MLT on the GPU is as follows: rather than using a stack, use a circular buffer (also known as a ring buffer) to represent the sample space. The buffer can be set to any size, but the key is that its memory footprint is fixed! As an example, if we are using a size-8 buffer, then the 9th random number used to construct the path will be the same as the 1st. In theory, this introduces bias, because there is correlation between the elements in sample space. In practice, however, this correlation does not introduce artifacts unless the buffer is very small (< 8 numbers). In my implementation, I use a 16-element buffer, and no artifacts are visible!

This implementation is vastly more efficient than using a stack, and, of course, is well-suited to the GPU. Note that this method is not described in any papers (that I could find), so I consider this my "novel" contribution to rendering on the GPU :)
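
A sketch of the idea (field and helper names are illustrative): the sampler hands out random numbers from a fixed-size array, wrapping the index modulo the buffer length, and a Metropolis mutation perturbs the whole buffer in place.

// Sketch of the fixed-size ring-buffer sample space described above
// (illustrative, not the exact project code). 16 entries, as in the text.
#include <curand_kernel.h>

#define BUF_SIZE 16

struct RingSampler {
    float u[BUF_SIZE];  // the path's "random number tape"
    int   next;         // index of the next number to hand out
};

// The Nth request wraps around: with an 8-entry buffer, the 9th random
// number equals the 1st (the correlation discussed above).
__device__ float nextSample(RingSampler* s) {
    float v = s->u[s->next];
    s->next = (s->next + 1) % BUF_SIZE;
    return v;
}

// A small-step mutation perturbs every slot in place; memory use stays
// fixed no matter how many scattering events the mutated path has.
__device__ void mutate(RingSampler* s, curandState* rng, float stepSize) {
    for (int i = 0; i < BUF_SIZE; ++i) {
        float p = s->u[i] + stepSize * (curand_uniform(rng) - 0.5f);
        s->u[i] = p - floorf(p);        // wrap back into [0,1)
    }
    s->next = 0;                        // re-trace the path from the start
}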

It is also worth noting that most of the papers on MLT are quite dense, but [6], although not dealing directly with MLT, was an invaluable resource in helping me to understand and implement the Metropolis algorithm.
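
For completeness, the Metropolis accept/reject step at the heart of all of this is tiny. In sketch form (PathState is an illustrative stand-in for whatever per-path data the renderer keeps):

// The core Metropolis accept/reject step (sketch). A proposed (mutated)
// path replaces the current one with probability equal to the ratio of
// their brightnesses, clamped to 1.
#include <curand_kernel.h>

struct PathState { float luminance; /* ...plus whatever the path needs */ };

__device__ void metropolisStep(PathState* cur, const PathState* prop,
                               curandState* rng) {
    float a = fminf(1.0f, prop->luminance / cur->luminance);
    if (curand_uniform(rng) < a)
        *cur = *prop;   // accept the mutation
    // In MLT, both paths typically contribute to the image, weighted
    // by a and (1 - a), so rejected work is not wasted.
}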

Comparison

5-minute render of a scene with no media, using MLT (left), and standard path tracing (right)

shot5.jpg shot6.png

2-minute render of a scene with thin media, using MLT (left), and standard path tracing (right)

shot7.png shot8.png

2-minute render of a scene with moderately-dense media, using MLT (left), and standard path tracing (right)

shot9.png shot10.png

From the above comparisons, it is evident that, as the density of the participating medium increases, MLT gains more and more of an advantage over standard path tracing. In a thick medium like a nebula, MLT is several orders of magnitude more efficient.

Post-Processing

To achieve a dramatic and artistic composition, I implement a post-process kernel in CUDA that applies several effects to the final image. To convert the image from HDR to LDR, I apply the filmic tonemapping formula given in [7]. In addition, I implement a bloom effect to allow the brightness of the central star to bleed into surrounding pixels, giving the effect of a very bright light source. Together, these effects significantly boost the visual appeal of the final image!
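
Since [7] isn't reproduced here, the sketch below uses the well-known Hejl/Burgess-Dawson filmic approximation as a stand-in for the exact formula (whether it matches [7] precisely is an assumption), and the kernel structure is illustrative rather than the project's actual code:

// Filmic tonemapping sketch. The curve is the Hejl/Burgess-Dawson
// approximation, used as a stand-in for the exact formula from [7]
// (which may differ). Gamma correction is folded into the fit.
__device__ float filmic(float c) {
    float x = fmaxf(0.0f, c - 0.004f);
    return (x * (6.2f * x + 0.5f)) / (x * (6.2f * x + 1.7f) + 0.06f);
}

__global__ void tonemapKernel(const float3* hdr, uchar4* ldr, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float3 c = hdr[i];  // HDR radiance (bloom assumed already added in)
    ldr[i] = make_uchar4((unsigned char)(255.0f * filmic(c.x)),
                         (unsigned char)(255.0f * filmic(c.y)),
                         (unsigned char)(255.0f * filmic(c.z)), 255);
}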

Bonus Images

Because you made it all the way to the end! :P

24.png

"My First GPU Ray Tracer"

41.png

"Lovin' That Chrome"

34.png

"Sweet Glitch"

139.png

"What is this, EVE Online???"

eve.png

"Well, Since you Mentioned it..."

Sources
