Real-Time Volume Rendering on Shared Memory Multiprocessors
Using the Shear-Warp Factorization
To appear in Proc. 1995 Parallel Rendering Symposium (Atlanta,
Georgia, October 30-31, 1995).
This paper presents a new parallel volume rendering algorithm that can
render 256^3 voxel medical data sets at over 10 Hz and 128^3 voxel
data sets at over 30 Hz on a 16-processor Silicon Graphics Challenge.
The algorithm achieves these results by minimizing each of the three
components of execution time: computation time, synchronization time,
and data communication time. Computation time is low because the
parallel algorithm is based on the recently reported shear-warp serial
volume rendering algorithm, which is over five times faster than
previous serial algorithms. Synchronization time is minimized by
using dynamic load balancing and a task partition that minimizes
synchronization events. Data communication costs are low because
the algorithm is implemented for shared-memory multiprocessors, a class
of machines with hardware support for low-latency fine-grain
communication and hardware caching to hide latency.
We draw two conclusions from our implementation. First, we find that
on shared-memory architectures, data redistribution and communication
costs do not dominate rendering time. Second, we find that cache
locality requirements impose a limit on parallelism in volume
rendering algorithms. Specifically, our results indicate that
shared-memory machines with hundreds of processors would be useful
only for rendering very large data sets.
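
The dynamic load balancing mentioned above can be illustrated with a
minimal sketch. It is not the paper's implementation: the abstract does
not state the task unit or the scheduling policy, so the sketch assumes
that work is handed out as contiguous chunks of image scanlines from a
shared counter. The names render_parallel, chunk_size, and
fake_scanline are hypothetical. Python threads are used only to show
the control flow; they do not provide the true parallelism of a
shared-memory multiprocessor.

```python
import threading

def render_parallel(num_scanlines, num_workers, chunk_size, render_scanline):
    """Hand out contiguous scanline chunks to workers on demand.

    Assumption for illustration: each task is a chunk of scanlines and
    the only synchronization event is one lock acquisition per chunk.
    """
    next_line = 0                      # shared counter: first unassigned scanline
    lock = threading.Lock()
    image = [None] * num_scanlines     # each scanline is written by exactly one worker
    counts = [0] * num_workers         # scanlines rendered per worker

    def worker(worker_id):
        nonlocal next_line
        while True:
            # Claim the next chunk; larger chunks mean fewer lock acquisitions.
            with lock:
                start = next_line
                if start >= num_scanlines:
                    return
                end = min(start + chunk_size, num_scanlines)
                next_line = end
            for y in range(start, end):
                image[y] = render_scanline(y)
            counts[worker_id] += end - start

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return image, counts

def fake_scanline(y):
    # Stand-in for real rendering work; cost grows with y, so the load is uneven.
    return sum(range(y * 10))

image, counts = render_parallel(num_scanlines=64, num_workers=4,
                                chunk_size=4, render_scanline=fake_scanline)
```

In this sketch a worker that draws cheap scanlines simply returns to
the counter sooner and claims more chunks, so no worker sits idle while
work remains. The chunk size trades load balance against the number of
synchronization events, which is the tension the abstract describes.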