Visual computing tasks such as computational imaging, image/video understanding, and real-time 3D graphics are key responsibilities of modern computer systems ranging from sensor-rich smartphones to autonomous robots to large datacenters. These workloads demand exceptional system efficiency, and this course examines the key ideas, techniques, and challenges associated with the design of parallel, heterogeneous systems that accelerate visual computing applications. This course is intended for systems students interested in architecting efficient graphics, image processing, and computer vision platforms (both new hardware architectures and domain-optimized programming frameworks for these platforms) and for graphics, vision, and machine learning students who wish to understand throughput computing principles in order to design new algorithms that map efficiently to these machines.
Jan 9 |
Multi-core, SIMD, and hardware multi-threading in the context of modern multi-core CPUs, GPUs, FPGAs, ASICs; understanding latency and bandwidth constraints
|
Part 1: High Efficiency Image and Video Processing
|
|
Jan 11 |
Algorithms taking raw sensor pixels to an RGB image: demosaicing, sharpening, correcting aberrations, autofocus/autoexposure, high-dynamic range processing via multi-shot techniques
|
Jan 16 |
Pyramidal/multi-resolution techniques, local Laplacian filters, bilateral filters (via the bilateral grid), optical flow
|
Jan 18 |
Balancing locality, parallelism, and work, fusion and tiling, design of the Halide domain-specific language, automatically scheduling image processing pipelines
|
Jan 23 |
Contrasting efficiency of GPUs, DSPs, Image Signal Processors, and FPGAs for image processing; domain-specific languages for hardware synthesis such as Darkroom/Rigel; compiling Halide to hardware
|
Jan 25 |
Basics of JPEG and H.264 encoding, motivations for ASIC acceleration, future opportunities for compression when machines, not humans, will observe most images
|
Jan 30 |
Light field representation, light field cameras, computational challenges of synthesizing video streams for VR output, Google's Jump VR pipeline
|
Part 2: Efficient Training and Evaluation of DNNs for Visual Understanding
|
|
Feb 1 |
DNN topologies, reduction to dense linear algebra, challenges of direct implementation, where the compute lies in the network, motivations for three modern "trunks": Inception/ResNet/MobileNet, what it means to be fully convolutional
|
Feb 6 |
Footprint challenges of training, model vs. data parallelism, asynchronous vs. synchronous training debate, parameter server designs, key optimizations for parallel training
|
Feb 8 |
Motivating R-CNN/Fast R-CNN/Faster R-CNN, alternative design of SSD/YOLO, philosophy of end-to-end training
|
Feb 13 |
Neural module networks (and their surprising effectiveness for VQA), learning to compress images/video, discussion on value of modularity vs. end-to-end learning
|
Feb 15 |
GPUs, Google TPU, special instructions for DNN evaluation, choice of precision, recent ISCA/MICRO papers on DNN acceleration
|
Feb 20 |
Specialization to scene, exploiting temporal coherence in video, sharing across applications
|
Feb 22 |
Facebook SVE, ExCamera, Scanner
|
Part 3: GPU Implementation of the Real-Time 3D Graphics Pipeline
|
|
Feb 27 |
3D graphics pipeline as a machine architecture (abstraction), basic pipeline semantics/functionality, contrasting graphics pipeline architecture with compute-mode GPU architecture
|
Mar 1 |
Texture sampling and prefiltering basics, texture compression, depth-and-color buffer compression, motivations for hardware multi-threading for latency hiding in modern GPUs
|
Mar 6 |
Molnar sorting taxonomy, dataflow scheduling under data amplification, tiled rendering for bandwidth-efficiency, deferred shading as a scheduling decision
|
Mar 8 |
Large Scale Distributed Image Processing at Facebook (guest lecture)
Facebook Lumos
|
Mar 13 |
RenderMan Shading Language and Cg: contrasting two different levels of abstraction for shading languages
|
Mar 15 |
Topic TBD
|
due Jan 19 | Assignment 1: Performance Analysis on a Multi-Core CPU (a mini assignment for review) |
due Feb 7 | Assignment 2: RAW Processing for the kPhone 348V |
due Mar 16 | Assignment 3: Scheduling a MobileNet Layer (optional assignment) |
end of quarter | Self-Selected Term Project |