Stanford CS348V, Winter 2018

Visual computing tasks such as computational imaging, image/video understanding, and real-time 3D graphics are key responsibilities of modern computer systems ranging from sensor-rich smart phones to autonomous robots to large datacenters. These workloads demand exceptional system efficiency, and this course examines the key ideas, techniques, and challenges associated with the design of parallel, heterogeneous systems that accelerate visual computing applications. This course is intended for systems students interested in architecting efficient graphics, image processing, and computer vision platforms (both new hardware architectures and domain-optimized programming frameworks for these platforms) and for graphics, vision, and machine learning students who wish to understand throughput computing principles in order to design new algorithms that map efficiently to these machines.

Basic Info
Tues/Thurs 1:30-2:50pm
Mitchell Earth Sciences B67
Instructor: Kayvon Fatahalian
See the course info page for more info on course policies, logistics, and how to prepare for the course.
Winter 2018 Schedule (subject to change)
Jan 9
Multi-core, SIMD, and hardware multi-threading in the context of modern multi-core CPUs, GPUs, FPGAs, ASICs; understanding latency and bandwidth constraints
Part 1: High Efficiency Image and Video Processing
Jan 11
Algorithms taking raw sensor pixels to an RGB image: demosaicing, sharpening, correcting aberrations, autofocus/autoexposure, high-dynamic range processing via multi-shot techniques
Jan 16
Pyramidal/multi-resolution techniques, local Laplacian filters, bilateral filters (via the bilateral grid), optical flow
Jan 18
Balancing locality, parallelism, and work; fusion and tiling; design of the Halide domain-specific language; automatically scheduling image processing pipelines
Jan 23
Contrasting efficiency of GPUs, DSPs, Image Signal Processors, and FPGAs for image processing; domain-specific languages for hardware synthesis such as Darkroom/Rigel; compiling Halide to hardware
Jan 25
Basics of JPEG and H.264 encoding, motivations for ASIC acceleration, future opportunities for compression when machines, not humans, will observe most images
Jan 30
Light field representation, light field cameras, computational challenges of synthesizing video streams for VR output, Google's Jump VR pipeline
Part 2: Efficient Training and Evaluation of DNNs for Visual Understanding
Feb 1
DNN topologies, reduction to dense linear algebra, challenges of direct implementation, where the compute lies in the network, motivations for three modern "trunks": Inception/ResNet/MobileNet, what it means to be fully convolutional
Feb 6
Footprint challenges of training, model vs. data parallelism, asynchronous vs. synchronous training debate, parameter server designs, key optimizations for parallel training
Feb 8
Motivating R-CNN/Fast R-CNN/Faster R-CNN, alternative design of SSD/YOLO, philosophy of end-to-end training
Feb 13
Neural module networks (and their surprising effectiveness for VQA), learning to compress images/video, discussion on value of modularity vs. end-to-end learning
Feb 15
GPUs, Google TPU, special instructions for DNN evaluation, choice of precision, recent ISCA/MICRO papers on DNN acceleration
Feb 20
Specialization to scene, exploiting temporal coherence in video, sharing across applications
Feb 22
Facebook SVE, ExCamera, Scanner
Part 3: GPU Implementation of the Real-Time 3D Graphics Pipeline
Feb 27
3D graphics pipeline as a machine architecture (abstraction), basic pipeline semantics/functionality, contrasting graphics pipeline architecture with compute-mode GPU architecture
Mar 1
Texture sampling and prefiltering basics, texture compression, depth-and-color buffer compression, motivations for hardware multi-threading for latency hiding in modern GPUs
Mar 6
Molnar sorting taxonomy, dataflow scheduling under data amplification, tiled rendering for bandwidth-efficiency, deferred shading as a scheduling decision
Mar 8
Large Scale Distributed Image Processing at Facebook (guest lecture)
Facebook Lumos
Mar 13
RenderMan Shading Language and Cg: contrasting two different levels of abstraction for shading languages
Mar 15
Topic TBD
Assignments and Projects
All students will be expected to perform academic paper readings approximately every other class, complete three simple programming exercises (to reinforce concepts), and complete a self-selected final project (projects can be performed in teams of up to two).
due Jan 19: Assignment 1: Performance Analysis on a Multi-Core CPU (a mini assignment for review)
due Feb 7: Assignment 2: RAW Processing for the kPhone 348V
due Mar 16: Assignment 3: Scheduling a MobileNet Layer (optional assignment)
end of quarter: Self-Selected Term Project