Stanford CS348V, Winter 2018

Visual computing tasks such as computational imaging, image/video understanding, and real-time 3D graphics are key responsibilities of modern computer systems, ranging from sensor-rich smartphones and autonomous robots to large datacenters. These workloads demand exceptional system efficiency, and this course examines the key ideas, techniques, and challenges associated with the design of parallel, heterogeneous systems that accelerate visual computing applications. The course is intended both for systems students interested in architecting efficient graphics, image processing, and computer vision platforms (new hardware architectures as well as domain-optimized programming frameworks for these platforms) and for graphics, vision, and machine learning students who wish to understand throughput computing principles in order to design new algorithms that map efficiently to these machines.

Basic Info
Tues/Thurs 1:30-2:50pm
Mitchell Earth Sciences B67
Instructor: Kayvon Fatahalian
See the course info page for more info on course policies, logistics, and how to prepare for the course.
Winter 2018 Schedule (subject to change)
Jan 9
Multi-core, SIMD, and hardware multi-threading in the context of modern multi-core CPUs, GPUs, FPGAs, ASICs; understanding latency and bandwidth constraints
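The latency/bandwidth discussion above can be made concrete with a roofline-style back-of-envelope calculation. The sketch below uses illustrative hardware numbers (1 TFLOP/s peak compute, 100 GB/s peak memory bandwidth), not those of any specific chip, and the `attainable_flops` helper is hypothetical:

```python
# Roofline-style estimate: is a kernel compute-bound or bandwidth-bound?
# Peak numbers below are illustrative assumptions, not a specific chip.
PEAK_FLOPS = 1.0e12   # 1 TFLOP/s of peak arithmetic throughput
PEAK_BW = 100.0e9     # 100 GB/s of peak memory bandwidth

def attainable_flops(arithmetic_intensity):
    """Attainable throughput given a kernel's flops-per-byte ratio."""
    return min(PEAK_FLOPS, PEAK_BW * arithmetic_intensity)

# Example: SAXPY (y = a*x + y) performs 2 flops per 12 bytes moved
# (two 4-byte loads + one 4-byte store), so it is heavily bandwidth-bound:
# it can reach only a tiny fraction of peak compute on this machine.
saxpy_fraction_of_peak = attainable_flops(2 / 12) / PEAK_FLOPS
```

Kernels whose intensity falls below the machine's flops/bandwidth ratio (here 10 flops/byte) leave arithmetic units idle no matter how many cores or SIMD lanes are added.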
Part 1: High Efficiency Image and Video Processing
Jan 11
Algorithms taking raw sensor pixels to an RGB image: demosaicing, sharpening, correcting aberrations, autofocus/autoexposure, high-dynamic range processing via multi-shot techniques
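As a toy illustration of the demosaicing step mentioned above: a real ISP uses edge-aware interpolation, but the minimal bilinear sketch below (recovering only the green channel of an RGGB Bayer mosaic, a simplifying assumption) shows the basic structure:

```python
# Minimal bilinear demosaicing sketch: green channel of an RGGB Bayer
# mosaic. Real camera pipelines use edge-adaptive interpolation instead.

def demosaic_green(mosaic):
    """mosaic: 2D list of raw sensor values in RGGB layout.
    Green is sampled where (row + col) is odd; at red/blue photosites,
    average the available 4-neighbors (clamped at image borders)."""
    h, w = len(mosaic), len(mosaic[0])
    green = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if (y + x) % 2 == 1:   # green photosite: copy the sample
                green[y][x] = mosaic[y][x]
            else:                  # red/blue photosite: interpolate
                neighbors = [mosaic[ny][nx]
                             for ny, nx in ((y-1, x), (y+1, x), (y, x-1), (y, x+1))
                             if 0 <= ny < h and 0 <= nx < w]
                green[y][x] = sum(neighbors) / len(neighbors)
    return green
```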
Jan 16
Pyramidal/multi-resolution techniques, local Laplacian filters, bilateral filters (via the bilateral grid), optical flow
Jan 18
Balancing locality, parallelism, and work, fusion and tiling, design of the Halide domain-specific language, automatically scheduling image processing pipelines
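The locality/parallelism/work tradeoff that Halide schedules navigate can be modeled in a few lines. In the hypothetical two-stage pipeline below (names `g_breadth_first` and `g_tiled_fused` are mine, not Halide API), breadth-first evaluation does minimal work but materializes the whole intermediate, while fusing the producer into each consumer tile shrinks the footprint at the cost of recomputation at tile boundaries:

```python
# Toy model of a two-stage pipeline: stage f feeds g(x) = f(x) + f(x+1).
calls = {"f": 0}  # count invocations of the producer stage

def f(inp, x):
    calls["f"] += 1
    return inp[x] + 1

def g_breadth_first(inp, n):
    """Like Halide's compute_root: materialize all of f, then compute g.
    Minimal work, but all n+1 intermediate values are live at once."""
    fbuf = [f(inp, x) for x in range(n + 1)]
    return [fbuf[x] + fbuf[x + 1] for x in range(n)]

def g_tiled_fused(inp, n, tile=4):
    """Compute f within each tile of g: the live footprint shrinks to
    tile+1 values, but f is recomputed once per interior tile boundary."""
    out = []
    for start in range(0, n, tile):
        end = min(start + tile, n)
        fbuf = [f(inp, x) for x in range(start, end + 1)]  # tile + halo of 1
        out += [fbuf[i] + fbuf[i + 1] for i in range(end - start)]
    return out
```

Both schedules produce identical output; they differ only in working-set size and redundant work, which is exactly the space a Halide schedule (or an autoscheduler) searches.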
Jan 23
Specialized Hardware for Image Processing
Contrasting efficiency of GPUs, DSPs, Image Signal Processors, and FPGAs for image processing, domain-specific languages for hardware synthesis; DSLs for hardware synthesis such as Darkroom/Rigel, compiling Halide to hardware
Jan 25
Lossy Image (JPEG) and Video (H.264) Compression
Basics of JPEG and H.264 encoding, motivations for ASIC acceleration, future opportunities for compression when machines, not humans, will observe most images
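The heart of JPEG's lossy step is a 2D DCT on each 8x8 block followed by quantization. The sketch below uses a uniform quantization step for simplicity (an assumption; real JPEG divides each frequency by a per-coefficient table entry):

```python
import math

# 2D DCT-II on one 8x8 block, then quantization (divide and round).
# A flat block concentrates all energy in the DC coefficient, which is
# why smooth image regions compress so well.

def dct2_8x8(block):
    def c(u):
        return 1 / math.sqrt(2) if u == 0 else 1.0
    out = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            s = sum(block[y][x]
                    * math.cos((2 * x + 1) * u * math.pi / 16)
                    * math.cos((2 * y + 1) * v * math.pi / 16)
                    for y in range(8) for x in range(8))
            out[u][v] = 0.25 * c(u) * c(v) * s
    return out

def quantize(coeffs, step=16):
    """Toy uniform quantizer; real JPEG uses a per-frequency table."""
    return [[round(v / step) for v in row] for row in coeffs]
```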
Jan 30
Video Processing/Synthesis for Virtual Reality Display
Computational challenges of synthesizing video streams for VR output, cross-stream alignment/matching, Google's Jump VR pipeline, Facebook's Surround 360
Part 2: Efficient Training and Evaluation of DNNs for Visual Understanding
Feb 1
Workload Characteristics of DNN Inference for Image Analysis
DNN topologies, reduction to dense linear algebra, challenges of direct implementation, where the compute lies in the network, motivations for three modern "trunks": Inception/Resnet/MobileNet, what it means to be fully convolutional
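The "reduction to dense linear algebra" above is typically done via im2col: unroll each input patch into a matrix row, so convolution becomes one dense matrix multiply. A minimal single-channel sketch (function names are mine):

```python
# im2col: reduce 2D convolution to a dense matrix product. Each output
# pixel becomes a dot product between an unrolled patch and the kernel.

def im2col(img, k):
    """img: HxW 2D list. Returns (H-k+1)*(W-k+1) rows of k*k patch values."""
    h, w = len(img), len(img[0])
    rows = []
    for y in range(h - k + 1):
        for x in range(w - k + 1):
            rows.append([img[y + dy][x + dx]
                         for dy in range(k) for dx in range(k)])
    return rows

def conv2d_via_matmul(img, kernel):
    """Valid (no-padding) convolution as patch-matrix times kernel vector."""
    k = len(kernel)
    flat = [v for row in kernel for v in row]
    return [sum(p * w for p, w in zip(patch, flat)) for patch in im2col(img, k)]
```

Frameworks use this reduction (or direct/Winograd variants) because dense matmul is the operation hardware and BLAS-style libraries optimize best.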
Feb 6
Scheduling and Algorithms for Parallel DNN Training at Scale
Footprint challenges of training, model vs. data parallelism, asynchronous vs. synchronous training debate, parameter server designs, key optimizations for parallel training
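The synchronous data-parallel scheme above can be sketched in a few lines. The model here is a toy 1D least-squares fit (an assumption for illustration, not a real DNN), with the all-reduce/parameter-server step reduced to averaging worker gradients:

```python
# Synchronous data-parallel SGD sketch: each worker computes a gradient
# on its data shard; a parameter-server step averages them and updates
# the shared weight. Toy model: fit y = w*x by mean squared error.

def worker_gradient(w, shard):
    """Gradient of MSE of y = w*x over this worker's shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def sync_sgd_step(w, shards, lr=0.01):
    grads = [worker_gradient(w, s) for s in shards]  # runs in parallel in practice
    avg = sum(grads) / len(grads)                    # the all-reduce / param server
    return w - lr * avg

data = [(x, 3.0 * x) for x in range(1, 9)]           # ground truth w = 3
shards = [data[:4], data[4:]]                        # two workers, one shard each
w = 0.0
for _ in range(200):
    w = sync_sgd_step(w, shards)                     # w converges toward 3.0
```

The asynchronous variant debated in this lecture drops the barrier implied by the averaging step, trading gradient staleness for throughput.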
Feb 8
A Case Study of Algorithmic Optimizations for Object Detection
Motivating R-CNN/Fast R-CNN/Faster R-CNN, alternative design of SSD/Yolo, philosophy of end-to-end training
Feb 13
Leveraging Task-Specific DNN Structure for Improving Performance and Accuracy
Neural module networks (and their surprising effectiveness for VQA), learning to compress images/video, conditional architectures, discussion on value of modularity vs. end-to-end learning
Feb 15
Hardware Accelerators for DNN Inference
GPUs, Google TPU, special instructions for DNN evaluation, choice of precision, recent ISCA/MICRO papers on DNN acceleration
Feb 20
Design Space of Dataflow Programming Abstractions for Deep Learning
TensorFlow/XLA, MXNet/TVM, Torch, TensorFlow Fold/Keras, challenges of compiling/scheduling DNN training/evaluation (fusion, footprint), properties of a good IR, need for performance portability
Feb 22
Enhancing Efficiency Through Model Specialization (in particular for video)
Specialization to scene (NoScope), exploiting temporal coherence in video for faster inference
Feb 27
Efficient Inference at Datacenter Scale (Exact Topic TBD)
Systems like Facebook AML's Lumos
Mar 1
Exact Topic TBD (Tentative: Scaling the Labeling for Large Image/Video Databases)
Part 3: GPU Implementation of the Real-Time 3D Graphics Pipeline
Mar 6
Real-Time 3D Graphics Pipeline Architecture
3D graphics pipeline as a machine architecture (abstraction), basic pipeline semantics/functionality, contrasting graphics pipeline architecture with compute-mode GPU architecture
Mar 8
Hardware Acceleration of Z-Buffering and Texturing
Texture sampling and prefiltering basics, texture compression, depth-and-color buffer compression, motivations for hardware multi-threading for latency hiding in modern GPUs
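The prefiltering idea above reduces to a level-of-detail computation: pick the mip level whose texel spacing matches the pixel's footprint in texture space. A sketch of the standard heuristic (using screen-space derivatives of the texel coordinates; the function name is mine):

```python
import math

# Mipmap level selection: derivatives of texel coords (u, v) with respect
# to screen x and y give the pixel's footprint in texel space. A common
# heuristic takes the larger axis: lod = log2(texels covered per pixel).

def mip_level(du_dx, dv_dx, du_dy, dv_dy, num_levels):
    fx = math.hypot(du_dx, dv_dx)      # footprint along screen x
    fy = math.hypot(du_dy, dv_dy)      # footprint along screen y
    rho = max(fx, fy)                  # texels covered per pixel step
    lod = max(0.0, math.log2(max(rho, 1e-12)))
    return min(lod, num_levels - 1)    # clamp to the mip chain

# A pixel covering 4x4 texels samples level 2 (each level halves resolution),
# turning a potentially huge filter footprint into a constant-cost lookup.
```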
Mar 13
Scheduling the Graphics Pipeline onto a GPU
Molnar sorting taxonomy, dataflow scheduling under data amplification, tiled rendering for bandwidth-efficiency, deferred shading as a scheduling decision
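The binning step at the heart of a sort-middle tiled renderer can be sketched directly: assign each triangle to every screen tile its bounding box overlaps, so later per-tile rasterization touches only a small on-chip framebuffer. Tile size and function names below are illustrative choices:

```python
# Binning for a sort-middle tiled renderer: conservative bounding-box
# overlap test per triangle. Real hardware refines this with exact
# tile/edge tests, but the dataflow is the same.

TILE = 16  # tile size in pixels (illustrative)

def bin_triangles(tris, screen_w, screen_h):
    """tris: list of triangles, each three (x, y) vertices in pixels.
    Returns {(tile_x, tile_y): [triangle ids]}."""
    bins = {}
    for tid, tri in enumerate(tris):
        xs = [v[0] for v in tri]
        ys = [v[1] for v in tri]
        x0 = max(0, min(xs)) // TILE
        x1 = min(screen_w - 1, max(xs)) // TILE
        y0 = max(0, min(ys)) // TILE
        y1 = min(screen_h - 1, max(ys)) // TILE
        for ty in range(y0, y1 + 1):
            for tx in range(x0, x1 + 1):
                bins.setdefault((tx, ty), []).append(tid)
    return bins
```

Tiled scheduling keeps color/depth traffic on chip (the bandwidth win), while large triangles that land in many bins illustrate the data-amplification problem discussed in this lecture.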
Mar 15
Domain Specific Languages for Shading
The RenderMan Shading Language and Cg: contrasting two different levels of abstraction for shading languages
Assignments and Projects
All students are expected to complete academic paper readings approximately every other class, complete three short programming exercises (to reinforce concepts), and complete a self-selected final project (projects may be done in teams of up to two).
Assignment 1 (due Jan 19): Performance Analysis on a Multi-Core CPU (a mini assignment for review)
Assignment 2 (due Feb 1): Implementing and Optimizing a Mini-Camera RAW Processing Pipeline
Assignment 3 (due Feb 15): Scheduling Inception / MobileNet DNN Modules in Halide (or TVM)
Self-Selected Term Project (due at the end of the quarter)