Visual Computing Systems: Stanford Winter 2018

Stanford CS348V, Winter 2018

VISUAL COMPUTING SYSTEMS

Visual computing tasks such as computational imaging, image/video understanding, and real-time 3D graphics are key responsibilities of modern computer systems, ranging from sensor-rich smartphones and autonomous robots to large datacenters. These workloads demand exceptional system efficiency, and this course examines the key ideas, techniques, and challenges associated with the design of parallel, heterogeneous systems that accelerate visual computing applications. The course is intended for two audiences. The first is systems students interested in architecting efficient graphics, image processing, and computer vision platforms, including both new hardware architectures and domain-optimized programming frameworks for these platforms. The second is graphics, vision, and machine learning students who wish to understand throughput computing principles in order to design new algorithms that map efficiently to these machines.

Basic Info

Tues/Thurs 1:30-2:50pm

Mitchell Earth Sciences B67

Instructor: Kayvon Fatahalian

See the course info page for details on course policies, logistics, and how to prepare for the course.

Winter 2018 Schedule (subject to change)

Jan 9	Course Introduction + Review of Parallel Hardware Architecture	Multi-core, SIMD, and hardware multi-threading in the context of modern multi-core CPUs, GPUs, FPGAs, and ASICs; understanding latency and bandwidth constraints
Part 1: High Efficiency Image and Video Processing
Jan 11	Overview of a Modern Digital Camera Processing Pipeline	Algorithms that take raw sensor pixels to an RGB image: demosaicing, sharpening, correcting aberrations, autofocus/autoexposure, high-dynamic-range processing via multi-shot techniques
Jan 16	Camera Pipeline Part II + Image Processing Algorithms You Should Know	Pyramidal/multi-resolution techniques, local Laplacian filters, bilateral filters (via the bilateral grid), optical flow
Jan 18	Efficiently Scheduling Image Processing Algorithms on Parallel Hardware	Balancing locality, parallelism, and work; fusion and tiling; design of the Halide domain-specific language; automatically scheduling image processing pipelines
Jan 23	Specialized Hardware for Image Processing	Contrasting the efficiency of GPUs, DSPs, image signal processors, and FPGAs for image processing; domain-specific languages for hardware synthesis such as Darkroom/Rigel; compiling Halide to hardware
Jan 25	Lossy Image (JPG) and Video (H.264) Compression	Basics of JPG and H.264 encoding, motivations for ASIC acceleration, future opportunities for compression when machines, not humans, observe most images
Jan 30	The Light Field, Computational Cameras, and Display/Capture for VR	Light field representation, light field cameras, computational challenges of synthesizing video streams for VR output, Google's Jump VR pipeline
Part 2: Efficient Training and Evaluation of DNNs for Visual Understanding
Feb 1	Workload Characteristics of DNN Inference for Image Analysis	DNN topologies, reduction to dense linear algebra, challenges of direct implementation, where the compute lies in the network, motivations for three modern "trunks" (Inception, ResNet, MobileNet), what it means to be fully convolutional
Feb 6	Scheduling and Algorithms for Parallel DNN Training at Scale	Footprint challenges of training, model vs. data parallelism, the asynchronous vs. synchronous training debate, parameter server designs, key optimizations for parallel training
Feb 8	A Case Study of Algorithmic Optimizations for Object Detection	Motivating R-CNN/Fast R-CNN/Faster R-CNN, the alternative designs of SSD/YOLO, the philosophy of end-to-end training
Feb 13	Leveraging Task-Specific DNN Structure for Improving Performance and Accuracy	Neural module networks (and their surprising effectiveness for VQA), learning to compress images/video, discussion of the value of modularity vs. end-to-end learning
Feb 15	Hardware Accelerators for DNN Inference	GPUs, the Google TPU, special instructions for DNN evaluation, choice of precision, recent ISCA/MICRO papers on DNN acceleration
Feb 20	Optimizing Inference on Video Streams	Specialization to the scene, exploiting temporal coherence in video, sharing across applications
Feb 22	Video Processing at Datacenter Scale	Facebook SVE, ExCamera, Scanner
Part 3: GPU Implementation of the Real-Time 3D Graphics Pipeline
Feb 27	Real-Time 3D Graphics Pipeline Architecture	The 3D graphics pipeline as a machine architecture (abstraction), basic pipeline semantics/functionality, contrasting graphics pipeline architecture with compute-mode GPU architecture
Mar 1	Hardware Acceleration of Texture Mapping and Depth Buffering	Texture sampling and prefiltering basics, texture compression, depth and color buffer compression, motivations for hardware multi-threading as a latency-hiding mechanism in modern GPUs
Mar 6	Scheduling the Graphics Pipeline onto a GPU	The Molnar sorting taxonomy, dataflow scheduling under data amplification, tiled rendering for bandwidth efficiency, deferred shading as a scheduling decision
Mar 8	Large-Scale Distributed Image Processing at Facebook (guest lecture)	Facebook Lumos
Mar 13	Domain-Specific Languages for Shading	RenderMan Shading Language and Cg: contrasting two different levels of abstraction for shading languages
Mar 15	Topic TBD

Assignments and Projects

All students are expected to complete academic paper readings for approximately every other class, complete three simple programming exercises that reinforce course concepts, and complete a self-selected final project (projects may be done in teams of up to two).

due Jan 19	Assignment 1: Performance Analysis on a Multi-Core CPU (a mini assignment for review)
due Feb 7	Assignment 2: RAW Processing for the kPhone 348V
due Mar 16	Assignment 3: Scheduling a MobileNet Layer (optional assignment)
end of quarter	Self-Selected Term Project