![](images/teaser.jpg)
This page contains lecture slides and recommended readings for the Winter 2018 offering of CS348V.
- The Compute Architecture of Intel Processor Graphics Gen9, Intel Technical Whitepaper
The required reading for this class is not an academic technical paper, but a whitepaper from Intel describing the architectural geometry of their latest GPU. This processor is particularly notable because it is the integrated GPU that will be in most mid-2016 and later Core i5 or i7 processors -- the marketing name is HD Graphics 530 (or larger).
I'd like you to read the whitepaper, focusing on the description of the processor in Sections 5.3-5.5. Then, given your knowledge of the concepts discussed in lecture (such as superscalar, multi-core, multi-threading, etc.), I'd like you to describe the features of the processor (using terms from the lecture, not Intel terms).
Pro tip: Consider your favorite data-parallel language, such as the GLSL/HLSL shading languages, CUDA, OpenCL, ISPC, or just an OpenMP #pragma omp parallel for, and make sure you can think through how an embarrassingly parallel for loop can be lowered to these architectures. (You don't need to write this down, but you could.)
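One way to think through that lowering is the minimal sketch below. It is an illustration only, not a description of Gen9 or any other real processor: the function names, `num_cores`, and `simd_width` are hypothetical, and hardware multi-threading is left out. Iterations are split into one contiguous chunk per core, and each core walks its chunk one SIMD vector at a time, masking off lanes that fall past the end of the chunk.

```python
def lower_parallel_for(n, num_cores, simd_width):
    """Map iterations 0..n-1 onto (core, lanes) vector instructions.

    Returns a list of (core_id, lanes) pairs. Each `lanes` list has
    simd_width entries; None marks a lane that is masked off.
    All parameters are illustrative, not real hardware values.
    """
    schedule = []
    # Static partition: one contiguous chunk per core, rounded up.
    chunk = (n + num_cores - 1) // num_cores
    for core in range(num_cores):
        start = core * chunk
        end = min(start + chunk, n)
        # Each core issues one vector instruction per simd_width iterations.
        for base in range(start, end, simd_width):
            lanes = [i if i < end else None
                     for i in range(base, base + simd_width)]
            schedule.append((core, lanes))
    return schedule


def run_schedule(schedule, body, n):
    """Execute the loop body for every active lane in the schedule."""
    out = [None] * n
    for _core, lanes in schedule:
        for i in lanes:
            if i is not None:  # skip masked-off lanes
                out[i] = body(i)
    return out


schedule = lower_parallel_for(10, num_cores=2, simd_width=4)
result = run_schedule(schedule, lambda i: i * i, 10)
```

With 10 iterations, 2 cores, and 4-wide vectors, each core gets 5 iterations and issues two vector instructions, the second with three of its four lanes masked off. That idle-lane cost is the kind of detail worth noticing when you compare architectures.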
Students wanting to go further might be interested in reading the NVIDIA P100 or GTX 980 whitepapers, also linked below. Then you could make a table contrasting the geometry of: a modern AVX-capable Intel CPU, Intel Integrated Graphics (Gen9), NVIDIA GPUs, and any other processor you might be interested in, such as Intel Knights Corner, AMD GPUs, etc.
- NVIDIA Tesla P100 Whitepaper, 2016
- NVIDIA GeForce GTX 980 Whitepaper, 2014
- NVIDIA Tegra X1 Whitepaper
- The Rise of Mobile Visual Computing Systems, Fatahalian, IEEE Pervasive Computing 2016
- Scalability! But at What COST?, McSherry, Isard, and Murray. HotOS 2015 (The arguments in this paper are very consistent with the way we think about performance in the visual computing domain.)
- The Stanford CS448A course notes are a very good reference for camera image processing pipeline algorithms and issues.
- The interactive demos on the Stanford CS178 course site are very well done
- Clarkvision.com has some very interesting material on cameras.
- Burst Photography for High Dynamic Range and Low-light Imaging on Mobile Cameras, Hasinoff et al. SIGGRAPH Asia 2016 (the best public description of the camera pipeline in a modern smartphone)
- Demosaicking: Color Filter Array Interpolation, Gunturk et al. IEEE Signal Processing Magazine, 2005
- Burst Photography for High Dynamic Range and Low-light Imaging on Mobile Cameras, Hasinoff et al. SIGGRAPH Asia 2016
- The Laplacian Pyramid as a Compact Image Code. Burt and Adelson, IEEE Transactions on Communications 1983.
- Pyramid Methods in Image Processing. Adelson et al. 1984
- Local Laplacian Filters: Edge-aware Image Processing with a Laplacian Pyramid. Paris et al. SIGGRAPH 2011
- Exposure Fusion. Mertens et al. Pacific Conference on Computer Graphics and Applications, 2007
- Fast Local Laplacian Filters: Theory and Applications. Aubry et al. TOG 2014
- Fast Median and Bilateral Filtering, Weiss. SIGGRAPH 2006
- A Non-Local Algorithm for Image Denoising, Buades et al. CVPR 2005
- A Gentle Introduction to Bilateral Filtering and its Applications, Paris et al. SIGGRAPH 2008 Course Notes
- A Fast Approximation of the Bilateral Filter using a Signal Processing Approach, Paris and Durand. MIT technical report 2006 (extends their ECCV 2006 paper)
- An Iterative Image Registration Technique with an Application to Stereo Vision, Lucas and Kanade. IJCAI 1981
- Lucas-Kanade 20 Years On: A Unifying Framework, Baker and Matthews. IJCV 2004
- Halide: A Language and Compiler for Optimizing Parallelism, Locality, and Recomputation in Image Processing Pipelines. Ragan-Kelley, Adams, et al. PLDI 2013 (or read the selected chapters of the Ragan-Kelley thesis given below in further reading)
- The Frankencamera: An Experimental Platform for Computational Photography. A. Adams et al. SIGGRAPH 2010
- Decoupling Algorithms from the Organization of Computation for High Performance Image Processing (please read Chapters 1, 4, 5, and 6.1), Ragan-Kelley (MIT Ph.D. thesis, 2014)
- Automatically Scheduling Halide Image Processing Pipelines, Mullapudi et al. SIGGRAPH 2016
- Halide Language Website (contains documentation and many tutorials)
- Rigel: Flexible Multi-Rate Image Processing Hardware, Hegarty et al. SIGGRAPH 2016.
- Programming Heterogeneous Systems from an Image Processing DSL. Pu et al. TACO 2017
- Understanding Sources of Inefficiency in General-Purpose Chips, Hameed et al. ISCA 2010
- Darkroom: Compiling High-Level Image Processing Code into Hardware Pipelines. Hegarty et al. SIGGRAPH 2014
- Light Field Rendering. Levoy and Hanrahan SIGGRAPH 1996
- The Lumigraph. Gortler et al. SIGGRAPH 1996
- Single Lens Stereo with a Plenoptic Camera. E. Adelson and J. Wang. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1992
- Light-Field Photography with a Hand-Held Plenoptic Camera. Ng et al. Stanford Technical Report, 2005
- Digital Light Field Photography. R. Ng. Stanford Ph.D. Dissertation, 2006 (see chapters 1-4)
- Jump: Virtual Reality Video, Anderson et al. SIGGRAPH Asia 2016 (Jump website)
- Casual 3D Photography, Hedman et al. SIGGRAPH Asia 2017
- Facebook Surround 360 page
- Going Deeper with Convolutions, Szegedy et al. CVPR 2015 (the Inception paper)
- MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. Howard et al. 2017
- Stanford CS231n: Convolutional Neural Networks for Visual Recognition. If you haven't taken CS231n, I recommend that you read through the lecture notes of modules 1 and 2 for a very nice explanation of key topics.
- An Introduction to Different Types of Convolutions in Deep Learning, by Paul-Louis Pröve (a nice little tutorial)
- Deep Residual Learning for Image Recognition. K. He et al. CVPR 2016 (the ResNet paper)
- Neural Networks and Deep Learning, Nielsen, 2016 (a free online book)
- Check out the TensorFlow tutorials and play around in the TensorFlow Playground
- Visualizing and Understanding Convolutional Neural Networks, Zeiler and Fergus, ECCV 2014
- Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour. Goyal et al. 2017
- ImageNet Training in Minutes. You et al. 2018
- Scaling Distributed Machine Learning with the Parameter Server, Li et al. OSDI 2014
- Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, Girshick et al. CVPR 2014 (the R-CNN paper)
- Fast R-CNN, Girshick, ICCV 2015
- Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, Ren et al. NIPS 2015
- SSD: Single Shot MultiBox Detector. Liu et al. ECCV 2016
- Mask R-CNN. He et al. ICCV 2017
- CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning. J. Johnson et al. CVPR 2017
- Inferring and Executing Programs for Visual Reasoning. J. Johnson et al. ICCV 2017
- Variable Rate Image Compression with Recurrent Neural Networks, Toderici et al. ICLR 2016
- Full Resolution Image Compression with Recurrent Neural Networks, Toderici et al. 2016
- Learning Binary Residual Representations for Domain-Specific Video Streaming. Tsai et al. AAAI 2018
- Neural Module Networks, Andreas et al. CVPR 2016
- In-Datacenter Performance Analysis of a Tensor Processing Unit. Jouppi et al. ISCA 2017
- SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks. Parashar et al. ISCA 2017
- ScaleDeep: A Scalable Compute Architecture for Learning and Evaluating Deep Networks. Venkataramani et al. ISCA 2017
- Cambricon: An Instruction Set Architecture for Neural Networks, Liu et al. ISCA 2016
- EIE: Efficient Inference Engine on Compressed Deep Neural Network, Han et al. ISCA 2016
- Cnvlutin: Ineffectual-Neuron-Free Deep Neural Network Computing, Albericio et al. ISCA 2016
- Minerva: Enabling Low-Power, Highly-Accurate Deep Neural Network Accelerators, Reagen et al. ISCA 2016
- vDNN: Virtualized Deep Neural Networks for Scalable, Memory-Efficient Neural Network Design, Rhu et al. MICRO 2016
- Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks, Chen et al. ISCA 2016
- Summarize one cool idea, or critique one submission from the 2018 SysML workshop.
- NoScope: Optimizing Neural Network Queries over Video at Scale. D. Kang et al. 2017
- Clockwork Convnets for Video Semantic Segmentation. E. Shelhamer et al. 2016
- SVE: Distributed Video Processing at Facebook Scale. Huang et al. SOSP 2017
- Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads. Fouladi et al. NSDI 2017
- Live Video Analytics at Scale with Approximation and Delay-Tolerance. Zhang et al. NSDI 2017
- The Design of the OpenGL Graphics Interface. by M. Segal and K. Akeley. [unpublished 1994]
- The Direct3D 10 System by D. Blythe. SIGGRAPH 2006
- High-Performance Software Rasterization on GPUs. S. Laine et al. High Performance Graphics 2011. (source code is available on the paper page)
- Pyramidal Parametrics. L. Williams, Computer Graphics 1983
- Texture on Demand. D. Peachey. Pixar Technical Memo #217. 1990
- The Design and Analysis of a Cache Architecture for Texture Mapping. Z. S. Hakura and A. Gupta, ISCA 1997
- Prefetching in a Texture Cache Architecture. H. Igehy et al. Graphics Hardware 1998
- Cardinality-Constrained Texture Filtering. J. Manson and S. Schaefer. SIGGRAPH 2013.
- Parameterization-Aware MIP-Mapping. J. Manson and S. Schaefer. Computer Graphics Forum. 2012.
- Texture Compression using Low-Frequency Signal Modulation. S. Fenney. Graphics Hardware 2003
- iPACKMAN: High-Quality, Low-Complexity Texture Compression for Mobile Phones. J. Ström and T. Akenine-Möller. Graphics Hardware 2005
- ETC2: Texture Compression using Invalid Combinations. J. Ström and M. Pettersson. Graphics Hardware 2007
- Adaptive Scalable Texture Compression. T. Olson et al. High Performance Graphics 2012
- Block Compression in Direct3D 10. MSDN Developer Reference. 2013
- Efficient Depth Buffer Compression. J. Hasselgren and T. Akenine-Möller. Graphics Hardware 2006
- Stochastic Depth Buffer Compression using Generalized Plane Encoding. M. Andersson et al. Computer Graphics Forum 2013
- Pomegranate: A Fully Scalable Graphics Architecture. M. Eldridge et al. SIGGRAPH 2000
- Life of a Triangle - NVIDIA's Logical Pipeline. C. Kubisch (NVIDIA GameWorks Blog, 2015)
- Fast Tessellated Rendering on Fermi GF100. T. Purcell (High Performance Graphics Hot3D talk)
- A Sorting Classification of Parallel Rendering. S. Molnar et al. IEEE Computer Graphics and Applications, 1994.
- A Language for Shading and Lighting Calculations. P. Hanrahan and J. Lawson. SIGGRAPH 1990
- Cg: A System for Programming Graphics Hardware in a C-like Language. W. R. Mark et al. SIGGRAPH 2003
- Spark: Modular, Composable Shaders for Graphics Hardware. T. Foley and P. Hanrahan. SIGGRAPH 2011
- Shader Components: Modular and High Performance Shader Development. Y. He et al. SIGGRAPH 2017
- A Real-Time Procedural Shading System for Programmable Graphics Hardware. K. Proudfoot et al. SIGGRAPH 2001
- Shade Trees. R. Cook. SIGGRAPH 1984
- An Image Synthesizer. K. Perlin. SIGGRAPH 1985
- Shader Metaprogramming. M. McCool et al. Graphics Hardware 2002