CS348n Course Schedule

CS468 Class Schedule, Fall 2024-'25

Monday	Wednesday

September 23	September 25
Course Overview: Foundation Models, 3D and 4D Tasks. Lecture Slides: Intro. Reading: Attention Is All You Need (Transformers); Language Models are Few-Shot Learners (GPT-3); On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?; Denoising Diffusion Probabilistic Models (DDPM); The Mythos of Model Interpretability; When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models; The Ethics of AI Ethics: An Evaluation of Guidelines	Introduction: Geometry Representations: Implicit and Explicit, Structured and Unstructured. Lecture Slides: GeomReps. Reading: Representations of Geometry for Computer Graphics; DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation; NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis; 3D Gaussian Splatting for Real-Time Radiance Field Rendering
September 30	October 02
Survey of Large Language and Language-Vision Models I. Lecture Slides: FMSurvey1. Reading: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (ViT); Language Models are Unsupervised Multitask Learners (GPT-2); Scaling Laws for Autoregressive Generative Modeling (Scaling Law); Elucidating the Design Space of Diffusion-Based Generative Models (EDM); High-Resolution Image Synthesis with Latent Diffusion Models (LDM)	Survey of Large Language and Language-Vision Models II. Lecture Slides: FMSurvey2. Reading: Training Language Models to Follow Instructions with Human Feedback (InstructGPT); Gorilla: Large Language Model Connected with Massive APIs; Chain-of-Thought Prompting; Emerging Properties in Self-Supervised Vision Transformers (DINO); Segment Anything (SAM); Learning Transferable Visual Models From Natural Language Supervision (CLIP); BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models (BLIP-2); The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision)
October 07	October 09
3D Awareness Assessment of Current Foundation Models. Lecture Slides: 3D_Awareness. Reading: A Tale of Two Features: Stable Diffusion Complements DINO for Zero-Shot Semantic Correspondence; Improving 2D Feature Representations by 3D-Aware Fine-Tuning; Pose Priors from Language Models; Deep ViT Features as Dense Visual Descriptors; Probing the 3D Awareness of Visual Foundation Models	In-Context Learning for 3D / 4D. Lecture Slides: InContext. Reading: MeshGPT: Generating Triangle Meshes with Decoder-Only Transformers; Img2CAD: Reverse Engineering 3D CAD Models from Images through VLM-Assisted Conditional Factorization
October 14	October 16
Fine-Tuning Foundation Models for 3D / 4D Queries, Low-Rank Adaptation. Lecture Slides: FineTuning. Reading: LoRA: Low-Rank Adaptation of Large Language Models; SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities; Coarse Correspondence Elicit 3D Spacetime Understanding in Multimodal Language Model; CLIP Can Understand Depth	Parametric 3D Geometries, Human Models. Lecture Slides: ParamHumans. Reading: SMPL: A Skinned Multi-Person Linear Model; ChatPose: Chatting about 3D Human Pose; ChatHuman: Language-driven 3D Human Understanding with Retrieval-Augmented Tool Reasoning
October 21	October 23
2Dfor3D: Neural Rendering. Lecture Slides: NeuralRendering. Reading: NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis; 3D Gaussian Splatting for Real-Time Radiance Field Rendering; Instant Neural Graphics Primitives with a Multiresolution Hash Encoding; Mip-NeRF: A Multiscale Representation for Anti-Aliasing Neural Radiance Fields; Mip-NeRF 360: Unbounded Anti-Aliased Neural Radiance Fields; Mip-Splatting: Alias-free 3D Gaussian Splatting	2Dfor3D: Distillation, 3D Features. Lecture Slides: Joint2D3D. Reading: Decomposing NeRF for Editing via Feature Field Distillation; Neural Feature Fusion Fields: 3D Distillation of Self-Supervised 2D Image Representations; ConDense: Consistent 2D/3D Pre-training for Dense and Sparse Features from Multi-View Images; Improving 2D Feature Representations by 3D-Aware Fine-Tuning. Project discussion.
October 28	October 30
Programmatic Representations of Geometry; Synthetic 3D / 4D Data. Lecture Slides: IntrinsicReps. Reading: Learning the 3D Fauna of the Web; WonderJourney: Going from Anywhere to Everywhere; WonderWorld: Interactive 3D Scene Generation from a Single Image; PhysDreamer: Physics-Based Interaction with 3D Objects via Video Generation	Explicit Geometry: Neural Approaches for 3D Point Clouds and Meshes. Lecture Slides: PointsMeshes. Reading: PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation; PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space; Dynamic Graph CNN for Learning on Point Clouds; Deep Hough Voting for 3D Object Detection in Point Clouds; MeshCNN: A Network with an Edge. Project proposals due.
November 04	November 06
Foundation-Assisted Agents for 3D / 4D Content Creation. Lecture Slides: Agents. Reading: BlenderAlchemy: Editing 3D Graphics with Vision-Language Models; Can Large Language Models Understand Symbolic Graphics Programs?; AWOL: Analysis WithOut synthesis using Language; Aladdin: Zero-Shot Hallucination of Stylized 3D Assets from Abstract Scene Descriptions	Shaping Latent Spaces for Geometry, Topology, and Physics. Lecture Slides: LatentSpaces. Reading: GPLD3D: Latent Diffusion of 3D Shape Generative Models by Enforcing Geometric and Physical Priors; Enhancing Implicit Shape Generators Using Topological Regularizations
November 11	November 13
Image Diffusion. Lecture Slides: Diffusion. Reading: Denoising Diffusion Probabilistic Models; Generative Modeling by Estimating Gradients of the Data Distribution; Denoising Diffusion Implicit Models; Tutorial on Denoising Diffusion-based Generative Modeling: Foundations and Applications	3D from Language. Lecture Slides: Text_2_3D. Reading: Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding; Zero-Shot Text-Guided Object Generation with Dream Fields; DreamFusion: Text-to-3D using 2D Diffusion
November 18	November 20
3D from Image. Lecture Slides: Image_2_3D. Reading: Neural Scene Representation and Rendering; Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations; pixelNeRF: Neural Radiance Fields from One or Few Images; EG3D: Efficient Geometry-aware 3D Generative Adversarial Networks; GeNVS: Generative Novel View Synthesis with 3D-Aware Diffusion Models; Zero-1-to-3: Zero-shot One Image to 3D Object; LRM: Large Reconstruction Model for Single Image to 3D; CAT3D: Create Anything in 3D with Multi-View Diffusion Models	3D from Video; Motion Models. Lecture Slides: VidGen. Reading: Scalable Diffusion Models with Transformers; VideoPoet: A Large Language Model for Zero-Shot Video Generation; Align Your Gaussians: Text-to-4D with Dynamic 3D Gaussians and Composed Diffusion Models; L4GM: Large 4D Gaussian Reconstruction Model
November 25	November 27
Thanksgiving Holiday (no class)	Thanksgiving Holiday (no class)
December 02	December 04
Additional Student Paper Presentations. Project Presentations.	Student Project Presentations.
