CS468 Class Schedule, Fall 2024-'25
Course Overview: Foundation Models, 3D and 4D Tasks.
Lecture Slides: Intro
Reading:
Attention is all you need (Transformers)
Language models are few-shot learners (GPT-3)
On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?
Denoising Diffusion Probabilistic Models (DDPM)
The Mythos of Model Interpretability
When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models
The Ethics of AI Ethics: An Evaluation of Guidelines
Introduction: Geometry Representations: Implicit and Explicit, Structured and Unstructured.
Lecture Slides: GeomReps
Representations of Geometry for Computer Graphics
DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation
NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis
3D Gaussian Splatting for Real-Time Radiance Field Rendering
Survey of Large Language and Language-Vision Models I.
Lecture Slides: FMSurvey1
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (ViT)
Language Models are Unsupervised Multitask Learners (GPT-2)
Scaling Laws for Autoregressive Generative Modeling (Scaling Laws)
Elucidating the Design Space of Diffusion-Based Generative Models (EDM)
High-Resolution Image Synthesis with Latent Diffusion Models (LDM)
Survey of Large Language and Language-Vision Models II.
Lecture Slides: FMSurvey2
Training Language Models to Follow Instructions with Human Feedback (InstructGPT)
Gorilla: Large Language Model Connected with Massive APIs
Chain-of-Thought Prompting
Emerging Properties in Self-Supervised Vision Transformers (DINO)
Learning Transferable Visual Models From Natural Language Supervision (CLIP)
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models (BLIP-2)
The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision)
3D Awareness Assessment of Current Foundation Models.
Lecture Slides: 3D_Awareness
A Tale of Two Features: Stable Diffusion Complements DINO for Zero-Shot Semantic Correspondence
Improving 2D Feature Representations by 3D-Aware Fine-Tuning
Pose Priors from Language Models
Deep ViT Features as Dense Visual Descriptors
Probing the 3D Awareness of Visual Foundation Models
In Context Learning for 3D / 4D.
Lecture Slides: InContext
MeshGPT: Generating Triangle Meshes with Decoder-Only Transformers
Img2CAD: Reverse Engineering 3D CAD Models from Images through VLM-Assisted Conditional Factorization
Fine-Tuning Foundation Models for 3D / 4D Queries, Low-Rank Adaptation.
Lecture Slides: FineTuning
LoRA: Low-Rank Adaptation of Large Language Models
SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities
Coarse Correspondences Elicit 3D Spacetime Understanding in Multimodal Language Model
CLIP Can Understand Depth
Parametric 3D Geometries, Human Models.
Lecture Slides: ParamHumans
SMPL: A Skinned Multi-Person Linear Model
ChatPose: Chatting about 3D Human Pose
ChatHuman: Language-driven 3D Human Understanding with Retrieval-Augmented Tool Reasoning
2Dfor3D: Neural Rendering.
Lecture Slides: NeuralRendering
Instant Neural Graphics Primitives with a Multiresolution Hash Encoding
Mip-NeRF: A Multiscale Representation for Anti-Aliasing Neural Radiance Fields
Mip-NeRF 360: Unbounded Anti-Aliased Neural Radiance Fields
Mip-Splatting: Alias-free 3D Gaussian Splatting
2Dfor3D: Distillation, 3D Features.
Lecture Slides: Joint2D3D
Decomposing NeRF for Editing via Feature Field Distillation
Neural Feature Fusion Fields: 3D Distillation of Self-Supervised 2D Image Representations
ConDense: Consistent 2D/3D Pre-training for Dense and Sparse Features from Multi-View Images
Projects discussion.
Programmatic Representations of Geometry; Synthetic 3D / 4D Data.
Lecture Slides: IntrinsicReps
Learning the 3D Fauna of the Web
WonderJourney: Going from Anywhere to Everywhere
WonderWorld: Interactive 3D Scene Generation from a Single Image
PhysDreamer: Physics-Based Interaction with 3D Objects via Video Generation
Explicit Geometry: Neural Approaches for 3D Point Clouds and Meshes.
Lecture Slides: PointsMeshes
PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation
PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space
Dynamic Graph CNN for Learning on Point Clouds
Deep Hough Voting for 3D Object Detection in Point Clouds
MeshCNN: A Network with an Edge
Project proposals due.
Foundation-Assisted Agents for 3D / 4D Content Creation.
Lecture Slides: Agents
BlenderAlchemy: Editing 3D Graphics with Vision-Language Models
Can Large Language Models Understand Symbolic Graphics Programs?
AWOL: Analysis WithOut synthesis using Language
Aladdin: Zero-Shot Hallucination of Stylized 3D Assets from Abstract Scene Descriptions
Shaping Latent Spaces for Geometry, Topology, and Physics.
Lecture Slides: LatentSpaces
GPLD3D: Latent Diffusion of 3D Shape Generative Models by Enforcing Geometric and Physical Priors
Enhancing Implicit Shape Generators Using Topological Regularizations
Image Diffusion.
Lecture Slides: Diffusion
Denoising Diffusion Probabilistic Models
Generative Modeling by Estimating Gradients of the Data Distribution
Denoising Diffusion Implicit Models
Tutorial on Denoising Diffusion-based Generative Modeling: Foundations and Applications
3D from Language.
Lecture Slides: Text_2_3D
Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding
Zero-Shot Text-Guided Object Generation with Dream Fields
DreamFusion: Text-to-3D using 2D Diffusion
3D from Image.
Lecture Slides: Image_2_3D
Neural Scene Representation and Rendering
Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations
pixelNeRF: Neural Radiance Fields from One or Few Images
EG3D: Efficient Geometry-aware 3D Generative Adversarial Networks
GeNVS: Generative Novel View Synthesis with 3D-Aware Diffusion Models
Zero-1-to-3: Zero-shot One Image to 3D Object
LRM: Large Reconstruction Model for Single Image to 3D
CAT3D: Create Anything in 3D with Multi-View Diffusion Models
3D from Video; Motion Models.
Lecture Slides: VidGen
Scalable Diffusion Models with Transformers
VideoPoet: A Large Language Model for Zero-Shot Video Generation
Align Your Gaussians: Text-to-4D with Dynamic 3D Gaussians and Composed Diffusion Models
L4GM: Large 4D Gaussian Reconstruction Model
Thanksgiving Holiday (no class).
Additional Student Paper Presentations. Project Presentations.
Student Project Presentations.