CS468 Class Schedule, Fall 2024-'25


Monday
Wednesday

 


September 23
September 25

Course Overview: Foundation Models, 3D and 4D Tasks.

Lecture Slides: Intro

Reading:

Attention is all you need (Transformers)

Language models are few-shot learners (GPT-3)

On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?

Denoising Diffusion Probabilistic Models (DDPM)

The Mythos of Model Interpretability

When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models

The Ethics of AI Ethics: An Evaluation of Guidelines

 

Introduction: Geometry Representations: Implicit and Explicit, Structured and Unstructured.

Lecture Slides: GeomReps

Reading:

Representations of Geometry for Computer Graphics

DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation

NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis

3D Gaussian Splatting for Real-Time Radiance Field Rendering

September 30
October 02

Survey of Large Language and Language-Vision Models I.

Lecture Slides: FMSurvey1

Reading:

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (ViT) 

Language Models are Unsupervised Multitask Learners (GPT-2)

Scaling Laws for Autoregressive Generative Modeling (Scaling Law)

Elucidating the Design Space of Diffusion-Based Generative Models (EDM)

High-Resolution Image Synthesis with Latent Diffusion Models (LDM)

 


 

Survey of Large Language and Language-Vision Models II.

Lecture Slides: FMSurvey2

Reading:

Training Language Models to Follow Instructions with Human Feedback (InstructGPT) 

Gorilla: Large Language Model Connected with Massive APIs 

Chain-of-Thought Prompting

Emerging Properties in Self-Supervised Vision Transformers (DINO)

Segment Anything (SAM)

Learning Transferable Visual Models From Natural Language Supervision (CLIP) 

BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models (BLIP-2) 

The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision)

 

October 07
October 09

3D Awareness Assessment of Current Foundation Models.

Lecture Slides: 3D_Awareness

Reading:

A Tale of Two Features: Stable Diffusion Complements DINO for Zero-Shot Semantic Correspondence

Improving 2D Feature Representations by 3D-Aware Fine-Tuning

Pose Priors from Language Models

Deep ViT Features as Dense Visual Descriptors

Probing the 3D Awareness of Visual Foundation Models


In-Context Learning for 3D / 4D.

Lecture Slides: InContext

Reading:

MeshGPT: Generating Triangle Meshes with Decoder-Only Transformers

Img2CAD: Reverse Engineering 3D CAD Models from Images through VLM-Assisted Conditional Factorization

October 14
October 16

Fine-Tuning Foundation Models for 3D / 4D Queries, Low-Rank Adaptation.

Lecture Slides: FineTuning

Reading:

LoRA: Low-Rank Adaptation of Large Language Models

SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities

Coarse Correspondence Elicit 3D Spacetime Understanding in Multimodal Language Model

CLIP Can Understand Depth

 

Parametric 3D Geometries, Human Models.

Lecture Slides: ParamHumans

Reading:

SMPL: A Skinned Multi-Person Linear Model

ChatPose: Chatting about 3D Human Pose

ChatHuman: Language-driven 3D Human Understanding with Retrieval-Augmented Tool Reasoning


 

October 21
October 23

2Dfor3D: Neural Rendering.

 

Lecture Slides: NeuralRendering

Reading:

NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis

3D Gaussian Splatting for Real-Time Radiance Field Rendering

Instant Neural Graphics Primitives with a Multiresolution Hash Encoding

Mip-NeRF: A Multiscale Representation for Anti-Aliasing Neural Radiance Fields

Mip-NeRF 360: Unbounded Anti-Aliased Neural Radiance Fields

Mip-Splatting: Alias-free 3D Gaussian Splatting

 

2Dfor3D: Distillation, 3D Features.

 

Lecture Slides: Joint2D3D

Reading:

Decomposing NeRF for Editing via Feature Field Distillation

Neural Feature Fusion Fields: 3D Distillation of Self-Supervised 2D Image Representations

ConDense: Consistent 2D/3D Pre-training for Dense and Sparse Features from Multi-View Images

Improving 2D Feature Representations by 3D-Aware Fine-Tuning

Projects discussion.

 

October 28
October 30

Programmatic Representations of Geometry; Synthetic 3D / 4D Data.

 

Lecture Slides: IntrinsicReps

Reading:

Learning the 3D Fauna of the Web

WonderJourney: Going from Anywhere to Everywhere

WonderWorld: Interactive 3D Scene Generation from a Single Image

PhysDreamer: Physics-Based Interaction with 3D Objects via Video Generation

Explicit Geometry: Neural Approaches for 3D Point Clouds and Meshes.

 

Lecture Slides: PointsMeshes

Reading:

PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation

PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space

Dynamic Graph CNN for Learning on Point Clouds

Deep Hough Voting for 3D Object Detection in Point Clouds

MeshCNN: A Network with an Edge

Project proposals due.

November 04
November 06

Foundation-Assisted Agents for 3D / 4D Content Creation.

 

Lecture Slides: Agents

Reading:

BlenderAlchemy: Editing 3D Graphics with Vision-Language Models

Can Large Language Models Understand Symbolic Graphics Programs?

AWOL: Analysis WithOut synthesis using Language

Aladdin: Zero-Shot Hallucination of Stylized 3D Assets from Abstract Scene Descriptions

 

Shaping Latent Spaces for Geometry, Topology, and Physics.

 

Lecture Slides: LatentSpaces

Reading:

GPLD3D: Latent Diffusion of 3D Shape Generative Models by Enforcing Geometric and Physical Priors

Enhancing Implicit Shape Generators Using Topological Regularizations

November 11
November 13

Image Diffusion.

 

Lecture Slides: Diffusion

Reading:

Denoising Diffusion Probabilistic Models

Generative Modeling by Estimating Gradients of the Data Distribution

Denoising Diffusion Implicit Models

Tutorial on Denoising Diffusion-based Generative Modeling: Foundations and Applications

 

3D from Language.

 

Lecture Slides: Text_2_3D

Reading:

Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding

Zero-Shot Text-Guided Object Generation with Dream Fields

DreamFusion: Text-to-3D using 2D Diffusion

November 18
November 20

3D from Image.


Lecture Slides: Image_2_3D

Reading:

Neural Scene Representation and Rendering

Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations

pixelNeRF: Neural Radiance Fields from One or Few Images

EG3D: Efficient Geometry-aware 3D Generative Adversarial Networks

GeNVS: Generative Novel View Synthesis with 3D-Aware Diffusion Models

Zero-1-to-3: Zero-shot One Image to 3D Object

LRM: Large Reconstruction Model for Single Image to 3D

CAT3D: Create Anything in 3D with Multi-View Diffusion Models

 

3D from Video; Motion Models.

 

Lecture Slides: VidGen

Reading:

Scalable Diffusion Models with Transformers

VideoPoet: A Large Language Model for Zero-Shot Video Generation

Align Your Gaussians: Text-to-4D with Dynamic 3D Gaussians and Composed Diffusion Models

L4GM: Large 4D Gaussian Reconstruction Model

 

 

November 25
November 27

Thanksgiving Holiday (no class).

 

 

Thanksgiving Holiday (no class).

 

December 02
December 04

Additional Student Paper Presentations. Project Presentations.

 

 

Student Project Presentations.

 
