CS468: Topics in Geometric Computing
Foundation Models for 3D/4D Scene Understanding and Content Creation
Leonidas Guibas
Fall 2024-25

Breaking News: In the last few years, large pre-trained models in the language and vision-language areas have shown impressive capabilities and emergent behaviors, even for tasks they were not specifically trained on. These so-called foundation models (FMs) are reshaping how we approach learning problems as we aim for the grand goal of artificial general intelligence (AGI). When it comes to 3D or 4D tasks, however (tasks that involve spatial reasoning in 3D about geometry and motion), the state of FM development is less clear. This is because current FMs are trained on vast web data that includes text, images, and videos, but little 3D. It is important to assess the 3D / 4D awareness and capabilities of FMs and to study how to improve them, as our world is 3D, and perceiving, reasoning about, and acting on the real world requires 3D understanding. The obvious challenge is that the real 3D data we have is orders of magnitude less than what is available in the language and vision domains. Furthermore, 3D annotations are cumbersome to produce. This course will survey the state of the art of 3D (space) / 4D (space+time) understanding of FMs, explore a variety of approaches towards enhancing that understanding, and study how FMs can be used in a variety of 3D / 4D tasks. Specific topics to be covered include:
The course will require presentations of papers from the current literature in class, active participation in the class discussions, and a collaborative project. These pages are maintained by Leonidas Guibas (guibas@cs.stanford.edu).