Interactive Modeling with an Indycam
and Complex Domain-Specific Constraints
Lucas Pereira
Olaf Hall-Holt
CS223B - Intro to Computer Vision
348c: Sensing for Graphics
March, 1997
Here are a few of the models that we generated. You can click on
the picture to download an Inventor 2.0 scene-description file.
Note that we saved only the geometry, not the instance of the
domain, so the model is not editable (beyond basic operations).
- alldesk.iv.gz (15 MB uncompressed)
- book_OpInv1.iv.gz, book_OpInv2.iv.gz, book_OpInv3.iv.gz
(1.4 MB each, uncompressed)
- book_foley1.iv.gz, book_foley2.iv.gz, book_foley3.iv.gz,
book_foley4.iv.gz (4.3 MB each, uncompressed)
- rubiks.iv.gz (450 KB uncompressed)
- smallrubiks.iv.gz (120 KB uncompressed)
The goal of this project was to expand the class of real-life
objects that can be modeled relatively easily, without expensive
equipment such as a laser scanner. For this reason, we chose to
limit ourselves to two (fairly) commonly available input devices:
an Indycam, and an intelligent user.
Our intent was to develop a proof-of-concept program that would explore
the issues, challenges, and decisions involved in writing such a
program. As such, our program is missing quite a few basic features
that would eventually be desirable, such as saving an instance of
a domain, rather than just the Inventor file. However, we think
we've built a subset large enough to demonstrate that this approach may have
some merit, especially for classes of objects that lend themselves well
to procedural description.
Our basic idea is to break the modeling process into two
distinct phases:
- Domain Development
In this phase, the user writes a C++ class that describes some
knowledge, common sense, or constraints about a specific class
of objects, which we call the domain. This domain has
three basic operations:
- Initialize
In this operation, the user sets up the user event callbacks and any
other necessary initialization of the object's internal state. The
parameters of the domain are initialized to default values.
- Interact/Refine
In this operation, the domain uses the callbacks (in response to user
events) to update the parameters of the domain for this specific
instance.
- Create Geometry
In this operation, the domain generates a "current" model based on
the object instance's parameters. This model is an Inventor scene,
which can be viewed by the user and saved to a file. (A minimal C++
sketch of this interface appears after this list.)
- Interactive Modeling
In this phase, the modeler uses the program, with the new built-in
callbacks for a specific domain, to model some instance of that
domain. As the program responds to user interaction (which may
involve high-level functions, such as calling a computer vision
algorithm), the domain updates its parameters to refine the domain
instance until it matches the real-life object to the desired degree.
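In code, the interface implied by these operations might look like the
following minimal sketch. The names (Domain, Event, Geometry) are our
shorthand for this write-up rather than the identifiers in the actual
source files, and Geometry stands in for an Open Inventor scene graph.

    #include <string>

    struct Event { std::string mode; int x, y; };   // placeholder user event
    struct Geometry { std::string inventorText; };  // stands in for an Inventor scene

    class Domain {
    public:
        virtual ~Domain() {}
        // Initialize: register event callbacks and reset all
        // domain parameters to their default values.
        virtual void initialize() = 0;
        // Interact/Refine: update this instance's parameters in
        // response to a user event.
        virtual void handleEvent(const Event& e) = 0;
        // Create Geometry: emit the current model so it can be
        // viewed and saved as an Inventor file.
        virtual Geometry createGeometry() const = 0;
    };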
This project does not really introduce any new vision algorithms.
Rather, it provides a framework in which such algorithms can
be applied. In fact, the programmer of the domain-development
phase has almost unlimited freedom in his choice of actions.
We hope that virtually any kind of vision algorithm or scanning
technique (such as laser scanning) could be incorporated in this
framework. Furthermore, since the domain developer can add any
type of data structure to the domain, the domain developer could
even add such things as confidence values, standard deviations,
and interval arithmetic. The only requirement is that the developer
must be able to map this data (in whatever form) to an Inventor file
describing the instance.
Our paper goes into more detail about background
work done by others.
Paul Debevec's work on
image & geometry-based rendering
is particularly relevant. His implementation could be considered an
example of one (well-built) domain.
The domains are written as C++ classes. We hope this encourages the
re-use of helpful code through inheritance. As an example, we developed
four domains (a sketch of this inheritance appears after the list):
- base domain:
This is the prototype for every domain. It defines the basic
operations (initialize, handle callbacks, and create geometry).
It also has some basic controls that seem generally useful, such
as changing modes in the modcam program. (A subclass domain can
add additional modes, but the basic modes are none, manipulation,
photo capture, and edge detection.)
- box domain:
This simple domain handles texture-mapped cubes. In texture mode,
the user aligns the object with the background and then clicks on
a face; the object grabs the background texture and maps it onto
that face.
- openbook domain:
This domain inherits the box domain. It uses either 2 or 6 textured
cubes to represent the covers, spine, and text of two halves of a
book. It has built-in constraints to align the two halves so that
the spine is always in contact.
- banana domain:
This domain inherits the base domain. It includes a function to
grab texture onto any quad mesh, provided the mesh already has
texture coordinates. (This domain is still under construction).
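Building on the Domain sketch above, here is a hypothetical
illustration of how this inheritance might look. The class names
mirror the source files listed later, but the method names and logic
are simplified placeholders, not the actual implementation.

    class BoxDomain : public Domain {
    public:
        void initialize() override { /* register texture-mode callbacks */ }
        void handleEvent(const Event& e) override {
            // In texture mode, grab the background texture under the
            // clicked face and map it onto that face.
            if (e.mode == "texture") grabTextureOntoFace(e.x, e.y);
        }
        Geometry createGeometry() const override {
            return Geometry{"Cube { ... }"};  // placeholder Inventor output
        }
    protected:
        virtual void grabTextureOntoFace(int x, int y) { /* ... */ }
    };

    class OpenBookDomain : public BoxDomain {
    public:
        void handleEvent(const Event& e) override {
            BoxDomain::handleEvent(e);                     // reuse texture machinery
            if (e.mode == "open-angle") openAngle_ += 0.1; // domain-specific mode
            enforceSpineContact();
        }
    private:
        double openAngle_ = 1.5;  // angle between the two halves (radians)
        // Built-in constraint: re-derive the transforms of the two
        // halves from openAngle_ so the spine stays in contact.
        void enforceSpineContact() { /* ... */ }
    };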
[Snapshots: the SKETCH, modcam, and Debevec interfaces.]
Modcam has a simple user interface based on the examiner viewer in
SGI's Open Inventor package. As you can see in the middle snapshot
above, there are three windows. The top window is used for interacting
exclusively with the developing model, while the lower two windows allow
interaction with pictures and/or the model. Each window has an associated
set of standard widgets, which allow for rotation, translation, zoom,
and other operations.
The current mode of interaction is shown in the top left of the top window.
The default mode is "none", in which all manipulations change the camera.
Some modes are domain-independent, like the Texture mode, used to
assign textures to faces of the model, and the Edge mode, used to
specify edges in the picture(s). Domain-specific modes also exist for
actions that only make sense in particular contexts. For example, a
mode exists for directly modifying how open an open book should
appear. Finally, there are a couple of menu options, such as for
saving an Inventor file. We expect a more
sophisticated interface could be valuable for more complex domains, but
this approach has served us well so far.
The available commands are listed in the modcam user's guide.
We employed the techniques presented in Prof. Tomasi's
Introduction to Computer Vision
course to calibrate the Indycam. We first took a picture
of a series of concentric circles on a flat sheet of paper at a known distance
from the camera. The circles were exactly .5 cm apart, and we estimated
the distance to the camera's focal point to be 23 cm. We examined this
picture to ascertain that the aspect ratio was close to 1 (it was within
1% of 1) and to calculate the field of view, which is roughly 42 by
30 degrees. We then found the pixel coordinates of the intersections of
the circles with the horizontal centerline, and used a least-squares fit
to approximate the fifth-order polynomial that could best map uniformly
spaced pixel coordinates to the coordinates found along the centerline.
With the assumption that geometric distortion is radially symmetric,
we used this fifth-order polynomial to "unwarp" subsequent images.
Specifically, given a pixel location in the unwarped image, we mapped
it back using our polynomial to the corresponding position on a ray
out from the center on the original image, and interpolated between the
nearest four pixels to obtain the desired color value.
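In code, the unwarping step might look like the sketch below. The
grayscale Image type and the coefficient layout are our assumptions
for illustration; the real program operates on the camera's color
images.

    #include <cmath>
    #include <vector>

    struct Image {
        int w, h;
        std::vector<unsigned char> pix;  // grayscale, for brevity
        unsigned char at(int x, int y) const { return pix[y * w + x]; }
    };

    // Evaluate the fifth-order polynomial c[0] + c[1]*r + ... + c[5]*r^5
    // mapping an undistorted radius to a distorted one (Horner's rule).
    double distortedRadius(const double c[6], double r) {
        double v = 0.0;
        for (int i = 5; i >= 0; --i) v = v * r + c[i];
        return v;
    }

    // For each pixel of the unwarped output, map its radius back through
    // the polynomial to a position along the same ray in the original
    // image, then bilinearly interpolate the four nearest pixels.
    Image unwarp(const Image& in, const double c[6]) {
        Image out{in.w, in.h, std::vector<unsigned char>(in.pix.size(), 0)};
        const double cx = 0.5 * in.w, cy = 0.5 * in.h;
        for (int y = 0; y < out.h; ++y)
            for (int x = 0; x < out.w; ++x) {
                double dx = x - cx, dy = y - cy;
                double r = std::sqrt(dx * dx + dy * dy);
                double s = (r > 1e-9) ? distortedRadius(c, r) / r : 1.0;
                double sx = cx + s * dx, sy = cy + s * dy;
                int x0 = (int)std::floor(sx), y0 = (int)std::floor(sy);
                if (x0 < 0 || y0 < 0 || x0 + 1 >= in.w || y0 + 1 >= in.h)
                    continue;  // source falls outside the original image
                double fx = sx - x0, fy = sy - y0;
                double v = (1 - fx) * (1 - fy) * in.at(x0, y0)
                         + fx * (1 - fy) * in.at(x0 + 1, y0)
                         + (1 - fx) * fy * in.at(x0, y0 + 1)
                         + fx * fy * in.at(x0 + 1, y0 + 1);
                out.pix[y * out.w + x] = (unsigned char)(v + 0.5);
            }
        return out;
    }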
We are experimenting with a variety of techniques to make picking edges
easy. We would like to be able to draw along an edge, and have a more
precise calculation refine the position of the edge to some high accuracy.
At the moment, our algorithm proceeds in three steps, detailed below.
First, we simplify the problem by assuming that the edge we seek is
straight, and look for a straight segment that closely approximates
the user's input. We draw a segment connecting
the first and last points of the user's curve, then translate it
orthogonally to balance the "weight" of the points on either side.
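A sketch of this first step, assuming the stroke is an ordered list of
at least two distinct pixel positions, and reading "balance the
weight" as zeroing the mean signed orthogonal offset of the stroke
points (our interpretation):

    #include <cmath>
    #include <vector>

    struct V2 { double x, y; };

    // Connect the stroke's endpoints, then translate the segment along
    // its normal by the mean signed offset of the stroke points, so the
    // points carry equal weight on either side.
    void initialSegment(const std::vector<V2>& stroke, V2& a, V2& b) {
        a = stroke.front();
        b = stroke.back();
        double nx = -(b.y - a.y), ny = b.x - a.x;  // unnormalized normal
        double len = std::sqrt(nx * nx + ny * ny);
        nx /= len; ny /= len;
        double mean = 0.0;
        for (const V2& p : stroke)
            mean += (p.x - a.x) * nx + (p.y - a.y) * ny;
        mean /= stroke.size();
        a.x += mean * nx; a.y += mean * ny;  // shift both endpoints
        b.x += mean * nx; b.y += mean * ny;
    }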
Second, we define a cost function on each of the points of our initial
image. We are using a simplification of the cost function used by Mortensen
and Barrett for their Intelligent Scissors program, namely a linear
combination of the Laplacian zero-crossing function and the gradient
magnitude. The latter of these is scaled to lie in the range [0,1],
and then we use weights of .7 and .3 for the linear combination. This
defines a cost on each grid point. We then extend these samples to
a planar region by bilinear interpolation, and define the cost function
for a segment to be the line integral of this function over the length
of the segment, divided by the length of the segment.
Third, we make local modifications to the endpoints until a local
minimum of the cost function is reached. So far, we have implemented
an approach that only examines segments whose end-points lie on the
grid. We consider all the neighboring gridpoints of our initial
endpoints, and pick the pair of neighbors which reduces the cost function
by the greatest amount. We then iterate, but limit movement to a
single general direction after the first iteration. That is, after
the first move, each endpoint may only move toward neighboring pixels
consistent with its initial direction of motion.
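A sketch of this local search, reusing the CostField and segmentCost
sketched above. The direction restriction is our interpretation: each
endpoint commits to its first nonzero move and may afterwards only
repeat that move or stay put.

    struct Pt { int x, y; };

    // 9 moves: stay put plus the 8 grid neighbors.
    static const int DX[9] = { 0, -1, 0, 1, -1, 1, -1, 0, 1 };
    static const int DY[9] = { 0, -1, -1, -1, 0, 0, 1, 1, 1 };

    void refineEndpoints(Pt& a, Pt& b, const CostField& f) {
        int dirA = -1, dirB = -1;  // committed directions; -1 means free
        double best = segmentCost(f, a.x, a.y, b.x, b.y);
        bool improved = true;
        while (improved) {
            improved = false;
            int bi = 0, bj = 0;
            // Try every allowed pair of endpoint moves; keep the pair
            // that reduces the segment cost by the greatest amount.
            for (int i = 0; i < 9; ++i) {
                if (dirA >= 0 && i != 0 && i != dirA) continue;
                for (int j = 0; j < 9; ++j) {
                    if (dirB >= 0 && j != 0 && j != dirB) continue;
                    double cost = segmentCost(f, a.x + DX[i], a.y + DY[i],
                                                 b.x + DX[j], b.y + DY[j]);
                    if (cost < best) {
                        best = cost; bi = i; bj = j; improved = true;
                    }
                }
            }
            if (improved) {
                a.x += DX[bi]; a.y += DY[bi];
                b.x += DX[bj]; b.y += DY[bj];
                if (dirA < 0 && bi != 0) dirA = bi;  // commit after first move
                if (dirB < 0 && bj != 0) dirB = bj;
            }
        }
    }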
The above technique for edge location has several drawbacks. First of
all, the cost function has many local minima for typical images, so
many that the edge can easily "get stuck" due to variations in image
intensity that seem invisible to the casual observer. One way to address
this problem would be to define our cost function using a pyramid of
resampled images, so that more distant, but strong, edge features can
have a larger region of influence. Second, it is not uncommon, on
very strong edges, for this technique to shrink the length of the segment
to almost zero. It would be preferable for the segment instead to have
a propensity for growth, so that only a section of an edge needs to
be indicated in order for the entire edge to be recovered, as was done
in Debevec's Facade program. For this, we could redefine our cost
function to reward greater lengths, but we would need to do so carefully
to avoid making the segments overlong. Finally, we would like to use
gradient descent methods to allow edge location to sub-pixel accuracy.
At the moment, however, the quality of the images we get from the camera
may not justify any further enhancement in this direction.
In order to recover the camera position and more accurately estimate
the proportions of our modeled shapes, we followed the work of
Debevec et al. to create a non-linear local optimizer. For the
moment, our implementation focuses on simple boxes. Given sufficient
edge correspondences in the image, we calculate the positions of the
projections of four or more corners of the box in the picture. We then
define our cost function to be the sum of the squared distances to
corresponding points in our box model when projected using the current
camera and object parameters. Using a standard conjugate gradient
method (Newton-Raphson line minimization with Fletcher-Reeves
direction updates), we quickly obtained our desired
transform matrices. We hope to extend this to handle more complex
domains, and at least allow the positioning of edges relative to the
current best guess for the camera position.
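The objective being minimized might be sketched as follows. The
parameterization and the projection here are drastically simplified
placeholders (translation plus focal length only); the real optimizer
also recovers rotation and the box's proportions.

    #include <vector>

    struct Vec2 { double x, y; };
    struct Vec3 { double x, y, z; };

    // Toy pinhole projection for illustration: p = {tx, ty, tz, f}. A
    // real parameter vector would include rotation and box dimensions.
    Vec2 project(const std::vector<double>& p, const Vec3& c) {
        double x = c.x + p[0], y = c.y + p[1], z = c.z + p[2];
        return Vec2{ p[3] * x / z, p[3] * y / z };
    }

    // Cost: sum of squared distances between the corner positions found
    // in the picture and the model corners projected with the current
    // camera/object parameters. This is what conjugate gradient descends.
    double reprojectionCost(const std::vector<double>& p,
                            const std::vector<Vec3>& modelCorners,
                            const std::vector<Vec2>& imageCorners) {
        double cost = 0.0;
        for (size_t i = 0; i < modelCorners.size(); ++i) {
            Vec2 q = project(p, modelCorners[i]);
            double dx = q.x - imageCorners[i].x;
            double dy = q.y - imageCorners[i].y;
            cost += dx * dx + dy * dy;
        }
        return cost;
    }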
Here are a few items left on our to-do list at the end of the quarter:
- Save Domains, not just the resulting Inventor files.
- Smooth the seams between images.
- Write the "texture-mapped Inventor object" domain. It could load in
a pre-existing geometric Inventor model, and allow the user to
use the camera to place texture on it, and then save the
texture-mapped object.
- Build a polyline primitive, which would allow a user to trace a line
on the pictures from two angles, and create a 3-D polyline that
could be used to affect modeling.
- Write other domains (trees, etc).
Source files:
- Modcam Core:
- modcam.C
The main file. It sets up all the widgets, and the core
callbacks.
- Interface.C,h
Sets up the menubar, and handles file I/O.
- PullDowns.h
Definitions for the pulldown menus.
- WorldInfo.C,h
Classes to keep track of the world data. Left over from noodle,
and nearly obsolete.
- find_edge.C,h
Routine to use gradient-descent to find edges in an image.
- debug.C,h
Debugging utilities.
- Domain Interface:
- domains.C,h
Contains enumerated lists of the available domains.
- domain.C,h
The prototype domain. It handles some of the generic domain
events.
- Specific Domains:
- boxdomain.C,h
(Inherits domain): The domain of textured boxes.
- openbookdomain.C,h
(Inherits boxdomain): The domain of open/closed textured books,
assuming no curved edges.
- bananadomain.C,h
(Inherits domain): The textured banana domain.
References:
- Curless, Brian, and Marc Levoy, "A Volumetric Method for Building
Complex Models from Range Images," SIGGRAPH 1996, 303-312.
- Debevec, Paul E., Camillo J. Taylor, and Jitendra Malik, "Modeling
and Rendering Architecture from Photographs: A Hybrid Geometry- and
Image-Based Approach," SIGGRAPH 1996, 11-20.
- Galyean, Tinsley, "Sculpting: An Interactive Volumetric Modeling
Technique," SIGGRAPH 1991, 267-274.
- Gortler, Steven J., Radek Grzeszczuk, Richard Szeliski, and Michael
F. Cohen, "The Lumigraph," SIGGRAPH 1996, 43-54.
- Hanrahan, Pat, and Paul Haeberli, "Direct WYSIWYG Painting and
Texturing on 3D Shapes," SIGGRAPH 1990, 215-223.
- Mortensen, Eric N., and William A. Barrett, "Intelligent Scissors for
Image Composition," SIGGRAPH 1995, 191-198.
- Sato, Yoichi, and Katsushi Ikeuchi, "Reflectance Analysis for 3D
Computer Graphics Model Generation," Technical Report CMU-CS-95-146,
1995.
- Shewchuk, Jonathan R., "An Introduction to the Conjugate Gradient
Method Without the Agonizing Pain," Technical Report CMU-CS-94-125,
March 1994. (Postscript available.)
- Szeliski, Richard, "Rapid Octree Construction from Image Sequences,"
CVGIP: Image Understanding 58, 1 (July 1993), 23-32.
- Taubin, Gabriel, "A Signal Processing Approach to Fair Surface
Design," SIGGRAPH 1995, 351-358.
- Tsai, Roger Y., "A Versatile Camera Calibration Technique for
High-Accuracy 3D Machine Vision Metrology Using Off-the-Shelf TV
Cameras and Lenses," IEEE Journal of Robotics and Automation,
3(4):323-344, August 1987. (Software available.)
- Wang, Sidney W., and Arie E. Kaufman, "Volume Sculpting," 1995
Symposium on Interactive 3D Graphics, 151-156.
- Zeleznik, Robert C., Kenneth P. Herndon, and John F. Hughes, "SKETCH:
An Interface for Sketching 3D Scenes," SIGGRAPH 1996, 163-170.