In a sense, the constraints specify "common sense" about how we expect architecture to be constructed. (Debevec's algorithm would have as much trouble as a human would in trying to make sense of an M.C. Escher drawing, which defies "common sense.") By applying the constraints of a particular domain (in Debevec's case, architecture), the search space of possible objects is reduced to a tractable size.
The goal of my project would be to develop a representation in which "common sense" constraints can be written about arbitrary domains, not just architecture. The representation (or "language") would describe the total search space of possible models in the given domain. Each possible variation would be described by a number with some default value and a measure of confidence in that default value, in the absence of further evidence. The algorithm would explore the different possible variations of each parameter and determine which variations best match the given photo(s). As it determined correspondences between the model and the photo, it would update the parameter values and the confidence values.
I hope to demonstrate that such a language can be built up with appropriate C++ objects. Each object would list its variable parameters (and confidences), its possible child objects, and functions for determining how the "appearance" of the object in a photograph should map into the parameters.
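As a rough illustration, the core of such a representation might look like the following sketch. All names here (Parameter, DomainObject, and so on) are hypothetical placeholders, not a settled design:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// One tunable parameter of a model: a default value plus a measure of
// confidence in that default, in the absence of further evidence.
struct Parameter {
    std::string name;
    double value;       // current best estimate (starts at the default)
    double confidence;  // 0 = pure guess, 1 = certain
};

// A node in the domain description: its own parameters, its possible
// child objects, and (eventually) functions that map the object's
// appearance in a photograph back into the parameters.
class DomainObject {
public:
    virtual ~DomainObject() = default;

    std::vector<Parameter> parameters;
    std::vector<std::unique_ptr<DomainObject>> children;

    // virtual void mapAppearance(const Photo& photo);
    // The Photo type and the matching machinery are deliberately left out.
};
```

Concrete domains would then derive from the base class and inherit the common appearance-mapping functions.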
For example:
The Domain of Cups/Mugs:
Cups and mugs generally consist of a central section, which 95% of the time (roughly) is an object of revolution, and may have a handle (30% yes, 70% no). The handle (if it exists) generally has the topology of a half-torus, and generally exhibits centerline symmetry. The outline of the handle and the mug can be parameterized with an array of numbers (as a parametric curve, a spline, a displacement map, etc.).
Functions that derive silhouettes against a colored background can be used to determine the shape of the cup, and a function that detects a torus topology can be used to positively identify a handle.
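Written as data in that style, the cup/mug priors above might look like this. This is a hypothetical sketch; the struct names, and the choice to store plain prior probabilities, are assumptions for illustration:

```cpp
#include <cassert>
#include <string>
#include <vector>

// A boolean property with a prior probability, before any photo evidence.
struct BooleanPrior {
    std::string name;
    double probabilityTrue;
};

// The cup/mug domain, with the rough priors stated in the text.
struct CupDomain {
    BooleanPrior bodyIsObjectOfRevolution{"body is an object of revolution", 0.95};
    BooleanPrior hasHandle{"has a handle", 0.30};

    std::vector<double> bodyOutline;    // parametric outline of the body
    std::vector<double> handleOutline;  // outline of the half-torus handle
};
```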
Let's assume that the user has written these constraints somehow in our language. Then the user shows the program a photo of a specific cup, and asks the program to try to reconcile the photo with the constraints. First, the program would make a guess at the size and shape of an "average" cup (assuming it is an object of revolution, since this is the most likely case). Then it would try to match the model with the photo. It would try perturbing different parameters, and see whether each perturbation creates a better match or a worse match.
For example, one parameter would be whether the cup has a handle. The program would try matching a model with and without the handle, and pick the one that more closely matched the photo. If both models worked out equally well (for example, if the handle was hidden from that view), then it would pick the more probable model, but assign a low confidence value to that decision. In this case, it would decide, with low confidence, that the cup does not have a handle. ("No handle" has the higher prior probability, because it is more easily disproved.) If another photo shows the handle, then it would change the parameter to "cup has a handle," this time with high confidence.
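The handle decision above can be sketched as a small function. The match scores, the tie threshold, and the confidence numbers below are all placeholders for the real photo-matching machinery:

```cpp
#include <cassert>
#include <cmath>

// Result of deciding one boolean parameter from photo evidence.
struct Decision {
    bool hasHandle;
    double confidence;  // low when the photo cannot distinguish the cases
};

// Compare the match score of the model with a handle against the model
// without one. On a near-tie (e.g. the handle is hidden from this view),
// fall back on the prior, but record only low confidence in the choice.
Decision decideHandle(double scoreWithHandle, double scoreWithoutHandle,
                      double priorHasHandle, double tieThreshold = 0.01) {
    double gap = std::fabs(scoreWithHandle - scoreWithoutHandle);
    if (gap < tieThreshold) {
        bool guess = priorHasHandle > 0.5;  // pick the more probable model
        return {guess, 0.1};                // arbitrary low confidence
    }
    // The evidence decides; use the score gap as a crude confidence.
    return {scoreWithHandle > scoreWithoutHandle, std::fmin(gap, 1.0)};
}
```

A second photo that shows the handle would produce a large score gap in favor of the with-handle model, flipping the decision with high confidence.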
The user's role in this system would be to write the initial C++ description of the "cup domain", and then give the algorithm pictures of the object. Some of the more common functions that map appearance to parameters (such as the "silhouette of an object of revolution" function) would be inherited, so that the new object would hopefully require minimal programming.
One example of a special function that would be helpful in the cup domain would be a function that maps the inner wall of the cup to "1/4 inch inside the outer wall". With this function, the modeler could guess at the inside shape of the cup, without ever actually seeing a silhouette of the inside.
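That rule could be a one-line derivation over the body's radial profile. The following is a hypothetical sketch (radii in inches, fixed 1/4-inch wall thickness):

```cpp
#include <cassert>
#include <vector>

// Derive the inner-wall profile of an object of revolution by offsetting
// the outer-wall radii inward by the wall thickness (1/4 inch by default).
// This lets the modeler guess the inside shape without ever seeing it.
std::vector<double> innerWallFromOuter(const std::vector<double>& outerRadii,
                                       double wallThickness = 0.25) {
    std::vector<double> inner;
    inner.reserve(outerRadii.size());
    for (double r : outerRadii)
        inner.push_back(r > wallThickness ? r - wallThickness : 0.0);
    return inner;
}
```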
Two steps that I have not mentioned, camera calibration and registration, will need to be implemented. However, I am assuming that I will use known techniques (such as those of Tsai and Debevec), and will introduce nothing new here.
Due to the limited time available for this project, I will probably only tackle one or two domains. Another possible "easy" domain would be a limited subset of Lego bricks. Given several pictures of a simple Lego model, and knowledge of all the possible Lego bricks used, the program could try to identify the actual structure of the model.
Extensions, probably beyond the scope of this quarter, would include:
In an ideal system, the computer would display the model *as* it is being refined. The user could watch as the program places the Lego bricks in the model one by one, and provide feedback about which bricks are correct and which are questionable. The user could click on parts of the cup to indicate "this part of the cup looks fine" and "that part of the cup needs some more work." This feedback would modify the confidence values, and tell the program which parameters of the model to work on first.
Other interesting domains might include: faces, cars, books, flowers, forests, hands, or virtually anything that can be described procedurally.