Many of these project ideas could be Siggraph, Eurographics, or CVPR/ICCV/ECCV papers. (In fact, two student projects from the 2004 version of this course did become SIGGRAPH papers.) To keep you and your fellow students from being scooped, please don't distribute this handout, or the URL of this web page, or its ideas, beyond the Stanford community.
If you want clarification of a project idea, or you have an idea we haven't listed, come talk to us! In general, Marc is the best person to ask whether your idea is novel, interesting, and plausible given the time frame, and Andrew is the best person to ask whether your idea is feasible given the hardware and programming constraints of the devices. Don't consider the list below as your only options. Either of us would love to brainstorm with you about your ideas.
The projects are grouped according to what hardware they will use. The two major options are the Nokia N95 and our own Frankencamera. Here are the pros and cons of each choice:
If a student or team would like to play with touch screen interfaces for computational photography, but they don't want the full Frankencamera experience, we can make available a standalone N800, which is the touchscreen viewfinder on the Frankencamera. It runs Linux, and although it's not an x86 architecture, we have a VM to help with cross-compilation. However, it doesn't have a capable camera (just a low-resolution videoconferencing camera). Finally, we've listed a few projects that can be implemented on ordinary digital cameras, as long as they are programmable, and a few ideas in automotive imaging.
- Light Field Fax Machine
Design appropriate algorithms and a set of markers so that you can do the following (a minimal sketch of the frame-selection step appears after this list):
- Print your markers onto a sheet of paper.
- Place an object of interest on top of the sheet.
- Wave your N95 around the object, viewing it from many angles.
- Your software on the N95 should detect the markers and calculate from which angle it is viewing the object. It should save important frames.
- Send the frames, tagged with their viewing directions, to another person's N95.
- The other person should be able to print out the same sheet of paper, and wave their N95 around it (without the object). The N95 should calculate the viewing angle from the markers and display the appropriate frame, as if the object were on top of the paper. Their N95 has become a virtual window into the world that was captured by your N95.
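For the receiving side, once the markers give you a pose estimate, choosing which stored frame to display can be as simple as a nearest-neighbor lookup over viewing directions. Here is a minimal sketch, assuming frames arrive tagged with the unit vector from which they were captured (the marker detection and pose estimation are up to you):

```python
import numpy as np

def select_frame(tagged_frames, current_direction):
    """tagged_frames: list of (direction, image) pairs, direction a unit 3-vector.
    current_direction: unit 3-vector from the receiving N95's pose estimate."""
    d = np.asarray(current_direction, dtype=float)
    d /= np.linalg.norm(d)
    best_image, best_dot = None, -1.0
    for direction, image in tagged_frames:
        v = np.asarray(direction, dtype=float)
        v /= np.linalg.norm(v)
        dot = float(np.dot(v, d))          # cosine of the angular difference
        if dot > best_dot:
            best_image, best_dot = image, dot
    return best_image
```

A fancier version might blend the two or three nearest frames rather than snapping to one.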
- Painted Aperture v1
It is possible to combine the images from several side-by-side small cameras to act as a single camera with a much larger aperture, and hence a much shallower depth of field. With a very shallow depth of field, things that are out of focus are so out of focus that they become invisible, allowing you to look through foreground occluders such as bushes. (See our demonstration using the Stanford Multi-Camera Array at http://graphics.stanford.edu/projects/array/videos/crowd0-sap.mpg.) It should also be possible to move a single small camera through a range of positions and 'paint' a synthetic aperture. The resulting images should be aligned to each other on the background and combined in a way that removes foreground occluders. Think carefully about sampling pattern and density. Using this technique to take photographs of sunbathing celebrities through dense foliage is optional. See also the project below on a synthetic aperture rear-view mirror.
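One way to prototype this offline (before worrying about on-phone performance) is to align every frame to a reference frame with a feature-based homography and average the warped frames; with RANSAC, the dominant plane - here the background - tends to win the alignment, and foreground occluders wash out. A sketch assuming OpenCV on a PC, not N95 code:

```python
import cv2
import numpy as np

def synthetic_aperture(frames):
    """frames: list of 8-bit BGR images from the hand-'painted' aperture."""
    ref = frames[0]
    h, w = ref.shape[:2]
    orb = cv2.ORB_create(2000)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    kp_ref, des_ref = orb.detectAndCompute(cv2.cvtColor(ref, cv2.COLOR_BGR2GRAY), None)
    accum = np.zeros((h, w, 3), dtype=np.float64)
    count = 0
    for f in frames:
        kp, des = orb.detectAndCompute(cv2.cvtColor(f, cv2.COLOR_BGR2GRAY), None)
        matches = matcher.match(des, des_ref)
        if len(matches) < 8:
            continue
        src = np.float32([kp[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
        dst = np.float32([kp_ref[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
        H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)   # align on the background plane
        if H is None:
            continue
        accum += cv2.warpPerspective(f, H, (w, h)).astype(np.float64)
        count += 1
    return (accum / max(count, 1)).astype(np.uint8)
```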
- Painted Aperture v2
In portrait photography, large apertures are useful to blur out background detail, leaving only the person's face in focus. Using the technique described in the previous project idea, but aligning on the foreground instead of the background, make large aperture portrait photos possible on the N95. For extra fun on this or the previous project, implement a touchscreen user interface (using a Frankencamera or N800) for choosing the object on which to align, and hence where to focus.
- Motion Blur Game Controller
Computer vision techniques can be used to track objects in a camera's field of view. One use of this tracking can be to estimate camera pose, which can in turn be used as a continuous input device suitable for game control. See [Adams 08] or ask Andrew for more detail. These techniques fail when camera motions are large, because motion blur makes tracking difficult. If you could detect the direction and length of the motion blur, however, this could be used as an input device in itself. Write a simple game that uses quick movements of the camera, combined with a motion blur detection algorithm, as the input device.
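As a starting point, motion blur suppresses image gradients along the direction of motion, so the direction with the least summed gradient energy is a cheap first guess (treat it as an assumption to test, not an established method) for the swipe direction driving the game:

```python
import numpy as np

def blur_direction(gray, n_angles=36):
    """gray: grayscale viewfinder frame as a float or uint8 array.
    Returns the estimated blur direction in radians, modulo pi."""
    gy, gx = np.gradient(gray.astype(np.float64))
    angles = np.linspace(0.0, np.pi, n_angles, endpoint=False)
    # energy of the directional derivative along each candidate direction
    energy = [np.sum((gx * np.cos(a) + gy * np.sin(a)) ** 2) for a in angles]
    return angles[int(np.argmin(energy))]
```

Estimating the blur length is harder; comparing gradient energy along versus across the estimated direction is one place to start.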
- Moment Camera
A moment camera stores a circular buffer of recent frames over the past n seconds. When the user presses the shutter, they can select as their output any photo from that past n-second window of time. They can also combine the last n seconds of data in other ways. See [Cohen 06] or google "moment camera" for more details. Implement a moment camera on the N95, using viewfinder frames for your circular buffer and taking a single high-resolution snapshot when the user presses the trigger. Backpropagate the high-resolution information from the snapshot to make a high-resolution video of the last n seconds. You may also want to see [Bhat 07] for inspiration. Or use the low-resolution video to boost the resolution, dynamic range, or other aspects of the high-resolution snapshot - using image blending [Sawhney 01], texture synthesis [Wei and Levoy 00], image analogies [Hertzmann 01], or an algorithm of your own invention.
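The buffering itself is simple; the interesting work starts when the shutter fires. A minimal sketch, where the two callbacks stand in for whatever viewfinder and trigger hooks the platform actually exposes:

```python
from collections import deque
import time

FPS = 15          # assumed viewfinder frame rate
SECONDS = 5       # length of the remembered "moment"

buffer = deque(maxlen=FPS * SECONDS)   # old frames fall off the far end automatically

def on_viewfinder_frame(frame):
    """Call this for every viewfinder frame the camera delivers."""
    buffer.append((time.time(), frame))

def on_shutter_press(snapshot):
    """snapshot: the single high-resolution still taken at trigger time.
    Returns the frozen moment; detail backpropagation and blending start here."""
    return list(buffer), snapshot
```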
- Low Light Photography using Many Images v1
In low light conditions, a long exposure image will have low noise, but will likely be motion blurred due to hand shake. A shorter exposure image will require a higher analog gain on the sensor, and will be noisy. Write an algorithm that combines two such images to produce a single noise-free blur-free image. Check out this paper for ideas.
- Low Light Photography using Many Images v2
Blurry images produced by camera shake can be modeled as the convolution of an unknown sharp scene with an unknown blur kernel representing the path taken by the camera. Solving for the unknown sharp scene is an ill-posed deconvolution problem. [Fergus 06] shows how the statistics of natural scenes can be used to constrain the solutions to plausible ones. If you put a camera in burst mode, it will produce a collection of images, each blurred by a different kernel, but all derived from the same sharp scene (assuming nothing moves except the camera). Develop an algorithm that takes advantage of these extra images to improve the results.
- Distributed Flash
Flash photos often look terrible, because the location of the camera is almost the worst possible place for light to come from. Fancier flashes are designed to be able to point upwards or sideways in order to bounce the light off nearby surfaces instead, but this requires a much brighter flash. Instead, take advantage of your friend's flash! Write a program that synchronizes the flash across multiple N95s using Bluetooth or Wi-Fi to provide light from more directions. One N95 should act as the master, and the others as slaves. Consider simplifying the task by using NTP to synchronize all the N95s to a global clock.
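A sketch of the master/slave trigger protocol, assuming the phones have already been synchronized to a common clock (e.g. via NTP) and are reachable over Wi-Fi; fire_flash() is a stand-in for whatever flash/capture call the platform provides:

```python
import json
import socket
import time

PORT = 9999
LEAD_TIME = 0.5   # seconds of warning so slaves can receive the message and wait

def fire_flash():
    # Placeholder: trigger the flash (and capture) on this phone.
    print("FLASH at", time.time())

def wait_and_fire(fire_at):
    time.sleep(max(0.0, fire_at - time.time()))
    fire_flash()

def master_trigger(slave_addresses):
    fire_at = time.time() + LEAD_TIME
    msg = json.dumps({"fire_at": fire_at}).encode()
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    for addr in slave_addresses:
        sock.sendto(msg, (addr, PORT))
    wait_and_fire(fire_at)

def slave_loop():
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("", PORT))
    while True:
        data, _ = sock.recvfrom(1024)
        wait_and_fire(json.loads(data)["fire_at"])
```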
We have a number of Mitsubishi Pocket Projectors. They are small, dim, LED-based projectors that can be run off batteries, and are suitable for indoor, short-throw use. Here are a few project ideas that use an N95 with its TV-out cable plugged into a projector.
- Mobile 3D Scanning
A well-established method for extracting the 3D shape of an object is to project a sequence of stripe patterns, coding each patch on the object with a unique sequence of illumination over time. This allows you to compute a correspondence from camera pixels to projector pixels, with which you can triangulate and estimate depth. Construct a mobile 3D object scanner using an N95 and a Pocket Projector.
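The classic version uses binary Gray-code stripe patterns: project roughly log2(width) vertical stripe images (plus their inverses for robust thresholding), then decode a projector column index at every camera pixel. A sketch of pattern generation and decoding, leaving capture, calibration, and triangulation to you:

```python
import numpy as np

def gray_code_patterns(width, height, n_bits):
    """Vertical stripe patterns encoding projector columns; n_bits >= ceil(log2(width))."""
    cols = np.arange(width)
    gray = cols ^ (cols >> 1)                      # binary-reflected Gray code
    patterns = []
    for b in reversed(range(n_bits)):
        stripe = ((gray >> b) & 1).astype(np.uint8) * 255
        patterns.append(np.tile(stripe, (height, 1)))
    return patterns

def decode(captured, captured_inverse):
    """captured[i], captured_inverse[i]: camera images under pattern i and its inverse."""
    code = np.zeros(captured[0].shape, dtype=np.int32)
    for img, inv in zip(captured, captured_inverse):
        bit = (img.astype(np.int32) > inv.astype(np.int32)).astype(np.int32)
        code = (code << 1) | bit
    col = code.copy()                              # convert Gray code back to binary
    shift = 1
    while (code >> shift).any():
        col ^= code >> shift
        shift += 1
    return col                                     # projector column per camera pixel
```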
- Mobile Direct/Indirect Separation
A recent paper [Nayar 06] describes a method that uses a projector to separate the light in a scene that bounces once before entering the camera from the light that scatters around the scene multiple times before entering the camera. Implement a mobile direct/indirect separation viewer using a camera and N95, so that you can walk around analyzing scenes in this way. Interesting objects include cloudy water, human flesh, vegetables, gemstones, and complex arrangements of diffuse objects.
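The separation step in [Nayar 06] is remarkably simple once you have images of the scene under a shifted high-frequency pattern (e.g. a checkerboard that lights half the projector pixels): the per-pixel maximum over the sequence is roughly direct plus half the global light, and the minimum is roughly half the global light. A sketch of just that step, with capture and registration assumed handled elsewhere:

```python
import numpy as np

def separate(images):
    """images: list of camera images taken under shifted checkerboard illumination."""
    stack = np.stack([img.astype(np.float64) for img in images])
    i_max = stack.max(axis=0)
    i_min = stack.min(axis=0)
    direct = i_max - i_min        # light that bounced once
    global_ = 2.0 * i_min         # light that scattered multiple times
    return direct, global_
```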
- Structured Flash
One problem with flash photography is that it may dramatically overexpose nearby regions while not providing enough light to far regions. Use a projector plugged into an N95 to make a flash that adapts to bring all areas of the image up to a uniform brightness. The captured image can then be divided by the projected image to produce a high dynamic range view of the scene. The adaptation will require a mapping from camera pixels to projector pixels, using a method similar to that described in the Mobile 3D Scanning project idea.
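One plausible adaptation loop (a sketch under the assumption of a roughly linear camera response, not a tested design): project uniform light, see which regions come back dark, and boost the projector brightness there in inverse proportion to the observed brightness, iterating until the captured image is roughly uniform. The camera-to-projector warp below is a placeholder for the structured-light correspondence.

```python
import numpy as np

def map_camera_to_projector(img_cam_space):
    # Placeholder: warp camera-space values into projector pixel coordinates
    # using the correspondence recovered by structured light.
    return img_cam_space

def next_flash_pattern(captured, target=0.5, eps=0.02):
    """captured: camera image (floats in [0,1]) taken under the current flash pattern.
    Returns the next projector pattern, also in [0,1]."""
    gain = target / np.clip(captured, eps, 1.0)       # brighten dark regions more
    pattern = np.clip(gain / gain.max(), 0.0, 1.0)    # renormalize to projector range
    return map_camera_to_projector(pattern)
```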
Projects using the Frankencamera will likely involve hardcore hacking. Depending on the project you may be soldering hardware, writing Linux kernel modules, or programming an FPGA. A Frankencamera project will be hard, but it will also be part of our active research, and is most likely to result in a paper. Some of these projects could alternatively be done using any programmable digital camera, as noted below.
- Foveal Camera
A basic primitive operation for computational photography is to take a quick burst of images with varying settings. Most cameras unnecessarily reset the sensor every time a single setting (other than exposure) is changed, introducing a delay of around a second. Modify the Frankencamera to be able to change its settings very rapidly in the brief time period between frames. Use this capability to construct a camera that rapidly alternates between a 640x480 view of the entire scene and a 640x480 view of one particular part of the scene. This kind of alternation would be very useful for autofocus algorithms. Another application might be to capture and present to the user a wide-angle view of the scene, while simultaneously allowing them to follow a moving object, animal, or person with their finger on the touchscreen. This followed object would then be displayed as a high-resolution inset. For a bigger challenge (or a bigger team), track moving objects automatically (see the next project).
- Keeping moving objects centered
Optical and digital image stabilization, available on many commercial still and video cameras, assume that the scene is relatively static. In sports or nature photography, this is often not the case. Instead, the photographer wants to keep a moving object centered in the frame. Using a touch-screen interface, an algorithm, or some combination of these, implement an interactive system that would help a photographer keep a moving object or animal centered in the frame during a photographic burst. Remember that if the object is moving towards or away from you or in a curve, its motion in the frame may not be straight. Since the motion of such objects is typically more than can be accommodated by an optical image stabilization system, combine your algorithm with the foveal camera described in the previous project. By the way, don't ignore motion blur; it's one of the main challenges in this project. For an extra challenge, use object recognition algorithms to "learn" how to find the animal automatically after a few pokes on a touchscreen, after which it would be followed automatically.
- Metering for HDR
High dynamic range photographs capture the scene at many exposure settings to faithfully record all parts of the scene without under- or over-exposing. There are no good algorithms, however, for deciding which exposure settings are necessary. Using the Frankencamera or a digital camera with computer control over shutter and aperture, write an algorithm that adaptively decides which exposure and gain settings are necessary to capture the entire scene in a minimum number of shots.
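One plausible greedy strategy (offered as a starting point, not the answer): begin at the auto-exposure, then keep adding shorter exposures while too many pixels clip to white, and longer exposures while too many pixels sit near black. The capture() argument stands in for whatever capture call your camera's SDK provides.

```python
import numpy as np

def meter_for_hdr(capture, base_exposure, stop=2.0, clip_frac=0.01, max_shots=8):
    """capture(exposure_seconds) -> 8-bit image; returns the chosen exposure times."""
    exposures = [base_exposure]
    shortest = longest = base_exposure
    while len(exposures) < max_shots:
        img = capture(shortest).astype(np.float64) / 255.0
        if (img >= 0.99).mean() > clip_frac:           # still blowing out highlights
            shortest /= stop
            exposures.append(shortest)
            continue
        img = capture(longest).astype(np.float64) / 255.0
        if (img <= 0.01).mean() > clip_frac:           # still crushing shadows
            longest *= stop
            exposures.append(longest)
            continue
        break
    return sorted(exposures)
```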
- HDR Viewfinding
One of the drawbacks of HDR photography is that you don't see a satisfactory image until after some offline processing. Thus, it's hard to know if you've really captured the scene. Let's use the Frankencamera to fix that. Rapidly alternate between a few different exposures of the scene (as in the Metering for HDR project idea), combine subsequent frames at different exposures into a single HDR image, and tone map it for live display on the Frankencamera's viewfinder. You might want to plug the Frankencamera into a laptop for extra compute power. See also the Heads-up HDR project below.
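The per-frame work might look something like the sketch below: merge two or three exposures into a relative radiance map (assuming a roughly linear sensor response) and compress it with a simple global operator for the viewfinder. Frame-to-frame alignment and real-time performance are the hard parts this sketch ignores.

```python
import numpy as np

def merge_and_tonemap(frames, exposure_times):
    """frames: 8-bit images at the given exposure_times (seconds); returns an 8-bit display image."""
    acc = np.zeros(frames[0].shape, dtype=np.float64)
    wsum = np.zeros(frames[0].shape, dtype=np.float64)
    for img, t in zip(frames, exposure_times):
        z = img.astype(np.float64) / 255.0
        w = 1.0 - np.abs(2.0 * z - 1.0)          # trust mid-tones, distrust clipped pixels
        acc += w * (z / t)                        # back out relative radiance
        wsum += w
    radiance = acc / np.maximum(wsum, 1e-6)
    ldr = radiance / (1.0 + radiance)             # simple Reinhard-style global compression
    return (255.0 * ldr).astype(np.uint8)
```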
- Light Stage Ring Flash
Construct a macro ring flash for the Frankencamera consisting of a circular bank of 8 or 16 bright white LEDs. The LEDs should be software controllable from the USB, serial, or GPIO pins on the Frankencamera (ask Andrew or Eddy how to do this). Use the lights, with a suitable set of basis illumination functions, to produce a macro photography version of Paul Debevec's light stage. With this device you should be able to do limited synthetic relighting of macro scenes. Create a GUI that lets you describe the synthetic relighting. This project should be done as a team, with at least one person working on the hardware, and at least one on the algorithms. You will want to start by producing some dummy data sets so that you can work on the algorithms and GUI while the hardware is under construction. If you are feeling particularly brave, or want to extend this work into the summer (we have CURIS and REU positions available for undergrads), account for misalignment between frames so that you can make the whole thing hand-held and take it outside!
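Because light transport is linear, the relighting math itself is just a weighted sum of basis images: if I_k is the photo taken with only LED k lit, then a virtual lighting condition with per-LED intensities w_k is approximately sum_k w_k * I_k. A sketch, assuming the basis images are already captured and aligned:

```python
import numpy as np

def relight(basis_images, weights):
    """basis_images: one 8-bit photo per LED; weights: per-LED intensities in [0, 1]."""
    out = np.zeros(basis_images[0].shape, dtype=np.float64)
    for img, w in zip(basis_images, weights):
        out += w * img.astype(np.float64)
    return np.clip(out, 0, 255).astype(np.uint8)
```

Your GUI can then simply expose the weights (e.g. as a draggable virtual light) and call this on every update.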
While the focus of this course is on programmable portable cameras (like cell phones), for some projects an ordinary programmable digital camera makes a fine research platform. In particular, we have several Canon SLRs in the lab, with SDKs that run on PCs or Macs. You're welcome to borrow one of these cameras for your project; just promise that you'll never mount it on a tripod!
- Optimal walks through parameter space
Even after a photograph has been composed, there are many parameters to adjust on a digital camera: focus, aperture, shutter, ASA, flash intensity. In their SIGGRAPH 2005 paper on flash/no-flash photography, Agrawal et al. suggest adaptively sampling the 2D space of exposure time × flash intensity in order to capture a scene with a minimum number of shots. If one adds focus, aperture, and ASA, the space becomes 5-dimensional. Propose and implement a strategy for walking through this space, with the goal of capturing the best possible picture in the least number of shots. Your strategy may depend on your definition of best; perhaps your camera should offer multiple modes: sharpest, least noisy, lowest power usage, etc. Think carefully about image alignment - if the pictures differ markedly in character, accurate and robust alignment may be harder than you think.
- Touch-driven tone mapping
Fully automatic tone mapping algorithms often fail to produce satisfactory images. User guidance in the form of sliders can help, but sometimes it's not enough. [Lischinski 06] describes a tone mapping algorithm guided by a set of strokes drawn by the user. The user draws one or more strokes on an initial tone-mapped HDR image, then for the region surrounding each stroke selects an exposure adjustment (up or down). It's hard to draw strokes with a touch screen, but it's easy to poke at an area and swipe up (for brighter) or down (for darker). Modify Lischinski's algorithm for such an interface. Even better, can you make his algorithm progressive, so that the tone mapped image improves as you touch and adjust more areas?
- Selecting focus and depth of field from focal stacks
Imagine a camera that sweeps quickly through a range of focus settings, taking an image at every setting. This would allow you to select the focus and depth of field afterwards using an interactive graphics program. It's been shown [Agarwala 04] how to combine these images into an "all-focus image", but how about allowing more control than this? Design and implement such a program, starting with focal stacks captured by a handheld camera. You can easily implement this on our Frankencamera, or you can crank the focus on an SLR by hand. You can imagine programs that restrict you to physically plausible virtual cameras, and programs that don't. What creative effects can you achieve using this technique?
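For the all-focus baseline, a per-pixel focus measure is enough to get started: score each slice by local contrast and keep, per pixel, the sharpest slice. A sketch using OpenCV, assuming the stack is already aligned (handheld capture will need an alignment pass first):

```python
import cv2
import numpy as np

def all_in_focus(stack):
    """stack: list of aligned 8-bit BGR images taken at different focus settings."""
    sharpness = []
    for img in stack:
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY).astype(np.float64)
        lap = np.abs(cv2.Laplacian(gray, cv2.CV_64F))     # local contrast as a focus measure
        sharpness.append(cv2.blur(lap, (9, 9)))           # smooth the measure for stability
    best = np.argmax(np.stack(sharpness), axis=0)          # per-pixel sharpest slice
    out = np.zeros_like(stack[0])
    for i, img in enumerate(stack):
        out[best == i] = img[best == i]
    return out
```

Selecting focus and depth of field interactively then amounts to replacing the argmax with a user-controlled weighting over the slices.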
- Adjusting a camera based on your photo collection
An increasingly common theme in computational photography is clever ways to leverage existing collections of images when performing image editing. One example that might be of interest to the user of cell phones is improving a photograph, or adjusting a camera before you capture a photograph, by looking at other shots of that same object, animal, or person among a user's existing images - on the device or in the cloud. Examples of improvements are adjusting white balance or exposure, removing camera shake, increasing resolution, or removing shadows or occlusions. Even the images a user recently deleted from their camera might provide helpful information about the user's preferences! Try implementing one or more of these ideas. Depending on your approach, algorithms from machine learning might be appropriate.
Prof. Sebastian Thrun and I have been talking about offering a course on automotive imaging at Stanford. Here are two project ideas that helped spur us to think about such a course. We've listed them here because, after all, a car is a mobile computing platform!
- Synthetic aperture rear-view mirror
If you mount a row of webcams immediately above the rear-view mirror in a car, align and add their imagery (aligning the imagery on distant objects), and present the resulting video on the rear-view mirror, you can blur out the posts in the back of the car. The result is a synthetic aperture view of the traffic behind the car - from a perspective similar to that of a normal rear-view mirror - but without the car's blind spot! See the Painted Aperture v1 project for more details and an example of this effect. We have external funding for this project; if it works, it could be big.
- Heads-up HDR display for drivers
Driving at night on rural roads is difficult because the headlights of oncoming cars blind us. The underlying cause of this problem is glare in our eyes, which gets worse as we age. The problem is also worse on rainy nights, when the road surface becomes more reflective. Implement an HDR video system, i.e. a video camera cycling through multiple exposure times, followed by real-time tone mapping to compress the results down into a displayable image. Eventually, this video could be displayed to the driver in a small inset monitor, or using a heads-up display. The notion is that when the driver is blinded by an oncoming headlight, the HDR imaging system might not be. We have a high-speed video camera in the lab, but our Frankencamera might be a better platform on which to prototype the idea. In the latter case, this project is essentially an application of HDR Viewfinding (see above). However, the best tone mapping algorithms for viewfinding and safe driving might be quite different.
As long as this list is, don't consider these your only options. If you have an idea for another project, or a variation on one of these ideas, come talk to us; we'd love to brainstorm with you about it!