CS 478 - Computational Photography
(Winter quarter, 2012)

Ideas for projects

About:

Most of these project ideas require a programmable camera. That is, after all, precisely the idea behind this course. For this purpose, each of you has been loaned an NVIDIA Tegra 3 tablet for the duration of the course. The first assignment will give you some experience programming this device and interacting with the camera module; the second assignment, some experience processing the resulting images.

Alternatively, you may propose a different platform for your project, but it should involve a programmable or customizable camera. We have a number of Nokia N900's, a smartphone that also implements the FCam API, which you can borrow if you would like to work on a project that requires many cameras; note, however, that the development environment for the Nokia N900 is substantially different from that for the Tegra 3 tablets. You may also use a standard DSLR camera in order to tinker with the lens optics (e.g. coded aperture.) We will be reluctant to approve projects that process conventional photographs entirely on a desktop; while that falls within the domain of computational photography, we encourage students to make use of the hardware and the API we have, which are not available elsewhere. (On the other hand, you are welcome to offload computation onto a networked machine if you find the computational power of the tablet lacking and your project does not depend on low latency.) Still other projects might require you to be handy with a soldering iron. We'll pay for any hardware you need to buy, but talk to us beforehand.

Each idea is accompanied by a link to the relevant paper. We encourage you to search the existing literature for similar papers, or papers on which the example paper is based. If you want further clarification of a project idea, or you have an idea we haven't listed, come talk to us! We will try to help you figure out if your idea is novel, interesting, and plausible given the time frame and the hardware constraints. Please do not consider the list below as your only options. In the past offering of the course, close to half of the projects were inspired by the students' own ideas. Any of us would love to brainstorm with you about your ideas.

Yet another popular source of project ideas is recent SIGGRAPH, Eurographics, ICCP or CVPR/ICCV/ECCV papers (or any other papers discussed in the lecture.) If you find an arresting paper that could be implemented or re-purposed for mobile computational photography, think about how you could extend it.

Conversely, many of these project ideas could turn out to be conference papers themselves. (In fact, two student projects from the 2004 version of this course did become SIGGRAPH papers.) To keep you and your fellow students from being scooped, please do not distribute the URL of this web page, or its ideas, beyond the Stanford community. Many of these ideas are on the back burner for Marc Levoy's students, so you are welcome to collaborate with us beyond the duration of the course.

Requirements:

  1. Form a group of one or two, and brainstorm ideas.
  2. Sign up for and complete a project conference (with a staff member) by February 13.
  3. Submit the project proposal by February 15.
  4. Present your project during the finals week.
  5. Submit the project write-up by the end of the finals week.

Examples from 2010:

These are some well-done projects from the previous time the course was offered. You may model your project after one of these, but we would expect some significant extension thereof. Talk to the staff if you are interested and have an idea for an extension.


    "High-Exposure" Projects

  1. Metering for HDR imaging (paper, paper)

    High-dynamic-range imaging consists of capturing images with different exposures, merging them together, and tone-mapping the result to compress its range for display. Although HDRI has a long history, people have rarely addressed the question of automatically deciding which exposures to capture. Of course, addressing it would require a programmable camera with access to the light meter. The Tegra 3 tablet does not have a light meter, but it has fast access to the sensor and plenty of computational power. Using one of these platforms, explore metering for HDR imaging. What technique works best? Sung Hee Park has looked at this problem recently, so talk to him about the state of the field, and what interesting questions are still unaddressed.
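
    For reference, the merging step is fairly mechanical once the exposures are chosen; the interesting part of this project is the metering. Below is a minimal NumPy sketch of weighted radiance merging, assuming the frames are already linear (e.g. demosaicked raw at fixed gain) and registered; the hat-shaped weighting function is a common but not mandatory choice, and all values are illustrative.

```python
import numpy as np

def merge_hdr(frames, exposure_times):
    """Merge linear, registered frames into an HDR radiance map.

    frames: list of float arrays in [0, 1], all the same shape.
    exposure_times: matching list of exposure times in seconds.
    Assumes a linear sensor response; a real pipeline would also
    account for gain/ISO and the camera response curve.
    """
    num = np.zeros_like(frames[0], dtype=np.float64)
    den = np.zeros_like(frames[0], dtype=np.float64)
    for img, t in zip(frames, exposure_times):
        # Hat-shaped weight: trust mid-tones, distrust pixels that are
        # nearly black (noisy) or nearly saturated (clipped).
        w = 1.0 - np.abs(2.0 * img - 1.0)
        num += w * img / t          # per-frame estimate of scene radiance
        den += w
    return num / np.maximum(den, 1e-6)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    radiance = rng.uniform(0.0, 300.0, size=(120, 160))       # synthetic scene
    times = [1/400, 1/100, 1/25]
    shots = [np.clip(radiance * t, 0.0, 1.0) for t in times]  # toy capture model
    hdr = merge_hdr(shots, times)
    print("recovered radiance range:", hdr.min(), hdr.max())
```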

  2. Touch-driven tone mapping on viewfinder (paper, report)

    Fully automatic tone mapping algorithms often fail to produce satisfactory images. User guidance in the form of sliders can help, but sometimes it's not enough. Recent papers implement a stroke-based interface for locally editing the exposure, using various edge-aware edit propagation techniques. See the video or the report from 2010, for example. However, all existing tone-mapping algorithms (at least the fancy ones) operate on the final image. Working directly on the viewfinder stream would open up new possibilities. To do this would require tracking of the objects in the scene, so that previous user input could be re-mapped to the appropriate spatial locations.
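
    To make the idea concrete, here is a deliberately crude sketch of propagating a single exposure stroke to nearby pixels of similar luminance. It is a stand-in for the edge-aware propagation methods in the papers above, not an implementation of them, and the parameters are placeholders; a viewfinder version would additionally have to track the stroked object across frames.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def propagate_stroke(image, stroke_mask, exposure_stops,
                     sigma_spatial=15.0, sigma_lum=0.1):
    """Spread a user-requested exposure change from a stroke to
    spatially close pixels of similar luminance.

    image: HxWx3 float array in [0, 1] (treated as linear for simplicity).
    stroke_mask: HxW boolean array marking the user's stroke.
    exposure_stops: exposure change, in stops, requested by the user.
    """
    lum = image.mean(axis=2)
    lum_ref = lum[stroke_mask].mean()
    # Spatial influence: a blurred copy of the stroke mask.
    spatial = gaussian_filter(stroke_mask.astype(np.float64), sigma_spatial)
    spatial /= spatial.max() + 1e-6
    # Range influence: penalize pixels whose luminance differs from the stroke's.
    similar = np.exp(-(lum - lum_ref) ** 2 / (2.0 * sigma_lum ** 2))
    influence = spatial * similar
    gain = 2.0 ** (exposure_stops * influence)     # per-pixel gain
    return np.clip(image * gain[..., None], 0.0, 1.0)

if __name__ == "__main__":
    img = np.full((100, 100, 3), 0.3)
    mask = np.zeros((100, 100), dtype=bool)
    mask[40:60, 40:60] = True
    out = propagate_stroke(img, mask, exposure_stops=1.0)
    print("center pixel before/after:", img[50, 50, 0], out[50, 50, 0])
```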

  3. "Flashy" Projects

  4. Flash matting (paper)

    A fun photographic game is to replace the background in a picture. This requires extracting a matte that specifies which pixels in the picture are foreground and which are background. One technique for performing this so-called matte extraction is to record a flash/no-flash image pair, then look for pixels that change the most. If you then record a replacement background picture using the same camera, you can perform the replacement right on the camera. Implement this pipeline with the Tegra 3 tablet.
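
    The core of the pipeline is only a few lines; the work is in capturing the pair quickly enough that nothing moves, and in cleaning up the matte. A minimal NumPy sketch of the matting and compositing steps is below, assuming aligned floating-point images; the threshold is a placeholder.

```python
import numpy as np

def flash_matte(flash_img, noflash_img, threshold=0.15):
    """Rough foreground matte from an aligned flash/no-flash pair.

    Foreground objects are close to the flash, so they brighten much more
    than the background. Inputs are HxWx3 floats in [0, 1]; the threshold
    is a placeholder and would need tuning (or a smarter decision rule).
    """
    diff = np.linalg.norm(flash_img - noflash_img, axis=2)
    return np.clip(diff / threshold, 0.0, 1.0)        # soft matte in [0, 1]

def composite(foreground, matte, new_background):
    """Standard alpha compositing with the extracted matte."""
    a = matte[..., None]
    return a * foreground + (1.0 - a) * new_background

if __name__ == "__main__":
    h, w = 120, 160
    noflash = np.full((h, w, 3), 0.2)
    flash = noflash.copy()
    flash[30:90, 50:110] += 0.4            # toy "foreground" brightened by flash
    bg = np.full((h, w, 3), 0.8)
    out = composite(flash, flash_matte(flash, noflash), bg)
    print("foreground pixel:", out[60, 80], "background pixel:", out[5, 5])
```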

  5. Borrowed flash (report from 2010)

    Pictures taken using on-camera flash look awful. If red-eye doesn't ruin your shot, those deer-in-the-headlights specular highlights will. A common solution is an off-camera flash, but such flash units are available only for SLRs. For cell phones (or future point-and-shoot cameras with radios), why not borrow a second camera from a friend, slave it to your phone using Wi-Fi or Bluetooth, hold it in your other hand, and program it to flash when your camera takes a picture? A pair of students implemented this for a pair of Nokia N900's in the last incarnation of the course, and did so quite successfully, so we would be looking for something extra this time around. How about using multiple flashes for flash matting? Are there other flash-based algorithms that could be improved with multiple flashes, or off-camera flash? What if the multiple flashes had different color filters? Could you try to emulate the color temperature of the ambient light, and avoid the flash ruining the scene atmosphere in the photograph?

  6. Flash-No-Flash (paper)

    In a low-light environment, activating the flash is often necessary to capture the scene without noise ruining your photographs. However, the flash can change the color tone of the ambient image and cause annoying red-eye effects. With a programmable camera, it is trivial to capture a flash/no-flash pair in quick succession. Combining the images on a mobile device for user feedback, however, would be a challenging and worthwhile endeavor. Think about the camera parameters for the flash/no-flash pair, and what kind of user feedback would be appropriate for the composite.
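
    One plausible fusion strategy, in the spirit of the detail-transfer approach from the flash/no-flash literature, is to keep the ambient tones of the no-flash image while borrowing fine detail from the much cleaner flash image. The sketch below uses a plain Gaussian blur where a joint bilateral filter would normally be used (to avoid halos at edges), so treat it as a starting point only.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def fuse_flash_noflash(noflash, flash, sigma=3.0, eps=0.02):
    """Keep the ambient look of the no-flash image, borrow detail from
    the flash image.

    Inputs are HxW grayscale floats in [0, 1], assumed aligned. A real
    implementation would replace the Gaussian with a joint bilateral
    filter guided by the flash image.
    """
    base = gaussian_filter(noflash, sigma)                          # smoothed ambient
    detail = (flash + eps) / (gaussian_filter(flash, sigma) + eps)  # flash detail layer
    return np.clip(base * detail, 0.0, 1.0)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    clean = np.tile(np.linspace(0.1, 0.9, 160), (120, 1))
    noflash = np.clip(0.4 * clean + rng.normal(0, 0.05, clean.shape), 0, 1)
    flash = np.clip(clean + rng.normal(0, 0.005, clean.shape), 0, 1)
    fused = fuse_flash_noflash(noflash, flash)
    print("fused range:", fused.min(), fused.max())
```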

  7. "Usable" Projects

  8. Useful viewfinder visualizations

    The small LCDs on the back of cameras make it difficult to judge what is in focus. Especially when playing with depth-of-field effects or a tilt-shift lens, it would be very useful to have a real-time visualization of what is in focus in the image, for example with a heat map. Alternatively, straight lines are important elements for composition. A photographer might want them to stay horizontal or vertical, or to avoid converging parallels (for architectural photography in particular). Real-time analysis and visualization on the viewfinder could help a photographer achieve these goals. For more ideas about visualizing focus, see this thought-provoking Open Letter to Leica by Luminous Landscape's Michael Reichmann.
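
    A real-time focus heat map can be surprisingly simple: compute a local focus measure on each viewfinder frame and color-code it. The sketch below uses the local energy of the Laplacian, one standard focus measure; the window size and normalization are placeholders, and a shipping version would have to run at viewfinder rates.

```python
import numpy as np
from scipy.ndimage import laplace, uniform_filter

def focus_heatmap(gray, window=9):
    """Per-pixel sharpness map for a viewfinder frame.

    gray: HxW float array in [0, 1]. Returns values in [0, 1]; higher
    means locally sharper. Overlay this (e.g. as a tinted heat map) on
    the live viewfinder.
    """
    energy = uniform_filter(laplace(gray) ** 2, size=window)
    return energy / (energy.max() + 1e-12)

if __name__ == "__main__":
    x = np.zeros((100, 200))
    x[:, 100:] = np.random.default_rng(2).uniform(size=(100, 100))  # "sharp" half
    heat = focus_heatmap(x)
    print("mean sharpness, left vs. right:",
          heat[:, :100].mean(), heat[:, 100:].mean())
```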

  9. User interfaces for controlling camera settings

    The settings of digital cameras (aperture, shutter, ISO, focus, white balance, etc.) are not very intuitive. When people look at a photograph they think instead of brightness, contrast, noise (or grain), depth of field (features they want to be sharp versus blurry), motion blur, color balance, etc. Using the touch screen on the Tegra 3 tablet, explore alternative ways to control camera settings. This problem is more challenging (and interesting) than it sounds. For example, depth of field depends on the lens, sensor, zoom and focus settings, so work these variables into your algorithms. Similarly, our notion of brightness depends on absolute scene luminance and our state of visual adaptation (see Reinhard's book on HDR imaging for relevant papers); can these variables be worked in? For the user interface, consider capturing bursts of images with alternative settings and displaying them as thumbnail images in a grid on the viewfinder, using some variant on "Design Galleries".
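
    As one small piece of the puzzle, the mapping from lens settings to depth of field is well understood under the thin-lens approximation, and could sit behind a "make this range sharp" control. The sketch below uses the standard hyperfocal-distance formulas; the circle-of-confusion value is a placeholder that must be chosen for the actual sensor.

```python
def depth_of_field(focal_length_mm, f_number, focus_dist_mm, coc_mm=0.005):
    """Near and far limits of acceptable focus (thin-lens approximation).

    coc_mm is the acceptable circle of confusion on the sensor; 0.005 mm
    is a placeholder for a small mobile sensor and should be calibrated.
    Distances are measured from the lens, in millimeters.
    """
    f, N, s, c = focal_length_mm, f_number, focus_dist_mm, coc_mm
    H = f * f / (N * c) + f                         # hyperfocal distance
    near = s * (H - f) / (H + s - 2 * f)
    far = float("inf") if s >= H else s * (H - f) / (H - s)
    return near, far

if __name__ == "__main__":
    near, far = depth_of_field(focal_length_mm=4.6, f_number=2.8,
                               focus_dist_mm=1000.0)
    print("in focus from %.0f mm to %.0f mm" % (near, far))
```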

  10. Adjusting a camera based on your photo collection

    An increasingly common theme in computational photography is clever ways to leverage existing collections of images when performing image editing. One example that might be of interest to the user of cell phones is improving a photograph, or adjusting a camera before you capture a photograph, by looking at other shots of that same object, animal, or person among a user's existing images - on the device or in the cloud. Examples of improvements are adjusting white balance or exposure, removing camera shake, increasing resolution, or removing shadows or occlusions. Even the images a user recently deleted from their camera might provide helpful information about the user's preferences! Try implementing one or more of these ideas. Depending on your approach, algorithms from machine learning might be appropriate.

  11. Making use of the tablet form factor (NEW)

    Tablets offer a form factor that differs from that of traditional cameras, whether film or digital. Think about which tasks are suitable for a tablet-based camera rather than for more conventional handheld cameras. Should a tablet camera require non-traditional considerations in user interface design? If so, what are they? Think about common uses of tablets (e.g. viewing documents, browsing and consuming multimedia.) Are there camera applications that would augment such uses? For instance, for students who are using a tablet in an educational setting (K-12), it might be useful to have a camera application that can be used to "scan" a textbook, a blackboard, or handwritten notes and have the images integrated seamlessly into the document library.

  12. "Deep" and "Shallow" Projects

  13. Applications of approximate depth maps

    By capturing images at two focus settings (or a sequence of settings) and testing the resulting images for sharpness, one can estimate the distance to the scene at each pixel. In computer vision this is called depth-from-defocus or depth-from-focus, respectively. It is unclear what type of depth estimation algorithm (and which accompanying capture parameters) is appropriate for mobile platforms, and this warrants investigation. Note that, although it is difficult to get a pixel-accurate depth map, even an approximate one might be useful in photography. For example, if you estimated a depth map during aiming, could you use it to capture an all-focus image by moving the lens only to depths containing interesting features? Or to simulate a tilted focal plane? Or to help you extract a matte (see the "Flash matting" project)? Of course, if you can implement a robust depth estimation algorithm on the Tegra 3 tablet, you might be able to do much, much more.
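
    For a sense of the baseline approach, here is a minimal depth-from-focus sketch: capture a focal stack, score local sharpness in each frame, and take the per-pixel argmax. It assumes the stack is already aligned and ignores the hard parts (noise, textureless regions, choosing the focus settings), which are where the project would actually live.

```python
import numpy as np
from scipy.ndimage import laplace, uniform_filter

def depth_from_focus(stack, window=9):
    """Coarse depth map and all-focus composite from a focal stack.

    stack: list of HxW grayscale float frames, one per focus setting,
    assumed aligned. Returns (depth_index, all_focus): for each pixel,
    the index of the sharpest frame and the pixel value taken from it.
    """
    sharpness = np.stack([uniform_filter(laplace(f) ** 2, window) for f in stack])
    depth_index = np.argmax(sharpness, axis=0)
    all_focus = np.take_along_axis(np.stack(stack), depth_index[None], axis=0)[0]
    return depth_index, all_focus

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    sharp = rng.uniform(size=(80, 80))
    blurry = uniform_filter(sharp, 7)
    # Frame 0 is sharp on the left half, frame 1 on the right half.
    f0 = np.hstack([sharp[:, :40], blurry[:, 40:]])
    f1 = np.hstack([blurry[:, :40], sharp[:, 40:]])
    depth, _ = depth_from_focus([f0, f1])
    print("left/right depth votes:", depth[:, :40].mean(), depth[:, 40:].mean())
```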

  14. Stereo photography (CAVEAT)

    The Tegra 3 tablet comes with a pair of cameras on its back, allowing the user to engage in stereo photography, in theory. We say "in theory" because NVIDIA has not yet provided access to the second camera via FCam, and we do not know when exactly that will happen. If access arrives in a timely fashion, it might be interesting to see how conventional camera controls (auto-focus, auto-exposure, etc.) or other algorithms can benefit from a pair of cameras. Of course, one could rely on two networked Tegra 3 tablets in order to simulate a stereo rig.

  15. Computational Re-photography (paper)

    The goal of re-photography is to guide the user to the viewpoint of a reference photograph, so that they can take a matching picture. One can imagine an interface that computationally guides the user to the correct location and perspective. The original implementation of the paper relied on an SLR tethered to a laptop, and we ported it to the N900 for the Frankencamera paper, although the N900 could barely keep up with the stream of incoming images. We would like to see a robust implementation on the Tegra 3. Yet another challenge is removing foreground objects that weren't present in the reference photograph, for example, by translating the camera and combining multiple photographs. Could the interface guide the user through this process? This project involves implementing and/or porting a number of computer vision algorithms.
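
    The vision core of the guidance step is feature matching plus relative pose estimation between the current viewfinder frame and the reference photograph. The sketch below leans on OpenCV (ORB features, essential matrix, pose recovery); the intrinsic matrix and image file names are placeholders, and turning the recovered rotation and translation direction into on-screen guidance is the actual project.

```python
import cv2
import numpy as np

def relative_pose(ref_gray, cur_gray, K):
    """Estimate rotation R and (up-to-scale) translation direction t from
    the current view to the reference view, using ORB feature matches.

    ref_gray, cur_gray: uint8 grayscale images. K: 3x3 intrinsic matrix,
    which must come from calibration of the actual camera.
    """
    orb = cv2.ORB_create(2000)
    kp1, des1 = orb.detectAndCompute(ref_gray, None)
    kp2, des2 = orb.detectAndCompute(cur_gray, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)
    pts_ref = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts_cur = np.float32([kp2[m.trainIdx].pt for m in matches])
    E, inliers = cv2.findEssentialMat(pts_cur, pts_ref, K,
                                      method=cv2.RANSAC, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts_cur, pts_ref, K, mask=inliers)
    return R, t

if __name__ == "__main__":
    # Hypothetical file names; substitute your own captures.
    ref = cv2.imread("reference.jpg", cv2.IMREAD_GRAYSCALE)
    cur = cv2.imread("current.jpg", cv2.IMREAD_GRAYSCALE)
    K = np.array([[1200.0, 0, 640], [0, 1200.0, 360], [0, 0, 1]])  # placeholder intrinsics
    R, t = relative_pose(ref, cur, K)
    print("rotation:\n", R, "\ntranslation direction:", t.ravel())
```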

  16. "Moving" Projects

  17. Removing handshake using deconvolution (paper)

    Handshake can be modeled as convolution of an unknown (sharp) scene by an unknown blur kernel (representing the shaking of your hands) to produce the recorded (blurry) image. In theory one can remove handshake using deconvolution, but not knowing the blur kernel makes the problem ill-posed. In recent years, there have been many papers dealing with this "blind" deconvolution problem, but none of them provides an on-line implementation that performs the computation directly on the camera. There is a strong motivation for removing handshake on the camera: it tells you whether you need to take another shot. It may be that the Tegra 3 tablet is powerful enough to finally break the barrier, and we would love to see it tried. Would it help if we had a burst of images that share the same unknown sharp scene? Could we try to estimate the blur kernel using an inertial measurement unit? Would the problem be more feasible if we tried to deblur only perceptually significant regions (e.g. faces) to show to the user, and defer the rest of the image to off-line computation?
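
    If the kernel were known (say, from an IMU), the non-blind half of the problem is comparatively cheap; a Wiener filter in the Fourier domain is the classic baseline. The sketch below shows just that step, with a placeholder regularization constant; blind kernel estimation, ringing suppression, and spatially varying blur are where the difficulty lies.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def wiener_deconvolve(blurred, kernel, k=0.01):
    """Non-blind deconvolution with a Wiener filter.

    blurred: HxW float image. kernel: small 2D blur kernel (sums to 1),
    assumed known, e.g. estimated from gyro data or a blind method.
    k is a noise-dependent regularization constant (placeholder value).
    """
    H, W = blurred.shape
    psf = np.zeros((H, W))
    kh, kw = kernel.shape
    psf[:kh, :kw] = kernel
    psf = np.roll(psf, (-(kh // 2), -(kw // 2)), axis=(0, 1))  # center kernel at origin
    G = np.fft.fft2(psf)
    F = np.conj(G) / (np.abs(G) ** 2 + k)
    return np.real(np.fft.ifft2(np.fft.fft2(blurred) * F))

if __name__ == "__main__":
    rng = np.random.default_rng(4)
    sharp = rng.uniform(size=(128, 128))
    kernel = np.ones((1, 9)) / 9.0                 # toy horizontal shake
    blurred = uniform_filter(sharp, size=(1, 9), mode="wrap")
    restored = wiener_deconvolve(blurred, kernel)
    print("mean error before/after:",
          np.abs(blurred - sharp).mean(), np.abs(restored - sharp).mean())
```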

  18. Motion-sensitive low-noise imaging (paper)

    Two years ago we developed an algorithm for aligning successive viewfinder frames on a cell phone in real time. One application we explored was aligning and averaging multiple frames to reduce noise in low-light environments, assuming you held the camera reasonably stationary. However, this technique fails for moving objects; they become blurry. Can you extend the technique to detect motion, then locally reduce the number of frames that are combined in these regions? This is essentially photography (or videography) with adaptive per-pixel exposure time.
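
    Assuming the alignment itself is already taken care of, the motion-adaptive averaging could look something like the sketch below: compare each aligned frame against a reference and only average pixels that appear static, so moving regions keep the reference frame's short effective exposure. The motion threshold is a placeholder and should depend on the expected noise level.

```python
import numpy as np

def motion_aware_average(frames, motion_threshold=0.08):
    """Average an aligned burst to reduce noise, falling back to the
    reference frame where pixels appear to be moving.

    frames: list of HxW float frames in [0, 1], already globally aligned;
    the first frame is the reference.
    """
    ref = frames[0]
    acc = np.zeros_like(ref)
    count = np.zeros_like(ref)
    for f in frames:
        static = np.abs(f - ref) < motion_threshold   # per-pixel motion test
        acc += np.where(static, f, 0.0)
        count += static
    return acc / np.maximum(count, 1)                  # ref always counts once

if __name__ == "__main__":
    rng = np.random.default_rng(5)
    scene = np.full((80, 80), 0.5)
    burst = []
    for i in range(8):
        frame = scene + rng.normal(0, 0.03, scene.shape)
        frame[30:50, 10 + 5 * i:30 + 5 * i] += 0.3     # a moving bright object
        burst.append(frame)
    out = motion_aware_average(burst)
    print("noise (std) in a static region:", out[:20, :20].std())
```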

  19. Painted aperture (app)

    In portrait photography, large apertures are useful to blur out background detail, leaving only the person's face in focus. Using the technique described in the previous project idea, but aligning on the foreground instead of the background, make large aperture portrait photos possible on cameras with small apertures (e.g. smartphones, tablets.) For extra fun on this or the previous project, implement a touchscreen user interface for choosing the object on which to align, and hence where to focus. Professor Marc Levoy's SynthCam implements this on the iPhone, but perhaps you could best him with a novel extension?

  20. Using Inertial Measurements (paper)

    The Tegra 3 tablet is equipped with a gyroscope and an accelerometer, which can be accessed with the standard Android API. Could these be used in a creative manner to aid in imaging or image processing? Deblurring with an IMU has been reported to be successful, but we are wondering what else is possible.
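
    As one example of what the sensors might buy you, a gyroscope trace recorded during an exposure can be integrated into a rough estimate of the camera-shake blur kernel, which could then feed a non-blind deconvolution (see the deconvolution idea above). The sketch below uses a small-angle, rotation-only model and a placeholder focal length in pixels; rolling shutter, translation, and gyro bias are all ignored.

```python
import numpy as np

def gyro_blur_kernel(gyro_samples, dt, focal_px, kernel_size=31):
    """Rasterize the camera-shake trajectory during one exposure into a
    blur kernel, from gyroscope samples.

    gyro_samples: Nx3 angular velocities (rad/s) sampled every dt seconds
    over the exposure. focal_px: focal length in pixels (from calibration).
    Small-angle, rotation-only approximation.
    """
    angles = np.cumsum(gyro_samples * dt, axis=0)      # integrated rotation
    dx = focal_px * angles[:, 1]                       # yaw ~ horizontal shift
    dy = focal_px * angles[:, 0]                       # pitch ~ vertical shift
    kernel = np.zeros((kernel_size, kernel_size))
    c = kernel_size // 2
    xs = np.clip(np.round(c + dx - dx.mean()).astype(int), 0, kernel_size - 1)
    ys = np.clip(np.round(c + dy - dy.mean()).astype(int), 0, kernel_size - 1)
    np.add.at(kernel, (ys, xs), 1.0)                   # accumulate the trajectory
    return kernel / kernel.sum()

if __name__ == "__main__":
    rng = np.random.default_rng(6)
    omega = rng.normal(0, 0.5, size=(200, 3))          # fake tremor, 1 kHz for 0.2 s
    k = gyro_blur_kernel(omega, dt=1 / 1000, focal_px=1500.0)
    print("kernel support (nonzero taps):", int((k > 0).sum()))
```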

  21. "Well-timed" Projects

  22. Computer-assisted lucky imaging (paper)

    The term comes from a related technique in astronomy. In photography, the muscle tremors that cause handshake are cyclical. If you shoot a burst of frames, some of them will fall into the minima of these cycles and be relatively free of handshake. For the Frankencamera paper, we bolted an inertial measurement unit (IMU) to an N900. (Note that the N900 does possess a built-in accelerometer, but it was of limited quality.) This allowed us to detect these lucky moments, saving only those frames captured when handshake was minimal. We're wondering whether it would be possible to perform this selection more accurately (although probably not faster) using image-based motion estimation. One challenge is that image-based techniques will also detect object motion and confuse it with handshake. Perhaps a hybrid of IMU and image-based techniques could solve this confusion. As an extension, could you use this information to set the shutter speed for the next frame - maximizing exposure while avoiding motion blur?
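
    For the image-based side, frame selection can start from something as simple as a global sharpness score per frame; an IMU-based selector would instead integrate gyro energy over each exposure, and a hybrid could require the two to agree. The sketch below scores frames by gradient energy; the metric and the burst parameters are placeholders.

```python
import numpy as np

def sharpness(gray):
    """Global sharpness score: mean gradient energy (Tenengrad-style)."""
    gx = np.diff(gray, axis=1)
    gy = np.diff(gray, axis=0)
    return (gx ** 2).mean() + (gy ** 2).mean()

def pick_lucky_frame(burst):
    """Return the index of the sharpest frame in a burst of HxW frames."""
    return int(np.argmax([sharpness(f) for f in burst]))

if __name__ == "__main__":
    from scipy.ndimage import gaussian_filter
    rng = np.random.default_rng(7)
    scene = rng.uniform(size=(100, 100))
    burst = [gaussian_filter(scene, s) for s in (2.0, 0.5, 1.0, 3.0)]
    print("lucky frame index:", pick_lucky_frame(burst))   # expect 1 (least blur)
```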

  23. Moment Camera (paper)

    A moment camera stores a circular buffer of recent frames over the past n seconds. When the user presses the shutter, they can select as their output any frame(s) from that n-second window. The Casio EX-F1 has this capability. Implement a moment camera on the Tegra 3 tablet, using (low-resolution) viewfinder frames for your circular buffer. Instead of simply selecting the best frames, try combining them in creative ways. It would be great to combine a low-resolution burst of frames with a single high-resolution frame captured at the instant the user presses the trigger, but our current Frankencameras can't change resolutions fast enough.
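
    The buffering itself is the easy part; the interesting work is in the selection and combination UI. A sketch of the rolling frame buffer, using only the Python standard library, might look like this (frame rate and window length are placeholders):

```python
import collections
import time

class MomentBuffer:
    """Rolling buffer of roughly the last n seconds of viewfinder frames.

    Old frames fall off the end automatically; when the user presses the
    shutter, snapshot() returns everything still inside the time window
    so the application can let the user scrub through it.
    """
    def __init__(self, seconds=3.0, fps=30):
        self.seconds = seconds
        self.frames = collections.deque(maxlen=int(seconds * fps))

    def push(self, frame, timestamp=None):
        self.frames.append((time.time() if timestamp is None else timestamp, frame))

    def snapshot(self, now=None):
        now = time.time() if now is None else now
        return [f for (t, f) in self.frames if now - t <= self.seconds]

if __name__ == "__main__":
    buf = MomentBuffer(seconds=1.0, fps=30)
    for i in range(90):                        # simulate 3 seconds of frames
        buf.push(frame=i, timestamp=i / 30.0)
    print("frames available at t = 3 s:", len(buf.snapshot(now=3.0)))
```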

  24. "Focused" Projects

  25. Focus Sweep (paper)

    Sharp features that are out of focus have a certain look, called bokeh, which depends on the shape of the camera's aperture. By manipulating the aperture quickly during an exposure, you can change the bokeh for artistic effects. By manipulating the focus during an exposure, you can create other effects. A special case is moving the focus smoothly from front to back during the exposure, sometimes called a focus sweep, which blurs objects (approximately) equally regardless of depth, so that deconvolving by the resulting kernel can produce an all-focus image. Can you implement this on the Tegra 3 tablet, calibrate the kernel for each sweep range (so that you do not have to rely on blind deconvolution), and implement the non-blind deconvolution on the device? Once focus sweep is implemented, could you devise a user interface that includes virtual aperture control, in addition to focus, exposure, and color balance?
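
    To see why the sweep gives a nearly depth-invariant kernel, it helps to model the integrated PSF: as the focal plane passes through an object, its defocus disk shrinks to a point and grows again, and averaging those disks over the sweep yields a kernel with a strong central peak that depends only weakly on the object's depth. The sketch below is only a toy model of this (real kernels should be calibrated per sweep range, as suggested above); the disk PSF and all parameter values are placeholders.

```python
import numpy as np

def disk(radius, size):
    """Uniform disk PSF of the given radius on a size x size grid."""
    y, x = np.mgrid[:size, :size] - size // 2
    d = (x ** 2 + y ** 2 <= max(radius, 0.5) ** 2).astype(np.float64)
    return d / d.sum()

def sweep_psf(total=16.0, offset=8.0, size=33, steps=64):
    """Integrated PSF of a linear focus sweep.

    The defocus radius varies linearly from -offset to (total - offset)
    during the sweep, passing through zero when the focal plane crosses
    the object; 'offset' depends on the object's depth.
    """
    radii = np.abs(np.linspace(-offset, total - offset, steps))
    return sum(disk(r, size) for r in radii) / steps

if __name__ == "__main__":
    # Two objects at different depths see different sweep offsets, yet
    # their integrated kernels share a similar central peak -- roughly why
    # one deconvolution can bring all depths into focus.
    near, far = sweep_psf(offset=8.0), sweep_psf(offset=4.0)
    print("L1 difference between the two kernels:", np.abs(near - far).sum())
```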

As long as this list is, don't consider these your only options. In fact, another long list, which overlaps only partially with this list, can be found on the web site of the Spring 2008 version of this course. If you have an idea for another project, or a variation on one of these ideas, come talk to us; we'd love to brainstorm with you about it!


© 2012 Jongmin Baek, David Jacobs
Last update: February 8, 2012 02:48:44 PM
jbaek@cs.stanford.edu
dejacobs@cs.stanford.edu