Introduction: This assignment is an exercise intended to introduce you to image processing on the Tegra platform. You will perform a variety of processing tasks using OpenCV, ImageStack, and OpenGLSL. Each paradigm is well-suited to different kinds of tasks, so think carefully about which one(s) you would like to use in your final project.
Tasks:
Deliverables:
Unzip it somewhere convenient and create a project from existing source in your Eclipse workspace by following the same instructions given in Steps 4 and 5 of the "Getting Started" section in Assignment 1. Windows users only: if you are using Windows and Cygwin, you will also need to define an additional environment variable, NDK_USE_CYGPATH=1, before calling ndk-build. You can do this the same way we added NDK_MODULE_PATH before.
The first time you compile, it might take a few minutes because you'll be building the new libraries. When run, the application should behave like a basic camera sans autofocus. Under the hood it has a couple of extra libraries loaded in and a revamped viewfinder pipeline, but they aren't doing anything interesting yet.
The code skeleton changes needed to support the image processing libraries touch many files, so we figure it will be easier for you to copy your code from the first assignment into the new code base rather than the other way around. Once you are done, test to make sure everything still works, since the first task augments your autofocus functionality.
Be careful when doing the migration. It is easy to lose track of which copy of FCamInterface.java (etc.) you are editing at any given point. Once you've completed the migration, it might be helpful to close the old project or remove it from your Eclipse workspace entirely.

The autofocus routines you created in the first assignment can be improved by using a little bit of computer vision and scene understanding. In this section, you will add a face-based autofocus mode to FCamera using OpenCV.
OpenCV is the tool of choice for computer vision tasks such as feature detection, tracking, and so on. It has been optimized for the Tegra hardware and has a huge community of users, so it is relatively fast and well supported for common tasks. However, recently published research may be difficult to incorporate into a project that uses OpenCV alone, unless the paper authors have released an OpenCV implementation.
Subtasks: Look back at Assignment 1 for a reminder of how to add a new touch mode. We don't yet have an appropriate method in FCamInterface.java to call from CameraFragment.java, so for now just make one up.
We need a new method for enqueuing face-based autofocus request messages. Add its declaration in FCamInterface.java and its implementation in FCamInterface.cpp.
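For illustration only, the pair of additions might look something like the sketch below. The method name and the package path in the JNI function name are invented, and the comment about the request queue is an assumption, so follow the pattern of the native methods already present in the skeleton:

#include <jni.h>

// FCamInterface.java -- hypothetical native declaration
// public native void enqueueFaceFocusRequest();

// FCamInterface.cpp -- the function name must encode your actual package and class
extern "C" JNIEXPORT void JNICALL
Java_com_nvidia_fcamerapro_FCamInterface_enqueueFaceFocusRequest(JNIEnv *env, jobject thiz)
{
    // Hand a face-focus request to the camera thread, following the same
    // message-queue pattern used by the skeleton's other enqueue* methods.
}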
The cascade classifier we will use for face detection works by composing a large number of simple image classifiers that have been trained on a large dataset of example images. The parameters of these classifiers are stored in face.xml. The easiest way to get access to them from the C++ code is to copy the file to a known location in the tablet's filesystem. To accomplish this, enter the following command into a terminal:
student:~/fcam/android/packages/fcamerapro/assets$ adb push face.xml /data/fcam/data.xml
Although files placed in the assets directory are automatically pushed to the device, getting access to them from C++ is not easy (Java, on the other hand, is straightforward).
An earlier version of the push command above put the file at /data/fcam/face.xml; make sure the path you push to matches the path referenced in the main camera thread of FCamInterface.cpp, either by pushing the file again or by changing the code.
The main camera thread in FCamInterface.cpp already constructs an instance of MyFaceDetector. Your only job is to implement the detectFace() method. This method should return a vector of Rects that indicate which regions of the image are likely to be human faces. Check out the OpenCV documentation on classifiers to figure out exactly what you need to do.
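As a rough sketch (not the required implementation), detectFace() can lazily load the cascade file you pushed above and call cv::CascadeClassifier::detectMultiScale. The member name mCascade, the argument type, and the tuning parameters below are all assumptions to adapt to the skeleton's actual declaration of MyFaceDetector:

#include <opencv2/objdetect/objdetect.hpp>
#include <vector>

std::vector<cv::Rect> MyFaceDetector::detectFace(const cv::Mat &grayFrame)
{
    std::vector<cv::Rect> faces;
    // mCascade is assumed to be a cv::CascadeClassifier member; load it once from
    // wherever you pushed the cascade file (here, the path used by the adb command above).
    if (mCascade.empty() && !mCascade.load("/data/fcam/data.xml")) {
        return faces; // classifier unavailable; report no faces
    }
    // Multi-scale detection: scale step 1.2, at least 3 neighboring hits per face,
    // and ignore detections smaller than 24x24 pixels.
    mCascade.detectMultiScale(grayFrame, faces, 1.2, 3, 0, cv::Size(24, 24));
    return faces;
}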
You'll need to modify the main camera loop and possibly the autofocus subsystem to implement face-based autofocus. The form your solution takes depends on how you completed the previous assignment, so we haven't added many explicit TODOs to the code base. You are more or less on your own here; good luck!
After sending viewfinder frames to your autofocus's update method, highlight any detected faces by drawing a box around them. The easiest way to do this is to change the pixel values in the frame object (just set the Y channel to 254). Note that if you drew the boxes before calling the autofocus update, your autofocus algorithm would falsely interpret the boxes as part of the scene. It's okay to leave the box highlighting code on all the time (i.e. don't bother adding a check box to the UI to control this if you don't feel like it).
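A minimal sketch of the highlighting step is below; call it after the autofocus update, as noted above. The helper name is made up, and it assumes that for the YUV viewfinder frames image(x, y) addresses the luma sample for pixel (x, y) (double-check this against FCam's Image documentation):

#include <FCam/Tegra.h>
#include <opencv2/core/core.hpp>
#include <vector>

// Outline each detected face by saturating the luma along the rectangle border.
static void drawFaceBoxes(FCam::Image image, const std::vector<cv::Rect> &faces)
{
    if (!image.valid()) return;
    for (size_t i = 0; i < faces.size(); i++) {
        // Clip the detection to the frame so we never write out of bounds.
        cv::Rect r = faces[i] & cv::Rect(0, 0, image.width(), image.height());
        if (r.width < 2 || r.height < 2) continue;
        for (int x = r.x; x < r.x + r.width; x++) {
            *image(x, r.y) = 254;                // top edge
            *image(x, r.y + r.height - 1) = 254; // bottom edge
        }
        for (int y = r.y; y < r.y + r.height; y++) {
            *image(r.x, y) = 254;                // left edge
            *image(r.x + r.width - 1, y) = 254;  // right edge
        }
    }
}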
Taking photographs in dark environments is challenging. Using a flash will allow you to get a well-exposed photograph with crisp edges, but the scene's ambient light colors are lost. Simply boosting the gain of a photograph will yield more accurate colors, but the image will be quite noisy. In this section, you will use ImageStack to combine the edge information from a flash photograph with the color information from a noisy photograph to create a noise-free(-ish) photo with good colors.
ImageStack is the tool of choice for image processing after capture. It was developed here at Stanford, so there is plenty of local expertise available to you. Many of the techniques we'll discuss in this course are already implemented in ImageStack, so you won't need to spend half your time implementing someone else's research. Internally, it uses floating point images, so it works great for many image domains (e.g. depth data, HDR, etc.), but can be too memory hungry and slow for real-time applications.
Subtasks: Check out onClick() in CameraFragment.java to see how we request bursts of images with varying parameters. We don't need to change anything here, but it may be useful to be familiar with this chunk of code for your final project.
In order to add a new "Merge Stack" button to the menu in the viewer tab, we'll need to create an additional menu item in the menu's XML description. It seems easiest to edit the XML directly rather than using the graphical interface here. You'll need to choose an icon for the new menu item; @android:drawable/ic_menu_add is a good choice. You'll also want to define a new string for the title of the menu item in strings.xml so you can reference it in viewer_menu.xml.
The onOptionsItemSelected() method is called when menu items are tapped. Add an additional case to the switch statement for your newly minted menu item. This case should check if the currently selected image stack is a flash/no-flash pair and, if so, call an appropriate JNI method in FCamInterface.java that we will define in the next step.
This JNI definition will be a bit more complicated than those we've defined before. The easiest way to get images from the Java side to the C++ side is to pass file paths as Java String objects. After crossing the JNI barrier, these will be jstring objects in C++. The easiest way to get the processed image is via the file system interface already built into the camera's capture routine, so don't worry about returning anything. For now, just add a log message to the C++ function to make sure the UI is set up properly.
Note that because this fusion is explicitly performed as a post-process, there is no need to worry about thread safety or about introducing new messages into the camera thread's queue. Instead, we can do all our work in the calling Java thread to keep things simple. Blocking the main UI thread for the 15 seconds or so this takes is typically a terrible idea; strictly speaking, we should ask a background Java thread to do the heavy lifting so that at least the UI stays responsive, but we'll let it slide this time since threading isn't really the focus of the course and is non-trivial to set up.
The JNIEnv method GetStringUTFChars() is useful here. Look it up in the JNI documentation.
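A hedged sketch of the JNI plumbing for this step is shown below. The package path, class name, and method name are invented for illustration, so mirror whatever the existing native methods in FCamInterface use:

#include <jni.h>
#include <android/log.h>

// Hypothetical JNI entry point for the "Merge Stack" button; the Java side passes
// the flash and no-flash image paths as Strings.
extern "C" JNIEXPORT void JNICALL
Java_com_nvidia_fcamerapro_FCamInterface_mergeFlashNoFlash(
        JNIEnv *env, jobject thiz, jstring flashPath, jstring noFlashPath)
{
    // Convert the jstrings into C strings we can hand to ImageStack's file loaders.
    const char *flashFile = env->GetStringUTFChars(flashPath, NULL);
    const char *noFlashFile = env->GetStringUTFChars(noFlashPath, NULL);

    // For now, just log the request so we can verify the UI wiring.
    __android_log_print(ANDROID_LOG_INFO, "FCamInterface",
                        "merge requested: %s + %s", flashFile, noFlashFile);

    // ... load, filter, and save the fused image here in the later subtasks ...

    // Release the UTF-8 buffers once we're done with them.
    env->ReleaseStringUTFChars(flashPath, flashFile);
    env->ReleaseStringUTFChars(noFlashPath, noFlashFile);
}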
The relevant classes and methods are defined in fcam/include/ImageStack/File.h.
This part is mostly up to you. The basic idea is we want to smooth out noise in the no-flash image without crossing edges present in the flash image. The applications sections of the Gaussian KD-tree and Permutohedral lattice papers may help to inspire you. The relevant classes and methods are defined in fcam/include/ImageStack/GaussTransform.h.
This is meant to be a quick and dirty introduction to using ImageStack. You are not expected to look up state-of-the-art flash/no-flash fusion techniques to complete this assignment.
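Purely for orientation, here is a brute-force sketch of the cross (or joint) bilateral idea described two items above: pixel values are averaged from the no-flash image, but each neighbor's weight depends on how similar it looks in the flash image, so flash edges are preserved. The GaussTransform operators compute a fast approximation of exactly this kind of weighted average, so treat the loop below as a description of the math rather than the intended implementation; the header name, the Image constructor argument order, and the 3-channel assumption should all be verified against the bundled ImageStack sources.

#include <ImageStack/Image.h> // header name assumed; see fcam/include/ImageStack/
#include <math.h>

ImageStack::Image crossBilateral(ImageStack::Image noFlash, ImageStack::Image flash,
                                 int radius, float sigmaSpatial, float sigmaRange)
{
    // Constructor argument order assumed to be (width, height, frames, channels).
    ImageStack::Image out(noFlash.width, noFlash.height, 1, 3);
    for (int y = 0; y < noFlash.height; y++) {
        for (int x = 0; x < noFlash.width; x++) {
            float sum[3] = {0, 0, 0};
            float totalWeight = 0;
            for (int dy = -radius; dy <= radius; dy++) {
                for (int dx = -radius; dx <= radius; dx++) {
                    int nx = x + dx, ny = y + dy;
                    if (nx < 0 || ny < 0 || nx >= noFlash.width || ny >= noFlash.height) continue;
                    // Spatial falloff in the image plane.
                    float dist2 = (float)(dx * dx + dy * dy);
                    // Range falloff measured in the flash image, where edges are crisp.
                    float range2 = 0;
                    for (int c = 0; c < 3; c++) {
                        float d = flash(nx, ny)[c] - flash(x, y)[c];
                        range2 += d * d;
                    }
                    float w = expf(-dist2 / (2 * sigmaSpatial * sigmaSpatial)
                                   - range2 / (2 * sigmaRange * sigmaRange));
                    // Average the *no-flash* colors under these weights.
                    for (int c = 0; c < 3; c++) sum[c] += w * noFlash(nx, ny)[c];
                    totalWeight += w;
                }
            }
            for (int c = 0; c < 3; c++) out(x, y)[c] = sum[c] / totalWeight;
        }
    }
    return out;
}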
We can add a new image to the image gallery using the classes defined in AsyncImageWriter.h. Because we're hijacking functionality that was designed for handling FCam frames, we will need to create a dummy FCam::Tegra::_Frame object and fill it with our filtered image data. Using the image format FCam::RGB24 for the FCam::Image stored in the dummy frame should make your life a little bit easier. The following code snippet saves an ImageStack::Image named filteredImage to the viewing gallery:
ImageSet *is = writer->newImageSet(); // writer is a global instance of AsyncImageWriter already defined
FCam::_Frame *f = new FCam::Tegra::_Frame;
f->image = FCam::Image(filteredImage.width, filteredImage.height, FCam::RGB24);

// Convert ImageStack's floating point [0, 1] values to 8-bit RGB, rounding to nearest.
for (int y = 0; y < filteredImage.height; y++) {
    for (int x = 0; x < filteredImage.width; x++) {
        for (int c = 0; c < 3; c++) {
            ((unsigned char *)f->image(x, y))[c] = (unsigned char)(255 * filteredImage(x, y)[c] + 0.5f);
        }
    }
}

is->add(fmt, FCam::Frame(f)); // fmt specifies the output file format expected by add(); see AsyncImageWriter.h
writer->push(is);
Answer the following questions in your write up:

Digital camera viewfinders often have a lower dynamic range than the sensor is capable of capturing. Accordingly, it can be difficult for a photographer to tell whether a particularly bright pixel is actually saturated or not. One common technique for helping out in this situation is a "zebra" viewfinder mode. Any pixels that are completely saturated (i.e. their RGB value is (1,1,1)) are replaced with a striped black and white pattern. In this section, you will add a zebra viewfinder mode to FCamera using OpenGL shaders.
OpenGLSL is the tool of choice for parallelizable tasks requiring raw computational power. Unsurprisingly, the Tegra tablet by NVIDIA has a pretty beefy GPU in it. The easiest way to harness all those FLOPs is through clever use of shader programming. Massaging a computational photography task into a form well-suited to a GPGPU implementation can be difficult (and sometimes impossible), so make sure the increased performance is critical to your application before investing too much time with OpenGLSL.
Subtasks: In order to enable multiple viewfinder modes, we'll need to create an additional spinner in the user interface. Change camera.xml appropriately (Hint: copy-paste gets you a long way here.) Note that the graphical layout editor doesn't seem to work here, so you'll need to edit the XML directly.
Your modifications to camera.xml probably make reference to some strings and string arrays that do not exist. You'll need to change strings.xml to keep things kosher. If you run into compilation problems, check the XML as the graphical editor can be finicky.
Modify the appropriate places to get a reference to the new mode spinner. You'll also need to implement the onItemSelected(...) method to notify the CameraView of any mode switches.
Take a look at assets/default.vert and assets/default.frag:
// default.vert
attribute vec4 vPosition;
attribute vec2 vTexCoordIn;
varying vec2 vTexCoordOut;
void main() {
    gl_Position = vPosition;
    vTexCoordOut = vTexCoordIn;
}
default.vert is called a vertex shader. Each vertex sent into the pipeline gets processed by this shader. In our case, it will be called four times, once for each corner of the rectangular viewfinder. During each call, the vertex's position and texture coordinates are passed in through attributes. This particular vertex shader doesn't do much: it just leaves the position unchanged and passes the texture coordinate to the next shader as a varying (meaning its value will be interpolated between vertices).
// default.frag
#extension GL_OES_EGL_image_external:enable
precision mediump float;
varying mediump vec2 vTexCoordOut;
uniform samplerExternalOES texture;
void main() {
    gl_FragColor = texture2D(texture, vTexCoordOut);
}
default.frag is called a fragment shader. A fragment is best described as a "candidate" pixel that could be drawn to the screen. Fragments are generated by rasterizing the polygons defined by the vertices processed in the vertex shader, and each fragment then gets processed here. In our case, this shader will be called 800x600 times, once for each pixel in the viewfinder. During each call, the fragment's texture coordinates are passed in along with a reference to the texture as a uniform (meaning its value is constant for all fragments). The samplerExternalOES type is a special kind of sampler that understands the Tegra-specific texture buffer we use to pass around viewfinder frames. This particular fragment shader performs a 2D texture lookup and returns the appropriate color.
Your zebra viewfinder shader program will be very similar to the default viewfinder program. The vertex shader can be copied directly, while the fragment shader will need to be a bit more complicated. For more information on OpenGLSL syntax, check out the documentation or some tutorials.
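As one possible starting point (a sketch, not the expected solution), the fragment shader can sample the viewfinder texture exactly as default.frag does, test whether the sampled color is saturated, and replace saturated pixels with a stripe pattern computed from the texture coordinates. The 0.99 threshold and the stripe frequency below are arbitrary choices:

// zebra.frag (sketch)
#extension GL_OES_EGL_image_external : enable
precision mediump float;
varying mediump vec2 vTexCoordOut;
uniform samplerExternalOES texture;

void main() {
    vec4 color = texture2D(texture, vTexCoordOut);
    // Treat a pixel as saturated if all three channels are (nearly) 1.0.
    if (all(greaterThanEqual(color.rgb, vec3(0.99)))) {
        // Diagonal black/white stripes; 20.0 controls the stripe frequency.
        float stripe = step(0.5, fract((vTexCoordOut.x + vTexCoordOut.y) * 20.0));
        gl_FragColor = vec4(vec3(stripe), 1.0);
    } else {
        gl_FragColor = color;
    }
}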
When the CameraViewRenderer is created, it loads and precompiles all the possible viewfinder shaders. Update the code marked by TODOs to include your new zebra shader program.
Answer the following questions in your write up:
If you find yourself with some spare time, try making a "Cops" viewfinder mode that pixelates (i.e. censors) any faces found in the viewfinder stream. You'll probably find this easiest to add in the main camera thread using OpenCV and directly editing frame pixels, rather than making a new shader on the Java side. You won't get any extra credit or anything for doing this, but if you end up giving it a try send some screen shots our way!