Light Field Camera Calibration


Many of the targeted applications of the Stanford Multi-Camera Array (such as light field rendering, shape from light fields, and synthetic aperture photography) require accurate calibration. Camera calibration gives us the geometric relation between the 3D coordinates of a point in the world and the 2D (pixel) coordinates of its image in a camera of the array. For the traditional pinhole camera model, calibration involves determining:
  1. The intrinsic parameters of each camera (such as focal length and principal point), along with lens distortion
  2. The extrinsic parameters of each camera: its rotation and translation with respect to the world coordinate system
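As a concrete illustration, here is a minimal MATLAB sketch of pinhole projection; the intrinsics K, pose (R, t), and grid point X below are made-up values for illustration, not our array's actual parameters:

  % Pinhole projection of a 3D world point X into pixel coordinates.
  % K holds the intrinsics; R and t are the extrinsics (camera pose).
  % All values here are illustrative.
  K = [700   0  320;     % fx, skew, cx
         0 700  240;     %     fy,  cy
         0   0    1];
  R = eye(3);            % camera aligned with the world axes
  t = [0; 0; 50];        % camera 50 units from the origin
  X = [10; 5; 0];        % a point on the (planar) calibration grid

  x = K * (R * X + t);   % homogeneous image coordinates
  pixel = x(1:2) / x(3)  % divide out depth to get pixel coordinates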

This page describes our current implementation of camera calibration.


Design Philosophy

Our goal is to keep calibration accurate, efficient, and simple enough to perform as often as our applications require. We would like to be able to do geometric calibration for every data set we acquire, with minimal assumptions about camera placement and capabilities. We found it was easy to meet these criteria with a straightforward extension of [1] to multiple cameras. By having our cameras view a planar calibration grid of known geometry, we obtain many 3D-2D correspondences. Given these, we obtain the optimal camera parameters by solving a nonlinear least squares minimization problem.


Feature Detection and Correspondences

Our calibration grid consists of several squares, whose corners are the feature points we locate in the images. The corners are located in three stages:
  1. Edge detection
  2. Fitting lines to resulting edgels
  3. Finding intersections of line segments corresponding to square corners
We use Horatio, a vision library from the University of Surrey, for the feature detection. The detected corners are grouped into quadrilaterals and clustered into rows and columns. If too few quadrilaterals are found, if there are too many spurious ones, or if the detected ones do not match the known geometry of the grid, we simply abandon the image rather than risk an incorrect match. Experiments with thousands of images (several light fields) show that the feature detector never generates incorrect correspondences and locates, on average, about 90% of the squares visible in an image. It is capable of handling partial occlusions of the grid. An example of the feature detection is shown below.
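Stage 3 reduces to intersecting pairs of fitted lines, and in homogeneous coordinates the intersection of two lines is simply their cross product. A minimal MATLAB sketch, with made-up line coefficients:

  % Each fitted line is stored as [a; b; c] with a*x + b*y + c = 0.
  % The intersection of two lines is their cross product, normalized
  % so the third homogeneous coordinate is 1.
  l1 = [1; 0; -100];     % vertical line x = 100 (illustrative)
  l2 = [0.02; 1; -50];   % nearly horizontal line (illustrative)

  p = cross(l1, l2);     % homogeneous intersection point
  corner = p(1:2) / p(3) % pixel coordinates of the square corner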

Edge detection
Line fitting
Final squares


Nonlinear Optimization (Bundle Adjustment)

Having obtained many 3D-2D point correspondences, we seek the model parameters that best fit the observed data. This is easily formulated as a nonlinear minimization problem over the calibration parameters (and the motion of the planar calibration grid) that minimizes the pixel reprojection error. For an initial guess, we compute the calibration parameters of each camera separately using implementations of Zhang's algorithm [2] [3]. Complete details of this stage are provided in [4]; here we merely sketch why exploiting sparsity in the optimization is necessary.
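For concreteness, a minimal MATLAB sketch of the per-observation residual under a pinhole model (lens distortion omitted for brevity; the function and variable names are ours, not the pipeline's; rotations are parameterized as axis-angle vectors via the matrix exponential):

  % Reprojection residual for one observation: the difference between a
  % detected corner uv and the projection of the corresponding grid
  % point X under the current camera and grid-pose estimates.
  function r = reproj_residual(K, wc, tc, wg, tg, X, uv)
    Rc = expm(skewm(wc));      % camera rotation from axis-angle wc
    Rg = expm(skewm(wg));      % grid-pose rotation from axis-angle wg
    Xw = Rg * X + tg;          % grid point in world coordinates
    x  = K * (Rc * Xw + tc);   % project into the camera
    r  = x(1:2) / x(3) - uv;   % pixel reprojection error
  end

  function S = skewm(w)        % 3x3 skew-symmetric matrix of w
    S = [   0  -w(3)  w(2);
          w(3)    0  -w(1);
         -w(2)  w(1)    0 ];
  end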

The nonlinear minimization is a fairly large-scale problem: suppose we have N cameras taking S synchronized snapshots. Each camera is modeled by 12 parameters; in addition, there are S-1 rigid motions of the calibration grid. (The position of the grid in the first snapshot defines the global coordinate system.) This gives a 12N+6S-6 dimensional problem: that is the number of parameters we are solving for. Suppose that, on average, we extract P point correspondences from each image; then we have a total of D = P*N*S observations. Typically, we have about N=100 cameras, S=15 snapshots, and about P=120 point correspondences per image. This is a 1284-dimensional search with a 180,000 x 1284 Jacobian. In double precision, the Jacobian alone would require about 1.7 gigabytes of storage and take impractically long to compute.
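The arithmetic is easy to check in MATLAB:

  N = 100;  S = 15;  P = 120;       % cameras, snapshots, points per image
  nParams = 12*N + 6*(S-1)          % 1284 unknowns
  nObs    = P*N*S                   % 180,000 observations
  denseGB = nObs*nParams*8 / 2^30   % ~1.7 GB for a dense double Jacobian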

Fortunately, it is easy to show that the Jacobian is very sparse. Each row of the Jacobian corresponds to one ray through a point on the calibration grid and the center of one camera in the array. The only parameters this ray depends on are the 12 parameters of that camera and the 6 representing the pose of the calibration grid in that snapshot. This means that each row of the Jacobian has at most 18 nonzero entries, so its effective width is a constant, independent of the number of cameras or snapshots. Exploiting this sparsity is necessary for a feasible implementation of the optimization.
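One way to exploit this structure is to hand the solver the Jacobian's sparsity pattern so it never forms the dense matrix. The sketch below assumes MATLAB's Optimization Toolbox; our pipeline's actual code may be organized differently, and the index arrays cam and snap, the residual function residuals, and the initial guess params0 are assumed to be defined elsewhere:

  % Sparsity pattern of the Jacobian: observation k (a grid corner seen
  % by camera cam(k) in snapshot snap(k)) depends only on that camera's
  % 12 parameters and, for snapshots after the first, on the 6 pose
  % parameters of that snapshot's grid motion.
  N = 100;  S = 15;  P = 120;
  nObs = P*N*S;  nParams = 12*N + 6*(S-1);

  Jpat = sparse(nObs, nParams);
  for k = 1:nObs
    c = cam(k);  s = snap(k);
    Jpat(k, 12*(c-1)+1 : 12*c) = 1;   % this camera's parameters
    if s > 1
      base = 12*N + 6*(s-2);
      Jpat(k, base+1 : base+6) = 1;   % this snapshot's grid pose
    end
  end

  % With the pattern supplied, the solver finite-differences only the
  % nonzero entries instead of all nObs*nParams of them.
  opts = optimset('Jacobian','off', 'JacobPattern',Jpat);
  params = lsqnonlin(@residuals, params0, [], [], opts);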


Results

We calibrated an array of 85 cameras from 8 snapshots of the calibration grid. This involved a 1062-dimensional search over a total of 86,824 3D-2D correspondences. Below we show the error statistics and visualizations of the computed camera geometry.

RMS error                  0.3199 pixels
Mean error                 0.2647 pixels
Median error               0.2342 pixels
Standard deviation         0.1796 pixels
Avg. planarity deviation   0.9636 cm


Camera centers, as computed by calibration, projected onto a photograph of the array itself. The projection was approximated by a homography, computed from manually specified control points that map the camera centers to their locations in the photograph (a sketch of the homography fit appears below).
Control points (clicked manually)
Superimposed control points and camera projections
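For reference, a minimal MATLAB sketch of fitting such a homography by the direct linear transformation; the function name and its Nx2 point arrays are our own illustrative choices:

  % Fit a homography H mapping points pts1 (Nx2) to pts2 (Nx2) by the
  % direct linear transformation: each correspondence contributes two
  % rows to A, and h is the right singular vector of A with the
  % smallest singular value.
  function H = fit_homography(pts1, pts2)
    n = size(pts1, 1);
    A = zeros(2*n, 9);
    for i = 1:n
      x = pts1(i,1);  y = pts1(i,2);
      u = pts2(i,1);  v = pts2(i,2);
      A(2*i-1,:) = [ x y 1  0 0 0  -u*x -u*y -u ];
      A(2*i,  :) = [ 0 0 0  x y 1  -v*x -v*y -v ];
    end
    [~, ~, V] = svd(A);
    H = reshape(V(:,9), 3, 3)';   % last singular vector, as a 3x3 matrix
  end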

Histogram of reprojection errors
Cumulative distribution of reprojection errors
2D plot of reprojection errors, MATLAB .fig file
3D plot of camera centers, MATLAB .fig file


TO DOs

  1. Components of the calibration pipeline have to be better integrated.
  2. Port feature detector to Gandalf, for increased speed and better integration. Extend it to handle calibration objects of varying sizes.
  3. Finish code to better visualize computed camera geometry and reprojection errors.
  4. Build larger calibration objects (up to 1m x 1m) for wide-area calibration. Check out Pacific Panels / Home Depot for large flat mounting surfaces.
  5. Port nonlinear minimization to C/C++, eliminating need for Matlab. (Great project for another CS 205 student!)

References



Vaibhav Vaish

Last update: May 7th, 2003.