Robust Multi-camera Calibration

CS 205 Project

Abstract:

We describe the formulation and MATLAB implementation of a large-scale nonlinear optimization procedure to calibrate an array of 128 cameras. The previous version of our calibration system calibrated each camera independently. This produced model parameters that fit the observed data very well, and the optimization procedures were simple and swift. The drawback was that we could not enforce certain constraints between different cameras, so our calibration parameters were not completely consistent. In this project, we describe our efforts to write a global optimization procedure that captures all such constraints without sacrificing accuracy or scalability.



Contents

Introduction
Parametrizing the Problem
Exploiting the MATLAB API
Experiments and Results
Conclusions
References


Introduction

The Stanford Graphics Lab is building an array of 128 cameras for high-performance imaging applications [1]. Many of the applications we are targeting require very precise calibration of the cameras. Previously, we used third-party implementations [2][3] of Zhang's algorithm [4] to calibrate each camera independently. This does not let us model all the constraints in our calibration procedure, and it yields model parameters with some inconsistencies. For example, the calibration process involves moving a patterned grid (of known geometry) in front of the cameras. The computed motion of the grid in the world should be the same (up to a rigid transform) with respect to each camera, but calibrating each camera independently does not let us incorporate such constraints in the optimization. Consequently, the camera parameters we obtain fit the observed data well but are not mutually consistent - a disaster for the multi-view algorithms we wish to implement with our array.

In this document, I describe how I implemented an optimization routine using the MATLAB Optimization Toolbox [5]. The aim was to obtain accurate, consistent values for the calibration parameters while keeping the system scalable to 128 cameras. Here I talk primarily about optimization; the motivation for the problem can be found in my project proposal.

The next section describes the implementation decisions I had to make and the issues in parametrizing the problem with its constraints. Thereafter, I discuss making the best use of the MATLAB API for the particular optimization problem I wish to solve. I present initial results on a data set acquired from a four-camera prototype (the array is still under construction), and conclude with analysis and thoughts on how these techniques could be applied to other problems in my research.


Parametrizing the Problem

Objective Function

We wish to find the camera parameters that accurately predict the pixel coordinates of a point's image from the point's world coordinates. As input, we have a large number of points whose 3D world coordinates (possibly a function of some optimization parameters) and pixel coordinates in the camera images are known. Our objective function simply computes the "pixel error": the difference between the observed images of the 3D points and the images predicted by the current estimate of the model parameters.
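To make this concrete, here is a minimal sketch of such an objective in MATLAB. The helpers unpackParams and projectPoint are hypothetical stand-ins for the camera model of [4], and pointToCamera maps each observation to the camera that saw it; the real implementation also threads the shared grid-motion parameters through the projection.

    % Sketch of the "pixel error" residual handed to lsqnonlin.
    % p:  vector of all model parameters (every camera plus the grid motion)
    % X:  3xM matrix of 3D world points;  uv: 2xM observed pixel coordinates
    function F = pixelError(p, X, uv, pointToCamera)
        M = size(X, 2);
        F = zeros(2, M);
        for i = 1:M
            cam = unpackParams(p, pointToCamera(i));          % this camera's parameters
            F(:, i) = projectPoint(cam, X(:, i)) - uv(:, i);  % predicted minus observed
        end
        F = F(:);  % lsqnonlin minimizes the sum of squares of these entries
    end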

The MATLAB function lsqnonlin is the ideal candidate to minimize this in the L2 norm. The special structure of a nonlinear least-squares problem offers an advantage over general minimization (fminunc, fmincon): the gradient and Hessian of the squared 2-norm of the objective can be computed cheaply from the Jacobian and the Hessians of the individual residuals [6, Pg 134]. The drawback - at least in MATLAB - is that we cannot specify additional constraints, but must solve an unconstrained problem. Moreover, we cannot eliminate potential outliers during the minimization. We stick with lsqnonlin, however, because we can encode the important constraints in the parametrization described below, and our feature detector conservatively filters out outliers.
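Concretely, the structure being exploited is the standard least-squares identity: writing the scalar objective as f(p) = ||F(p)||^2 with Jacobian J(p),

    \nabla f(p)   = 2\, J(p)^{\top} F(p)
    \nabla^2 f(p) = 2\, J(p)^{\top} J(p) \;+\; 2 \sum_i F_i(p)\, \nabla^2 F_i(p)

so Gauss-Newton-type steps need only the Jacobian, dropping the second term of the Hessian.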

Model Parameters

We use the standard camera model of computer vision: a perspective camera with some nonlinear lens distortion. A detailed description of the parameters may be found in [4]. We take care to model everything in the calibration process consistently; this is what lets us overcome the drawback of the previous approach cited above. For example, to ensure that the grid motion (in the global world coordinate frame) is the same with respect to all cameras, we add parameters describing the grid motion and use this single set of parameters for all cameras, as sketched below.
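The variable names and packing order here are hypothetical; only the sharing of the grid poses matters:

    % Hypothetical packing of the global parameter vector.
    % camParams: intrinsics, distortion, and pose, one column per camera;
    % gridPoses: one rigid transform (rotation + translation) per grid
    %            position, shared by every camera so the recovered grid
    %            motion is consistent across the array.
    pCam  = camParams(:);
    pGrid = gridPoses(:);
    p0    = [pCam; pGrid];   % initial guess handed to lsqnonlin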

Initialization

We use the output of the previous, per-camera version of our system as the initial guess for the global optimization.

Termination Conditions

MATLAB provides several parameters for terminating the minimization, including limits on iterations and function evaluations, thresholds on step size, and so on. Although we experimented briefly with varying these, we found that the defaults work very well after the enhancements described below.
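For reference, these are the knobs involved; the values below are illustrative, not the ones we used (we kept the defaults):

    % Termination-related options for lsqnonlin (illustrative values).
    opts = optimset('lsqnonlin');          % start from the solver defaults
    opts = optimset(opts, ...
        'MaxIter',     400,   ...          % cap on iterations
        'MaxFunEvals', 10000, ...          % cap on objective evaluations
        'TolX',        1e-8,  ...          % step-size threshold
        'TolFun',      1e-8,  ...          % change-in-residual threshold
        'Display',     'iter');            % per-iteration diagnostics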


Exploiting the MATLAB API

Here I describe the various stages in the development of our optimization procedure. The data set used to illustrate our experiments came from a four-camera prototype, with 2940 data points and 78 model parameters. Since each observed point has two coordinates (x and y), the Jacobian is 5880 x 78. Details of the experiments and the actual MATLAB output are tabulated in the next section.

Naive Implementation

As a first implementation, I simply coded up the objective function and handed it off to lsqnonlin, without any information on sparsity or partial derivatives. This worked better than I expected: although the optimization was slow and terminated by hitting the threshold on the number of function evaluations, the residual had been reduced significantly by that stage. The main observation from the MATLAB output was that most of the function evaluations were being used to estimate the Jacobian. Indeed, there were 79 function evaluations per iteration, of which 78 were used for finite differences. This inspired the first enhancement.
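The naive version amounts to a single call, with pixelError as sketched earlier and the single-camera calibration as the starting point p0:

    % Naive minimization: MATLAB estimates the full Jacobian by finite
    % differences, costing one extra evaluation per parameter per iteration.
    [p, resnorm] = lsqnonlin(@(q) pixelError(q, X, uv, pointToCamera), p0);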

Structure of the Jacobian

The pixel error for any one observed 3D point depends only on the camera that observed it; it is independent of the parameters of other cameras and of the parameters describing other images seen by the same camera. Each row of the Jacobian therefore has very few non-zero partial derivatives, and the Jacobian is sparse, with well-structured blocks of non-zero elements. We can increase efficiency by passing the structure of the Jacobian to MATLAB so that it does not waste function evaluations estimating partial derivatives known to be zero. This is accomplished by setting the 'JacobPattern' option passed to lsqnonlin.
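A sketch of how the pattern might be built and passed in; paramIndices is a hypothetical helper returning the (at most 18) parameter columns that affect observation i:

    % Non-zero structure of the Jacobian: rows 2i-1 and 2i are the x and y
    % pixel errors of point i, which depend only on that camera's parameters
    % and the pose of the grid in that image.
    Jpat = sparse(2*nPoints, nParams);
    for i = 1:nPoints
        cols = paramIndices(i);
        Jpat(2*i-1 : 2*i, cols) = 1;
    end
    opts = optimset(opts, 'JacobPattern', Jpat);
    [p, resnorm] = lsqnonlin(@(q) pixelError(q, X, uv, pointToCamera), ...
                             p0, [], [], opts);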

The results are encouraging: the number of function evaluations drops to 19 per iteration, consistent with the fact that there are at most 18 non-zero entries in any row of the Jacobian. With this option, the minimization proceeds until the limit on iterations is reached, and we even see a slightly smaller residual in the MATLAB output.

Analytic Derivatives

Encouraged by this improvement, I decided to go the whole way and provide analytic expressions for the partial derivatives. While the objective function is relatively straightforward to code up, its partial derivatives certainly are not! This was by far the most time-consuming portion of the project: it took about 75% of the total time and accounts for 50% of the total code. The 'DerivativeCheck' option, which compares programmer-supplied derivatives to those estimated via finite differencing, was extremely helpful in debugging, but even so it often took hours to track down the sources of error. Fortunately, I ended up with a reasonably modular, bug-free implementation.
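With the 'Jacobian' option set to 'on', lsqnonlin expects the objective itself to return the Jacobian as a second output. A sketch of the scaffolding; analyticJacobian is a hypothetical helper standing in for the hand-derived partial derivatives that make up half our code:

    % Objective returning both residuals and the analytic Jacobian.
    function [F, J] = pixelErrorWithJac(p, X, uv, pointToCamera)
        F = pixelError(p, X, uv, pointToCamera);
        J = analyticJacobian(p, X, pointToCamera);   % sparse 2M x N
    end

    % Use our J instead of finite differences; while debugging, have MATLAB
    % compare it against a finite-difference estimate.
    opts = optimset(opts, 'Jacobian', 'on', 'DerivativeCheck', 'on');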

Since the difference between the analytic derivatives and those estimated by finite differencing was of the order of 10^-5, I was not surprised that there was no increase in accuracy. The good news is that a programmer-supplied Jacobian permits us to speed up the computation by exploiting its sparsity. This will be important as the size of the problem increases with more cameras.

Exploiting Sparsity

As a final improvement, I decided to make MATLAB exploit the sparsity of the Jacobian in matrix multiplication. The minimization requires the Jacobian only to compute products of the form J*Y, J'*Y, and J'*(J*Y). MATLAB accepts a user-specified function for evaluating these products, so the full Jacobian need never be formed during the minimization. I implemented a sparse representation of the Jacobian and a function to compute the required products, and passed them to the minimization by enabling the 'JacobMult' option. As expected, this runs much faster.
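The multiply function has a fixed signature: it receives Jinfo (whatever the objective returns in place of the full Jacobian), a matrix Y, and a flag selecting which product to form. A minimal sketch, assuming Jinfo is simply our sparse Jacobian:

    % Jacobian-multiply function for the 'JacobMult' option.
    % flag > 0: W = J*Y;  flag < 0: W = J'*Y;  flag == 0: W = J'*(J*Y).
    function W = jacMult(Jinfo, Y, flag)
        J = Jinfo;                 % here, the sparse Jacobian itself
        if flag > 0
            W = J * Y;
        elseif flag < 0
            W = J' * Y;
        else
            W = J' * (J * Y);
        end
    end

    % Enable it; lsqnonlin never forms the dense Jacobian.
    opts = optimset(opts, 'JacobMult', @jacMult);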

This is a tremendous win, both in time and in space. The full Jacobian is M x N, where M grows with the number of observations we are fitting and N is a constant plus 12 times the number of cameras. The sparse Jacobian, however, has at most 18 non-zero entries per row, or 18M in total - a bound that does not grow as we add cameras, beyond the fact that more cameras mean more observations. Since flop counts are no longer available in MATLAB, I cannot give an exact measure of the speedup, but it is very likely to be a major one when we calibrate the entire array of 128 cameras.


Experiments and Results

The following table summarizes our results and how they were affected by the improvements described above. The data set was from the four-camera prototype: each camera took six images, for a total of 2940 observed points. The time taken per iteration is approximate.

Implementation                     Normalized Residual Error   Time per Iteration   Termination                               MATLAB Output
Naive, unoptimized                 0.3929 pixels               7.1 sec              Exceeded number of function evaluations   naive.txt
Used Jacobian structure            0.3908 pixels               2.5 sec              Exceeded maximum number of iterations     jpattern.txt
Jacobian evaluated analytically    0.3905 pixels               0.53 sec             Exceeded maximum number of iterations     sparsity.txt



Calibration error, corresponding to the third row of the table above: the residual error for each observation, plotted in a different color for each camera.


Conclusions

The primary objective of this project was to build a system capable of accurate, scalable global calibration, and this has largely been accomplished. In the process, I learnt much about writing useful nonlinear optimizations and some good enhancements. The accuracy of this camera calibration is reasonable; I believe most of the residual error is due to imperfect feature detection and localization. Our automated feature detector, which provides the input data to which we fit, has an accuracy of about 0.3 pixels.

Unfortunately, the construction of the camera array took longer than I anticipated, so I was not able to run the optimization on all 128 cameras. However, when the array is finished, the calibration code will be ready for it.


References