Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials

Philipp Krähenbühl and Vladlen Koltun

Stanford University

NIPS 2011

Most state-of-the-art techniques for multi-class image segmentation and labeling use conditional random fields defined over pixels or image regions. While region-level models often feature dense pairwise connectivity, pixel-level models are considerably larger and have only permitted sparse graph structures. In this paper, we consider fully connected CRF models defined on the complete set of pixels in an image. The resulting graphs have billions of edges, making traditional inference algorithms impractical. Our main contribution is a highly efficient approximate inference algorithm for fully connected CRF models in which the pairwise edge potentials are defined by linear combinations of Gaussian kernels. Our algorithm can approximately minimize fully connected models on tens of thousands of variables in a fraction of a second. Quantitative and qualitative results on the MSRC-21 and PASCAL VOC 2010 datasets demonstrate that full pairwise connectivity at the pixel level produces significantly more accurate segmentations and pixel-level label assignments.

Paper, supplementary material, slides, and poster.

A more general but slightly slower implementation of the project is available for download. The code is released under the BSD license, so feel free to use and modify it. We're happy to hear about applications of this research; please send us an email if you use the code. If you use it for a publication, please cite our paper. If you think you have found a bug in the code, please let us know as well. However, we will not provide assistance with installing, understanding, or running the code.
A slightly faster implementation including a new learning algorithm can be found with our ICML 2013 paper.

Unary potentials:
An implementation of TextonBoost and the more accurate annotations can be found here. If you just want to use our unary potentials, you can find them here.

VOC dataset:
As a clarification to the paper: we use the standard class-average intersection / union accuracy measure for the VOC 2010 dataset. For each class it is computed as "true positives" / ("true positives" + "false positives" + "false negatives"), and the per-class values are then averaged.
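
For readers who want to reproduce the measure, the following is a minimal sketch of the formula above, not the evaluation code used for the paper. It assumes predictions and ground truth are given as flat sequences of integer class labels (for example, all pixels of the evaluation set concatenated). The function name `class_average_iou` and the `ignore_label` parameter are hypothetical, introduced here only for illustration.

```python
from typing import Optional, Sequence


def class_average_iou(pred: Sequence[int],
                      truth: Sequence[int],
                      num_classes: int,
                      ignore_label: Optional[int] = None) -> float:
    """Mean over classes of TP / (TP + FP + FN).

    pred, truth  : flat label sequences of equal length
    num_classes  : labels are assumed to lie in 0..num_classes-1
    ignore_label : ground-truth label to skip (an assumption, e.g. a
                   "void" marker); None disables skipping
    """
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")

    tp = [0] * num_classes  # true positives per class
    fp = [0] * num_classes  # false positives per class
    fn = [0] * num_classes  # false negatives per class

    for p, t in zip(pred, truth):
        if ignore_label is not None and t == ignore_label:
            continue
        if p == t:
            tp[t] += 1
        else:
            fp[p] += 1  # p was predicted, but the pixel is not of class p
            fn[t] += 1  # the pixel is of class t, but t was not predicted

    # Average only over classes that occur in pred or truth, so that
    # absent classes do not cause a division by zero.
    scores = []
    for c in range(num_classes):
        denom = tp[c] + fp[c] + fn[c]
        if denom > 0:
            scores.append(tp[c] / denom)

    if not scores:
        raise ValueError("no valid labels to evaluate")
    return sum(scores) / len(scores)


if __name__ == "__main__":
    pred = [0, 0, 1, 1, 2, 2]
    truth = [0, 1, 1, 1, 2, 0]
    # Per-class scores: 1/3, 2/3, 1/2
    print(class_average_iou(pred, truth, num_classes=3))
```

Note that this sketch skips classes that never appear in either sequence; whether such classes should instead count as zero is a design choice that the formula above does not settle.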