Graph cuts in computer vision

As applied in the field of computer vision, graph cuts can be employed to efficiently solve a wide variety of low-level computer vision problems (early vision), such as image smoothing, the stereo correspondence problem, image segmentation, and many other computer vision problems that can be formulated in terms of energy minimization. Such energy minimization problems can be reduced to instances of the maximum flow problem in a graph (and thus, by the max-flow min-cut theorem, define a minimal cut of the graph). Under most formulations of such problems in computer vision, the minimum energy solution corresponds to the maximum a posteriori estimate of a solution. Although many computer vision algorithms involve cutting a graph (e.g., normalized cuts), the term "graph cuts" is applied specifically to those models which employ a max-flow/min-cut optimization (other graph cutting algorithms may be considered as graph partitioning algorithms).

"Binary" problems (such as denoising a binary image) can be solved exactly using this approach; problems where pixels can be labeled with more than two different labels (such as stereo correspondence, or denoising of a grayscale image) cannot be solved exactly, but solutions produced are usually near the global optimum.

History

The theory of graph cuts was first applied in computer vision in the seminal paper by Greig, Porteous and Seheult of Durham University. In the Bayesian statistical context of smoothing noisy (or corrupted) images, they showed how the maximum a posteriori estimate of a binary image can be obtained exactly by maximizing the flow through an associated image network, involving the introduction of a source and sink. The problem was therefore shown to be efficiently solvable. Prior to this result, approximate techniques such as simulated annealing (as proposed by the Geman brothers), or iterated conditional modes (a type of greedy algorithm as suggested by Julian Besag) were used to solve such image smoothing problems.

Although the general $k$-colour problem remains unsolved for $k > 2$, the approach of Greig, Porteous and Seheult has turned out to have wide applicability in general computer vision problems. Their approach is often applied iteratively to a sequence of binary problems, usually yielding near-optimal solutions.

Notation

Image: $x \in \{R, G, B\}^N$

Output: Segmentation (also called opacity) $S \in \mathbb{R}^N$ (soft segmentation). For hard segmentation, $S \in \{0, 1\}^N$, where 0 denotes background and 1 denotes foreground (the object to be detected).

Energy function: $E(x, S, C, \lambda)$, where $C$ is the color parameter and $\lambda$ is the coherence parameter.

$E(x, S, C, \lambda) = E_{\text{color}} + E_{\text{coherence}}$

Optimization: The segmentation can be estimated as a global minimum over $S$: $\arg\min_S E(x, S, C, \lambda)$
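To make the notation concrete, the following is a minimal sketch of the energy and its global minimum for a tiny one-dimensional "image". The specific choices are assumptions made for illustration and are not taken from the text: the color term is the squared distance to an assumed mean intensity per label, the coherence term charges $\lambda$ for every pair of adjacent pixels with different labels, and the minimum is found by exhaustive search (feasible only for very small $N$) rather than by a graph cut.

```python
from itertools import product

def energy(x, S, C, lam):
    """E(x, S, C, lam) = E_color + E_coherence for a 1D hard segmentation."""
    # E_color (assumed form): squared distance of each pixel to the mean of its label.
    e_color = sum((xi - C[si]) ** 2 for xi, si in zip(x, S))
    # E_coherence (assumed form): lam for each adjacent pair with different labels.
    e_coherence = lam * sum(1 for a, b in zip(S, S[1:]) if a != b)
    return e_color + e_coherence

def argmin_segmentation(x, C, lam):
    """Exhaustive search over all 2^N hard segmentations (tiny N only)."""
    return min(product((0, 1), repeat=len(x)),
               key=lambda S: energy(x, S, C, lam))

x = [0.1, 0.2, 0.9, 0.15, 0.8, 0.9]   # toy intensities (made up)
C = {0: 0.1, 1: 0.9}                  # assumed mean intensity per label

smooth = argmin_segmentation(x, C, lam=1.0)   # strong coherence
rough = argmin_segmentation(x, C, lam=0.1)    # weak coherence
print(smooth, rough)
```

With the larger $\lambda$ the isolated dark pixel at index 3 is absorbed into the foreground run, while with the smaller $\lambda$ each pixel simply takes the label whose mean is closest.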

Existing methods

Standard Graph cuts: optimize energy function over the segmentation (unknown S value).

Iterated Graph cuts:

First step optimizes over the color parameters using K-means.
Second step performs the usual graph cuts algorithm.

These two steps are repeated iteratively until convergence.

Dynamic graph cuts:
Allows the algorithm to be re-run much faster after the problem has been modified (e.g. after new seeds have been added by a user).

Energy function

$\Pr(x \mid S) = K^{-E}$, where the energy $E$ is composed of two different models ($E_{\text{color}}$ and $E_{\text{coherence}}$):

Likelihood / Color model / Regional term

$E_{\text{color}}$: unary term describing the likelihood of each color.

This term can be modeled using different local (e.g. texons) or global (e.g. histograms, GMMs, AdaBoost likelihood) approaches, some of which are described below.

Histogram

We use intensities of pixels marked as seeds to get histograms for object (foreground) and background intensity distributions: P(I|O) and P(I|B).

Then, we use these histograms to set the regional penalties as negative log-likelihoods.
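The two steps above can be sketched as follows. The bin count, the small probability floor used to avoid taking the logarithm of zero, the function names, and the seed values are all assumptions chosen for illustration.

```python
import math
from collections import Counter

def histogram_model(seed_intensities, n_bins=8, max_value=256, eps=1e-6):
    """Normalized intensity histogram, floored at eps to avoid log(0)."""
    counts = Counter(v * n_bins // max_value for v in seed_intensities)
    total = len(seed_intensities)
    return [max(counts.get(b, 0) / total, eps) for b in range(n_bins)]

def regional_penalties(intensity, p_obj, p_bkg, n_bins=8, max_value=256):
    """Return (-ln P(I|O), -ln P(I|B)) for a single pixel intensity."""
    b = intensity * n_bins // max_value
    return -math.log(p_obj[b]), -math.log(p_bkg[b])

object_seeds = [200, 210, 220, 230, 190, 205]    # bright seed pixels (made up)
background_seeds = [20, 30, 25, 40, 35, 60]      # dark seed pixels (made up)

p_obj = histogram_model(object_seeds)            # P(I|O)
p_bkg = histogram_model(background_seeds)        # P(I|B)

# A bright pixel is cheap to label "object" and expensive to label "background".
r_obj, r_bkg = regional_penalties(215, p_obj, p_bkg)
print(r_obj, r_bkg)
```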

GMM (Gaussian Mixture Model)

Two distributions are usually used: one for background pixels and one for foreground pixels.

Each of the two distributions is modeled with a Gaussian mixture model (with 5–8 components).

Goal: Try to pull those two distributions apart.
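As a rough illustration of how a fitted mixture turns into a unary penalty, the sketch below evaluates the negative log-likelihood of a scalar intensity under a one-dimensional GMM. The mixture parameters are invented and are not fitted to any data, only two components per model are used for brevity instead of the 5–8 mentioned above, and real color models would be three-dimensional.

```python
import math

def gmm_neg_log_likelihood(value, weights, means, variances):
    """-ln p(value) under a 1D Gaussian mixture with the given parameters."""
    density = sum(
        w * math.exp(-((value - m) ** 2) / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)
        for w, m, v in zip(weights, means, variances)
    )
    # Floor the density so that extremely unlikely values do not give log(0).
    return -math.log(max(density, 1e-300))

# Hypothetical foreground and background models (weights, means, variances).
foreground = ([0.6, 0.4], [200.0, 230.0], [100.0, 400.0])
background = ([0.5, 0.5], [30.0, 60.0], [100.0, 225.0])

pixel = 210.0
cost_fg = gmm_neg_log_likelihood(pixel, *foreground)
cost_bg = gmm_neg_log_likelihood(pixel, *background)
print(cost_fg, cost_bg)
```

The better separated the two mixtures are, the larger the gap between the two penalties, which is what makes the unary term informative.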

Texon

A texon (or texton) is a set of pixels that has certain characteristics and is repeated in an image.

Steps:

Determine a good natural scale for the texture elements.
Compute non-parametric statistics of the model-interior texons, either on intensity or on Gabor filter responses.

Examples:

Deformable-model based Textured Object Segmentation

Contour and Texture Analysis for Image Segmentation

Prior / Coherence model / Boundary term

$E_{\text{coherence}}$: binary (pairwise) term describing the coherence between neighboring pixels.

In practice, for 2D images, pixels are defined as neighbors if they are adjacent horizontally or vertically (4-way connectivity), or horizontally, vertically or diagonally (8-way connectivity).

Costs can be based on the local intensity gradient, Laplacian zero-crossings, gradient direction, a color mixture model, and so on.
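The two neighborhood systems can be enumerated as in the sketch below. The function name and the row/column coordinate convention are assumptions; each unordered pair is listed once by only looking "forward" from every pixel.

```python
def neighbor_pairs(height, width, connectivity=4):
    """List every unordered pair of neighboring pixels exactly once."""
    if connectivity == 4:
        offsets = [(0, 1), (1, 0)]                    # right, down
    elif connectivity == 8:
        offsets = [(0, 1), (1, 0), (1, 1), (1, -1)]   # plus the two diagonals
    else:
        raise ValueError("connectivity must be 4 or 8")
    pairs = []
    for r in range(height):
        for c in range(width):
            for dr, dc in offsets:
                rr, cc = r + dr, c + dc
                if 0 <= rr < height and 0 <= cc < width:
                    pairs.append(((r, c), (rr, cc)))
    return pairs

pairs4 = neighbor_pairs(3, 3, connectivity=4)
pairs8 = neighbor_pairs(3, 3, connectivity=8)
print(len(pairs4), len(pairs8))
```

On a 3×3 image this gives 12 pairs for 4-way connectivity and 20 pairs for 8-way connectivity.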

Different energy functions have been defined:

Standard Markov random field (MRF): Associate a penalty with disagreeing pixels by evaluating the difference between their segmentation labels (a crude measure of the length of the boundaries). See Boykov and Kolmogorov ICCV 2003.

Conditional random field (CRF): If the colors of two neighboring pixels are very different, the border between them might be a good place to put a boundary. See Lafferty et al. 2001; Kumar and Hebert 2003.
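The difference between the two kinds of boundary term can be sketched as follows. The exponential contrast weight used for the CRF-style term is a commonly used form and is an assumption here, not a formula given in the text; `sigma`, `lam`, and the function names are likewise illustrative.

```python
import math

def mrf_weight(i_p, i_q, lam=1.0):
    """MRF-style penalty: constant cost for any label disagreement."""
    return lam

def contrast_weight(i_p, i_q, lam=1.0, sigma=10.0):
    """CRF-style penalty (assumed form): cheap to cut where contrast is high."""
    return lam * math.exp(-((i_p - i_q) ** 2) / (2.0 * sigma ** 2))

def coherence_energy(labels, intensities, pairs, weight):
    """Sum the pairwise penalty over neighbor pairs whose labels differ."""
    return sum(weight(intensities[p], intensities[q])
               for p, q in pairs if labels[p] != labels[q])

intensities = [10, 12, 200, 205]          # toy 1D image with one strong edge
pairs = [(0, 1), (1, 2), (2, 3)]          # adjacent pixels
on_edge = [0, 0, 1, 1]                    # boundary placed on the strong edge
off_edge = [0, 1, 1, 1]                   # boundary placed in a flat region

# Both placements cost the same under the constant penalty ...
print(coherence_energy(on_edge, intensities, pairs, mrf_weight),
      coherence_energy(off_edge, intensities, pairs, mrf_weight))
# ... but the contrast-sensitive penalty prefers the boundary on the edge.
print(coherence_energy(on_edge, intensities, pairs, contrast_weight),
      coherence_energy(off_edge, intensities, pairs, contrast_weight))
```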

Criticism

Graph cuts methods have become popular alternatives to the level set-based approaches for optimizing the location of a contour. However, graph cut approaches have been criticized in the literature for several issues:

Metrication artifacts: When an image is represented by a 4-connected lattice, graph cuts methods can exhibit unwanted "blockiness" artifacts. Various methods have been proposed for addressing this issue, such as using additional edges or formulating the max-flow problem in continuous space.

Shrinking bias: Since graph cuts finds a minimum cut, the algorithm can be biased toward producing a small contour. For example, the algorithm is not well suited to the segmentation of thin objects such as blood vessels, although a fix has been proposed in the literature.

Multiple labels: Graph cuts is only able to find a global optimum for binary labeling (i.e., two labels) problems, such as foreground/background image segmentation. Extensions have been proposed that can find approximate solutions for multilabel graph cuts problems.

Memory: The memory usage of graph cuts increases quickly as the image size increases. As an illustration, the Boykov-Kolmogorov max-flow algorithm v2.2 allocates $24n + 14m$ bytes, where $n$ and $m$ are respectively the number of nodes and edges in the graph. Nevertheless, some recent work has addressed this issue by reducing the graphs before the maximum-flow computation.
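To give a feel for the memory figure quoted above, the sketch below evaluates $24n + 14m$ for a 4-connected image grid. How nodes and edges are counted is an assumption made here: one node per pixel and one edge per pair of neighboring pixels, with terminal links not counted as edges.

```python
def grid_graph_size(height, width):
    """Nodes and neighbor edges of a 4-connected grid (assumed counting)."""
    n = height * width
    m = height * (width - 1) + (height - 1) * width
    return n, m

def bk_memory_bytes(n_nodes, n_edges):
    """Memory estimate 24n + 14m bytes quoted for the v2.2 implementation."""
    return 24 * n_nodes + 14 * n_edges

n, m = grid_graph_size(1000, 1000)
total = bk_memory_bytes(n, m)
print(n, m, total)
```

Under these assumptions a 1000×1000 image needs about 52 MB, and the figure grows linearly with the pixel count and with the number of neighbors per pixel.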

Algorithm

Minimization is done using a standard minimum cut algorithm.

By the max-flow min-cut theorem, the energy minimization can be solved by maximizing the flow over the network. The max-flow problem is defined on a directed graph whose edges are labeled with capacities and which has two distinguished nodes: the source and the sink. Intuitively, the maximum flow is determined by the bottleneck of the network, that is, by the set of edges of smallest total capacity separating the source from the sink (the minimum cut).
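The following sketch shows the whole pipeline on a four-pixel 1D image: build the network, compute the maximum flow, and read the labeling off the minimum cut. It uses the textbook Edmonds-Karp algorithm because it is short; it is not the Boykov-Kolmogorov algorithm. The absolute-difference unary costs, the constant boundary cost `lam`, and all names are assumptions for illustration.

```python
from collections import deque, defaultdict

def max_flow_min_cut(capacity, source, sink):
    """Edmonds-Karp. Returns (flow value, set of nodes on the source side)."""
    residual = defaultdict(lambda: defaultdict(float))
    for u, nbrs in capacity.items():
        for v, c in nbrs.items():
            residual[u][v] += c
            residual[v][u] += 0.0          # make sure the reverse arc exists
    flow = 0.0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            # No path left: the nodes reached form the source side of a min cut.
            return flow, set(parent)
        # The bottleneck is the smallest residual capacity along the path.
        bottleneck, v = float("inf"), sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # Push the bottleneck amount along the path.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck

x = [1, 2, 9, 8]                           # toy intensities (made up)
obj_mean, bkg_mean, lam = 9, 1, 2          # assumed model parameters

capacity = defaultdict(dict)
for p, xp in enumerate(x):
    capacity["s"][p] = abs(xp - bkg_mean)  # cut if p is labeled background
    capacity[p]["t"] = abs(xp - obj_mean)  # cut if p is labeled object
for p in range(len(x) - 1):
    capacity[p][p + 1] = lam               # cut if neighbors disagree
    capacity[p + 1][p] = lam

flow, source_side = max_flow_min_cut(capacity, "s", "t")
labels = [1 if p in source_side else 0 for p in range(len(x))]
print(flow, labels)   # → 4.0 [0, 0, 1, 1]
```

The flow value equals the energy of the returned labeling: unary costs 0 + 1 + 0 + 1 plus one boundary of cost 2.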

Implementation (exact)

The Boykov-Kolmogorov algorithm is an efficient way to compute the max-flow for graphs arising in computer vision.

Implementation (approximation)

The Sim Cut algorithm approximates the graph cut by solving a set of simultaneous non-linear equations based on an analog electrical model of the flow network. The algorithm can be accelerated through parallel computing.

Software

http://pub.ist.ac.at/~vnk/software.html — An implementation of the maxflow algorithm described in "An Experimental Comparison of Min-Cut/Max-Flow Algorithms for Energy Minimization in Computer Vision" by Vladimir Kolmogorov

http://vision.csd.uwo.ca/code/ — some graph cut libraries and MATLAB wrappers

http://gridcut.com/ — fast multi-core max-flow/min-cut solver optimized for grid-like graphs

http://virtualscalpel.com/ — An implementation of the Sim Cut; an algorithm for computing an approximate solution of the minimum s-t cut in a massively parallel manner.

References

Graph cuts in computer vision Wikipedia

(Text) CC BY-SA
