Complementary geometric and optical information for match propagation based 3D reconstruction


Variational techniques for global methods

Variational matching methods stem directly from the literature on the optical flow estimation problem. Several state-of-the-art methods for dense matching and 3D reconstruction use the variational technique, since it produces well-behaved results through error equidistribution [120, 6, 111].
Horn and Schunck [50] introduced the variational matching technique for optical flow. This technique aims to reduce the difference in intensities between corresponding pixels in two images by optimizing the displacement vector [u, v] (see equation 2.5) using the calculus of variations. Following the model for global matching (equations 2.1 and 2.4), the key idea is to use a linearized version of the squared intensity differences as a data term and the gradient of the displacement field as a regularization term. This formulation is expressed in equation 2.6, where α represents a weight for the smoothness term: E(u, v) = ∫∫ |I1(x, y) − I2(x + u, y + v)|² + α(|∇u|² + |∇v|²) dx dy.
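As an illustration, the classical iterative update derived from the Euler-Lagrange equations of this energy can be sketched as follows. This is a minimal NumPy sketch, not the thesis implementation; the function name, the finite-difference derivatives, the periodic local-mean stencil, and the default parameter values are illustrative assumptions.

```python
import numpy as np

def horn_schunck(I1, I2, alpha=10.0, n_iter=100):
    """Minimal Horn-Schunck sketch: minimizes the linearized data term
    (Ix*u + Iy*v + It)^2 plus alpha-weighted smoothness |grad u|^2 + |grad v|^2."""
    # Image derivatives (simple finite differences; a sketch, not a tuned scheme).
    Ix = np.gradient(I1, axis=1)
    Iy = np.gradient(I1, axis=0)
    It = I2 - I1
    u = np.zeros_like(I1)
    v = np.zeros_like(I1)

    def local_mean(f):
        # 4-neighbor average approximating the local mean of the flow field
        # (periodic boundaries via np.roll, for brevity).
        return (np.roll(f, 1, 0) + np.roll(f, -1, 0) +
                np.roll(f, 1, 1) + np.roll(f, -1, 1)) / 4.0

    for _ in range(n_iter):
        u_bar, v_bar = local_mean(u), local_mean(v)
        # Closed-form update from the Euler-Lagrange equations.
        common = (Ix * u_bar + Iy * v_bar + It) / (alpha**2 + Ix**2 + Iy**2)
        u = u_bar - Ix * common
        v = v_bar - Iy * common
    return u, v
```

The update alternates between smoothing the current flow estimate and correcting it along the image gradient, which is exactly the error-equidistribution behavior mentioned above.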

Discrete techniques for global methods

Another direction for solving global matching methods is to search for the solution in a quantized space. Namely, the problem is solved by assigning corresponding labels to each pixel. Common solving methods following this direction include graph cuts, belief propagation and linear programming. The notion behind them is to approximate the continuous problem with a discrete one, losing accuracy but gaining in efficiency and robustness.
Several methods in this category merge a set of candidate solutions pre-computed with other methods and/or with different setups. For example, [103] proposes fusing a set of continuous solutions by computing minimum graph cuts. [127] proposes solving a set of binarized continuous subproblems via minimum graph cuts on an MRF. The motivation in that paper is to overcome the slow convergence that some iterative methods exhibit, by not restricting the flow at each step to local displacements.
Taking advantage of the developments in feature descriptors, the authors of [72] proposed a method that characterizes each pixel using SIFT [74]. The problem is modeled as a minimization of feature distances plus a smoothness term. An L1 norm is used for the data term and a thresholded L1 norm for the smoothness term. The problem is then solved using belief propagation. The overall computation is fast, but with a loss of precision that manifests itself as a staircasing effect in 3D reconstructions.

Model based feature detectors

Model based descriptors aim to provide a formal representation of corner points in an image. One example of such representation is presented in [41], where the author models a corner as a blurred wedge, parameterized by its angle, amplitude and blur.
Features were detected by fitting this model to local image patches. Other proposed models include junctions [102], also detected by fitting a parametric model comprising several homogeneous regions with blurred junctions. Other methods aim, for example, to characterize the scale response of common feature detectors [27]. In general, this breed of methods suffers from an important drawback: high complexity, which makes the design of rich feature detectors difficult and leads to high running times.

Color based feature detectors

In the previously outlined methods, feature detection was performed based only on image intensities, either by differentiation or by directly analyzing the intensities. Color based detectors build on these ideas, combining them with color distinctiveness. Examples include the CSIFT method [1], which presents an extension of the SIFT detector and descriptor based on a color invariance model [38].

Feature descriptors and matching

Ideally, feature descriptors should be distinctive and invariant to geometric and photometric transformations. The simplest descriptor of a region around a feature point, or inside a feature region, is an array of pixels. Arrays of intensity values can be used to match features across images using a correlation score.
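Such a correlation score between two raw intensity patches can be sketched, for instance, as a normalized cross-correlation. This is an illustrative helper, not a method from the text; the function name and patch handling are assumptions.

```python
import numpy as np

def ncc(patch1, patch2):
    """Normalized cross-correlation between two equally sized intensity patches.

    Subtracting the means makes the score invariant to additive brightness
    changes; dividing by the norms makes it invariant to contrast scaling.
    """
    a = patch1 - patch1.mean()
    b = patch2 - patch2.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0
```

The score lies in [−1, 1]: identical patches score 1, and intensity-inverted patches score −1.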

Feature descriptors based on distribution

The SIFT descriptor proposed by D. Lowe [74] basically consists of a normalized vector with 128 values. This vector represents a histogram of orientations of image gradients, weighted by the magnitude of the same gradients. A 4 × 4 grid is constructed around a detected feature point and, inside each cell, the image gradients are classified into eight different bins (4 × 4 × 8 = 128), see figure 3-6. The contribution of each gradient to the histogram depends on its magnitude and on a weight given by a Gaussian function centered at the feature point. The overall extent of the 4 × 4 grid depends on the scale at which the feature was detected.
Since the obtained descriptor has only 128 (constant) dimensions, [74] proposes using nearest-neighbor search to perform feature matching. This returns matches for all detected features, including those that do not actually have a correspondence. Therefore, matches are only kept if the distance to the nearest neighbor is significantly smaller than the distance to the second nearest neighbor.
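This nearest-neighbor matching with the ratio criterion can be sketched as follows. A brute-force NumPy sketch for illustration; the function name and the 0.8 default ratio are assumptions (Lowe's paper discusses a similar threshold, but the exact value here is illustrative).

```python
import numpy as np

def ratio_test_match(desc1, desc2, ratio=0.8):
    """Match each descriptor in desc1 to its nearest neighbor in desc2,
    keeping a match only when the nearest distance is significantly
    smaller than the second-nearest distance (the ratio test)."""
    matches = []
    for i, d in enumerate(desc1):
        # Euclidean distances from descriptor i to every candidate.
        dists = np.linalg.norm(desc2 - d, axis=1)
        order = np.argsort(dists)
        best, second = dists[order[0]], dists[order[1]]
        # Ambiguous matches (best ~ second) are discarded.
        if best < ratio * second:
            matches.append((i, int(order[0])))
    return matches
```

Features without a true correspondence tend to have two near-equal nearest neighbors, so the ratio test filters exactly the spurious matches described above.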

Gradient location-orientation histogram: GLOH

The GLOH [82] descriptor aims to extend SIFT by performing principal component analysis (PCA) [52] to reduce the dimensionality of the feature vectors. It works by computing SIFT-like histograms on a log-polar location grid with 8 bins in the angular direction and 3 bins in the radial direction, which results in 17 location bins (the central bin is not divided into angular directions). Furthermore, gradient orientations are quantized into 16 bins, giving a total of 272 bins. This dimensionality is reduced using PCA by keeping the eigenvectors associated with the 128 (or 64) largest eigenvalues.
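The PCA reduction step can be sketched as follows: project the 272-dimensional descriptors onto the top-k eigenvectors of their covariance matrix. A minimal sketch; the function name and the use of a sample covariance estimated from the descriptor set itself are assumptions (GLOH learns the projection from a training set).

```python
import numpy as np

def pca_reduce(descriptors, k=128):
    """Reduce descriptor dimensionality by projecting onto the k
    eigenvectors of the covariance matrix with the largest eigenvalues."""
    mean = descriptors.mean(axis=0)
    centered = descriptors - mean
    cov = centered.T @ centered / len(descriptors)
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    eigvals, eigvecs = np.linalg.eigh(cov)
    basis = eigvecs[:, -k:]  # keep the top-k eigenvectors
    return centered @ basis
```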


Feature descriptors based on lters

Filter based descriptors are based on the response of three main filter families: Gabor filters [26], steerable filters [32] and complex filters [108]. For example, from a cognitive point of view [78], image decomposition based on Gabor filters is suggested to be relevant to human image understanding. Gabor filters consist of a set of local bandpass functions, which react to orientation and frequency. In the context of complex filters, [108] proposes a filter family that yields 16 different responses, similar to the derivatives of a Gaussian function. The filter used is formulated as: Kmn = (x + iy)^m (x − iy)^n G(x, y), where G(x, y) is the Gaussian function, m + n ≤ 6 and m ≥ n.
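The Kmn family above can be generated directly from its formula. A minimal sketch; the grid size and the Gaussian width sigma are arbitrary illustrative choices not stated in the text.

```python
import numpy as np

def complex_filter_bank(size=15, sigma=2.5):
    """Build the filter family K_mn = (x+iy)^m (x-iy)^n G(x,y)
    for m + n <= 6 and m >= n, which gives 16 filters in total."""
    r = np.arange(size) - size // 2
    x, y = np.meshgrid(r, r)
    G = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))  # Gaussian envelope
    bank = {}
    for m in range(7):
        for n in range(7 - m):       # enforces m + n <= 6
            if m >= n:               # keep only m >= n
                bank[(m, n)] = (x + 1j * y)**m * (x - 1j * y)**n * G
    return bank
```

Counting the admissible (m, n) pairs (1 + 2 + 3 + 4 + 3 + 2 + 1) confirms the 16 responses mentioned in the text.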

LDAHash

The SIFT descriptor described previously represents features using vectors of 128 dimensions. Matching feature points using such descriptors works reasonably well for a few hundred points or less, but becomes a problem for thousands or millions. The LDAHash feature matching method proposed by Strecha et al. [20] mitigates this problem by mapping the descriptors into the Hamming space. Once in this space, the Hamming metric is used to compare the resulting descriptors.
The Hamming space [43] is the set of the 2^N binary strings of length N. The Hamming distance between two strings is the number of positions at which they differ. The LDAHash method first finds an affine mapping function that minimizes in-class covariance and maximizes across-class covariance. Then, it computes a threshold to binarize the mapped descriptors.
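The binarization and comparison steps can be sketched as follows. Here P and t stand in for the learned affine projection and the per-dimension thresholds; they are placeholders, not LDAHash's actual learned parameters, and the function names are illustrative.

```python
import numpy as np

def binarize(descriptors, P, t):
    """Map real-valued descriptors to binary codes: project each
    descriptor with the (learned) affine map P, then threshold each
    output dimension against t. P and t are placeholders here."""
    return (descriptors @ P.T > t).astype(np.uint8)

def hamming(a, b):
    """Hamming distance: the number of bit positions where two
    binary codes differ."""
    return int(np.count_nonzero(a != b))
```

Comparing N-bit codes with `hamming` is far cheaper than a 128-dimensional Euclidean distance, which is the point of mapping descriptors into the Hamming space.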

Feature descriptors based on spin images

Spin images [51] represent the distribution of intensities around a central point in a normalized histogram. One histogram is computed for each of 5 to 10 rings around the central point. They are usually used in conjunction with normalized regions from Harris-Affine or Hessian-Affine feature detectors.
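The ring-wise intensity histograms can be sketched as follows, assuming the patch has already been normalized by one of the affine detectors mentioned above. The function name, ring partition and bin counts are illustrative assumptions.

```python
import numpy as np

def spin_image(patch, n_rings=5, n_bins=8):
    """Per-ring normalized histograms of intensities around the patch
    centre: one histogram row per concentric ring."""
    h, w = patch.shape
    cy, cx = h // 2, w // 2
    yy, xx = np.mgrid[0:h, 0:w]
    r = np.sqrt((yy - cy)**2 + (xx - cx)**2)  # distance to the centre
    r_max = r.max()
    hist = np.zeros((n_rings, n_bins))
    for i in range(n_rings):
        # Pixels whose distance falls inside ring i.
        mask = (r >= i * r_max / n_rings) & (r < (i + 1) * r_max / n_rings)
        vals = patch[mask]
        if vals.size:
            hist[i], _ = np.histogram(vals, bins=n_bins, range=(0.0, 1.0))
            hist[i] /= vals.size  # normalize each ring's histogram
    return hist
```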

Feature descriptors based on color

Color based image feature descriptors aim to use information from color images instead of simply their grayscale versions. For example, [133] presents distribution-based color descriptors: normalized RGB, hue, opponent angle and spherical angle are used to create histograms. In general, combinations of color and shape-based descriptors can provide richer information for matching and recognition [133].

3D reconstruction using image features

Sparse sets of features can provide important information for tasks such as obstacle avoidance and scene calibration, and they are used in the process of automatic scene calibration (e.g. [73, 10, 123]). However, they are often not envisioned for full scene reconstruction. Dense feature descriptors and matching methods (e.g. [124, 20, 72]) propose a way to overcome this limitation by computing per-pixel feature descriptors that are matched across images.

Table of contents :

1 Introduction 
1.1 Multi-view stereo for 3D reconstruction
1.1.1 Classification of Image-based 3D reconstruction methods
1.2 Main contributions and thesis outline
1.2.1 Main contributions
1.2.2 Thesis outline
2 Image matching for 3D reconstruction 
2.1 Introduction
2.2 Local methods
2.3 Global methods
2.3.1 Variational techniques for global methods
2.3.2 Discrete techniques for global methods
2.3.3 Evaluation data-sets
3 Feature detection and matching 
3.1 What are local features?
3.2 Feature detection
3.2.1 Curvature based feature detectors
3.2.2 Intensity based feature detectors
3.2.3 Segmentation based features
3.2.4 Model based feature detectors
3.2.5 Color based feature detectors
3.3 Feature descriptors and matching
3.3.1 Feature descriptors based on distribution
3.3.2 Feature descriptors based on lters
3.3.3 LDAHash
3.3.4 Feature descriptors based on spin images
3.3.5 Feature descriptors based on color
3.4 3D reconstruction using image features
3.4.1 DAISY
3.4.2 SIFT Flow
3.5 Experiments
3.6 Conclusion
4 Variational Image Matching for 3D reconstruction 
4.1 Classical formulation
4.2 Data term
4.2.1 Data term penalizers
4.2.2 What to measure
4.3 Regularization
4.4 Data term linearization
4.4.1 Early data term linearization
4.4.2 Late data term linearization
4.5 Additional cues from scene geometry
4.5.1 Epipolar geometry
4.5.2 Sparse feature matches
4.6 Solving the variational matching problem
4.6.1 Successive over relaxation: SOR
4.6.2 Alternating direction implicit: ADI
4.6.3 Coarse to fine
4.6.4 Multi-grid
4.7 Experiments
4.7.1 Datasets
4.7.2 Methods and criteria
5 Distortion driven variational multi-view reconstruction 
5.1 Introduction
5.2 Geometry driven variational matching
5.2.1 Classical variational matching
5.2.2 Distortion driven variational matching
5.2.3 Results using distortion driven variational matching
5.3 Distortion driven multiple-view merging
5.3.1 Triplet contributions
5.3.2 Pair contribution
5.4 Results and discussion
5.5 Conclusion
6 Propagation-based matching for 3D reconstruction 
6.1 Introduction
6.2 Match propagation for wide-baseline configurations
6.3 Quasi-dense match propagation for wide-baseline configurations
6.3.1 Quasi-Dense Wide Baseline Matching for Three-Views
6.3.2 Multi-view quasi-dense matching for wide-baseline configurations
6.3.3 Results and discussion
6.4 Accurate, Dense, and Robust Multiview Stereopsis
6.4.1 Results and discussion
7 Complementary geometric and optical information for match propagation based 3D reconstruction 
7.1 Introduction
7.2 Geometry based image match propagation
7.2.1 Overview
7.2.2 Propagation
7.2.3 Candidate region querying
7.2.4 Fit surface and update
7.3 Results
7.4 Conclusion and discussion
8 Conclusion and perspective 
8.1 Conclusion
8.2 Perspective
