Problems requiring accurate determination of parameters from
image-based quantities arise often in computer vision. Two recent,
independently developed frameworks for estimating such parameters are
the FNS scheme of the authors, and the HEIV scheme of Leedan \& Meer.
In this paper, it is shown that the two schemes constitute intimately
related but different means of numerically solving a common underlying
equation characterising the minimiser. The analysis is driven by the
search for a non-degenerate form of a certain generalised eigenvalue
problem, and this effectively leads to a new derivation of the HEIV
algorithm. This work may be seen as an extension of the authors'
previous efforts to rationalise and inter-relate a spectrum of
estimators, including the renormalisation method of Kanatani and the
normalised eight-point method of Hartley.
The paper gives a novel explanation for the improvement in
performance of the stereo-vision eight-point algorithm that results
from using normalised data. It is first established that the
normalised algorithm acts to minimise a specific cost function. It
is then shown that this cost function is statistically better
founded than the cost function associated with the non-normalised
algorithm. This supersedes the standard argument that improved
performance is due to the better conditioning of a pivotal matrix.
Experimental results are given that support the shift in argument
from numerical stability to statistical soundness as a means of
rationalising performance. This work continues a wider effort to
place a variety of estimation techniques within a coherent
framework.
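As an illustration of the data normalisation discussed in the preceding abstract, the following minimal sketch applies the translate-and-scale recipe commonly attributed to Hartley: move the centroid of the image points to the origin and scale so that the mean distance from the origin is sqrt(2). The abstract does not spell out the normalisation, so this recipe, the function name `normalise_points`, and the sample coordinates are assumptions made for illustration only.

```python
import math

def normalise_points(points):
    """Translate 2-D points so their centroid is at the origin and scale
    them so their mean distance from the origin is sqrt(2).

    Assumes the points do not all coincide (mean distance must be non-zero).
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    mean_dist = sum(math.hypot(x - cx, y - cy) for x, y in points) / n
    s = math.sqrt(2.0) / mean_dist
    # 3x3 similarity transform acting on homogeneous coordinates (x, y, 1).
    T = [[s, 0.0, -s * cx],
         [0.0, s, -s * cy],
         [0.0, 0.0, 1.0]]
    normalised = [(s * (x - cx), s * (y - cy)) for x, y in points]
    return normalised, T

# Hypothetical pixel coordinates, chosen only to exercise the function.
pts = [(100.0, 200.0), (400.0, 250.0), (320.0, 80.0), (150.0, 420.0)]
norm_pts, T = normalise_points(pts)
```

In the usual formulation, each image is normalised with its own transform, and an estimate computed from the normalised data is mapped back to the original coordinates using those transforms.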
In recent work the authors proposed a wide-ranging method for
estimating parameters that constrain image feature locations and
satisfy a constraint not involving image data. The present work
illustrates the use of the method with experiments concerning
estimation of the fundamental matrix. Results are given for both
synthetic and real images. It is demonstrated that the method gives
results commensurate with, or superior to, previous approaches, with
the advantage of being fast.
Previous work of the authors developed a theoretically well-founded
scheme (FNS) for finding the minimiser of a class of cost functions.
Various problems in video analysis, stereo vision, ellipse-fitting,
etc., may be expressed in terms of finding such a minimiser.
However, in common with many other approaches, it is necessary to
correct the minimiser as a post-process if an ancillary constraint
is also to be satisfied. In this paper we develop the first
integrated scheme (CFNS) for simultaneously minimising the cost
function and satisfying the constraint. Preliminary experiments in
the domain of fundamental-matrix estimation show that CFNS generates
rank-2 estimates with smaller cost function values than rank-2
corrected FNS estimates. Furthermore, when compared with the
Hartley-Zisserman Gold Standard method, CFNS is seen to generate
results of comparable quality in a fraction of the time.
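For orientation, one widely used way to impose the rank-2 constraint as a post-process is to truncate the singular value decomposition of the unconstrained estimate. This is a sketch of that generic correction only; the abstract does not say which correction is applied to the FNS estimates, and the symbols $\widehat{F}$, $U$, $V$, and $\sigma_i$ are introduced here for illustration.

```latex
% Unconstrained estimate and its singular value decomposition
\widehat{F} = U \,\mathrm{diag}(\sigma_1, \sigma_2, \sigma_3)\, V^{\top},
\qquad \sigma_1 \ge \sigma_2 \ge \sigma_3 \ge 0 .

% Nearest rank-2 matrix in the Frobenius norm: zero the smallest singular value
\widehat{F}_{\mathrm{rank\,2}} = U \,\mathrm{diag}(\sigma_1, \sigma_2, 0)\, V^{\top} .
```

A correction of this kind satisfies the constraint but generally increases the value of the cost function, which is the motivation for handling minimisation and constraint together.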
Many parameter estimation methods used in computer vision
are able to utilise covariance information describing the uncertainty of
data measurements. This paper considers the value of this information
to the estimation process when applied
to measured image point locations. Covariance
matrices are first described and a procedure is then
outlined whereby covariances may be associated with image
features located via a measurement process. An empirical
study is made of the conditions under which covariance
information enables generation of improved parameter
estimates. Also explored is the extent to which the noise
should be anisotropic and inhomogeneous if improvements are
to be obtained over covariance-free methods.
Critical in this is the devising of synthetic
experiments under which noise conditions can be
precisely controlled. Given that covariance information
is, in itself, subject to estimation error, tests are
also undertaken to determine the impact of
imprecise covariance information upon the quality
of parameter estimates. Finally, an experiment
is carried out to assess the value
of covariances in estimating the fundamental matrix from real images.
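The kind of controlled synthetic noise mentioned in the preceding abstract can be sketched as follows: each true image point is perturbed by zero-mean Gaussian noise drawn from its own 2x2 covariance matrix, so the noise is anisotropic (different variances along different directions) and inhomogeneous (different covariances at different points). The helper names, the sample covariances, and the use of a Cholesky factor for sampling are assumptions for illustration, not the authors' experimental procedure.

```python
import math
import random

def cholesky_2x2(cov):
    """Return lower-triangular L with L L^T == cov for a symmetric,
    positive-definite 2x2 covariance matrix."""
    (a, b), (_, c) = cov
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(c - l21 * l21)
    return [[l11, 0.0], [l21, l22]]

def perturb(point, cov, rng):
    """Add zero-mean Gaussian noise with covariance `cov` to a 2-D point."""
    L = cholesky_2x2(cov)
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    # Colour the independent standard normal samples: noise = L z.
    return (point[0] + L[0][0] * z1,
            point[1] + L[1][0] * z1 + L[1][1] * z2)

rng = random.Random(0)  # fixed seed so the experiment is repeatable
true_points = [(10.0, 20.0), (30.0, 5.0)]
# A different covariance per point gives inhomogeneous noise;
# unequal variances and a non-zero off-diagonal term give anisotropy.
covariances = [
    [[4.0, 1.0], [1.0, 0.5]],
    [[0.2, 0.0], [0.0, 9.0]],
]
noisy_points = [perturb(p, c, rng) for p, c in zip(true_points, covariances)]
```

Imprecise covariance information could be simulated in the same setting by passing an estimator a deliberately perturbed copy of each matrix in `covariances`.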
The renormalisation technique of Kanatani is intended to iteratively
minimise a cost function of a certain form while avoiding systematic
bias inherent in the common method of minimisation due to Sampson.
Within the vision community, the technique
has generally been perceived as somewhat
controversial and impenetrable. This work presents
an alternative, simpler derivation of the technique, along
with new insights that place it in the context of other approaches.
We first show that the minimiser of the cost function must satisfy a
special variational equation. A Newton-like, fundamental numerical
scheme is presented with the property that its theoretical
limit coincides with the minimiser. Standard statistical
techniques are then employed to derive afresh several renormalisation
schemes. The fundamental scheme proves pivotal
in the rationalising of the renormalisation and
other schemes, and enables us to show that the renormalisation
schemes do not have as their theoretical limit the desired
minimiser. The various minimisation schemes are finally subjected
to a rigorous performance analysis.
We consider the problem of estimating parameters of a model described by an equation of special form. Specific models arise in the analysis of a wide class of computer vision problems, including conic fitting and estimation of the fundamental matrix. We assume that noisy data are accompanied by (known) covariance matrices characterising the uncertainty of the measurements. A cost function is first obtained by considering a maximum likelihood formulation, and applying certain necessary approximations that render the problem tractable. A novel, Newton-like iterative scheme is then generated for determining a minimiser of the cost function. Unlike alternative approaches such as Sampson's method or the renormalisation technique, the new scheme has as its theoretical limit the minimiser of the cost function. Furthermore the scheme is simply expressed, efficient, and unsurpassed as a general technique in our testing. An important feature of the method is that it can serve as a basis for conducting theoretical comparison of various estimation approaches.
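The structure described in the preceding abstract can be sketched in symbols. The notation below ($\theta$ for the parameter vector, $A_i$ and $B_i$ for symmetric matrices built from the $i$-th datum and its covariance, $X_\theta$ for the resulting matrix) is not defined in the abstract and is assumed here, following the form in which this cost function is usually written; it should be read as an outline rather than a full statement of the method.

```latex
% Approximated maximum likelihood cost function
J(\theta) = \sum_{i=1}^{n} \frac{\theta^{\top} A_i \,\theta}{\theta^{\top} B_i \,\theta}

% Its gradient vanishes at the minimiser, giving the variational equation
\partial_{\theta} J(\theta) = 2\, X_{\theta}\, \theta = 0,
\qquad
X_{\theta} = \sum_{i=1}^{n} \frac{A_i}{\theta^{\top} B_i \,\theta}
           - \sum_{i=1}^{n} \frac{\theta^{\top} A_i \,\theta}{(\theta^{\top} B_i \,\theta)^{2}}\, B_i .

% Newton-like iteration: given \theta_k, take \theta_{k+1} to be a unit
% eigenvector of X_{\theta_k} whose eigenvalue is closest to zero
X_{\theta_k}\, \theta_{k+1} = \lambda_k\, \theta_{k+1},
\qquad \|\theta_{k+1}\| = 1,
\qquad |\lambda_k| \ \text{minimal}.
```

A fixed point of this iteration with $\lambda = 0$ satisfies $X_\theta \theta = 0$, which is why the theoretical limit of the scheme coincides with a solution of the variational equation.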
The optical flow observed by a moving camera satisfies, in the absence of noise, a special equation analogous to the epipolar constraint arising in stereo vision. Computing the ``flow fundamental matrix'' of this equation is an essential prerequisite to undertaking 3-D analysis of the flow. This paper presents an optimal formulation of the problem of estimating this matrix under an assumed noise model. This model admits independent Gaussian noise that is not necessarily isotropic or homogeneous. A theoretical bound is derived for the accuracy of the estimate. An algorithm is then devised that employs a technique called renormalization to deliver an estimate and then corrects the estimate so as to satisfy a particular decomposability condition. The algorithm also provides an evaluation of the reliability of the estimate. Epipoles and their associated reliabilities are computed in both simulated and real-image experiments. Experiments indicate that the algorithm delivers results in the vicinity of the theoretical accuracy bound.
We are concerned with solving an equation whose form is applicable to a wide class of problems arising in computer vision. The equation typically relates image point locations to the parameters of some appropriate model. We assume that each measured datum is accompanied by a covariance matrix that characterises the uncertainty of the measurement. Noisy data are assumed to be in plentiful supply, implying that our problem is overdetermined. To tackle noise, the problem is transformed to one of least squares minimisation. In this sense, we are concerned with fitting a surface to data and their covariances. Examples are given of computer vision problems whose forms constitute instances of our general equation. The paper has two principal concerns: the establishing of a suitable cost function for our general problem, and the deriving of effective schemes for minimising the cost function. A weighted least squares (WLS) cost function is obtained by considering an optimal maximum likelihood formulation, and applying certain necessary approximations that render the problem tractable. A new and fundamental Newton-like iterative scheme is then generated for directly minimising the WLS cost function. This proves valuable in the deriving afresh of various existing and modified schemes, and helps us to show that the renormalisation approaches of Kanatani do not theoretically act to minimise the WLS cost function. A portion of this work serves to rationalise renormalisation, and several new variations on the theme are proposed. Various minimisation schemes are then tested. Experiments are carried out on the benchmark conic fitting problem of estimating ellipses from synthetic data points and their covariances. When the data exhibit noise that is anisotropic and inhomogeneous, those methods that make use of covariance information perform markedly better than more traditional methods that do not. None of the methods outperforms the fundamental scheme. Thus, being in addition simply expressed and constituting a genuine minimiser of the WLS cost function, the fundamental scheme offers strong advantages over the alternatives considered.
A recursive method is presented for recovering 3D object shape and
camera motion under orthography from an extended sequence of video images.
This may be viewed as a natural extension of both the original (Tomasi
and Kanade 1994) and the sequential (Morita and Kanade 1997) factorization
methods. A critical aspect of these factorization approaches is the estimation
of the so-called shape space (Morita and Kanade 1997), and they may in
part be characterised by the manner in which this subspace is computed.
If P points are tracked through F frames, the proposed recursive least-squares
method updates the shape space with complexity O(P) per frame. In contrast,
the sequential factorization method updates the shape space with complexity
O(P^2) per frame. The original factorization method is intended to be
used in batch mode using points tracked across all available frames. It
effectively computes the shape space with complexity O(FP^2) after F frames.
Unlike other methods, the recursive approach does not require the estimation
or updating of a large measurement or covariance matrix. Experiments with
real and synthetic image sequences confirm the recursive method's low computational
complexity and accuracy, and indicate that it is well suited to real-time
applications.
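For readers unfamiliar with the factorization setting referred to in the preceding abstract, the underlying rank constraint can be sketched as follows. The symbols $\widetilde{W}$, $M$, and $S$ are assumed notation in the style of the original factorization method and are not taken from the abstract itself.

```latex
% P points tracked through F frames under orthography: after subtracting
% the per-frame centroid, the 2F x P measurement matrix factors as
\widetilde{W} = M\, S,
\qquad M \in \mathbb{R}^{2F \times 3} \ \text{(camera motion)},
\qquad S \in \mathbb{R}^{3 \times P} \ \text{(object shape)},

% so that, in the absence of noise,
\operatorname{rank}\,\widetilde{W} \le 3 .

% The shape space is the (at most) three-dimensional row space of
% \widetilde{W}, a subspace of \mathbb{R}^{P}.
```

Each new frame contributes two rows to $\widetilde{W}$, so the methods compared in the abstract differ in how they maintain an estimate of this three-dimensional subspace as rows arrive.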
We consider the problem of metrically reconstructing a scene viewed
by a moving stereo head. The head comprises two cameras with coplanar optical
axes arranged on a lateral rig, each camera being free to vary its angle
of vergence. Under various constraints, we derive novel explicit forms
for the epipolar equation, and show that a static stereo head constitutes
a degenerate camera configuration for carrying out self-calibration. The
situation is retrieved by consideration of a stereo head undergoing ground
plane motion, and new closed-form solutions for self-calibration are derived.
An error analysis reveals that reconstruction is adversely affected by
inward-facing camera vergence angles that are similar in value, and by
a principal point location whose horizontal component is in error. It is
also shown that the adoption of domain-specific robust techniques for computation
of the fundamental matrix can significantly improve the quality of scene
reconstruction. Experiments conducted with dynamic stereo head images confirm
that avoidance of near-degenerate configurations and use of robustness
techniques are essential if reliable reconstructions are in future to be
attained.
Robust techniques are developed for determining structure from motion
in the uncalibrated case. The structure recovery is based on previous work
of the authors in which it was shown that a camera undergoing unknown motion
and having an unknown, and possibly varying, focal length can be self-calibrated
via closed-form expressions in the entries of two matrices derivable from
an instantaneous optical flow field. Critical to the recovery process is
the obtaining of accurate numerical estimates, up to a scalar factor, of
these matrices in the presence of noisy optical flow data. We present techniques
for the determination of these matrices via least-squares methods, and
also a way of enforcing a dependency constraint that is imposed on these
matrices. A method for eliminating outlying flow vectors is also given.
Results of experiments with real-image sequences are presented that suggest
that the approach holds promise.
Detecting background changes in scenes containing significant numbers
of moving objects has several applications in video surveillance. One important
example is the detection of suspicious packages left in busy airport terminals
or train stations. This paper outlines a statistical approach to automatically
detecting long term changes to the stationary component of a scene, and
describes a prototype system which has been used to successfully demonstrate
the feasibility of this approach.
In this paper, we analyse the motion of a camera having free intrinsic
parameters. We define a free parameter to be one that is unknown and may
vary continuously. A time-dependent epipolar equation is presented, followed
by a formal definition of the time-derivative of the fundamental matrix
for the case of a mobile camera. Next, differential forms of the epipolar
equation are obtained. This may be seen as a recasting of the recent work
of Vieville and Faugeras into an analytical framework. Critical to the
approach is the determination, to within a common scalar factor, of two
special matrices from optical flow data. The case of a camera with free
focal length undergoing arbitrary motion is then considered in detail.
Closed-form expressions are given, in terms of the entries of the two matrices,
for the ego-motion parameters, as well as the focal length and its derivative.
Various computational techniques have been developed that perform
reasonably well in inferring shape from shading. However, these techniques
typically require substantial prerequisite information if they are to evolve
an estimate of surface shape. It is therefore interesting to consider how
depth might be inferred from shading information without prior knowledge
of various scene conditions. One approach has been to undertake a pre-processing
step of estimating the light-source direction, thereby providing input
to the computation of shape from shading. In this paper, we present evidence
that a versatile light-source-direction estimator is unattainable, and
propose that, in the absence of domain-specific knowledge, shape and light-source
direction should be determined in a coupled manner.
This paper examines the pioneering method of Pentland for automatically
estimating the direction of the ``sun'' from a single image. It is shown
that, under the assumptions used in the derivation of the method, the estimate
of source direction is erroneous. Specifically, it is shown that an image-based
expression used in calculating source direction diverges to infinity as
the density of image points is increased, and that the formula involving
this expression is therefore incorrect. When the method is implemented,
the flaw manifests itself in the undesirable dependence of the estimator
upon image resolution. Supporting experimental evidence is given for this.
An alternative source-direction estimator is proposed which is free of
these drawbacks.