
Publications

Middleware for Distributed Video Surveillance

Henry Detmold, Anton van den Hengel, Anthony Dick, Katrina Falkner, David Munro, Ron Morrison

Video surveillance networks are a class of sensor networks that serve several purposes, including protecting major facilities from terrorism and other threats. At the hardware level, standard IP networking devices and IP video cameras enable building thousand-camera networks at a reasonable cost. However, monitoring surveillance networks through human inspection is expensive and remarkably ineffective. Trained operators lose concentration and miss a high percentage of significant events after only 10 minutes. Consequently, surveillance users are turning to software for automated video surveillance. Most research in this area concentrates on the computer vision algorithms required to detect and interpret activity in video. Such work is limited to networks of fewer than 100 cameras. We need to address the real-world issues raised by scaling to thousands of cameras and integrating a diverse, evolving collection of surveillance approaches into continuously operating surveillance networks.

IEEE Distributed Systems Online, vol. 9, no. 2, 2008, art. no. 0802-o2001

The paper is available as PDF


Fast global kernel density mode seeking: Applications to localisation and tracking

C. Shen, M.J. Brooks, A. van den Hengel

Tracking objects in video using the mean shift (MS) technique has been the subject of considerable attention. In this work, we aim to remedy one of its shortcomings. MS, like other gradient ascent optimization methods, is designed to find local modes. In many situations, however, we seek the global mode of a density function. The standard MS tracker assumes that the initialization point falls within the basin of attraction of the desired mode. When tracking objects in video this assumption may not hold, particularly when the target’s displacement between successive frames is large. In this case, the local and global modes do not correspond and the tracker is likely to fail. A novel multibandwidth MS procedure is proposed which converges to the global mode of the density function, regardless of the initialization point. We term the procedure annealed MS, as it shares similarities with the annealed importance sampling procedure. The bandwidth of the procedure plays the same role as the temperature in conventional annealing. We observe that an over-smoothed density function with a sufficiently large bandwidth is unimodal. Using a continuation principle, the influence of the global peak in the density function is introduced gradually. In this way, the global maximum is more reliably located. Since it is imperative that the computational complexity is minimal for real-time applications, such as visual tracking, we also propose an accelerated version of the algorithm. This significantly decreases the number of iterations required to achieve convergence. We show on various data sets that the proposed algorithm offers considerable promise in reliably and rapidly finding the true object location when initialized from a distant point.

IEEE Transactions on Image Processing, 16(5), pp. 1457-1469, May 2007

The paper is available as PDF 1.7M
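
The multi-bandwidth idea described in the abstract above can be illustrated with a toy one-dimensional example. This is only a minimal sketch under stated assumptions, not the authors' implementation: the Gaussian kernel, the data points, the bandwidth schedule and the function names `mean_shift` and `annealed_mean_shift` are all chosen here for illustration, and the accelerated variant mentioned in the abstract is not shown.

```python
import math

def mean_shift(points, start, bandwidth, iters=200, tol=1e-9):
    """Plain Gaussian-kernel mean shift in 1D: climbs to a nearby mode
    of the kernel density estimate built from `points`."""
    x = start
    for _ in range(iters):
        weights = [math.exp(-0.5 * ((p - x) / bandwidth) ** 2) for p in points]
        new_x = sum(w * p for w, p in zip(weights, points)) / sum(weights)
        if abs(new_x - x) < tol:
            break
        x = new_x
    return x

def annealed_mean_shift(points, start, bandwidths):
    """Run mean shift over a decreasing bandwidth schedule, starting each
    stage from the point where the previous (smoother) stage converged."""
    x = start
    for h in bandwidths:
        x = mean_shift(points, x, h)
    return x

# Illustrative data (an assumption): a dominant cluster near 0 and a
# smaller, spurious cluster near 10.
points = [-0.4, -0.2, -0.1, 0.0, 0.0, 0.1, 0.2, 0.4, 9.9, 10.0, 10.1]
start = 10.5  # initialised next to the smaller cluster

# A single small bandwidth stays with the nearby local mode.
local_mode = mean_shift(points, start, bandwidth=0.5)

# A large first bandwidth over-smooths the density, so the early stages
# pull the estimate towards the dominant cluster before sharpening.
global_mode = annealed_mean_shift(points, start, bandwidths=[8.0, 4.0, 2.0, 1.0, 0.5])

print("fixed bandwidth:", round(local_mode, 3))
print("annealed:       ", round(global_mode, 3))
```

In this toy setting the bandwidth plays the role of the temperature: the coarse stages decide which basin the estimate ends up in, and the fine stages only refine the location within that basin.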


VideoTrace: Rapid interactive scene modelling from video

A. van den Hengel, A. Dick, T. Thormählen, B. Ward, and P. H. S. Torr

VideoTrace is a system for interactively generating realistic 3D models of objects from video—models that might be inserted into a video game, a simulation environment, or another video sequence. The user interacts with VideoTrace by tracing the shape of the object to be modelled over one or more frames of the video. By interpreting the sketch drawn by the user in light of 3D information obtained from computer vision techniques, a small number of simple 2D interactions can be used to generate a realistic 3D model. Each of the sketching operations in VideoTrace provides an intuitive and powerful means of modelling shape from video, and executes quickly enough to be used interactively. Immediate feedback allows the user to model rapidly those parts of the scene which are of interest and to the level of detail required. The combination of automated and manual reconstruction allows VideoTrace to model parts of the scene not visible, and to succeed in cases where purely automated approaches would fail.

ACM Transactions on Graphics, 26(3), Article No. 86, July 2007

The paper is available as PDF 541kB


Thrift: Local 3D Structure Recognition

Alex Flint, Anthony Dick, Anton van den Hengel

This paper presents a method for describing and recognising local structure in 3D images. The method extends proven techniques for 2D object recognition in images. In particular, we propose a 3D interest point detector that is based on SURF, and a 3D descriptor that extends SIFT. The method is applied to the problem of detecting repeated structure in range images, and promising results are reported.

9th Biennial Conference of the Australian Pattern Recognition Society on Digital Image Computing Techniques and Applications (Dec. 2007 : Glenelg, Australia), DICTA '07, Adelaide, Australia, December 2007.

The paper is available as PDF 1.3MB


Interactive 3D Model Completion

A. van den Hengel, A. Dick, T. Thormaehlen, B. Ward, P.H.S. Torr

A common problem when using automated structure from motion techniques is that the object to be modelled can only be partially reconstructed from the video. This can occur because not all of the object is visible in the video, or because of featureless or ambiguous regions on the object’s surface. In this paper we present an interactive method for rapidly and intuitively generating a complete 3D model from the output of a structure and motion algorithm. The method combines information obtained from the video data with the partial 3D model and user interaction. It is demonstrated on video containing partially seen objects, including planar and curved surfaces, and indoor and outdoor settings.

9th Biennial Conference of the Australian Pattern Recognition Society on Digital Image Computing Techniques and Applications (Dec. 2007 : Glenelg, Australia), DICTA '07, Adelaide, Australia, December 2007.

The paper is available as PDF 1.2MB


A Shape Hierarchy for 3D Modelling from Video

A. van den Hengel, A. Dick, T. Thormaehlen, B. Ward, P. H. S. Torr

This paper describes an interactive method for generating a model of a scene from image data. The method uses the camera parameters and point cloud typically generated by structure-and-motion estimation as a starting point for developing a higher level model, in which the scene is represented as a set of parameterised shapes. Classes of shapes are represented in a hierarchy which defines their properties but also the method by which they are localised in the scene, using a combination of user interaction, sampling and optimisation. Relations between shapes, such as adjacency and alignment, are also specified interactively. The method thus provides a modelling process which requires the user to provide only high level scene information, the remaining detail being provided through geometric analysis of the image set. This mixture of guided, yet automated, fitting techniques allows a non-expert user to rapidly and intuitively create a visually convincing 3D model of a real world scene from an image set.

5th International Conference on Computer Graphics and Interactive Techniques in Australasia and Southeast Asia (GRAPHITE’07), December, 2007

The paper is available as PDF 560kB


RATSAC: a method for adaptive accelerated robust estimation, and its application to video synchronisation

D.W. Pooley, M.J. Brooks, A. van den Hengel

A new method for robust estimation is introduced. The presented algorithm seeks to unify adjustable per-iteration speedup methods with an adaptive assumption regarding the number of inliers. This is achieved by assuming a prior distribution on the true number of inliers, and using Bayesian inference to adjust the speedup whenever a best-so-far model estimate is found. Convincing results are obtained for both synthetic and real cases of the robust synchronisation of video pairs generated by independently moving cameras.

9th Biennial Conference of the Australian Pattern Recognition Society on Digital Image Computing Techniques and Applications (Dec. 2007 : Glenelg, Australia), DICTA '07, Adelaide, Australia, December 2007.

The paper is available as PDF


An adaptive Bayesian technique for tracking multiple objects

P. Kumar, M.J. Brooks, A. van den Hengel

Robust tracking of objects in video is a key challenge in computer vision with applications in automated surveillance, video indexing, human-computer interaction, gesture recognition, traffic monitoring, etc. Many algorithms have been developed for tracking an object in controlled environments. However, they are susceptible to failure when the challenge is to track multiple objects that undergo appearance change due to factors such as variation in illumination and object pose. In this paper we present a tracker based on Bayesian estimation, which is relatively robust to object appearance change, and can track multiple targets simultaneously in real time. The object model for computing the likelihood function is incrementally updated and uses background-foreground segmentation information to ameliorate the problem of drift associated with object model update schemes. We demonstrate the efficacy of the proposed method by tracking objects in image sequences from the CAVIAR dataset.

Int. Conf. Patt. Recog. and Mach. Intell. (ICPRMI '07), Kolkata, India, December 2007.

The paper is available as PDF 2.9M


Finding Camera Overlap in Large Surveillance Networks

Anton van den Hengel, Anthony Dick, Henry Detmold, Alex Cichowski and Rhys Hill

Recent research on video surveillance across multiple cameras has typically focused on camera networks of the order of 10 cameras. In this paper we argue that existing systems do not scale to a network of hundreds, or thousands, of cameras. We describe the design and deployment of an algorithm called exclusion that is specifically aimed at finding correspondence between regions in cameras for large camera networks. The information recovered by exclusion can be used as the basis for other surveillance tasks such as tracking people through the network, or as an aid to human inspection. We have run this algorithm on a campus network of over 100 cameras, and report on its performance and accuracy over this network.

8th Asian Conference on Computer Vision, Tokyo, Japan, November 2007

The paper is available as PDF 600kB


Preparing for Post-catastrophe Video Processing

A. van den Hengel, H. Detmold, A. Dick, R. Hill

The spread of video surveillance systems and video phones means that video cameras are more ubiquitous than ever before. In the event of a catastrophe the video captured by these cameras is an important source of information for those directing the response. This information is currently either ignored because it is seen as inaccessible, filtered through the media, or requires a huge commitment of human resources in order to perform the required processing. We present here an approach towards acquiring and processing this video in order to extract as much value as possible within the shortest time period. The approach presented allows flexible, intelligent video processing to be carried out quickly and securely.

RNSA Security Technology Conference, Melbourne, Australia, September 2007

The paper is available as PDF 166kB


Topology estimation for thousand-camera surveillance networks

Henry Detmold, Anton van den Hengel, Anthony Dick, Alex Cichowski, Rhys Hill, Ekim Kocadag, Katrina Falkner and David S. Munro

Surveillance camera technologies have reached the point whereby networks of a thousand cameras are not uncommon. Systems for collecting and storing the video generated by such networks have been deployed operationally, and sophisticated methods have been developed for interrogating individual video streams. The principal contribution of this paper is a scalable method for processing video streams collectively, rather than on a per camera basis, which enables a coordinated approach to large-scale video surveillance. To realise our ambition of thousand camera automated surveillance networks, we use distributed processing on a dedicated cluster. Our focus is on determining activity topology: the paths objects may take between cameras' fields of view. An accurate estimate of activity topology is critical to many surveillance functions, including tracking targets through the network, and may also provide a means for partitioning of distributed surveillance processing. We present several implementations using the exclusion algorithm to determine activity topology. Measurements reported for the key system component demonstrate scalability to networks with a thousand cameras. Whole-system measurements are reported for actual operation on over a hundred camera streams (this limit is based on the number of cameras and computers presently available to us, not scalability). Finally, we explore how to scale our approach to support multi-thousand camera networks.

First ACM/IEEE International Conference on Distributed Smart Cameras (ICDSC-07), Vienna, Austria, September 2007

The paper is available as PDF 1.14MB


Determining the Translational Speed of a Camera from Time-Varying Optical Flow

Anton van den Hengel, Wojciech Chojnacki, and Michael J. Brooks

Under certain assumptions, a moving camera can be self-calibrated solely on the basis of instantaneous optical flow. However, due to a fundamental indeterminacy of scale, instantaneous optical flow is insufficient to determine the magnitude of the camera’s translational velocity. This is equivalent to the baseline length indeterminacy encountered in conventional stereo self-calibration. In this paper we show that if the camera is calibrated in a certain weak sense, then, by using time-varying optical flow, the velocity of the camera may be uniquely determined relative to its initial velocity. This result enables the calculation of the camera’s trajectory through the scene over time. A closed-form solution is presented in the continuous realm, and its discrete analogue is experimentally validated.

1st International Workshop on Complex Motion, Gunzburg, Germany, Springer-Verlag LNCS 3417, pp. 190-197, 2007

The paper is available as PDF 300kB


Middleware for Video Surveillance Networks

Henry Detmold, Anthony Dick, Katrina Falkner, David Munro, Anton van den Hengel, Ron Morrison

Automated video surveillance networks are a class of sensor networks with the potential to enhance the protection of facilities such as airports and power stations from a wide range of threats. However, current systems are limited to networks of tens of cameras, not the thousands required to protect major facilities. Realising thousand camera automated surveillance networks demands middleware and architectural support; replacing the ad hoc approaches used in current systems with robust and scalable methods. This paper introduces middleware supporting both computation and communication in automated video surveillance networks. The computational approach is based on the Blackboard architectural style, which is widely used in signal processing and AI. Communication on the surveillance network follows the service oriented model, with publish/subscribe messaging; providing scalability, availability and the ability to integrate separately developed surveillance services. The middleware is demonstrated through its application to an important class of surveillance algorithms.

Middleware for Sensor networks (MidSens2006), November, 2006, Melbourne, Australia.

The paper is available as PDF 477kB


Generalised Principal Component Analysis: Exploiting Inherent Parameter Constraints

W. Chojnacki, A. van den Hengel, M. Brooks

Generalised Principal Component Analysis (GPCA) is a recently devised technique for fitting a multi-component, piecewise-linear structure to data that has found strong utility in computer vision. Unlike other methods which intertwine the processes of estimating structure components and segmenting data points into clusters associated with putative components, GPCA estimates a multi-component structure with no recourse to data clustering. The standard GPCA algorithm searches for an estimate by minimising a simple algebraic misfit function. The underlying constraints on the model parameters are ignored. Here we promote a variant of GPCA that incorporates the parameter constraints and exploits constrained rather than unconstrained minimisation of a statistically motivated error function. The output of any GPCA algorithm hardly ever perfectly satisfies the parameter constraints. Our new version of GPCA greatly facilitates the final correction of the algorithm output to satisfy perfectly the constraints, making this step less prone to error in the presence of noise. The method is applied to the example problem of fitting a pair of lines to noisy image points, but has potential for use in more general multi-component structure fitting in computer vision.

Advances in Computer Graphics and Computer Vision International Conferences VISAPP and GRAPP 2006, Setúbal, Portugal, February 25-28, 2006, Revised Selected Papers

The paper is available as PDF 600kB


Rapid Interactive Modelling from Video with Graph Cuts

Anton van den Hengel, Anthony Dick, Thorsten Thormählen, Ben Ward, and Philip H. S. Torr

We present a method for generating a parameterised model of a scene from a set of images. The method is novel in that it uses information from several sources—video, sparse 3D points and user input—to fit models to a scene. The user drives the process by providing selected high-level scene information, for instance selecting an object in the scene, or specifying the relationship between a pair of objects. The system combines this information with image and 3D data to dynamically update its model of the scene. In doing so it avoids common pitfalls of both automatic structure and motion algorithms, and image-based modelling packages.

Eurographics 2006, September 2006, Vienna, Austria.

The paper is available as PDF 1.4M


Scalable Surveillance Software Architecture

Henry Detmold, Anthony Dick, Katrina Falkner, David S. Munro, Anton van den Hengel, Ron Morrison

Video surveillance is a key technology for enhanced protection of facilities such as airports and power stations from various types of threat. Networks of thousands of IP-based cameras are now possible, but current surveillance methodologies become increasingly ineffective as the number of cameras grows. Constructing software that efficiently and reliably deals with networks of this size is a distributed information processing problem as much as it is a video interpretation challenge. This paper demonstrates a software architecture approach to the construction of large scale surveillance network software and explores the implications for instantiating surveillance algorithms at such a scale. A novel architecture for video surveillance is presented, and its efficacy demonstrated through application to an important class of surveillance algorithms.

IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS 2006), November, 2006, Sydney, Australia.

The paper is available as PDF


Building Models of Regular Scenes from Structure-and-Motion

Anton van den Hengel, Anthony Dick, Thorsten Thormählen, Ben Ward, Philip H. S. Torr

This paper describes a method for generating a model-based reconstruction of a scene from image data. The method uses the camera models and point cloud typically generated by a structure-and-motion process as a starting point for developing a higher level model of the scene. The method relies on the user to provide a minimal amount of structural seeding information from which more complex geometry is extrapolated. The regularity typically present in man-made environments is used to minimise the interaction required, but also to improve the accuracy of fit. We demonstrate model based reconstructions obtained using this method.

The Seventeenth British Machine Vision Conference (BMVC 2006), September 2006, Edinburgh, United Kingdom

The paper is available as PDF 2.7M


Hierarchical model fitting to 2D and 3D data

A. van den Hengel, A. Dick, T. Thormaehlen, B. Ward, P. H. S. Torr

We propose a method for interactively generating a model-based reconstruction of a scene from a set of images. The method facilitates the fitting of multiple object models to the data in a manner that provides the best overall fit to the image set. This requires that models are not fit independently, but rather collectively, each potentially impacting upon the fit of the other.

Third International Conference on Computer Graphics, Imaging and Visualisation, IEEE Computer Society Press, July 2006, Sydney, Australia

The paper is available as PDF 264kB


Utilising the broad range of information which human observers bring to bear when interpreting their visual environment is currently infeasible for artificial vision systems. We propose instead a method for modelling compound structures which intelligently divides this prior information into that which may be applied by the system and that which may not. Models are fitted to the input data on the basis of 2D and 3D image-based measures, but also as directed by a prior which is split between the human and the system. Importantly this split is carried out in a manner which minimises the human input required.

International Workshop on the Representation and Use of Prior Knowledge in Vision (WRUPKV) held in association with ECCV’06, May 2006, Graz, Austria. Also selected for publication in LNCS

The paper is available as PDF 1.6M


Fast global kernel density mode seeking: applications to localisation and tracking

C. Shen, M. J. Brooks, A. van den Hengel

We address the problem of seeking the global mode of a density function using the mean shift algorithm. Mean shift, like other gradient ascent optimisation methods, is susceptible to local maxima, and hence often fails to find the desired global maximum. In this work, we propose a multi-bandwidth mean shift procedure that alleviates this problem, which we term annealed mean shift, as it shares similarities with the annealed importance sampling procedure. The bandwidth of the algorithm plays the same role as the temperature in annealing. We observe that the over-smoothed density function with a sufficiently large bandwidth is uni-modal. Using a continuation principle, the influence of the global peak in the density function is introduced gradually. In this way the global maximum is more reliably located. Generally, the price of this annealing-like procedure is that more iterations are required. Since it is imperative that the computational complexity is minimal in real-time applications such as visual tracking, we propose an accelerated version of the mean shift algorithm. Compared with the conventional mean shift algorithm, the accelerated mean shift can significantly decrease the number of iterations required for convergence. The proposed algorithm is applied to the problems of visual tracking and object localisation. We empirically show on various data sets that the proposed algorithm can reliably find the true object location when the starting position of mean shift is far away from the global maximum, in contrast with the conventional mean shift algorithm that will usually get trapped in a spurious local maximum.

IEEE International Conference on Computer Vision (ICCV'05), Beijing, China, Oct. 2005.

The paper is available as PDF


Computing Surface-based Photo-Consistency on Graphics Hardware

J. Bastian, A. van den Hengel

This paper describes a novel approach to the problem of recovering information from an image set by comparing the radiance of hypothesised point correspondences. Our algorithm is applicable to a number of problems in computer vision, but is explained particularly in terms of recovering geometry from an image set. It uses the idea of photo-consistency to measure the confidence that a hypothesised scene description generated the reference images. Photo-consistency has been used in volumetric scene reconstruction where a hypothesised surface is evolved by considering one voxel at a time. Our approach is different: it represents the scene as a parameterised surface so decisions can be made about its photo-consistency simultaneously over the entire surface rather than a series of independent decisions. Our approach is further characterised by its ability to execute on graphics hardware. Experiments demonstrate that our cost function minimises at the solution and is not adversely affected by occlusion.

Proceedings of Digital Image Computing: Techniques and Applications, December 2005, Cairns, Australia.

The paper is available as PDF


Constrained Generalised Principal Component Analysis

W. Chojnacki, A. van den Hengel, M. Brooks

Generalised Principal Component Analysis (GPCA) is a recently devised technique for fitting a multi-component structure to data. Unlike other methods which intertwine the processes of estimating structure components and segmenting data points into clusters associated with putative components, GPCA estimates a multi-component structure with no recourse to data clustering. The standard GPCA algorithm searches for an estimate by minimising an appropriate misfit function. The underlying constraints on the model parameters are ignored. Here we promote a variant of GPCA that incorporates the parameter constraints and exploits constrained rather than unconstrained minimisation of the error function. The output of any GPCA algorithm hardly ever perfectly satisfies the parameter constraints. The new version of GPCA greatly facilitates the final correction of the algorithm output to satisfy perfectly the constraints, making this step less prone to error in the presence of noise. The method is applied to the problem of fitting a pair of lines to noisy image points, but has potential for use in more general multi-component structure fitting.

Proceedings of International Conference on Computer Vision Theory and Applications, February 2006, Setúbal, Portugal

The paper is available as PDF 164kB


Augmented particle filtering for efficient visual tracking

Chunhua Shen, Michael J. Brooks, and Anton van den Hengel

Visual tracking is one of the key tasks in computer vision. The particle filter algorithm has been extensively used to tackle this problem due to its flexibility. However, the conventional particle filter uses system transition as the proposal distribution, frequently resulting in poor priors for the filtering step. The main reason is that it is difficult, if not impossible, to accurately model the target's motion. Such a proposal distribution does not take into account the current observations. It is not a trivial task to devise a satisfactory proposal distribution for the particle filter. In this paper we advance a general augmented particle filtering framework for designing the optimal proposal distribution. The essential idea is to augment a second filter's estimate into the proposal distribution design. We then show that several existing improved particle filters can be rationalised within this general framework. Based on this framework we further propose variant algorithms for robust and efficient visual tracking. Experiments indicate that the augmented particle filters are more efficient and robust than the conventional particle filter.

IEEE International Conference on Image Processing (ICIP'05)

The paper is available as PDF 915K
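
For readers unfamiliar with the baseline this abstract argues against, the conventional (bootstrap) particle filter step can be sketched as follows. This is a minimal illustration under stated assumptions, not the augmented filter proposed in the paper: the 1D random-walk motion model, the Gaussian observation likelihood, the parameter values and the function name `bootstrap_step` are all hypothetical choices made here.

```python
import math
import random

def bootstrap_step(particles, observation, motion_std, obs_std, rng):
    """One step of a conventional particle filter in 1D.

    The proposal is the system transition (a random walk here), so the
    current observation is only used afterwards, when weighting."""
    # Propose: push every particle through the motion model.
    predicted = [p + rng.gauss(0.0, motion_std) for p in particles]

    # Weight: Gaussian likelihood of the observation given each particle.
    weights = [math.exp(-0.5 * ((observation - p) / obs_std) ** 2) for p in predicted]
    total = sum(weights)
    weights = [w / total for w in weights]

    # Posterior mean estimate and effective sample size (ESS). A low ESS
    # means most proposed particles landed where the likelihood is small.
    estimate = sum(w * p for w, p in zip(weights, predicted))
    ess = 1.0 / sum(w * w for w in weights)

    # Resample with replacement in proportion to the weights.
    resampled = rng.choices(predicted, weights=weights, k=len(predicted))
    return resampled, estimate, ess

rng = random.Random(0)
particles = [rng.uniform(0.0, 10.0) for _ in range(500)]

# Hypothetical stationary target observed at position 5.0 for five frames.
for observation in [5.0] * 5:
    particles, estimate, ess = bootstrap_step(
        particles, observation, motion_std=0.2, obs_std=0.5, rng=rng
    )

print("estimate:", round(estimate, 3), "effective sample size:", round(ess, 1))
```

Because the proposal ignores the observation, particles are wasted whenever the motion model predicts poorly; the effective sample size computed above is one common way to make that inefficiency visible.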


Visual tracking via efficient kernel discriminant subspace learning

Chunhua Shen, Anton van den Hengel, Michael J. Brooks

Robustly tracking moving objects in video sequences is one of the key problems in computer vision. In this paper we introduce a computationally efficient nonlinear kernel learning strategy to find a discriminative model which distinguishes the tracked object from the background. Principal Component Analysis and Linear Discriminant Analysis have been applied to this problem with some success. These techniques are limited, however, by the fact that they are capable only of identifying linear subspaces within the data. Kernel based methods, in contrast, are able to extract nonlinear subspaces, and thus represent more complex characteristics of the tracked object and background. This is a particular advantage when tracking deformable objects and where appearance changes occur due to unstable illumination and pose. An efficient approximation to Kernel Discriminant Analysis using QR decomposition proposed by Xiong et al. makes possible real-time updating of the optimal nonlinear subspace. We present a tracking method based on this result and show promising experimental results on real videos undergoing large pose and illumination changes.

IEEE International Conference on Image Processing (ICIP'05)

The paper is available as PDF 447K


Enhanced importance sampling: unscented auxiliary particle filtering for visual tracking

Chunhua Shen, Anton van den Hengel, Anthony Dick and Michael J. Brooks

The particle filter has attracted considerable attention in visual tracking due to its relaxation of the linear and Gaussian restrictions in the state space model. It is thus more flexible than the Kalman filter. However, the conventional particle filter uses system transition as the proposal distribution, leading to poor sampling efficiency and poor performance in visual tracking. It is not a trivial task to design satisfactory proposal distributions for the particle filter. In this paper, we introduce an improved particle filtering framework into visual tracking, which combines the unscented Kalman filter and the auxiliary particle filter. The efficient unscented auxiliary particle filter (UAPF) uses the unscented transformation to predict one-step ahead likelihood and produces more reasonable proposal distributions, thus reducing the number of particles required and substantially improving the tracking performance. Experiments on real video sequences demonstrate that the UAPF is computationally efficient and outperforms the conventional particle filter and the auxiliary particle filter.

17th Australian Joint Conference on Artificial Intelligence (AI'04)

The paper is available as PDF 588K


2D articulated tracking with dynamic Bayesian networks

Chunhua Shen, Anton van den Hengel, Anthony Dick and Michael J. Brooks

We present a novel method for tracking the motion of an articulated structure in a video sequence. The analysis of articulated motion is challenging because of the potentially large number of degrees of freedom (DOFs) of an articulated body. For particle filter based algorithms, the number of samples required with high dimensional problems can be computationally prohibitive. To alleviate this problem, we represent the articulated object as an undirected graphical model (or Markov Random Field, MRF) in which soft constraints between adjacent subparts are captured by conditional probability distributions. The graphical model is extended across time frames to implement a tracker. The tracking algorithm can be interpreted as a belief inference procedure on a dynamic Bayesian network. The discretisation of the state vectors makes it possible to utilise the efficient belief propagation (BP) and mean field (MF) algorithms to reason in this network. Experiments on real video sequences demonstrate that the proposed method is computationally efficient and performs well in tracking the human body.

4th International Conference on Computer and Information Technology (CIT'04)

The paper is available as PDF 353K


From FNS to HEIV: a link between two vision parameter estimation methods

Wojciech Chojnacki, Michael J. Brooks, Anton van den Hengel, D. Gawley

Problems requiring accurate determination of parameters from image-based quantities arise often in computer vision. Two recent, independently developed frameworks for estimating such parameters are the FNS scheme of the authors, and the HEIV scheme of Leedan and Meer. In this paper, it is shown that the two schemes constitute intimately related but different means of numerically solving a common underlying equation characterising the minimiser. The analysis is driven by the search for a non-degenerate form of a certain generalised eigenvalue problem, and this effectively leads to a new derivation of the HEIV algorithm. This work may be seen as an extension of the authors' previous efforts to rationalise and inter-relate a spectrum of estimators, including the renormalisation method of Kanatani and the normalised eight-point method of Hartley.

IEEE Trans. Pattern Analysis Machine Intelligence, 26, 2, pp. 264-268, Feb. 2004.

The paper is available as PDF 111K


A new approach to constrained parameter estimation applicable to some computer vision problems

Wojciech Chojnacki, Michael J. Brooks, Anton van den Hengel, D. Gawley

Previous work of the authors developed a theoretically well-founded scheme (FNS) for finding the minimiser of a class of cost functions. Various problems in video analysis, stereo vision, ellipse-fitting, etc., may be expressed in terms of finding such a minimiser. However, in common with many other approaches, it is necessary to correct the minimiser as a post-process if an ancillary constraint is also to be satisfied. In this paper we develop the first integrated scheme (CFNS) for simultaneously minimising the cost function and satisfying the constraint. Preliminary experiments in the domain of fundamental-matrix estimation show that CFNS generates rank-2 estimates with smaller cost function values than rank-2 corrected FNS estimates. Furthermore, when compared with the Hartley-Zisserman Gold Standard method, CFNS is seen to generate results of comparable quality in a fraction of the time.

Image and Vision Computing, Volume 22, Issue 2, 1 February 2004, Pages 85-91

The paper is available as PDF 277K


FNS, CFNS and HEIV: Extending Three Vision Parameter Estimation Methods

Wojciech Chojnacki, Michael J. Brooks, Anton van den Hengel, and Darren Gawley

Estimation of parameters from image tokens is a central problem in computer vision. FNS, CFNS and HEIV are three recently developed methods for solving special but important cases of this problem. The schemes are means for finding unconstrained (FNS, HEIV) and constrained (CFNS) minimisers of cost functions. In earlier work of the authors, FNS, CFNS and a version of HEIV were applied to a specific cost function. Here we outline an extension of the approach to more general cost functions. This allows the FNS, CFNS and HEIV methods to be placed within a common framework.

Proceedings of Digital Image Computing: Techniques and Applications, December 2003, Sydney, Australia, pp 663-672

The paper is available as PDF 148K


Computing Image-Based Reprojection Error on Graphics Hardware

J Bastian, A. van den Hengel

This paper describes a novel approach to the problem of recovering information from an image set by comparing the radiance of hypothesised point correspondences. This method is applicable to a number of problems in computer vision, but is explained particularly in terms of recovering geometry and camera parameters from image sets. The algorithm employs a cost-function to represent the probability that a hypothesised scene description and camera parameters generated the reference images and is characterised by its ability to execute on graphics hardware. Experiments show that minimisation of the cost-function converges to a valid solution provided there are adequate geometric constraints and projective coverage.

Proceedings of Digital Image Computing: Techniques and Applications, December 2003, Sydney, Australia, pp 663-672

The paper is available as PDF 1335K


Probabilistic Multiple Cue Integration for Particle Filter Based Tracking

Chunhua Shen, Anton van den Hengel, and Anthony Dick

Robust visual tracking has become an important topic in the field of computer vision. The integration of cues such as color, edge strength and motion has proved to be a promising approach to robust visual tracking in situations where no single cue is suitable. In this paper, an algorithm is presented which integrates multiple cues in a probabilistic manner. Specifically the likelihood of each cue is calculated and weighted before Bayes' rule is applied to obtain the resultant posterior. This posterior is generally not well represented analytically, and is therefore represented as a set of weighted particles, which is updated at each frame by a particle filter. This paper demonstrates how the combination of multiple cue integration and particle filtering results in a robust tracking method. We also demonstrate how each cue's weight can be adapted on-line during the tracking procedure.

Proceedings of Digital Image Computing: Techniques and Applications, December 2003, Sydney, Australia, pp 399-408

The paper is available as PDF 177K
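
As an illustration of the idea in the abstract above, the following sketch fuses per-cue likelihoods over a discrete set of candidate states and applies Bayes' rule. The abstract does not state how the cues are weighted, so the weighting used here (each likelihood raised to the power of its weight) is an assumption, as are the function name fuse_cues and the cue names in the usage note.

```python
def fuse_cues(prior, cue_likelihoods, cue_weights):
    """Combine weighted cue likelihoods with a prior over discrete states.

    Assumed weighting (not taken from the paper):
        posterior[i] is proportional to
        prior[i] * product over cues c of likelihood_c[i] ** weight_c

    prior           -- list of prior probabilities, one per candidate state
    cue_likelihoods -- dict mapping cue name to a list of likelihoods
    cue_weights     -- dict mapping cue name to a non-negative weight
    """
    unnormalised = []
    for i, p in enumerate(prior):
        value = p
        for name, likelihood in cue_likelihoods.items():
            # A weight of 0 removes the cue; a weight of 1 uses it as is.
            value *= likelihood[i] ** cue_weights[name]
        unnormalised.append(value)
    total = sum(unnormalised)
    if total == 0.0:
        raise ValueError("all candidate states have zero posterior mass")
    return [v / total for v in unnormalised]
```

For example, fuse_cues([1/3, 1/3, 1/3], {"colour": [0.2, 0.6, 0.2], "edge": [0.3, 0.4, 0.3]}, {"colour": 1.0, "edge": 0.0}) reduces to the normalised colour likelihood, because the edge cue has been given zero weight. In a particle filter the candidate states would be the particles, and the returned values their weights.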


Incorporating constraints into the design of locally identifiable calibration patterns

Anton van den Hengel, Rhys Hill, Michael J. Brooks

Camera calibration requires the identification of points in an image that correspond to known locations in the scene. These are typically determined through the use of a calibration pattern designed to facilitate feature localisation. We present in this paper a novel method of generating patterns such that each subregion is individually identifiable by its cross ratio. The method aims to minimise the probability of misidentifying a subregion. A key advantage of the method is the ability to place constraints on the size of the elements constituting the pattern. This allows a calibration object to be used in a wider variety of viewing conditions, increasing the flexibility of the calibration process.

International Conference on Image Processing, 2003, I: 817-820

The paper is available as PDF 292K
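
The cross ratio is useful for identifying pattern subregions because it is preserved under projective transformations. The sketch below checks this for four collinear points described by a 1D coordinate along their line. It uses one common convention for the cross ratio; the function names and the example map are assumptions for illustration and are not taken from the paper.

```python
from fractions import Fraction


def cross_ratio(a, b, c, d):
    """Cross ratio (a, b; c, d) of four distinct collinear points.

    Points are given by their 1D coordinate along the line.
    Convention used here: ((c - a) * (d - b)) / ((c - b) * (d - a)).
    """
    a, b, c, d = (Fraction(v) for v in (a, b, c, d))
    return ((c - a) * (d - b)) / ((c - b) * (d - a))


def projective_map(x, h):
    """Apply the 1D projective map x -> (p*x + q) / (r*x + s).

    h is ((p, q), (r, s)) with p*s - q*r != 0.
    """
    (p, q), (r, s) = h
    x = Fraction(x)
    return (p * x + q) / (r * x + s)


points = [0, 1, 2, 3]
h = ((2, 1), (1, 3))  # hypothetical example map, determinant 5
mapped = [projective_map(x, h) for x in points]

before = cross_ratio(*points)
after = cross_ratio(*mapped)
```

Exact rational arithmetic is used so that the two values can be compared for equality. With measured image points the comparison would instead need a tolerance, which is one reason a pattern design would want the cross ratios of different subregions to be well separated.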


A Voting Scheme For Estimating The Synchrony Of Moving-Camera Videos

D.W. Pooley, M.J. Brooks, A.J. van den Hengel, W. Chojnacki

Recovery of dynamic scene properties from multiple videos usually requires the manipulation of synchronous (simultaneously captured) frames. This paper is concerned with the automated determination of this synchrony when the temporal alignment of sequences is unknown. A cost function characterising departure from synchrony is first evolved for the case in which two videos are generated by cameras that may be moving. A novel voting method is then presented for minimising the cost function in the case where the ratio of the cameras' frame rates is unknown. Experimental results indicate this relatively general approach holds promise.

International Conference on Image Processing, 2003, I: 413-416

The paper is available as PDF 321K


Revisiting Hartley's Normalised Eight-Point Algorithm

Wojciech Chojnacki, Michael J. Brooks, Anton van den Hengel, D. Gawley

The paper gives a novel explanation for the improvement in performance of the stereo-vision eight-point algorithm that results from using normalised data. It is first established that the normalised algorithm acts to minimise a specific cost function. It is then shown that this cost function is statistically better founded than the cost function associated with the non-normalised algorithm. This supersedes the standard argument that improved performance is due to the better conditioning of a pivotal matrix. Experimental results are given that support the shift in argument from numerical stability to statistical soundness as a means of rationalising performance. This work continues a wider effort to place a variety of estimation techniques within a coherent framework.

IEEE Trans. Pattern Analysis Machine Intelligence, 25, 9, 2003, pp 1172-1177.

The paper is available as PDF 136K
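
The data normalisation referred to in the abstract above is usually described as follows: translate the image points so that their centroid lies at the origin, then scale them isotropically so that their mean distance from the origin is the square root of 2. The sketch below implements that common description. It is not taken from the paper, and the function name normalise_points is an assumption.

```python
import math


def normalise_points(points):
    """Normalise 2D points for use in the eight-point algorithm.

    Returns (normalised_points, transform), where transform is the 3x3
    matrix (as nested lists) that maps homogeneous input points to the
    normalised ones.
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    mean_dist = sum(math.hypot(x - cx, y - cy) for x, y in points) / n
    if mean_dist == 0.0:
        raise ValueError("points must not all coincide")
    scale = math.sqrt(2.0) / mean_dist
    # Similarity transform: translate by the centroid, then scale.
    transform = [
        [scale, 0.0, -scale * cx],
        [0.0, scale, -scale * cy],
        [0.0, 0.0, 1.0],
    ]
    normalised = [((x - cx) * scale, (y - cy) * scale) for x, y in points]
    return normalised, transform
```

In the full algorithm each image's points are normalised separately, the fundamental matrix is estimated from the normalised coordinates, and the two transforms are then used to map the estimate back to the original coordinates.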


A new constrained parameter estimator: experiments in fundamental matrix computation

A. van den Hengel, W. Chojnacki, M. J. Brooks, D. Gawley

In recent work the authors proposed a wide-ranging method for estimating parameters that constrain image feature locations and satisfy a constraint not involving image data. The present work illustrates the use of the method with experiments concerning estimation of the fundamental matrix. Results are given for both synthetic and real images. It is demonstrated that the method gives results commensurate with, or superior to, previous approaches, with the advantage of being fast.

In Proceedings of the 13th British Machine Vision Conference, September 2002, volume 2, pp 468-476.

The paper is available as PDF 184K


A new approach to constrained parameter estimation applicable to some computer vision problems

W. Chojnacki, M. J. Brooks, A. van den Hengel, D. Gawley

Previous work of the authors developed a theoretically well-founded scheme (FNS) for finding the minimiser of a class of cost functions. Various problems in video analysis, stereo vision, ellipse-fitting, etc., may be expressed in terms of finding such a minimiser. However, in common with many other approaches, it is necessary to correct the minimiser as a post-process if an ancillary constraint is also to be satisfied. In this paper we develop the first integrated scheme (CFNS) for simultaneously minimising the cost function and satisfying the constraint. Preliminary experiments in the domain of fundamental-matrix estimation show that CFNS generates rank-2 estimates with smaller cost function values than rank-2 corrected FNS estimates. Furthermore, when compared with the Hartley-Zisserman Gold Standard method, CFNS is seen to generate results of comparable quality in a fraction of the time.

In D. Suter, editor, Statistical Methods in Video Processing Workshop held in conjunction with ECCV'02, Copenhagen, Denmark, June 1-2, 2002.

The paper is available as PDF 272K


What value covariance information in estimating vision parameters?

M. J. Brooks, W. Chojnacki, D. Gawley, A. van den Hengel

Many parameter estimation methods used in computer vision are able to utilise covariance information describing the uncertainty of data measurements. This paper considers the value of this information to the estimation process when applied to measured image point locations. Covariance matrices are first described and a procedure is then outlined whereby covariances may be associated with image features located via a measurement process. An empirical study is made of the conditions under which covariance information enables generation of improved parameter estimates. Also explored is the extent to which the noise should be anisotropic and inhomogeneous if improvements are to be obtained over covariance-free methods. Critical in this is the devising of synthetic experiments under which noise conditions can be precisely controlled. Given that covariance information is, in itself, subject to estimation error, tests are also undertaken to determine the impact of imprecise covariance information upon the quality of parameter estimates. Finally, an experiment is carried out to assess the value of covariances in estimating the fundamental matrix from real images.

International Conference on Computer Vision, Vancouver, July 2001.

The paper is available as PDF 117K


Rationalising the Renormalisation Method of Kanatani

W. Chojnacki, M. J. Brooks, A. van den Hengel

The renormalisation technique of Kanatani is intended to iteratively minimise a cost function of a certain form while avoiding systematic bias inherent in the common method of minimisation due to Sampson. Within the vision community, the technique has generally been perceived as somewhat controversial and impenetrable. This work presents an alternative, simpler derivation of the technique, along with new insights that place it in the context of other approaches. We first show that the minimiser of the cost function must satisfy a special variational equation. A Newton-like, fundamental numerical scheme is presented with the property that its theoretical limit coincides with the minimiser. Standard statistical techniques are then employed to derive afresh several renormalisation schemes. The fundamental scheme proves pivotal in the rationalising of the renormalisation and other schemes, and enables us to show that the renormalisation schemes do not have as their theoretical limit the desired minimiser. The various minimisation schemes are finally subjected to a rigorous performance analysis.

Journal of Mathematical Imaging and Vision, 14, 1, 2001, 21-38.

The paper is available as PDF 188Kb


Is covariance information useful in estimating vision parameters?

M. J. Brooks, W. Chojnacki, A. van den Hengel, D. Gawley

This paper assesses some of the practical ramifications of recent developments in estimating vision parameters given information characterising the uncertainty of the data. This uncertainty information may sometimes be estimated in association with the observation process, and is usually represented in the form of covariance matrices. An empirical study is made of the conditions under which improved parameter estimates can be obtained from data when covariance information is available. We explore, in the case of fundamental matrix estimation and conic fitting, the extent to which the noise should be anisotropic and inhomogeneous if improvements over traditional methods are to be obtained. Critical in this is the devising of synthetic experiments under which noise conditions can be precisely controlled. Given that covariance information is, in itself, subject to estimation error, tests are also undertaken to determine the impact of imprecise covariance information upon the quality of parameter estimates. We thus investigate the consequences for parameter estimation of inaccuracies in the characterisation of noise that inevitably arise in practical computation.

SPIE Videometrics, San Jose, Jan. 2001, pp 195-203

The paper is available as PDF 102K


A fast MLE-based method for estimating the fundamental matrix

W. Chojnacki, M. J. Brooks, A. van den Hengel, D. Gawley

We present a novel method for estimating the fundamental matrix, a key problem arising in stereo vision. The method aims to minimise a cost function that is derived from maximum likelihood considerations. The respective minimiser turns out to be significantly more accurate than the familiar algebraic least squares technique. Furthermore, the method is identical in accuracy to a Levenberg-Marquardt minimiser, while proving simpler and faster.

International Conference on Image Processing, Thessaloniki, Oct. 2001

The paper is available as PDF 311K


On the fitting of surfaces to data with covariances

W. Chojnacki, M. J. Brooks, A. van den Hengel, D. Gawley

We consider the problem of estimating parameters of a model described by an equation of special form. Specific models arise in the analysis of a wide class of computer vision problems, including conic fitting and estimation of the fundamental matrix. We assume that noisy data are accompanied by (known) covariance matrices characterising the uncertainty of the measurements. A cost function is first obtained by considering a maximum likelihood formulation, and applying certain necessary approximations that render the problem tractable. A novel, Newton-like iterative scheme is then generated for determining a minimiser of the cost function. Unlike alternative approaches such as Sampson's method or the renormalisation technique, the new scheme has as its theoretical limit the minimiser of the cost function. Furthermore the scheme is simply expressed, efficient, and unsurpassed as a general technique in our testing. An important feature of the method is that it can serve as a basis for conducting theoretical comparison of various estimation approaches.

IEEE Trans. Pattern Analysis Machine Intelligence, 22, 11, Nov. 2000, 1294-1303.

The paper is available as PDF 213K


Fundamental matrix from optical flow: optimal computation and reliability evaluation

K. Kanatani, Y. Shimizu, N. Ohta, M.J. Brooks, W. Chojnacki, A. van den Hengel

The optical flow observed by a moving camera satisfies, in the absence of noise, a special equation analogous to the epipolar constraint arising in stereo vision. Computing the "flow fundamental matrix" of this equation is an essential prerequisite to undertaking 3-D analysis of the flow. This paper presents an optimal formulation of the problem of estimating this matrix under an assumed noise model. This model admits independent Gaussian noise that is not necessarily isotropic or homogeneous. A theoretical bound is derived for the accuracy of the estimate. An algorithm is then devised that employs a technique called renormalization to deliver an estimate and then corrects the estimate so as to satisfy a particular decomposability condition. The algorithm also provides an evaluation of the reliability of the estimate. Epipoles and their associated reliabilities are computed in both simulated and real-image experiments. Experiments indicate that the algorithm delivers results in the vicinity of the theoretical accuracy bound.

Journal of Electronic Imaging, 9, 2, April 2000, 194-202.

The paper is available as PDF 487Kb


Rationalising the Renormalisation method of Kanatani

W. Chojnacki, M. J. Brooks, A. van den Hengel

The renormalisation scheme of Kanatani is intended to iteratively minimise a cost function of a certain form while avoiding systematic bias inherent in Sampson's method of minimisation. This paper is concerned with enhancing our understanding of Kanatani's complex scheme by expressing it within a novel framework common to several methods. This approach enables us to demonstrate that the renormalisation scheme does not have as its theoretical limit the desired minimiser, in contrast with an alternative, simpler approach presented.

Second International Symposium on Advanced Concepts for Intelligent Vision Systems (ACIVS'00), Baden-Baden (Germany), Aug. 2000, pp. 13-19

The paper is available as PDF (Sorry, this file is temporarily unavailable)


Estimating vision parameters given data with covariances

W. Chojnacki, M. J. Brooks, A. van den Hengel, D. Gawley

A new parameter estimation method is presented, applicable to many computer vision problems. It operates under the assumption that the data (typically image point locations) are accompanied by covariance matrices characterising data uncertainty. An MLE-based cost function is first formulated and a new minimisation scheme is then developed. Unlike Sampson's method or the renormalisation technique of Kanatani, the new scheme has as its theoretical limit the true minimum of the cost function. It also has the advantages of being simply expressed, efficient, and unsurpassed in our comparative testing.

British Machine Vision Conference, Bristol, Sept. 2000, pp. 182-191

The paper is available as PDF 122K


A simplified treatment of Kanatani's renormalisation method

W. Chojnacki, M. J. Brooks, A. van den Hengel

The renormalisation method of Kanatani is intended to find the minimisers of cost functions of a certain form. As such, it has applicability to a wide spectrum of computer vision problems that may be couched in these terms. However, despite its sophistication, the method of Kanatani has been slow to gain broad acceptance, perhaps because of the complexity of its original derivation. In this paper we present an alternative and simpler treatment of this important method.

International Conference on Control, Automation, Robotics and Computer Vision (ICARCV 2000), Singapore, Dec. 2000, paper 196

The paper is available as PDF 89K


Incorporating optical flow information into a self-calibration procedure for a moving camera

M.J. Brooks, W. Chojnacki, A. Dick, A. van den Hengel, K. Kanatani, N. Ohta

In this paper we consider robust techniques for estimating structure from motion in the uncalibrated case. We show how information describing the uncertainty of the data may be incorporated into the formulation of the problem, and we explore the situations in which this appears to be advantageous. The structure recovery technique is based on a method for self-calibrating a single moving camera from instantaneous optical flow developed in previous work of some of the authors. The method of self-calibration rests upon an equation that we term the differential epipolar equation for uncalibrated optical flow. This equation incorporates two matrices (analogous to the fundamental matrix in stereo vision) which encode information about the ego-motion and internal geometry of the camera. Any sufficiently large, non-degenerate optical flow field enables the ratio of the entries of the two matrices to be estimated. Under certain assumptions, the moving camera can be self-calibrated by means of closed-form expressions in the entries of these matrices. Reconstruction of the scene, up to a scalar factor, may then proceed using a straightforward method. The critical step in this whole approach is therefore the accurate estimation of the aforementioned ratio. To this end, the problem is couched in a least-squares minimisation framework whereby candidate cost functions are derived via ordinary least squares, total least squares, and weighted least squares techniques. Various computational schemes are adopted for minimising the cost functions. Carefully devised synthetic experiments reveal that when the optical flow field is contaminated with inhomogeneous and anisotropic Gaussian noise, the best performer is the weighted least squares approach with renormalisation.

SPIE Electronic Imaging '99, Videometrics VI, San Jose, Jan. 1999

The paper is available as PDF 225K


Incorporating the epipolar constraint into a multiresolution algorithm for stereo image matching

J. Magarey, A. Dick, M.J. Brooks, G. Newsam, A. van den Hengel

We present a new algorithm for matching calibrated stereo image pairs. Matching is achieved within a multiresolution framework that utilises a complex wavelet transform of each image. Integrated within this coarse-to-fine approach is a regularisation step at each resolution. Calibration information, in the form of the epipolar constraint, is incorporated into the regularisation step. Uncertainty in the calibration parameters can be accommodated. This composite matching scheme avoids the need for expensive prior resampling of one image along epipolar lines and yields excellent results in various tests with real images.

Applied Informatics '99, 17th IASTED International Conference, Innsbruck, Feb. 1999.

The paper is available as PDF 666K


Fitting surfaces to data with covariance information: fundamental methods applicable to computer vision

W. Chojnacki, M. J. Brooks, A. van den Hengel

We are concerned with solving an equation whose form is applicable to a wide class of problems arising in computer vision. The equation typically relates image point locations to the parameters of some appropriate model. We assume that each measured datum is accompanied by a covariance matrix that characterises the uncertainty of the measurement. Noisy data are assumed to be in plentiful supply, implying that our problem is overdetermined. To tackle noise, the problem is transformed to one of least squares minimisation. In this sense, we are concerned with fitting a surface to data and their covariances. Examples are given of computer vision problems whose forms constitute instances of our general equation. The paper has two principal concerns: the establishing of a suitable cost function for our general problem, and the deriving of effective schemes for minimising the cost function. A weighted least squares (WLS) cost function is obtained by considering an optimal maximum likelihood formulation, and applying certain necessary approximations that render the problem tractable. A new and fundamental Newton-like iterative scheme is then generated for directly minimising the WLS cost function. This proves valuable in the deriving afresh of various existing and modified schemes, and helps us to show that the renormalisation approaches of Kanatani do not theoretically act to minimise the WLS cost function. A portion of this work serves to rationalise renormalisation, and several new variations on the theme are proposed. Various minimisation schemes are then tested. Experiments are carried out on the benchmark conic fitting problem of estimating ellipses from synthetic data points and their covariances. When the data exhibit noise that is anisotropic and inhomogeneous, those methods that make use of covariance information perform markedly better than more traditional methods that do not. None of the methods outperforms the fundamental scheme. Thus, being in addition simply expressed and constituting a genuine minimiser of the WLS cost function, the fundamental scheme offers strong advantages over the alternatives considered.

TR99-03, Department of Computer Science, University of Adelaide, August 1999.

The paper is available as PDF 256K


Robust determination of structure from motion in the uncalibrated case

M.J. Brooks, W. Chojnacki, A. van den Hengel, L. Baumela

Robust techniques are developed for determining structure from motion in the uncalibrated case. The structure recovery is based on previous work of the authors in which it was shown that a camera undergoing unknown motion and having an unknown, and possibly varying, focal length can be self-calibrated via closed-form expressions in the entries of two matrices derivable from an instantaneous optical flow field. Critical to the recovery process is the obtaining of accurate numerical estimates, up to a scalar factor, of these matrices in the presence of noisy optical flow data. We present techniques for the determination of these matrices via least-squares methods, and also a way of enforcing a dependency constraint that is imposed on these matrices. A method for eliminating outlying flow vectors is also given. Results of experiments with real-image sequences are presented that suggest that the approach holds promise.

Proc. Fifth European Conference on Computer Vision - ECCV'98, Freiburg, Germany, June 1998, Lecture Notes in Computer Science (Vol. 1), 1406, Springer Verlag, pp. 281-295

The paper is available as PDF 133K


Robust techniques for the estimation of structure from motion in the uncalibrated case

M. J. Brooks, W. Chojnacki, A. van den Hengel, L. Baumela

Robust techniques are developed for determining structure from motion in the uncalibrated case. It is shown that a camera with unknown and possibly varying focal length and ego-motion can be self-calibrated via closed-form expressions in the elements of two special matrices derivable from an instantaneous optical flow field. Techniques are presented for the robust determination of the two special matrices, up to a scale factor, via least-squares methods; a means of eliminating outlying flow vectors is also described. A method is then given for obtaining a scaled Euclidean 3D-reconstruction from both the optical flow and the previously computed self-calibration parameters. Special camera motions are detailed that preclude self-calibration. Experiments with real-image sequences confirm that the approach holds promise.

IEICE Technical Group Meeting on Pattern Recognition and Media Understanding, December 18-19, 1997, Niigata, Japan

The paper is available as PDF (Sorry, temporarily unavailable)


3D reconstruction from optical flow generated by an uncalibrated camera undergoing unknown motion

M. J. Brooks, W. Chojnacki, A. van den Hengel, L. Baumela

A procedure is described for self-calibration of a moving camera from instantaneous optical flow. Under certain assumptions, this procedure allows the ego-motion and some intrinsic parameters of the camera to be determined solely from the instantaneous positions and velocities of a set of image features. The proposed method relies on the use of a differential epipolar equation that relates optical flow to the ego-motion and internal geometry of the camera. The information about the camera's ego-motion and internal geometry enters the differential epipolar equation via two matrices. It emerges that the optical flow determines the composite ratio of some of the entries of the two matrices. It is shown that a camera with unknown focal length undergoing arbitrary motion can be self-calibrated via closed-form expressions in the composite ratio. The corresponding formulae specify five ego-motion parameters, as well as the focal length and its derivative. An accompanying procedure is presented for reconstructing the viewed scene, up to a scale factor, from the derived self-calibration parameters and the optical flow data. Various least-squares techniques and an outlier rejection scheme are presented to facilitate robust estimation of the critical composite ratio. Experimental results are given that suggest the approach holds promise.

International Workshop on Image Analysis and Information Fusion, November 1997, Adelaide, Australia

The paper is available as PDF 213K


Robust estimation of structure from motion in the uncalibrated case

A. van den Hengel

A picture of a scene is a 2-dimensional representation of a 3-dimensional world. In the process of projecting the scene onto the 2-dimensional image plane, some of the information about the 3-dimensional scene is inevitably lost. Given a series of images of a scene, typically taken by a video camera, it is sometimes possible to recover some of this lost 3-dimensional information. Within the computer vision literature this process is described as that of recovering structure from motion. If some of the information about the internal geometry of the camera is unknown, then the problem is described as that of recovering structure from motion in the uncalibrated case. It is this uncalibrated version of the problem that is the concern of this thesis. Optical flow represents the movement of points across the image plane over time. Previous work in the area of structure from motion has given rise to a so-called differential epipolar equation which describes the relationship between optical flow and the motion and internal parameters of the camera. This equation allows the calibration of a camera undergoing unknown motion and having an unknown, and possibly varying, focal length. Obtaining accurate estimates of the camera motion and internal parameters in the presence of noisy optical flow data is critical to the structure recovery process. We present and compare a variety of methods for estimating the coefficients of the differential epipolar equation. The goal of this process is to derive a tractable total least squares estimator of structure from motion robust to the presence of inaccuracies in the data. Methods are also presented for rectifying optical flow to a particular motion estimate, eliminating outliers from the data, and calculating the relative motion of a camera over an image sequence. The thesis thus explores the application of numerical and statistical techniques for estimation of structure from motion in the uncalibrated case.

Ph.D. thesis, Adelaide University, May 2000

The paper is available as PDF 5.5Mb