This page links to recent and proposed projects across my areas of
interest.
Bioinformatics Group
I have recently formed a small group to work on Computer Science research
issues of interest to biologists. This has been supported by a PhD scholarship
jointly funded by The University of Adelaide, The Faculty of Engineering,
Mathematical and Computer Sciences (ECMS), and The Australian Centre for Plant
Functional Genomics (ACPFG). The Faculty of ECMS has also provided support via
a summer research scholarship.
Additional support has been received from the Australian
Apple University Consortium in the form of an Apple University Development Fund
grant and a scholarship to attend Apple's World Wide Developers Conference in
San Francisco.
Past and present members of the group include myself, Dr Ute Baumann (ACPFG),
Mr Craig Jones and Mr Alex Cichowski. International collaborators include
Dr Ela Hunt's research group at The University of Glasgow in Scotland.
Brief descriptions of relevant bioinformatics projects are given below.
Students interested in pursuing a PhD in any of these
aspects of bioinformatics are welcome to contact us for further information.
Genome Indexing
This project is investigating techniques that may be able to construct
the very large data structures capable of efficiently indexing and searching
genomic data. Typically, analysis of sequenced DNA relies on linear
searching, which may not be ideal since matching dissimilar gene sequences
is of particular interest. Unfortunately, the scale of
the data involved makes constructing indexes difficult. For example,
the human genome is made up of about 3 billion base pairs (3 Gbp) of DNA
distributed over 23 chromosomes ranging in size from 50 Mbp to 263 Mbp.
Even if index structures were limited to single chromosomes, this may
result in index structures of up to 500 million nodes.
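As a rough illustration of what indexing buys over linear searching, the following is a minimal sketch of a suffix array, a simpler relative of the suffix tree. It is not the structure built by this project: the function names are hypothetical, the construction is a naive in-memory sort that would not scale to a chromosome, and it handles exact matches only rather than the dissimilar matches of interest here.

```python
def build_suffix_array(text: str) -> list[int]:
    """Return the start positions of all suffixes of text, in sorted suffix order."""
    # Naive construction by sorting whole suffixes; acceptable for toy inputs only.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, suffix_array: list[int], pattern: str) -> list[int]:
    """Return the sorted start positions of every exact occurrence of pattern."""
    m = len(pattern)

    def prefix(rank: int) -> str:
        # First m characters of the suffix at this rank in the array.
        start = suffix_array[rank]
        return text[start:start + m]

    # Binary search for the first suffix whose prefix is >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Binary search for the first suffix whose prefix is > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # Every suffix ranked between the two bounds starts with pattern.
    return sorted(suffix_array[first:lo])
```

Once the array is built, each query costs a number of string comparisons logarithmic in the sequence length instead of a scan of the whole sequence. The difficulty described above is in the build step, which here assumes the entire sequence and index fit in memory.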
Constructing Suffix Trees Larger than Memory
Genome Alignment
Computational Approaches to the Functional Annotation of Expressed Sequence Tags
Bioinformatics is an interdisciplinary field aimed at facilitating research
in genomics through the development of new computational methodologies. Two
independent and often-cited problems in bioinformatics concern 1) functionally
annotating the exponentially increasing body of sequence records, and 2)
incorporating disparate data sources. This research project aims to address
both of these problems through examining issues surrounding the automated
functional annotation of expressed sequence tags (ESTs). The recently defined
Gene Ontology (GO) allows the functional properties of a gene product to be
classified within three independent ontologies: biological process, molecular
function, and cellular component. The classification of sequence records with
GO terms has become a priority, but it relies mainly on sequence similarity or
manual curation, with little assessment of accuracy. If GO is
to be truly useful as a way of "unifying" biology, robust methods of
automatically assigning GO terms to biological sequences must be developed.
To address this, we propose to develop a diverse range of methods for
predicting GO terms for an EST. The accuracy of each method is to be
determined individually, and data mining techniques used to determine the best
model for using multiple sources of evidence in combination. The reliability of
these various approaches is to be examined in terms of robustness to
annotation error by introducing additional incorrectly annotated sequences into
databases at known rates. Models will be compared to determine the effect
on the entropy and accuracy of each method. It should also be possible
to determine an estimate of the real rate of annotation error through
regression analysis. The methods developed will be initially applied to
existing barley EST data, and further extended to unrelated species.
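The robustness experiment described above can be sketched as follows. This is only an illustration under stated assumptions: it corrupts a known fraction of existing annotations rather than adding new mis-annotated sequences, it assumes one term per sequence, and the function names and the placeholder term labels used with it are hypothetical.

```python
import random


def inject_annotation_errors(annotations, error_rate, go_terms, seed=0):
    """Return (noisy_copy, corrupted_ids) with a known fraction of labels made wrong.

    annotations maps a sequence identifier to a single term (an assumption;
    real records may carry several terms).
    """
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must lie between 0 and 1")
    if len(set(go_terms)) < 2:
        raise ValueError("need at least two distinct terms to create an error")

    rng = random.Random(seed)  # seeded so that an experiment can be repeated
    ids = sorted(annotations)
    n_corrupt = round(error_rate * len(ids))
    corrupted_ids = set(rng.sample(ids, n_corrupt))

    noisy = dict(annotations)  # leave the clean database untouched
    for seq_id in sorted(corrupted_ids):
        # Pick any term other than the correct one.
        wrong_terms = [t for t in go_terms if t != annotations[seq_id]]
        noisy[seq_id] = rng.choice(wrong_terms)
    return noisy, corrupted_ids


def agreement(reference, candidate):
    """Fraction of sequences whose term in candidate matches reference."""
    matches = sum(1 for seq_id in reference if candidate[seq_id] == reference[seq_id])
    return matches / len(reference)
```

A prediction method would then be run against the noisy database at several error rates, and its accuracy against the clean annotations plotted against the injected rate.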
Persistent Systems
Persistence and Code Generation
This project is investigating portable techniques that can generate high
performance native code for the subset of persistent programming languages
that do not require the run-time state of active processes to be saved.
Central to this project is the use of conservative garbage collection techniques
to allow full use of C as the compiler target language. This will initially
support the non-persistent programming language S-Algol and subsequently,
the first persistent programming language PS-Algol. The PS-Algol implementation
will be used as a vehicle to experiment with prototype implementations
of a persistent object store supporting the PMOS garbage collector.
Partitioned garbage collectors are designed to incrementally collect partitions
of a large object store or database. With an additional layer of software,
global knowledge can be constructed in order to identify and collect cyclic
and cross-partition garbage. Selecting the order in which to collect partitions
may be difficult. Cook, Zorn & Wolf have published a number of papers
on this topic which have included an analysis of different selection policies.
However, their results appear to be dependent on the artificial nature
of their simulations. We have attempted to repeat their experiments using
a real programming language system and have found that the selection policy
that performed best in their experiments performed worst in ours. This
project will investigate new partition selection policies that will work
in real systems and in the presence of significant cyclic garbage.
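To make the notion of a partition selection policy concrete, the sketch below treats a policy as a function that picks the next partition to collect. The Partition fields and the two policies are hypothetical illustrations and are not the policies evaluated by Cook, Zorn & Wolf or by this project. The oracle policy assumes the amount of garbage is known in advance, which is only possible in a simulation.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    ident: int
    live_bytes: int
    garbage_bytes: int         # unknown to a real collector before it runs
    overwritten_pointers: int  # hypothetical counter kept by a write barrier


def select_most_overwritten(partitions):
    """Heuristic: pointer overwrites into a partition hint at new garbage there."""
    return max(partitions, key=lambda p: p.overwritten_pointers)


def select_most_garbage(partitions):
    """Oracle baseline: choose the partition that truly holds the most garbage."""
    return max(partitions, key=lambda p: p.garbage_bytes)


def reclaimed_by(policy, partitions):
    """Bytes reclaimed if the partition chosen by policy is collected next."""
    return policy(partitions).garbage_bytes
```

Comparing reclaimed_by for a heuristic against the oracle on the same store state gives one simple measure of how well the heuristic's hint tracks the real distribution of garbage, which is where simulated and real workloads may differ.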
Distribution, Object-Orientation and Persistence
This project is investigating techniques to extend the benefits of the
persistence abstraction to wide area networks where distribution must be
explicit and network failures and delays are a significant programmer concern.
Contributions of this project will include a locality mechanism, a network
wide indirection mechanism and a model for distributed programming over
confederated persistent object stores. Confederated stores exhibit the
property of autonomous control with limited interactions with other stores.
An indirection mechanism is to be provided to identify and address those
services that stores wish to publish. Localities are an essential modelling
mechanism to control pointer leaks and allow programmers to reason about
store interactions that do not permit pointers between stores.
This project is investigating techniques to assist a hierarchical file
system, HFS, in the task of prefetching small quantities of relevant satellite
data stored in a massive tertiary store. A key aspect of this project is
determining how applications can drop appropriate hints to the HFS. The
resultant technology should be easily retrofitted to an existing HFS to
yield an intelligent HFS.
This project is investigating how a hierarchical structure could be employed
in a persistent object store. A prototype persistent store has been constructed
using the NFS server technology developed for the iHFS project.
Using this basic technology it should be possible to implement a hierarchical
storage structure that can operate independently of any running applications.
This Hierarchical Persistent Object Store, HPOS, would be an ideal framework
for experimenting with train-algorithm garbage collectors. Train-algorithm
garbage collectors are designed to incrementally collect garbage, including
cycles, in an object store containing large numbers of long-lived (mature)
objects. Since the HPOS server is independent of the running program, it
can perform concurrent incremental collections without interfering with
a running application. All that is necessary is that the server presents
a coherent set of pages to the client application and initiates checkpoints
on request. The resultant technology should support the implementation
of a massive HPOS.
Last Updated: 1 Dec 2005.