School of Computer Science The University of Adelaide Australia

School of Computer Science
Plaza Building
THE UNIVERSITY OF ADELAIDE
SA 5005
AUSTRALIA

Telephone: +61 8 8303 5586
Facsimile: +61 8 8303 4366



This page includes links to recent and proposed projects across several of my areas of interest.

Bioinformatics Group

I have recently formed a small group to work on Computer Science research issues of interest to biologists. This work has been supported by a PhD scholarship jointly funded by The University of Adelaide, the Faculty of Engineering, Computer and Mathematical Sciences (ECMS), and the Australian Centre for Plant Functional Genomics (ACPFG). The Faculty of ECMS has also provided support via a summer research scholarship. Additional support has been received from the Australian Apple University Consortium in the form of an Apple University Development Fund grant and a scholarship to attend Apple's Worldwide Developers Conference in San Francisco. Past and present members of the group include myself, Dr Ute Baumann (ACPFG), Mr Craig Jones and Mr Alex Cichowski. International collaborators include Dr Ela Hunt's research group at the University of Glasgow in Scotland. Brief descriptions of the relevant bioinformatics projects are given below. Students interested in pursuing a PhD in any of these areas of bioinformatics are welcome to contact us for further information.

Genome Indexing

This project is investigating techniques that may be able to construct the very large data structures needed to index and search genomic data efficiently. Analysis of sequenced DNA typically relies on linear searching, which may not be ideal, since matching dissimilar gene sequences is of particular interest. Unfortunately, the scale of the data involved makes constructing indexes difficult. For example, the human genome is made up of about 3 billion base pairs (3 Gbp) of DNA distributed over 23 chromosomes ranging in size from 50 Mbp to 263 Mbp. Even if each index were limited to a single chromosome, it could contain up to 500 million nodes.
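For scale: a suffix tree over a sequence of n symbols ending in a unique terminator has exactly n leaves and at most n - 1 internal nodes, so a single 263 Mbp chromosome implies on the order of 2 x 263 million, roughly 526 million nodes, in line with the figure above. The Python sketch below is an illustration only, not the construction method used in this project: it builds a compressed suffix tree naively in O(n^2) time (Ukkonen's algorithm is the standard linear-time in-memory alternative), which makes the node bound and exact-match search easy to check on a small string. The names `build_suffix_tree`, `count_nodes` and `contains` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    # Maps the first character of each outgoing edge label to that edge.
    children: dict = field(default_factory=dict)


@dataclass
class Edge:
    start: int  # the edge label is text[start:end]; stored as indices, not copied
    end: int
    child: Node


def build_suffix_tree(text: str) -> Node:
    """Naive O(n^2) construction of a compressed suffix tree.

    `text` must end with a unique terminator such as "$", so that no suffix
    is a prefix of another and every suffix ends at its own leaf.
    """
    n = len(text)
    root = Node()
    for i in range(n):  # insert suffix text[i:]
        node, pos = root, i
        while True:
            edge = node.children.get(text[pos])
            if edge is None:
                # No edge starts with this character: hang a new leaf here.
                node.children[text[pos]] = Edge(pos, n, Node())
                break
            # Walk along the edge label while the characters agree.
            k, length = 0, edge.end - edge.start
            while k < length and text[edge.start + k] == text[pos + k]:
                k += 1
            if k == length:
                # The whole label matched: continue from the child node.
                node, pos = edge.child, pos + k
                continue
            # Mismatch inside the label: split the edge with a new internal node.
            mid = Node()
            mid.children[text[edge.start + k]] = Edge(edge.start + k, edge.end, edge.child)
            mid.children[text[pos + k]] = Edge(pos + k, n, Node())
            node.children[text[pos]] = Edge(edge.start, edge.start + k, mid)
            break
    return root


def count_nodes(root: Node) -> tuple[int, int]:
    """Return (leaves, total nodes), using an explicit stack instead of recursion."""
    leaves = total = 0
    stack = [root]
    while stack:
        node = stack.pop()
        total += 1
        if not node.children:
            leaves += 1
        stack.extend(edge.child for edge in node.children.values())
    return leaves, total


def contains(root: Node, text: str, pattern: str) -> bool:
    """Exact substring search: follow the pattern down from the root."""
    node, i = root, 0
    while i < len(pattern):
        edge = node.children.get(pattern[i])
        if edge is None:
            return False
        for j in range(edge.start, edge.end):
            if i == len(pattern):
                return True  # the pattern ended part-way along this edge
            if text[j] != pattern[i]:
                return False
            i += 1
        node = edge.child
    return True


if __name__ == "__main__":
    text = "ACGTACGTTGCA$"
    tree = build_suffix_tree(text)
    print(count_nodes(tree), "bound:", 2 * len(text) - 1)
    print(contains(tree, text, "GTTG"), contains(tree, text, "TTT"))
```

Both this naive method and Ukkonen's algorithm touch the tree at essentially random positions, so neither behaves well once the tree outgrows main memory.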

Constructing Suffix Trees Larger than Memory

Genome Alignment

Computational Approaches to the Functional Annotation of Expressed Sequence Tags

Bioinformatics is an interdisciplinary field aimed at facilitating research in genomics through the development of new computational methodologies. Two independent and often-cited problems in bioinformatics are 1) functionally annotating the exponentially increasing body of sequence records, and 2) incorporating disparate data sources. This research project aims to address both problems by examining issues surrounding the automated functional annotation of expressed sequence tags (ESTs). The recently defined Gene Ontology (GO) allows the functional properties of a gene product to be classified within three independent ontologies: biological process, molecular function and cellular component. Classifying sequence records with GO terms has become a priority, but it relies mainly on sequence similarity or manual curation, with little assessment of accuracy. If GO is to be truly useful as a way of "unifying" biology, robust methods of automatically assigning GO terms to biological sequences must be developed.

To address this, we propose to develop a diverse range of methods for predicting GO terms for an EST. The accuracy of each method will be determined individually, and data mining techniques will be used to determine the best model for combining multiple sources of evidence. The reliability of these approaches will be examined in terms of their robustness to annotation error, by introducing additional incorrectly annotated sequences into the databases at known rates. The resulting models will be compared to determine the effect on the entropy and accuracy of each method. It should also be possible to estimate the real rate of annotation error through regression analysis. The methods developed will initially be applied to existing barley EST data and subsequently extended to unrelated species.
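The error-injection protocol can be illustrated with a toy Python simulation. Everything here is an assumption made for illustration: families of homologous sequences share one true term (the `GO_term_k` labels are placeholders, not real GO identifiers), a query's term is predicted by majority vote over its annotated homologues (a simple stand-in for similarity-based annotation), and errors are injected by relabelling records at a known rate. The sketch reports accuracy and mean vote entropy at each injected rate and fits a least-squares line to accuracy; it does not attempt the estimate of the real error rate.

```python
import math
import random
from collections import Counter

# Hypothetical placeholder labels; a real study would use actual GO terms.
TERMS = [f"GO_term_{k}" for k in range(10)]


def make_truth(n_families, family_size, rng):
    """Each family of homologous sequences shares one true term."""
    records = []
    for fam in range(n_families):
        term = rng.choice(TERMS)
        records.extend((fam, term) for _ in range(family_size))
    return records


def inject_errors(records, rate, rng):
    """Relabel each record with a different (wrong) term with probability `rate`."""
    out = []
    for fam, term in records:
        if rng.random() < rate:
            term = rng.choice([t for t in TERMS if t != term])
        out.append((fam, term))
    return out


def evaluate(truth, annotated):
    """Majority-vote prediction per family; returns (accuracy, mean vote entropy in bits)."""
    true_term = dict(truth)  # family -> term (all records of a family agree)
    votes = {}
    for fam, term in annotated:
        votes.setdefault(fam, Counter())[term] += 1
    correct, entropy = 0, 0.0
    for fam, counts in votes.items():
        correct += counts.most_common(1)[0][0] == true_term[fam]
        total = sum(counts.values())
        entropy -= sum(c / total * math.log2(c / total) for c in counts.values())
    return correct / len(votes), entropy / len(votes)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def run_experiment(seed=0, rates=(0.0, 0.15, 0.3, 0.45, 0.6)):
    rng = random.Random(seed)
    truth = make_truth(200, 7, rng)
    accuracies, entropies = [], []
    for rate in rates:
        acc, ent = evaluate(truth, inject_errors(truth, rate, rng))
        accuracies.append(acc)
        entropies.append(ent)
    return list(rates), accuracies, entropies, fit_line(list(rates), accuracies)


if __name__ == "__main__":
    rates, accs, ents, (slope, intercept) = run_experiment()
    for r, a, e in zip(rates, accs, ents):
        print(f"injected={r:.2f} accuracy={a:.3f} entropy={e:.3f} bits")
    print(f"accuracy ~ {slope:.3f} * rate + {intercept:.3f}")
```

With real data, the same loop would wrap each annotation method in turn, so that their degradation curves can be compared.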


Persistent Systems

Persistence and Code Generation

This project is investigating portable techniques for generating high-performance native code for the subset of persistent programming languages that do not require the run-time state of active processes to be saved. Central to this project is the use of conservative garbage collection techniques, which allow full use of C as the compiler target language. The system will initially support the non-persistent programming language S-Algol and subsequently PS-Algol, the first persistent programming language. The PS-Algol implementation will be used as a vehicle to experiment with prototype implementations of a persistent object store supporting the PMOS garbage collector.
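The problem that conservative collection solves can be shown with a small simulation. When C is the target language, the collector cannot tell which stack words or object fields are pointers, so it treats any word that points into an allocated object (including an interior pointer) as a reference. The Python sketch below models a hypothetical bump-allocated heap with 8-byte words; `ConservativeHeap` and its methods are illustrative names, not this project's collector or the interface of any real collector such as the Boehm-Demers-Weiser collector.

```python
from bisect import bisect_right

WORD = 8  # assumed word size in bytes for this simulated heap


class ConservativeHeap:
    """Toy heap whose collector cannot tell pointers from integers, as in C."""

    def __init__(self, base=0x1000):
        self.next_addr = base
        self.objects = {}  # start address -> list of words (the object's fields)

    def alloc(self, nwords):
        assert nwords >= 1  # zero-size objects would share addresses
        addr = self.next_addr
        self.objects[addr] = [0] * nwords
        self.next_addr += WORD * nwords  # bump allocation; freed space is not reused here
        return addr

    def object_containing(self, word):
        """Return the start of the object that `word` points into, or None."""
        starts = sorted(self.objects)
        i = bisect_right(starts, word) - 1
        if i >= 0 and word < starts[i] + WORD * len(self.objects[starts[i]]):
            return starts[i]
        return None

    def collect(self, roots):
        """Mark from ambiguous roots (e.g. stack words), then sweep; returns freed addresses."""
        marked, pending = set(), list(roots)
        while pending:
            obj = self.object_containing(pending.pop())
            if obj is not None and obj not in marked:
                marked.add(obj)
                # Every field is also scanned conservatively: any word may be a pointer.
                pending.extend(self.objects[obj])
        freed = [addr for addr in self.objects if addr not in marked]
        for addr in freed:
            del self.objects[addr]
        return freed


if __name__ == "__main__":
    heap = ConservativeHeap()
    a, b, c, d = heap.alloc(2), heap.alloc(1), heap.alloc(1), heap.alloc(1)
    heap.objects[a][0] = b  # a's first field points to b
    stack = [a, 7, d + 3]   # a real pointer, a plain integer, an interior pointer into d
    print([hex(x) for x in heap.collect(stack)])
```

The cost of this approach is false retention: an integer that happens to equal an address inside an object keeps that object alive, which is safe but may retain some garbage.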

Partition Selection Policies for Garbage Collection of an OODBMS

Partitioned garbage collectors are designed to incrementally collect partitions of a large object store or database. With an additional layer of software, global knowledge can be constructed in order to identify and collect cyclic and cross-partition garbage. Selecting the order in which to collect partitions may be difficult. Cook, Zorn & Wolf have published a number of papers on this topic, including an analysis of different selection policies. However, their results appear to depend on the artificial nature of their simulations. We have attempted to repeat their experiments using a real programming language system and found that the selection policy that performed best in their experiments performed worst in ours. This project will investigate new partition selection policies that work in real systems and in the presence of significant cyclic garbage.
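Because the point at issue is that a policy's ranking depends on the workload, one way to compare policies is to replay the same trace of pointer overwrites, ideally recorded from a real program, under each policy. The Python harness below is a hypothetical sketch of that idea: `round_robin` and `most_overwrites` are illustrative policies and the trace format is an assumption; it does not reproduce the simulations of Cook, Zorn & Wolf or our own experiments.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    pid: int
    overwrites: int = 0  # pointers into this partition overwritten since it was last collected
    collections: int = 0


# Two illustrative selection policies; each picks the next partition to collect.
def round_robin(parts, tick):
    return parts[tick % len(parts)]


def most_overwrites(parts, tick):
    # Overwritten incoming pointers hint that objects in a partition may have become garbage.
    return max(parts, key=lambda p: p.overwrites)


def replay(policy, n_partitions, trace):
    """Replay a trace (one list of overwritten-pointer target partitions per
    collection) under `policy`; returns the order in which partitions were collected."""
    parts = [Partition(pid) for pid in range(n_partitions)]
    order = []
    for tick, targets in enumerate(trace):
        for pid in targets:
            parts[pid].overwrites += 1
        victim = policy(parts, tick)
        victim.overwrites = 0
        victim.collections += 1
        order.append(victim.pid)
    return order


if __name__ == "__main__":
    trace = [[2, 2], [1], []]
    for policy in (round_robin, most_overwrites):
        print(policy.__name__, replay(policy, 3, trace))
```

Measuring the garbage actually reclaimed at each step (not modelled here) would turn the collection order into a comparable score per policy.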

Distribution, Object-Orientation and Persistence

This project is investigating techniques to extend the benefits of the persistence abstraction to wide area networks, where distribution must be explicit and network failures and delays are a significant programmer concern. Contributions of this project will include a locality mechanism, a network-wide indirection mechanism and a model for distributed programming over confederated persistent object stores. Confederated stores exhibit the property of autonomous control with limited interactions with other stores. An indirection mechanism is to be provided to identify and address those services that stores wish to publish. Localities are an essential modelling mechanism for controlling pointer leaks and allowing programmers to reason about store interactions that do not permit pointers between stores.
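The locality and indirection ideas can be sketched as a small object model. In this hypothetical Python sketch each object belongs to one store, any assignment that would create a direct pointer into another store is rejected, and cross-store access goes through a `RemoteRef` naming a service that the other store has published. `Store`, `Obj`, `RemoteRef` and `Confederation` are illustrative names; Python references stand in for store pointers, and network failure is reduced to a `LookupError`.

```python
from dataclasses import dataclass


class PointerLeakError(Exception):
    """Raised when a direct pointer would cross a store (locality) boundary."""


@dataclass(frozen=True)
class RemoteRef:
    """Indirect, name-based reference to a service published by another store."""
    store: str
    service: str


class Store:
    def __init__(self, name):
        self.name = name
        self.published = {}  # service name -> object held in this store

    def new(self, **fields):
        return Obj(self, fields)

    def publish(self, service, obj):
        if obj.store is not self:
            raise PointerLeakError(f"{self.name} cannot publish an object it does not hold")
        self.published[service] = obj


class Obj:
    def __init__(self, store, fields):
        self.store = store
        self.fields = {}
        for key, value in fields.items():
            self.set(key, value)

    def set(self, key, value):
        # Locality check: direct pointers must stay inside the owning store.
        if isinstance(value, Obj) and value.store is not self.store:
            raise PointerLeakError(f"pointer from {self.store.name} into {value.store.name}")
        self.fields[key] = value


class Confederation:
    """Autonomous stores that interact only through published names."""

    def __init__(self):
        self.stores = {}

    def join(self, store):
        self.stores[store.name] = store

    def resolve(self, ref):
        store = self.stores.get(ref.store)
        if store is None or ref.service not in store.published:
            # The remote store may be unreachable or may have withdrawn the service.
            raise LookupError(f"{ref.service!r} unavailable at {ref.store!r}")
        return store.published[ref.service]
```

For example, an object in store A may hold `RemoteRef("B", "index")` and resolve it on each use, but an attempt to store the resolved object itself in one of its fields raises `PointerLeakError`.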

Intelligent Hierarchical File Systems

This project is investigating techniques to assist a hierarchical file system (HFS) in prefetching small quantities of relevant satellite data stored in a massive tertiary store. A key aspect of this project is determining how applications can drop appropriate hints to the HFS. The resulting technology should be easy to retrofit to an existing HFS to yield an intelligent HFS (iHFS).
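The hinting idea can be illustrated with a toy cache. In the hypothetical Python sketch below, applications call `hint` for files they expect to read, and the file system stages hinted files from the tertiary store into a disk cache when it has spare capacity (`prefetch`). The method names, the dict standing in for the tertiary store and the LRU replacement policy are all assumptions, not an existing HFS interface.

```python
from collections import OrderedDict, deque


class HintedCache:
    """Disk cache (capacity >= 1) in front of a slow tertiary store, driven by hints."""

    def __init__(self, capacity, tertiary):
        self.capacity = capacity    # number of files the disk cache can hold
        self.tertiary = tertiary    # name -> data; stands in for tape or optical media
        self.cache = OrderedDict()  # LRU order: least recently used first
        self.hints = deque()        # files applications say they will read soon
        self.tertiary_reads = 0

    def hint(self, name):
        """Called by an application that expects to read `name` soon."""
        self.hints.append(name)

    def _stage(self, name):
        if name in self.cache:
            self.cache.move_to_end(name)
            return
        self.tertiary_reads += 1
        self.cache[name] = self.tertiary[name]
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used file

    def prefetch(self, budget):
        """Stage up to `budget` hinted files, e.g. while the tertiary drive is idle."""
        while budget > 0 and self.hints:
            self._stage(self.hints.popleft())
            budget -= 1

    def read(self, name):
        """Return (data, was_cache_hit)."""
        hit = name in self.cache
        self._stage(name)
        return self.cache[name], hit
```

A hinted file read after `prefetch` is served from disk, while an unhinted one still pays the tertiary access; the interesting research questions are which hints applications can supply cheaply and how prefetching is scheduled against demand reads.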

Hierarchical Persistent Object Stores

This project is investigating how a hierarchical structure could be employed in a persistent object store. A prototype persistent store has been constructed using the NFS server technology developed for the iHFS project. Using this basic technology, it should be possible to implement a hierarchical storage structure that can operate independently of any running application. This Hierarchical Persistent Object Store (HPOS) would be an ideal framework for experimenting with train-algorithm garbage collectors. Train-algorithm garbage collectors are designed to collect garbage incrementally, including cycles, in an object store containing large numbers of long-lived ("mature") objects. Since the HPOS server is independent of the running program, it can perform concurrent incremental collections without interfering with a running application. All that is necessary is that the server presents a coherent set of pages to the client application and initiates checkpoints on request. The resulting technology should support the implementation of a massive HPOS.
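The train algorithm (due to Hudson and Moss) can be sketched compactly. Objects live in fixed-size cars grouped into trains. Each step examines only the first car of the first train: objects referenced from outside that car are evacuated to the train of their referrer (root-referenced objects go to a younger train) and the rest are reclaimed, while the whole first train is reclaimed once nothing outside it refers into it, which is how cycles spanning several cars are collected. In this hypothetical Python sketch objects are string ids, incoming references are found by rescanning rather than with the remembered sets a real collector would maintain, and pages and checkpoints are not modelled.

```python
from collections import deque


class TrainHeap:
    """Toy mature-object space managed by the train algorithm.

    Trains are lists of cars; cars are lists of object ids holding at most
    `car_capacity` objects.
    """

    def __init__(self, car_capacity):
        self.cap = car_capacity
        self.refs = {}     # object id -> ids it references
        self.trains = []
        self.roots = set()

    def _append(self, train, oid):
        if not train or len(train[-1]) >= self.cap:
            train.append([])  # start a new car at the end of the train
        train[-1].append(oid)

    def allocate(self, oid, refs=()):
        self.refs[oid] = list(refs)
        if not self.trains:
            self.trains.append([])
        self._append(self.trains[-1], oid)

    def collect_step(self):
        """Do one increment of work; return the ids reclaimed."""
        if not self.trains:
            return []
        first = self.trains[0]
        members = {o for car in first for o in car}
        from_outside = self.roots | {r for train in self.trains[1:]
                                     for car in train for o in car for r in self.refs[o]}
        if not members & from_outside:
            # Nothing outside the first train refers into it, so the whole train,
            # including cycles spread across its cars, is garbage.
            self.trains.pop(0)
            for o in members:
                del self.refs[o]
            return sorted(members)
        car = first.pop(0)
        pending = set(car)
        if pending & self.roots and len(self.trains) == 1:
            self.trains.append([])  # root-referenced objects need a younger train
        work = deque((o, len(self.trains) - 1) for o in car if o in self.roots)
        for t in range(1, len(self.trains)):  # referenced from a younger train
            for c in self.trains[t]:
                work.extend((r, t) for o in c for r in self.refs[o] if r in pending)
        for c in first:  # referenced from the remaining cars of the first train
            work.extend((r, 0) for o in c for r in self.refs[o] if r in pending)
        while work:
            oid, t = work.popleft()
            if oid not in pending:
                continue
            pending.discard(oid)
            self._append(self.trains[t], oid)
            # Objects reachable from an evacuated object follow it into its train.
            work.extend((r, t) for r in self.refs[oid] if r in pending)
        if not first:
            self.trains.pop(0)
        for o in pending:  # unreachable from outside the car: reclaim
            del self.refs[o]
        return sorted(pending)
```

With a car capacity of 2, allocating `a` (a root referring to `b`), then `x` and `y` (a garbage cycle), then `b` splits the cycle across two cars; the first two steps move `x` and `y` into the same train, and the third step reclaims both while `a` and `b` survive.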


Last Updated: 1 Dec 2005.