The University of Adelaide Australia
 Home  >> For  >> Students  >> Scholarships  ||  Courses  ||  Degrees  ||  Staff  ||  Research  || 

Summer Research Scholarships 2006/7



Genetic Algorithms for Allocating Students to Groups

Supervised by: Brad Alexander

Problem

The educational literature shows that group learning can be very beneficial. However, making group learning work depends, in part, on having a fair mix of student abilities within each group. Unfortunately it is infeasible to compute an optimally fair allocation of students to groups in most classes of reasonable size. Instead, a solution is required that generates a near-optimal allocation in reasonable time.

Project Description

This project aims to develop an application to allocate students to groups fairly using genetic algorithms. The project will involve developing a sensible representation of group allocations and appropriate core components of the genetic algorithm. Any usable solution to this problem will have immediate practical applications across a number of domains.

 


Interfaces for Transformation Components

Supervised by: Brad Alexander

Problem

Most modern compilers are only slightly customisable to the needs of the applications they compile. This is partly because compilers are, typically, built from relatively complex transformation components designed to cope with hard-to-transform programming notations. A more flexible architecture for building compilers would lead to benefits in terms of customisability and simplicity.

Project Description

This project aims to construct some fundamental building blocks for transforming programs expressed in point-free form. Point-free programs are easy to transform which, in turn, makes the design of transformation components simpler. The intent of the project is to be able to compose transformation components to form a small part of a compiler. As components are linked together they be checked for compatibility using simple interfaces to describe their input and output. One desired outcome is to achieve some complex transformations by carefully combining simple components.

 


Data Structures for Bioinformatics

Supervised by: Fred Brown and Ute Baumann

Project Description

The aim of this project is to implement a pattern browser to assist in identifying DNA sequences that control gene expression. No knowledge of genetics or biology is required. The identification and interpretation of any biologically significant results is not part of this project.

Background

In recent times significant progress has been made in identifying the DNA sequences that encode the genetic structure of life. One of the most notable projects is the Human Genome project which is attempting to construct the complete sequence for human DNA. A genome is the entire collection of genetical material in a particular organism. The structure of genes, that is those sequences of DNA that encode proteins, varies significantly between organisms. In higher life forms such as man, protein coding regions are separated by long stretches of non-coding sequences, so-called "junk DNA". Changes in the DNA sequence of genes, mutations, can give rise to a genetic disease. A number of inhertied diseases have been found to occur when the number of mutations in a particular gene exceed a certain threshold. At present molecular analysis of DNA sequences can involve searching previously constructured DNA sequences looking for particular genes or gene components. Typically this is performed using linear scanning of DNA sequences which is not the ideal when searching for short sequence motifs. The obvious alternative is to build a data structure such as a suffix tree that supports the searches of interest. A particular feature of a suffix tree is that, if it contains multiple sequences, it can effectively search the multiple sequences in parallel with a single traversal of the tree. A suffix tree that encodes more than one sequence is called a generalised suffix tree. This project will look at constructing a generalised suffix tree that contains all gene promoter regions for a particular organism and attempt to identify particular patterns of interest. A promoter is the region directly adjacent to the protein-coding region of a gene. The presence of short sequence motifs within the promoter controls where and when a gene is allowed to function in the life-time of an organism. The data set of interest consists of 30,000 sequences of about 1,000 characters for each promoter. The analysis of sequences is a biology research project in its own right. Simply enumerating the different motifs that may be found would generate over a terabyte of data so a more resource friendly searching and analysis approach must be developed.

 


Efficient Graph Colouring on Parallel Computers

Supervised by: Paul Coddington

Project Description

Graph colouring is the process of assigning different labels (or ``colours'') to nodes of a graph, so that any two nodes that are connected by an edge will have a different colour.

There are many practical applications of graph colouring. For example, a common application is where the graph represents a mesh of grid points containing variables that are updated in a simulation. Many simulations forbid the updating of neighboring grid points (i.e. connected nodes) at the same time, so when running the simulation on a parallel (i.e. multi-processor) computer, we first need to do a colouring of the graph, and then update nodes of the same colour in parallel.

Finding an optimal colouring (i.e. using the least possible number of colours) of an arbitrary graph is a hard (NP-complete) problem. In practice, it is usually acceptable to use fast heuristic algorithms that provide near-optimal graph colourings. Many different sequential and parallel graph colouring algorithms have been proposed.

This project is a continuation of previous work on developing and comparing parallel graph colouring algorithms using message-passing (using MPI) and shared memory (using Java threads) implementations. The student will be expected to update and modify some existing parallel programs for graph colouring, to develop implementations of some new algorithms, and to benchmark the programs on a variety of different types of graphs on large parallel computers. The student will learn about parallel programming and designing parallel algorithms for multi-processor computers.

 


Static versus adaptive routing for multi-packet communication in a parallel computer system

Supervised by: Cruz Izu

Project Description

Parallel applications exchange traffic messages which follows the 80/80 rule: 80% of the messages on a network are 256 bytes or less and 80% of the data is sent by messages of 8KB or greater. Virtual-cut through networks break long messages into a sequence of packets. Although adaptive routing increases network performance, packets will arrive out of order, so that the network interface must reorder them with the overhead this entails. Using static paths for multi-packet communication guarantees the packets arrive in order but may limit network throughput. This project will use a register-level network simulator to compare the two alternatives under a torus network alike the one used in the IBM BlueGene supercomputer. Basic knowledge of C/C++ required.