A Multi-Threaded Implementation of Nested Data Parallelism
Abstract
In previous work, we have proposed a multi-threaded execution model
for describing nested data-parallelism on distributed multiprocessors
in a fashion generic upon the partitioning of data aggregates
within the system. This paper demonstrates an approach which uses
this abstract model as the basis for a run-time system for a data-parallel
functional language. We describe an active message based implementation of
the model on the Thinking Machines CM-5 and also consider various
issues related to efficient compilation targetting such a platform.
Several issues central to the performance of the model are addressed.
A hybrid scheme for thread synchronization and data flow is
presented which incorporates a highly efficient mode of matching
and one permitting the most general forms of interaction.
We also describe a generic thread library of data-parallel primitive
operations which operate independently of data partitioning. Finally, we
detail work undertaken to instrument our execution environment for
visualization and tracing.