A Multi-Threaded Implementation of Nested Data Parallelism

Abstract

In previous work, we have proposed a multi-threaded execution model for describing nested data-parallelism on distributed multiprocessors in a fashion generic upon the partitioning of data aggregates within the system. This paper demonstrates an approach which uses this abstract model as the basis for a run-time system for a data-parallel functional language. We describe an active message based implementation of the model on the Thinking Machines CM-5 and also consider various issues related to efficient compilation targetting such a platform. Several issues central to the performance of the model are addressed. A hybrid scheme for thread synchronization and data flow is presented which incorporates a highly efficient mode of matching and one permitting the most general forms of interaction. We also describe a generic thread library of data-parallel primitive operations which operate independently of data partitioning. Finally, we detail work undertaken to instrument our execution environment for visualization and tracing.