Team Optimization

We optimized the X10 runtime at the transport layer, specifically the x10.util.Team class. X10 supports several network transports, e.g., PAMI, DCMF, and MPI. The Team class provides collective communication routines for exchanging data among all the places, e.g., alltoall, scatter, gather, and barrier; however, as of the X10 2.3.1 release, the MPI transport emulates all of these routines using point-to-point communication. We modified the collective communication routines in the Team class to call the native MPI collective routines directly.
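
The sketch below illustrates the difference using plain MPI in C++; it is a conceptual example, not the actual x10rt transport code. An emulated gather issues O(P) individual point-to-point messages, whereas the native path is a single MPI_Gather call that lets the MPI library apply optimized algorithms and hardware support.

    #include <mpi.h>
    #include <algorithm>

    // Emulated gather: the root posts one receive per rank and every other
    // rank sends its block to the root -- O(P) point-to-point messages.
    void gather_emulated(int* sendbuf, int count, int* recvbuf,
                         int root, MPI_Comm comm) {
        int rank, size;
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &size);
        if (rank == root) {
            for (int src = 0; src < size; ++src) {
                if (src == root) {
                    std::copy(sendbuf, sendbuf + count, recvbuf + src * count);
                } else {
                    MPI_Recv(recvbuf + src * count, count, MPI_INT, src, 0,
                             comm, MPI_STATUS_IGNORE);
                }
            }
        } else {
            MPI_Send(sendbuf, count, MPI_INT, root, 0, comm);
        }
    }

    // Native gather: a single call into the MPI library.
    void gather_native(int* sendbuf, int count, int* recvbuf,
                       int root, MPI_Comm comm) {
        MPI_Gather(sendbuf, count, MPI_INT, recvbuf, count, MPI_INT, root, comm);
    }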

We evaluated the performance of the collective communication routines in our X10 Team library against several implementations, with and without the MPI_THREADS_MULTIPLE option (which enables MPI multithreading), as described below.

  • Emulation - the official X10 Team, which emulates the collective communication routines using point-to-point communication
  • Native - our X10 Team, which is built on top of the native MPI collective communication routines
  • Native (multithread) - the Native implementation with MPI_THREADS_MULTIPLE=true (see the sketch after this list)
  • at - collective communication implemented with the X10 at statement
  • at (multithread) - the at implementation with MPI_THREADS_MULTIPLE=true
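
At the MPI level, enabling MPI multithreading corresponds to requesting the MPI_THREAD_MULTIPLE support level during initialization, so that several runtime threads may issue MPI calls concurrently. A minimal sketch (plain C++/MPI, independent of the X10 runtime's own initialization code):

    #include <mpi.h>
    #include <cstdio>

    int main(int argc, char** argv) {
        int provided = MPI_THREAD_SINGLE;
        // Request full multithreading: any thread may call MPI at any time.
        MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
        if (provided < MPI_THREAD_MULTIPLE) {
            std::printf("MPI granted a lower thread support level: %d\n", provided);
        }
        MPI_Finalize();
        return 0;
    }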

Environment

OS: SUSE Linux Enterprise Server 11 SP1
Machine: HP ProLiant SL390s G7
CPU: Intel Xeon 2.93 GHz (6 cores) x 2 (Hyper-Threading enabled)
Main Memory: 54 GB
Network: QDR InfiniBand x 2 (80 Gbps)
MPI: MVAPICH2 1.9a2 (for the Emulation and Native configurations), MVAPICH2 1.6 (for the at configurations)
GCC: gcc (SUSE Linux) 4.3.4 [gcc-4_3-branch revision 152973]
Build options for X10: ant -DX10RT_MPI=true -DGCC_SYMBOLS=true -Doptimize=true
Compile options: x10c++ -cxx-prearg -g -x10rt mpi -O -NO_CHECKS -define -NO_BOUNDS_CHECKS source_files
Environment variables:

  • X10_NTHREADS=6
  • GC_NPROCS=6
  • MV2_ENABLE_AFFINITY=0
  • MV2_NUM_HCAS=2

Other configurations: 2 places per node
For detailed hardware specifications, please refer to the TSUBAME hardware architecture page.

Evaluation Results
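
As a rough illustration of how the performance of a single collective routine is typically measured at the MPI level, the sketch below times repeated alltoall calls; it is not the benchmark code used to produce the results in this section, and the message size and iteration count are arbitrary.

    #include <mpi.h>
    #include <cstdio>
    #include <vector>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        const int count = 1024;      // elements sent to each rank (arbitrary)
        const int iterations = 100;  // timed repetitions (arbitrary)
        std::vector<int> sendbuf(count * size, rank);
        std::vector<int> recvbuf(count * size, 0);

        MPI_Barrier(MPI_COMM_WORLD);           // synchronize before timing
        const double start = MPI_Wtime();
        for (int i = 0; i < iterations; ++i) {
            MPI_Alltoall(&sendbuf[0], count, MPI_INT,
                         &recvbuf[0], count, MPI_INT, MPI_COMM_WORLD);
        }
        const double elapsed = MPI_Wtime() - start;

        if (rank == 0) {
            std::printf("alltoall, %d ints per rank: %f ms per call\n",
                        count, 1e3 * elapsed / iterations);
        }
        MPI_Finalize();
        return 0;
    }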

Performance comparison charts (not reproduced here) were obtained for the following collective operations:

  • gather and gatherv
  • scatter and scatterv
  • bcast
  • barrier
  • reduce and allreduce
  • alltoall and alltoallv
  • allgather and allgatherv