TY - GEN
T1 - Parametrizing multicore architectures for multiple sequence alignment
AU - Isaza, Sebastian
AU - Sanchez, Friman
AU - Cabarcas, Felipe
AU - Ramirez, Alex
AU - Gaydadjiev, Georgi
PY - 2011
Y1 - 2011
N2 - Sequence alignment is one of the fundamental tasks in bioinformatics. Due to the exponential growth of biological data and the computational complexity of the algorithms used, high performance computing systems are required. Although multicore architectures have the potential of exploiting the task-level parallelism found in these workloads, efficiently harnessing systems with hundreds of cores requires deep understanding of the applications and the architecture. When incorporating large numbers of cores, performance scalability will likely saturate shared hardware resources like buses and memories. In this paper we evaluate the performance impact of various configurations of an accelerator-based multicore architecture with the aim of revealing and quantifying the bottlenecks. Then, we compare against a multicore using general-purpose processors and discuss the performance gap. Our target application is ClustalW, one of the most popular programs for Multiple Sequence Alignment. Different input data sets are characterized and we show how they influence performance. Simulation results show that due to the high computation-to-communication ratio and the transfer of data in large chunks, memory latency is well tolerated. However, bandwidth is critical to achieving maximum performance. Using a 32KB cache configuration with 4 banks can capture most of the memory traffic and therefore avoid expensive off-chip transactions. On the other hand, using a hardware queue for the tasks synchronization allows us to handle a large number of cores. Finally, we show that using a simple load balancing strategy, we can increase performance of general-purpose cores by 28%.
AB - Sequence alignment is one of the fundamental tasks in bioinformatics. Due to the exponential growth of biological data and the computational complexity of the algorithms used, high performance computing systems are required. Although multicore architectures have the potential of exploiting the task-level parallelism found in these workloads, efficiently harnessing systems with hundreds of cores requires deep understanding of the applications and the architecture. When incorporating large numbers of cores, performance scalability will likely saturate shared hardware resources like buses and memories. In this paper we evaluate the performance impact of various configurations of an accelerator-based multicore architecture with the aim of revealing and quantifying the bottlenecks. Then, we compare against a multicore using general-purpose processors and discuss the performance gap. Our target application is ClustalW, one of the most popular programs for Multiple Sequence Alignment. Different input data sets are characterized and we show how they influence performance. Simulation results show that due to the high computation-to-communication ratio and the transfer of data in large chunks, memory latency is well tolerated. However, bandwidth is critical to achieving maximum performance. Using a 32KB cache configuration with 4 banks can capture most of the memory traffic and therefore avoid expensive off-chip transactions. On the other hand, using a hardware queue for the tasks synchronization allows us to handle a large number of cores. Finally, we show that using a simple load balancing strategy, we can increase performance of general-purpose cores by 28%.
UR - http://www.scopus.com/inward/record.url?scp=80052544415&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=80052544415&partnerID=8YFLogxK
U2 - 10.1145/2016604.2016642
DO - 10.1145/2016604.2016642
M3 - Conference contribution
AN - SCOPUS:80052544415
SN - 9781450306980
T3 - Proceedings of the 8th ACM International Conference on Computing Frontiers, CF'11
BT - Proceedings of the 8th ACM International Conference on Computing Frontiers, CF'11
T2 - 8th ACM International Conference on Computing Frontiers, CF'11
Y2 - 3 May 2011 through 5 May 2011
ER -