Type of Document Master's Thesis Author Daga, Mayank Author's Email Address firstname.lastname@example.org URN etd-05102011-155739 Title Architecture-Aware Mapping and Optimization on Heterogeneous Computing Systems Degree Master of Science Department Computer Science Advisory Committee
Advisor Name Title Feng, Wu-Chun Committee Chair Cao, Yong Committee Member Onufriev, Alexey V. Committee Member Keywords
- molecular modeling
- performance evaluation
- multicore CPU
Date of Defense 2011-04-27 Availability unrestricted AbstractThe emergence of scientific applications embedded with multiple modes of parallelism has made heterogeneous computing systems indispensable in high performance computing. The popularity of such systems is evident from the fact that three out of the top five fastest supercomputers in the world employ heterogeneous computing, i.e., they use dissimilar computational units. A closer look at the performance of these supercomputers reveals that they achieve only around 50% of their theoretical peak performance. This suggests that applications that were tuned for erstwhile homogeneous computing may not be efficient for today’s heterogeneous computing and hence, novel optimization strategies are required to be exercised. However, optimizing an application for heterogeneous computing systems is extremely challenging, primarily due to the architectural differences in computational units in such systems.
This thesis intends to act as a cookbook for optimizing applications on heterogeneous computing systems that employ graphics processing units (GPUs) as the preferred mode of accelerators. We discuss optimization strategies for multicore CPUs as well as for the two popular GPU platforms, i.e., GPUs from AMD and NVIDIA. Optimization strategies for NVIDIA GPUs have been well studied but when applied on AMD GPUs, they fail to measurably improve performance because of the differences in underlying architecture. To the best of our knowledge, this research is the first to propose optimization strategies for AMD GPUs. Even on NVIDIA GPUs, there exists a lesser known but an extremely severe performance pitfall called partition camping, which can affect application performance by up to seven-fold. To facilitate the detection of this phenomenon, we have developed a performance prediction model that analyzes and characterizes the effect of partition camping in GPU applications. We have used a large-scale, molecular modeling application to validate and verify all the optimization strategies. Our results illustrate that if appropriately optimized, AMD and NVIDIA GPUs can provide 371-fold and 328-fold improvement, respectively, over a hand-tuned, SSE-optimized serial implementation.
Filename Size Approximate Download Time (Hours:Minutes:Seconds)
28.8 Modem 56K Modem ISDN (64 Kb) ISDN (128 Kb) Higher-speed Access Daga_M_T_2011_2.pdf 2.34 Mb 00:10:50 00:05:34 00:04:52 00:02:26 00:00:12
If you have questions or technical problems, please Contact DLA.