Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems

15:30 - 16:00
Characterizing the Impact of GPU Power Management on an Exascale System

Mariana Costa, Philippe O. A. Navaux, Arthur Lorenzon
Universidade Federal do Rio Grande do Sul, Brazil

Antigoni Georgiadou, James B. White III, Woong Shin, Bronson Messer
Oak Ridge National Laboratory, USA

Bruno Villasenor Alvarez, Jordà Polo
AMD, USA

As GPU-accelerated high-performance computing (HPC) systems approach exascale performance, controlling energy consumption without compromising throughput is essential. Architectures such as the AMD MI250X-based Frontier supercomputer provide runtime mechanisms such as frequency capping and power capping, which enable energy tuning without modifying application code. Although both mechanisms target energy reduction, they operate via distinct hardware control paths and influence workloads differently. We present a comprehensive evaluation of these strategies on a leadership-class system using diverse HPC proxy applications representative of production workloads. Our study analyzes performance–energy trade-offs across multiple capping levels, node counts (1 and 32), and application profiles. Results show that frequency capping generally achieves higher energy efficiency and scalability, with gains of up to 13.2% without performance loss, while power capping is more effective for single-node runs or bursty GPU utilization. We also provide practical guidelines to help system administrators and users balance energy efficiency and performance in large-scale scientific workloads.

16th International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems, held in conjunction with SC25: The International Conference for High Performance Computing, Networking, Storage and Analysis