Schedule
09:00 PMBS Introduction and Welcome – B303, The Georgia World Congress Center, Atlanta
Steven Wright
University of York, UK
Session 1: Large Language Models
Chair: Zhengji Zhao, National Energy Research Scientific Computing Center, Lawrence Berkeley National Laboratory, USA
09:00 - 09:30
LLM-Inference-Bench: Inference Benchmarking of Large Language Models on AI Accelerators
Krishna Teja Chitty-Venkata, Siddhisanket Raskar, Bharat Kale, Farah Ferdaus, Aditya Tanikanti, Ken Raffenetti, Valerie Taylor, Murali Emani, Venkatram Vishwanath
Argonne National Laboratory, USA
09:30 - 10:00
Comprehensive Performance Modeling and System Design Insights for Foundation Models
Shashank Subramanian, Ermal Rrapaj, Peter Harrington, Steven Farrell, Brian Austin, Samuel Williams, Nicholas Wright, Wahid Bhimji
Lawrence Berkeley National Laboratory, USA
Smeet Chheda
Stony Brook University, USA
10:00 - 10:30 Break
Session 2: Short Papers
Chair: Sascha Hunold, TU Wien, Austria
10:30 - 10:50 Best Short Paper
System-Wide Roofline Profiling - a Case Study on NERSC’s Perlmutter Supercomputer
Brian Austin, Dhruva Kulkarni, Samuel Williams, Nicholas Wright
Lawrence Berkeley National Laboratory, USA
10:50 - 11:10
Microarchitectural comparison and in-core modeling of state-of-the-art CPUs: Grace, Sapphire Rapids, and Zen 4
Jan Laukemann, Georg Hager, Gerhard Wellein
University of Erlangen-Nuremberg, Germany
11:10 - 11:30
Benchmarking the Evolution of Performance and Energy Efficiency Across Recent Generations of Intel Xeon Processors
István Z. Reguly, Balázs Drávai
Pázmány Péter Catholic University, Hungary
Session 3: Accelerators
Chair: Steven Wright, University of York, UK
11:30 - 12:00
Performance Analysis of Runtime Handling of Zero-Copy for OpenMP Programs on MI300A APUs
Carlo Bertolli, Thorsten Blass, Jan-Patrick Lehr, Doru Bercea, Dhruva Chakrabarti, Lynd Stringer, Nicole Aschenbrenner, Lawrence Meadows, Ron Liberman
AMD Research, USA
12:00 - 12:30 Best Paper
Ponte Vecchio Across the Atlantic: Single-Node Benchmarking of Two Intel GPU Systems
Thomas Applencourt, Servesh Muralidharan, Colleen Bertoni, Jae-Hyuk Kwack, Ye Luo, Esteban Rangel, John Tramm, Yasaman Ghadar
Argonne National Laboratory, USA
Aditya Sadawarte, Tom Deakin
University of Bristol, UK
Arjen Tamerus, Chris Edsall
University of Cambridge, UK
12:30 - 14:00 Lunch
Session 4: ARM Architectures
Chair: Lilia Zaourar, CEA, France
14:00 - 14:30
Hello SME!
Stefan Remke, Alexander Breuer
Friedrich Schiller University Jena, Germany
14:30 - 15:00
AI-Assisted Design-Space Analysis of High-Performance Arm Processors
Joseph Moore, Tom Deakin, Simon McIntosh-Smith
University of Bristol, UK
15:00 - 15:30 Break
Session 5: Performance of BLAS
Chair: István Reguly, Pázmány Péter Catholic University, Hungary
15:30 - 16:00
Impact of Varying BLAS Precision on DCMESH
Nariman Piroozan, S. John Pennycook, Peter Caday, Nalini Kumar
Intel Corporation, USA
Taufeq Razakh, Aiichiro Nakano
University of Southern California, USA
16:00 - 16:30
Assessing the GPU Offload Threshold of GEMM and GEMV Kernels on Modern Heterogeneous HPC Systems
Finn Wilkinson, Alex Cockrean, Wei-Chen Lin, Simon McIntosh-Smith, Tom Deakin
University of Bristol, UK
Session 6: System Modeling
Chair: Simon Hammond, National Nuclear Security Administration, USA
16:30 - 17:00
Understanding VASP Power Profiles on NVIDIA A100 GPUs
Zhengji Zhao, Brian Austin, Ermal Rrapaj, Nicholas Wright
Lawrence Berkeley National Laboratory, USA
17:00 - 17:30
Workload-adaptive Scheduling for Efficient Use of Parallel File System in High-performance Computing Clusters
Alexander Goponenko, Damian Dechev
University of Central Florida, USA
Benjamin Allan, James Brandt
Sandia National Laboratories, USA
17:30 PMBS End