Schedule

09:00 PMBS Introduction and Welcome – B303, The Georgia World Congress Center, Atlanta

Steven Wright
University of York, UK


Session 1: Large Language Models

Chair: Zhengji Zhao, National Energy Research Scientific Computing Center, Lawrence Berkeley National Laboratory, USA

09:00 - 09:30
LLM-Inference-Bench: Inference Benchmarking of Large Language Models on AI Accelerators

Krishna Teja Chitty-Venkata, Siddhisanket Raskar, Bharat Kale, Farah Ferdaus, Aditya Tanikanti, Ken Raffenetti, Valerie Taylor, Murali Emani, Venkatram Vishwanath
Argonne National Laboratory, USA


09:30 - 10:00
Comprehensive Performance Modeling and System Design Insights for Foundation Models

Shashank Subramanian, Ermal Rrapaj, Peter Harrington, Steven Farrell, Brian Austin, Samuel Williams, Nicholas Wright, Wahid Bhimji
Lawrence Berkeley National Laboratory, USA

Smeet Chheda
Stony Brook University, USA


10:00 - 10:30 Break


Session 2: Short Papers

Chair: Sascha Hunold, TU Wien, Austria

10:30 - 10:50 Best Short Paper
System-Wide Roofline Profiling - a Case Study on NERSC’s Perlmutter Supercomputer

Brian Austin, Dhruva Kulkarni, Samuel Williams, Nicholas Wright
Lawrence Berkeley National Laboratory, USA


10:50 - 11:10
Microarchitectural comparison and in-core modeling of state-of-the-art CPUs: Grace, Sapphire Rapids, and Zen 4

Jan Laukemann, Georg Hager, Gerhard Wellein
University of Erlangen-Nuremberg, Germany


11:10 - 11:30
Benchmarking the Evolution of Performance and Energy Efficiency Across Recent Generations of Intel Xeon Processors

István Z. Reguly, Balázs Drávai
Pázmány Péter Catholic University, Hungary

Session 3: Accelerators

Chair: Steven Wright, University of York, UK

11:30 - 12:00
Performance Analysis of Runtime Handling of Zero-Copy for OpenMP Programs on MI300A APUs

Carlo Bertolli, Thorsten Blass, Jan-Patrick Lehr, Doru Bercea, Dhruva Chakrabarti, Lynd Stringer, Nicole Aschenbrenner, Lawrence Meadows, Ron Liberman
AMD Research, USA


12:00 - 12:30 Best Paper
Ponte Vecchio Across the Atlantic: Single-Node Benchmarking of Two Intel GPU Systems

Thomas Applencourt, Servesh Muralidharan, Colleen Bertoni, Jae-Hyuk Kwack, Ye Luo, Esteban Rangel, John Tramm, Yasaman Ghadar
Argonne National Laboratory, USA

Aditya Sadawarte, Tom Deakin
University of Bristol, UK

Arjen Tamerus, Chris Edsall
University of Cambridge, UK


12:30 - 14:00 Lunch


Session 4: ARM Architectures

Chair: Lilia Zaourar, CEA, France

14:00 - 14:30
Hello SME!

Stefan Remke, Alexander Breuer
Friedrich Schiller University Jena, Germany


14:30 - 15:00
AI-Assisted Design-Space Analysis of High-Performance Arm Processors

Joseph Moore, Tom Deakin, Simon McIntosh-Smith
University of Bristol, UK


15:00 - 15:30 Break


Session 5: Performance of BLAS

Chair: István Reguly, Pázmány Péter Catholic University, Hungary

15:30 - 16:00
Impact of Varying BLAS Precision on DCMESH

Nariman Piroozan, S. John Pennycook, Peter Caday, Nalini Kumar
Intel Corporation, USA

Taufeq Razakh, Aiichiro Nakano
University of Southern California, USA


16:00 - 16:30
Assessing the GPU Offload Threshold of GEMM and GEMV Kernels on Modern Heterogeneous HPC Systems

Finn Wilkinson, Alex Cockrean, Wei-Chen Lin, Simon McIntosh-Smith, Tom Deakin
University of Bristol, UK

Session 6: System Modeling

Chair: Simon Hammond, National Nuclear Security Administration, USA

16:30 - 17:00
Understanding VASP Power Profiles on NVIDIA A100 GPUs

Zhengji Zhao, Brian Austin, Ermal Rrapaj, Nicholas Wright
Lawrence Berkeley National Laboratory, USA


17:00 - 17:30
Workload-adaptive Scheduling for Efficient Use of Parallel File System in High-performance Computing Clusters

Alexander Goponenko, Damian Dechev
University of Central Florida, USA

Benjamin Allan, James Brandt
Sandia National Laboratories, USA


17:30 PMBS End