17:00 - 17:30
On the Performance and Scalability of Cloud Supercomputers: Insights from Eagle and Reindeer
Amirreza Rastegari, Prabhat Ram, Michael F. Ringenburg
Microsoft Corporation, USA
The launch of Eagle, Azure’s hyper-scale supercomputer and Number 3 on the TOP500 list in November 2023, marked a new era in which cloud providers are at the forefront of supercomputing. Despite the rapid expansion of cloud-based supercomputing, public knowledge of its performance and scalability is limited, and misconceptions persist about the performance implications of the virtualization layer in cloud-based systems. To address these gaps, we present a comparative analysis of two cloud-based supercomputers: Azure Eagle, a hyper-scale system ranked Number 3 on the TOP500 list in November 2023, and Azure Reindeer, a small-scale system ranked Number 32 on the TOP500 list in November 2024.
Through a comprehensive performance analysis, we highlight differences in the computational efficiency and scaling characteristics of these systems relative to their bare-metal, on-premises counterparts. We also quantify the overhead of Azure’s virtualization layer, showing that its performance impact on real-world HPC workloads is less than 4%, with typical values between 2% and 3%.