Document Type
Conference Paper
Publication Date
2025
DOI
10.1145/3731599.3767439
Publication Title
SC Workshops '25: Proceedings of the SC '25 Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis
Pages
775-783
Conference Name
SC '25 Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, November 16-21, 2025, St. Louis, Missouri
Abstract
This paper presents an efficient implementation of a linear-solver kernel relevant to FUN3D, a suite of computational fluid dynamics software developed at NASA’s Langley Research Center. The linear solver is optimized for a range of block sizes commonly used in FUN3D. The implementation targets Aurora, the Argonne Leadership Computing Facility’s (ALCF) exascale machine, which features Intel Data Center GPU Max 1550 devices. The linear solver’s performance is memory-bandwidth-bound due to its low arithmetic intensity. The primary performance challenges stem from variable matrix row lengths and indirect memory access patterns inherent in unstructured-grid applications. Variable block sizes introduce additional complexity through differing levels of intra-block parallelism and the constraint of efficiently utilizing 512-bit vector registers. We propose an optimized implementation using ESIMD APIs that efficiently vectorizes memory loads for block-sparse vector computations. We demonstrate that performance on the Intel Data Center GPU Max 1550 is within 10% of its bandwidth benchmark peak. We also compare the performance of the ESIMD kernels on Intel GPUs with CUDA-optimized implementations on NVIDIA GPUs.
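The abstract's remarks on low arithmetic intensity, variable row lengths, and indirect access can be illustrated with a small sketch. This is not the authors' ESIMD kernel: it is a plain-Python block-sparse matrix-vector product in a block compressed-row layout, plus a rough flops-per-byte estimate. The function names, the storage layout, the 8-byte values, and the 4-byte indices are all assumptions made for illustration.

```python
def bsr_matvec(row_ptr, col_idx, blocks, x, nb):
    """Compute y = A @ x for a block-sparse matrix with nb x nb blocks.

    Hypothetical layout (an assumption, not taken from the paper):
    row_ptr[i]..row_ptr[i+1]-1 index the blocks of block row i,
    col_idx[k] is the block column of block k.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * (n_rows * nb)
    for i in range(n_rows):
        # Row lengths vary, so each block row does a different amount of work.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]  # indirect access: which part of x is read depends on data
            block = blocks[k]
            for r in range(nb):
                for c in range(nb):
                    y[i * nb + r] += block[r][c] * x[j * nb + c]
    return y


def arithmetic_intensity(n_blocks, n_rows, nb, value_bytes=8, index_bytes=4):
    """Rough flops per byte, assuming every operand streams from memory once."""
    flops = 2 * nb * nb * n_blocks  # one multiply and one add per block entry
    bytes_moved = (
        n_blocks * nb * nb * value_bytes  # matrix blocks
        + n_blocks * index_bytes          # block column indices
        + (n_rows + 1) * index_bytes      # row pointers
        + 2 * n_rows * nb * value_bytes   # read x once, write y once
    )
    return flops / bytes_moved


# Three block rows holding 2, 1, and 3 blocks (variable row lengths).
row_ptr = [0, 2, 3, 6]
col_idx = [0, 2, 1, 0, 1, 2]
nb = 2
identity = [[1.0, 0.0], [0.0, 1.0]]
blocks = [identity] * 6
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

y = bsr_matvec(row_ptr, col_idx, blocks, x, nb)
ai = arithmetic_intensity(n_blocks=6, n_rows=3, nb=nb)
```

Under these assumptions the estimate stays well below one flop per byte, because each matrix entry is loaded once and used for a single multiply-add. That is consistent with the abstract's statement that the solver is limited by memory bandwidth rather than by compute.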
Rights
© 2025 Copyright held by the owner/authors.
This work is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) License.
Original Publication Citation
Wassell, J., Zubair, M., Walden, A., Nastac, G., Nielsen, E., & Ewart, T. (2025). An optimized generalized multi-color point implicit solver for Intel GPUs using OneAPI ESIMD. In SC Workshops '25: Proceedings of the SC '25 Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis (pp. 775-783). Association for Computing Machinery. https://doi.org/10.1145/3731599.3767439
Repository Citation
Wassell, J., Zubair, M., Walden, A., Nastac, G., Nielsen, E., & Ewart, T. (2025). An optimized generalized multi-color point implicit solver for Intel GPUs using OneAPI ESIMD. In SC Workshops '25: Proceedings of the SC '25 Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis (pp. 775-783). Association for Computing Machinery. https://doi.org/10.1145/3731599.3767439
ORCID
0009-0001-2180-2278 (Wassell), 0000-0002-5449-1779 (Zubair)