Document Type

Conference Paper

Publication Date

2025

DOI

10.1145/3731599.3767439

Publication Title

SC Workshops '25: Proceedings of the SC '25 Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis

Pages

775-783

Conference Name

SC '25 Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, November 16-21, 2025, St. Louis, Missouri

Abstract

This paper presents an efficient implementation of a linear-solver kernel relevant to FUN3D, a suite of computational fluid dynamics software developed at NASA’s Langley Research Center. The linear solver is optimized for a range of block sizes commonly used in FUN3D. The implementation targets Aurora, the Argonne Leadership Computing Facility’s (ALCF) exascale machine featuring Intel Data Center GPU Max 1550 accelerators. Because of its low arithmetic intensity, the linear solver’s performance is memory-bandwidth-bound. The primary performance challenges stem from the variable matrix row lengths and indirect memory access patterns inherent in unstructured-grid applications. Variable block sizes add further complexity, because they yield differing levels of intra-block parallelism and must still make efficient use of 512-bit vector registers. We propose an optimized implementation using ESIMD APIs that efficiently vectorizes memory loads for block-sparse vector computations. We demonstrate that performance on the Intel 1550 GPU is within 10% of the peak achievable bandwidth measured by a bandwidth benchmark. We also compare the performance of the ESIMD kernels on Intel GPUs with CUDA-optimized implementations on NVIDIA GPUs.
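The access pattern the abstract describes (dense blocks, a variable number of blocks per row, and indirect addressing of the input vector) can be illustrated with a plain, scalar C++ reference for a block-sparse matrix-vector product. This is a minimal sketch under stated assumptions, not the authors' ESIMD kernel and not a FUN3D API: the names `BlockCSR`, `bsr_matvec`, and `block_intensity` are hypothetical, and the code assumes 8-byte (FP64) values, 4-byte column indices, and row-major storage inside each block.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical block-CSR container: each stored entry is a dense nb x nb block,
// stored row-major. The number of blocks per block-row varies, as on unstructured grids.
struct BlockCSR {
    int nb;                    // block size (assumed positive)
    std::vector<int> row_ptr;  // size nrows + 1; blocks of row i are [row_ptr[i], row_ptr[i+1])
    std::vector<int> col_idx;  // block-column index of each stored block
    std::vector<double> vals;  // col_idx.size() * nb * nb values
};

// Computes y = A * x, where x holds (number of block columns) * nb entries.
std::vector<double> bsr_matvec(const BlockCSR& A, const std::vector<double>& x) {
    const int nb = A.nb;
    const std::size_t nrows = A.row_ptr.size() - 1;
    assert(A.vals.size() == A.col_idx.size() * nb * nb);
    std::vector<double> y(nrows * nb, 0.0);
    for (std::size_t i = 0; i < nrows; ++i) {
        // Variable row length: the loop trip count differs from row to row.
        for (int k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k) {
            // Indirect access: the x block is located through col_idx.
            const double* xb = &x[static_cast<std::size_t>(A.col_idx[k]) * nb];
            const double* blk = &A.vals[static_cast<std::size_t>(k) * nb * nb];
            for (int r = 0; r < nb; ++r) {
                double acc = 0.0;
                for (int c = 0; c < nb; ++c) acc += blk[r * nb + c] * xb[c];
                y[i * nb + r] += acc;
            }
        }
    }
    return y;
}

// Rough flops-per-byte for one stored block, assuming FP64 values, a 4-byte
// column index, and that the x block is loaded once per stored block (no reuse).
double block_intensity(int nb) {
    const double flops = 2.0 * nb * nb;                   // one multiply and one add per entry
    const double bytes = 8.0 * nb * nb + 4.0 + 8.0 * nb;  // block values + index + x block
    return flops / bytes;
}
```

Under these assumptions the intensity stays below 0.25 flop/byte for every block size (for example, about 0.20 for nb = 5), which is consistent with the abstract's statement that the solver is bandwidth-bound: the matrix blocks dominate memory traffic, so the achievable speed is set by how efficiently they are streamed from memory.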

Rights

© 2025 Copyright held by the owner/authors.

This work is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) License.

Original Publication Citation

Wassell, J., Zubair, M., Walden, A., Nastac, G., Nielsen, E., & Ewart, T. (2025). An optimized generalized multi-color point implicit solver for Intel GPUs using OneAPI ESIMD. In SC Workshops '25: Proceedings of the SC '25 Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis (pp. 775-783). Association for Computing Machinery. https://doi.org/10.1145/3731599.3767439

ORCID

0009-0001-2180-2278 (Wassell), 0000-0002-5449-1779 (Zubair)
