Document Type

Conference Paper

Publication Date

2023

Pages

11 pp.

Conference Name

IEEE 30th International Conference on High Performance Computing, Data, and Analytics (HiPC)


Many scientific and engineering applications require repeated calculations of derivatives of output functions with respect to input parameters. Automatic Differentiation (AD) is a method that automates derivative calculations and can significantly speed up code development. In Computational Fluid Dynamics (CFD), derivatives of flux functions with respect to state variables (the flux Jacobian) are needed for efficient solutions of the nonlinear governing equations. AD of flux functions on graphics processing units (GPUs) is challenging because flux computations involve many intermediate variables that create high register pressure, and because storing the derivatives requires significant memory traffic. This paper presents a forward-mode AD method based on multivariate dual numbers that addresses these challenges and simultaneously reduces the floating-point operation count. The dimension of the multivariate dual numbers is optimized for performance. The flux computations are restructured to minimize the number of temporary variables and reduce register pressure. For effective utilization of memory bandwidth, shared memory is used to store the local flux Jacobian. This AD implementation is compared with several other Jacobian implementations on an NVIDIA V100 GPU (V100). For three-dimensional perfect-gas compressible-flow equations implemented in a practical CFD code, the AD implementation of a flux Jacobian based on multivariate dual numbers of dimension 5 outperforms all other GPU AD implementations on V100. Its performance is comparable with that of the optimized hand-differentiated version. The implementation achieves 75% of the peak floating-point throughput and 61% of the peak global device memory bandwidth.
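
As a rough illustration of forward-mode AD with multivariate dual numbers, the following is a minimal host-side C++ sketch; it is not the paper's GPU implementation. The names `Dual`, `seed`, `momentum_flux`, and `momentum_flux_gradient` are hypothetical, and the one-dimensional momentum flux f = rho*u*u + p with three inputs is an assumed stand-in for the three-dimensional flux functions, which the abstract differentiates with dual numbers of dimension 5. Each dual number carries a value and a vector of partial derivatives, so a single evaluation of the flux returns one full row of the Jacobian.

```cpp
#include <array>
#include <cstddef>

// Multivariate dual number: a value plus N partial derivatives.
// (Hypothetical type for illustration; not taken from the paper.)
template <std::size_t N>
struct Dual {
    double val{};
    std::array<double, N> grad{};
};

// Seed an independent variable: d(value)/d(input index) = 1, all others 0.
template <std::size_t N>
Dual<N> seed(double value, std::size_t index) {
    Dual<N> d;
    d.val = value;
    d.grad[index] = 1.0;
    return d;
}

// Sum rule: derivatives add component by component.
template <std::size_t N>
Dual<N> operator+(const Dual<N>& a, const Dual<N>& b) {
    Dual<N> r;
    r.val = a.val + b.val;
    for (std::size_t i = 0; i < N; ++i) r.grad[i] = a.grad[i] + b.grad[i];
    return r;
}

// Product rule applied to every derivative component.
template <std::size_t N>
Dual<N> operator*(const Dual<N>& a, const Dual<N>& b) {
    Dual<N> r;
    r.val = a.val * b.val;
    for (std::size_t i = 0; i < N; ++i)
        r.grad[i] = a.grad[i] * b.val + a.val * b.grad[i];
    return r;
}

// Assumed example flux: 1D momentum flux f = rho*u*u + p.
// Written generically so the same code runs on double or Dual<N>.
template <class T>
T momentum_flux(const T& rho, const T& u, const T& p) {
    return rho * u * u + p;
}

// One forward pass yields [df/drho, df/du, df/dp], one Jacobian row.
inline std::array<double, 3> momentum_flux_gradient(double rho, double u, double p) {
    const Dual<3> f = momentum_flux(seed<3>(rho, 0), seed<3>(u, 1), seed<3>(p, 2));
    return f.grad;
}
```

For example, `momentum_flux_gradient(2.0, 3.0, 5.0)` evaluates the derivatives u*u, 2*rho*u, and 1 at that state. Carrying all partial derivatives in one number is what allows a single pass per flux component; the cost is that every intermediate variable holds N extra values, which is the register-pressure concern the abstract raises.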


This is the accepted version of an IEEE-copyrighted work.


© 2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Original Publication Citation

Zubair, M., Ranjan, D., Walden, A., Nastac, G., Nielsen, E., Diskin, B., Paterno, M., Jung, S., & Davis, J. H. (2023). Efficient GPU implementation of automatic differentiation for computational fluid dynamics. Paper presented at the IEEE 30th International Conference on High Performance Computing, Data, and Analytics (HiPC), 11 pp. doi: 10.1109/HiPC58850.2023.00055


0000-0002-5449-1779 (Zubair)