A Case Study of CUDA FORTRAN and OpenACC for an Atmospheric Climate Kernel

Title: A Case Study of CUDA FORTRAN and OpenACC for an Atmospheric Climate Kernel
Publication Type: Journal Article
Year of Publication: 2015
Journal: Journal of Computational Science
Volume: 9
Pages: 1-6
Date Published: 06/2015
Abstract

This paper compares an OpenACC port of a key kernel in the tracer advection routines of the Community Atmosphere Model – Spectral Element (CAM-SE) for Graphics Processing Units (GPUs) against an existing CUDA FORTRAN port. Developing the OpenACC kernel was substantially simpler than developing the CUDA port, although the OpenACC version ran about 1.5× slower than the optimized CUDA version. Particular attention is given to the maturity of compiler support for OpenACC in modern FORTRAN, and the Cray implementation is found to be currently more mature than the PGI implementation. Still, for the case that ran successfully on PGI, the PGI OpenACC runtime was slightly faster than Cray's. The results show encouraging OpenACC performance relative to CUDA while also exposing some issues that may need to be resolved before the implementations are suitable for porting all of CAM-SE. Most notably, future OpenACC implementations should make use of GPU shared memory, and derived type support should be expanded.
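
To illustrate why a directive-based port can be simpler to develop than a hand-written CUDA kernel, the following is a minimal sketch in C with an OpenACC directive. It is not the paper's code: the kernel studied there is written in FORTRAN, and the function `update_tracers`, the sizes `NP` and `NQ`, the array layout, and the multiplicative update are all hypothetical stand-ins chosen for illustration. The point is only that the loop nest stays in its original form and a single directive requests GPU offload.

```c
#include <assert.h>
#include <stddef.h>

#define NP 4   /* hypothetical: points per element edge */
#define NQ 8   /* hypothetical: number of tracers */

/*
 * Multiply every tracer value by a per-point factor.
 * This stands in for a real advection update, which is not reproduced here.
 *
 * factor has nelem * NP * NP entries.
 * q      has nelem * NQ * NP * NP entries (element-major, then tracer).
 */
void update_tracers(int nelem, const double *restrict factor,
                    double *restrict q)
{
    const int npts = NP * NP;
    assert(nelem >= 0);

    /* An OpenACC compiler offloads this loop nest to the accelerator and
       moves the listed array sections to and from device memory.
       A compiler without OpenACC support ignores the pragma and runs the
       same loops serially on the CPU. */
    #pragma acc parallel loop collapse(3) \
        copyin(factor[0:nelem*npts]) copy(q[0:nelem*NQ*npts])
    for (int ie = 0; ie < nelem; ie++) {
        for (int iq = 0; iq < NQ; iq++) {
            for (int k = 0; k < npts; k++) {
                q[((size_t)ie * NQ + iq) * npts + k] *=
                    factor[(size_t)ie * npts + k];
            }
        }
    }
}
```

An equivalent CUDA version would typically require an explicit kernel function, manual thread and block indexing, and explicit device allocation and copies. It could also stage `factor` in on-chip shared memory by hand, which relates to the abstract's remark that OpenACC implementations should make use of GPU shared memory.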

URL: http://www.sciencedirect.com/science/article/pii/S1877750315000605
Citation: "A Case Study of CUDA FORTRAN and OpenACC for an Atmospheric Climate Kernel." Journal of Computational Science. 2015;9:1-6.