The push toward ever-larger computational platforms has made it possible for climate simulations to resolve climate dynamics across multiple spatial and temporal scales. This direction in climate simulation has created a strong need for scalable time-stepping methods capable of accelerating throughput on high-performance computing platforms. This work details recent advances in the implementation of implicit time stepping on a spectral element cube-sphere grid using graphics processing unit (GPU) based machines. We demonstrate how solvers in the Trilinos project are interfaced with ACME and with GPU kernels that can significantly increase the computational speed of the residual calculations in the implicit time-stepping method for the shallow water equations on the sphere. We show the optimization gains and the data structure reorganization that facilitates the performance improvements.