Improving Convection Trigger Functions in Deep Convective Parameterization Schemes Using Machine Learning
General circulation models (GCMs) often produce rain too frequently and at reduced intensity. These deficiencies are conspicuous in simulations of the diurnal cycle of precipitation. They are known to be closely related to the convection trigger function, the set of conditions that a GCM's convective parameterization uses to decide whether convection is activated at a given time. Traditional triggers are ad hoc and suffer from large uncertainties because the mechanism by which deep convection occurs is not fully understood. In this study, we use a machine learning (ML) model to construct a novel convection trigger function trained on the long-term variationally constrained ARM forcing dataset (VARANAL) at the ARM Southern Great Plains (SGP) site in the central US and the Manaus (MAO) site in the Amazon basin.
The ML convective trigger function greatly outperforms four convective available potential energy (CAPE) based triggers at the SGP and MAO sites. To obtain explicit knowledge from the black-box ML trigger function, a series of augmented rules is derived from it; these rules could be used to improve existing CAPE-based triggers.
In this study, we implemented a novel deep convection trigger function using the XGBoost method, a state-of-the-art ML classification model. The data used to train the ML trigger functions come from the ARM continuous forcing and evaluation data products at the SGP and MAO sites. Eleven boreal summer seasons (June, July, August) from 1999 to 2009 are used for SGP, and two years of data (2014 to 2015), covering the Green Ocean Amazon (GoAmazon2014/15) field campaign, are used for MAO. The ML models are trained and evaluated separately for the two sites, and also jointly on the combined data from both sites. The training dataset contains a number of large-scale predictors, such as surface heat fluxes, surface temperature, relative humidity, CAPE, lifting condensation level, and convective inhibition, as well as the vertical profiles of temperature, specific humidity, wind shear, and advective tendencies. The performance of the ML trigger is compared with four convective trigger functions commonly used in GCMs: CAPE, undilute CAPE, dilute dynamic CAPE (dCAPE), and undilute dCAPE. The ML trigger substantially outperforms the four CAPE-based triggers in terms of the F1 score, a metric widely used to assess the performance of ML classifiers. The site-specific ML trigger functions achieve F1 scores of 91% and 93% at SGP and MAO, respectively. The unified trigger also has a 91% F1 score, with virtually no degradation relative to the site-specific training, suggesting the potential for a global ML trigger function. The ML trigger alleviates the GCM tendency to overpredict the occurrence of convection, offering a promising improvement to the simulation of the diurnal cycle of precipitation. Furthermore, to address the black-box nature of ML methods, insights derived from the ML model are discussed, which may be leveraged to improve traditional CAPE-based triggers.
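To make the evaluation metric concrete, the following is a minimal sketch of how an F1 score could be computed for a binary trigger (convection on or off at each time step). It is not the study's code: the function names `f1_score` and `cape_trigger`, the 70 J/kg threshold, and the toy CAPE values and observed labels are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def f1_score(observed: Sequence[bool], predicted: Sequence[bool]) -> float:
    """Harmonic mean of precision and recall for binary labels."""
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o and p)        # convection correctly triggered
    fp = sum(1 for o, p in pairs if not o and p)    # triggered, but none observed
    fn = sum(1 for o, p in pairs if o and not p)    # observed, but not triggered
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def cape_trigger(cape: Sequence[float], threshold: float = 70.0) -> List[bool]:
    """Hypothetical CAPE-only trigger: fire whenever CAPE exceeds a threshold (J/kg)."""
    return [c > threshold for c in cape]


# Toy data (made up): CAPE in J/kg and whether deep convection was observed.
toy_cape = [0.0, 50.0, 120.0, 300.0, 900.0, 1500.0, 80.0, 2000.0]
toy_observed = [False, False, False, False, True, True, False, True]

toy_predicted = cape_trigger(toy_cape)
toy_f1 = f1_score(toy_observed, toy_predicted)
print(f"F1 of the toy CAPE trigger: {toy_f1:.3f}")
```

In this toy case the threshold trigger fires on every observed event (perfect recall) but also on several non-events, so its precision, and therefore its F1 score, is reduced. This is the kind of over-triggering the F1 score penalizes.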