In this study, we introduce a testbed for evaluating and comparing climate modeling systems at cloud-resolving scales using hindcasts of the June 2012 North American derecho. The testbed is applied to two models: the regionally refined Simple Cloud-Resolving E3SM Atmosphere Model (SCREAM) at horizontal grid spacings ranging from 6.5 to 1.625 km, and the Weather Research and Forecasting (WRF) model with 4 km grid spacing. Although the regionally refined model (RRM) has been established and validated with other models over many regions of interest, previous studies were typically performed with finest resolutions of around 0.25° and targeted timescales longer than seasonal. The tests address RRM grid spacing, differences between hydrostatic and nonhydrostatic dynamical cores, low-resolution and high-resolution model configurations, initialization time, and the data source for the initial conditions. The evaluation focuses on four relevant fields: precipitation, composite radar reflectivity, outgoing longwave radiation, and 10-m wind speed. Metrics are developed that separately address errors in timing and in the time-averaged pattern.
The simulation results are highly sensitive to the initial conditions, initialization time, and model configuration, with initial conditions from the Rapid Refresh (RAP) producing the best simulation. The SCREAM simulations improve significantly as horizontal grid spacing is refined. Although both models exhibit a propagation delay of approximately 2 hours, SCREAM at 1.625 km simulates the observed bow echo structure of the derecho well and predicts strong surface gusts exceeding 30 m/s. In comparison, WRF rarely produces surface winds above 25 m/s, and the derecho wind gusts in WRF are 42-46% weaker than in SCREAM. Moreover, WRF has a lower bias in simulating cold clouds but overestimates the precipitation intensity. Both models reproduce the observed spatial patterns of outgoing longwave radiation well (Pearson correlation > 0.88). However, relative to observations, they overestimate the area of composite radar reflectivity > 40 dBZ by up to a factor of 4 and underestimate the precipitating area, by 70% in WRF and 47% in SCREAM.
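
As an illustration of the kinds of quantities reported above (a pattern correlation, a threshold-exceedance area comparison, and a timing offset), the following minimal Python sketch shows one way such numbers could be computed. It is not the metric set developed in this study: the function names, the unweighted grid-cell treatment, and the peak-based lag are all assumptions made for illustration, and both fields are assumed to be on the same regular grid.

```python
import numpy as np


def pattern_correlation(sim, obs):
    """Unweighted Pearson correlation between two fields on the same grid."""
    sim = np.asarray(sim, dtype=float).ravel()
    obs = np.asarray(obs, dtype=float).ravel()
    return float(np.corrcoef(sim, obs)[0, 1])


def area_ratio(sim, obs, threshold):
    """Ratio of simulated to observed grid-cell counts above a threshold.

    Assumes equal-area grid cells (hypothetical simplification); returns
    NaN when no observed cell exceeds the threshold.
    """
    sim_count = np.count_nonzero(np.asarray(sim) > threshold)
    obs_count = np.count_nonzero(np.asarray(obs) > threshold)
    if obs_count == 0:
        return float("nan")
    return sim_count / obs_count


def peak_lag_hours(sim_series, obs_series, dt_hours=1.0):
    """Timing offset between simulated and observed peaks of a time series.

    Positive values mean the simulated peak occurs later than observed.
    """
    lag_steps = int(np.argmax(sim_series)) - int(np.argmax(obs_series))
    return lag_steps * dt_hours
```

For example, `area_ratio(sim_dbz, obs_dbz, 40.0)` would return a value near 4 if the simulated area above 40 dBZ were four times the observed one, where `sim_dbz` and `obs_dbz` are hypothetical reflectivity arrays. Separating the lag from the pattern scores mirrors the idea of treating timing errors and time-averaged pattern errors independently.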