How Can We Make Model Tuning Less Laborious and More Transparent?

Monday, June 5, 2017 - 15:30

Climate model calibration (“tuning”) is a complex and laborious task, requiring many thousands of hours of effort by experienced model developers for each new version of a component or coupled model. Two key obstacles to efficient model fidelity analysis are: (1) comparisons of model fidelity across multiple simulations are time-consuming, and (2) there is limited consensus on the relative importance of different model variables in evaluating overall fidelity. We address these challenges through a unique, close collaboration among climate modelers, visual analytics researchers, and statisticians, as part of a Laboratory Directed Research and Development (LDRD) project at Pacific Northwest National Laboratory.

To address the first challenge, we empirically tested four candidate visual designs for the task of comparing model fidelity across many (>10) models and many (>10) variables (Dasgupta et al., ACM SIGCHI, 2017). This study demonstrated that climate model developers and users preferred a visualization called the “slope plot,” and perceived it as more efficient for these tasks, compared with alternative visual displays (heatmaps, bar charts, and Taylor plots). This preference held for both highly experienced and less experienced scientists, and objective task-completion accuracy with the “slope plot” was as good as, or better than, with the other plots.

As a first step towards addressing the second challenge, we conducted a broad survey of the climate modeling community (with ca. 100 participants from four continents) to elicit expert judgments about the importance of different variables in evaluating model fidelity. We present community-mean importance ratings for concise sets of variables. We quantify how these ratings vary with the driving science question, quantify the degree of consensus on the importance of different model variables, and show that the distribution of responses does not differ significantly between less-experienced (median 10 y experience) and more-experienced (median 20.5 y experience) climate modelers. While we found a general lack of consensus on the importance ratings of particular variables, our results also suggest that, beyond a certain level of expertise, additional experience in model evaluation does not necessarily change an individual expert's assessment of model fidelity.
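The aggregation described above can be sketched in a few lines: a community-mean rating summarizes perceived importance per variable, and the spread of responses gives a simple consensus measure. The variable names and ratings below are illustrative placeholders, not the actual survey data:

```python
# Hypothetical sketch: aggregating per-variable importance ratings and
# quantifying consensus via the spread of responses. All data are made up.
from statistics import mean, stdev

# ratings[variable] = importance scores (e.g., 1-5 scale) from respondents
ratings = {
    "precipitation":      [5, 4, 5, 3, 4],
    "sea_level_pressure": [3, 2, 4, 3, 3],
    "cloud_fraction":     [5, 5, 2, 1, 4],
}

for var, scores in ratings.items():
    # Low standard deviation = strong consensus on that variable's importance.
    print(f"{var}: mean={mean(scores):.2f}, spread={stdev(scores):.2f}")
```

Comparing the distributions between respondent subgroups (e.g., by years of experience) is then a standard two-sample test on these per-variable rating lists.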

In follow-on work, we are developing a visual analytics tool that will allow scientists to interactively adjust the weights assigned to different model metrics and see how those adjustments affect model rankings, and to explore how model parameters influence model fidelity in a perturbed-physics ensemble.
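The core of the interactive-weighting idea can be sketched as a weighted fidelity score per model, re-ranked whenever the user moves a weight slider. The model names, metrics, and scores below are hypothetical, purely to illustrate how a change in weights can reorder the ranking:

```python
# Hypothetical sketch: rank models by a user-weighted combination of
# per-metric fidelity scores (higher = better). All values are illustrative.
def rank_models(scores, weights):
    """Return model names ordered best-first by weighted fidelity score."""
    totals = {
        model: sum(weights[metric] * s for metric, s in per_metric.items())
        for model, per_metric in scores.items()
    }
    return sorted(totals, key=totals.get, reverse=True)

scores = {
    "model_A": {"precip": 0.9, "clouds": 0.4},
    "model_B": {"precip": 0.6, "clouds": 0.8},
}

# Emphasizing precipitation favors model_A...
print(rank_models(scores, {"precip": 0.8, "clouds": 0.2}))
# ...while emphasizing clouds favors model_B.
print(rank_models(scores, {"precip": 0.2, "clouds": 0.8}))
```

Exposing the weights as interactive controls lets a scientist see immediately how sensitive a model's rank is to the judgment calls the survey showed experts disagree on.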