The influence of benchmark data choices on inferred model performance in the Arctic-Boreal region

Presentation Date
Tuesday, December 12, 2023 at 2:10pm - Tuesday, December 12, 2023 at 6:30pm
Location
MC - Poster Hall A-C - South
Abstract

Model benchmarking is important for evaluating land surface model performance and guiding model improvements. Emerging benchmarking approaches (e.g., functional relationship benchmarks) show promise in providing insights into both model responses to environmental forcings and simulated ecosystem processes. However, the subjective choices made in selecting and applying observational benchmarks can strongly influence inferred model skill. A systematic assessment of the impact of these choices on inferences of model skill is therefore a needed component of robust model evaluation. Here, the International Land Model Benchmarking (ILAMB) tool is used to test the influence of observational benchmark choice on inferred model skill across the Arctic-Boreal region, which represents a potential key tipping point in Earth’s climate system. We evaluate how the inferred skill of TRENDY v9 models varies with the choice of observation-based benchmark and with how benchmarks are applied in model evaluation. The analysis uses global datasets integrated into ILAMB as well as new regionally specific observational products from the Arctic-Boreal Vulnerability Experiment (ABoVE). Applying seven Gross Primary Production (GPP) and Ecosystem Respiration (ER) observational datasets, we found differences in inferred model skill of around 40%, with skill degrading as more regionally specific observational benchmarks were applied. These results suggest that relying on a single data product can give a false sense of model skill. We also evaluate modeled relationships between ER and air temperature, GPP, and precipitation. Results indicate that the magnitude and shape of the response curves, as well as inferred model skill, are strongly affected by the choice of observational dataset and the approach used to construct the functional response benchmark.
Collectively, these results highlight the influence of benchmarking choices on model evaluation and point to the need for benchmarking guidelines when assessing inferred model skill.
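As a minimal illustration of the functional relationship benchmarking described above, the sketch below bins ecosystem respiration by air temperature to form a response curve and scores model-observation agreement with a simple relative-error metric. The synthetic data, bin edges, and scoring formula are all hypothetical simplifications for illustration; they are not ILAMB's actual implementation or the datasets used in this study.

```python
import numpy as np

def response_curve(driver, flux, bins):
    """Mean flux within each driver bin (a simplified functional-response curve)."""
    idx = np.digitize(driver, bins)
    return np.array([flux[idx == i].mean() if np.any(idx == i) else np.nan
                     for i in range(1, len(bins))])

def relationship_score(obs_curve, mod_curve):
    """Relative-RMSE score in [0, 1]; 1 = perfect agreement with the observed curve.
    (An illustrative simplification, not ILAMB's scoring.)"""
    mask = ~np.isnan(obs_curve) & ~np.isnan(mod_curve)
    rmse = np.sqrt(np.mean((obs_curve[mask] - mod_curve[mask]) ** 2))
    norm = np.sqrt(np.mean(obs_curve[mask] ** 2))
    return max(0.0, 1.0 - rmse / norm)

# Synthetic example: ER responding to air temperature, with the "model"
# using a slightly different temperature sensitivity than the "observations".
rng = np.random.default_rng(0)
tair = rng.uniform(-20.0, 25.0, 5000)                              # air temperature (degC)
er_obs = 1.5 * np.exp(0.06 * tair) + rng.normal(0, 0.3, tair.size)  # "observed" ER
er_mod = 1.2 * np.exp(0.08 * tair) + rng.normal(0, 0.3, tair.size)  # "modeled" ER

bins = np.linspace(-20.0, 25.0, 10)
score = relationship_score(response_curve(tair, er_obs, bins),
                           response_curve(tair, er_mod, bins))
print(f"functional relationship score: {score:.2f}")
```

Swapping in a different observational product for `er_obs` changes both the shape of the binned curve and the resulting score, which is the sensitivity this abstract examines.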
