01 July 2019

A Flexible, Robust, High-Performance Data System for the GCAM Model

New data system feeds a powerful model while serving as a research tool in its own right.

Science

As human-Earth system models grow more complex, so do their data requirements, to the point that stand-alone software systems are needed to track and assemble the inputs to the main model. A new data system known as “gcamdata” was developed for the Global Change Assessment Model (GCAM) to provide a robust, reproducible, and transparent way to track and prepare hundreds of model inputs, and to let researchers easily construct alternative scenarios.

Impact

While this new data system was developed specifically for GCAM, many of its components and processing approaches are broadly applicable to, and reusable by, other complex model/data systems aiming to improve transparency, reproducibility, and flexibility. As open-source software with a flexible architecture, gcamdata introduces a new way to handle and prepare data to feed complex global models. This saves researchers time and effort, improves traceability and reproducibility, and enables exploratory “what-if” analyses using GCAM.

Summary

Modern, integrated human-Earth system models are sophisticated attempts to quantify relationships among environmental, social, and economic factors, and they require correspondingly detailed input datasets. The gcamdata software is documented, includes error checking, and is straightforward to apply to a variety of modeling scenarios. Every data object flowing between parts of the system is required to carry descriptive metadata (including title, units, source, and comments), which lets researchers track data provenance throughout the system. As a result, a full system-wide data map can be constructed, and the upstream and downstream dependencies of any object can be traced and explored in detail. Many parts of the gcamdata package can be repurposed for any data system that involves multiple, potentially interacting, data processing steps, improving the reproducibility and transparency of science in many modeling domains.
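
The idea of mandatory metadata plus dependency tracing can be illustrated with a minimal sketch. This is not the gcamdata API (gcamdata is an R package); it is a Python illustration under stated assumptions, and every name in it (`DataObject`, `upstream`, `downstream`, and the example objects such as `raw_gdp`) is hypothetical. Each object must carry a title, units, source, and comments, and records the names of the objects it was built from, which is enough to walk the data map in either direction.

```python
from dataclasses import dataclass

REQUIRED_METADATA = ("title", "units", "source", "comments")


@dataclass(frozen=True)
class DataObject:
    """Hypothetical data object that cannot exist without metadata."""
    name: str
    title: str
    units: str
    source: str
    comments: str
    inputs: tuple = ()  # names of the objects this one was derived from

    def __post_init__(self):
        # Reject objects whose metadata fields are empty or whitespace.
        for attr in REQUIRED_METADATA:
            if not getattr(self, attr).strip():
                raise ValueError(f"{self.name}: missing metadata field '{attr}'")


def upstream(registry, name):
    """Return names of all objects that `name` depends on, directly or not."""
    seen = set()
    stack = list(registry[name].inputs)
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(registry[current].inputs)
    return seen


def downstream(registry, name):
    """Return names of all objects that depend on `name`, directly or not."""
    return {other for other in registry if name in upstream(registry, other)}


# Illustrative objects only; these are not real gcamdata inputs.
objects = [
    DataObject("raw_population", "Population", "thousands",
               "example source A", "unprocessed input"),
    DataObject("raw_gdp", "GDP", "million USD",
               "example source B", "unprocessed input"),
    DataObject("gdp_per_capita", "GDP per capita", "USD/person",
               "derived", "raw_gdp divided by raw_population",
               inputs=("raw_gdp", "raw_population")),
    DataObject("scenario_input", "Scenario input table", "USD/person",
               "derived", "formatted for the model",
               inputs=("gdp_per_capita",)),
]
registry = {obj.name: obj for obj in objects}

print(sorted(upstream(registry, "scenario_input")))
print(sorted(downstream(registry, "raw_gdp")))
```

In this sketch, changing `raw_gdp` would flag `gdp_per_capita` and `scenario_input` as affected, which is the kind of question a system-wide data map is meant to answer.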

Contact

Mohamad Hejazi
Pacific Northwest National Laboratory

Publications

Bond-Lamberty, B, K Dorheim, R Cui, R Horowitz, A Snyder, K Calvin, L Feng, et al. 2019. "gcamdata: An R Package for Preparation, Synthesis, and Tracking of Input Data for the GCAM Integrated Human-Earth Systems Model." Journal of Open Research Software 7(1): 6. https://doi.org/10.5334/jors.232.