Anatomy of a Climate Science-centric Unified Workflow

Wednesday, May 14, 2014 - 07:00

The CASCADE climate group's charter has two major thrusts: detection and attribution of extreme events, and assessment of climate models' ability to simulate extreme events. Both are grounded in statistical methodologies and rely on scalable, HPC-aware software infrastructure to address current and upcoming climate science challenges through a common unified workflow. Addressing emerging requirements for the analysis of extreme events is a growing challenge for the climate science community. Data volumes, currently at terabytes, will only grow larger; output at three- to six-hour intervals will become more common; and analyses will focus increasingly on high-resolution datasets (i.e., 1/4 to 1/8 degree and beyond). High-resolution, high-frequency analysis will demand several orders of magnitude more computation, creating a critical need for effective utilization of HPC resources and for a software infrastructure and workflow designed to take advantage of them.

The CASCADE software infrastructure team is tasked with providing a streamlined, interoperable infrastructure and expertise in scaling algorithms; simplifying the execution and coordination of large ensemble runs for analysis and computation of uncertainties; and leading deployment efforts with a focus on modular, extensible components and an emphasis on usability. This poster highlights three characteristic instances of the CASCADE unified workflow: handling performance challenges, meeting scalability challenges for model fidelity, and providing scalable statistical analysis routines. These components are the building blocks for use cases from the members of each of the other CASCADE teams.
The performance pipeline highlights the effort required to speed up analysis routines that are exercised hundreds to thousands of times; the model-fidelity pipeline highlights the effort required to provide scalable ensemble execution; and the statistical analysis pipeline highlights the effort to parallelize spatio-temporal statistical routines, such as extreme value analysis, which are then used within the model-fidelity and detection-and-attribution work. Beyond these three instances, the poster will present how resources and effort are shared within a more unified construction of the workflow that manages effective utilization of resources, data movement, scheduling, and management.
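To illustrate the kind of routine the statistical analysis pipeline parallelizes, the sketch below (a hypothetical example, not CASCADE code) performs a simple block-maxima extreme value analysis: it fits a Gumbel distribution, a special case of the generalized extreme value family, to the annual maxima of each grid cell's time series, and fans the independent per-cell fits out over a process pool. The function names and the method-of-moments fit are illustrative assumptions.

```python
# Hypothetical sketch of a parallel extreme value analysis step;
# not the CASCADE implementation.
import math
import statistics
from concurrent.futures import ProcessPoolExecutor

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def block_maxima(series, block_len):
    """Reduce a time series to one maximum per block (e.g. per year)."""
    return [max(series[i:i + block_len])
            for i in range(0, len(series) - block_len + 1, block_len)]

def fit_gumbel(maxima):
    """Method-of-moments Gumbel fit: returns (location mu, scale beta)."""
    beta = statistics.stdev(maxima) * math.sqrt(6) / math.pi
    mu = statistics.mean(maxima) - EULER_GAMMA * beta
    return mu, beta

def fit_cell(args):
    """Fit one grid cell: block-maxima reduction followed by a Gumbel fit."""
    series, block_len = args
    return fit_gumbel(block_maxima(series, block_len))

def fit_all_cells(cells, block_len, workers=4):
    """Fit every grid cell's series in parallel; cells is a list of series."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fit_cell, [(s, block_len) for s in cells]))
```

Because the per-cell fits share no state, this pattern scales to many cores or nodes by swapping the process pool for an MPI- or batch-scheduler-backed executor.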