Bundling Runs to Improve Throughput on Mira

Wednesday, May 6, 2015 - 07:00

One of the most important tasks for the ACME project is to perform large production simulations. To complete these runs within a reasonable time frame, the project must take full advantage of DOE leadership computing resources. ACME holds a large allocation on Mira at the Argonne Leadership Computing Facility (ALCF). However, even at a global resolution of 0.25°, a single ACME simulation is too small to meet the minimum processor count required for priority scheduling on Mira, which results in low throughput. We solved this problem by bundling many small jobs (4 to 32) into one large job that exceeds the threshold for the higher-priority queue. This method has been used successfully for perturbed-parameter simulations with the regionally refined model and for atmosphere-only ("AMIP") climate simulations at 0.25° resolution. It will also be used for upcoming fully coupled ensemble simulations.
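The bundling arithmetic can be sketched as follows. This is a minimal illustration, not the ACME workflow scripts: it assumes job sizes are counted in nodes, and the threshold (8192 nodes) and per-member size (512 nodes) are hypothetical placeholders. The real minimum for Mira's higher-priority queue should be taken from ALCF's queue policy. The names `Member`, `min_bundle_size`, and `plan_bundle` are likewise hypothetical. The sketch computes how many equal-sized members are needed to reach the threshold and assigns each member a contiguous block of nodes within the single large allocation.

```python
import math
from dataclasses import dataclass


@dataclass
class Member:
    """One small simulation run inside the bundle (hypothetical structure)."""
    name: str
    nodes: int


def min_bundle_size(nodes_per_job: int, threshold_nodes: int) -> int:
    """Smallest number of equal-sized jobs whose combined size reaches the threshold."""
    if nodes_per_job <= 0 or threshold_nodes <= 0:
        raise ValueError("node counts must be positive")
    return math.ceil(threshold_nodes / nodes_per_job)


def plan_bundle(members: list[Member], threshold_nodes: int) -> tuple[int, list[tuple[str, int, int]]]:
    """Lay members out back-to-back in one allocation.

    Returns the total node count and, for each member, (name, first_node, node_count).
    Raises ValueError if the bundle is still below the priority threshold.
    """
    total = sum(m.nodes for m in members)
    if total < threshold_nodes:
        raise ValueError(f"bundle uses {total} nodes; need at least {threshold_nodes}")
    layout = []
    offset = 0
    for m in members:
        layout.append((m.name, offset, m.nodes))
        offset += m.nodes  # next member starts right after this one
    return total, layout


if __name__ == "__main__":
    THRESHOLD = 8192  # hypothetical priority threshold, in nodes
    PER_JOB = 512     # hypothetical size of one ensemble member, in nodes
    n = min_bundle_size(PER_JOB, THRESHOLD)
    members = [Member(f"member_{i:02d}", PER_JOB) for i in range(n)]
    total, layout = plan_bundle(members, THRESHOLD)
    print(f"{n} members, {total} nodes")
    for name, first, count in layout:
        print(f"{name}: nodes {first}..{first + count - 1}")
```

With these placeholder numbers, 16 members fill the bundle, which falls within the 4-to-32 range mentioned above. Actually launching every member inside one batch job depends on the site's scheduler and job-launch tools, so that step is not shown.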