Probabilistic Machine Learning for Analyzing Massive Climate Data Sets on GPU-Enabled HPC
Most machine learning (ML) methods do not provide estimates of uncertainty, and those that do cannot analyze very large data sets due to computational limitations. We develop new machine learning methods that naturally include uncertainty quantification and can be applied to very large Earth science data sets. Our method takes advantage of the natural sparsity of the data: the underlying mathematics allows us to discover, rather than impose, sparse structure. We also develop a new computing technique that leverages graphics processing unit (GPU) resources on high-performance computing (HPC) systems.
We conducted a record-breaking calculation on a climate science data set of more than five million points. Each part of the calculation took 0.6 seconds on Perlmutter (DOE's newest supercomputer), down from 15 seconds on Cori, the previous system at NERSC. This proof-of-concept calculation demonstrates that we now have a highly scalable, probabilistic ML method for analyzing modern-day data volumes on the newest DOE HPC systems.
A Gaussian Process (GP) is a mathematical framework for conducting probabilistic machine learning with uncertainty quantification across many applications in science and engineering. Unfortunately, exact GPs are prohibitively expensive for large data sets: computation scales cubically and storage quadratically with the number of data points. All existing methods addressing this issue rely on some form of approximation, which can lead to inaccuracies in predictions and often limits the user's flexibility in designing expressive kernels. Instead of inducing sparsity via assumptions about data-point geometry and structure, we propose to take advantage of naturally occurring sparsity by allowing the kernel to discover, rather than impose, sparse structure. The core concept of exact and, at the same time, sparse GPs relies on kernel definitions flexible enough to learn and encode not only non-zero but also zero covariances. This principle of ultra-flexible, compactly supported, and non-stationary kernels, combined with HPC and constrained optimization, lets us scale exact GPs well beyond 5 million data points.
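To illustrate the core idea, the sketch below (a simplified stand-in, not the authors' implementation) uses a compactly supported kernel, here the classical spherical covariance with a hypothetical cutoff radius, whose entries are exactly zero beyond that radius. The exact covariance matrix is therefore naturally sparse, and GP prediction can use sparse linear algebra instead of dense cubic-cost solves. All names, the cutoff value, and the toy data are illustrative assumptions.

```python
# Sketch only: a compactly supported kernel yields an exactly sparse
# covariance matrix, enabling exact GP inference with sparse solvers.
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

rng = np.random.default_rng(0)
n = 2000
x = np.sort(rng.uniform(0.0, 100.0, n))               # 1-D training inputs
y = np.sin(0.3 * x) + 0.1 * rng.standard_normal(n)    # noisy toy observations

def spherical_kernel(xa, xb, radius=5.0, variance=1.0):
    """Spherical covariance: exactly zero for distances beyond `radius`
    (valid in up to three dimensions). `radius` here is an assumed value."""
    d = np.abs(xa[:, None] - xb[None, :])
    r = np.clip(d / radius, 0.0, 1.0)
    return variance * (1.0 - 1.5 * r + 0.5 * r**3) * (d < radius)

K = spherical_kernel(x, x)
frac_nonzero = np.count_nonzero(K) / K.size           # most entries are exactly 0

# Exact GP posterior mean at a few test points, using sparse algebra.
K_sparse = sparse.csc_matrix(K + 1e-2 * np.eye(n))    # add noise/jitter term
x_test = np.linspace(0.0, 100.0, 5)
K_star = spherical_kernel(x_test, x)
mean = K_star @ spsolve(K_sparse, y)

print(f"nonzero fraction of covariance matrix: {frac_nonzero:.3f}")
```

Because the zeros are exact rather than approximated away, no information is discarded; in the authors' approach the kernel additionally learns where such zero covariances occur, instead of having the cutoff fixed in advance as in this toy.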