Interoperability Lessons Migrating netCDF API Workflows from POSIX to Zarr Storage Formats

Presentation Date
Wednesday, December 13, 2023, 8:30am - 12:50pm
Location
MC - Poster Hall A-C - South
Abstract

The CF (Climate and Forecast) Conventions are a community-developed standard for self-describing Earth system science (and other) datasets in formats that adhere to the netCDF API. The netCDF library can read or write a variety of backend dataset formats (netCDF3, HDF5, DAP), now including Zarr. Zarr is an object storage format favored for its simplicity and scalability by cloud-based storage (CBS) vendors such as Amazon Web Services and Google. Existing netCDF clients can take advantage of Zarr simply by relinking to netCDF's NCZarr library. Here we describe the interoperability issues raised in migrating prototypical netCDF Operators (NCO) workflows from POSIX to Zarr storage.

NCO currently supports I/O on Zarr objects identified via the file:// scheme. Support for CBS via the s3:// scheme is planned for Fall 2023. NCO commands work as expected, independent of the back-end storage format. Operators can ingest and output netCDF3, netCDF4, and/or Zarr backend file formats. A new script, ncz2psx, combined with standard input/output techniques, is necessary to emulate the globbing and wildcard features used by multi-file operators. Compression, including quantization, works transparently, just as with netCDF4 files. The lack of NCZarr support for unlimited/record dimensions (often the temporal dimension) is perhaps the primary barrier to shifting netCDF API workflows to Zarr/CBS. In all other respects, the highly interoperable netCDF API facilitates shifting CF-compliant workflows from file-hierarchy (e.g., POSIX) to object-based (i.e., Zarr) storage, both locally and in cloud environments.
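As a rough illustration of why wildcard emulation is needed, the sketch below expands a shell-style pattern into file:// URLs and joins them into text that could be piped to a multi-file operator's standard input. It is not the ncz2psx script and does not reproduce its interface. The helper names (nczarr_url, expand_stores, stdin_payload) are hypothetical, and the sketch assumes that local Zarr stores are directories and that they are addressed with the `#mode=nczarr,file` URL fragment used by the netCDF library.

```python
import glob
import os
import tempfile
from pathlib import Path


def nczarr_url(store_path, storage="file"):
    """Build a file:// URL with an NCZarr mode fragment for a local store.

    Assumption: the store is addressed as file://<absolute path>#mode=nczarr,<storage>.
    """
    absolute = Path(store_path).resolve()
    return f"{absolute.as_uri()}#mode=nczarr,{storage}"


def expand_stores(pattern):
    """Expand a wildcard into a sorted list of URLs, keeping only directories.

    Assumption: each local Zarr store is a directory, so plain files are skipped.
    """
    matches = sorted(glob.glob(pattern))
    return [nczarr_url(p) for p in matches if os.path.isdir(p)]


def stdin_payload(urls):
    """Join URLs one per line, a form suitable for piping to standard input."""
    return "\n".join(urls) + "\n"


if __name__ == "__main__":
    # Demonstrate on throwaway directories that stand in for Zarr stores.
    with tempfile.TemporaryDirectory() as root:
        for name in ("jan.zarr", "feb.zarr"):
            os.mkdir(os.path.join(root, name))
        print(stdin_payload(expand_stores(os.path.join(root, "*.zarr"))), end="")
```

The expansion happens in Python rather than in the shell because a URL with a scheme and a fragment is not a path the shell can glob directly; the pattern is matched against the underlying directories first, and the URL decoration is added afterwards.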
