The Department of Energy (DOE) Atmospheric Radiation Measurement (ARM) program has established multiple ground-based observation networks around the globe over the past few decades, which continuously provide comprehensive measurements of the atmospheric state, radiation, clouds, precipitation, and aerosols. This study introduces the recently upgraded Python-based ARM data-oriented metrics and diagnostics package version 3 (ARM-DIAGS-V3), which aims to facilitate the use of long-term, high-frequency ARM measurements in evaluating climate model simulations. The diagnostics package leverages these comprehensive datasets at multiple ARM sites, including the Southern Great Plains (SGP), the Eastern North Atlantic (ENA), the North Slope of Alaska (NSA), the Tropical Western Pacific (TWP), and the mobile facility at Amazonas, Brazil (MAO). Because these observation networks span various climatic regions and different underlying surfaces, they provide an invaluable opportunity to evaluate climate model performance in different climate regimes. In this study, we use ARM-DIAGS-V3 to evaluate the performance of selected global climate models (GCMs) that participated in phases 5 and 6 of the Coupled Model Intercomparison Project (CMIP5 and CMIP6). In particular, we focus on the model representations of surface temperature, precipitation, aerosol, cloud, and radiation climatologies, and on any improvement from CMIP5 to CMIP6 relative to the observations. Process-oriented diagnostics are applied to assess the representation of cloud-radiation kernels and, in turn, cloud-radiation feedbacks in individual models. The results could serve as a benchmark for further studies by individual modelling groups, and the diagnostics package applied here can serve as a practical toolkit for comparing models with ARM observations.