Earth System Models (ESMs) included in the Coupled Model Intercomparison Project (CMIP) are considered sophisticated tools for projecting the impacts of future climate on important hydroclimatic variables and Earth system processes. However, little is known about how they perform against observations across standard hydrological metrics. This gap hampers our ability to judge their actual utility for simulations under a changing climate, particularly in high-latitude environments, where Arctic amplification makes such projections especially important. We assess the performance of simulated Arctic runoff from eleven CMIP6 models, routed to river channels with a physically based river routing model, the Model for Scale Adaptive River Transport (MOSART). Model skill was evaluated with metrics representing total volume, variability, seasonality, extreme events, and overall flow distributions, computed over multiple timescales (e.g., daily, monthly, and annual) across the pan-Arctic. Simulated streamflow is compared with observations from 611 gages in medium-to-large river basins (>10,000 km²), because the coarse resolution of ESMs precludes comparison for smaller basins. Our results indicate that although direct one-to-one comparisons between ESMs and observations usually show poor performance, particularly at the daily timescale, the ESMs demonstrate some predictive skill at coarser timesteps or when techniques such as statistical averaging and best-fit model selection are applied. We also identify spatial structure in model performance across the different metrics. This work is anticipated to help identify the most appropriate applications of ESM streamflow for understanding how Arctic hydrology will change under a future climate.