Using historical simulations from phase 5 of the Coupled Model Intercomparison Project (CMIP5) and multiple observationally based datasets, we employ skill metrics to analyze the fidelity of the simulated Northern Annular Mode, North Atlantic Oscillation, Pacific-North American pattern, Southern Annular Mode, Pacific Decadal Oscillation, North Pacific Oscillation, and North Pacific Gyre Oscillation. We assess the benefits of a unified approach to evaluating these modes of variability, which we call the common basis function (CBF) approach, based on projecting model anomalies onto observed empirical orthogonal functions (EOFs). The CBF approach circumvents issues with conventional EOF analysis, eliminating, for example, the need to correct for the arbitrarily assigned, and often inconsistent, signs of the EOFs/PCs being compared. It also avoids the problem that the first observed EOF is sometimes more similar to a higher-order model EOF, particularly when the simulated EOFs are not well separated. Compared with conventional EOF analysis of models, the CBF approach indicates that models agree significantly better with observations, in terms of pattern correlation and root-mean-square error (RMSE), than previously suggested. In many cases, models do a credible job of capturing the observationally based estimates of the patterns; however, errors in the simulated amplitudes can be large and more egregious than the pattern errors. In the context of the broad distribution of errors in the CMIP5 ensemble, sensitivity tests demonstrate that our results are relatively insensitive to methodological choices (CBF vs. the conventional approach), observational uncertainty in the patterns (as determined by using multiple datasets), and internal variability (assessed by comparing multiple realizations of the same model). The skill metrics proposed in this study provide a useful summary of the ability of models to reproduce the observed EOF patterns and amplitudes, and they can be used as a tool to objectively highlight where model improvements might be made. We advocate more systematic and objective testing of simulated extratropical variability, especially during the non-dominant seasons of each mode, when many models perform relatively poorly.
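To make the projection step concrete, the following is a minimal sketch of the CBF idea as summarized above: compute the leading observed EOF, project model anomalies onto it to obtain a CBF principal-component time series, regress the model field onto that standardized series to get a comparable model pattern, and score it with pattern correlation and RMSE. Function names, the synthetic data, and the scaling conventions are illustrative assumptions, not the authors' implementation; area weighting and domain selection are omitted for brevity.

```python
# Illustrative sketch of the CBF approach (not the authors' code).
# Anomaly fields are assumed to have shape (time, space) with the
# climatology already removed.
import numpy as np

def leading_eof(anom):
    """Leading EOF (space,) and standardized PC (time,) from an anomaly
    matrix of shape (time, space); the EOF carries the amplitude."""
    u, s, vt = np.linalg.svd(anom, full_matrices=False)
    pc = u[:, 0] * s[0]            # raw principal component time series
    eof = vt[0]                    # unit-norm spatial pattern
    std = pc.std(ddof=1)
    return eof * std, pc / std     # amplitude-carrying EOF, unit-variance PC

def cbf_pattern(model_anom, obs_eof):
    """Project model anomalies onto the observed EOF (the 'common basis'),
    then regress the model field onto the standardized projection to obtain
    a model pattern directly comparable to the observed EOF."""
    proj = model_anom @ obs_eof / (obs_eof @ obs_eof)      # CBF PC series
    proj_std = (proj - proj.mean()) / proj.std(ddof=1)
    pattern = (proj_std @ model_anom) / (len(proj_std) - 1)  # regression map
    return pattern, proj

def skill_metrics(obs_pat, mod_pat):
    """Centered pattern correlation and RMSE between two spatial patterns."""
    o = obs_pat - obs_pat.mean()
    m = mod_pat - mod_pat.mean()
    corr = (o @ m) / np.sqrt((o @ o) * (m @ m))
    rmse = np.sqrt(np.mean((obs_pat - mod_pat) ** 2))
    return corr, rmse

# Example with synthetic anomalies (time x space):
rng = np.random.default_rng(0)
obs_anom = rng.standard_normal((60, 500))
mod_anom = rng.standard_normal((60, 500))
obs_eof, _ = leading_eof(obs_anom)
mod_pat, _ = cbf_pattern(mod_anom, obs_eof)
corr, rmse = skill_metrics(obs_eof, mod_pat)
```

Because both the observed EOF and the model CBF pattern are expressed as regression maps onto unit-variance time series, their amplitudes are in the same physical units, so amplitude errors (e.g., the ratio of their spatial standard deviations) can be assessed alongside the pattern correlation and RMSE.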