Numerical building models are typically forced with weather data from a limited number of “representative cities” or weather stations, each standing in for a different climate region. Using representative weather stations reduces computational cost, but it often fails to capture spatial heterogeneity in weather that may be important for simulations aimed at understanding how building stocks respond to a changing climate. We quantify the potential reduction in temperature and load biases from using an increasing number of weather stations over the western U.S. Our novel approach derives temperature and load time series from incrementally more weather stations, ranging from 8 to roughly 150, and evaluates how well each set captures weather patterns across different seasons. Using 8 stations across the western U.S., one from each IECC climate zone, results in a mean absolute summertime temperature bias of ∼4.0 °C with respect to a high-resolution gridded dataset. The mean absolute bias drops to ∼1.5 °C when all available weather stations are used. Temperature biases of this magnitude could translate to absolute summertime mean simulated load biases as high as 13.5%. Increasing the size of the domain over which biases are calculated reduces their magnitude, because positive and negative biases may cancel out. Using 8 representative weather stations can lead to a 20–40% bias in peak building loads during both summer and winter, a significant error for capacity expansion planners who may use these types of simulations. Using weather stations close to population centers reduces both mean and peak load biases. Others designing aggregate building simulations could use this approach to understand how sensitive their results are to the choice of weather stations used to drive the models.
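The bias calculation described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes each grid cell takes the temperature of its nearest station, a mapping the abstract does not specify, and it uses planar distance and a handful of synthetic values invented for illustration. The names `nearest_station` and `bias_summary` are hypothetical.

```python
import math

def nearest_station(cell, stations):
    """Return the station closest to a grid cell.

    Assumption: planar distance on arbitrary coordinates, for illustration only.
    """
    return min(
        stations,
        key=lambda s: math.hypot(s["x"] - cell["x"], s["y"] - cell["y"]),
    )

def bias_summary(cells, stations):
    """Return (mean absolute bias, mean signed bias) of station vs. gridded temperature."""
    # Bias per cell: temperature of the assigned station minus the gridded reference.
    biases = [nearest_station(c, stations)["temp"] - c["temp"] for c in cells]
    mean_abs = sum(abs(b) for b in biases) / len(biases)
    # The signed mean lets positive and negative biases cancel over the domain.
    mean_signed = sum(biases) / len(biases)
    return mean_abs, mean_signed

# Synthetic reference grid (values are made up, in degrees C).
cells = [
    {"x": 0.0, "y": 0.0, "temp": 31.0},
    {"x": 1.0, "y": 0.0, "temp": 27.0},
    {"x": 2.0, "y": 0.0, "temp": 22.0},
    {"x": 3.0, "y": 0.0, "temp": 18.0},
]

# Synthetic stations, listed in the order they are added.
stations = [
    {"x": 1.5, "y": 0.0, "temp": 24.0},
    {"x": 0.0, "y": 0.0, "temp": 30.5},
    {"x": 3.0, "y": 0.0, "temp": 18.5},
]

# Recompute the biases while incrementally adding stations.
for n in range(1, len(stations) + 1):
    mean_abs, mean_signed = bias_summary(cells, stations[:n])
    print(f"{n} station(s): mean |bias| = {mean_abs:.3f}, mean signed bias = {mean_signed:.3f}")
```

With these synthetic inputs, the mean absolute bias shrinks as stations are added, and the signed mean over the whole domain is smaller in magnitude than the mean absolute bias, which mirrors the cancellation effect noted for larger averaging domains.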