This study presents results from PAMIP time-slice experiments designed to isolate the atmospheric response to Arctic sea ice loss at a global warming level of +2°C. Using two different AGCMs, we have increased the ensemble size to up to 300 members, well beyond the recommended 100. By partitioning the ensemble into subsets of 100 members, we explore the reproducibility of the results, focusing on the response of the mid-latitude jet streams in the North Atlantic and North Pacific. We find substantial differences in the mid-latitude response among the experiment subsets, suggesting that 100-member ensembles are still influenced by internal variability, which can lead to misleading conclusions. This lack of consistency is found even for responses that are statistically significant according to Student's t and False Discovery Rate tests. This is problematic for multi-model assessments of the response, since differences may be attributed to model differences when they in fact arise from internal variability. We propose a method to overcome this inconsistency that allows for more robust conclusions when only 100 ensemble members are available.
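The subset comparison described above can be illustrated with a minimal sketch. It is not the authors' code and does not implement their proposed method. It uses purely synthetic Gaussian data in place of model output. The function names (`welch_p_values`, `fdr_mask`, `subset_responses`), the grid size, the imposed signal and the FDR level `alpha_fdr = 0.1` are all assumptions made for illustration. Two further simplifications are assumed: the p-values use a normal approximation to the Student's t distribution, which is close for roughly 200 degrees of freedom, and the FDR control uses the Benjamini-Hochberg procedure, which the text does not specify.

```python
import math
import numpy as np


def welch_p_values(future, present):
    """Ensemble-mean response and two-sided p-values at each grid point.

    Inputs have shape (n_members, ...). The p-values use a normal
    approximation to the Student's t distribution (an assumption of this
    sketch, adequate for large ensembles).
    """
    n_f, n_p = future.shape[0], present.shape[0]
    diff = future.mean(axis=0) - present.mean(axis=0)
    se = np.sqrt(future.var(axis=0, ddof=1) / n_f
                 + present.var(axis=0, ddof=1) / n_p)
    z = np.abs(diff / se) / math.sqrt(2.0)
    p = np.array([math.erfc(v) for v in z.ravel()]).reshape(z.shape)
    return diff, p


def fdr_mask(p, alpha_fdr=0.1):
    """Benjamini-Hochberg mask: True where the local test is rejected."""
    sorted_p = np.sort(p.ravel())
    m = sorted_p.size
    thresholds = alpha_fdr * np.arange(1, m + 1) / m
    below = sorted_p <= thresholds
    if not below.any():
        return np.zeros(p.shape, dtype=bool)
    # Largest sorted p-value that still lies under its threshold line.
    p_crit = sorted_p[np.nonzero(below)[0].max()]
    return p <= p_crit


def subset_responses(future, present, subset_size=100, alpha_fdr=0.1):
    """Split the ensemble into consecutive, non-overlapping subsets and
    return (response, significance mask) for each subset."""
    results = []
    n_members = future.shape[0]
    for start in range(0, n_members - subset_size + 1, subset_size):
        members = slice(start, start + subset_size)
        diff, p = welch_p_values(future[members], present[members])
        results.append((diff, fdr_mask(p, alpha_fdr)))
    return results


# Synthetic stand-in for model output: 300 members, 40 grid points.
rng = np.random.default_rng(0)
n_members, n_points = 300, 40
true_response = np.zeros(n_points)
true_response[:10] = 0.3  # hypothetical forced signal, arbitrary units

present = rng.normal(0.0, 1.0, size=(n_members, n_points))
future = rng.normal(true_response, 1.0, size=(n_members, n_points))

results = subset_responses(future, present)
for i, (diff, mask) in enumerate(results):
    print(f"subset {i}: {int(mask.sum())} significant points, "
          f"mean response {diff.mean():+.3f}")
```

Comparing the significance masks and response patterns across the three subsets gives a rough sense of how much a 100-member estimate can vary from internal variability alone, even where the local tests pass.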