The ability of eleven models to simulate the aerosol vertical distribution from regional to global scales is assessed as part of the second phase of the AeroCom model inter-comparison initiative (AeroCom II) and compared with the results of the first phase. The evaluation is performed using a global monthly gridded dataset of aerosol extinction profiles built for this purpose from the CALIOP (Cloud-Aerosol Lidar with Orthogonal Polarization) Layer Product 3.01. Results over 12 sub-continental regions show that, relative to Phase I, five models improved whereas three degraded in reproducing the inter-regional variability in Zα 0-6 km, the mean extinction height diagnostic, as computed from the CALIOP aerosol profiles over the 0-6 km altitude range for each studied region and season. While the models' performance remains highly variable, the simulation of the timing of the Zα 0-6 km peak season has also improved for all but two models from AeroCom Phase I to Phase II. The biases in Zα 0-6 km are smaller in all regions except the Central Atlantic, East Asia, and North and South Africa. Most of the models now underestimate Zα 0-6 km over land, notably in the dust and biomass burning regions of Asia and Africa. At the global scale, the AeroCom II models reproduce the latitudinal variability of Zα 0-6 km better over ocean than over land. Hypotheses for the performance and evolution of the individual models and for the inter-model diversity are discussed. We also provide an analysis of the CALIOP limitations and uncertainties that contribute to the differences between the simulations and the observations.
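
To make the Zα 0-6 km diagnostic concrete, the sketch below computes a mean extinction height from a single extinction profile. The abstract does not give the formula, so this assumes the diagnostic is the extinction-weighted mean altitude over the 0-6 km range, i.e. the sum of extinction times layer height divided by the sum of extinction. The function name `mean_extinction_height`, its parameters, and the synthetic exponential profile are hypothetical and chosen only for illustration.

```python
import numpy as np


def mean_extinction_height(extinction, altitude_km, z_max_km=6.0):
    """Extinction-weighted mean altitude (km) over [0, z_max_km].

    Assumed definition (not stated in the abstract):
        Z = sum(ext_i * z_i) / sum(ext_i)
    taken over the layers whose altitude lies in the range.
    """
    ext = np.asarray(extinction, dtype=float)
    z = np.asarray(altitude_km, dtype=float)
    # Keep only layers inside the altitude range with a finite extinction value.
    mask = (z >= 0.0) & (z <= z_max_km) & np.isfinite(ext)
    total = ext[mask].sum()
    if total <= 0.0:
        # No usable aerosol signal in the range: the diagnostic is undefined.
        return float("nan")
    return float((ext[mask] * z[mask]).sum() / total)


# Hypothetical profile: extinction decaying exponentially with a 2 km scale height,
# sampled at layer mid-points every 100 m up to 10 km.
z = np.arange(0.05, 10.0, 0.1)
ext = 0.1 * np.exp(-z / 2.0)
z_alpha = mean_extinction_height(ext, z)
```

Under this assumed definition, a profile with more aerosol aloft yields a larger value, so a model that underestimates the diagnostic places too much of its extinction near the surface relative to the observations.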