Non-Parametric Projections of National Income Distribution Consistent with the Shared Socioeconomic Pathways
Projections of income distribution are of growing importance for a variety of purposes. Future consumption depends strongly on the distribution of income because the demand for most commodities is non-linear in income. Projections of income distribution are also important for representing future vulnerability to social, economic, and environmental stressors. Several global, country-level projections of income distribution are available, but most either project only the Gini coefficient (a summary statistic of the distribution) or combine the Gini coefficient with an assumed functional form for the distribution. The lognormal distribution is the most common assumption for that functional form, regardless of the source of the Gini projection. However, it has a documented limitation: observed incomes are known to deviate from the lognormal in the tails of the distribution. Because the Gini coefficient is more representative of the middle portion of the income distribution, specifying a lognormal distribution from a Gini coefficient alone introduces further error in representing the data.
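To make the lognormal assumption concrete, the following is a minimal sketch of how decile income shares can be derived from a Gini coefficient alone. It relies on two standard properties of the lognormal distribution: the Gini coefficient G and the shape parameter sigma are related by G = 2*Phi(sigma/sqrt(2)) - 1, and the Lorenz curve is L(p) = Phi(Phi^{-1}(p) - sigma), where Phi is the standard normal CDF. The function name `lognormal_decile_shares` and the example Gini value of 0.4 are illustrative assumptions and are not taken from the paper.

```python
from statistics import NormalDist


def lognormal_decile_shares(gini):
    """Income shares of the ten deciles implied by a lognormal distribution.

    Illustrative helper (hypothetical name), assuming incomes are exactly
    lognormal and that the Gini coefficient is the only input.
    """
    if not 0.0 < gini < 1.0:
        raise ValueError("gini must lie strictly between 0 and 1")

    std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

    # Invert G = 2*Phi(sigma/sqrt(2)) - 1 to recover the shape parameter.
    sigma = 2 ** 0.5 * std_normal.inv_cdf((1.0 + gini) / 2.0)

    def lorenz(p):
        # Cumulative income share held by the poorest fraction p.
        if p <= 0.0:
            return 0.0
        if p >= 1.0:
            return 1.0
        return std_normal.cdf(std_normal.inv_cdf(p) - sigma)

    # Each decile's share is the increment of the Lorenz curve over 10% steps.
    return [lorenz((k + 1) / 10) - lorenz(k / 10) for k in range(10)]


if __name__ == "__main__":
    for decile, share in enumerate(lognormal_decile_shares(0.4), start=1):
        print(f"decile {decile:2d}: {share:.4f}")
```

Because a single parameter fixes the whole curve, every decile share in this sketch is determined by the Gini coefficient, which is why any mismatch in the tails cannot be corrected within the lognormal form.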
In this paper, we develop an alternative model for representing income distribution data based on principal component analysis (PCA) and test it, along with the lognormal assumption, against a consistent dataset of income distributions. We then combine this PCA-based model with the projected Gini coefficients from Rao et al. (2018) to produce a new set of global, country-specific income distributions consistent with the Shared Socioeconomic Pathways (SSPs). The PCA-based model provides a better fit to data on household income distributions and thus improves upon projections from existing models, especially the lognormal-based model.
The lognormal functional form almost universally underestimates the income shares of the higher income deciles. In contrast, the PCA-based model yields higher levels of inequality within a region than the lognormal functional form for the same level of regional income. While the differences in projected income shares between the models are most prominent for the upper deciles, absolute income levels for the lower deciles also change substantially (by as much as +100%). Using these income distributions may therefore significantly affect modeled consumer demand for goods and services for all consumers, especially low-income consumers, and will also affect analyses of future poverty levels and their implications.
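The contrast between small share differences and large relative income changes in the lower deciles follows from simple arithmetic, sketched below. Each decile contains 10% of the population, so its mean income is its income share times ten times the overall mean income. With mean income held fixed, the relative change in a decile's income equals the relative change in its share. The function names, the share values, and the mean income of 5000 are hypothetical numbers chosen for illustration, not results from the paper.

```python
def decile_mean_income(share, mean_income):
    """Mean income of people in a decile that holds `share` of total income.

    A decile contains 10% of the population, hence the factor of 10.
    """
    return share * 10 * mean_income


def relative_change(baseline_share, alternative_share):
    """Relative change in decile income when only the share differs."""
    return (alternative_share - baseline_share) / baseline_share


if __name__ == "__main__":
    mean_income = 5000.0  # hypothetical per-capita income, arbitrary units

    # Hypothetical shares under two models: (baseline, alternative).
    examples = {
        "bottom decile": (0.01, 0.02),
        "top decile": (0.30, 0.35),
    }

    for label, (base, alt) in examples.items():
        base_income = decile_mean_income(base, mean_income)
        alt_income = decile_mean_income(alt, mean_income)
        change = relative_change(base, alt)
        print(f"{label}: {base_income:.0f} -> {alt_income:.0f} ({change:+.0%})")
```

In this hypothetical case the bottom decile's share moves by only one percentage point, yet its absolute income doubles, while a five-point shift in the top decile's share is a much smaller relative change.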