统计与信息论坛 (Journal of Statistics and Information)

2025, Vol. 40, No. 12, pp. 18-30

Sequential Merging of Multi-Source Income Data and Re-measurement of Inequality

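Foundation: General Project of the National Social Science Fund of China, "Theory, Measurement Methods, and Governance System of Shared Development Promoting Common Prosperity" (22BTJ036); Key Project of the Beijing Social Science Fund Planning Program, "Shared Development Promoting the Quality Improvement and Expansion of Beijing's Middle-Income Group" (23JJA004)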


DOI: 10.20207/j.cnki.1007-3116.2025.0051


Abstract (translated from the Chinese original):

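This paper uses executive compensation data from listed companies and Hurun Rich List data to estimate the income ranges that household surveys cover inadequately, extending the conventional merging of income data to an endogenous sequential merging of multi-source income data. To assess the merge, the EM algorithm is used to estimate a finite mixture distribution, and a fitted-area error ratio is used to evaluate the multi-source income data generated from endogenous and from exogenous merging points. The merged dataset is then used to measure inequality indicators, improving the precision and breadth of statistical surveys of high-income groups. The results show that supplementing household survey data with information on high-income and ultra-high-income groups at endogenous merging points corrects the small-sample problem that prevents household surveys from describing the tail of the income distribution accurately. The dataset obtained by sequentially merging household survey data with high-income data at endogenous merging points has a lower error ratio than the dataset obtained by merging household survey data with Hurun Rich List income data at an exogenous merging point. It reduces the underestimation of income gaps, reflects the income distribution of residents more accurately, and provides data support for regulating the order of income distribution on the path to common prosperity.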

Keywords: income distribution; sequential merging; high-income groups; ultra-high-income groups; multi-source data

Abstract:

This paper proposes a novel approach to measuring income inequality in China by constructing a multi-source income dataset that captures the full income distribution, particularly the upper tail, more effectively. The key methodological innovation is the sequential merging of contemporaneous income data from three sources (household surveys, executive compensation records, and billionaire wealth rankings), which allows the merging thresholds between sources to be identified endogenously. This strategy addresses the long-standing underrepresentation of high-income individuals in household survey data and improves the accuracy and comprehensiveness of inequality measurement. The merging procedure identifies two internal thresholds using representative ratios and their cumulative counterparts. These thresholds determine the income levels at which the original household data are systematically reweighted to account for high-income individuals. Household income data are drawn from two major nationally representative surveys, the China Family Panel Studies (CFPS, 2016-2020) and the Chinese General Social Survey (CGSS, 2015-2021), which reliably capture the income patterns of low- and middle-income groups. For the high-income group, income data on listed-company executives are used; for the ultra-high-income group, income data from the Hurun Rich List are adopted. Empirical results show that in 2015 the first merging point, between survey data and executive compensation data, corresponded to the 81st income percentile (RMB 62,500), rising to RMB 111,200 in 2021. The second merging point, between executive compensation and billionaire income, corresponded to the 91st percentile (RMB 1.21 million) in 2015 and to RMB 1.41 million in 2021. Lorenz curves are used to visualize the income distribution before and after data integration. After correction, the upper tail of the distribution shows a noticeable "bulge", suggesting a higher and more accurately measured concentration of income among top earners. This pattern holds across years, underscoring the systematic underestimation of inequality in raw household survey data. These findings indicate that high-income undercoverage in surveys persists and intensifies over time. To evaluate the effect of the merging procedure, the paper uses an Expectation-Maximization (EM) algorithm to estimate a flexible mixture distribution with three components: normal, normal, and Pareto. Among several candidate models, this combination yields the best fit, as it minimizes the distribution error ratio, a metric for the goodness of fit between the empirical and the estimated income distributions. The endogenous merging approach outperforms the exogenous one in capturing the distributional properties of the multi-source data, offering a stronger empirical foundation for future studies. Further, the merged dataset not only recovers a more realistic income distribution but also supports more accurate estimation of inequality metrics such as the K coefficient and top income shares. In summary, this paper offers a data-enhanced and methodologically rigorous approach to understanding income inequality in China. By sequentially merging diverse income data sources through a theoretically grounded and statistically validated process, the study provides a valuable tool for policymakers seeking to monitor and adjust excessively high incomes in pursuit of common prosperity.

Keywords: income distribution; sequential merging; high-income groups; ultra-high-income groups; multi-source data
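The abstract states that an EM algorithm is used to estimate a finite mixture distribution for the merged income data. As a rough illustration of the E-step and M-step only, the sketch below fits a two-component normal mixture to synthetic log incomes. It is a simplification under stated assumptions, not the authors' procedure: the paper's preferred specification is normal-normal-Pareto, the Pareto component is omitted here, and the function name `em_two_normals`, the quartile initialization, and the simulated data are all hypothetical.

```python
import numpy as np

def em_two_normals(x, n_iter=200):
    """EM for a two-component normal mixture (illustrative simplification)."""
    x = np.asarray(x, dtype=float)
    # Assumed initialisation: quartiles as starting means, pooled std, equal weights.
    mu = np.quantile(x, [0.25, 0.75])
    sigma = np.array([x.std(), x.std()])
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: posterior probability that each observation belongs to each component.
        dens = np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
        resp = pi * dens
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: update mixing weights, means, and standard deviations.
        nk = resp.sum(axis=0)
        pi = nk / x.size
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
    return pi, mu, sigma

# Synthetic log incomes: two well-separated groups (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
log_income = np.concatenate((rng.normal(10.0, 0.5, 1000), rng.normal(13.0, 0.5, 1000)))
pi_hat, mu_hat, sigma_hat = em_two_normals(log_income)
print("weights:", pi_hat, "means:", mu_hat, "sds:", sigma_hat)
```

A heavy-tailed component for the top group would need its own density in the E-step and its own parameter update in the M-step; that extension is not attempted here.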

For the full text, visit cnki.net


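(1) For example, CHIP, CHNS, CGSS, CFPS, and CHFS.

(2) For example, household survey data give a fairly complete picture of low- and middle-income groups, while as j increases, the external data sources introduced focus progressively on higher income levels. Executive compensation data mainly cover the high-income group, and Hurun Rich List data further represent the ultra-high-income population.

(3) For example, survey data usually cover different income groups with systematic bias. High-income groups tend to be underrepresented in surveys because of privacy concerns, tax avoidance, or refusal to participate. Conversely, some surveys may tend to cover more low-income respondents, so that low-income groups are overrepresented.

(4) This setting rests on an assumption about the representativeness of low- and middle-income data: the sampling bias of household survey data within that income range is small, so no calibration against external data is needed there, which keeps the merging procedure's intervention in the existing data structure to a minimum.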
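The idea in note (4), leaving survey records below the merging point untouched and drawing on external data only above it, can be sketched in a few lines. The example below is a minimal sketch under loud assumptions, not the paper's reweighting scheme: the threshold is taken as given rather than identified endogenously, the rule that external weights are rescaled to the survey's population mass above the threshold is a hypothetical simplification, and the function names and toy incomes (in thousand RMB) are invented for illustration. It then compares a weighted Gini coefficient before and after the merge.

```python
import numpy as np

def weighted_gini(income, weight):
    """Gini coefficient from the weighted Lorenz curve (trapezoid rule)."""
    income = np.asarray(income, dtype=float)
    weight = np.asarray(weight, dtype=float)
    order = np.argsort(income)
    x, w = income[order], weight[order]
    pop = np.concatenate(([0.0], np.cumsum(w) / w.sum()))            # cumulative population share
    inc = np.concatenate(([0.0], np.cumsum(x * w) / (x * w).sum()))  # cumulative income share
    area = np.sum(np.diff(pop) * (inc[1:] + inc[:-1]) / 2.0)         # area under the Lorenz curve
    return 1.0 - 2.0 * area

def merge_at_threshold(s_inc, s_w, e_inc, e_w, threshold):
    """Keep survey records below `threshold` as they are; above it, use external records.

    Assumption (not from the paper): external weights are rescaled so that they carry
    the same population mass the survey assigns to incomes at or above the threshold.
    """
    s_inc, s_w = np.asarray(s_inc, dtype=float), np.asarray(s_w, dtype=float)
    e_inc, e_w = np.asarray(e_inc, dtype=float), np.asarray(e_w, dtype=float)
    low = s_inc < threshold
    high = e_inc >= threshold
    if not high.any() or low.all():
        raise ValueError("need survey and external records at or above the threshold")
    tail_mass = s_w[~low].sum()
    scaled = e_w[high] * tail_mass / e_w[high].sum()
    return np.concatenate((s_inc[low], e_inc[high])), np.concatenate((s_w[low], scaled))

# Hypothetical toy data, in thousand RMB.
survey_inc = np.array([20, 30, 40, 50, 60, 80, 100, 150], dtype=float)
survey_w = np.ones_like(survey_inc)
external_inc = np.array([120, 300, 1500], dtype=float)
external_w = np.array([5.0, 2.0, 1.0])
threshold = 100.0

merged_inc, merged_w = merge_at_threshold(survey_inc, survey_w, external_inc, external_w, threshold)
gini_before = weighted_gini(survey_inc, survey_w)
gini_after = weighted_gini(merged_inc, merged_w)
print("Gini before:", gini_before, "Gini after:", gini_after)
```

In this toy case the total population weight is unchanged by the merge, so any change in the Gini coefficient comes from the heavier upper tail rather than from a change in population size.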

Basic information:

Chinese Library Classification (CLC) number: F124.7; F222

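Citation:

[1] RUAN Jing, LIU Ruiqi. Sequential merging of multi-source income data and re-measurement of inequality[J]. 统计与信息论坛 (Journal of Statistics and Information), 2025, 40(12): 18-30. DOI: 10.20207/j.cnki.1007-3116.2025.0051.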

