2011, No. 03, Vol. 26 (Serial No. 126), pp. 32-38
A Review of Research on Random Forest Methods
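Foundation:
Fundamental Research Funds for the Central Universities, "Research on Data Quality Management Based on Data Mining" (2010221040);
Key Project of the National Bureau of Statistics, "Statistical Methods in Financial Risk" (2009LZ045)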
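Author affiliations:
Department of Planning and Statistics, School of Economics, Xiamen University; Data Mining Research Center, Xiamen University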
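Abstract:
Random forest (RF) is a statistical learning method. It uses bootstrap resampling to draw multiple samples from the original sample, fits a decision tree to each bootstrap sample, and then combines the predictions of the trees, obtaining the final prediction by voting. It achieves high prediction accuracy, tolerates outliers and noise well, and is not prone to overfitting, and it is widely applied in medicine, bioinformatics, management science, and other fields. This paper therefore introduces the principles of random forests and their related properties, and discusses recent developments and several important application areas.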
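The procedure summarized in the abstract (bootstrap resampling, one tree per bootstrap sample, majority vote) can be illustrated with a minimal sketch. This is not the authors' implementation: for brevity each "tree" here is a depth-1 stump fitted on a random subset of features, whereas a full random forest grows deep trees and re-draws the feature subset at every node. All function names, parameters, and the toy data are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def majority(labels):
    """Return the most common label in a list."""
    return Counter(labels).most_common(1)[0][0]


def bootstrap_sample(X, y, rng):
    """Draw len(X) rows with replacement (one bootstrap sample)."""
    idx = [rng.randrange(len(X)) for _ in range(len(X))]
    return [X[i] for i in idx], [y[i] for i in idx]


def fit_stump(X, y, rng, n_features):
    """Fit a depth-1 tree using a random subset of the features.

    Simplification (assumption of this sketch): a stump has a single split,
    chosen to minimize the number of misclassified training rows.
    Returns (feature index, threshold, left label, right label).
    """
    features = rng.sample(range(len(X[0])), n_features)
    best = None  # (errors, feature, threshold, left label, right label)
    for f in features:
        values = sorted({row[f] for row in X})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            l_lab, r_lab = majority(left), majority(right)
            errors = sum(lab != l_lab for lab in left)
            errors += sum(lab != r_lab for lab in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:  # no split possible: fall back to a constant predictor
        lab = majority(y)
        return (features[0], float("inf"), lab, lab)
    return best[1:]


def predict_stump(stump, row):
    f, t, l_lab, r_lab = stump
    return l_lab if row[f] <= t else r_lab


def fit_forest(X, y, n_trees=25, n_features=1, seed=0):
    """Fit one stump per bootstrap sample."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        Xb, yb = bootstrap_sample(X, y, rng)
        forest.append(fit_stump(Xb, yb, rng, n_features))
    return forest


def predict_forest(forest, row):
    """Combine the individual predictions by majority vote."""
    return majority([predict_stump(s, row) for s in forest])


# Toy data (hypothetical): two well-separated classes in two features.
X = [[1.0, 1.0], [2.0, 2.0], [3.0, 1.5], [8.0, 9.0], [9.0, 8.0], [10.0, 10.0]]
y = ["a", "a", "a", "b", "b", "b"]

forest = fit_forest(X, y, n_trees=25, n_features=1, seed=0)
print(predict_forest(forest, [2.5, 1.8]))
print(predict_forest(forest, [8.5, 9.5]))
```

Because every stump sees a different bootstrap sample and a different feature, individual stumps can disagree; the vote in `predict_forest` is what stabilizes the final prediction.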
Keywords:
random forests; quantile regression forests; survival regression forests; applications
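Basic information:
CLC number: O212.2
Citation:
[1] Fang Kuangnan, Wu Jianbin, Zhu Jianping, et al. A Review of Research on Random Forest Methods [J]. Statistics & Information Forum (统计与信息论坛), 2011, 26(03): 32-38.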