Downloads | Citations | Reads |
39,782 | 1,947 | 200 |
Abstract: Random forests (RF) is a statistical learning method. It uses bootstrap resampling to draw multiple samples from the original sample, grows a decision tree on each bootstrap sample, and then combines the predictions of the trees by majority vote to produce the final prediction. It has high prediction accuracy, tolerates outliers and noise well, and, by the law of large numbers, is not prone to overfitting. It is therefore widely applied in medicine, bioinformatics, economics, management science, and other fields. This paper introduces the principle of random forests and their main properties, reviews the latest developments, discusses some important application areas, particularly in economics, and closes with a summary.
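The procedure summarized in the abstract (bootstrap resampling, one tree per bootstrap sample, majority vote) can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: each tree is reduced to a single split (a stump) to keep the code short, each stump looks at a randomly chosen feature, and the names `bootstrap_sample`, `fit_stump`, `fit_forest`, `predict_forest` and the toy data `ROWS` are hypothetical.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement: one bootstrap sample."""
    return [rng.choice(rows) for _ in rows]


def majority(labels):
    """Return (most common label, its count)."""
    return Counter(labels).most_common(1)[0]


def fit_stump(rows, rng, n_features_tried=1):
    """Fit a one-split tree on a random subset of the features.

    rows is a list of (features, label) pairs. Returns a tuple
    (feature index, threshold, label if <= threshold, label if > threshold).
    """
    n_features = len(rows[0][0])
    best = None
    for f in rng.sample(range(n_features), n_features_tried):
        for x, _ in rows:
            t = x[f]
            left = [y for xs, y in rows if xs[f] <= t]
            right = [y for xs, y in rows if xs[f] > t]
            if not right:  # threshold at the maximum: nothing is split off
                continue
            (l_lab, l_cnt), (r_lab, r_cnt) = majority(left), majority(right)
            errors = len(rows) - l_cnt - r_cnt  # misclassified training rows
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:  # tried feature is constant: fall back to one label
        label = majority([y for _, y in rows])[0]
        return (0, float("inf"), label, label)
    return best[1:]


def predict_stump(stump, x):
    f, t, left_label, right_label = stump
    return left_label if x[f] <= t else right_label


def fit_forest(rows, n_trees=25, seed=0):
    """Grow one stump per bootstrap sample of the original rows."""
    rng = random.Random(seed)
    return [fit_stump(bootstrap_sample(rows, rng), rng) for _ in range(n_trees)]


def predict_forest(forest, x):
    """Combine the individual predictions by majority vote."""
    return majority([predict_stump(s, x) for s in forest])[0]


# Toy data (assumed for illustration): two well-separated classes.
ROWS = [
    ((0.0, 0.1), 0), ((0.2, 0.0), 0), ((0.1, 0.3), 0),
    ((5.0, 5.1), 1), ((5.2, 4.9), 1), ((4.8, 5.3), 1),
]
forest = fit_forest(ROWS, n_trees=25, seed=0)
print(predict_forest(forest, (-1.0, -1.0)), predict_forest(forest, (6.0, 6.0)))
```

A practical implementation would grow full decision trees rather than stumps. The sketch only shows how resampling and voting fit together.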
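Basic information:
DOI:
Chinese Library Classification (CLC) number: O212.2
Citation:
[1] Fang Kuangnan, Wu Jianbin, Zhu Jianping, et al. A Review of Random Forest Methods (随机森林方法研究综述)[J]. Statistics & Information Forum (统计与信息论坛), 2011, 26(03): 32-38.
Funding:
Fundamental Research Funds for the Central Universities, "Research on Data Quality Management Based on Data Mining" (2010221040); Key Project of the National Bureau of Statistics, "Statistical Methods in Financial Risk" (2009LZ045)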