As a flexible nonparametric learning tool, the random forest algorithm has been widely applied in practice and shows appealing empirical performance, even in the presence of high-dimensional feature spaces. Efforts to uncover the underlying mechanisms have yielded important recent theoretical results on the consistency of the random forest algorithm and its variants. However, to our knowledge, almost all existing work on random forest consistency in high-dimensional settings was established for modified random forest models whose splitting rules are independent of the response; the few exceptions assume simple data-generating models with binary features. In light of this, this paper derives consistency rates for the random forest algorithm associated with the sample CART splitting criterion, which is the one used in the original version of the algorithm, in a general high-dimensional nonparametric regression setting through a bias-variance decomposition analysis. Our new theoretical results show that random forests can indeed adapt to high dimensionality and allow for discontinuous regression functions. Our bias analysis characterizes explicitly how the bias of a random forest depends on the sample size, tree height, and column subsampling parameters. We also discuss some limitations of the current results.
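
For readers unfamiliar with response-dependent splitting, the following is a minimal sketch of one split chosen by a sample CART-style criterion with column subsampling. It is an illustration only, not the paper's formal definition: the function names `sse` and `best_cart_split`, the parameter name `gamma` (the fraction of columns tried at a split), and the use of midpoints as candidate thresholds are all assumptions made for this example. The split is chosen to maximize the reduction in the within-node sum of squared errors of the response, which is what makes the rule depend on the response.

```python
import math
import random


def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_cart_split(X, y, gamma=1.0, rng=None):
    """Return (feature, threshold, sse_reduction) for the best split found.

    X is a list of rows, y a list of responses. `gamma` is a hypothetical
    name for the fraction of columns sampled as split candidates.
    """
    rng = rng or random.Random(0)
    p = len(X[0])
    # Column subsampling: try only a random subset of the p features.
    n_try = min(p, max(1, math.ceil(gamma * p)))
    candidates = rng.sample(range(p), n_try)

    parent = sse(y)
    best = (None, None, 0.0)
    for j in candidates:
        levels = sorted({row[j] for row in X})
        # Assumed convention: thresholds are midpoints between observed values.
        for lo, hi in zip(levels, levels[1:]):
            t = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            # The criterion uses the responses, so the rule is response-dependent.
            gain = parent - sse(left) - sse(right)
            if gain > best[2]:
                best = (j, t, gain)
    return best


# Toy data: the response is a step function of feature 0 (a discontinuity),
# while feature 1 is uninformative.
X = [[0.1, 0.9], [0.2, 0.1], [0.8, 0.8], [0.9, 0.2]]
y = [0.0, 0.0, 1.0, 1.0]
print(best_cart_split(X, y, gamma=1.0))  # (0, 0.5, 1.0)
```

On this toy data the criterion selects feature 0 and places the threshold at the jump, which is the kind of behavior a response-independent splitting rule cannot guarantee. Growing a tree would apply the same step recursively to each child node up to a chosen height.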
