We develop a theoretical framework for the analysis of oblique decision trees, where the splits at each decision node occur at linear combinations of the covariates (as opposed to conventional tree constructions that force axis-aligned splits involving only a single covariate). While this methodology has garnered significant attention from the computer science and optimization communities since the mid-80s, its advantages over the axis-aligned counterpart remain only empirically justified, and explanations for its success are largely heuristic. To bridge this long-standing gap between theory and practice, we show that oblique regression trees (constructed by recursively minimizing the squared error) satisfy a type of oracle inequality and can adapt to a rich library of regression models consisting of linear combinations of ridge functions and their limit points. This provides a quantitative baseline for comparing and contrasting decision trees with other, less interpretable methods, such as projection pursuit regression and neural networks, which target similar model forms. Contrary to popular belief, one need not always trade off interpretability for accuracy. Specifically, we show that, under suitable conditions, oblique decision trees achieve predictive accuracy similar to that of neural networks for the same library of regression models. To address the combinatorial complexity of finding the optimal splitting hyperplane at each decision node, our proposed theoretical framework can accommodate many existing computational tools from the literature. Our results rely on a (perhaps surprising) connection between recursive adaptive partitioning and sequential greedy approximation algorithms for convex optimization problems (such as the orthogonal greedy algorithm), which may be of independent theoretical interest.
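As a concrete illustration of the split rule discussed above, the following sketch finds an oblique split w·x ≤ t at a single node by minimizing the total within-child squared error. Since the exact search over hyperplanes is combinatorial, the sketch uses random candidate directions (plus the coordinate axes, so an axis-aligned split is always available as a fallback); this is only one of many computational workarounds, and all function names here are illustrative, not taken from the paper.

```python
import numpy as np

def sse(y):
    # Sum of squared errors of y around its mean (the node impurity).
    return float(np.sum((y - y.mean()) ** 2))

def best_split_along(z, y):
    # Best threshold for the 1-D projected covariate z, by squared-error impurity.
    order = np.argsort(z)
    z_sorted, y_sorted = z[order], y[order]
    best_t, best_err = None, np.inf
    for i in range(1, len(z_sorted)):
        if z_sorted[i] == z_sorted[i - 1]:
            continue  # cannot split between tied projection values
        err = sse(y_sorted[:i]) + sse(y_sorted[i:])
        if err < best_err:
            best_err, best_t = err, 0.5 * (z_sorted[i] + z_sorted[i - 1])
    return best_t, best_err

def best_oblique_split(X, y, n_directions=200, seed=0):
    # Search over random unit directions (plus the coordinate axes) for the
    # hyperplane w . x <= t minimizing total within-child squared error.
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    dirs = np.vstack([np.eye(d), rng.normal(size=(n_directions, d))])
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    best_w, best_t, best_err = None, None, np.inf
    for w in dirs:
        t, err = best_split_along(X @ w, y)
        if t is not None and err < best_err:
            best_w, best_t, best_err = w, t, err
    return best_w, best_t, best_err
```

Recursing this rule on the two child nodes yields an oblique regression tree; because the coordinate axes are always among the candidates, the selected split is never worse (in training error) than the best axis-aligned one at that node.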