While a wide range of techniques has been proposed to deal with distribution shift, the simple baseline of training on an $\textit{undersampled}$ balanced dataset often achieves near-state-of-the-art accuracy on several popular benchmarks. This is rather surprising, because undersampling algorithms discard excess majority group data. To understand this phenomenon, we ask whether learning is fundamentally limited by the lack of samples for minority groups. In the setting of nonparametric binary classification, we prove that this is indeed the case. Our results show that, in the worst case, an algorithm cannot outperform undersampling unless there is a high degree of overlap between the training and test distributions (which is unlikely to be the case in real datasets), or unless the algorithm exploits additional structure about the distribution shift. In particular, for label shift, we show that there always exists an undersampling algorithm that is minimax optimal. For group-covariate shift, we show that there is an undersampling algorithm that is minimax optimal when the overlap between the group distributions is small. We also perform an experimental case study on a label shift dataset and find that, in line with our theory, the test accuracy of robust neural network classifiers is constrained by the number of minority samples.
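To make the baseline concrete, the following is a minimal sketch of undersampling to a balanced dataset: every group is cut down to the size of the smallest group, and the excess majority group data is discarded. The function name `undersample_balanced` and its parameters are hypothetical and chosen for illustration only. It assumes uniform random subsampling within each group, which is one simple choice and not necessarily the procedure used in the benchmarks or in the analysis.

```python
import random
from collections import defaultdict


def undersample_balanced(examples, groups, seed=0):
    """Return a group-balanced subset of (examples, groups).

    Hypothetical helper: keeps n_min examples from every group, where n_min
    is the size of the smallest group, and discards the rest.
    """
    if len(examples) != len(groups):
        raise ValueError("examples and groups must have the same length")
    if not groups:
        return [], []

    rng = random.Random(seed)

    # Collect the positions of the examples belonging to each group.
    indices_by_group = defaultdict(list)
    for index, group in enumerate(groups):
        indices_by_group[group].append(index)

    # The smallest group fixes how many examples every group may keep.
    n_min = min(len(indices) for indices in indices_by_group.values())

    # Sample n_min indices per group uniformly without replacement
    # (an assumption of this sketch).
    kept = []
    for indices in indices_by_group.values():
        kept.extend(rng.sample(indices, n_min))
    kept.sort()  # preserve the original ordering of the retained examples

    return [examples[i] for i in kept], [groups[i] for i in kept]
```

For example, with eight majority and two minority examples, the call `undersample_balanced(list(range(10)), [0] * 8 + [1] * 2)` returns four examples, two from each group. The size of the balanced dataset is therefore determined entirely by the minority group, which is the quantity the results above identify as the bottleneck.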