Maximum Mean Discrepancy (MMD) is widely used in machine learning and statistics to quantify the distance between two distributions in $p$-dimensional Euclidean space. The asymptotic behavior of the sample MMD is well studied when the dimension $p$ is fixed, using the theory of U-statistics. Motivated by the frequent use of MMD tests on medium- and high-dimensional data, we investigate the behavior of the sample MMD in high-dimensional settings and develop a new studentized test statistic. Specifically, we obtain a central limit theorem for the studentized sample MMD as both the dimension $p$ and the sample sizes $n, m$ diverge to infinity. Our results apply to a wide range of kernels, including Gaussian and Laplacian kernels, and also cover the energy distance as a special case. We also derive explicit convergence rates under mild assumptions, and our results suggest that the accuracy of the normal approximation may improve with dimensionality. Furthermore, we provide a general theory for power analysis under the alternative hypothesis and show that the proposed test can detect differences between two distributions in the moderately high-dimensional regime. Numerical simulations demonstrate the effectiveness of the proposed test statistic and of the normal approximation.
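For readers unfamiliar with the quantity being studied, the following is a minimal sketch of the standard unbiased (U-statistic) estimator of the squared MMD with a Gaussian kernel. It is not the studentized statistic proposed in the paper, whose normalization is not given in the abstract. The function names, the bandwidth choice $\sqrt{p}$, the sample sizes, and the mean-shift alternative are all assumptions made for illustration.

```python
import math
import random


def gaussian_kernel(u, v, bandwidth):
    """Gaussian kernel exp(-||u - v||^2 / (2 * bandwidth^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd2_unbiased(xs, ys, bandwidth):
    """Unbiased U-statistic estimate of the squared MMD between two samples."""
    n, m = len(xs), len(ys)
    # Within-sample sums over unordered pairs i < j (diagonal terms excluded).
    sum_xx = sum(gaussian_kernel(xs[i], xs[j], bandwidth)
                 for i in range(n) for j in range(i + 1, n))
    sum_yy = sum(gaussian_kernel(ys[i], ys[j], bandwidth)
                 for i in range(m) for j in range(i + 1, m))
    # Cross-sample sum over all pairs.
    sum_xy = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
    return (2.0 * sum_xx / (n * (n - 1))
            + 2.0 * sum_yy / (m * (m - 1))
            - 2.0 * sum_xy / (n * m))


def draw_normal_sample(rng, size, dim, shift=0.0):
    """Draw `size` points in R^dim with i.i.d. N(shift, 1) coordinates."""
    return [[rng.gauss(shift, 1.0) for _ in range(dim)] for _ in range(size)]


rng = random.Random(0)
p = 20                      # dimension (illustrative)
bandwidth = math.sqrt(p)    # assumed bandwidth, scaled with the dimension

xs = draw_normal_sample(rng, 80, p)
ys_same = draw_normal_sample(rng, 60, p)               # same distribution as xs
ys_shift = draw_normal_sample(rng, 60, p, shift=0.5)   # mean-shifted alternative

mmd_null = mmd2_unbiased(xs, ys_same, bandwidth)
mmd_alt = mmd2_unbiased(xs, ys_shift, bandwidth)
print(f"same distribution:    {mmd_null:.4f}")
print(f"shifted distribution: {mmd_alt:.4f}")
```

Because the estimator is unbiased, it can be slightly negative when the two samples come from the same distribution. A test based on it needs a reference distribution for this statistic, which is what a normal approximation for a studentized version would supply.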
