We propose a goodness-of-fit test for degree-corrected stochastic block models (DCSBM). The test is based on an adjusted chi-square statistic for measuring equality of means among groups of $n$ multinomial distributions with $d_1,\dots,d_n$ observations. In the context of network models, the setting deviates from classical asymptotics because the number of multinomials, $n$, grows much faster than the number of observations, $d_i$, which corresponds to the degree of node $i$. We show that, with a simple adjustment, the statistic converges in distribution under the null as long as the harmonic mean of $\{d_i\}$ grows to infinity. When applied sequentially, the test can also be used to determine the number of communities. The test operates on a compressed version of the adjacency matrix, conditional on the degrees, and is therefore highly scalable to large sparse networks. We incorporate the novel idea of compressing the rows based on a $(K+1)$-community assignment when testing for $K$ communities. This approach increases the power in sequential applications without sacrificing computational efficiency, and we prove its consistency in recovering the number of communities. Because the test statistic does not rely on a specific alternative, its utility extends beyond sequential testing: it can be used to test simultaneously against a wide range of alternatives outside the DCSBM family. In particular, we prove that the test is consistent against a general family of latent-variable network models with community structure.
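
As a rough illustration of the two ingredients named above, the following sketch compresses each adjacency row into neighbour counts per column group and then computes the classical, unadjusted Pearson chi-square statistic for equality of multinomial means within each row group. It is not the adjusted statistic of the paper, whose exact form is not given in this abstract. The function names `compress_rows` and `pearson_homogeneity` are hypothetical, and the community labels are assumed to be supplied by some separate community detection step.

```python
def compress_rows(adj, col_labels, num_col_groups):
    """Collapse each adjacency row into counts of neighbours per column group.

    adj is a dense 0/1 adjacency matrix given as a list of lists (an
    assumption made for readability; a sparse format would be used at scale).
    """
    compressed = []
    for row in adj:
        counts = [0] * num_col_groups
        for j, a_ij in enumerate(row):
            counts[col_labels[j]] += a_ij
        compressed.append(counts)
    return compressed


def pearson_homogeneity(counts, row_labels):
    """Classical (unadjusted) Pearson chi-square for homogeneity.

    Rows sharing a label are treated as multinomials with a common mean;
    the pooled proportions within each group estimate that mean.
    """
    num_cols = len(counts[0])
    stat = 0.0
    for k in set(row_labels):
        rows = [c for c, g in zip(counts, row_labels) if g == k]
        total = sum(sum(r) for r in rows)
        if total == 0:
            continue  # group with no edges contributes nothing
        pooled = [sum(r[l] for r in rows) / total for l in range(num_cols)]
        for r in rows:
            d_i = sum(r)  # row total, i.e. the node degree
            for l in range(num_cols):
                expected = d_i * pooled[l]
                if expected > 0:
                    stat += (r[l] - expected) ** 2 / expected
    return stat


# Two nodes in one row group with opposite neighbour profiles.
print(pearson_homogeneity([[2, 0], [0, 2]], [0, 0]))  # → 4.0
```

In the setting described above, the columns would be grouped by a $(K+1)$-community assignment while the rows are grouped by the $K$-community assignment under test, so that `num_col_groups` is $K+1$.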
