In empirical work, it is common to estimate model parameters and report associated standard errors that account for “clustering” of units, where clusters are defined by factors such as geography. Clustering adjustments are usually motivated by the concern that unobserved components of outcomes for units within clusters are correlated. However, this motivation does not provide guidance on questions such as: (i) How can we justify the common practice of clustering in observational studies but not in randomized experiments, or by state but not by gender? (ii) Are there alternatives to conventional clustering that are responsive to the data and less conservative? (iii) In what settings does the choice of whether and how to cluster make a difference? We address these questions using a framework of sampling and design inference. We argue that clustering may be needed to address sampling issues if sampling follows a two-stage process: in the first stage a subset of clusters is sampled from the population of clusters, and in the second stage units are sampled from the sampled clusters. Clustered standard errors then account for the presence of clusters in the population that are not observed in the sample. Clustering may also be needed to account for design issues when treatment assignment is correlated with cluster membership. We propose a new variance estimator for intermediate settings where conventional cluster standard errors are unnecessarily conservative and robust standard errors are too small.
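
To make the contrast between robust and conventional cluster standard errors concrete, the following is a minimal sketch, not the new variance estimator proposed here. It assumes a simple linear regression of an outcome on a binary treatment, with treatment assigned at the cluster level and a shared cluster-level shock. The function names `ols_standard_errors` and `simulate_clustered_data`, the simulation parameters, and the omission of finite-sample corrections are all illustrative assumptions.

```python
import numpy as np


def ols_standard_errors(y, x, cluster_ids):
    """OLS of y on (1, x); returns (beta, robust_se, cluster_se).

    Both variances are plain sandwich estimators with no finite-sample
    correction (an assumption made to keep the sketch short).
    """
    X = np.column_stack([np.ones(len(x)), np.asarray(x, dtype=float)])
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    scores = X * resid[:, None]  # per-unit score x_i * e_i

    # Heteroskedasticity-robust: sum outer products of unit-level scores.
    meat_robust = scores.T @ scores

    # Cluster-robust: sum scores within each cluster first, then take
    # outer products of the cluster totals.
    labels, inverse = np.unique(np.asarray(cluster_ids), return_inverse=True)
    cluster_scores = np.zeros((len(labels), X.shape[1]))
    np.add.at(cluster_scores, inverse, scores)
    meat_cluster = cluster_scores.T @ cluster_scores

    var_robust = XtX_inv @ meat_robust @ XtX_inv
    var_cluster = XtX_inv @ meat_cluster @ XtX_inv
    return beta, np.sqrt(np.diag(var_robust)), np.sqrt(np.diag(var_cluster))


def simulate_clustered_data(n_clusters=50, n_per_cluster=40, seed=0):
    """Hypothetical data: treatment assigned by cluster, shared cluster shock."""
    rng = np.random.default_rng(seed)
    cluster_ids = np.repeat(np.arange(n_clusters), n_per_cluster)
    # Exactly half of the clusters are treated (assignment is at cluster level).
    treated = rng.permutation(np.repeat([0, 1], n_clusters // 2))
    x = treated[cluster_ids].astype(float)
    cluster_shock = rng.normal(0.0, 1.0, n_clusters)
    noise = rng.normal(0.0, 1.0, n_clusters * n_per_cluster)
    y = 1.0 + 0.5 * x + cluster_shock[cluster_ids] + noise
    return y, x, cluster_ids


y, x, cluster_ids = simulate_clustered_data()
beta, robust_se, cluster_se = ols_standard_errors(y, x, cluster_ids)
print("slope:", beta[1])
print("robust SE:", robust_se[1], "cluster SE:", cluster_se[1])
```

In this design, where assignment is perfectly correlated within clusters, the cluster standard error on the slope comes out well above the robust one. When every unit is its own cluster, the two calculations coincide.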
