Higher order robust resampling and multiple testing methods
Resampling methods are powerful tools in modern statistics and econometrics. Bootstrap and subsampling procedures, for instance, have widespread applicability and are useful for a wide variety of inference problems in many fields. Resampling methods are essential for a large class of problems and applications in Economics, Econometrics and Statistics whenever an estimate of the finite sample distribution of a relevant statistic is needed and, at the same time, analytical asymptotic approximations of this distribution are either not easily applicable or not sufficiently accurate. Multiple testing problems arise in a broad class of applications in which the distribution of the relevant statistic under a given null hypothesis must be computed by a suitable resampling approach. In these settings, one needs to jointly test a potentially large number of null hypotheses by computing statistics that are functionals of a (potentially large) set of individual test statistics, associated with different models or objects. Resampling methods are also key to computing asymptotically valid significance levels in these contexts, while controlling the probability of erroneously rejecting too many null hypotheses in a multiple test. The need for robust statistical procedures in estimation and testing has been stressed by many authors and is now widely recognized. More recent research has shown that inference based on bootstrap and subsampling tests can also be easily distorted by a small fraction of anomalous observations. Intuitively, this happens because standard bootstrap and subsampling procedures often generate resamples that contain a much higher fraction of anomalous data than the original sample.
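To make the last intuition concrete, the following Python sketch (an illustration under simplifying assumptions, not one of the robust procedures proposed in this project) computes how often a standard nonparametric bootstrap resample of size n contains at least m outliers when the original sample of size n contains k of them. It assumes the outliers are a fixed subset of the data and that resampling is uniform with replacement, so the outlier count in a resample is Binomial(n, k/n). The function names `prob_at_least` and `simulate` are hypothetical.

```python
import math
import random


def prob_at_least(n, k, m):
    """Exact probability that a size-n bootstrap resample contains >= m outliers.

    Assumption: k of the n original observations are outliers and each
    bootstrap draw picks any observation with probability 1/n, so each draw
    is an outlier with probability p = k / n and the count is Binomial(n, p).
    """
    p = k / n
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(m, n + 1))


def simulate(n, k, m, reps=20000, seed=0):
    """Monte Carlo check of prob_at_least by explicit resampling of indices."""
    rng = random.Random(seed)
    outliers = set(range(k))  # hypothetical: the first k observations are the outliers
    hits = 0
    for _ in range(reps):
        # One standard bootstrap resample: n indices drawn with replacement.
        draws = [rng.randrange(n) for _ in range(n)]
        if sum(i in outliers for i in draws) >= m:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    # 10% contamination in the original sample (2 outliers out of 20).
    # How often does a resample contain at least twice that fraction?
    print("exact:    ", prob_at_least(20, 2, 4))
    print("simulated:", simulate(20, 2, 4))
```

With n = 20 and k = 2, `prob_at_least(20, 2, 4)` is about 0.133: roughly one resample in eight contains at least twice the original fraction of outliers, so statistics computed on those resamples face much heavier contamination than the original estimate did.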
Since this problem cannot be mitigated simply by applying conventional resampling methods to more robust estimators or test statistics, this line of research has proposed several types of more robust resampling schemes. The project starts by studying a new broad class of estimators and resampling methods with improved robustness and convergence properties. In contrast to the vast majority of the literature in Statistics and Econometrics, a main focus is on higher order (at least second order) robust methods, which aim to ensure good robustness properties of a statistic together with improved convergence to the relevant limit distribution. In parallel, the project considers a broadly applicable robust approach to multiple testing problems, based on our robust resampling procedures. Our goal in this research is to produce multiple testing methods that control the probability of too many false rejections of null hypotheses in a way that is robust to the presence of outliers or other anomalous data points. We develop our (higher order) robust resampling approach starting from the class of second order robust estimators recently studied in La Vecchia et al. (2012). In this setting, we introduce robust resampling procedures with improved robustness and convergence properties relative to existing proposals in the literature. In this way, we hope to open the door to a potentially large number of new applications in which our robust resampling procedures can help produce more robust and reliable inference results. We consider this a desirable goal (i) given the unavoidable use of resampling methods for many relevant research problems in applied Economics, Econometrics and Statistics, and (ii) because of the fragility and weak convergence properties of most existing resampling schemes in the presence of anomalous observations.
Using higher order robust resampling methods, we can also systematically study the robustness properties of multiple testing approaches and propose a new methodology for robust inference in these complex settings as well. Here the robustness problem can be particularly important, because it is amplified by (i) the (potentially) large number of joint hypotheses tested, (ii) the complex dependence between the samples of observations underlying the individual null hypotheses, and (iii) the unavoidable use of resampling methods to compute the critical values of the tests.