Statistical solutions for regression-type models with big spatial data
Owing to the growing ability to acquire detailed information from very different sources and through sophisticated technical devices, recent decades have witnessed an explosion of geo-located data in all branches of scientific research. Spatial econometric methods provide a natural framework for modelling spatial data, identifying causal mechanisms and supporting empirically grounded decisions. However, such models become computationally prohibitive when applied to very large datasets, even on increasingly powerful computers. Since the availability of data grows faster than computing power, without appropriate new statistical approaches we will find ourselves with massive quantities of spatial data that cannot be analysed as promptly and accurately as many applications require (e.g., environmental disaster management). This project has two aims. The first is to evaluate the performance of the currently available methods for estimating spatial regression models on very large datasets. These methods will be compared (in terms of computing time, accuracy and memory requirements) through an intensive program of Monte Carlo simulations. The second aim is to identify alternative modelling strategies that depart drastically from the current standard methodologies for spatial regression and that can overcome the current computational problems with big datasets. We will explore, in particular, three alternatives. First, we will develop Bayesian versions of the currently available methodologies. Second, we will adapt recursive methods (such as the Kalman filter and state-space models) to the specific case of spatial data.
Finally, we will use multilevel methods to examine the data at different hierarchical levels, adopting the "divide et impera" strategy of breaking a complex phenomenon down into smaller sub-problems that are easier to treat both analytically and computationally. The expected results include theory (new model specifications and new estimation methods), tools (software routines) and applications, with particular emphasis on environmental, economic and health data.