Snowball sampling and conditional estimation for exponential random graph models for large networks in high performance computing
The main objective of the proposed project Snowball is to develop and co-design a new evaluation framework that extends the possibilities for specifying and estimating Exponential Random Graph Models (ERGMs) in the analysis of very large social networks. This will be done by combining new algorithmic ideas with state-of-the-art knowledge in software development for high-performance computing. We will create and implement the Snowball framework following co-design principles: the algorithmic developments will proceed in close interplay with the realization of the new framework. The resulting libraries will give the social science research community new computational tools for estimating ERGMs that are capable of exploiting the computational power of current and future supercomputers. We expect the new possibilities provided by Snowball to eventually expand the statistical analysis of social network data with ERGMs far beyond what is currently available.
Exponential random graph models are a class of statistical models for complex network structures that have been used to study social networks, communication networks, and organizational structures. They have been applied widely across the social sciences, from studies of animal social behavior to criminal networks, health behaviors, and archaeology. However, estimation is computationally demanding for large networks, scaling at least quadratically in the size of the network. Moreover, the estimation is inherently sequential, which limits the ability to take advantage of high-performance computing. To overcome this severe restriction, Stivala et al. (2014) proposed estimating multiple snowball samples in parallel and aggregating the results with a post-hoc multilevel modelling technique. This not only enables efficient ERGM parameter estimation for networks far larger than previously possible, but also makes it possible to use parallel computers. To date, this technique has been used to estimate ERGM parameters for an empirical collaboration network with over 40,000 nodes.
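To make the sampling step concrete, the following is a minimal Python sketch of drawing several snowball samples from a network: each sample starts from a few seed nodes and adds their neighbours wave by wave. It is an illustration only, not the implementation of Stivala et al. The function names, the adjacency-dictionary representation, and the toy graph are assumptions made for this example, and the per-sample ERGM estimation and the multilevel aggregation are not shown.

```python
import random


def snowball_sample(adjacency, seeds, n_waves):
    """Return the set of nodes reached within n_waves steps of the seeds.

    adjacency is assumed to map each node to an iterable of its neighbours.
    """
    sampled = set(seeds)
    frontier = set(seeds)
    for _ in range(n_waves):
        next_frontier = set()
        for node in frontier:
            for neighbour in adjacency[node]:
                if neighbour not in sampled:
                    next_frontier.add(neighbour)
        sampled |= next_frontier
        frontier = next_frontier
    return sampled


def draw_snowball_samples(adjacency, n_samples, n_seeds, n_waves, rng):
    """Draw n_samples snowball samples, each from its own random seed set.

    The samples are independent of one another, so each could be handed
    to a separate worker for estimation (hypothetical step, not shown).
    """
    nodes = sorted(adjacency)
    return [
        snowball_sample(adjacency, rng.sample(nodes, n_seeds), n_waves)
        for _ in range(n_samples)
    ]


# Toy path graph 0-1-2-3-4-5, used purely for illustration.
toy_graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}

if __name__ == "__main__":
    rng = random.Random(0)
    for sample in draw_snowball_samples(toy_graph, 3, 1, 2, rng):
        print(sorted(sample))
```

Because each sample depends only on its own seed set, the samples can be processed concurrently, which is the property the parallel approach relies on.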
However, several challenges remain to be addressed, and the method has to be refined, improved, and extended on the methodological side. The aim of the Snowball project is therefore to address these central aspects by co-designing a new estimation framework for ERGMs. Snowball is a collaboration between the University of Lugano and the University of Melbourne, Australia, where the approach of Stivala et al. was developed. It will enable additional progress in the social sciences by allowing for the analysis of large networks, i.e. the estimation and evaluation of ERGMs. This will be achieved by combining new methodological ideas with carefully co-designed scientific software. The methods and scientific libraries developed and designed in Snowball will be made available to the scientific community, thus enabling computationally driven research in the social sciences on current and future supercomputers.