AnaGraM: Adaptive numerical methods for time series analysis of time-dependent dynamical Graphs in the presence of Missing data
Graph-theoretical approaches are becoming more and more popular in many areas of applied science. One of the main reasons for that is a rapid development of graph theory in the last years, relating it to the theory of dynamical systems and, especially relating graphs with stochastic dynamical systems like Markov processes and random walks. These groundbreaking theoretical developments have led to an emergence of a large family of practical applications like (just to mention a few of them): non-linear dimension reduction methods (e.g., based on diffusion and Laplacian eigenmaps); transfer operator approach to dynamical systems; or new engines for internet search such as Google's PageRank algorithm. Graph-related approaches are also getting a lot of attention as tools for analyzing and visualizing the relation between the different components of very large data sets. The current research proposal is a two-year follow-up of the previous three-year SNSF-project AnaGraph: Adaptive Numerical methods for time series Analysis of time-dependent graphs in context of dynamical systems. AnaGraph was investigating the methodological approaches to data-based inference of non-stationary (i.e., changing in time) graphs in context of dynamical systems. The research program of the AnaGraph-project was outperformed, resulting in 10 scientic papers (7 of them are already published and 3 are in press) in international peerreviewed journals. Computational tools developed in the AnaGraph-project have demonstrated optimal parallelizability on modern hardware architectures of the Swiss Supercomputing Centre in Lugano (CSCS) and are applicable to realistic data analysis problems. Based on a given time series of graph-related observables, developed algorithms allow to identify the temporal changes in the underlying models, as well as to perform a model/data reduction and informationtheoretical comparison of different methods. Main published applications of the methods developed in the predecessor AnaGraph-project were concentrated in two following application areas: (i) climate/weather/turbulence research; as well as (ii) in computational sociology. Main methodological aim of this follow-up proposal is an extension of the Finite Element Methods of time series analysis with Bounded Variation of model parameters (FEM-BV) developed in the previous AnaGraph-project towards the incomplete/missing observational data. Main challenge will be in creating the methods that allow a structure-preserving (in the sense of conservation properties and causality relations of the related dynamical system) inference of the underlying graph structures and missing data imputation. Emerging methods should operate beyond the restrictive stationarity and homogeneity assumption of the standard graph and matrix completion theory approaches. Main practical aim of the current follow-up proposal will be in High-Performance Computing implementation of the developed methods on modern supercomputers and their application to problems in bioinformatics: to gene expression data analysis in genomics, data-driven inference of probabilistic gene networks, their analysis and reduction. The practical problem will be formulated in FEM-BV context as graph problems with missing data and attacked by the methods and tools that were developed in the previous project (i.e., FEM-BV-methods for model inference and reduction) and the new missing data imputation tools that will be developed in the theoretical part of the current project. Previously published work and preliminary results demonstrate that the methods developed in the AnaGraph project have some conceptual and practical/computational advantages in comparison to the standard machine learning tools in this area (e.g., Bayesian mixture models, unsupervised hierarchical Bayesian clustering and K-means methods) in feature extraction for realistic gene expression sets. In a collaboration with external experts from experimental biology, as well as with the local experts in HPC (in the areas of cloud- and GPGPU-computing), graph-related data analysis and classification methods will be compared and evaluated wrt. their HPC-performance and potential in the rapidly developing area of bioinformatics.