
State space Gaussian processes for big data analytics

People

Zaffalon M. (Principal investigator)

Benavoli A. (Co-investigator)

Schuerch M. (Collaborator)

Abstract

Big data analytics is the process of examining large amounts of data to uncover hidden patterns, unknown relationships and other useful information that can be used to make better decisions. It is set to become the key tool for the internet of things, business intelligence and quantitative finance, to mention just a few areas. Big data severely constrain the type of algorithms that can be used, because even models whose running time is quadratic in the size of the data will most probably be too slow to process them in the required time. Much research has therefore opted for very simple algorithms, for subsampling of the data, or for algorithms based on (deep) neural nets, which are universal approximators that remain relatively viable with big data, even though they give no reliability guarantees and are not easy to design and train.

Now imagine an alternative to all these approaches that is principled, sophisticated, naturally comes with measures of reliability, is much simpler to train than neural nets, and automatically adapts the complexity of the model to the size of the data. This alternative exists: Gaussian Processes (GPs), a family of non-linear Bayesian non-parametric models. Being non-linear, they can capture very general trends in multidimensional data; being non-parametric, they make very weak assumptions and thus lead to more reliable models; and being Bayesian, they rest on a solid theory from which algorithms can be derived and their properties proved. GPs are an emerging tool in machine learning, and yet large data problems are mostly uncharted territory for them: they can be applied to at most a few thousand training points n, because learning requires O(n^3) time and O(n^2) space.

The ambitious goal of this project is simple to state: to derive a principled, accurate approximation of GPs that can exploit all the available data while requiring only O(n) complexity in both time and space. Stated differently, to achieve great modelling power at the minimum time and space complexity possible when all the data are taken into account. This will open up an entire new range of applications for machine learning, namely all those whose data are currently too big to be analysed, and it has the potential to create groundbreaking new applications and tools and to overcome the limitations of deep neural networks. To show this potential, we will apply it to rainfall intensity forecasting for MeteoSwiss and to spatio-temporal electrosmog estimation for Armasuisse.
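The complexity gap at the heart of the project can be made concrete with a small sketch. The Python example below compares the exact GP marginal likelihood, whose Cholesky factorisation costs O(n^3) time and O(n^2) memory, with the equivalent state-space formulation of a Matérn-3/2 GP on one-dimensional inputs, where a single Kalman-filter sweep costs O(n). This is an illustrative sketch only: the kernel, hyperparameters and function names are assumptions chosen for the example, not the project's actual code or algorithms.

```python
# Minimal sketch (not the project's code): exact GP inference, O(n^3), versus
# the state-space / Kalman-filter formulation of the same Matern-3/2 GP, O(n).
import numpy as np
from scipy.linalg import cho_factor, cho_solve, expm

def exact_gp_loglik(x, y, lengthscale=1.0, sigma2=1.0, noise2=0.1):
    """Exact GP marginal log-likelihood: O(n^3) time, O(n^2) memory."""
    d = np.abs(x[:, None] - x[None, :])
    lam = np.sqrt(3.0) / lengthscale
    K = sigma2 * (1.0 + lam * d) * np.exp(-lam * d)        # Matern-3/2 kernel
    C = K + noise2 * np.eye(len(x))
    L, low = cho_factor(C, lower=True)
    alpha = cho_solve((L, low), y)
    return -0.5 * y @ alpha - np.sum(np.log(np.diag(L))) \
           - 0.5 * len(x) * np.log(2.0 * np.pi)

def state_space_gp_loglik(x, y, lengthscale=1.0, sigma2=1.0, noise2=0.1):
    """Same model in state-space form: one Kalman sweep, O(n) time and memory."""
    lam = np.sqrt(3.0) / lengthscale
    F = np.array([[0.0, 1.0], [-lam**2, -2.0 * lam]])      # SDE drift matrix
    Pinf = np.diag([sigma2, lam**2 * sigma2])               # stationary covariance
    H = np.array([[1.0, 0.0]])                              # observe the first state
    m, P = np.zeros(2), Pinf.copy()
    loglik, prev = 0.0, x[0]
    for t, yt in zip(x, y):
        dt, prev = t - prev, t
        if dt > 0:                                           # predict step
            A = expm(F * dt)
            Q = Pinf - A @ Pinf @ A.T
            m, P = A @ m, A @ P @ A.T + Q
        v = yt - (H @ m)[0]                                  # update step
        s = (H @ P @ H.T)[0, 0] + noise2
        k = (P @ H.T / s).ravel()
        m, P = m + k * v, P - np.outer(k, k) * s
        loglik += -0.5 * (np.log(2.0 * np.pi * s) + v**2 / s)
    return loglik

# Usage: on sorted 1-D inputs both routines agree up to numerical error,
# but only the recursive one scales to very large n.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 10.0, 500))
y = np.sin(x) + 0.3 * rng.standard_normal(x.size)
print(exact_gp_loglik(x, y), state_space_gp_loglik(x, y))
```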

Additional information

Start date
01.02.2017
End date
31.12.2021
Duration
59 months
Funding agencies
SNSF
Status
Completed
Category
Swiss National Science Foundation / NRP - National Research Programmes