Multiresolution methods for unstructured data

People

Multerer M.

(Project lead)

Abstract

In our daily lives, unstructured data is ubiquitous, while the amount of data is immense and rapidly increasing. The processing of social network data, text data, audio files, photos and videos, but also scientific data, like measurements and simulation data, has become vital to our modern society. Multiresolution methods in general and wavelets in particular are well-established tools for nonlinear approximation, image analysis, signal processing and machine learning. They have successfully been employed to process the data sources mentioned above. With a few notable exceptions, however, these methods rely on an embedding of discrete data into a continuous, functional setting, for example by means of regression or interpolation.

The present project aims at the development and the numerical analysis of novel, fully discrete and data-centered multiresolution methods for unstructured data. More specifically, we shall develop the corresponding analytical framework, the arithmetic, high-dimensional variants and adaptive strategies for such methods. As a highly relevant application, we shall consider physical models with random input data, such as diffusion problems with uncertain permeability or mechanical problems with uncertain material parameters. Indeed, the numerical approximation of such problems amounts to a particular sampling strategy for the random input, entailing the solution of the corresponding realizations of the underlying partial differential equation. This results in high-dimensional unstructured data sets, often comprising millions of high-dimensional data points.

To facilitate making predictions from such data sets, we shall improve the performance of existing methods for scattered data interpolation, foremost kernel interpolation and Gaussian process regression. To this end, we represent kernel matrices in suitable multiresolution bases, which results in essentially sparse representations.
To speed up the numerical linear algebra computations entailed by the aforementioned learning tasks, we shall develop a sparse arithmetic for kernel matrices with rigorous error bounds. Furthermore, to address intrinsically high-dimensional and correlated data or anisotropic kernels, we shall devise associated anisotropic multiresolution methods to efficiently process them. Many active learning tasks, such as the computation of failure probabilities in physical models with random inputs, require adaptive sampling strategies to be computationally tractable. To efficiently solve such problems, we shall devise adaptive refinement strategies for the envisaged discrete multiresolution methods.

Besides these algorithmic aspects and applications, we intend to make a significant contribution to the analytical framework of multiresolution methods for unstructured data. Specifically, we shall examine to what extent classical compression results from wavelet theory are applicable to multiresolution methods in the discrete setting. In addition, we plan to develop the theory of these methods in the context of reproducing kernel Hilbert spaces, which naturally accommodate them, and to study their behavior in the infinite data limit.

In summary, adopting a data-centered approach, we shall contribute to the theory of multiresolution methods in general. Novel, efficient and mathematically sound computational tools for kernel-based learning for high-dimensional and large data will be obtained. Numerically tractable discretizations of nonlocal operators in high dimensions will be enabled. The obtained methods will be complemented by new discrete and multiresolution-based adaptive strategies.

The envisaged arithmetic will significantly speed up kernel learning tasks, while giving rigorous error bounds. Classical limitations of kernel methods to intrinsically low-dimensional problems will be alleviated by taking into account anisotropy in kernels and data.
This is highly relevant in the computation of response surfaces in physical models with random inputs. The adaptive multiresolution strategy which will be developed in this project provides a new means for the efficient computation of non-smooth quantities of interest. More generally, the intended approach gives rise to new multiresolution-based acquisition strategies for adaptive sampling methods.

Beyond the impact closely related to the project, we expect a strong impact on fields that are related in a wider sense. The developed methods and fundamental ideas can easily be extended to further data sources. A major threat that we face in our digital, data-reliant world is data tampering, a prominent example of which is deepfakes. Recently, multiresolution methods, more precisely Haar wavelets and wave-packets, have been employed for deepfake discovery in images. The methodology developed in this project is flexible enough to extend this approach to higher-dimensional data, like videos including soundtracks, but also more abstract data like text files.
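
The statement that kernel matrices become essentially sparse in a multiresolution basis can be illustrated with a small sketch. It is not the project's own construction: it assumes one-dimensional scattered points, a Gaussian kernel, and a plain discrete Haar basis over the sorted point indices, all chosen here purely for illustration. The function names, the length scale `0.2` and the threshold `tau` are hypothetical.

```python
import numpy as np


def haar_matrix(n):
    """Orthonormal discrete Haar transform matrix; n must be a power of two."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    H = np.array([[1.0]])
    while H.shape[0] < n:
        m = H.shape[0]
        averages = np.kron(H, [1.0, 1.0])          # coarse part, refined
        details = np.kron(np.eye(m), [1.0, -1.0])  # new finest-level wavelets
        H = np.vstack([averages, details]) / np.sqrt(2.0)
    return H


def gaussian_kernel_matrix(x, length_scale):
    """Dense Gaussian kernel matrix for 1D points x (illustrative choice)."""
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)


rng = np.random.default_rng(0)
n = 256
x = np.sort(rng.uniform(0.0, 1.0, n))  # scattered points, sorted so that
                                       # index neighbours are also close in space
K = gaussian_kernel_matrix(x, length_scale=0.2)

H = haar_matrix(n)
K_w = H @ K @ H.T                      # kernel matrix in the Haar basis

tau = 1e-4                             # assumed relative drop tolerance
threshold = tau * np.linalg.norm(K)    # Frobenius norm
K_c = np.where(np.abs(K_w) >= threshold, K_w, 0.0)

density = np.count_nonzero(K_c) / K_c.size
rel_err = np.linalg.norm(H.T @ K_c @ H - K) / np.linalg.norm(K)
print(f"fraction of entries kept: {density:.3f}")
print(f"relative Frobenius error: {rel_err:.2e}")
```

Because the transform is orthonormal, dropping entries smaller than `tau` times the Frobenius norm of `K` bounds the relative Frobenius error by `n * tau`, which gives a simple instance of compression with an explicit error bound. Haar wavelets have only one vanishing moment, so bases with more vanishing moments would be expected to compress smooth kernels further.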

Additional information

Start date: 01.01.2023
End date: 31.12.2027
Duration: 61 months
Funding bodies: SNSF, Swiss National Science Foundation
Status: Ongoing
Category: Swiss National Science Foundation / Transitional Measures / SNSF Starting Grant