Learning under near-ignorance: models and methods
Abstract
Can a rational agent learn about a domain from data starting in a condition of ignorance? Addressing this question is important because it is often desirable to start analyzing a problem with very weak initial assumptions, thus implementing an objective-minded approach to learning and inference. So far, this question has been most thoroughly investigated within the well-established Bayesian learning theory. There, ignorance is modeled by a so-called noninformative prior distribution (e.g., a flat one); the data are summarized by a likelihood model; and posterior inferences (e.g., predictions, hypothesis tests) are obtained from the prior and the likelihood by Bayes' rule. The use of a single, albeit noninformative, prior as a model of ignorance has been criticized by several authors. A more compelling alternative, in our opinion, relies on using a set M of priors that generates lower (\lpr) and upper (\upr) expectations such that \lpr(g)=\inf_x g(x) and \upr(g)=\sup_x g(x) for suitable functions of interest g. This expresses very faithfully that one's expectations about g are totally vacuous, which is a clear state of ignorance. Such models were first proposed in [Walley, 1991] under the name of "near-ignorance priors". They have been successfully applied to classification problems: by naturally leading to indeterminate (i.e., set-valued) predictions when the information in the data is not enough to draw stronger conclusions, they yield very credible and reliable models. However, in some cases near-ignorance also gives rise to a fundamental problem [Piatti et al., 2007]: it can make learning from data impossible, because the set of posteriors may coincide with the set of priors. This amounts to making the data useless, which is clearly a serious issue.
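To make the vacuous-expectations idea concrete, the sketch below (our illustration, not part of the proposal) uses Walley's imprecise beta model, a standard near-ignorance prior set for a Bernoulli chance theta: the prior set of Beta(s*t, s*(1-t)) distributions, for t ranging over (0, 1), gives vacuous prior expectations for theta, while the posterior expectation interval shrinks as data accumulate, so learning does take place.

```python
# Sketch, assuming Walley's imprecise beta model with hyperparameter s:
# the prior set {Beta(s*t, s*(1-t)) : 0 < t < 1} yields vacuous prior
# expectations for the Bernoulli chance theta (lower 0, upper 1).
# After observing k successes in n trials, the posterior expectation of
# theta ranges over [k / (n + s), (k + s) / (n + s)].

def posterior_expectation_bounds(k, n, s=2.0):
    """Lower/upper posterior expectation of a Bernoulli chance theta
    under the imprecise beta model with hyperparameter s."""
    lower = k / (n + s)
    upper = (k + s) / (n + s)
    return lower, upper

# No data: the interval is the vacuous [0, 1], a state of ignorance.
print(posterior_expectation_bounds(0, 0))   # (0.0, 1.0)
# After 7 successes in 10 trials the interval narrows: learning occurs.
print(posterior_expectation_bounds(7, 10))
```

The width of the posterior interval, s / (n + s), goes to zero as n grows, which is exactly the sense in which this near-ignorance prior set still permits learning from data.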
To tackle this problem, we have established some general principles that should govern a well-defined learning process based on near-ignorance, and have specialized them to the case of the one-parameter exponential family. We have then proved that there is a unique set of priors complying with these principles, which hence guarantees that both near-ignorance and learning are attained. We have also shown that the resulting set of priors reduces to other, already known models of prior near-ignorance, thus indirectly proving the uniqueness of those models. Moreover, we have derived new models of near-ignorance that were not available before. We regard these results as a basic stepping stone toward properly answering the initial question. This seminal work should now be deepened and extended to fully tackle the question of learning under near-ignorance, and to bring it closer to applications. We plan to do so according to the following tasks.
- Extension to the multivariate exponential family: this extension is fundamental, as the most important applications are based on it; this is, for instance, the case of regression analysis.
- Linear regression: to employ the multivariate model to develop new robust linear-regression algorithms. Linear regression is an important area of data analysis, where near-ignorance can provide a leading approach for reliable inference.
- Metrics: to develop new performance metrics to compare a set-valued estimate/prediction with a single-valued one. While standard estimators yield determinate (single-valued) outcomes, those based on near-ignorance lead to set-valued ones. How to fairly compare, through a metric, a set-valued estimate with a single-valued one is thus fundamental for judging the quality of the models in practice.
- Extension to the non-parametric case: this extension is important, as non-parametric models are spreading in artificial intelligence thanks to the increasing computational power available. Modeling near-ignorance non-parametrically can allow one to build very general models for statistical inference.
- Understanding the ultimate relationship between near-ignorance and learning: we plan to go deeper into these questions, exploiting important results derived in the context of Bayesian inference on the consistency and convergence of posterior distributions.
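In connection with the metrics task above, one simple baseline discussed in the imprecise-probability literature is "discounted accuracy": a set-valued prediction Y earns 1/|Y| when it contains the true value and 0 otherwise, so a determinate correct answer scores 1 while near-vacuous answers score little. The sketch below is our illustration of this baseline, not one of the new metrics the proposal aims to develop.

```python
# Sketch, assuming the "discounted accuracy" baseline for scoring a
# set-valued prediction against the truth: reward 1/|Y| if the true
# value lies in the prediction set Y, and 0 otherwise. This puts
# determinate (singleton) and indeterminate (set-valued) predictions
# on a common numeric scale.

def discounted_accuracy(prediction_set, truth):
    """Score a set-valued prediction: 1/|Y| if truth is in Y, else 0."""
    if truth in prediction_set:
        return 1.0 / len(prediction_set)
    return 0.0

print(discounted_accuracy({"a"}, "a"))       # 1.0  (determinate, correct)
print(discounted_accuracy({"a", "b"}, "a"))  # 0.5  (indeterminate, correct)
print(discounted_accuracy({"b", "c"}, "a"))  # 0.0  (wrong)
```

A known limitation of this baseline, which motivates the search for better metrics, is that it rewards an indeterminate but correct prediction no more than random guessing within the predicted set would on average.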