
Ignorance, caution and the naive credal classifier

People


Zaffalon M.

(Project leader)

Abstract

Classification is a technique for knowledge discovery in data sets. Classifiers learn from data the relationship between a set of attributes of an object and the class the object belongs to. They then use this knowledge to predict the unknown class of new objects. The growing importance of classifier applications makes the reliability of classifiers a sensitive matter. To this end, over the years we have developed a new classifier called the “naive credal classifier” (NCC). NCC can suspend judgment, returning a set of possible classes, when it recognizes that the information in the data is not strong enough to issue a determinate classification (i.e., a single output class). In this way, reliability is maintained even for small data sets or data sets affected by missing values. The credibility of NCC follows from carefully modeling ignorance: both ignorance about the domain and ignorance about the process responsible for the missing data. Yet the question of ignorance is subtle. A drawback is that assuming ignorance means giving, a priori, much credit to every feature of an object as being useful for predicting the class. As a result, NCC must spend time recognizing the features that are instead irrelevant, and during this time it is more cautious than necessary. To address this point, we propose extending NCC to model averaging. Model averaging means considering the ensemble of NCCs that arise by enumerating all subsets of features; the classification is then issued by averaging the predictions of all the classifiers in the ensemble. Model averaging will enable NCC to quickly recognize irrelevant features and eventually discard them from consideration. However, model averaging is also subject to the question of prior ignorance, because it starts without any knowledge of the relative credibility of the NCCs in the ensemble.
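The suspension of judgment described above can be illustrated with a minimal Python sketch, not the project's implementation. It assumes complete data (no missing values), discrete features with at least one feature per instance, and prior ignorance modeled with the imprecise Dirichlet model with hyperparameter `s`, as in the published NCC. The helper names `fit_counts`, `dominates` and `ncc_predict` are hypothetical. A class `c1` credal-dominates `c2` when the infimum, over the set of priors, of the ratio P(c1, x)/P(c2, x) exceeds 1. NCC then returns every class that no other class dominates. Here the infimum over the free parameter t = t(c2) is approximated on a grid, which can slightly overstate dominance. Model averaging and the treatment of missing data are not shown.

```python
from collections import Counter, defaultdict
from itertools import permutations


def fit_counts(rows, labels):
    """Collect n(c) and n(a_i = v, c) from complete data (hypothetical helper)."""
    class_counts = Counter(labels)
    joint_counts = {c: defaultdict(Counter) for c in class_counts}
    for row, c in zip(rows, labels):
        for i, v in enumerate(row):
            joint_counts[c][i][v] += 1
    return class_counts, joint_counts


def dominates(c1, c2, class_counts, joint_counts, x, s=1.0, grid=1000):
    """Approximate credal-dominance test: does c1 dominate c2 on instance x?

    Minimizes, over t = t(c2) on a grid in (0, 1],
        [(n(c2) + s t) / (n(c1) + s - s t)]^(k-1)
        * prod_i n(a_i, c1) / (n(a_i, c2) + s t)
    and checks whether the minimum exceeds 1.
    """
    k = len(x)
    n1 = [joint_counts[c1][i][v] for i, v in enumerate(x)]
    if min(n1) == 0:
        return False  # a zero count in the numerator drives the infimum to 0
    n2 = [joint_counts[c2][i][v] for i, v in enumerate(x)]
    nc1, nc2 = class_counts[c1], class_counts[c2]
    best = float("inf")
    for j in range(1, grid + 1):
        t = j / grid  # includes t = 1, the limit of the open-interval infimum
        ratio = ((nc2 + s * t) / (nc1 + s - s * t)) ** (k - 1)
        for a1, a2 in zip(n1, n2):
            ratio *= a1 / (a2 + s * t)
        best = min(best, ratio)
    return best > 1.0


def ncc_predict(class_counts, joint_counts, x, s=1.0):
    """Return the non-dominated classes; more than one means suspended judgment."""
    classes = list(class_counts)
    dominated = {c2 for c1, c2 in permutations(classes, 2)
                 if dominates(c1, c2, class_counts, joint_counts, x, s)}
    return [c for c in classes if c not in dominated]


# One observation per class: the data are too weak, so both classes are returned.
small = fit_counts([("a",), ("b",)], ["y", "n"])
print(ncc_predict(*small, ("a",)))  # ['y', 'n']

# Twenty observations per class: a single class is returned.
big = fit_counts([("a",)] * 20 + [("b",)] * 20, ["y"] * 20 + ["n"] * 20)
print(ncc_predict(*big, ("a",)))  # ['y']
```

The pairwise test is used instead of comparing precise posterior probabilities. This lets the output shrink from a set of classes to a single class as the counts grow, which is the behavior the abstract describes for small data sets.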
Some preliminary research indicates that extending model averaging to an imprecise-probability model of prior ignorance makes learning from data impossible. Hence, we propose to develop and justify a model of so-called “quasi-ignorance” that can be effectively implemented in model averaging. Moreover, we plan to investigate quasi-ignorance as a replacement for NCC's original model of prior ignorance as well. Finally, we propose to develop a framework that addresses the general question of modeling ignorance, not limited to NCC. This is a known problem at the foundations of statistical learning, which we plan to address formally through the theory of coherent lower previsions.

Additional information

Start date
01.01.2009
End date
31.12.2011
Duration
36 months
Funding bodies
SNSF
Status
Completed
Category
Swiss National Science Foundation / Project Funding / Mathematics, Natural and Engineering Sciences (Division II)