Abstract
Most classifiers operate by selecting the maximum of an estimate of the conditional distribution $p(y|x)$, where $x$ stands for the features of the instance to be classified and $y$ denotes its label. This often results in a hubristic bias: overconfidence in the assignment of a definite label. Usually, the observations are concentrated on a small volume, but the classifier provides definite predictions for the entire space. We propose constructing conformal prediction sets [vovk2005algorithmic], which contain a set of labels rather than a single label. These conformal prediction sets contain the true label with probability $1-\alpha$. Our construction is based on $p(x|y)$ rather than $p(y|x)$, which results in a classifier that is very cautious: it outputs the null set (meaning `I don't know') when the object does not resemble the training examples. An important property of our approach is that classes can be added or removed without having to retrain the classifier. We demonstrate the performance on the ImageNet ILSVRC dataset using high-dimensional features obtained from state-of-the-art convolutional neural networks.
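As a rough illustration of the idea described in the abstract, the following Python sketch builds prediction sets from per-class estimates of $p(x|y)$. It is a toy example under stated assumptions, not the authors' implementation: the Gaussian kernel density score, the bandwidth, the split into fitting and calibration points, the synthetic 2-D data, and the names `kde_score`, `fit_class`, and `predict_set` are all hypothetical choices made for illustration. The threshold uses the standard split-conformal rank $\lfloor \alpha (n+1) \rfloor$, which may differ from the quantile rule used in the paper.

```python
import numpy as np

def kde_score(train, x, bandwidth=1.0):
    """Unnormalized Gaussian kernel estimate of p(x|y) at each row of x.

    Assumption: a plain kernel density score stands in for whatever
    density estimate is applied to the network features.
    """
    sq_dist = ((x[:, None, :] - train[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2)).mean(axis=1)

def fit_class(train, calib, alpha=0.1, bandwidth=1.0):
    """Fit one class using only that class's data.

    The threshold is the k-th smallest calibration score with
    k = floor(alpha * (n + 1)), so a fresh point from this class scores
    below it with probability at most alpha (assuming exchangeability).
    """
    scores = np.sort(kde_score(train, calib, bandwidth))
    k = int(np.floor(alpha * (len(scores) + 1)))
    threshold = scores[k - 1] if k >= 1 else -np.inf
    return {"train": train, "threshold": threshold, "bandwidth": bandwidth}

def predict_set(models, x):
    """Return every label whose density score at x clears its threshold.

    The result may be empty ("I don't know") or contain several labels.
    """
    x = np.atleast_2d(np.asarray(x, dtype=float))
    return [label for label, m in models.items()
            if kde_score(m["train"], x, m["bandwidth"])[0] >= m["threshold"]]

# Synthetic 2-D data (hypothetical): one unit-variance cluster per class.
rng = np.random.default_rng(0)

def sample(center, n=100):
    return rng.normal(loc=center, scale=1.0, size=(n, 2))

models = {}
for label, center in {"a": (0.0, 0.0), "b": (5.0, 5.0)}.items():
    pts = sample(center)
    # First half fits the density, second half calibrates the threshold.
    models[label] = fit_class(pts[:50], pts[50:])

near_a = predict_set(models, [0.0, 0.0])    # a point inside cluster "a"
far = predict_set(models, [100.0, 100.0])   # a point far from all data

# Adding a class touches only the new class; "a" and "b" are left as-is.
pts_c = sample((-5.0, 5.0))
models["c"] = fit_class(pts_c[:50], pts_c[50:])
near_c = predict_set(models, [-5.0, 5.0])

print(near_a, far, near_c)
```

Because each class is fitted and thresholded separately, removing a class amounts to deleting its entry from `models`, and a query far from every cluster fails all thresholds and yields the empty set.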