The University of Montana
Department of Mathematical Sciences

Technical report #21/2008

Exact bootstrap k-nearest neighbor learners

Brian M. Steele
Dept. of Mathematical Sciences
The University of Montana
Missoula MT 59812


Abstract

Bootstrap aggregation, or bagging, is a method of reducing the prediction error of a statistical learner. The goal of bagging is to construct a new learner which is the expectation of the original learner with respect to the empirical distribution function. In nearly all cases, the expectation cannot be computed analytically, and bootstrap sampling is used to produce an approximation. The k-nearest neighbor learners are exceptions to this generalization, and exact bagging of many k-nearest neighbor learners is straightforward. This article presents computationally simple and fast formulae for exact bagging of k-nearest neighbor learners and extends exact bagging methods from conventional bootstrap sampling (sampling n observations with replacement from a set of n observations) to bootstrap sub-sampling schemes (with and without replacement). In addition, a partially exact k-nearest neighbor regression learner is developed. The article also compares the prediction error of elementary bagging, exact bagging k-nearest neighbor learners, and several other ensemble methods on a suite of publicly available data sets.
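To illustrate why the bootstrap expectation is tractable for nearest neighbor learners, the sketch below works out the simplest special case: the exact bagged 1-NN regression predictor under conventional bootstrap sampling (n draws with replacement). The probability that the j-th nearest training point is the nearest neighbor present in a bootstrap sample is p_j = (1 - (j-1)/n)^n - (1 - j/n)^n, so the exact bagged prediction is a weighted sum of responses ordered by distance. This is a minimal sketch of the idea, not the article's general k-NN formulae; the function names and the Euclidean-distance choice are assumptions for illustration.

```python
import numpy as np

def exact_bagged_1nn_weights(n):
    # p_j = P(the j-th nearest training point is the nearest neighbor
    # appearing in a bootstrap sample of n draws with replacement).
    # The j-th point must be drawn at least once while points 1..j-1
    # are never drawn, giving a telescoping difference of powers.
    j = np.arange(1, n + 1)
    return (1 - (j - 1) / n) ** n - (1 - j / n) ** n

def exact_bagged_1nn_predict(X, y, x0):
    # Exact bootstrap expectation of the 1-NN prediction at query x0:
    # sort responses by distance to x0 and weight them by p_j.
    order = np.argsort(np.linalg.norm(X - x0, axis=1))
    w = exact_bagged_1nn_weights(len(y))
    return float(np.dot(w, y[order]))
```

Because the weights telescope to 1 and decay with rank, the exact bagged 1-NN learner behaves like a distance-ordered weighted k-NN rule, computed in closed form rather than by Monte Carlo bootstrap sampling.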

Keywords: bagging, k-nearest neighbor, classification, regression, ensemble methods

AMS Subject Classification:
