
Introduction

Classification methods based on nearest-neighbour interpolation have attracted growing interest in the neural network community. In part, this is because they support rapid incremental learning from new instances without degradation in performance on previous training data. Since the interpolation function is determined from a set of nearest neighbours at run time, it is easy to incrementally incorporate new training data and, if desired, to discount old data in a controlled manner (Atkeson, 1989; Omohundro, 1992). These capabilities are missing from the most popular neural network learning methods, yet they are necessary for models of biological learning and for on-line learning applications.
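To make the incremental property concrete, the following minimal Python sketch (not taken from the paper; the class and parameter names are purely illustrative) stores training examples in a flat list, so incorporating a new instance is a single append, and a prediction is formed at query time from the k nearest stored examples:

    import numpy as np

    class IncrementalKNN:
        """Minimal nearest-neighbour classifier: new examples are simply
        appended to the stored training set, so learning is incremental and
        never disturbs what was learned from earlier data."""

        def __init__(self, k=5):
            self.k = k
            self.X = []   # stored input vectors
            self.y = []   # stored integer class labels

        def add_example(self, x, label):
            # Incorporating a new training instance is a constant-time append.
            self.X.append(np.asarray(x, dtype=float))
            self.y.append(label)

        def predict(self, x, n_classes):
            X = np.vstack(self.X)
            d = np.linalg.norm(X - np.asarray(x, dtype=float), axis=1)
            nearest = np.argsort(d)[: self.k]
            # Interpolate by voting among the k nearest stored examples.
            votes = np.bincount(np.asarray(self.y)[nearest], minlength=n_classes)
            return int(np.argmax(votes))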

However, classical nearest-neighbour methods often exhibit poor generalization performance compared to recent neural network learning methods. It has not been clearly recognized in the classical nearest-neighbour literature that the performance of these methods depends strongly on the similarity (distance) metric used to select neighbours. In this paper, we combine a variable interpolation kernel with cross-validation optimization of the similarity metric and kernel size. The resulting system is called VSM (variable-kernel similarity metric) learning. It generalizes much better than the classical nearest-neighbour approach, and it performs as well as or better than current neural network techniques on the comparison data sets to which it has been applied.
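The sketch below, written for this description rather than drawn from the paper, illustrates the main ingredients in Python: a diagonally weighted distance metric, a Gaussian kernel whose width follows the local neighbour distances (the variable kernel), and a leave-one-out cross-validation error that can be minimized over the metric weights w and the kernel-size parameter r. The names w, r, and k are illustrative, and the actual VSM procedure optimizes a smooth cross-validation criterion with gradient methods rather than the raw error count shown here.

    import numpy as np

    def vsm_predict(x, X, Y, w, k=10, r=2.0):
        # X: 2-D array of stored inputs, Y: NumPy array of integer labels,
        # w: per-feature weights of the similarity metric (illustrative).
        d = np.sqrt((((X - x) ** 2) * w).sum(axis=1))
        idx = np.argsort(d)[:k]
        # 'Variable' kernel: the Gaussian width is tied to the average
        # distance of the k neighbours, so it adapts to the local density.
        sigma = d[idx].mean() / r + 1e-12
        kern = np.exp(-d[idx] ** 2 / (2.0 * sigma ** 2))
        votes = np.zeros(int(Y.max()) + 1)
        np.add.at(votes, Y[idx], kern)      # kernel-weighted class vote
        return votes / votes.sum()

    def loo_error(X, Y, w, k=10, r=2.0):
        # Leave-one-out cross-validation error for a candidate metric w and
        # kernel-size parameter r; these are the quantities to be tuned.
        mistakes = 0
        for i in range(len(X)):
            keep = np.arange(len(X)) != i
            p = vsm_predict(X[i], X[keep], Y[keep], w, k, r)
            mistakes += int(np.argmax(p) != Y[i])
        return mistakes / len(X)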

A particular advantage of this approach is that it solves for orders of magnitude fewer parameters than back-propagation or radial basis function (RBF) methods, which greatly reduces the problem of overfitting. There is no need for the user to select model-complexity parameters, such as the number of hidden units or basis function centres (in other learning methods, these are usually determined by extensive cross-validation testing over each candidate setting). VSM learning can be run as a black box without setting problem-specific parameters, which is a necessary requirement for biological models and for many on-line learning applications.

In addition to the problem of poor generalization, nearest-neighbour methods have been criticized for slow run-time performance and for increased memory requirements. The well-known k-d tree algorithm (Friedman, Bentley & Finkel, 1977; Sproull, 1991) can be used to identify the nearest neighbours, but its computation time becomes large for random points in high-dimensional spaces (in these cases, it must sometimes measure the distance to a large proportion of the inputs to find even the single nearest neighbour). However, following similarity metric optimization, there is an effective dimensionality reduction because some dimensions are assigned much higher weightings than others, which greatly speeds the k-d tree search. In conjunction with the fact that VSM learning examines only a small number (about 10) of nearest neighbours, the run-time performance on many problems is actually better than that of competing methods such as back-propagation. If the k-d tree algorithm is still not efficient enough in certain cases, an approximate nearest-neighbour search can be used that examines only a limited number of leaves of the k-d tree. The problem of increased memory requirements has been partly addressed by an editing procedure that removes unnecessary training data from regions where there is little uncertainty.
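As a rough illustration of how an optimized metric interacts with k-d tree search, the following sketch uses SciPy's cKDTree as an off-the-shelf stand-in (an assumption for illustration, not the implementation used in this work). A diagonally weighted Euclidean metric is equivalent to the ordinary metric after rescaling each feature by the square root of its weight, and the query's eps argument requests an approximate search that prunes more of the tree, in the spirit of examining only a limited number of leaves:

    import numpy as np
    from scipy.spatial import cKDTree

    def build_weighted_tree(X, w):
        # Rescaling each feature by sqrt(w) turns the weighted metric into
        # an ordinary Euclidean one; dimensions with near-zero weight then
        # barely affect distances, which is the effective dimensionality
        # reduction that speeds up the k-d tree search.
        scale = np.sqrt(w)
        return cKDTree(X * scale), scale

    def query_neighbours(tree, scale, x, k=10, eps=0.0):
        # eps > 0 requests an approximate search that prunes more of the
        # tree, at the cost of occasionally missing a true neighbour.
        d, idx = tree.query(np.asarray(x) * scale, k=k, eps=eps)
        return d, idx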






