
Conclusions and future directions

Nearest-neighbour methods have often shown poorer generalization than other learning methods, and have therefore attracted little interest in the neural network community in spite of a number of attractive properties. This paper shows that, with an appropriate choice of kernel and optimization of the similarity metric, their generalization can be as good as or better than the alternatives. On the data sets tested, VSM learning achieves better generalization than the back-propagation algorithm and most forms of RBF networks, while requiring much less training time and far fewer parameters to be optimized. A particular advantage of the method is its ability to operate as a black box, without requiring the user to assign critical parameter values.
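
As a minimal illustration of the classification step, the following Python sketch applies a Gaussian kernel over a feature-weighted Euclidean distance to the k nearest neighbours. The function names, the fixed kernel width sigma, and the simple kernel-weighted vote are assumptions for illustration, not the paper's exact formulation.

    import numpy as np

    def vsm_predict(x, X_train, y_train, w, k=5, sigma=1.0):
        # Feature-weighted squared distances from the query to all
        # training points (w scales each input dimension).
        d2 = np.sum((w * (X_train - x)) ** 2, axis=1)
        # Restrict attention to the k nearest neighbours.
        idx = np.argsort(d2)[:k]
        # Gaussian kernel weights; sigma sets the kernel width.
        kern = np.exp(-d2[idx] / (2.0 * sigma ** 2))
        # Kernel-weighted vote over the neighbours' class labels.
        classes = np.unique(y_train[idx])
        votes = [kern[y_train[idx] == c].sum() for c in classes]
        return classes[int(np.argmax(votes))]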

One important area for further research is learning weights that vary between regions of the input space. Clearly, there are many problems for which the optimal feature weights differ across regions of the input. On the other hand, a fair quantity of training data is needed to determine the feature weights with statistical reliability, so their optimization must not be too local. One approach would be to partition the input space into regions using a data structure such as the k-d tree, and to perform the optimization separately in each region. The local parameters could also be regularized to remain close to the global values, which would reduce the risk of overfitting; a sketch of this idea follows.
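
One way such a scheme might look in code is sketched below: per-region weights are fit by minimizing a soft leave-one-out nearest-neighbour loss plus a quadratic penalty toward the global weights, with a single k-d-tree-style median split standing in for the full partition. The loss function, the penalty form, and all names here are illustrative assumptions rather than the method proposed in the paper.

    import numpy as np
    from scipy.optimize import minimize

    def loo_soft_loss(w, X, y):
        # Soft leave-one-out nearest-neighbour loss (a stand-in for a
        # cross-validation error): each point should place high kernel
        # weight on neighbours of its own class.
        diff = X[:, None, :] - X[None, :, :]
        d2 = np.sum((w * diff) ** 2, axis=2)
        np.fill_diagonal(d2, np.inf)          # exclude self-matches
        kern = np.exp(-d2)
        same = (y[:, None] == y[None, :])
        p_same = (kern * same).sum(axis=1) / (kern.sum(axis=1) + 1e-12)
        return -np.mean(np.log(p_same + 1e-12))

    def fit_local_weights(X, y, region_idx, w_global, lam=0.1):
        # Optimize weights for one region; the quadratic penalty keeps
        # them close to the global weights when local data are sparse.
        Xr, yr = X[region_idx], y[region_idx]
        obj = lambda w: (loo_soft_loss(w, Xr, yr)
                         + lam * np.sum((w - w_global) ** 2))
        return minimize(obj, w_global, method='Nelder-Mead').x

    def fit_region_weights(X, y, w_global, lam=0.1):
        # One-level k-d-tree-style partition: split at the median of
        # the feature with the largest spread, then fit each side.
        axis = int(np.argmax(X.max(axis=0) - X.min(axis=0)))
        left = X[:, axis] <= np.median(X[:, axis])
        return {'axis': axis,
                'left': fit_local_weights(X, y, np.where(left)[0], w_global, lam),
                'right': fit_local_weights(X, y, np.where(~left)[0], w_global, lam)}

The parameter lam controls the trade-off: larger values pull the local weights toward the global solution, trading local flexibility for stability.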

Another potential improvement would be to incorporate the learning of local linear models such as those explored by Atkeson (1991) and Bottou & Vapnik (1992). These approaches fit a linear model to a set of neighbours around each input at classification time. At the cost of a large increase in run-time computation, the output can be based on a more accurate interpolation between inputs, one that accounts for their particular spatial distribution in the input space. This is likely to be particularly useful for continuous outputs; a sketch is given below.
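
A hedged sketch of this run-time step, in the spirit of locally weighted regression: at query time, a linear model is fit by weighted least squares to the k nearest neighbours under the learned feature weights. The median-distance bandwidth heuristic and all names are assumptions for illustration.

    import numpy as np

    def local_linear_predict(x, X_train, y_train, w, k=15):
        # Feature-weighted squared distances to all training points.
        d2 = np.sum((w * (X_train - x)) ** 2, axis=1)
        idx = np.argsort(d2)[:k]
        # Kernel weights emphasize the closest neighbours; the median
        # distance serves as a simple adaptive bandwidth.
        kern = np.exp(-d2[idx] / (np.median(d2[idx]) + 1e-12))
        # Weighted least squares with an intercept column.
        A = np.hstack([X_train[idx], np.ones((k, 1))])
        sw = np.sqrt(kern)[:, None]
        coef = np.linalg.lstsq(sw * A, sw[:, 0] * y_train[idx], rcond=None)[0]
        # Evaluate the fitted linear model at the query point.
        return float(np.append(x, 1.0) @ coef)

Because the linear fit is redone for every query, prediction cost grows substantially, which matches the run-time trade-off noted above.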






