A Hyperplane-based Algorithm for Semi-supervised Dimension Reduction

Abstract

We consider the semi-supervised dimension reduction problem: given a high-dimensional dataset with a small number of labeled points and a large number of unlabeled points, the goal is to find a low-dimensional embedding that yields good classification results. Most previous algorithms for this task are linkage-based: they enforce must-link and cannot-link constraints during dimension reduction, leading to a nearest-neighbor classifier in the low-dimensional space. In this paper, we propose a new hyperplane-based semi-supervised dimension reduction method—the main objective is to learn low-dimensional features that both approximate the original data and admit a good separating hyperplane. We formulate this as a non-convex optimization problem and propose an efficient algorithm to solve it. The algorithm scales to problems with millions of features and can easily incorporate non-negativity constraints in order to learn interpretable non-negative features. Experiments on real-world datasets demonstrate that our hyperplane-based dimension reduction method outperforms state-of-the-art linkage-based methods when very few labels are available.
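To make the idea concrete, here is a minimal toy sketch of the kind of joint objective the abstract describes: a low-rank reconstruction term coupled with a hinge loss on the few labeled points, optimized by plain gradient descent. This is our own illustrative assumption, not the paper's actual algorithm—all names (`X`, `U`, `V`, `w`, `lam`) and the alternating update scheme are hypothetical.

```python
import numpy as np

# Hedged sketch (NOT the paper's exact method): jointly fit a rank-k
# factorization X ~ U V^T and a separating hyperplane w in the embedded
# space, using a hinge loss on a handful of labeled rows.
rng = np.random.default_rng(0)
n, d, k = 60, 20, 2

# Synthetic data: two Gaussian clusters in d dimensions, labels +1 / -1.
X = rng.normal(size=(n, d))
X[: n // 2] += 2.0
y = np.where(np.arange(n) < n // 2, 1.0, -1.0)
labeled = np.concatenate([np.arange(3), np.arange(n - 3, n)])  # only 6 labels

U = rng.normal(scale=0.1, size=(n, k))   # low-dimensional embedding of the rows
V = rng.normal(scale=0.1, size=(d, k))   # loading / dictionary matrix
w = np.zeros(k)                          # hyperplane normal in the embedding
lam, lr = 5.0, 1e-3                      # hinge weight and step size (tuned by hand)

init_err = np.linalg.norm(U @ V.T - X)
for _ in range(2000):
    R = U @ V.T - X                      # reconstruction residual
    gU = 2 * R @ V                       # gradient of ||X - UV^T||_F^2 w.r.t. U
    gV = 2 * R.T @ U                     # ... and w.r.t. V
    margins = y[labeled] * (U[labeled] @ w)
    idx = labeled[margins < 1.0]         # hinge-active labeled points
    gU[idx] += -lam * y[idx, None] * w[None, :]
    gw = -lam * (y[idx, None] * U[idx]).sum(axis=0)
    U -= lr * gU
    V -= lr * gV
    w -= lr * gw

final_err = np.linalg.norm(U @ V.T - X)
acc = (np.sign(U @ w) == y).mean()       # classify ALL points with the hyperplane
```

On this toy problem the reconstruction error should drop and the hyperplane learned from six labels should classify most of the unlabeled points, since the embedding captures the cluster structure; the real algorithm additionally handles non-negativity constraints and million-feature scaling, which this sketch does not attempt.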

Publication
In IEEE International Conference on Data Mining, 2017

Bibtex

@inproceedings{FangCH17,
  author    = {Huang Fang and
               Minhao Cheng and
               Cho{-}Jui Hsieh},
  title     = {A Hyperplane-Based Algorithm for Semi-Supervised Dimension Reduction},
  booktitle = {{IEEE} International Conference on Data Mining},
  pages     = {101--110},
  year      = 2017
}
Huang Fang
Researcher

My research interests include optimization, learning theory, algorithm design and data mining.
