We consider the semi-supervised dimension reduction problem: given a high-dimensional dataset with a small number of labeled data and a huge number of unlabeled data, the goal is to find a low-dimensional embedding that yields good classification results. Most previous algorithms for this task are linkage-based: they enforce must-link and cannot-link constraints during dimension reduction, leading to a nearest-neighbor classifier in the low-dimensional space. In this paper, we propose a new hyperplane-based semi-supervised dimension reduction method whose main objective is to learn low-dimensional features that both approximate the original data and form a good separating hyperplane. We formulate this as a non-convex optimization problem and propose an efficient algorithm to solve it. The algorithm scales to problems with millions of features and can easily incorporate non-negativity constraints in order to learn interpretable non-negative features. Experiments on real-world datasets demonstrate that our hyperplane-based dimension reduction method outperforms state-of-the-art linkage-based methods when very few labels are available.
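The abstract describes a joint objective: reconstruct the data with a low-rank factorization while fitting a separating hyperplane on the embedded labeled points. The sketch below illustrates that idea with plain gradient descent on a squared-hinge variant; the function name, the specific loss, and all hyperparameters are assumptions for illustration, not the paper's actual algorithm.

```python
import numpy as np

def hyperplane_ssdr(X, y, k=2, lam=1.0, lr=0.01, iters=500, nonneg=False, seed=0):
    """Illustrative hyperplane-based semi-supervised dimension reduction.

    Jointly learns a factorization X ~ W @ H and a hyperplane (w, b) on the
    embedded labeled columns by gradient descent on the assumed objective

        ||X - W H||_F^2 + lam * sum_i max(0, 1 - y_i (w . h_i + b))^2

    where y[i] is +1/-1 for labeled columns and 0 for unlabeled ones.
    """
    rng = np.random.default_rng(seed)
    d, n = X.shape
    W = rng.standard_normal((d, k)) * 0.1
    H = rng.standard_normal((k, n)) * 0.1
    w = np.zeros(k)
    b = 0.0
    labeled = y != 0
    for _ in range(iters):
        R = W @ H - X                       # reconstruction residual
        gW = 2 * R @ H.T                    # gradient of ||X - WH||^2 w.r.t. W
        gH = 2 * W.T @ R                    # ... and w.r.t. H
        margin = y * (w @ H + b)            # signed margins (0 for unlabeled)
        coef = -2 * lam * y * np.maximum(0.0, 1.0 - margin)
        coef[~labeled] = 0.0                # only labeled points incur hinge loss
        gH += np.outer(w, coef)             # squared-hinge gradient w.r.t. H
        gw = H @ coef                       # ... w.r.t. the hyperplane normal
        gb = coef.sum()                     # ... w.r.t. the bias
        W -= lr * gW
        H -= lr * gH
        w -= lr * gw
        b -= lr * gb
        if nonneg:                          # optional non-negativity projection,
            W = np.maximum(W, 0.0)          # as mentioned for interpretable features
            H = np.maximum(H, 0.0)
    return W, H, w, b
```

Alternating or projected-gradient schemes would be natural alternatives; the point of the sketch is only that reconstruction and separation are optimized in a single objective rather than sequentially.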
@inproceedings{FangCH17,
author = {Huang Fang and
Minhao Cheng and
Cho{-}Jui Hsieh},
title = {A Hyperplane-Based Algorithm for Semi-Supervised Dimension Reduction},
booktitle = {{IEEE} International Conference on Data Mining},
pages = {101--110},
year = 2017
}