A Fast Algorithm for Unsupervised Feature Value Selection

Abstract

The difficulty of unsupervised feature selection stems from the fact that many local solutions can coexist in the same dataset. There is no objective measure for judging whether a particular local solution is appropriate, because each local solution may reflect a meaningful but different interpretation of the dataset. At the same time, known accurate feature selection algorithms are slow, which limits the number of local solutions that can be obtained with them. As a result, they have little chance of producing a feature set that explains the phenomenon under study. This paper presents a new method for searching many local solutions with an algorithm that is both fast and accurate. Our unsupervised feature value selection algorithm (UFVS) requires only a few tens of milliseconds for datasets with thousands of features and instances. It also includes a parameter that controls which local solution is selected. This speed changes the scale of the problem, because a user can try many different solutions and pick the best one. In experiments with labeled datasets, UFVS found feature value sets that explain the labels. With different parameter values, it also detected relationships between feature value sets that did not align with the given labels.

Publication
Journal of Source Themes, 1(1)