Learning category distance metric for data clustering

Publication date: 6 September 2018
Source: Neurocomputing, Volume 306
Author(s): Baoguo Chen, Haitao Yin

Unsupervised learning of adaptive distance metrics for categorical data remains a challenge because it is difficult to define an inherently meaningful measure that parameterizes the heterogeneity within matched or mismatched categorical symbols. In this paper, a new distance metric called category distance and a non-center-based algorithm are proposed for categorical data clustering. The new metric is formulated in terms of category weights for each categorical attribute, and it no longer depends on the common assumption that all categories on the same attribute are independent of each other. The problem of learning the category distance is thereby transformed into the problem of learning a set of category weights, which can be optimized jointly with the clusters. A case study on DNA sequences and experimental results on ten real-world data sets from different domains demonstrate the performance of the proposed methods in comparison with existing distance measures for categorical data.
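
To make the contrast with the independence assumption concrete, the following is a minimal Python sketch. It is not the paper's formulation: the abstract does not state the category distance formula, so the additive weighting used here (the sum of the weights of the two mismatched categories) and the names simple_matching_distance, weighted_category_distance, and the weight tables are hypothetical stand-ins chosen only to illustrate how per-category weights can make some mismatches count more than others.

```python
from typing import Dict, List, Sequence

# One weight table per attribute, mapping each category to a non-negative weight.
# ASSUMPTION: this representation is illustrative, not taken from the paper.
Weights = List[Dict[str, float]]


def simple_matching_distance(x: Sequence[str], y: Sequence[str]) -> float:
    """Baseline: every mismatch counts as 1, whatever the categories are."""
    return float(sum(a != b for a, b in zip(x, y)))


def weighted_category_distance(
    x: Sequence[str], y: Sequence[str], weights: Weights
) -> float:
    """Hypothetical weighted variant: a mismatch on attribute d between
    categories a and b contributes weights[d][a] + weights[d][b]."""
    total = 0.0
    for d, (a, b) in enumerate(zip(x, y)):
        if a != b:
            total += weights[d][a] + weights[d][b]
    return total


# Toy DNA-like example with four positions (attributes).
x, y = "ACGT", "AGGA"

# Uniform weights of 0.5 reproduce the simple matching distance.
uniform: Weights = [{c: 0.5 for c in "ACGT"} for _ in range(4)]

# Non-uniform weights: here a C/G mismatch costs more than an A/T mismatch.
skewed: Weights = [{"A": 0.25, "T": 0.25, "C": 0.75, "G": 0.75} for _ in range(4)]

baseline = simple_matching_distance(x, y)
uniform_dist = weighted_category_distance(x, y, uniform)
at_mismatch = weighted_category_distance("A", "T", skewed[:1])
cg_mismatch = weighted_category_distance("C", "G", skewed[:1])
```

In this sketch the weights are fixed by hand. In the setting the abstract describes, the category weights would instead be learned jointly with the cluster assignment, a step this example does not attempt.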