A New Stratified Immune-Based Approach for Clustering High-Dimensional Categorical Data


With advances in database technology, many real-world applications now hold very large volumes of categorical data, which play an important role in data analysis and effective decision making. However, most clustering algorithms are designed for numerical data only, because of the similarity measures they rely on. A considerable body of work exists on clustering categorical data with predefined similarity measures defined explicitly over categorical data. An intricate problem in real-world domains, however, is that the features in the data may depend on hidden context that is not explicitly given in the form of predictive features. This makes it challenging to handle categorical data efficiently and effectively. In this paper, a stratified immune-based approach for clustering categorical data, CAIS, is proposed, with a new similarity measure to minimize the distance function. CAIS adopts an immunology-based approach for effective discovery of clusters over categorical data. It selects frequently occurring nominal features as representative objects and groups the data into clusters using a new affinity measure. CAIS scales to a large number of attributes while minimizing the misclustering rate on the datasets. Extensive empirical analysis shows that the proposed approach attains better mining efficiency on various categorical datasets and outperforms Expectation Maximization (EM) in different settings.
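The abstract does not give the CAIS affinity measure or its immune-based selection procedure, so the following is not CAIS itself. It is only a minimal sketch of the general idea of grouping categorical records around representatives built from frequently occurring attribute values. The simple-matching `affinity` function, the mode-based `mode_representative` helper, the `cluster` loop, and the toy records are all assumptions made for illustration, in the spirit of a k-modes style procedure.

```python
from collections import Counter


def affinity(record, representative):
    """Assumed affinity: fraction of attributes on which two records agree."""
    matches = sum(a == b for a, b in zip(record, representative))
    return matches / len(record)


def mode_representative(records):
    """Build a representative from the most frequent value of each attribute."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*records))


def cluster(records, seeds, max_iter=10):
    """Assign each record to its highest-affinity representative, then update."""
    representatives = list(seeds)
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            max(range(len(representatives)),
                key=lambda k: affinity(r, representatives[k]))
            for r in records
        ]
        if new_assignment == assignment:
            break  # assignments are stable, so stop iterating
        assignment = new_assignment
        for k in range(len(representatives)):
            members = [r for r, a in zip(records, assignment) if a == k]
            if members:  # keep the old representative if a cluster is empty
                representatives[k] = mode_representative(members)
    return assignment, representatives


# Hypothetical toy data: (colour, size, shape)
records = [
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("red", "large", "round"),
    ("blue", "large", "square"),
    ("blue", "large", "oval"),
    ("blue", "small", "square"),
]
seeds = [records[0], records[3]]  # assumed initial representatives

labels, reps = cluster(records, seeds)
print(labels)
print(reps)
```

Simple matching treats every attribute equally, so it is only a baseline. A measure such as the one the paper proposes would replace `affinity` without changing the surrounding loop.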
