Title: Clustering Categorical Data using Bayesian Concept
Abstract: Clustering is the process of grouping similar objects. The Naïve Bayes classifier is a classification technique that is widely used to predict unknown class labels. In this paper we extend this concept to unsupervised classification, that is, clustering. As in K-modes, the proposed method starts the clustering process with the modes. Based on the prior information, Bayes' theorem is used to place each object in its respective cluster. The proposed algorithm is scalable and needs only one scan of the data. The proposed Bayesian clustering method for categorical data is tested on real data sets obtained from the UCI machine learning repository and compared with the well-known K-modes algorithm for clustering categorical data. The experimental results show that the proposed method is more efficient than K-modes.
Keywords: clustering, categorical data, Bayesian theorem, mode.
Clustering is understood as a decomposition or partition of a data set into groups such that the objects in one group are similar to each other but as different as possible from the objects in other groups. Thus, the main goal of clustering is to detect whether the general population is heterogeneous, that is, whether the data fall into distinct groups. Historically, clustering is rooted in mathematics, statistics, and numerical analysis. From a machine learning perspective, clusters correspond to hidden patterns. Objects are grouped using some matching criterion. For numeric data, this criterion may be a simple Euclidean measure, which relies on the geometric properties of the objects. These geometric properties cannot be applied to categorical or nominal data, which usually have small domains whose values cannot be ordered. Huang proposed the simple mismatching measure: if the values of an attribute differ between two objects, the distance on that attribute is one; otherwise it is zero. The concept of similarity alone is not sufficient for categorical data. Because of these special properties, clustering categorical data is more complicated than clustering numerical data.
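The simple mismatching measure described above can be illustrated with a minimal sketch. It assumes that the per-attribute distances (one for a mismatch, zero for a match) are summed over all attributes to give the distance between two objects. The function name `simple_matching_distance` and the sample objects are hypothetical and chosen only for illustration.

```python
from typing import Hashable, Sequence


def simple_matching_distance(x: Sequence[Hashable], y: Sequence[Hashable]) -> int:
    """Count the attributes on which two categorical objects disagree."""
    if len(x) != len(y):
        raise ValueError("objects must have the same number of attributes")
    # Each attribute contributes 1 when the values differ and 0 when they match.
    return sum(1 for a, b in zip(x, y) if a != b)


# Hypothetical objects described by three nominal attributes.
a = ("red", "small", "round")
b = ("red", "large", "round")
c = ("blue", "large", "square")

d_ab = simple_matching_distance(a, b)  # differs on one attribute
d_ac = simple_matching_distance(a, c)  # differs on all three attributes
```

Because the measure only tests equality, it needs no ordering or geometry on the attribute values, which is why it suits nominal data where a Euclidean measure does not apply.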
Keywords: Bayesian Concept, Clustering, Computer Science, CSE Thesis, Final Year Thesis Projects, M.Sc Computer Science Thesis, M.Sc Thesis