Stacked Feature Selection and C5.0 Classification Model with Tsallis Entropy for Medical Dataset
Keywords: Feature Selection, Entropy, C5.0, Classification.
Feature selection is one of the important tasks in data mining. It identifies the subset of features in a dataset that are most strongly related to the response variable. Here, five medical datasets are considered: Pima diabetes, Liver, Hepatitis, Chronic kidney disease, and Breast cancer. For each dataset, the optimal subset of features is obtained as the intersection of the top n features returned by feature selection algorithms such as CMIM, JMI, mRMR, CFS, Boruta, and SVM-RFE. The C5.0 algorithm with Tsallis entropy and an Association function is then tested on the selected features. The proposed method achieves an accuracy of 61% on the Pima diabetes dataset, 85% on the Liver dataset, 95% on the Hepatitis dataset, 99.5% on the Chronic kidney disease dataset, and 97% on the Breast cancer dataset. The performance measures show that the proposed method performs better than SVM, Naïve Bayes, KNN, and Random Forest.
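The two ideas named above, intersecting the top n features across several rankings and scoring class impurity with Tsallis entropy, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the example feature names, and the choice q = 2 are assumptions made for illustration, and the Association function is not modelled. The entropy uses the standard definition S_q = (1 - sum_i p_i^q) / (q - 1), which reduces to Shannon entropy as q approaches 1.

```python
from collections import Counter
import math


def top_n_intersection(rankings, n):
    """Return the features found in the top n of every ranking.

    `rankings` is a list of feature-name lists, each ordered from most to
    least important (one list per feature selection algorithm). The result
    keeps the order of the first ranking.
    """
    if not rankings:
        return []
    common = set(rankings[0][:n])
    for ranked in rankings[1:]:
        common &= set(ranked[:n])
    return [feature for feature in rankings[0][:n] if feature in common]


def tsallis_entropy(labels, q=2.0):
    """Tsallis entropy of a sequence of class labels.

    S_q = (1 - sum_i p_i**q) / (q - 1); for q close to 1 the Shannon
    entropy (natural logarithm) is returned instead.
    """
    if not labels:
        return 0.0
    total = len(labels)
    probs = [count / total for count in Counter(labels).values()]
    if math.isclose(q, 1.0):
        return -sum(p * math.log(p) for p in probs)
    return (1.0 - sum(p ** q for p in probs)) / (q - 1.0)


# Hypothetical rankings from three selectors (feature names are made up).
rankings = [
    ["glucose", "bmi", "age", "insulin"],
    ["bmi", "glucose", "insulin", "age"],
    ["glucose", "age", "bmi", "insulin"],
]
selected = top_n_intersection(rankings, n=3)
balanced = tsallis_entropy([0, 0, 1, 1], q=2.0)
pure = tsallis_entropy([1, 1, 1], q=2.0)
```

In a decision tree, an entropy of this kind would replace Shannon entropy when computing the gain of a candidate split; a pure node scores 0, and q controls how strongly rare classes are weighted.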