Investigation of Information gain and Chi test feature selection methods in dimensionality reduction using Machine learning for drug discovery
DOI:
https://doi.org/10.47750/pnr.2022.13.S01.58
Keywords:
Information gain, Chi test, dimensionality reduction, features, irrelevant data, drug discovery, pre-processing, SVM classifier, classification, chemical compounds.
Abstract
Despite recent advances in the pharmaceutical industry, especially in cancer research, there is still much room for improvement. The process researchers use to find new drugs has not fundamentally changed, and taking a drug from discovery to market remains slow and expensive. Research conducted by the Tufts Center indicates that developing a new drug takes at least 13 years and costs about $2.6 billion. Machine learning can play a major role in shortening this timeline. Machine learning pipelines typically use feature selection as a preprocessing step, because removing redundant and irrelevant features frequently improves the performance of learning algorithms. This paper compares Information gain and the Chi test as feature selection methods for drug discovery, concentrating on dimensionality reduction and using an SVM classifier to categorize chemical compounds.
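As a rough illustration of the two feature selection criteria named in the abstract, the following minimal sketch scores discrete descriptor columns by information gain and by the Pearson chi-square statistic, then keeps the top-ranked columns. It is not the paper's implementation: the toy data, the function names, and the choice of binary descriptors are assumptions made only for this example. The SVM step is omitted; in practice the selected columns would be passed on to a classifier.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """H(labels) - H(labels | feature) for one discrete feature column."""
    n = len(labels)
    conditional = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


def chi_square(feature, labels):
    """Pearson chi-square statistic of the feature/label contingency table."""
    n = len(labels)
    observed = Counter(zip(feature, labels))
    feature_totals = Counter(feature)
    label_totals = Counter(labels)
    stat = 0.0
    for x, fx in feature_totals.items():
        for y, fy in label_totals.items():
            expected = fx * fy / n
            stat += (observed[(x, y)] - expected) ** 2 / expected
    return stat


def top_k(columns, labels, score, k):
    """Indices of the k feature columns with the highest score."""
    ranked = sorted(
        range(len(columns)),
        key=lambda j: score(columns[j], labels),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical toy data: 3 binary descriptor columns for 8 compounds,
# with labels 1 = active, 0 = inactive (assumed for illustration only).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = [
    [1, 1, 1, 1, 0, 0, 0, 0],  # perfectly predicts the label
    [1, 0, 1, 0, 1, 0, 1, 0],  # independent of the label
    [1, 1, 1, 0, 0, 0, 0, 1],  # partially informative
]

ig_selected = top_k(columns, labels, information_gain, 2)
chi_selected = top_k(columns, labels, chi_square, 2)
```

On this toy data both criteria rank the columns in the same order and discard the label-independent column; on real descriptor sets the two rankings can differ, which is what makes a comparison of the two methods meaningful.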