LIVER CANCER PREDICTION FOR TYPE-II DIABETES USING CLASSIFICATION ALGORITHM

S. Agilan, J. Kumaran Kumar

Abstract


In recent years, liver cancer in patients with type II diabetes has become a serious threat to physical and mental health, and medical researchers and practitioners need efficient predictive modelling. This work develops a data mining model to predict liver cancer in type II diabetes patients within 6 years of diagnosis. Data were collected from the National Health Insurance Research Database (NHIRD), selecting patients who were newly diagnosed with type II diabetes. A total of 2060 cases were identified and assigned to either a case group (patients diagnosed with liver cancer) or a control group (patients diagnosed without liver cancer). This paper proposes a random forest model for predicting liver cancer in type II diabetes patients that analyses readily available indicators (age, liver diseases, alcoholic fatty liver disease, hyperlipidaemia, etc.). Risk factors were identified from these indicators, and a chi-square test was then conducted on each independent variable to differentiate between patients with liver cancer and patients without liver cancer. The dataset was randomly divided into a training group and a test group. The training group contains 70% of the dataset (1442 cases) and was used to build the prediction model; the remaining 30% was assigned to the test group for model validation. The random forest algorithm trains multiple decision trees on the samples and integrates the weight of each tree to obtain the final result. The validation results show that the random forest algorithm greatly reduces the modelling error of a single decision tree and can effectively predict the impact of these readily available indicators on the risk of liver cancer in diabetes patients. Additionally, the random forest model achieves better prediction accuracy than the Artificial Neural Network (ANN), AdaBoost, and Logistic Regression algorithms.
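The steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the fixed random seed, the use of the uncorrected Pearson statistic on a 2x2 table, and the unweighted majority vote (a simplification of the per-tree weighting the abstract mentions) are all assumptions made here for illustration. Only the figures 2060 cases and a 70% training share come from the abstract.

```python
import random


def split_cases(case_ids, train_fraction=0.7, seed=0):
    """Randomly split case identifiers into training and test groups.

    The seed is a hypothetical choice so the split is reproducible.
    """
    shuffled = list(case_ids)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction), 2x2 table.

    a, b: case group with / without the risk factor
    c, d: control group with / without the risk factor
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("every row and column needs a nonzero total")
    return n * (a * d - b * c) ** 2 / denom


def majority_vote(tree_predictions):
    """Combine per-tree 0/1 predictions into one forest prediction.

    Assumption: every tree carries equal weight.
    """
    return int(sum(tree_predictions) * 2 > len(tree_predictions))


# 2060 cases and the 70% training share are taken from the abstract.
train_ids, test_ids = split_cases(range(2060))
```

With these inputs the training group holds 1442 cases, matching the abstract, and the remaining 618 cases form the test group. A larger chi-square statistic for an indicator suggests a stronger association between that indicator and group membership.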

Keywords


Artificial Neural Network (ANN), AdaBoost, Logistic Regression, Random forest algorithm.

DOI: https://doi.org/10.26483/ijarcs.v9i2.5856


Copyright (c) 2018 International Journal of Advanced Research in Computer Science