Decision Trees For Training Data Sets Containing Numerical Attributes With Measurement Errors

C. Sudarsana Reddy
Dr. V. Vasu, B. Kumara Swamy Achari

Abstract

Classification is one of the most important techniques in data analysis, and the decision tree is one of the most commonly used classification techniques. Training data sets are rarely error free, because measurement errors are introduced during the data collection process. In particular, the values of numerical attributes in training data sets are inherently subject to such errors.
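For context, the core step of a conventional tree builder on a numerical attribute is a search for the best split threshold. The sketch below uses information gain in the style of ID3/C4.5. The abstract does not say which existing algorithm the authors build on, so the criterion and the function names `entropy` and `best_threshold` are illustrative assumptions.

```python
import math
from collections import Counter

def entropy(counts, n):
    """Shannon entropy (in bits) of a class-count table covering n tuples."""
    return -sum((c / n) * math.log2(c / n) for c in counts.values() if c > 0)

def best_threshold(values, labels):
    """Find the binary split `value <= threshold` on one numerical attribute
    that maximises information gain. Returns (threshold, gain); threshold is
    None when no split improves on the unsplit node."""
    pairs = sorted(zip(values, labels))           # sort once: O(n log n)
    n = len(pairs)
    total = Counter(label for _, label in pairs)
    base = entropy(total, n)
    left = Counter()
    best_t, best_gain = None, 0.0
    for i in range(1, n):
        left[pairs[i - 1][1]] += 1                # move one tuple to the left side
        if pairs[i - 1][0] == pairs[i][0]:
            continue                              # equal values cannot be separated
        right = total - left
        remainder = (i / n) * entropy(left, i) + ((n - i) / n) * entropy(right, n - i)
        gain = base - remainder
        if gain > best_gain:
            # Midpoint between adjacent distinct values, as C4.5-style builders do.
            best_t, best_gain = (pairs[i - 1][0] + pairs[i][0]) / 2, gain
    return best_t, best_gain
```

For example, `best_threshold([1, 2, 3, 10, 11, 12], ["a", "a", "a", "b", "b", "b"])` returns `(6.5, 1.0)`: the split cleanly separates the two classes. Class counts are updated incrementally so that, after the initial sort, the scan is linear in the number of tuples.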
Measurement errors in training data sets can be handled by assuming an appropriate error model, such as a Gaussian error distribution, and correcting the data by fitting that model to the training data set. Existing decision tree classifier construction algorithms do not take the different types of errors in the training data sets into account. As a result, the classification results of existing decision tree classifiers are, in many cases, less accurate than they could be because of the errors present in the training data sets.
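A minimal sketch of why such errors matter, assuming an additive Gaussian error model x_obs = x + e with e ~ N(0, σ²) and σ known (the abstract does not say how σ is obtained). Under this assumption, a tuple whose true value x lies near a split threshold t is sent to the branch `x_obs <= t` with probability Φ((t - x)/σ), so it can land on the wrong side of the split. The function names are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_routed_left(true_value, threshold, sigma):
    """P(true_value + e <= threshold) for e ~ N(0, sigma^2): the chance that the
    observed (noisy) value is sent to the left branch of `value <= threshold`."""
    if sigma <= 0:
        # No measurement error: routing is deterministic.
        return 1.0 if true_value <= threshold else 0.0
    return normal_cdf((threshold - true_value) / sigma)
```

For instance, `prob_routed_left(5.0, 5.2, 0.5)` is about 0.655, so a tuple whose true value lies just left of the threshold is routed to the right branch roughly a third of the time.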
This paper proposes applying an existing decision tree classifier construction algorithm to the error-corrected numerical attributes of the training data sets in order to construct a new, more effective decision tree classifier. Errors in the numerical attributes of the training data sets are corrected using a truncated Gaussian distribution. The resulting algorithm is called the error corrected decision tree classifier construction algorithm. It proves to be more effective in terms of classification accuracy than the existing decision tree classifier construction algorithm.
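The abstract does not spell out the truncated Gaussian correction rule. One plausible reading, sketched here purely as an assumption: each numerical attribute has a known valid range [lo, hi], the observed value is x_obs = x + e with e ~ N(0, σ²), and, with a uniform prior over the range, the posterior of the true value is N(x_obs, σ²) truncated to [lo, hi]. The corrected value is then the mean of that truncated distribution, μ + σ(φ(α) - φ(β))/(Φ(β) - Φ(α)) with μ = x_obs, α = (lo - μ)/σ and β = (hi - μ)/σ. The parameters `lo`, `hi`, `sigma` and all function names are hypothetical.

```python
import math

def _phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def _Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def truncated_gaussian_mean(observed, sigma, lo, hi):
    """Mean of N(observed, sigma^2) truncated to [lo, hi] (hypothetical
    correction rule; lo/hi may be -math.inf/math.inf)."""
    if sigma <= 0:
        return min(max(observed, lo), hi)    # no error model: just clip
    alpha = (lo - observed) / sigma
    beta = (hi - observed) / sigma
    mass = _Phi(beta) - _Phi(alpha)          # probability mass inside [lo, hi]
    if mass < 1e-12:
        # Observation far outside the valid range: the formula is numerically
        # unstable, so fall back to clipping.
        return min(max(observed, lo), hi)
    return observed + sigma * (_phi(alpha) - _phi(beta)) / mass

def correct_attribute(values, sigma, lo, hi):
    """Replace every observed value of one numerical attribute by its corrected value."""
    return [truncated_gaussian_mean(v, sigma, lo, hi) for v in values]
```

For example, `truncated_gaussian_mean(0.0, 1.0, 0.0, math.inf)` is √(2/π) ≈ 0.798: a reading at the lower bound of a non-negative attribute is pulled inward. Because the corrected values simply replace the originals column by column, an existing tree construction routine can then be run on them unchanged.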
The computational complexity of the error corrected decision tree classifier construction algorithm is approximately the same as that of the existing decision tree classifier construction algorithm, while its classification accuracy is considerably higher.
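The complexity claim can be made concrete under two assumptions that the abstract does not state: the correction is a closed-form, constant-time computation per value (as with a truncated-Gaussian mean), and the base builder sorts each of the m numerical attributes of the n training tuples at least once with a comparison sort, as C4.5-style implementations do at the root. The extra correction pass is then dominated by the tree construction cost T_DT:

```latex
T_{\mathrm{EC}}(n,m) = \underbrace{O(nm)}_{\text{correction pass}} + T_{\mathrm{DT}}(n,m),
\qquad T_{\mathrm{DT}}(n,m) = \Omega(mn\log n)
\;\Longrightarrow\; T_{\mathrm{EC}}(n,m) = \Theta\bigl(T_{\mathrm{DT}}(n,m)\bigr).
```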

Keywords: decision tree, error corrected values of the numerical attributes of the training data sets, training data sets containing numerical attributes, measurement errors in training data sets, types of errors in the training data sets, training data sets, classification, data mining, machine learning
