Logistic regression and its implementation for email spam filtering


K. Srikanth
S. Ramakrishna, K.V.S. Sarma


This paper reports an experiment on spam filtering with logistic regression, in which the efficiency of the filter is influenced by characteristics of the frequency distribution of the tokens. The discussion focuses on the need for data cleaning before the model is developed: features that are inconsistent should be separated out before the model is built. The model uses the UCI dataset, which gives the percentage of token counts in each mail, and the discriminating ability of the filter is studied with the help of the ROC curve.

Keywords: spam, ROC curve, Logistic, UCI data.
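
The workflow summarized in the abstract (fit a logistic regression on token-percentage features, then judge its discriminating ability from the ROC curve) can be illustrated with a minimal sketch. This is not the authors' implementation and does not use the UCI data: the synthetic features, the log transform, the train/test split, and every function name below (make_synthetic_mail, fit_logistic, predict_proba, roc_auc) are assumptions made purely for illustration.

```python
import numpy as np


def make_synthetic_mail(n=600, seed=0):
    """Hypothetical stand-in for token-percentage features (NOT the UCI data)."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n)                 # 1 = spam, 0 = legitimate
    # Two right-skewed features whose scale shifts with the class,
    # plus one noise feature that carries no class information.
    f1 = rng.exponential(scale=0.5 + 1.5 * y)
    f2 = rng.exponential(scale=1.0 + 0.8 * y)
    f3 = rng.exponential(scale=1.0, size=n)
    return np.column_stack([f1, f2, f3]), y


def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Fit logistic regression by plain batch gradient descent."""
    Xb = np.column_stack([np.ones(len(X)), X])     # prepend intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Xb @ w)))        # predicted P(spam)
        w -= lr * Xb.T @ (p - y) / len(y)          # gradient of mean log-loss
    return w


def predict_proba(w, X):
    """Return the predicted probability of the positive (spam) class."""
    Xb = np.column_stack([np.ones(len(X)), X])
    return 1.0 / (1.0 + np.exp(-(Xb @ w)))


def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())


X, y = make_synthetic_mail()
X = np.log1p(X)                                    # assumed transform to tame skew
X_train, y_train = X[:400], y[:400]
X_test, y_test = X[400:], y[400:]

w = fit_logistic(X_train, y_train)
auc = roc_auc(y_test, predict_proba(w, X_test))
print(f"Held-out AUC on synthetic data: {auc:.3f}")
```

An AUC near 0.5 would indicate a filter that discriminates no better than chance, while values approaching 1 indicate strong separation between spam and legitimate mail. The printed figure here describes only the synthetic data and says nothing about the results reported in the paper.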

