Near-Duplicate Matching Scheme for E-mail Spam Detection Using Spam Trees

Ch.Vijaya Kumar
G. Santi

Abstract

Spam is one of the major problems facing e-mail users on the Internet. In recent years, many schemes have been developed to detect spam e-mails. The basic idea is a similarity matching scheme: a database of known spam, built from user feedback, is maintained and used to block subsequent near-duplicate spam. We propose a novel e-mail abstraction scheme that represents each e-mail by its layout structure, derived from the HTML content of the message, which effectively captures the near-duplicate phenomenon of spam e-mails. To detect duplicate and near-duplicate spam faster, we propose a new approach based on SimHash.
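As an illustration of the idea described in the abstract, the following is a minimal Python sketch, not the authors' implementation. It assumes that the layout abstraction of an e-mail is simply the sequence of its HTML opening tags, that overlapping runs of three tags serve as features, and that a 64-bit SimHash fingerprint built from MD5 digests is compared by Hamming distance. The names `tag_sequence`, `shingles`, `simhash`, and `hamming`, as well as the shingle length and fingerprint width, are hypothetical choices made for this example; the spam tree structure itself is not shown.

```python
import hashlib
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collects the names of opening tags in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def tag_sequence(html):
    """Assumed layout abstraction: the ordered list of opening tag names."""
    parser = TagCollector()
    parser.feed(html)
    parser.close()
    return parser.tags


def shingles(tags, k=3):
    """Overlapping runs of k tags, used as features (k is an assumption)."""
    if len(tags) < k:
        return [" ".join(tags)] if tags else []
    return [" ".join(tags[i:i + k]) for i in range(len(tags) - k + 1)]


def simhash(features, bits=64):
    """SimHash: each feature votes +1 or -1 on every bit position."""
    counts = [0] * bits
    for feature in features:
        digest = hashlib.md5(feature.encode("utf-8")).digest()
        h = int.from_bytes(digest[:bits // 8], "big")
        for i in range(bits):
            counts[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, count in enumerate(counts):
        if count > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def layout_fingerprint(html):
    """Fingerprint of an e-mail body based only on its tag layout."""
    return simhash(shingles(tag_sequence(html)))


# Two messages with the same layout but different wording.
mail_a = "<html><body><p>Buy <b>now</b></p><a href='x'>link</a></body></html>"
mail_b = "<html><body><p>Act <b>fast</b></p><a href='y'>here</a></body></html>"
distance = hamming(layout_fingerprint(mail_a), layout_fingerprint(mail_b))
```

Because the two sample messages share an identical tag sequence, their fingerprints coincide and `distance` is 0. In a filter built along these lines, an incoming message would be treated as a near-duplicate of known spam when its distance to a stored fingerprint falls below a chosen threshold; that threshold is a tuning parameter the abstract does not specify.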


Keywords: Spam mails, Emails, Near Duplicate, SimHash, Spam Trees
