Scrutinizing Near Duplicate Document Detection Techniques

Farheen Naaz, Dr. Farheen Siddique


Identifying duplicate files within a large collection of files is not an easy task, yet investigators and examiners deal with it often. In the past, they had to devote much of their effort at the start of an investigation to identifying duplicate files. Tools now exist on the market to overcome this problem, and investigators use duplicate file detection tools to classify the files of concern. The biggest advantage of these tools is speed: they quickly and easily identify documents that are similar to other documents. The forensic tools in use today for catching similar or duplicate files operate on the low-level bits of the file. Such detection is now in demand on the web because of the range of services it supports, such as detecting near duplicates. As the cost of Internet access falls day by day, many people and organizations are uploading large, information-rich files and documents to the cloud. Identifying duplicate files has therefore recently emerged as a major issue in information retrieval: because of the high dimensionality of the data, the process is costly and time-consuming.


Information retrieval, Near-duplicate, Similarity Matrix


Copyright (c) 2017 International Journal of Advanced Research in Computer Science