Crawling the website deeply: Deep Page crawling

Pooja Tevatia
Vinit Kumar Gunjan, Dr Allam Appa Rao

Abstract

This paper presents a study of how deep web crawling can be much more efficient than conventional crawling. The World Wide Web can be divided into two parts: the Surface Web and the Deep Web. The Surface Web is the part of the Web that general-purpose search engines can crawl and index, while the Deep Web is the abundant information that is “hidden” behind query interfaces and not directly accessible to search engines. Hence, there is a need to access the Hidden Web. Conventional crawling retrieves only Surface Web pages and ignores the large amount of high-quality information hidden behind search forms that a user must fill in manually. The retrieved Hidden Web documents are then stored in a repository.

Keywords: Deep web crawlers, Search engine crawler, Crawler designing, Google crawler, User centric crawler
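
The idea of reaching content behind a search form can be illustrated with a minimal Python sketch. It is not the authors' crawler: the names `FormFinder` and `build_query_urls`, the sample HTML, and the keyword list are all hypothetical, and the sketch assumes a form that submits with GET and has at least one named text field. It only builds the query URLs a deep web crawler might then fetch and store; it performs no network access.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin


class FormFinder(HTMLParser):
    """Collect each <form> with its action, method, and text input names."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self._current = None  # form being parsed, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {
                "action": attrs.get("action") or "",
                "method": (attrs.get("method") or "get").lower(),
                "fields": [],
            }
        elif tag == "input" and self._current is not None:
            input_type = (attrs.get("type") or "text").lower()
            # Only free-text fields are candidates for keyword queries.
            if input_type in ("text", "search") and attrs.get("name"):
                self._current["fields"].append(attrs["name"])

    def handle_endtag(self, tag):
        if tag == "form" and self._current is not None:
            self.forms.append(self._current)
            self._current = None


def build_query_urls(page_url, html, keywords):
    """Return one query URL per keyword for every GET form with a text field."""
    finder = FormFinder()
    finder.feed(html)
    urls = []
    for form in finder.forms:
        if form["method"] != "get" or not form["fields"]:
            continue  # POST forms and forms without text fields are skipped here
        target = urljoin(page_url, form["action"])
        for keyword in keywords:
            # Simplification: fill only the first text field of the form.
            query = urlencode({form["fields"][0]: keyword})
            urls.append(f"{target}?{query}")
    return urls


if __name__ == "__main__":
    # Hypothetical search page; a real crawler would download this HTML.
    sample_html = (
        '<form action="/search" method="get">'
        '<input type="text" name="q">'
        '<input type="submit" value="Go">'
        "</form>"
    )
    for url in build_query_urls(
        "https://example.org/catalog/", sample_html, ["deep web", "crawler"]
    ):
        print(url)
```

A surface crawler that only follows hyperlinks would never produce these URLs, which is why the pages behind them stay hidden. Choosing good keywords and handling POST forms or multi-field forms are the harder parts and are left out of this sketch.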
