Indexing Size Approximation of WWW Repository with Leading Information Retrieval and Web Filtering Robots
Abstract
Estimating the indexed size of the World Wide Web, the biggest information system, is a difficult problem. The Web is a valuable and growing scientific resource that, like a digital library, lets its users explore electronic literature. Estimating the indexed size of the WWW has been an open problem since 1998. Yahoo has claimed an index of 19 billion web documents, a figure Google disputes, because according to the most recent published study, by Gulli and Signorini, the total "indexed web size" was around 11.5 billion pages. The Web is growing rapidly: what is its current size? Which search engine indexes the most authentic information (PDF files)? Which search engine indexes the most Web pages of all types? This article answers these questions. We estimated the index size of the leading search engines (Google, Yahoo and MSN) with a simple, cost-effective approach rather than more complex heuristics. Our technique relies on querying the search engines with selected common affixes that can appear in every document or web page. This paper estimates the total size of the current "indexed web contents" and provides a comparative analysis to help scholars determine which search engine has more authentic information and a larger index.
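The abstract does not state how the hit counts from the affix queries are combined into a single index-size figure. The sketch below shows one plausible aggregation, assuming that each reported hit count is a lower bound on an engine's index size (itself an assumption, since engines' reported counts are approximations) and that the largest count is therefore the tightest bound. The function names, engine labels, affix labels and numbers are all hypothetical and are not the authors' method or data.

```python
from typing import Dict, List, Tuple


def estimate_index_size(hit_counts: Dict[str, int]) -> int:
    """Lower-bound estimate of one engine's index size (hypothetical helper).

    Assumes every affix query can only match documents the engine has
    indexed, so each reported hit count is a lower bound on the index size.
    Taking the maximum (rather than the sum) avoids double-counting
    documents that match several affixes.
    """
    if not hit_counts:
        raise ValueError("need at least one affix query result")
    return max(hit_counts.values())


def rank_engines(results: Dict[str, Dict[str, int]]) -> List[Tuple[str, int]]:
    """Return (engine, estimate) pairs sorted from largest to smallest estimate."""
    estimates = {engine: estimate_index_size(hits) for engine, hits in results.items()}
    return sorted(estimates.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy hit counts for hypothetical affix queries; these are NOT measurements.
    toy_results = {
        "EngineA": {"affix_1": 120, "affix_2": 150, "affix_3": 90},
        "EngineB": {"affix_1": 200, "affix_2": 180, "affix_3": 160},
    }
    for engine, estimate in rank_engines(toy_results):
        print(engine, estimate)
```

Summing the counts would overstate the index size because the same page usually matches many common affixes; the maximum trades some underestimation for never counting a page twice.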
Keywords: Index Size of Search Engines, Total Web Size, Comparison of Google, Yahoo and MSN, Web Crawlers, Web Robots
COPYRIGHT
Submission of a manuscript implies: that the work described has not been published before, that it is not under consideration for publication elsewhere; that if and when the manuscript is accepted for publication, the authors agree to automatic transfer of the copyright to the publisher.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.
- The journal allows the author(s) to retain publishing rights without restrictions.
- The journal allows the author(s) to hold the copyright without restrictions.