CRAWLING OF JAPANESE REAL-ESTATE WEBSITES USING SCRAPY

Bassam Farooq
Dr. Mohd. Shahid Husain
Mohammad Suaib

Abstract

A web crawler is a program that downloads data from websites. This paper implements a web crawler using Scrapy, a Python web-crawling framework. The implemented crawler focuses mainly on the major real-estate websites of Japan. Scrapy was chosen for three reasons: the crawling speed the framework provides, the data filters that can be applied to the crawled data, and the wide library support of the Python programming language.
