Bassam Farooq, Dr. Mohd. Shahid Husain, Mohammad Suaib


A web crawler is a program that downloads data from websites. This paper implements a crawler using Scrapy, a Python web crawling framework. The crawler focuses mainly on major real-estate websites of Japan. Scrapy was chosen for its crawling speed, the data filters that can be applied, and the wide library support of the Python programming language.




Japan; Framework; Python; Real-estate; Scrapy; Web Crawler








Copyright (c) 2018 International Journal of Advanced Research in Computer Science