Big Data: A Survey Paper on Big Data Innovation and its Technology

Tasleem Nizam, Syed Imtiyaz Hassan


Big Data refers to datasets so large and complex that they are difficult to process with traditional data processing applications. Handling such huge datasets poses a number of challenges. Analysing a single large set of related data can yield additional information compared with analysing separate smaller datasets containing the same total amount of data; for example, it allows correlations to be found to "prevent diseases, spot business trends, combat crime and so on." Big Data is difficult to work with using traditional database management systems, desktop statistics, and visualization packages, requiring instead "massively parallel software running on hundreds, or even thousands of servers". Big Data includes datasets whose size is beyond the capability of commonly used software tools to capture, manage, and process within a tolerable elapsed time. The "size" of Big Data is constantly increasing, currently ranging from a few dozen terabytes to many petabytes. Big Data therefore also denotes a collection of techniques and technologies that require new forms of integration to uncover large hidden values from datasets that are complex, diverse, and of massive scale. A Big Data environment is used to organize and analyse these various types of data. In short, data that is very large in volume, highly varied in type, or moving at high velocity is called Big Data. Acquiring and analysing Big Data is challenging because it involves large distributed file systems that must be flexible, fault tolerant, and scalable. Technologies used by Big Data applications to handle such huge amounts of data include Hadoop and MapReduce. This paper first presents the definition of Big Data. It then describes the architecture of the technologies used for handling Big Data. Finally, it presents applications of Big Data systems.


Big data, Hadoop, MapReduce, HDFS, YARN

