Unveiling the Power of Big Data: A New Frontier in Information

What is Big Data?

  • Big data refers to data whose storage or processing becomes a problem for conventional systems.
  • There are more than 10k solutions for big data, but one well-known solution is Hadoop.
  • Big data poses 5 problems: Volume, Value (Purity), Visualization, Velocity, and Variety.
    • The problem can also lie in processing the data, not only in storing it.
    • Big data can also refer to a technology for achieving processing speed.
  • The layers of data include:
    • Automation
    • Storage
    • Testing
    • Visualization
    • Data Science (DS), Machine Learning (ML), and Artificial Intelligence (AI)

History of Hadoop

  • In 2003 Google published a paper describing a new file system called GFS (Google File System), and in 2004 it published a paper on a new system for processing data called GMR (Google MapReduce).
  • Building on these ideas, Doug Cutting developed Hadoop (GFS + GMR), open source software under the Apache Software Foundation that combines counterparts of both systems introduced by Google.
  • Hadoop consists of HDFS (Hadoop Distributed File System) for storage and MR (MapReduce) for processing.
  • Several vendors in the industry package big data technology as commercial products, such as:
    • Cloudera, Hortonworks
    • EMR - Amazon
    • Dataproc - Google
    • HDInsight - Microsoft
    • BigInsights - IBM
    • DBSandBox - Databricks
  • In general, we will use Apache Hadoop during the learning phase, but real-world projects usually run on one of the commercial products.
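
To make the MapReduce half of Hadoop more concrete, here is a minimal sketch of the map, shuffle, and reduce steps as a word count in plain Python. It runs in a single process and does not use Hadoop's actual API. The function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical and chosen only for illustration. In a real cluster the input would live in HDFS and each phase would run in parallel across many machines.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group emitted values by key (done by the framework in Hadoop)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data needs storage", "big data needs processing"]
    print(word_count(sample))
```

The point of the split is that each map call looks at only one line and each reduce call at only one key, which is what allows a framework such as Hadoop to spread the work over many nodes.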
