Hadoop



About Hadoop Architecture

  • Hadoop Framework consists of:
    • HDFS(Hadoop Distributed File System) [Distributed Storage]
    • MR(Map Reduce) [Parallel Processing]
    • Hive(Query Engine Introduced by Facebook) [Uses HiveQL, an SQL-like language]
    • Pig(Introduced by Yahoo) [Uses Pig Latin]
    • Sqoop(Transfers data between relational databases and Hadoop) [Uses Java]
    • Oozie(Workflow Scheduler Introduced by Yahoo) [XML,Java]
    • Flume(Log and event data collector) [Only Incoming Data]
    • Mahout(Data Science,AI,ML Component)
    • HBase(Hadoop Database Introduced by Powerset, modeled on Google's Bigtable)
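To give a feel for the Map Reduce model listed above, here is a minimal single-machine sketch of a word count in plain Python. It is not the Hadoop API; the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration, and real Hadoop runs these phases in parallel across many nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce one after another on a list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big Hadoop"]))
```

The map step only looks at one line at a time and the reduce step only looks at one key at a time, which is what lets a cluster split the work across nodes.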

Key Points

  • Most databases in the Big Data ecosystem are NoSQL(Modern).
  • Frameworks in Big Data are generally Loosely Coupled.
    • Loosely Coupled: Removal of one component doesn't affect the functioning of the technology(Hadoop). Ex: C Sharp
  • Hadoop can be integrated with most other Big Data technologies.

Question & Answers

  • Q. What is a File System?
    • Ans. It manages how data is read from and written to the Hard Disk
      • Ex: NTFS(Windows), EXT(Linux), HFS+/APFS(Mac)
    • A program in execution is called a process.
  • Q. What is a Block?
    • Ans. A large file is divided into small fixed-size units called blocks(chunks)
      • Ex: NTFS 4 KB(Default)
  • Q. What are Client and Server?
    • Ans. Client->Sends requests and Server->Responds to them
  • Q. Types of File System
    • Standalone file system---NTFS,EXT,HFS+/APFS
    • Distributed file system---HDFS,S3(object storage, often used like one)
  • Q. Types of Distributed Architecture
    • Master and Slave(Hadoop, Spark) [One Master and N-Slaves]
    • Peer to Peer(NoSQL-Cassandra) [No master, every node is equal and talks to the other nodes directly]
  • Background processes are called Daemon Processes
    • In Hadoop 1.x we have 5 Daemon Processes, each one a Java Program(NameNode, Secondary NameNode, DataNode, JobTracker, TaskTracker)
  • A Node is an individual system or Virtual Machine, and a Cluster is a group of nodes working together.
  • API-Application Programming Interface
  • BLOCK SIZE in Hadoop
    • Hadoop 1.x(1 Block=64MB)(Default)
    • Hadoop 2.x and later(1 Block=128MB)(Default)
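  • Replication: Keeping duplicate copies(replicas) of the data
    • The default replication factor in Hadoop is 3
    • If we load 1GB of data, we need a total of 3GB of storage because of replication
    • No two replicas of the same block are stored on the same node
    • Failure of one node does not cause data loss, because the replicas are still available on other nodes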
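The block size and replication notes above can be checked with a little arithmetic. The sketch below is plain Python, not a Hadoop API: the function names and the node names are invented for illustration, and the round-robin placement is a simplifying assumption (real HDFS placement also takes racks into account).

```python
MB = 1024 * 1024
GB = 1024 * MB


def block_count(file_size, block_size=128 * MB):
    """Number of blocks needed to hold file_size bytes (last block may be partial)."""
    return (file_size + block_size - 1) // block_size


def raw_storage(file_size, replication=3):
    """Total bytes used on the cluster once every block is replicated."""
    return file_size * replication


def place_replicas(num_blocks, nodes, replication=3):
    """Assign every block to `replication` different nodes (simple round-robin).

    Assumption for illustration only; this is not the real HDFS placement policy.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: [nodes[(block + i) % len(nodes)] for i in range(replication)]
        for block in range(num_blocks)
    }


def blocks_lost(placement, failed_node):
    """Blocks that have no surviving replica after one node fails."""
    return [b for b, holders in placement.items()
            if all(node == failed_node for node in holders)]


if __name__ == "__main__":
    nodes = ["node1", "node2", "node3", "node4"]  # hypothetical node names
    blocks = block_count(1 * GB)                  # 1GB file, 128MB blocks
    placement = place_replicas(blocks, nodes)
    print("blocks:", blocks)
    print("raw storage in GB:", raw_storage(1 * GB) // GB)
    print("blocks lost if node1 fails:", blocks_lost(placement, "node1"))
```

With the 128MB default a 1GB file splits into 8 blocks (16 with the older 64MB default), and because each block sits on three different nodes, losing any single node leaves at least two copies of every block.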

HDFS + MR ---> HADOOP
HADOOP + OTHER TOOLS ---> HADOOP FRAMEWORK
