Hadoop
About Hadoop Architecture
- Hadoop consists of:
- HDFS (Hadoop Distributed File System) [Distributed storage]
- MR (MapReduce) [Parallel processing]
- Hive (Query engine introduced by Facebook) [Uses HiveQL, a SQL-like language]
- Pig (Introduced by Yahoo) [Uses Pig Latin]
- Sqoop (Transfers data between relational databases and Hadoop) [Written in Java]
- Oozie (Workflow scheduler introduced by Yahoo) [XML, Java]
- Flume (Log and event data ingestion service) [Incoming data only]
- Mahout (Machine learning library)
- HBase (Hadoop database, modeled on Google's Bigtable)
Key Points
- Most databases used in Big Data are NoSQL (modern, non-relational).
- Frameworks in the Big Data ecosystem are generally loosely coupled.
- Loosely coupled: removing one component does not stop the core technology (Hadoop) from functioning. Ex: Hadoop still stores and processes data when Hive or Pig is not installed.
- Hadoop can be integrated with most other Big Data technologies.
Questions & Answers
- Q. What is a file system?
- Ans. A file system manages how data is written to and read from a storage device such as a hard disk.
- Ex: NTFS (Windows), ext4 (Linux), APFS/HFS+ (macOS)
- A program in execution is called a process.
- Q. What is a block?
- Ans. A large file is divided into small fixed-size units called blocks (chunks).
- Ex: NTFS uses a 4 KB default cluster size on most volumes
- Q. What are client and server?
- Ans. The client sends requests and the server responds to them.
- Q. Types of file system
- Standalone file system: NTFS, ext4, APFS/HFS+
- Distributed file system: HDFS, S3 (S3 is an object store that is often used like a distributed file system)
- Q. Types of distributed system architecture
- Master-slave (Hadoop, Spark) [One master and N slaves]
- Peer-to-peer (NoSQL, e.g. Cassandra) [Every node can communicate with every other node; there is no single master]
- Background processes are called daemon processes.
- Hadoop 1.x has 5 daemon processes, each of which is a Java program: NameNode, Secondary NameNode, DataNode, JobTracker and TaskTracker.
- A node is an individual system or virtual machine, and a cluster is a group of nodes working together.
- API: Application Programming Interface
- Block size in Hadoop
- Hadoop 1.x: 1 block = 64 MB (default)
- Hadoop 2.x and later: 1 block = 128 MB (default)
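The effect of the default block size can be worked out with a little arithmetic. The sketch below is only an illustration in plain Python, not Hadoop code: `split_into_blocks` is a hypothetical helper name, and it assumes that a file is cut into full-size blocks with the last block holding only the leftover bytes.

```python
MB = 1024 * 1024  # bytes in one megabyte


def split_into_blocks(file_size, block_size):
    """Return the sizes (in bytes) of the blocks a file would be cut into.

    Hypothetical helper for illustration; assumes every block is full
    except possibly the last one, which holds the remaining bytes.
    """
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        sizes.append(remainder)
    return sizes


one_gb = 1024 * MB

# A 1 GB file with the older 64 MB default and the newer 128 MB default.
blocks_64 = split_into_blocks(one_gb, 64 * MB)
blocks_128 = split_into_blocks(one_gb, 128 * MB)
print(len(blocks_64), "blocks of 64 MB")
print(len(blocks_128), "blocks of 128 MB")

# A 200 MB file does not divide evenly by 128 MB.
uneven = split_into_blocks(200 * MB, 128 * MB)
print([size // MB for size in uneven], "MB per block")
```

Under these assumptions a larger block size means fewer blocks per file, so the master has fewer blocks to keep track of.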
- Replication: storing duplicate copies (replicas) of the data
- The default replication factor in Hadoop is 3
- If we load 1 GB of data, we need a total of 3 GB of storage for the replicas
- No two replicas of the same block are stored on the same node
- Because replicas live on different nodes, the failure of one node does not cause data loss
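The replication points above can be sketched in a few lines of Python. This is a simplified illustration, not how HDFS is implemented: `raw_storage` and `place_replicas` are hypothetical names, the node names are made up, and nodes are picked at random, whereas real HDFS also takes racks into account when placing replicas.

```python
import random


def raw_storage(data_size_gb, replication_factor=3):
    """Total storage needed when every block is kept replication_factor times."""
    return data_size_gb * replication_factor


def place_replicas(nodes, replication_factor=3, rng=None):
    """Pick distinct nodes for the replicas of one block.

    Hypothetical helper: random.sample never repeats an element, so no
    node receives two replicas of the same block.
    """
    if replication_factor > len(nodes):
        raise ValueError("not enough nodes to hold every replica on a different node")
    rng = rng or random.Random()
    return rng.sample(nodes, replication_factor)


# 1 GB of data with the default replication factor of 3.
print(raw_storage(1), "GB of raw storage")

# Made-up cluster of four nodes.
cluster = ["node1", "node2", "node3", "node4"]
placement = place_replicas(cluster, rng=random.Random(0))
print("replicas on:", placement)

# Simulate the failure of one node that holds a replica.
failed = placement[0]
survivors = [node for node in placement if node != failed]
print("copies left after", failed, "fails:", len(survivors))
```

With a replication factor of 3, two copies of the block are still readable after a single node fails, which is why the data is not lost.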