Hive
About Hive
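- Hive was originally developed at Facebook and is now an Apache project.
- It is a data warehouse system: it stores data collected from various sources so that it can be queried and analyzed.
- Its query language is based on SQL.
- It is basically a query engine, though it is also often called a database.
- It is an open-source framework.
- It is like a vehicle that runs on an engine; the engine is MapReduce (MR), which actually executes the queries.
- It replaces hand-written Java code, not MapReduce.
- Without Hive, MapReduce jobs are written in Java; with Hive, we write SQL-like queries instead, and Hive translates them into MapReduce jobs.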
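- Hive keeps metadata (table names, columns, data types, file locations) but not the data itself.
- This metadata, called the metastore, is stored in an RDBMS such as MySQL or Oracle; the data you insert is not stored there.
- The data you insert is stored in HDFS.
- If no RDBMS is configured for the metastore, Hive uses an embedded RDBMS called Apache Derby.
- The metastore combinations in Hive are:
  - MySQL + Hive = remote metastore (strictly, a "local" metastore if the metastore service runs in the same JVM as Hive)
  - Derby + Hive = embedded metastore
- The drawback of the embedded metastore is concurrency: embedded Derby supports only one active user (session) at a time, so it does not suit a clustered system where multiple users or nodes need the metastore.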
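- By default, Hive stores table data under the HDFS directory /user/hive/warehouse; you can list it with: bin/hadoop fs -ls /user/hive/warehouse
- Data can also be moved into a table without the LOAD command, by copying files directly into the table's directory with the put command (hadoop fs -put).
- LOAD does not run a MapReduce job (it only copies or moves files), but INSERT does.
- HQL (Hive Query Language) is used to query data stored in a distributed system; because each query runs as MapReduce jobs, it is executed in parallel across the cluster.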
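The claim that Hive replaces Java rather than MapReduce is easier to see with the three phases spelled out. Below is a minimal, single-process Python sketch of the map, shuffle, and reduce steps that a query like SELECT name, COUNT(*) FROM table1 GROUP BY name conceptually compiles into. It only imitates the data flow. Hive's real execution plans are more involved, and every function name here is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map: parse each 'sno,name' line and emit (name, 1)."""
    for line in lines:
        _sno, name = line.rstrip("\n").split(",")
        yield name, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group every emitted value by its key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: add up the 1s for each key, i.e. COUNT(*) per name."""
    return {key: sum(values) for key, values in groups.items()}


def group_by_count(lines: Iterable[str]) -> Dict[str, int]:
    """Single-process stand-in for SELECT name, COUNT(*) FROM table1 GROUP BY name."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    rows = ["1,alice\n", "2,bob\n", "3,alice\n"]
    print(group_by_count(rows))  # {'alice': 2, 'bob': 1}
```

In real Hive, the map and reduce tasks run on many nodes at once, which is where the parallelism of HQL comes from. In this sketch everything runs in one process.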
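To illustrate the split between metadata and data, here is a toy Python model. It uses sqlite3 as a stand-in for the metastore RDBMS (MySQL or Derby) and a plain dictionary as a stand-in for HDFS. The class ToyHive, its methods, and the tbls table are hypothetical and do not reflect Hive's real metastore schema. The table_location function follows Hive's default layout: tables in the default database sit directly under /user/hive/warehouse, and tables in any other database sit under a <db>.db subdirectory.

```python
import sqlite3
from typing import Dict, List, Tuple

WAREHOUSE_DIR = "/user/hive/warehouse"  # Hive's default warehouse directory


def table_location(db: str, table: str) -> str:
    """Default HDFS directory for a managed table."""
    if db == "default":
        return f"{WAREHOUSE_DIR}/{table}"
    return f"{WAREHOUSE_DIR}/{db}.db/{table}"


class ToyHive:
    """Hypothetical toy model: sqlite3 plays the metastore RDBMS,
    a dict of path -> file text plays HDFS."""

    def __init__(self) -> None:
        self.metastore = sqlite3.connect(":memory:")
        self.metastore.execute(
            "CREATE TABLE tbls (db TEXT, name TEXT, columns TEXT, location TEXT)"
        )
        self.hdfs: Dict[str, str] = {}

    def create_table(self, db: str, name: str, columns: List[str]) -> None:
        # Only metadata goes into the RDBMS; no rows are stored there.
        self.metastore.execute(
            "INSERT INTO tbls VALUES (?, ?, ?, ?)",
            (db, name, ",".join(columns), table_location(db, name)),
        )

    def _lookup(self, db: str, name: str) -> Tuple[List[str], str]:
        row = self.metastore.execute(
            "SELECT columns, location FROM tbls WHERE db = ? AND name = ?",
            (db, name),
        ).fetchone()
        if row is None:
            raise KeyError(f"{db}.{name} not found in metastore")
        return row[0].split(","), row[1]

    def load(self, db: str, name: str, file_name: str, text: str) -> None:
        # The data itself goes into the table's directory in the file store.
        _, location = self._lookup(db, name)
        self.hdfs[f"{location}/{file_name}"] = text

    def select_all(self, db: str, name: str) -> List[Dict[str, str]]:
        # A query needs both: schema from the metastore, rows from "HDFS".
        columns, location = self._lookup(db, name)
        rows: List[Dict[str, str]] = []
        for path, text in self.hdfs.items():
            if path.startswith(location + "/"):
                for line in text.splitlines():
                    rows.append(dict(zip(columns, line.split(","))))
        return rows


if __name__ == "__main__":
    hive = ToyHive()
    hive.create_table("test1", "table1", ["sno", "name"])
    hive.load("test1", "table1", "test.txt", "1,alice\n2,bob\n")
    print(table_location("test1", "table1"))  # /user/hive/warehouse/test1.db/table1
    print(hive.select_all("test1", "table1"))
    # [{'sno': '1', 'name': 'alice'}, {'sno': '2', 'name': 'bob'}]
```

The metastore row holds only column names and a location string. select_all has to combine it with the files to produce any rows.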
A Few Hive Commands
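- Create a table: create table table1(sno int, name string) row format delimited fields terminated by ',' lines terminated by '\n' stored as textfile;
- Load a file from the local file system (the file is copied): load data local inpath '/user/test.txt' into table test1.table1;
- Load a file that is already in HDFS (the file is moved into the table's directory): load data inpath '/user/test.txt' into table test1.table1;
- Drop a table: drop table table1;
- Show a table's columns: desc table1;
- Show the columns plus detailed metadata such as location and storage format: desc extended table1;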
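The row format clause in the create table command tells Hive how to split a text file into columns. The column types are applied only when the data is read. The following Python sketch shows that idea for the (sno int, name string) schema. It is an approximation of Hive's default text handling, built on these assumptions:

- A field that cannot be parsed as int reads as NULL (None).
- Missing trailing fields read as NULL.
- Extra fields are ignored.

The names parse_textfile and cast_field are hypothetical.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical in-memory form of the schema (sno int, name string).
Schema = List[Tuple[str, str]]
SCHEMA: Schema = [("sno", "int"), ("name", "string")]
Value = Union[int, str, None]


def cast_field(value: str, hive_type: str) -> Value:
    """Convert one field; an int field that does not parse becomes None (NULL).

    Simplification: 32-bit overflow and other Hive types are not handled.
    """
    if hive_type == "int":
        try:
            return int(value)
        except ValueError:
            return None
    return value  # string columns are kept as-is


def parse_textfile(text: str, schema: Schema = SCHEMA,
                   field_sep: str = ",", line_sep: str = "\n") -> List[Dict[str, Value]]:
    """Split delimited text into rows and apply the schema at read time."""
    lines = text.split(line_sep)
    if lines and lines[-1] == "":
        lines.pop()  # a trailing line terminator does not start a new row
    rows: List[Dict[str, Value]] = []
    for line in lines:
        fields = line.split(field_sep)
        row: Dict[str, Value] = {}
        for i, (column, hive_type) in enumerate(schema):
            # Missing trailing fields read as NULL; extra fields are ignored.
            row[column] = cast_field(fields[i], hive_type) if i < len(fields) else None
        rows.append(row)
    return rows


if __name__ == "__main__":
    data = "1,alice\n2,bob\nx,carol\n4\n5,dave,extra\n"
    for row in parse_textfile(data):
        print(row)
    # {'sno': 1, 'name': 'alice'}
    # {'sno': 2, 'name': 'bob'}
    # {'sno': None, 'name': 'carol'}
    # {'sno': 4, 'name': None}
    # {'sno': 5, 'name': 'dave'}
```

Nothing is validated at load time. A badly formatted file therefore loads without error, and its problems only show up as NULLs when the table is queried.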
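The two load commands differ in where the file comes from and whether it stays there:

- With LOCAL, the file is copied from the local file system.
- Without LOCAL, a file already in HDFS is moved into the table's directory.

Since the file is only relocated, no MapReduce job runs. The Python sketch below imitates this with ordinary directories standing in for the local file system and HDFS. load_data is a hypothetical helper, not a Hive API.

```python
import os
import shutil
import tempfile


def load_data(src_path: str, table_dir: str, local: bool) -> str:
    """Hypothetical stand-in for LOAD DATA [LOCAL] INPATH ... INTO TABLE.

    local=True  mimics LOAD DATA LOCAL INPATH: the file is copied, the source stays.
    local=False mimics LOAD DATA INPATH: the file is moved, the source disappears.
    Either way the file is only relocated; nothing is parsed and no job runs.
    """
    os.makedirs(table_dir, exist_ok=True)
    dest = os.path.join(table_dir, os.path.basename(src_path))
    if os.path.exists(dest):
        # Simplification: this sketch refuses to overwrite an existing file.
        raise FileExistsError(dest)
    if local:
        shutil.copy(src_path, dest)
    else:
        shutil.move(src_path, dest)
    return dest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # Two directories standing in for the local file system and HDFS.
        local_file = os.path.join(root, "local", "test_local.txt")
        hdfs_file = os.path.join(root, "hdfs", "user", "test.txt")
        table_dir = os.path.join(root, "hdfs", "user", "hive", "warehouse", "test1.db", "table1")
        for path in (local_file, hdfs_file):
            os.makedirs(os.path.dirname(path), exist_ok=True)
            with open(path, "w") as f:
                f.write("1,alice\n")

        load_data(local_file, table_dir, local=True)
        load_data(hdfs_file, table_dir, local=False)
        print(os.path.exists(local_file))     # True: LOCAL copies
        print(os.path.exists(hdfs_file))      # False: non-LOCAL moves
        print(sorted(os.listdir(table_dir)))  # ['test.txt', 'test_local.txt']
```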