Hive

 


About Hive

  • Hive was introduced by Facebook.
  • It is a data warehouse, that is, a store for data collected from various sources.
  • Its functionality is based on SQL.
  • It is basically a query engine, although it is often loosely called a database.
  • It is an open source framework.
  • Think of it as a vehicle that runs on an engine (MapReduce, MR).
  • It replaces Java, not MapReduce.
  • In other words, instead of writing MapReduce jobs in Java, we write Hive queries, and Hive runs them as MapReduce jobs.
  • Hive keeps metadata, which stores information about the tables but not the data itself.
  • The Hive metadata is stored in an RDBMS (such as Oracle or MySQL); the data you insert is not stored there.
  • The data you insert is stored in HDFS.
  • If no RDBMS is configured for the metadata, Hive creates an embedded RDBMS called Derby.
  • The metastore combinations in Hive are:
    • MySQL + Hive = Remote Metastore
    • Derby + Hive = Embedded Metastore
  • The drawback of the embedded metastore is concurrency: embedded Derby accepts only one connection at a time, which is a problem in a clustered system where multiple nodes may be present.
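  • By default, all Hive tables are stored under /user/hive/warehouse in HDFS; list them with bin/hadoop fs -ls /user/hive/warehouse
  • Even without the load command, we can move data into a Hive table by copying a file into the table's directory with the put command.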
  • load does not start a MapReduce job (it only moves or copies files), but insert does.
  • HQL (Hive Query Language) is the query language of Hive; its queries run on a distributed system, which provides parallelism.
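
The point about moving data in with put instead of load can be sketched as a small shell script. This is only a minimal sketch, assuming that a comma-delimited text table test1.table1 (sno int, name string) already exists in the default warehouse location; the sample rows and the temporary file are made up for illustration. The hadoop and hive steps are skipped when those commands are not installed.

```shell
#!/bin/sh
# Sketch: add rows to a Hive table without the load command.
# Assumption: test1.table1 (sno int, name string), comma-delimited text,
# already exists under the default warehouse directory.
set -eu

WORKDIR=$(mktemp -d)
SAMPLE="$WORKDIR/test.txt"

# Two hypothetical rows matching (sno int, name string).
printf '1,alice\n2,bob\n' > "$SAMPLE"

# Default HDFS directory of table1 inside database test1.
TABLE_DIR=/user/hive/warehouse/test1.db/table1

if command -v hadoop >/dev/null 2>&1 && command -v hive >/dev/null 2>&1; then
    # Plain file copy into the table's directory.
    hadoop fs -put "$SAMPLE" "$TABLE_DIR/"
    # Hive reads whatever files sit in that directory, so the rows show up.
    hive -e 'select * from test1.table1;'
else
    echo "hadoop/hive not found; would put $SAMPLE into $TABLE_DIR"
fi
```

Because the metadata only records where the table lives, any correctly delimited file placed in that directory becomes part of the table.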

A Few Hive Commands

  • create table table1(sno int, name string) row format delimited fields terminated by ',' lines terminated by '\n' stored as textfile;
  • load data local inpath '/user/test.txt' into table test1.table1;
  • load data inpath '/user/test.txt' into table test1.table1;
  • drop table table1;
  • desc table1;
  • desc extended table1;
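
The commands above can be tried end to end from a shell. The following is a rough sketch rather than a definitive setup: the names test1 and table1 come from the load examples, while the sample rows, the temporary directory, and the script name demo.hql are made up for illustration. The statements are written to a file and run with hive -f only if the hive command is available.

```shell
#!/bin/sh
# Sketch: create a table, load a local file, and query it.
# Assumptions: sample rows and file locations are hypothetical.
set -eu

WORKDIR=$(mktemp -d)
DATA="$WORKDIR/test.txt"
SCRIPT="$WORKDIR/demo.hql"

# Sample rows matching (sno int, name string), comma-delimited.
printf '1,alice\n2,bob\n' > "$DATA"

# Write the HiveQL statements to a script file.
cat > "$SCRIPT" <<EOF
create database if not exists test1;
create table if not exists test1.table1 (sno int, name string)
  row format delimited fields terminated by ','
  lines terminated by '\n'
  stored as textfile;
-- 'local' reads the file from the local file system and copies it into HDFS.
load data local inpath '$DATA' into table test1.table1;
desc test1.table1;
-- An aggregate like this is typically run as a MapReduce job.
select count(*) from test1.table1;
EOF

if command -v hive >/dev/null 2>&1; then
    hive -f "$SCRIPT"
else
    echo "hive not found; statements written to $SCRIPT"
fi
```

Without the local keyword, load data inpath expects the path to be in HDFS and moves the file into the table's directory instead of copying it.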
