Ethan Millar

7 years ago · 2 min. reading time

Explore Metadata in Different Kinds of Tables in Apache Hive with Hadoop Integration Experts


In this post, Hadoop integration professionals explain how Hive stores metadata for its different kinds of tables, and what happens to that metadata and to the underlying data when a table is dropped.

Introduction:

Apache Hadoop is a framework for storing and processing big data. Hive is a data warehouse built on top of Hadoop. Hive is powerful for querying big data because it maps table metadata onto the actual data in the Hadoop Distributed File System (HDFS) and processes that data with MapReduce. Newer versions of Hive can also switch the execution engine to Tez or Spark. Hive supports complex data types, user-defined functions (UDFs) and a wide range of built-in functions. I will introduce UDFs in Hive in another blog post.

[Figure: Hadoop cluster (HDFS + MapReduce) with a Name Node and a Job Tracker]


Hive keeps all table definitions and other Hive state in a relational database called the metastore, which is often hosted on one of the cluster's master nodes. For example, when we create a table with the command "CREATE TABLE Student(id string) LOCATION '/data/sample/';", the table schema is stored in this database as Hive metadata.
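To make the idea concrete, here is a toy Python model of the kind of record the metastore keeps for the Student table. It is a sketch under stated assumptions, not Hive's metastore schema or API: `TableMetadata`, the `metastore` dictionary and `create_table` are hypothetical names. The `MANAGED_TABLE` and `EXTERNAL_TABLE` labels match the table types Hive reports in `DESCRIBE FORMATTED` output. Note that creating the table only records a description; no file under the location is read.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class TableMetadata:
    """Hypothetical, simplified record of what Hive keeps about one table."""
    name: str
    columns: List[Tuple[str, str]]     # (column name, Hive type)
    location: str                      # HDFS directory the table is mapped to
    table_type: str = "MANAGED_TABLE"  # or "EXTERNAL_TABLE"


# Toy stand-in for the metastore database: table name -> metadata.
# Only the description of the table lives here, never the data files.
metastore: Dict[str, TableMetadata] = {}


def create_table(name: str, columns: List[Tuple[str, str]], location: str,
                 external: bool = False) -> TableMetadata:
    """Record the schema only; nothing under `location` is read or copied."""
    meta = TableMetadata(
        name=name.lower(),  # Hive stores table names in lower case
        columns=list(columns),
        location=location,
        table_type="EXTERNAL_TABLE" if external else "MANAGED_TABLE",
    )
    metastore[meta.name] = meta
    return meta


# Models the Student example: CREATE TABLE Student(id string) LOCATION '/data/sample/';
student = create_table("Student", [("id", "string")], "/data/sample/")
print(student)
```

The default of `MANAGED_TABLE` mirrors the fact that a plain `CREATE TABLE` produces an internal table unless `EXTERNAL` is given.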

Assume that we have a partitioned table: the partition information is also stored in the metastore database, so Hive can list a table's partitions and find their data easily. All of this is called 'metadata'. Metadata contains information such as the table's columns and storage format, the HDFS location the table is mapped to, and its partitions. It is kept in the metastore database, separately from the data files in HDFS.
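The benefit of keeping partition information as metadata can be sketched in a few lines of Python. The table name `sales`, its partition column `dt` and the helper functions are hypothetical. The sketch only assumes Hive's default layout of one `key=value` sub-directory per partition under the table location, and it ignores details such as escaping special characters in partition values.

```python
from typing import Dict, List

# Hypothetical table `sales`, partitioned by `dt`, with its data under this directory.
TABLE_LOCATION = "/data/mydata/sales"

# Toy partition metadata: partition directory -> partition spec.
partitions: Dict[str, Dict[str, str]] = {}


def add_partition(spec: Dict[str, str]) -> str:
    """Register a partition as one key=value directory per partition column."""
    subdir = "/".join(f"{key}={value}" for key, value in spec.items())
    path = f"{TABLE_LOCATION}/{subdir}"
    partitions[path] = dict(spec)
    return path


def find_partitions(**filters: str) -> List[str]:
    """Pick the directories to read by looking only at metadata, not at the files."""
    return sorted(
        path for path, spec in partitions.items()
        if all(spec.get(key) == value for key, value in filters.items())
    )


add_partition({"dt": "2016-01-01"})
add_partition({"dt": "2016-01-02"})
print(find_partitions(dt="2016-01-02"))  # ['/data/mydata/sales/dt=2016-01-02']
```

A query filtered on `dt` only needs the matching directory, and that decision is made from the metadata alone.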

When we drop an internal (managed) table, which is the default table type, Hive deletes both the metadata and the data files in HDFS. However, when we drop an external table, Hive deletes only the metadata and the data stays on HDFS. From then on Hive is unaware of that data, but it does not touch the files themselves.
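The difference between the two drop behaviours can be modelled with two dictionaries, one standing in for the metastore and one for HDFS. This is a hypothetical sketch for illustration, not Hive code, and it leaves out details such as the HDFS trash.

```python
from typing import Dict, Set

# Toy model, not Hive's API.
# metastore: table name -> metadata; hdfs: directory -> file names in it.
metastore: Dict[str, dict] = {}
hdfs: Dict[str, Set[str]] = {"/data/mydata/sample/": {"file1"}}


def create_table(name: str, location: str, external: bool) -> None:
    metastore[name] = {"location": location, "external": external}


def drop_table(name: str) -> None:
    """Like DROP TABLE IF EXISTS: the metadata is always removed."""
    meta = metastore.pop(name, None)
    if meta is None:
        return
    if not meta["external"]:
        # Internal (managed) table: the data directory is deleted too.
        hdfs.pop(meta["location"], None)
    # External table: the files in HDFS are left untouched.


create_table("mydatabase.sample", "/data/mydata/sample/", external=True)
drop_table("mydatabase.sample")
print("mydatabase.sample" in metastore, hdfs)  # False {'/data/mydata/sample/': {'file1'}}

create_table("mydatabase.sample", "/data/mydata/sample/", external=False)
drop_table("mydatabase.sample")
print("mydatabase.sample" in metastore, hdfs)  # False {}
```

In both cases the table disappears from the metastore; only the managed table takes its data with it.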

This distinction is very important when working with Hive on Hadoop. In my experience, I have seen many engineers and developers make this mistake and lose all the data in their data warehouse. I hope this blog helps you understand the metadata concept and the different kinds of tables in Hive.

Environment

Java: JDK 1.7

Cloudera version: CDH 5.4.7 (see http://www.cloudera.com/downloads/cdh/5-4-7.html)

Initial steps

1. Prepare an input data file. Open a new local file with the vi editor and enter the records below:

vi file1

1;Jack

2;Ryan

3;Jean


2. Copy the local file to the Hadoop Distributed File System (HDFS) with these commands:

hadoop fs -mkdir -p /data/mydata/sample

hadoop fs -put file1 /data/mydata/sample/
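If you prefer not to type the records in vi, a short Python script can produce the same `file1` before you upload it with `hadoop fs -put`. This is only a convenience sketch; the file name and the three records are the ones from step 1.

```python
from pathlib import Path

# Non-interactive alternative to typing the records in vi:
# three semicolon-delimited lines in the form accountId;name.
records = [("1", "Jack"), ("2", "Ryan"), ("3", "Jean")]

path = Path("file1")
path.write_text("".join(f"{account_id};{name}\n" for account_id, name in records))

print(path.read_text(), end="")
```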


Code walkthrough and verifying the result

This is the Hive script used to create and drop the external and the default (internal) table:


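-- Make sure the target database exists
CREATE DATABASE IF NOT EXISTS mydatabase;

-- Part 1: external table mapped onto the existing HDFS directory
DROP TABLE IF EXISTS mydatabase.sample;
CREATE EXTERNAL TABLE mydatabase.sample
(
  accountId string,
  name string
)
-- '\;' keeps the Hive CLI from splitting the statement at this semicolon
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\;'
STORED AS TEXTFILE
LOCATION '/data/mydata/sample/';

-- Part 2: default (internal) table on the same location
DROP TABLE IF EXISTS mydatabase.sample;
CREATE TABLE mydatabase.sample
(
  accountId string,
  name string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\;'
STORED AS TEXTFILE
LOCATION '/data/mydata/sample/';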
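Because the table only points at `/data/mydata/sample/`, Hive interprets the files at query time using the schema stored in the metastore. The Python sketch below is a rough, hypothetical model of that mapping for `ROW FORMAT DELIMITED FIELDS TERMINATED BY ';'`: each line is split on the delimiter and the fields are assigned to the declared columns in order. It assumes the common behaviour of Hive's default text format, where missing trailing fields become NULL (`None` here) and extra fields are ignored; it does not model type conversion or escaping.

```python
from typing import Dict, List, Optional

COLUMNS = ["accountId", "name"]  # from the CREATE TABLE statement
DELIMITER = ";"                  # FIELDS TERMINATED BY ';'


def read_rows(text: str) -> List[Dict[str, Optional[str]]]:
    """Rough model of mapping each line of a delimited text file to columns."""
    rows = []
    for line in text.splitlines():
        fields = line.split(DELIMITER)
        # Assign fields to columns by position; missing fields become None.
        rows.append({col: fields[i] if i < len(fields) else None
                     for i, col in enumerate(COLUMNS)})
    return rows


sample = "1;Jack\n2;Ryan\n3;Jean\n"
for row in read_rows(sample):
    print(row)
```

The same files would be read the same way whether the table is external or internal; the table type only matters when the table is dropped.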


1. Check whether the local file was copied to the Hadoop Distributed File System:

hadoop fs -ls /data/mydata/sample/

It should show file1 in /data/mydata/sample.


2. Open the Hive shell and run these commands to create the external table:


CREATE DATABASE IF NOT EXISTS mydatabase;
DROP TABLE IF EXISTS mydatabase.sample;
CREATE EXTERNAL TABLE mydatabase.sample
(
  accountId string,
  name string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\;'
STORED AS TEXTFILE
LOCATION '/data/mydata/sample/';


3. Use this command to check whether the table was created:

SHOW CREATE TABLE mydatabase.sample;

-> It should show the structure of the sample table.


4. Drop the external table with this command:

DROP TABLE mydatabase.sample;


5. Repeat step 3 and confirm that the table no longer exists.


6. Now check the data in HDFS to see whether Hive deleted only the metadata or both the metadata and the data:

hadoop fs -ls /data/mydata/sample/

-> The data is still there, which shows that dropping an external table deletes only the metadata.


7. Now run these commands to create a default (internal) Hive table:


DROP TABLE IF EXISTS mydatabase.sample;
CREATE TABLE mydatabase.sample
(
  accountId string,
  name string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\;'
STORED AS TEXTFILE
LOCATION '/data/mydata/sample/';


8. Follow steps 3, 4, 5 and 6 again to verify how Hive handles the metadata and the actual data in HDFS:

hadoop fs -ls /data/mydata/sample/

-> The data is gone, which shows that dropping an internal table deletes both the metadata and the actual data.
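The two experiments use the same script except for one word. A hypothetical Python helper that generates both versions makes this explicit: comparing the two scripts line by line shows that only the `CREATE` line differs, and that `EXTERNAL` keyword is what decides whether `DROP TABLE` leaves the files under `LOCATION` in place.

```python
def sample_table_ddl(external: bool) -> str:
    """Build the DROP/CREATE script from the walkthrough (hypothetical helper)."""
    kind = "EXTERNAL TABLE" if external else "TABLE"
    return "\n".join([
        "DROP TABLE IF EXISTS mydatabase.sample;",
        f"CREATE {kind} mydatabase.sample (",
        "  accountId string,",
        "  name string",
        ")",
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\;'",  # emits '\;'
        "STORED AS TEXTFILE",
        "LOCATION '/data/mydata/sample/';",
    ])


external_lines = sample_table_ddl(True).splitlines()
managed_lines = sample_table_ddl(False).splitlines()

# Keep only the line pairs that differ between the two scripts.
changed = [(a, b) for a, b in zip(external_lines, managed_lines) if a != b]
print(changed)
```

You could save either string to a file and run it with `hive -f` instead of pasting it into the Hive shell.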


You can follow the same steps to explore how Hive behaves when loading data, building indexes and creating views on external and internal tables. I hope this helps you understand how Hive works with the different kinds of tables.


This article was written by Hadoop integration professionals to help people explore metadata in the different kinds of tables in Apache Hive. You can share your thoughts on this post with other readers.








