Lab 4
Introduction
In this lab, you will try out Hive and Trino on NYU's Dataproc cluster.
Objectives
- Familiarize yourself with SQL.
- Practice basic Hive and Trino commands.
- Learn how to use Hive and Trino to perform data analytics.
Preparation
You need to run this lab on NYU's Dataproc cluster. Make sure you have access to it.
Reading
Please read Chapter 17 (pages 471-475, 478-493, 500-503, 505-507, 510-515) of Hadoop: The Definitive Guide, 4th Edition (NYU students can read it online for free).
Tasks
Create input data
First, create a directory on HDFS:
hadoop fs -mkdir hiveInput
Then, paste the following data into a new file named smallWeather1.txt:
006701199099999,1950,051507004+68750+023550FM-12+038299999V0203301N00671220001
004301199099999,1950,051512004+68750+023550FM-12+038299999V0203201N00671220001
004301199099999,1950,051518004+68750+023550FM-12+038299999V0203201N00261220001
004301265099999,1949,032412004+62300+010750FM-12+048599999V0202701N00461220001
004301265099999,1949,032418004+62300+010750FM-12+048599999V0202701N00461220001
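If you prefer not to open an editor, a shell here-document can write the file in one step. This is a minimal sketch, assuming you run it in the local directory from which you will later upload the file. The awk check at the end is an extra sanity step and not part of the lab.

```shell
# Write the five sample records to smallWeather1.txt.
# Quoting 'EOF' stops the shell from expanding anything inside the records.
cat > smallWeather1.txt <<'EOF'
006701199099999,1950,051507004+68750+023550FM-12+038299999V0203301N00671220001
004301199099999,1950,051512004+68750+023550FM-12+038299999V0203201N00671220001
004301199099999,1950,051518004+68750+023550FM-12+038299999V0203201N00261220001
004301265099999,1949,032412004+62300+010750FM-12+048599999V0202701N00461220001
004301265099999,1949,032418004+62300+010750FM-12+048599999V0202701N00461220001
EOF

# Sanity check: every record should have exactly three comma-separated fields.
awk -F, 'NF != 3 { bad++ } END { print NR " records, " (bad + 0) " malformed" }' smallWeather1.txt
```

A malformed count above zero usually means a line was wrapped or truncated while pasting.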
Next, put the data on HDFS:
hadoop fs -put smallWeather1.txt hiveInput
Connect to Hive
Connect to the Hive shell:
beeline -u jdbc:hive2://localhost:10000
Now, you are in the Hive shell. First, set the execution engine to MapReduce:
0: jdbc:hive2://localhost:10000/> set hive.execution.engine=mr;
0: jdbc:hive2://localhost:10000/> set hive.fetch.task.conversion=minimal;
Then, select the database that has already been created for you. The database name is yourNetID_nyu_edu.
0: jdbc:hive2://localhost:10000/> use yourNetID_nyu_edu;
You need to run the above commands each time you connect to Hive.
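Because these commands must be repeated in every session, one option is to keep them in a small initialization script and pass it to beeline with its -i option, which runs the script right after connecting. This is only a sketch: the file name hive_init.sql is a hypothetical choice, and yourNetID_nyu_edu must be replaced with your own database name.

```shell
# Save the per-session setup commands in an init script.
# hive_init.sql is a hypothetical file name; any name works.
cat > hive_init.sql <<'EOF'
set hive.execution.engine=mr;
set hive.fetch.task.conversion=minimal;
use yourNetID_nyu_edu;
EOF

# Start beeline with the init script when beeline is available (that is, on the cluster).
if command -v beeline >/dev/null 2>&1; then
    beeline -u jdbc:hive2://localhost:10000 -i hive_init.sql
else
    echo "beeline not found; run this on the Dataproc cluster"
fi
```

Typing the three commands by hand remains equivalent; the script only saves keystrokes.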
Create an external Hive table
You can use the following command to show the tables you have created:
0: jdbc:hive2://localhost:10000/> show tables;
Next, create an external table with our weather data:
0: jdbc:hive2://localhost:10000/> create external table w1 (data1 string, yea
. . . . . . . . . . . . . . . . .> row format delimited fields terminated by
. . . . . . . . . . . . . . . . .> location '/user/yourNetID_nyu_edu/hiveInput
Verify that this table has been created:
0: jdbc:hive2://localhost:10000/> show tables;
0: jdbc:hive2://localhost:10000/> describe formatted w1;
You should see it's an external table on HDFS.
View your data using HiveQL queries
Try the following queries:
0: jdbc:hive2://localhost:10000/> select * from w1;
0: jdbc:hive2://localhost:10000/> select * from w1 limit 2;
0: jdbc:hive2://localhost:10000/> select year from w1;
The above queries should be very fast. Next, try these queries:
0: jdbc:hive2://localhost:10000/> select * from w1 where year > 1949;
0: jdbc:hive2://localhost:10000/> select * from w1 where year >= 1949;
0: jdbc:hive2://localhost:10000/> select distinct year from w1;
You should notice that these queries take much longer to complete, because a MapReduce job runs this time.
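To know what the slower queries should return before waiting for the MapReduce jobs, the same filters can be reproduced locally with standard Unix tools. The sketch below is only a cross-check and not how Hive executes the queries. It uses a hypothetical scratch file, sample_years.txt, holding the first two fields of the five records, with the third field shortened to a placeholder.

```shell
# Hypothetical scratch file: first two fields of each record, third field abbreviated.
cat > sample_years.txt <<'EOF'
006701199099999,1950,rest-of-record
004301199099999,1950,rest-of-record
004301199099999,1950,rest-of-record
004301265099999,1949,rest-of-record
004301265099999,1949,rest-of-record
EOF

# Local analogue of: select * from w1 where year > 1949;
awk -F, '$2 > 1949' sample_years.txt

# Local analogue of: select * from w1 where year >= 1949;
awk -F, '$2 >= 1949' sample_years.txt

# Local analogue of: select distinct year from w1;
cut -d, -f2 sample_years.txt | sort -u
```

With this file, the first filter prints three records, the second prints all five, and the last command prints 1949 and 1950. The Hive results should have the same row counts.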
Next, let's try what we did in Lab 2: