CSCI-GA.2436-001 fall24 lab4

Lab 4
Introduction

In this lab, you will try out Hive and Trino on NYU's Dataproc cluster.

Objectives

  • Familiarize yourself with SQL.
  • Practice basic Hive and Trino commands.
  • Learn how to use Hive and Trino to perform data analytics.

Preparation

You need to run this lab in NYU's Dataproc cluster. Make sure you have access to it.

Reading

Please read Chapter 17 (pages 471-475, 478-493, 500-503, 505-507,
510-515) of Hadoop: The Definitive Guide, 4th Edition (NYU students can read it online for free).

Tasks

Create input data

First, create a directory on HDFS:
hadoop fs -mkdir hiveInput
Then, paste the following data into a new file named smallWeather1.txt:
006701199099999,1950,051507004+68750+023550FM-12+038299999V0203301N0067122000
004301199099999,1950,051512004+68750+023550FM-12+038299999V0203201N0067122000
004301199099999,1950,051518004+68750+023550FM-12+038299999V0203201N0026122000
004301265099999,1949,032412004+62300+010750FM-12+048599999V0202701N0046122000
004301265099999,1949,032418004+62300+010750FM-12+048599999V0202701N0046122000

Next, put the data on HDFS:

hadoop fs -put smallWeather1.txt hiveInput
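
Before querying the data, it can help to confirm that the pasted file is well formed and that the upload succeeded. The sketch below assumes the file is named smallWeather1.txt and that every record has exactly three comma-separated fields; the helper names count_bad_lines and list_years are made up for illustration.

```shell
#!/bin/sh
# Sanity-check the input file locally, then confirm it is on HDFS.
# Assumptions: the file name smallWeather1.txt and the three-field,
# comma-separated record layout. Helper names are hypothetical.

# Print the number of lines that do NOT have exactly 3 comma-separated fields.
count_bad_lines() {
    awk -F',' 'NF != 3 { bad++ } END { print bad + 0 }' "$1"
}

# Print the distinct values of the second field (the year), sorted.
list_years() {
    cut -d',' -f2 "$1" | sort -u
}

input="smallWeather1.txt"
if [ -f "$input" ]; then
    echo "malformed lines: $(count_bad_lines "$input")"
    echo "years found:"
    list_years "$input"
fi

# Inspect the uploaded copy only where the Hadoop client is available.
if command -v hadoop >/dev/null 2>&1; then
    hadoop fs -ls hiveInput
    hadoop fs -cat "hiveInput/$input"
fi
```

If the count of malformed lines is not 0, a line was probably split or merged while pasting, and the table columns will not line up with the data.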

Connect to Hive

Connect to the Hive shell:
beeline -u jdbc:hive2://localhost:10000
Now, you are in the Hive shell. First, set the execution engine to MapReduce:

0: jdbc:hive2://localhost:10000/> set hive.execution.engine=mr;

0: jdbc:hive2://localhost:10000/> set hive.fetch.task.conversion=minimal;

Then, select the database that has already been created for you; the database name is yourNetID_nyu_edu.
0: jdbc:hive2://localhost:10000/> use yourNetID_nyu_edu;
You need to run the above commands each time you connect to Hive.
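
Since these statements have to be repeated for every session, one option is to keep them in an init file and let Beeline run it at connect time. This is only a sketch: the file name hive_init.sql is hypothetical, yourNetID_nyu_edu must be replaced with your own database name, and it assumes the cluster's Beeline accepts the -i (init file) option.

```shell
#!/bin/sh
# Store the per-session settings in an init file so they do not
# have to be retyped on every connection.
# Assumptions: hive_init.sql is a hypothetical file name, and
# yourNetID_nyu_edu stands in for your own database name.

init_file="hive_init.sql"

cat > "$init_file" <<'EOF'
set hive.execution.engine=mr;
set hive.fetch.task.conversion=minimal;
use yourNetID_nyu_edu;
EOF

# Connect only where the beeline client is installed (i.e. on the cluster).
# -u gives the JDBC URL; -i runs the init file before the prompt appears.
if command -v beeline >/dev/null 2>&1; then
    beeline -u jdbc:hive2://localhost:10000 -i "$init_file"
fi
```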

Create an external Hive table

You can use the following command to show the tables you have created:

0: jdbc:hive2://localhost:10000/> show tables;
Next, create an external table with our weather data:

0: jdbc:hive2://localhost:10000/> create external table w1 (data1 string, yea

....  ........> row format delimited fields terminated by
....  ........> location '/user/yourNetID_nyu_edu/hiveInput
Verify that this table has been created:
0: jdbc:hive2://localhost:10000/> show tables;
0: jdbc:hive2://localhost:10000/> describe formatted w1;
You should see it's an external table on HDFS.
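
For reference, one way the full table definition might read is sketched below as a script file run through Beeline. Several details are assumptions inferred from the comma-separated input rather than taken from the lab text: the table name w1, the column list data1/year/data2, the int type for year, and ',' as the field delimiter. The file name create_w1.hql is hypothetical as well.

```shell
#!/bin/sh
# Hypothetical complete form of the external table definition.
# Assumptions (not confirmed by the lab text): table name w1, columns
# data1/year/data2, year stored as int, and ',' as the field delimiter.

ddl_file="create_w1.hql"   # hypothetical file name

cat > "$ddl_file" <<'EOF'
use yourNetID_nyu_edu;
create external table if not exists w1 (data1 string, year int, data2 string)
row format delimited fields terminated by ','
location '/user/yourNetID_nyu_edu/hiveInput';
describe formatted w1;
EOF

# Run the file only where the beeline client is installed (i.e. on the cluster).
if command -v beeline >/dev/null 2>&1; then
    beeline -u jdbc:hive2://localhost:10000 -f "$ddl_file"
fi
```

Because the table is external, dropping it later removes only the table metadata; the files under hiveInput stay on HDFS.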
View your data using HiveQL queries
Try the following queries:
0: jdbc:hive2://localhost:10000/> select * from w1;
0: jdbc:hive2://localhost:10000/> select * from w1 limit 2;
0: jdbc:hive2://localhost:10000/> select year from w1;
The above queries should be very fast. Next, try these queries:
0: jdbc:hive2://localhost:10000/> select * from w1 where year > 1949;
0: jdbc:hive2://localhost:10000/> select * from w1 where year >= 1949;
0: jdbc:hive2://localhost:10000/> select distinct year from w1;
You should notice that these queries take much longer to complete, because a MapReduce job runs this time.
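
To see why some queries are fast and others are not, you can ask Hive for the query plan instead of running the query. The sketch below assumes a table named w1 with a year column in yourNetID_nyu_edu; the file name explain_w1.hql is hypothetical. With hive.fetch.task.conversion set to minimal, one would expect the plan for the first query to contain only a fetch stage, and the plan for the second to include a MapReduce stage.

```shell
#!/bin/sh
# Print query plans rather than executing the queries.
# Assumptions: table w1 with a year column exists in yourNetID_nyu_edu;
# explain_w1.hql is a hypothetical file name.

plan_file="explain_w1.hql"

cat > "$plan_file" <<'EOF'
set hive.execution.engine=mr;
set hive.fetch.task.conversion=minimal;
use yourNetID_nyu_edu;
explain select * from w1 limit 2;
explain select distinct year from w1;
EOF

# Run the file only where the beeline client is installed (i.e. on the cluster).
if command -v beeline >/dev/null 2>&1; then
    beeline -u jdbc:hive2://localhost:10000 -f "$plan_file"
fi
```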
Next, let's try what we did in Lab 2:
