Laboratory 5 - Hive - NoSQL
docx
keyboard_arrow_up
School
Conestoga College *
*We aren’t endorsed by this school
Course
1058
Subject
Industrial Engineering
Date
Dec 6, 2023
Type
docx
Pages
4
Uploaded by MateJellyfishMaster4491
Laboratory Exercise 5 - HIVE
In this lab you will leverage from the explanations provided during the Hive Theory Class and experience
the Map Reduce Process through HIVE.
Exercise 1 – Run the Below to obtain the maximum price of stocks
DONE
IMPORTANT: yourfirstname is not the name of the table – You must use your first name for
this exercise.
1.
Create a Table in Hive
DONE
CREATE EXTERNAL TABLE IF NOT EXISTS
yourfirstname
(
exch STRING,
symbol STRING,
ymd STRING,
price_open FLOAT,
price_high FLOAT,
price_low FLOAT,
price_close FLOAT,
volume INT,
price_adj_close FLOAT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/hirw/input/stocks';
2.
Verify if table was created
DONE
SHOW TABLES
3.
Limit of 25 rows
DONE
SELECT * FROM
yourfirstname
LIMIT 25;
Capture one screenshot once the job runs – Screenshot 1
4.
Create a Query that will display the maximum price of stock open, grouped by
symbol
DONE
SELECT symbol, max(price_close) max_close FROM
yourfirstname
GROUP BY symbol;
Capture one screenshot once the job runs – Screenshot 2
5.
Count the number of rows on the
yourfirstname
database?
DONE
SELECT COUNT(*) FROM
yourfirstname
Capture one screenshot once the job runs – Screenshot 3
Laboratory Exercise 5 - HIVE
Exercise 2 – Elaborate some query modifications
DONE
1.
Based on Step 4 on Exercise 1, modify the query to execute a job that obtains the
minimum price at price open.
DONE
Capture one screenshot once the job runs – Screenshot 4
2.
Delete the table you created for both Exercise 1 and Exercise 2
DONE
DROP Table yourfirstname
Exercise 3 – Full Case Study
Access the Hive terminal and follow the instructions for each case study below:
Case Study: Sales Data Analysis
In this lab, you will work with a dataset containing sales data for a company. The dataset
includes information such as transaction ID, product name, quantity sold, and sales amount.
Your goal is to perform basic data querying tasks using Apache Hive to gain insights from the
sales data.
Scenario:
You have been provided with a dataset that contains sales data for a company. The dataset
includes information about the transactions such as the transaction ID, product name, quantity
sold, and sales amount. The company wants to analyze the sales data to identify the top-selling
products, calculate the total sales amount, and find out the average quantity sold.
Important >> The database name should be unique, otherwise this is not going to
work!
Lab Steps:
1.
Create a database: Start by creating a new database named salesdb to store the sales
data.
DONE
Answer:
CREATE DATABASE IF NOT EXISTS
yourfirstname
;
2.
Create a table: Create a table named sales with appropriate columns to hold the sales
data.
DONE
Answer:
CREATE TABLE IF NOT EXISTS sales ( transaction_id INT, product_name STRING, quantity_sold
INT, sales_amount FLOAT );
Laboratory Exercise 5 - HIVE
3.
Insert data: Insert the provided sales records into the table to populate it with data.
DONE
Answer:
INSERT INTO sales VALUES (1, 'Product A', 10, 100.0), (2, 'Product B', 5, 75.0), (3, 'Product C', 8,
120.0), (4, 'Product A', 15, 200.0), (5, 'Product D', 3, 50.0), (6, 'Product C', 12, 180.0), (7, 'Product
B', 6, 90.0), (8, 'Product A', 9, 150.0), (9, 'Product D', 4, 60.0), (10, 'Product B', 7, 105.0);
4.
Basic data exploration: Perform basic queries to explore the sales data, such as
selecting all records, selecting specific columns, and filtering records based on certain
conditions.
DONE
Answers:
Select all records:
SELECT * FROM sales;
Select specific columns (transaction_id and sales_amount):
SELECT transaction_id, sales_amount FROM sales;
Filter records where quantity_sold is greater than 5:
SELECT * FROM sales WHERE quantity_sold > 5;
Capture one screenshot once the job runs – Screenshot 5
5.
Sorting data: Sort the sales data in ascending or descending order based on a specific
column.
DONE
Answers:
Sort by sales_amount in descending order:
SELECT * FROM sales ORDER BY sales_amount DESC;
Sort by quantity_sold in ascending order:
SELECT * FROM sales ORDER BY quantity_sold ASC;
Capture one screenshot once the job runs – Screenshot 6
6.
Aggregating data: Use aggregate functions like COUNT, SUM, MIN, and MAX to
calculate various metrics on the sales data.
DONE
Answers:
Calculate the total sales amount:
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
Laboratory Exercise 5 - HIVE
SELECT SUM(sales_amount) AS total_sales_amount FROM sales;
Calculate the average quantity sold:
SELECT AVG(quantity_sold) AS average_quantity_sold FROM sales;
Capture one screenshot once the job runs – Screenshot 7
7.
Truncate table: Delete all the data from the sales table while keeping the table
structure intact.
DONE
Answer:
TRUNCATE TABLE sales;
8.
Dropping a table: Drop the sales table to clean up the database.
DONE
Answer:
DROP TABLE sales;
9.
Delete Database
DONE
Answer:
DROP DATABASE database_name CASCADE;
***
Important: You should retain the seven screenshots in a single word file. Make sure the following
happens:
1.
All 7 screenshots should be in a single word file;
DONE
2.
Each screenshot must have a short description;
DONE
3.
Screenshots must follow the execution order above;
DONE
4.
Make sure to only put one screenshot in each page alongside its description;
DONE
5.
Insert page numbers;
DONE
6.
All names of databases must follow the script provided above (
yourfirstname
means you
need to use your first name as the name);
DONE
7.
If you submit where the name does not correspond to your first name. the exercise will
not be considered for grading.