Laboratory 5 - Hive - NoSQL

docx

School

Conestoga College *

*We aren’t endorsed by this school

Course

1058

Subject

Industrial Engineering

Date

Dec 6, 2023

Type

docx

Pages

Uploaded by MateJellyfishMaster4491

Laboratory Exercise 5 - HIVE In this lab you will leverage from the explanations provided during the Hive Theory Class and experience the Map Reduce Process through HIVE. Exercise 1 – Run the Below to obtain the maximum price of stocks DONE IMPORTANT: yourfirstname is not the name of the table – You must use your first name for this exercise. 1. Create a Table in Hive DONE CREATE EXTERNAL TABLE IF NOT EXISTS yourfirstname ( exch STRING, symbol STRING, ymd STRING, price_open FLOAT, price_high FLOAT, price_low FLOAT, price_close FLOAT, volume INT, price_adj_close FLOAT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION '/user/hirw/input/stocks'; 2. Verify if table was created DONE SHOW TABLES 3. Limit of 25 rows DONE SELECT * FROM yourfirstname LIMIT 25;  Capture one screenshot once the job runs – Screenshot 1 4. Create a Query that will display the maximum price of stock open, grouped by symbol DONE SELECT symbol, max(price_close) max_close FROM yourfirstname GROUP BY symbol;  Capture one screenshot once the job runs – Screenshot 2 5. Count the number of rows on the yourfirstname database? DONE SELECT COUNT(*) FROM yourfirstname  Capture one screenshot once the job runs – Screenshot 3

Laboratory Exercise 5 - HIVE Exercise 2 – Elaborate some query modifications DONE 1. Based on Step 4 on Exercise 1, modify the query to execute a job that obtains the minimum price at price open. DONE  Capture one screenshot once the job runs – Screenshot 4 2. Delete the table you created for both Exercise 1 and Exercise 2 DONE DROP Table yourfirstname Exercise 3 – Full Case Study Access the Hive terminal and follow the instructions for each case study below: Case Study: Sales Data Analysis In this lab, you will work with a dataset containing sales data for a company. The dataset includes information such as transaction ID, product name, quantity sold, and sales amount. Your goal is to perform basic data querying tasks using Apache Hive to gain insights from the sales data. Scenario: You have been provided with a dataset that contains sales data for a company. The dataset includes information about the transactions such as the transaction ID, product name, quantity sold, and sales amount. The company wants to analyze the sales data to identify the top-selling products, calculate the total sales amount, and find out the average quantity sold.  Important >> The database name should be unique, otherwise this is not going to work! Lab Steps: 1. Create a database: Start by creating a new database named salesdb to store the sales data. DONE Answer: CREATE DATABASE IF NOT EXISTS yourfirstname ; 2. Create a table: Create a table named sales with appropriate columns to hold the sales data. DONE Answer: CREATE TABLE IF NOT EXISTS sales ( transaction_id INT, product_name STRING, quantity_sold INT, sales_amount FLOAT );

Laboratory Exercise 5 - HIVE 3. Insert data: Insert the provided sales records into the table to populate it with data. DONE Answer: INSERT INTO sales VALUES (1, 'Product A', 10, 100.0), (2, 'Product B', 5, 75.0), (3, 'Product C', 8, 120.0), (4, 'Product A', 15, 200.0), (5, 'Product D', 3, 50.0), (6, 'Product C', 12, 180.0), (7, 'Product B', 6, 90.0), (8, 'Product A', 9, 150.0), (9, 'Product D', 4, 60.0), (10, 'Product B', 7, 105.0); 4. Basic data exploration: Perform basic queries to explore the sales data, such as selecting all records, selecting specific columns, and filtering records based on certain conditions. DONE Answers:  Select all records: SELECT * FROM sales;  Select specific columns (transaction_id and sales_amount): SELECT transaction_id, sales_amount FROM sales;  Filter records where quantity_sold is greater than 5: SELECT * FROM sales WHERE quantity_sold > 5;  Capture one screenshot once the job runs – Screenshot 5 5. Sorting data: Sort the sales data in ascending or descending order based on a specific column. DONE Answers:  Sort by sales_amount in descending order: SELECT * FROM sales ORDER BY sales_amount DESC;  Sort by quantity_sold in ascending order: SELECT * FROM sales ORDER BY quantity_sold ASC;  Capture one screenshot once the job runs – Screenshot 6 6. Aggregating data: Use aggregate functions like COUNT, SUM, MIN, and MAX to calculate various metrics on the sales data. DONE Answers:  Calculate the total sales amount:

Your preview ends here

Eager to read complete document? Join bartleby learn and gain access to the full version

Access to all documents
Unlimited textbook solutions
24/7 expert homework help

Laboratory Exercise 5 - HIVE SELECT SUM(sales_amount) AS total_sales_amount FROM sales;  Calculate the average quantity sold: SELECT AVG(quantity_sold) AS average_quantity_sold FROM sales;  Capture one screenshot once the job runs – Screenshot 7 7. Truncate table: Delete all the data from the sales table while keeping the table structure intact. DONE Answer: TRUNCATE TABLE sales; 8. Dropping a table: Drop the sales table to clean up the database. DONE Answer: DROP TABLE sales; 9. Delete Database DONE Answer: DROP DATABASE database_name CASCADE; *** Important: You should retain the seven screenshots in a single word file. Make sure the following happens: 1. All 7 screenshots should be in a single word file; DONE 2. Each screenshot must have a short description; DONE 3. Screenshots must follow the execution order above; DONE 4. Make sure to only put one screenshot in each page alongside its description; DONE 5. Insert page numbers; DONE 6. All names of databases must follow the script provided above ( yourfirstname means you need to use your first name as the name); DONE 7. If you submit where the name does not correspond to your first name. the exercise will not be considered for grading.

Laboratory 5 - Hive - NoSQL

Related Documents