MIS 6346 BIG DATA
Assignment 2 - Nov. 6, 2023
Shalini Vijayaraghavan / SXV220089
Instructor: Dr. Jerry F. Perez
Table of Contents

Lab Chapter 6: Create and Populate Tables in Impala or Hive
    Create and Query a Table in Impala or Hive
    Use Sqoop to Import Directly into Hive or Impala
Lab Chapter 7: Select a Format for a Data File
Lab Chapter 8: Partition Data in Impala or Hive
Lab Chapter 9: Collect Web Server Logs with Flume
    Create an HDFS Directory for Flume to save ingested data
    Create a local directory for web server log output
    Configure Flume
    Run the Agent
    Simulate Apache web server output
Lab Chapter 6: Create and Populate Tables in Impala or Hive

Create and Query a Table in Impala or Hive

1. I decided to use Impala, so I started the required services by typing the following commands in the terminal window:

    $ sudo service zookeeper-server start
    $ sudo service hive-server2 start

2. The Hue page was then opened in Chrome by clicking on the Hue icon, which took me to the Hue homepage. The login credentials were:

    Login ID: training
    Password: training
3. Then I opened the query editor menu and selected "Hive" from the pull-down menu, which took me to the Hive query editor.

4. In the query editor pane, I entered a SQL command to create a table for the webpage data imported in the previous exercise:

    CREATE EXTERNAL TABLE webpage
        (page_id SMALLINT, name STRING, assoc_files STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/loudacre/webpage';
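As an optional sanity check not part of the original steps, the table's LOCATION can also be listed from the command line to confirm that the previously imported data files are where the table expects them:

    $ hdfs dfs -ls /loudacre/webpage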
5. The "Execute" button was pressed to run the query. To see the table that was just created, I pressed the "refresh" button next to "Database" on the left-hand side of the page. I could then find the webpage table with the columns page_id, name, and assoc_files.
6. The "New Query" button was pressed to execute a test query (sketched below), and I could see the results by clicking on the "Results" tab of the panel.

7. Then I previewed the sample data by clicking on the preview-sample-data icon next to the table name.
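The test query itself appears only as a screenshot in the original; a minimal query matching the description, offered here as an assumption, would be:

    SELECT * FROM webpage LIMIT 10;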
Use Sqoop to Import Directly into Hive or Impala

8. Next, I went to the terminal and imported the device table directly into the Hive metastore. The command I entered to do this was:

    $ sqoop import \
        --connect jdbc:mysql://localhost/loudacre \
        --username training --password training \
        --fields-terminated-by '\t' \
        --table device \
        --hive-import

9. Using the HDFS command line, I went to the data location /user/hive/warehouse/device to view the imported data files (listing command sketched below). This is the default Hive warehouse directory, where the imported data was placed.
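The exact listing command is not shown in the original; the standard way to inspect that warehouse directory would be:

    $ hdfs dfs -ls /user/hive/warehouse/device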
10. I opened the part-m-00000 file to view the contents that were copied into the Hive warehouse.

11. Because I am using Impala, I refreshed the Impala metadata cache by entering the following command in the Hue Impala query editor:

    INVALIDATE METADATA
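For context: INVALIDATE METADATA reloads metadata for all tables, which is appropriate here because the device table is new to Impala. When only the data files of a table Impala already knows about have changed, the cheaper per-table statement is:

    REFRESH device;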
12. To see all the columns of the device table, I ran a test query:

    SELECT * FROM device LIMIT 10;
Lab Chapter 7: Select a Format for a Data File

1. First, I changed to the Lab 7 exercise directory using the following command in the terminal window:

    $ cd $DEV1/exercises/data-format/

2. Then I imported the accounts table in Avro data format using the following command:

    $ sqoop import \
        --connect jdbc:mysql://localhost/loudacre \
        --username training --password training \
        --table accounts \
        --target-dir /loudacre/accounts_avro \
        --null-non-string '\\N' \
        --as-avrodatafile
3. Then I went to the file browser and viewed the files Sqoop had imported into HDFS.

4. Then I viewed the contents of the file part-m-00000.avro.
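Avro files are binary, so the raw view in Hue is only partially readable. Assuming the avro-tools wrapper is available (as it typically is on the Cloudera training VM), the file can also be inspected from the command line after copying it locally:

    $ hdfs dfs -get /loudacre/accounts_avro/part-m-00000.avro .
    $ avro-tools getschema part-m-00000.avro          # print the embedded Avro schema
    $ avro-tools tojson part-m-00000.avro | head      # view the records as JSON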
5. I could see the file called sqoop_import_accounts.avsc in the current directory.

6. Then I viewed the contents of the sqoop_import_accounts.avsc file. Below is a screenshot of the file view:
7. Then I navigated to the loudacre directory in the Hue file browser and clicked the upload button. A dialog box with an option to select files appeared.

8. I chose the file sqoop_import_accounts.avsc to upload.
9. Now I can see that the file has been uploaded to the loudacre directory in HDFS.

10. In Impala, I then created the table accounts_avro using this query:

    CREATE EXTERNAL TABLE accounts_avro
    STORED AS AVRO
    LOCATION '/loudacre/accounts_avro'
    TBLPROPERTIES
        ('avro.schema.url'='hdfs:/loudacre/sqoop_import_accounts.avsc');
11. Then I ran a test query to check whether the table was created:

    SELECT * FROM accounts_avro LIMIT 10

12. Then, optionally, I used the DESCRIBE FORMATTED command to list the columns and data types of the accounts_avro table created from the Avro schema.
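The full statement for step 12, matching the command named above, is simply:

    DESCRIBE FORMATTED accounts_avro;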
Lab Chapter 8: Partition Data in Impala or Hive

1. First, I created a new empty table in Hive using a CREATE EXTERNAL TABLE statement:

    CREATE EXTERNAL TABLE accounts_by_areacode (
        acct_num INT,
        first_name STRING,
        last_name STRING,
        phone_number STRING)
    PARTITIONED BY (areacode STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    LOCATION '/loudacre/accounts_by_areacode';
2. To populate the new table, I needed to extract the area code from each phone number, so I executed a SELECT query to do this (shown in the screenshot).

3. Then I used that SELECT statement in an INSERT INTO TABLE command to copy the specified columns to the new table, dynamically partitioning by area code. See the screenshot, and the sketch of both statements below:
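The original shows these two statements only as screenshots. A sketch of what they likely looked like, assuming the source table is accounts and the area code is the first three digits of phone_number (both of which are assumptions here):

    -- Extract the area code (assumed: first three digits of phone_number)
    SELECT acct_num, first_name, last_name, phone_number,
           SUBSTR(phone_number, 1, 3) AS areacode
    FROM accounts;

    -- Reuse the SELECT to populate the table with dynamic partitioning;
    -- the partition column must come last in the SELECT list.
    -- (If run in Hive rather than Impala, this may also require:
    --  SET hive.exec.dynamic.partition.mode=nonstrict;)
    INSERT INTO TABLE accounts_by_areacode PARTITION (areacode)
    SELECT acct_num, first_name, last_name, phone_number,
           SUBSTR(phone_number, 1, 3) AS areacode
    FROM accounts;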
4. Then I ran a test query to check that the table was populated:

    SELECT * FROM accounts_by_areacode LIMIT 10;

We can see in the screenshot that the table was populated properly.

5. Using Hue, I confirmed that the directory structure of the accounts_by_areacode table includes partition subdirectories, using the query:

    SHOW PARTITIONS accounts_by_areacode;
Lab Chapter 9: Collect Web Server Logs with Flume

Create an HDFS Directory for Flume to save ingested data

1. I created a directory in HDFS called /loudacre/weblogs to store the data files that Flume ingests, using the following command in the terminal:

    $ hdfs dfs -mkdir /loudacre/weblogs
2. Now, in the file browser in Hue, I can see the new directory named weblogs under /loudacre.

Create a local directory for web server log output

3. Next I created the spool directory into which the web log simulator will store data files for Flume to ingest: /flume/weblogs_spooldir on the local filesystem (the command is sketched after step 4).

4. To view the created directory, I navigated to /flume/weblogs_spooldir. See the screenshot below:
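The original does not show the command used to create the spool directory; since /flume sits at the root of the local filesystem, it presumably required sudo. A sketch, with the exact form an assumption:

    $ sudo mkdir -p /flume/weblogs_spooldir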
5. Then I gave all users permission to write to the /flume/weblogs_spooldir directory by running the following command:

    $ sudo chmod a+w -R /flume
Configure Flume

Run the Agent

6. Then I changed to the ~/training_materials/dev1/exercises/flume directory using the command:

    $ cd ~/training_materials/dev1/exercises/flume
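The Flume configuration reviewed in the "Configure Flume" step appears only as screenshots in the original. As a reference, a minimal sketch of a spooling-directory-to-HDFS configuration in Flume's properties format, with the agent name (agent1) and all component names being assumptions:

    agent1.sources = src1
    agent1.channels = ch1
    agent1.sinks = sink1

    # Spooling-directory source watching the local spool directory
    agent1.sources.src1.type = spooldir
    agent1.sources.src1.spoolDir = /flume/weblogs_spooldir
    agent1.sources.src1.channels = ch1

    # In-memory channel
    agent1.channels.ch1.type = memory

    # HDFS sink writing into the directory created earlier
    agent1.sinks.sink1.type = hdfs
    agent1.sinks.sink1.hdfs.path = /loudacre/weblogs
    agent1.sinks.sink1.hdfs.fileType = DataStream
    agent1.sinks.sink1.channel = ch1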
7. Then I started the Flume agent using the configuration that was created (command sketched below).

8. I then saw the message "web server log source started" in the command line.
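The agent-start command appears only as a screenshot in the original. A sketch using the standard flume-ng CLI, where the configuration file name (spooldir.conf) and agent name (agent1) are assumptions:

    $ flume-ng agent --conf /etc/flume-ng/conf \
        --conf-file spooldir.conf \
        --name agent1 \
        -Dflume.root.logger=INFO,console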
Simulate Apache web server output

9. I opened a separate terminal window and changed to the exercise directory. I then ran the provided script to place the weblog files in the /flume/weblogs_spooldir directory (a sketch of the invocation follows below).

10. Then I went back to the terminal that was running the Flume agent. The following output was observed; it gave information about the files Flume was putting into HDFS:

11. Then I pressed CTRL+C to terminate the process and got the output below.
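The script itself is not named in the text (it appears only in a screenshot), so the invocation below is a hypothetical reconstruction: both the script name and its argument are assumptions.

    $ cd ~/training_materials/dev1/exercises/flume
    $ ./copy_move_weblogs.sh /flume/weblogs_spooldir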
12. Then I opened the Hue browser and navigated to the /loudacre/weblogs directory, where I could see the files newly imported into HDFS by Flume.
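Equivalently, the ingested files could be listed from the command line (not part of the original steps):

    $ hdfs dfs -ls /loudacre/weblogs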