IE 0015_ Homework 1 2022

School: University of Pittsburgh
Course: 15
Subject: Industrial Engineering
Date: Dec 6, 2023
Pages: 3

IE 0015: Homework 1
Post date: 1/30/2023
Due date: Midnight on 2/17/2023

Please record your answers in this Microsoft Word document. Also, please note who (if anyone) you worked with.

In this homework you will import Twitter data into R. You will need the file twitter.csv that is posted on Canvas. To get things started, import twitter.csv into a dataframe called twitter in R.

Problem 1
In this problem you will add a new variable called success_metric to the dataframe twitter. The variable captures the “successfulness” of a tweet. Let’s define this new variable by success_metric = 4*number_of_likes + 21*number_of_shares, so tweets with more likes and shares have a higher success_metric value. Please report your code below.

Answer: *Code on last page*

Problem 2
Determine the average success_metric value of tweets authored by Taylor Swift. Please report your answer and code below.

Answer: average success metric of Taylor Swift’s tweets = 350921.5 *Code on last page*

Problem 3
What proportion of Taylor Swift’s tweets contain more than 60 characters? Hint: Use the function nchar().

Answer: proportion = 0.77 *Code on last page*

Problem 4
In this problem we will determine how many tweets contain more than 60 characters. Note that we are interested in all tweets, not just Taylor Swift’s tweets. Of course this is an “easier” problem than Problem 3, but you must solve it in two different ways:
a. First use just one line of code.
b. Next use a for loop and an if statement (nested in the for loop).
Please report your code below.

Answer: 38571 tweets have more than 60 characters *Code on last page*

Problem 5
Write a function that takes a vector of strings (characters in R) as input, computes the number of hashtags (#) in each string, and then returns the average number of hashtags as output. The function str_count() in the package stringr will be useful for this problem. For a vector of strings x and a character c (think c = #), calling str_count(x, "c") returns a vector containing the number of times c occurs in each string in x. Please report your code below.

Answer: average hashtag occurrence = 0.49 *Code on last page*

Problem 6
Use the function tapply() and the function you developed in Problem 5 to answer the following questions.
a. Which author uses the most hashtags on average?
b. Which author uses the fewest hashtags on average?
c. What is the average number of hashtags used (across authors)?

Answer:
a. Rihanna
b. CNNbrk
c. 0.4877

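The logic behind Problems 1, 3 and 4 can be checked by hand on a tiny data frame before touching the real file. The sketch below is only an illustration: the data frame toy and its values are made up, and the column name content is assumed to match the one used for the tweet text in twitter.csv.

```r
# Toy stand-in for twitter.csv (hypothetical values, assumed column names)
toy = data.frame(
  content          = c("short tweet", strrep("x", 61), strrep("y", 60)),
  number_of_likes  = c(10, 0, 3),
  number_of_shares = c(2, 1, 0),
  stringsAsFactors = FALSE
)

# Problem 1 formula: 4 * likes + 21 * shares
# First row by hand: 4*10 + 21*2 = 82
toy$success_metric = 4 * toy$number_of_likes + 21 * toy$number_of_shares

# Problem 4a style: count tweets longer than 60 characters in one line
count_vectorized = sum(nchar(toy$content) > 60)

# Problem 4b style: the same count with a for loop and a nested if
count_loop = 0
for (tweet in toy$content) {
  if (nchar(tweet) > 60) {
    count_loop = count_loop + 1
  }
}

# Problem 3 style: a proportion is the mean of a logical vector
proportion_over_60 = mean(nchar(toy$content) > 60)
```

Only the 61-character string counts, because a tweet of exactly 60 characters does not satisfy "more than 60". Both counting approaches therefore give 1, and the proportion is 1/3.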
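twitter = read.csv("twitter.csv", stringsAsFactors = FALSE)

# Problem 1: add the success metric column
twitter$success_metric = 4 * twitter$number_of_likes + 21 * twitter$number_of_shares

# Problem 2: average success metric of Taylor Swift's tweets
taylor_swift = twitter[twitter$author == "taylorswift13", ]
mean(taylor_swift$success_metric)

# Problem 3: proportion of Taylor Swift's tweets with more than 60 characters
taylor_swift$greater_than_60 = nchar(taylor_swift$content) > 60
table(taylor_swift$greater_than_60)
# Mean of a logical vector; replaces the hard-coded 1567/2029 read off the table
proportion = mean(taylor_swift$greater_than_60)

# Problem 4a: one line of code
more_than_60_a = sum(nchar(twitter$content) > 60)

# Problem 4b: for loop with a nested if statement
over_60 = 0
for (i in twitter$content) {
  if (nchar(i) > 60) {
    over_60 = over_60 + 1
  }
}
over_60

# Problem 5: average number of hashtags per string
# install.packages("stringr")  # run once if stringr is not installed
library(stringr)
average_hashtag = function(x) {
  hashtag_occurrence = str_count(x, "#")
  return(mean(hashtag_occurrence))
}
average_hashtag(twitter$content)

# Problem 6: average number of hashtags per author
hashtags_by_author = tapply(twitter$content, twitter$author, average_hashtag)
hashtags_by_author
names(which.max(hashtags_by_author))  # 6a: most hashtags on average
names(which.min(hashtags_by_author))  # 6b: fewest hashtags on average
mean(hashtags_by_author)              # 6c: average across authors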
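The hashtag averaging in Problems 5 and 6 can be sanity-checked on a small made-up example. The sketch below is an illustration under stated assumptions, not the submitted solution: the data frame toy is hypothetical, and count_hashtags is a base-R stand-in for stringr's str_count(x, "#") so that the example runs without installing any package.

```r
# Toy data (hypothetical; not the real twitter.csv)
toy = data.frame(
  author  = c("a", "a", "a", "b"),
  content = c("#one #two", "no tags", "plain", "##x"),
  stringsAsFactors = FALSE
)

# Base-R stand-in for stringr::str_count(x, "#"):
# find every literal "#" in each string and count the matches
count_hashtags = function(x) {
  lengths(regmatches(x, gregexpr("#", x, fixed = TRUE)))
}

# Same shape as the Problem 5 function: mean hashtag count over a vector
average_hashtag_base = function(x) {
  mean(count_hashtags(x))
}

counts    = count_hashtags(toy$content)          # per-tweet counts: 2, 0, 0, 2
overall   = average_hashtag_base(toy$content)    # average over all tweets
by_author = tapply(toy$content, toy$author, average_hashtag_base)

most_hashtags   = names(which.max(by_author))    # author with the highest average
fewest_hashtags = names(which.min(by_author))    # author with the lowest average
across_authors  = mean(by_author)                # average of the per-author averages
```

Here the average over all tweets is 1, while the average of the per-author averages is (2/3 + 2) / 2 = 4/3. The two quantities differ whenever authors have different numbers of tweets, so the Problem 5 figure and the Problem 6c figure measure different things.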