School: University of Pittsburgh
Course: 15
Subject: Industrial Engineering
Date: Dec 6, 2023
Type: docx
Pages: 3
Uploaded by CorporalOkapi1224
IE 0015: Homework 1
Post date: 1/30/2023
Due date: Midnight on 2/17/2023
● Please record your answers in this Microsoft Word document. Also, please note who (if anyone) you worked with.
● In this homework you will import Twitter data into R.
● For this homework you will need the file twitter.csv that is posted on Canvas.
To get things started, import twitter.csv into a dataframe called twitter in R.
Problem 1
In this problem you will add a new variable called success_metric to the dataframe twitter. The
variable captures the “successfulness” of a tweet. Let’s define this new variable by
success_metric = 4*number_of_likes + 21*number_of_shares.
So tweets with more likes and shares have a higher success_metric value. Please report your
code below.
Answer:
*Code on last page*
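As a minimal sketch of why one line suffices here: arithmetic on data frame columns in R is vectorized, so the formula is applied row by row without a loop. The data frame `tweets` and its values below are hypothetical toy data; only the column names follow the homework.

```r
# Hypothetical toy data: column names follow the homework, values are made up
tweets <- data.frame(
  number_of_likes  = c(10, 0, 5),
  number_of_shares = c(1, 2, 0)
)

# Vectorized arithmetic: one success_metric value per row, no loop needed
tweets$success_metric <- 4 * tweets$number_of_likes + 21 * tweets$number_of_shares

print(tweets$success_metric)  # [1] 61 42 20
```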
Problem 2
Determine the average success_metric value of tweets authored by Taylor Swift. Please report
your answer and code below.
Answer:
Average success_metric of Taylor Swift’s tweets = 350921.5
*Code on last page*
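The subset-then-average pattern can be sketched on a small hypothetical data frame `toy` (made-up authors and values). The last line shows an equivalent form that indexes the column vector directly instead of the whole data frame.

```r
# Hypothetical toy data with made-up authors and success_metric values
toy <- data.frame(
  author         = c("taylorswift13", "other_author", "taylorswift13"),
  success_metric = c(100, 50, 300)
)

# Logical indexing keeps only the matching rows, then mean() averages them
ts_rows <- toy[toy$author == "taylorswift13", ]
mean(ts_rows$success_metric)  # 200

# Equivalent: index the column vector directly
mean(toy$success_metric[toy$author == "taylorswift13"])  # 200
```

`subset(toy, author == "taylorswift13")` selects the same rows; unlike bracket indexing, it drops rows where the comparison evaluates to `NA`.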
Problem 3
What proportion of Taylor Swift’s tweets contain more than 60 characters? Hint: Use the function
nchar().
Answer:
Proportion = 0.77 (1567 of 2029 tweets)
*Code on last page*
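Instead of reading counts off `table()` and dividing by hand, the proportion can be computed directly: R treats `TRUE` as 1 and `FALSE` as 0, so the mean of a logical vector is the share of `TRUE` values. The strings below are hypothetical; the 60-character one shows that the strict `>` does not count a tweet of exactly 60 characters.

```r
# Hypothetical tweet texts of length 11, 61, 60 and 80 characters
content <- c("short tweet", strrep("x", 61), strrep("y", 60), strrep("z", 80))

long <- nchar(content) > 60   # FALSE TRUE FALSE TRUE
prop_long <- mean(long)       # TRUE counts as 1, FALSE as 0
prop_long                     # 0.5
```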
Problem 4
In this problem we will determine how many tweets contain more than 60 characters. Note that
we are interested in all tweets, not just Taylor Swift’s tweets. Of course, this is an “easier”
problem than Problem 3, but you must solve it in two different ways:
a. First, use just one line of code.
b. Next, use a for loop and an if statement (nested in the for loop).
Please report your code below.
Answer:
38571 tweets have more than 60 characters.
*Code on last page*
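A sketch of the two required approaches on a hypothetical three-element vector, with a check that they agree: part (a) sums the logical vector, while part (b) increments a counter inside the loop.

```r
# Hypothetical tweet texts of length 2, 61 and 100 characters
content <- c("hi", strrep("a", 61), strrep("b", 100))

# (a) one line: sum() counts the TRUE values
count_a <- sum(nchar(content) > 60)

# (b) for loop with an if statement nested inside it
count_b <- 0
for (tweet in content) {
  if (nchar(tweet) > 60) {
    count_b <- count_b + 1
  }
}

stopifnot(count_a == count_b)  # both approaches give 2
```

The vectorized form in (a) is the idiomatic choice in R; the loop in (b) gives the same count but is slower on large data.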
Problem 5
Write a function that takes a vector of strings (characters in R) as input, computes the number of
hashtags (#) in each string, and then returns the average number of hashtags as output.
The function str_count() in the package stringr will be useful for this problem. For a vector of
strings x and a character c (think c = "#"), calling str_count(x, c) returns a vector containing the
number of times c occurs in each string in x.
Please report your code below.
Answer:
Average number of hashtags per tweet = 0.49
*Code on last page*
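If stringr is not available, the same function can be sketched in base R. This swaps `str_count()` for `gregexpr()` plus `regmatches()`, an alternative technique rather than what the problem asks for; the helper names `count_hashtags` and `average_hashtag_base` are hypothetical.

```r
# Count "#" in each string using base R (stand-in for stringr::str_count(x, "#"))
count_hashtags <- function(x) {
  # gregexpr() locates every match; fixed = TRUE treats "#" as a literal character
  matches <- regmatches(x, gregexpr("#", x, fixed = TRUE))
  # regmatches() returns character(0) for strings with no match, so lengths() gives 0
  lengths(matches)
}

average_hashtag_base <- function(x) {
  mean(count_hashtags(x))
}

x <- c("no tags", "#one", "#two #three", "#a #b #c")
count_hashtags(x)        # 0 1 2 3
average_hashtag_base(x)  # 1.5
```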
Problem 6
Use the function tapply() and the function you developed in Problem 5 to answer the following
questions.
a. Which author uses the most hashtags on average?
b. Which author uses the fewest hashtags on average?
c. What is the average number of hashtags used (across authors)?
Answer:
a. Rihanna
b. CNNbrk
c. 0.4877
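A sketch of the `tapply()` step on hypothetical toy data, using a base-R stand-in for the Problem 5 function so the block runs without stringr. `which.max()` and `which.min()` pick out the authors for parts (a) and (b). The last two lines show that averaging the per-author means (part c) weights every author equally, while averaging over all tweets weights every tweet equally, so the two numbers generally differ when authors post different numbers of tweets.

```r
# Hypothetical toy data: three made-up authors with different tweet counts
toy <- data.frame(
  author  = c("A", "A", "B", "C", "C", "C"),
  content = c("#x #y", "#z", "none", "#q", "none", "none")
)

# Base-R stand-in for the stringr-based function from Problem 5
average_hashtag <- function(x) {
  mean(lengths(regmatches(x, gregexpr("#", x, fixed = TRUE))))
}

# tapply() splits content by author and applies the function to each group
by_author <- tapply(toy$content, toy$author, average_hashtag)
by_author                     # A = 1.5, B = 0, C = 0.333...

names(which.max(by_author))   # "A" (most hashtags on average)
names(which.min(by_author))   # "B" (fewest hashtags on average)

mean(by_author)               # 0.611..., each author weighted equally
average_hashtag(toy$content)  # 0.666..., each tweet weighted equally
```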
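# R code for Problems 1-6
twitter = read.csv("twitter.csv")

#problem 1
twitter$success_metric = 4*twitter$number_of_likes + 21*twitter$number_of_shares

#problem 2
taylor_swift = twitter[twitter$author == "taylorswift13",]
mean(taylor_swift$success_metric)

#problem 3
taylor_swift$greater_than_60 = nchar(taylor_swift$content) > 60
table(taylor_swift$greater_than_60)
# share of TRUE values (1567 of 2029 in the table above)
proportion = mean(taylor_swift$greater_than_60)
proportion

#problem 4
# (a) one line: sum() counts the tweets with more than 60 characters
sum(nchar(twitter$content) > 60)
# (b) for loop with a nested if statement
over_60 = 0
for (i in twitter$content) {
  if (nchar(i) > 60) {
    over_60 = over_60 + 1
  }
}
over_60

#problem 5
# install.packages("stringr")  # run once if stringr is not installed
library(stringr)
average_hashtag = function(x) {
  # number of "#" characters in each string
  hashtag_occurrence = str_count(x, "#")
  return(mean(hashtag_occurrence))
}
average_hashtag(twitter$content)

#problem 6
# average number of hashtags per tweet, for each author
hashtags_by_author = tapply(twitter$content, twitter$author, average_hashtag)
names(which.max(hashtags_by_author))  # (a) most hashtags on average
names(which.min(hashtags_by_author))  # (b) fewest hashtags on average
mean(hashtags_by_author)              # (c) average across authors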