CODING LANGUAGE: R

R is a programming language for statistical computing and graphics supported by the R Core Team and the R Foundation for Statistical Computing.

The Bechdel Test
The Bechdel test asks whether a work of fiction features at least two women who talk to each other
about something other than a man; both women must be named characters.
In this mini analysis we work with the data used in the FiveThirtyEight story titled:
"The Dollar-And-Cents Case Against Hollywood's Exclusion of Women"
https://fivethirtyeight.com/features/the-dollar-and-cents-case-against-hollywoods-exclusion-of-women/
Start by loading the packages: fivethirtyeight, tidyverse
1. What information does this dataset contain? What commands did you use to see this?
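One way to approach question 1 is sketched below. It assumes the fivethirtyeight and tidyverse packages are already installed; `glimpse()`, `names()`, and `dim()` are just one reasonable choice of inspection commands, not the only one.

```r
# Load the packages named in the exercise (assumes both are installed).
library(fivethirtyeight)
library(tidyverse)

# The bechdel data frame ships with the fivethirtyeight package.
glimpse(bechdel)   # column names, types, and the first few values
names(bechdel)     # just the column names
dim(bechdel)       # number of rows and columns

# The package help page describes each variable.
# ?bechdel
```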
For our analysis, we will focus on movies released between 1990 and 2013.
bechdel90_13 <- bechdel %>%
  filter(between(year, 1990, 2013))
2. How many movies are in our filtered data set?
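Counting rows answers question 2. The sketch below uses a small made-up table (`toy`, not the real data) to show that `between()` includes both endpoints; on the real data the equivalent call would be `nrow(bechdel90_13)`.

```r
library(dplyr)

# Hypothetical four-row table standing in for the real data.
toy <- tibble(
  title = c("A", "B", "C", "D"),
  year  = c(1985, 1990, 2001, 2013)
)

# between() is inclusive on both ends, so 1990 and 2013 are kept.
toy90_13 <- toy %>%
  filter(between(year, 1990, 2013))

n_movies <- nrow(toy90_13)
n_movies
```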
The financial variables we'll focus on are the following:
- `budget_2013`: Budget in 2013 inflation adjusted dollars
- `domgross_2013`: Domestic gross (US) in 2013 inflation adjusted dollars
- `intgross_2013`: Total International (i.e., worldwide) gross in 2013 inflation adjusted dollars
And we'll also use the `binary` and `clean_test` variables for **grouping**.
Let's take a look at how median budget and gross vary by whether the movie passed the Bechdel test,
which is stored in the `binary` variable.
bechdel90_13 %>%
  group_by(binary) %>%
  summarise(med_budget = median(budget_2013),
            med_domgross = median(domgross_2013, na.rm = TRUE),
            med_intgross = median(intgross_2013, na.rm = TRUE))
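The `na.rm = TRUE` arguments are there presumably because the gross columns contain missing values, and `median()` returns `NA` as soon as a group contains one. A minimal sketch on made-up numbers (the `toy` table is hypothetical):

```r
library(dplyr)

# Hypothetical data: one FAIL movie has a missing domestic gross.
toy <- tibble(
  binary        = c("FAIL", "FAIL", "FAIL", "PASS", "PASS"),
  domgross_2013 = c(10, NA, 30, 20, 40)
)

toy_summary <- toy %>%
  group_by(binary) %>%
  summarise(
    med_with_na = median(domgross_2013),               # NA for the FAIL group
    med_na_rm   = median(domgross_2013, na.rm = TRUE)  # missing value dropped first
  )

toy_summary
```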

 
 

Next, let us take a look at how median budget and gross vary by a more detailed indicator of the
Bechdel test result.
This information is stored in the `clean_test` variable, which takes on the following values:
- `ok` = passes test
- `dubious`
- `men` = women only talk about men
- `notalk` = women don't talk to each other
- `nowomen` = fewer than two women
bechdel90_13 %>%
  group_by(clean_test) %>%
  summarise(med_budget = median(budget_2013),
            med_domgross = median(domgross_2013, na.rm = TRUE),
            med_intgross = median(intgross_2013, na.rm = TRUE))
To evaluate how return on investment varies among movies that pass and fail the Bechdel test,
we'll first create a new variable called `roi` as the ratio of gross (international plus domestic) to budget.
bechdel90_13 <- bechdel90_13 %>%
  mutate(roi = (intgross_2013 + domgross_2013) / budget_2013)
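A worked example with made-up numbers may help when reading `roi`. Because `intgross_2013` is described above as the worldwide total, adding `domgross_2013` counts the domestic gross twice. The sketch compares that definition with a worldwide-gross-only ratio (`roi_worldwide` is a hypothetical name, not part of the exercise).

```r
library(dplyr)

# Hypothetical single movie, amounts in arbitrary units.
toy <- tibble(
  title         = "Hypothetical movie",
  budget_2013   = 10,
  domgross_2013 = 20,
  intgross_2013 = 50   # worldwide total, which already includes the 20 domestic
)

toy <- toy %>%
  mutate(
    roi           = (intgross_2013 + domgross_2013) / budget_2013,  # (50 + 20) / 10
    roi_worldwide = intgross_2013 / budget_2013                     # 50 / 10
  )

toy %>% select(title, roi, roi_worldwide)
```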
Let's see which movies have the highest return on investment.
bechdel90_13 %>%
  arrange(desc(roi)) %>%
  select(title, roi, year)

 

Below is a visualization of the return on investment by test result; however, it is difficult to see the
distributions because of a few extreme observations.
ggplot(data = bechdel90_13,
       mapping = aes(x = clean_test, y = roi, color = binary)) +
  geom_boxplot() +
  labs(title = "Return on investment vs. Bechdel test result",
       x = "Detailed Bechdel result",
       y = "Return on investment",
       color = "Binary Bechdel result")
3. What are those movies with *very* high returns on investment?
bechdel90_13 %>%
  filter(roi > 400) %>%
  select(title, budget_2013, domgross_2013, year)
Zooming in on the movies with `roi < 15` provides a better view of how the medians across the
categories compare:
ggplot(data = bechdel90_13,
       mapping = aes(x = clean_test, y = roi, color = binary)) +
  geom_boxplot() +
  labs(title = "Return on investment vs. Bechdel test result",
       subtitle = "___", # Something about zooming in to a certain level
       x = "Detailed Bechdel result",
       y = "Return on investment",
       color = "Binary Bechdel result") +
  coord_cartesian(ylim = c(0, 15))
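`coord_cartesian()` only changes the visible window; the boxplot statistics are still computed from every movie. Setting the limits on the scale with `ylim()` instead drops out-of-range values before the statistics are computed. A small sketch on made-up values (the `toy` data frame is hypothetical) illustrates the difference:

```r
library(ggplot2)

# Hypothetical ROI values with one extreme observation.
toy <- data.frame(clean_test = "ok", roi = c(1, 2, 3, 4, 500))

base_plot <- ggplot(toy, aes(x = clean_test, y = roi)) +
  geom_boxplot()

zoomed  <- base_plot + coord_cartesian(ylim = c(0, 15))  # zoom only, data untouched
clipped <- base_plot + ylim(0, 15)                        # roi = 500 is dropped

# Extract the median line that each plot would draw.
median_zoomed  <- layer_data(zoomed)$middle
median_clipped <- suppressWarnings(layer_data(clipped)$middle)

c(zoomed = median_zoomed, clipped = median_clipped)
```

With `coord_cartesian()` the median still reflects all five values, which is why it suits a plot meant to compare medians across categories.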

 
Expert Solution
Step 1

The full R code for the problem is given below. Complete the following setup before running it.

  • Run the code line by line in an R session (for example, R installed through Anaconda).
  • Press Enter after each plotting command to see the histogram and output, then analyze them.
  • Install the gdata and cwhmisc packages by typing install.packages(c("gdata", "cwhmisc")). The code as shown calls only base R functions, so it should also run without them.
  • Make sure the CSV file ("movies.csv") is in the same folder as the R script, and that this folder is the working directory.
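The code below refers to columns such as `budget_2013.` with a trailing dot. This is presumably because the CSV header has a `$` after the name (an assumption about the file, which is not shown here), and `read.csv()` converts characters that are not valid in R names to dots. A minimal sketch with an inline, made-up CSV:

```r
# Hypothetical CSV text whose header contains a "$".
csv_text <- "year,budget_2013$\n1995,1000\n2005,#N/A"

toy <- read.csv(text = csv_text, na.strings = "#N/A")

names(toy)         # the "$" has become "."
toy$budget_2013.   # the "#N/A" entry was read as NA

# check.names = FALSE keeps the header exactly as written in the file.
raw_names <- names(read.csv(text = csv_text, na.strings = "#N/A",
                            check.names = FALSE))
raw_names
```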


------------------------------------------- R code -----------------------------------------------------

# Load data; "#N/A" entries are read as missing values
rawData <- read.csv("movies.csv", na.strings = "#N/A")

# Keep movies released in 1990 or later
rawData <- rawData[rawData$year > 1989, ]

# International-only gross (worldwide gross minus domestic gross)
rawData$intOnly <- rawData$intgross_2013. - rawData$domgross_2013.

# Return on Investment (ROI) measures
rawData$ROI  <- rawData$intgross_2013. / rawData$budget_2013.  # Total ROI
rawData$ROI1 <- rawData$domgross_2013. / rawData$budget_2013.  # Domestic ROI
rawData$ROI2 <- rawData$intOnly / rawData$budget_2013.         # International ROI

# Divide movies into FAIL and PASS divisions
failMovies<-rawData[rawData$binary=="FAIL",]
passMovies<-rawData[rawData$binary=="PASS",]

# Include a "generous" category (which includes both "ok" and "dubious" movies)
generous<-rbind(rawData[rawData$clean_test=="ok",], rawData[rawData$clean_test=="dubious",])

# Print medians: ROI and budget
median(failMovies$ROI, na.rm = TRUE)
median(passMovies$ROI, na.rm = TRUE)
median(rawData$ROI, na.rm = TRUE)

median(failMovies$budget_2013.)
median(passMovies$budget_2013.)
median(rawData$budget_2013.)

# Distributions and logs
hist(rawData$budget_2013.)
hist(log(rawData$budget_2013.))

hist(rawData$intgross_2013.)
hist(log(rawData$intgross_2013.))

hist(rawData$ROI)
hist(log(rawData$ROI))

# Linear regression models

# Movies with higher budgets make more gross revenues
summary(lm(log(intgross_2013.)~log(budget_2013.), data=rawData))

# Bechdel dummy is not significant
summary(lm(log(intgross_2013.)~log(budget_2013.)+factor(binary), data=rawData))

# Movies with higher budgets have lower ROI
summary(lm(log(ROI)~log(budget_2013.), data=rawData))

# Bechdel dummy is not significant
summary(lm(log(ROI)~log(budget_2013.)+factor(binary), data=rawData))

# ROI #1 (domestic) used in chart
median(generous$ROI1, na.rm = TRUE)
median(rawData$ROI1[rawData$clean_test == "men"], na.rm = TRUE)
median(rawData$ROI1[rawData$clean_test == "notalk"], na.rm = TRUE)
median(rawData$ROI1[rawData$clean_test == "nowomen"], na.rm = TRUE)

# ROI #2 (international) used in chart
median(generous$ROI2, na.rm = TRUE)
median(rawData$ROI2[rawData$clean_test == "men"], na.rm = TRUE)
median(rawData$ROI2[rawData$clean_test == "notalk"], na.rm = TRUE)
median(rawData$ROI2[rawData$clean_test == "nowomen"], na.rm = TRUE)
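
The per-category medians above are computed one call at a time. As an alternative sketch, base R's `tapply()` returns all of them in one call. The example uses made-up numbers (the `toy` data frame and the `generous_test` column are hypothetical) and folds `ok` and `dubious` into one level to mirror the `generous` category.

```r
# Hypothetical data using the same column names as the listing.
toy <- data.frame(
  clean_test = c("ok", "dubious", "men", "men", "notalk", "nowomen"),
  ROI1       = c(2, 4, 1, NA, 3, 5),
  stringsAsFactors = FALSE
)

# Merge "ok" and "dubious" into a single "generous" level.
toy$generous_test <- ifelse(toy$clean_test %in% c("ok", "dubious"),
                            "generous", toy$clean_test)

# One median per category, ignoring missing values.
med_by_test <- tapply(toy$ROI1, toy$generous_test, median, na.rm = TRUE)
med_by_test
```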
