Rec 5 Regression with StatCrunch and Two Way Tables
School
Boston University
Course
5
Subject
Statistics
Date
Feb 20, 2024
Uploaded by BaronAlbatrossMaster1137
STAT 1430 Recitation 5
Regression with StatCrunch and Two-way Tables
Part 1: Regression with StatCrunch
Nash Information Services provides information and analytical services to the movie industry, including analyses to predict movie revenue. To study movie revenue, they chose a simple random sample of 40 movies released over a 5-year period (2003-2008), and collected data on each movie. We can use this data to predict movie revenues for future movies. This data set is listed in the Recitation 5A section of Carmen as “BoxOffice.xls”. It is an Excel file and needs to be put into StatCrunch.
Putting the Data into a StatCrunch Spreadsheet from Another Software Package
1. Log into MyStatLab.
2. Click on the Stat 1430 course.
3. Click on StatCrunch on the left side menu.
4. Click on “Visit the StatCrunch website”.
5. Click on “OPEN STATCRUNCH” on the menu bar at the very top. This will open a blank spreadsheet for you to enter data into.
6. Go to the BoxOffice.xls file on Carmen and COPY all the information, including the top row where the variable names are.
7. Go to the StatCrunch empty spreadsheet, click on the top left corner where “var 1” is listed, and PASTE. The entire data set should now be in the StatCrunch spreadsheet.
The variable names (left to right) are the following. (Note: money is in millions of dollars.)
Title: the name of the movie
USRelease: the date the movie was first shown
Genre: what type of movie is it?
Rating: what age group can see the movie
Rating1: whether or not there are age restrictions for the movie (1=yes, 0=no)
Budget: cost to make the movie
Opening: total box office revenue during opening weekend
Theaters: number of theaters showing the movie during opening weekend
IntRevenue: entire amount of money made outside the U.S.
USRevenue: entire amount of money the movie made in the United States during the entire time it was shown
WorldRevenue: entire amount of money made both in and out of the U.S.
Profit: whether the movie made a profit (1=yes, 0=no)
Let’s suppose we ultimately want to predict Total U.S. Box Office Revenue for a movie, using data from these previous movies. We start by looking for variables with which it has a strong relationship. (Remember, we can only look for linear relationships in this class.)
1. X and Y variables
a. Why is Total U.S. Box Office Revenue considered the “Y” (dependent) variable in this case?
i. Total U.S. Box Office Revenue is the Y variable because it is the outcome we are trying to predict or explain.
b. We need to find an appropriate X (independent) variable to help us predict Total U.S. Box Office Revenue. Which of the variables in this data set are eligible as potential candidates? (Note: only certain types of variables can qualify for this type of analysis.)
i. Potential candidates for X include Budget, Opening, Theaters, and IntRevenue. These are all quantitative variables that could plausibly be related to U.S. revenue.
2. Explain why Total World Box Office Revenue wouldn’t be a fair variable to use to predict Total U.S. Box Office Revenue. (You can then cross it off your list above.)
a. Total World Box Office Revenue is not a fair predictor because it already includes U.S. Box Office Revenue, which is the value we are trying to predict.
3. Relationships
b. To look for potential relationships that any of these variables have with Total U.S. Box Office Revenue, use StatCrunch to make the appropriate scatterplots. Copy/Paste them below. Be sure to include titles and axis names. Don’t spend too much time making them perfect, but try to capture the general shape of each scatterplot.
TO MAKE SCATTERPLOTS IN STATCRUNCH:
-Go to GRAPHS/SCATTER PLOT.
-Choose the X (independent) variable from the drop down menu (for example Budget)
-Choose the Y (dependent) variable from the drop down menu (Total U.S. Box Office Rev.)
-Click COMPUTE!
c. Now use StatCrunch to calculate the appropriate correlations to quantify these relationships. Label and INTERPRET each correlation, using the 3 items we learned in lecture.
International Revenue:
Shape: Very Linear
Direction: Positive
Strength: r=0.911. The correlation is very strong.
Theaters:
Shape: Moderately Linear
Direction: Positive
Strength: r=0.653. The correlation is moderately strong.
Opening:
Shape: Very Linear
Direction: Positive
Strength: r=0.922. The correlation is very strong.
Budget:
Shape: Not Very Linear
Direction: Positive
Strength: r=0.412. The correlation is weak.
TO FIND CORRELATIONS IN STATCRUNCH:
-Go to STAT/SUMMARY STATISTICS/CORRELATION/.
-CLICK on the X variable from SELECT COLUMNS menu (for example Budget)
-CTRL/CLICK on the Y variable from the SELECT COLUMNS menu (Total U.S. Box Office Rev.)
-Click COMPUTE!
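For readers who want to see what the correlation menu computes, here is a minimal Python sketch of the Pearson correlation. It is not part of the StatCrunch workflow. The function name `pearson_r` and the five toy data points are assumptions made up for illustration; they are not the BoxOffice values.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross-products of deviations
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy values in millions of dollars (NOT the BoxOffice data)
opening = [10.0, 20.0, 30.0, 40.0, 50.0]
us_revenue = [35.0, 70.0, 95.0, 130.0, 160.0]

r = pearson_r(opening, us_revenue)
print(f"r = {r:.3f}")
```

A value of r close to +1, as with these toy points, corresponds to the "very linear, positive, very strong" description used in the answers above.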
d. Based on BOTH the scatterplots AND the correlations, which variable would do the best job of predicting Total U.S. Box Office Revenue? Justify your answer completely.
Opening (opening weekend box office revenue) would be the best predictor of Total U.S. Box Office Revenue because its scatterplot is the most linear and it has the highest correlation (r=0.922).
4.
a. The first method we discussed in lecture to find the best fitting line was to calculate 5 descriptive statistics and use them in the formulas to find the slope and y-intercept of the best fitting line. For this data set, use StatCrunch to calculate those 5 descriptive statistics, using the X variable you selected above. Write them down and label them as we did in lecture.
a. Correlation: r=0.922
b. Standard deviation of Y: 112.29
c. Standard deviation of X: 35.69
d. Mean of Y: 136.89
e. Mean of X: 42.96
STATCRUNCH REMINDER: Use STAT/SUMMARY STATISTICS/COLUMNS, then choose the variables and select the mean and standard deviation. Then find the correlation using the previous directions.
b. Use those 5 descriptive statistics above to calculate the equation of the best-fitting line. Write it down and show your work. (If needed use your notes or see formula sheet on Carmen/Course Info.) Label X and Y.
Y = b0 + b1X
b1 = r(sd of Y / sd of X) = 0.922(112.29/35.69) = 2.901
b0 = (mean of Y) - b1(mean of X) = 136.89 - (2.901)(42.96) = 12.269 (carrying the unrounded slope)
Y = 2.901X + 12.269, where Y is total U.S. box office revenue and X is opening weekend box office revenue.
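The arithmetic above can be checked with a few lines of Python. This is only a sketch for checking the hand calculation; the function name `line_from_summary` is an assumption, and the inputs are the five statistics reported in this worksheet.

```python
def line_from_summary(r, sd_x, sd_y, mean_x, mean_y):
    """Slope and intercept of the best-fitting line from 5 summary statistics."""
    slope = r * sd_y / sd_x                # b1 = r * (sd of Y / sd of X)
    intercept = mean_y - slope * mean_x    # b0 = mean of Y - b1 * mean of X
    return slope, intercept


# The five statistics written down in part (a): X = Opening, Y = USRevenue
slope, intercept = line_from_summary(
    r=0.922, sd_x=35.69, sd_y=112.29, mean_x=42.96, mean_y=136.89
)
print(f"Y = {slope:.3f}X + {intercept:.3f}")
```

Because the intercept is computed from the unrounded slope, it comes out near 12.269. Plugging in the rounded slope 2.901 by hand gives a slightly different third decimal.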
5. The 2nd method we learned to find the best fitting line was to do an actual regression analysis. Using StatCrunch, do a regression analysis and write down the equation of the best fitting line using the coefficients it gives you. Write down the equation, labeling X and Y.
Y = 2.9011X + 12.268, where Y is total U.S. box office revenue and X is opening weekend box office revenue.
TO DO A REGRESSION ANALYSIS IN STATCRUNCH:
-Go to STAT/REGRESSION/SIMPLE LINEAR/.
-CLICK on the X variable from SELECT X VARIABLE menu (for example Budget)
-CLICK on the Y variable from the SELECT Y VARIABLE menu (Total U.S. Box Office Rev.)
-Click COMPUTE!
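As a rough illustration of what a simple linear regression computes from the raw data, the sketch below fits a least-squares line in plain Python. The function name `least_squares` and the toy data are assumptions for illustration only; the real coefficients come from running StatCrunch on the BoxOffice data.

```python
def least_squares(xs, ys):
    """Least-squares slope and intercept for a simple linear regression."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # b1
    intercept = mean_y - slope * mean_x    # b0
    return slope, intercept


# Hypothetical toy values in millions of dollars (NOT the BoxOffice data)
opening = [10.0, 20.0, 30.0, 40.0, 50.0]
us_revenue = [35.0, 70.0, 95.0, 130.0, 160.0]

slope, intercept = least_squares(opening, us_revenue)
print(f"Y = {slope:.4f}X + {intercept:.4f}")
```

Both methods describe the same line, which is why the equation from the 5 descriptive statistics and the one from the regression output agree up to rounding.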
6. Line fit
a. Using StatCrunch output and your lecture notes, what % of the variability in U.S. revenue can be explained by Opening Weekend Revenue? (Which number represents this?) Does Opening Weekend Revenue do a good job of predicting U.S. revenue? Why?
R-squared is about 85.01%, which means that about 85.01% of the variability in U.S. revenue can be explained by Opening Weekend Revenue. This is a high value and suggests that Opening Weekend Revenue does a good job of predicting Total U.S. Revenue.
b. For what values of Opening Revenue is it appropriate and safe to make predictions about U.S. Revenue without extrapolation? (Use StatCrunch scatterplot or statistics to help you figure it out.)
To avoid extrapolation, we should only make predictions for Opening values within the range of the data. The minimum is 5.95 and the maximum is 151.12 (millions of dollars), so predictions are appropriate only for Opening values between 5.95 and 151.12.
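The two ideas in this item can be sketched in Python: R-squared is the square of the correlation, and a prediction should be refused when X falls outside the observed range. The function names and the example Opening value of 50.0 are assumptions for illustration; the constants are the values reported in this worksheet.

```python
R = 0.922                                 # correlation between Opening and USRevenue
SLOPE, INTERCEPT = 2.9011, 12.268         # coefficients from the regression output
OPENING_MIN, OPENING_MAX = 5.95, 151.12   # observed range of Opening (millions)


def r_squared_percent(r):
    """Percent of the variability in Y explained by X in simple regression."""
    return 100 * r ** 2


def predict_us_revenue(opening):
    """Predict U.S. revenue (millions), refusing to extrapolate."""
    if not (OPENING_MIN <= opening <= OPENING_MAX):
        raise ValueError(
            f"Opening = {opening} is outside [{OPENING_MIN}, {OPENING_MAX}]; "
            "predicting here would be extrapolation"
        )
    return SLOPE * opening + INTERCEPT


print(f"R-squared = {r_squared_percent(R):.2f}%")
# 50.0 is a hypothetical opening weekend revenue, chosen inside the range
print(f"Predicted U.S. revenue: {predict_us_revenue(50.0):.2f} million dollars")
```

Raising an error for out-of-range values is one possible design choice; returning the prediction with a warning would also be reasonable.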
7. Find the movie Madagascar in your data set.
a. Find the observed U.S. Revenue for this movie from the data. (Include units.)
193.20293 million dollars
b. Find the predicted U.S. Revenue for this movie from the best fitting line you calculated above. (How do we use X to predict Y in an equation?)
The predicted U.S. Revenue (Y) is 149.27 million dollars. We predict it by plugging this movie’s Opening value (X) into the best fitting line.
c. Calculate the residual for this movie from the formula we used in lecture.
The residual (observed minus predicted) is about 43.94 million dollars.
d. Did this movie make MORE or LESS money than expected? Explain briefly.
This movie made more money than expected, as shown by the positive residual.
Part 2: TWO-WAY TABLES
An insurance company has collected the following data on the gender and marital status of 300
customers. The same information can be found in the data set “Gender and Marital Status.xlsx”,
located on Carmen.
Marital Status
Gender    Single    Married    Divorced
Male      25        125        30
Female    50        50         20
1. Make a bar graph that shows the marginal distribution of marital status. Copy/paste it below.
2. Interpret your results; one sentence per graph is fine.
The marginal distribution of marital status among the insurance customers shows that married individuals make up the majority (about 58%), followed by single (25%) and then divorced (about 17%).
StatCrunch Instructions for Marginal Distributions:
DOWNLOAD the data file called “Gender and Marital Status.xlsx” in the Recitation section of Carmen.
COPY the data. Note: The data in the table has been rearranged to be in the proper format for StatCrunch. (This is often what we have to do with data in the real world also.)
Open StatCrunch (Go to MyStatLab / StatCrunch / Visit the StatCrunch Website / Open StatCrunch on top ribbon).
PASTE the data in the upper left corner of the first empty row – no variable names here.
Make a bar chart for the entire group:
Click on: Graph/Bar Plot/With Summary (since data is summarized already)
Under “Categories” click “Variable 1”
Under “Counts” click “Variable 3”
Under “TYPE” pull down “Relative Frequency” or “Percent” (same results)
Compute!
Copy/Paste your bar graph in the space below or sketch it in the space provided.
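To check the bar heights by hand, the marginal distribution can also be computed directly from the two-way table. The Python sketch below is an optional cross-check, not a StatCrunch step. The dictionary layout and the function name `marginal_by_column` are assumptions; the counts are the ones in the table above.

```python
# Counts from the Gender x Marital Status table (300 customers)
counts = {
    "Male": {"Single": 25, "Married": 125, "Divorced": 30},
    "Female": {"Single": 50, "Married": 50, "Divorced": 20},
}


def marginal_by_column(table):
    """Marginal distribution of the column variable, ignoring the row variable."""
    totals = {}
    for row in table.values():
        for category, count in row.items():
            totals[category] = totals.get(category, 0) + count
    grand_total = sum(totals.values())
    return {category: total / grand_total for category, total in totals.items()}


marginal = marginal_by_column(counts)
for status, share in marginal.items():
    print(f"{status}: {share:.1%}")
```

Each share is a column total divided by the grand total of 300, so the three shares add up to 1.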
3. Make an appropriate bar graph using STATCRUNCH, which shows the conditional distribution of marital status for the females. Label clearly. Copy/paste below.
4. Interpret your results in a sentence or two.
Among female customers of this company, the proportion who are single (about 42%) is higher than in the graph for everyone (25%). For females, married and single are equal at about 42% each, and about 17% are divorced.
StatCrunch Instructions for Conditional Distributions:
Make a bar chart for females only:
o Click on: Graph/Bar Plot/With Summary (since data is summarized already)
o Under CATEGORIES: click “Variable 1”
o Under COUNTS click “Variable 3”
o Under WHERE click BUILD. Then build the statement “Var 2 = Female” by clicking on “Var 2”/ADD, then the “=” sign, then under Values click “Female”/Add. Then click OKAY.
o Under “TYPE” pull down “Relative Frequency” or “Percent”
o Compute!
o Copy/Paste your bar graph in the space below or sketch it in the space provided
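The WHERE filter restricts the bar plot to one gender, so each bar is a count divided by that gender's row total. A small Python sketch of the same calculation follows as an optional cross-check. The function name `conditional_distribution` and the dictionary layout are assumptions; the counts come from the Gender and Marital Status table.

```python
# Counts from the Gender x Marital Status table (300 customers)
counts = {
    "Male": {"Single": 25, "Married": 125, "Divorced": 30},
    "Female": {"Single": 50, "Married": 50, "Divorced": 20},
}


def conditional_distribution(table, group):
    """Distribution of the column variable within one row group."""
    row = table[group]
    row_total = sum(row.values())
    return {category: count / row_total for category, count in row.items()}


for gender in counts:
    shares = conditional_distribution(counts, gender)
    summary = ", ".join(f"{status} {share:.1%}" for status, share in shares.items())
    print(f"{gender}: {summary}")
```

Comparing the two printed rows side by side is the same comparison the male and female bar graphs make visually.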
5. Males
a. Using the same instructions as for females, make an appropriate bar graph using STATCRUNCH, which shows the conditional distribution of marital status for the males. Label clearly. Copy/paste below.
b. Interpret your results in a sentence or two.
Compared to the female graph, a much higher proportion of male customers are married (about 69%). Few male customers are single (about 14%), which is even lower than the proportion who are divorced (about 17%).
6. Based on your results above, are the 3 graphs different or the same for males vs females vs everyone? Explain.
The 3 graphs are all different. The graph for everyone blends the male and female distributions: males pull the married percentage up and the single percentage down, while females pull the single percentage up.
7. Based on TWO OF YOUR graphs, is there a relationship between gender and marital status? If so, what is the relationship? (Use the percentages to answer this. Note that if you are using the distribution for marital status you have 3 groups, so you must consider all 3 of them in your analysis, not just 2 of them.)
In the context of this insurance company’s customer base, there is a relationship between gender and marital status. Men are much more likely to be married (about 69%) than single (about 14%) or divorced (about 17%). Women are equally likely to be married or single (about 42% each), with a lower proportion divorced (about 17%).
Data was collected on whether or not a student smokes and whether neither, one, or both of their parents smoked. The data is shown below. It can also be found in the data set “Student and Parent Smoking.xlsx” located in the recitation section of Carmen.
                         Neither parent smokes    One parent smokes    Both parents smoke
Student doesn’t smoke    1168                     1823                 1380
Student smokes           188                      416                  400
8. Using appropriate graphs and percentages (include them on the answer sheet), describe students’ smoking behavior based on their parents’ smoking behavior. Which variable should you break down the data by (student or parent)? Choose the variable that you think makes the results easiest to understand. Make sure you describe the relationship using percentages, and interpret the results in a way that a Lantern reader would be interested in.
IMPORTANT NOTE! DO NOT JUST COPY THE DATA TABLE AS IT IS.
IT NEEDS TO BE FORMATTED LIKE THE GENDER AND MARITAL STATUS DATA. LOOK AT THE GENDER AND MARITAL STATUS DATA FIRST, THEN SEE IF YOU CAN MAKE THIS DATA LOOK LIKE THAT. This is something we often have to do with data, get it into the proper format.
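One way to picture the reformatting step is as turning the wide table into one row per cell, with two category columns and a count column. The Python sketch below shows that reshaping and then breaks the data down by parent group, which is one possible choice. The column order (student status, parent group, count) and the function names are assumptions meant to mirror the Gender and Marital Status layout; they are not taken from the Carmen file.

```python
# Wide table exactly as given in the worksheet
wide = {
    "Student doesn't smoke": {
        "Neither parent smokes": 1168,
        "One parent smokes": 1823,
        "Both parents smoke": 1380,
    },
    "Student smokes": {
        "Neither parent smokes": 188,
        "One parent smokes": 416,
        "Both parents smoke": 400,
    },
}


def to_long(table):
    """One (student status, parent group, count) row per cell of the table."""
    return [
        (student_status, parent_group, count)
        for student_status, row in table.items()
        for parent_group, count in row.items()
    ]


def percent_smoking_by_parent_group(rows):
    """Percent of students who smoke within each parent group."""
    totals, smokers = {}, {}
    for student_status, parent_group, count in rows:
        totals[parent_group] = totals.get(parent_group, 0) + count
        if student_status == "Student smokes":
            smokers[parent_group] = smokers.get(parent_group, 0) + count
    return {group: 100 * smokers.get(group, 0) / totals[group] for group in totals}


long_rows = to_long(wide)
for row in long_rows:
    print(row)

pct = percent_smoking_by_parent_group(long_rows)
for group, value in pct.items():
    print(f"{group}: {value:.1f}% of students smoke")
```

The six printed rows are what would be pasted into StatCrunch as three columns, with the count column used under “Counts”.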
Bob collected data on a random sample of 800 students who applied to a business program. His two variables were gender, and whether the person was admitted or denied admission to the program. The data is found in the data file called BusinessAdmissions.xlsx, located in the recitation
question section of Carmen.
9. Two way tables – Do both parts of this problem
a. Create a 2x2 table from this data and place it below:
STATCRUNCH INSTRUCTIONS FOR CREATING A 2x2 TABLE FROM DATA:
Download the file; Copy the data set; Paste into STATCRUNCH in the upper left corner where VAR 1 is.
Click on: STAT/TABLES/CONTINGENCY/WITH DATA.
Click on the variable you want for your row variable and the variable you want for the column variable.
COMPUTE!
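The contingency table built from raw data is just a count of how many records fall in each (row, column) combination. A minimal Python sketch of that tally is shown here. The six records and the function name `contingency_table` are assumptions made up for illustration; they are not the BusinessAdmissions data.

```python
from collections import Counter

# Hypothetical raw records, one (gender, decision) pair per applicant
# (NOT the BusinessAdmissions data)
records = [
    ("Male", "Admitted"),
    ("Male", "Denied"),
    ("Female", "Admitted"),
    ("Female", "Admitted"),
    ("Male", "Admitted"),
    ("Female", "Denied"),
]


def contingency_table(pairs):
    """Two-way table of counts from raw (row value, column value) pairs."""
    cell_counts = Counter(pairs)
    row_labels = sorted({row for row, _ in pairs})
    col_labels = sorted({col for _, col in pairs})
    return {
        row: {col: cell_counts[(row, col)] for col in col_labels}
        for row in row_labels
    }


table = contingency_table(records)
for gender, row in table.items():
    print(gender, row)
```

With two values for each variable this produces a 2x2 table, and the cell counts can then be rearranged into the three-column summary format used for the bar plots.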
b. Now create tables and/or graphs of the appropriate marginal and/or conditional distributions, as done before, to show whether there is a relationship between gender and whether a student was admitted or not. Use the methods shown in the previous problems as a guide. YOU WILL HAVE TO REFORMAT YOUR DATA TABLE BEFORE DOING THE GRAPHS, LIKE YOU DID FOR THE PREVIOUS PROBLEM.
Then describe the relationship using percentages. Copy/paste graphs below.