Rec 5 Regression with StatCrunch and Two Way Tables

School: Boston University
Course: 5
Subject: Statistics
Date: Feb 20, 2024
Uploaded by BaronAlbatrossMaster1137

STAT 1430 Recitation 5
Regression with StatCrunch and Two-way Tables

Part 1: Regression with StatCrunch

Nash Information Services provides information and analytical services to the movie industry, including analyses to predict movie revenue. To study movie revenue, they chose a simple random sample of 40 movies released over a 5-year period (2003-2008), and collected data on each movie. We can use this data to predict movie revenues for future movies. This data set is listed in the Recitation 5A section of Carmen as “BoxOffice.xls”. It is an Excel file and needs to be put into StatCrunch.

Putting the Data into a StatCrunch Spreadsheet from Another Software Package

1. Log into MyStatLab.
2. Click on the Stat 1430 course.
3. Click on StatCrunch on the left side menu.
4. Click on “Visit the StatCrunch website”.
5. Click on “OPEN STATCRUNCH” on the menu bar at the very top. This will open a blank spreadsheet for you to enter data into.
6. Go to the BoxOffice.xls file on Carmen and COPY all the information, including the top row where the variable names are.
7. Go to the StatCrunch empty spreadsheet, click on the top left corner where “var 1” is listed, and PASTE. The entire data set should now be in the StatCrunch spreadsheet.

The variable names (left to right) are the following. (Note: money is in millions of dollars.)

Title: the name of the movie
USRelease: the date the movie was first shown
Genre: what type of movie is it?
Rating: what age group can see the movie
Rating1: whether or not there are age restrictions for the movie (1=yes, 0=no)
Budget: cost to make the movie
Opening: total box office revenue during opening weekend
Theaters: number of theaters showing the movie during opening weekend
IntRevenue: entire amount of money made outside the U.S.
USRevenue: entire amount of money the movie made in the United States during the entire time it was shown
WorldRevenue: entire amount of money made both in and out of the U.S.
Profit: whether the movie made a profit (1=yes, 0=no)

Let’s suppose we ultimately want to predict Total U.S. Box Office Revenue for a movie, using data from these previous movies. We start by looking for variables with which it has a strong relationship. (Remember, we can only look for linear relationships in this class.)

1. X and Y variables
   a. Why is Total U.S. Box Office Revenue considered the “Y” (dependent) variable in this case?
      i. Total U.S. Box Office Revenue is considered the Y variable because it is the outcome, the variable we are trying to predict or explain.
   b. We need to find an appropriate X (independent) variable to help us predict Total U.S. Box Office Revenue. Which of the variables in this data set are eligible as potential candidates? (Note only certain types of variables can qualify for this type of analysis.)
      i. Potential candidates for X include Budget, Opening, Theaters, and IntRevenue. These are all quantitative variables, which is what this type of analysis requires, and each could plausibly be related to revenue.

2. Explain why Total World Box Office Revenue wouldn’t be a fair variable to use to predict Total U.S. Box Office Revenue. (You can then cross it off your list above.)
   a. You could not use Total World Box Office Revenue because it includes the U.S. Box Office Revenue, which is what we are trying to predict.

3. Relationships
   b. To look for potential relationships that any of these variables have with Total U.S. Box Office Revenue, use StatCrunch to make the appropriate scatterplots. Copy/Paste them below. Be sure to include titles and axis names. Don’t spend too much time making them perfect, but try to capture the general shape of the scatterplot.
TO MAKE SCATTERPLOTS IN STATCRUNCH:
-Go to GRAPHS/SCATTER PLOT.
-Choose the X (independent) variable from the drop down menu (for example Budget)
-Choose the Y (dependent) variable from the drop down menu (Total U.S. Box Office Rev.)
-Click COMPUTE!

   c. Now use StatCrunch to calculate the appropriate correlations to quantify these relationships. Label and INTERPRET each correlation, using the 3 items we learned in lecture.
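For readers who want to see what the correlation in part c measures, here is a minimal Python sketch of the Pearson correlation coefficient. It is not how StatCrunch is implemented; the function name `pearson_r` and the four data pairs are hypothetical, made up for illustration, and are not taken from BoxOffice.xls.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: co-variation of x and y divided by the
    product of their individual spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# HYPOTHETICAL values (millions of dollars), not the worksheet's data.
opening = [10.0, 20.0, 30.0, 40.0]
us_revenue = [35.0, 70.0, 95.0, 140.0]

r = pearson_r(opening, us_revenue)
print(f"r = {r:.3f}")  # a value near +1 means a strong, positive, linear pattern
```

The sign of r gives the direction and its distance from 0 gives the strength, but the shape (linear or not) still has to be judged from the scatterplot.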
International Revenue:
   Shape: Very linear
   Direction: Positive
   Strength: r = 0.911. The correlation is very strong.
Theaters:
   Shape: Moderately linear
   Direction: Positive
   Strength: r = 0.653. The correlation is moderately strong.
Opening:
   Shape: Very linear
   Direction: Positive
   Strength: r = 0.922. The correlation is very strong.
Budget:
   Shape: Not very linear
   Direction: Positive
   Strength: r = 0.412. The correlation is weak.

TO FIND CORRELATIONS IN STATCRUNCH:
-Go to STAT/SUMMARY STATISTICS/CORRELATION/.
-CLICK on the X variable from SELECT COLUMNS menu (for example Budget)
-CTRL/CLICK on the Y variable from the SELECT COLUMNS menu (Total U.S. Box Office Rev.)
-Click COMPUTE!

   d. Based on BOTH the scatterplots AND the correlations, which variable would do the best job of predicting Total U.S. Box Office Revenue? Justify your answer completely.
      Opening (opening weekend box office revenue) would be the best predictor of Total U.S. Box Office Revenue because it has the most linear scatterplot and the highest correlation (r = 0.922).

4. a. The first method we discussed in lecture to find the best fitting line was to calculate 5 descriptive statistics and use them in the formulas to find the slope and y-intercept of the best fitting line. For this data set, use StatCrunch to calculate those 5 descriptive statistics, using the variable you selected in 3d above. Write them down and label them as we did in lecture.
      a. Correlation: r = 0.922
      b. Standard deviation of Y: 112.29
      c. Standard deviation of X: 35.69
      d. Mean of Y: 136.89
      e. Mean of X: 42.96

STATCRUNCH REMINDER: USE STAT/SUMMARY STATISTICS/COLUMNS and then choose the variables, the means and standard deviations. Then find the correlation using previous directions.

   b. Use those 5 descriptive statistics above to calculate the equation of the best-fitting line. Write it down and show your work. (If needed use your notes or see formula sheet on Carmen/Course Info.) Label X and Y.
      The line has the form Y = b0 + b1X.
      b1 = r × (sY / sX) = 0.922 × (112.29 / 35.69) ≈ 2.901
      b0 = (mean of Y) − b1 × (mean of X) = 136.89 − (b1 × 42.96) ≈ 12.269, carrying the unrounded slope (about 2.90085) through the calculation
      Y = 2.901X + 12.269, where Y is U.S. total box office revenue and X is opening weekend box office revenue.

5. The 2nd method we learned to find the best fitting line was to do an actual regression analysis. Using StatCrunch, do a regression analysis and write down the equation of the best fitting line using the coefficients it gives you. Write down the equation, labeling X and Y.
   Y = 2.9011X + 12.268, where Y is U.S. total box office revenue and X is opening weekend box office revenue.

TO DO A REGRESSION ANALYSIS IN STATCRUNCH:
-Go to STAT/REGRESSION/SIMPLE LINEAR/.
-CLICK on the X variable from SELECT X VARIABLE menu (for example Budget)
-CLICK on the Y variable from the SELECT Y VARIABLE menu (Total U.S. Box Office Rev.)
-Click COMPUTE!

6. Line fit
   a. Using StatCrunch output and your lecture notes, what % of the variability in U.S. revenue can be explained by Opening Weekend Revenue? (Which number represents this?) Does Opening Weekend Revenue do a good job of predicting U.S. revenue? Why?
      The R-squared value is about 85.01%, which means that about 85.01% of the variability in U.S. revenue can be explained by Opening Weekend Revenue. This is a high value and suggests that Opening Weekend Revenue does a good job of predicting Total U.S. Revenue.
   b. For what values of Opening Revenue is it appropriate and safe to make predictions about U.S. Revenue without extrapolation?
      (Use StatCrunch scatterplot or statistics to help you figure it out.)
      To avoid extrapolation, we should only make predictions within the range of the observed Opening data. The minimum is 5.95 and the maximum is 151.12 (millions of dollars), so predictions for Opening values within this range are the safe ones.
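The hand calculation in questions 4 through 6 can be checked with a short Python sketch. It uses only the summary statistics reported above; the function names `line_from_summary` and `predict_us_revenue` are illustrative choices, not anything from StatCrunch or the lecture notes.

```python
def line_from_summary(r, s_x, s_y, mean_x, mean_y):
    """Return (slope, intercept) of the least-squares line computed
    from the five descriptive statistics."""
    slope = r * s_y / s_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Summary statistics reported in the worksheet (millions of dollars).
R = 0.922
S_X, S_Y = 35.69, 112.29
MEAN_X, MEAN_Y = 42.96, 136.89
X_MIN, X_MAX = 5.95, 151.12  # observed range of Opening

slope, intercept = line_from_summary(R, S_X, S_Y, MEAN_X, MEAN_Y)
r_squared = R ** 2  # share of variability in Y explained by X

def predict_us_revenue(opening):
    """Predict U.S. revenue from Opening, refusing to extrapolate
    outside the observed Opening range."""
    if not X_MIN <= opening <= X_MAX:
        raise ValueError("Opening value is outside the observed range")
    return intercept + slope * opening

print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
print(f"R-squared = {r_squared:.4f}")
```

The slope and intercept come out close to the StatCrunch regression coefficients; small differences in the last digits are expected because the five statistics were rounded before being plugged in.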
7. Find the movie Madagascar in your data set.
   a. Find the observed U.S. Revenue for this movie from the data. (Include units.)
      193.20293 million dollars
   b. Find the predicted U.S. Revenue for this movie from the best fitting line you calculated above. (How do we use X to predict Y in an equation?)
      The predicted Y value, or U.S. Revenue, would be 149.27 million dollars. We predict it by plugging the movie’s X value (Opening) into the best fitting line we created.
   c. Calculate the residual for this movie from the formula we used in lecture.
      The residual (observed value minus predicted value) would be 43.94 million dollars.
   d. Did this movie make MORE or LESS money than expected? Explain briefly.
      This movie made more money than expected. This is shown by the positive residual value.

Part 2: TWO-WAY TABLES

An insurance company has collected the following data on the gender and marital status of 300 customers. The same information can be found in the data set “Gender and Marital Status.xlsx”, located on Carmen.

                  Marital Status
Gender     Single   Married   Divorced
Male         25       125        30
Female       50        50        20

1. Make a bar graph that shows the marginal distribution of marital status. Copy/paste it below.
2. Interpret your results; one sentence per graph is fine.
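The percentages behind the marginal bar graph in question 1 can be worked out directly from the table. The following Python sketch is one way to do that check by hand; the dictionary layout and the function name `marginal_distribution` are illustrative assumptions, not part of the StatCrunch workflow.

```python
# Counts from the two-way table: rows are gender, columns are marital status.
counts = {
    "Male": {"Single": 25, "Married": 125, "Divorced": 30},
    "Female": {"Single": 50, "Married": 50, "Divorced": 20},
}

def marginal_distribution(table):
    """Add the counts over all rows, then divide each column total
    by the grand total."""
    totals = {}
    for row in table.values():
        for category, count in row.items():
            totals[category] = totals.get(category, 0) + count
    grand_total = sum(totals.values())
    return {category: total / grand_total for category, total in totals.items()}

marital_marginal = marginal_distribution(counts)
for status, share in marital_marginal.items():
    print(f"{status}: {share:.1%}")
# Single: 25.0%
# Married: 58.3%
# Divorced: 16.7%
```

Gender plays no part in this calculation, which is what makes the distribution marginal rather than conditional.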
   The marginal distribution of marital status among the insurance customers shows that married individuals make up the majority, followed by single and then divorced.

StatCrunch Instructions for Marginal Distributions:
   DOWNLOAD the data file called “Gender and Marital Status.xlsx” in the Recitation section of Carmen.
   COPY the data. Note: The data in the table has been rearranged to be in the proper format for StatCrunch. (This is often what we have to do with data in the real world also.)
   Open StatCrunch (Go to MyStatLab / StatCrunch / Visit the StatCrunch Website / Open StatCrunch on top ribbon)
   PASTE the data in the upper left corner of the first empty row (no variable names here).
   Make a bar chart for the entire group:
      Click on: Graph/Bar Plot / With Summary (since data is summarized already)
      Under Categories: click “Variable 1”
      Under “Counts” click “Variable 3”
      Under “TYPE” pull down “Relative Frequency” or “Percent” (same results)
      Compute!
      Copy/Paste your bar graph in the space below or sketch it in the space provided.

3. Make an appropriate bar graph using STATCRUNCH, which shows the conditional distribution of marital status for the females. Label clearly. Copy/paste below.
4. Interpret your results in a sentence or two.
   Among female customers of this company, the proportion who are single is higher than in the prior graph for all customers. The married and single proportions are equal here.

StatCrunch Instructions for Conditional Distributions:
   Make a bar chart for females only:
   o Click on: Graph/Bar Plot / With Summary (since data is summarized already)
   o Under CATEGORIES: click “Variable 1”
   o Under COUNTS click “Variable 3”
   o Under WHERE click BUILD. Then build the statement “Var 2 = Female” by clicking on “Var 2”/ADD, then “=” sign, then under Values click “Female”/Add. Then click OKAY.
   o Under “TYPE” pull down “Relative Frequency” or “Percent”
   o Compute!
   o Copy/Paste your bar graph in the space below or sketch it in the space provided

5. Males
   a. Using the same instructions as for females, make an appropriate bar graph using STATCRUNCH, which shows the conditional distribution of marital status for the males. Label clearly. Copy/paste below.
   b. Interpret your results in a sentence or two.
      Compared to the female graph, a much higher proportion of the male customers are married. Few of the men are single; the single proportion is even lower than the divorced proportion.

6. Based on your results above, are the 3 graphs different or the same for males vs females vs everyone? Explain.
   The 3 graphs are all different. The male and female graphs show how each group shapes the overall graph: men drive up the married category while bringing down the single category, and women drive up the single category.

7. Based on TWO OF YOUR graphs, is there a relationship between gender and marital status? If so, what is the relationship? (Use the percentages to answer this. Note that if you are using the distribution for marital status you have 3 groups, so you must consider all 3 of them in your analysis, not just 2 of them.)
   In the context of this insurance company’s customer base, there is a relationship between gender and marital status. Men are much more likely to be married than divorced or single: the relative frequencies are about 0.69 married, 0.17 divorced, and 0.14 single. Women are equally likely to be married or single (about 0.42 each), with divorced lower at about 0.17.

Data was collected on whether or not a student smokes and whether one or the other or both of their parents smoked. The data is shown below. It can also be found in the data set “Student and Parent Smoking.xlsx” located in the recitation section of Carmen.

                        Neither parent smokes   One parent smokes   Both parents smoke
Student doesn’t smoke           1168                  1823                1380
Student smokes                   188                   416                 400

8. Using appropriate graphs and percentages (include them on the answer sheet), describe students’ smoking behavior based on their parents’ smoking behavior. Which variable should you break down the data by (student or parent)? Choose the variable that you think makes the results easiest to understand. Make sure you describe the relationship using percentages, and interpret the results in a way that a Lantern reader would be interested in.

IMPORTANT NOTE! DO NOT JUST COPY THE DATA TABLE AS IT IS. IT NEEDS TO BE FORMATTED LIKE THE GENDER AND MARITAL STATUS DATA. LOOK AT THE GENDER AND MARITAL STATUS DATA FIRST, THEN SEE IF YOU CAN MAKE THIS DATA LOOK LIKE THAT. This is something we often have to do with data: get it into the proper format.

Bob collected data on a random sample of 800 students who applied to a business program. His two variables were gender, and whether the person was admitted or denied admission to the program. The data is found in the data file called BusinessAdmissions.xlsx, located in the recitation question section of Carmen.

9. Two-way tables. Do both parts of this problem.
   a. Create a 2x2 table from this data and place it below:

STATCRUNCH INSTRUCTIONS FOR CREATING A 2x2 TABLE FROM DATA:
   Download the file; Copy the data set; Paste into STATCRUNCH in the upper left corner where VAR 1 is.
   Click on: STAT/TABLES/CONTINGENCY/WITH DATA.
   Click on the variable you want for your row variable and the variable you want for the column variable.
   COMPUTE!
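What the contingency-table step does with raw data can be sketched in a few lines of Python: it tallies how many records fall into each combination of the row and column variables. The records below are HYPOTHETICAL and made up for illustration (they are not the contents of BusinessAdmissions.xlsx), and the function names are illustrative as well.

```python
from collections import Counter

def two_way_counts(records):
    """Tally how many records fall in each (row value, column value) cell."""
    return Counter(records)

def format_table(cell_counts, row_values, column_values):
    """Lay the cell counts out as tab-separated text, one line per row value."""
    lines = ["\t".join([""] + list(column_values))]
    for row in row_values:
        cells = [str(cell_counts[(row, col)]) for col in column_values]
        lines.append("\t".join([row] + cells))
    return "\n".join(lines)

# HYPOTHETICAL records, one (gender, decision) pair per applicant.
records = [
    ("Male", "Admitted"), ("Male", "Denied"), ("Female", "Admitted"),
    ("Female", "Admitted"), ("Male", "Admitted"), ("Female", "Denied"),
]

cells = two_way_counts(records)
print(format_table(cells, ["Male", "Female"], ["Admitted", "Denied"]))
```

Each applicant contributes to exactly one cell, so the four cell counts always add up to the number of records.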
   b. Now create tables and/or graphs of the appropriate marginal and/or conditional distributions as done before, to show whether there is a relationship between gender and whether a student was admitted or not. Use the methods shown in the previous problems as a guide. YOU WILL HAVE TO REFORMAT YOUR DATA TABLE BEFORE DOING THE GRAPHS, LIKE YOU DID FOR THE PREVIOUS PROBLEM. Then describe the relationship using percentages. Copy/paste graphs below.
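The reformatting step can be illustrated with the smoking table from question 8, since the admissions counts are not reproduced in this document. This Python sketch assumes the StatCrunch layout is three columns (one category, the other category, and a count), as the Variable 1 / Var 2 / Variable 3 instructions for the gender data suggest; the function names are illustrative.

```python
# Smoking counts from the worksheet, grouped by the parents' behavior.
wide = {
    "Neither parent smokes": {"Student doesn't smoke": 1168, "Student smokes": 188},
    "One parent smokes": {"Student doesn't smoke": 1823, "Student smokes": 416},
    "Both parents smoke": {"Student doesn't smoke": 1380, "Student smokes": 400},
}

def to_long_format(table):
    """Flatten the table into (group, outcome, count) rows, one per cell."""
    return [
        (group, outcome, count)
        for group, row in table.items()
        for outcome, count in row.items()
    ]

def conditional_distribution(long_rows):
    """Divide each count by the total for its group."""
    totals = {}
    for group, _outcome, count in long_rows:
        totals[group] = totals.get(group, 0) + count
    return {
        (group, outcome): count / totals[group]
        for group, outcome, count in long_rows
    }

long_rows = to_long_format(wide)
shares = conditional_distribution(long_rows)
for group in wide:
    print(f"{group}: {shares[(group, 'Student smokes')]:.1%} of students smoke")
# Neither parent smokes: 13.9% of students smoke
# One parent smokes: 18.6% of students smoke
# Both parents smoke: 22.5% of students smoke
```

Breaking the data down by the parents' behavior puts the three groups on the same percentage scale, which is what makes them comparable even though the groups differ in size.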