stats assignment 2

School

University of Manitoba

Course

1000

Subject

Statistics

Date

Jan 9, 2024

STAT 1000 - Assignment 2
Matthew Lizotte (7969397)
2023-02-28

Instructions

To properly view the assignment questions, knit this file to .PDF and view the output. To enter your answers, add code as needed into the R code chunks given below, and, where applicable, replace the “Delete me; ...” and add in your own text response. Be sure when adding in text responses to never copy-paste symbols from outside of the document. Only use the symbols on your keyboard. Do not delete the question text, or modify any other part of the code except for the “author” in Line 3. All numerical and graphical answers must be done using R, unless stated otherwise.

You will have a link in your email that takes you to the Crowdmark submission page. Once you have completed the worksheet, knit it to .PDF and upload your output to Crowdmark. Also, upload your .Rmd file to Crowdmark where prompted. To see where your .Rmd file is saved, click File > Save As in the top-left of your screen. Make sure you set your Name and Student Number in the Author section of this document (Line 3). Do not alter the title or the date. Please note that if you do not submit a knit .PDF file, you will be given a grade of zero.

After you knit your assignment to PDF, check your code chunks. If your code at any point runs off the page, find the nearest comma, click to the right of it, and press Enter (or Return if you are on a Mac). This will force a break in the code so that it goes onto the next line. All of your code must be readable in the final submission.

All calculations and output must be visible in the final document, and all text responses should be in complete English sentences. Your work should be done using the same formatting, functions, and packages as in your labs and course notes, unless otherwise specified. You may speak to your classmates about ideas and what functions/optional arguments you may need to use but you may not directly show your code/output to your classmates.

Your full submission is due by 11:59 p.m. on Wednesday, March 1. Crowdmark may allow you to submit late, but you will be given an automatic grade of zero if you do. If you have an issue that you can’t resolve without someone looking at your work (e.g., you get an error when knitting your document), please see the Help Centre in 311 Machray Hall.
Before you begin, set your name and student number in Line 3.

0. Import the Admissions dataset, available on the UMLearn page, and make sure that the object is named Admissions. Make sure you have “Heading” set to “Yes” when you import the data. [1 mark]

    Admissions <- read.csv("~/Downloads/Admissions.csv")

This dataset contains the admission information for a sample of 500 applicants to a Master’s program in a school in India (however, you will only be working with a sample of 200 of these applicants). The admission information includes the GRE score (the Graduate Record Examination, scored out of 340), the TOEFL score (Test of English as a Foreign Language, scored out of 120), and the CGPA score (Cumulative Grade Point Average, scored out of 10).

The block of code below will isolate a sample of 200 of these observations, which is the dataset that you will be working with. After importing the data, replace 1111111 with your seven-digit student ID number in the set.seed function below, and click the green arrow at the top-right hand side of the code chunk. This part is not worth marks, but you will receive a five-mark deduction on your assignment if it is not completed correctly.

    set.seed(7969397)
    if (!exists("Admissions")) Admissions = data.frame(x = rep(0, 200), y = rep(0, 200))
    Admissions = Admissions[sample(1:NROW(Admissions), 200), ]

Make sure you complete this setup stage before beginning your assignment!

Questions [24 marks]

1. Produce a scatterplot comparing the CGPA scores (X) to the GRE scores (Y). Set appropriate labels for the x- and y-axes, and set a title as well. [3 marks]

    plot(Admissions$CGPA, Admissions$GRE, xlab = "CGPA Scores", ylab = "GRE Scores", main = "CGPA and GRE Sc
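The sampling step in the setup chunk relies on set.seed to make the random draw repeatable. As a minimal sketch of that idea, the code below uses a made-up data frame (`toy`, with hypothetical columns `id` and `score`) and a placeholder seed; none of these names or values come from the Admissions data.

```r
# Hypothetical stand-in for a 500-row dataset; not the real Admissions data.
toy <- data.frame(id = 1:500, score = rnorm(500))

# Seed the random number generator, then draw 200 row indices.
set.seed(1234567)  # placeholder seed, not a real student number
rows_a <- sample(1:NROW(toy), 200)

# Re-seeding with the same value reproduces exactly the same draw.
set.seed(1234567)
rows_b <- sample(1:NROW(toy), 200)

sample_a <- toy[rows_a, ]
identical(rows_a, rows_b)  # TRUE: same seed, same sample
nrow(sample_a)             # 200
```

Because the seed fixes the draw, rerunning the setup chunk with the same seed should give the same 200 rows each time.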
[Figure: scatterplot titled "CGPA and GRE Scatterplot"; x-axis "CGPA Scores" (ticks from 7.0 to 10.0), y-axis "GRE Scores" (ticks from 300 to 340)]

2. Calculate the least squares regression equation for predicting GRE from CGPA. (You only need to produce the coefficients of the line.) [2 marks]

    lm(Admissions$GRE ~ Admissions$CGPA)

    ##
    ## Call:
    ## lm(formula = Admissions$GRE ~ Admissions$CGPA)
    ##
    ## Coefficients:
    ##     (Intercept)  Admissions$CGPA
    ##          195.15            14.14

3. Repeat the plot from Question 1, and overlay the regression line from Question 2. Make the regression line red to help distinguish it from the scatterplot. [2 marks]

    lm.GVC <- lm(Admissions$GRE ~ Admissions$CGPA)
    plot(Admissions$CGPA, Admissions$GRE, xlab = "CGPA Scores", ylab = "GRE Scores", main = "Height and Arms
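Fitting the line and overlaying it on the scatterplot can be sketched on simulated data as follows. The data frame `sim` and the numbers used to generate it are assumptions made for illustration, and the `data =` form of lm is just one common way to write the call; the assignment's own labs may prefer the `Admissions$GRE ~ Admissions$CGPA` form.

```r
# Simulated data for illustration only; not the Admissions sample.
set.seed(1)
cgpa <- runif(200, min = 7, max = 10)
gre  <- 195 + 14 * cgpa + rnorm(200, sd = 7)
sim  <- data.frame(CGPA = cgpa, GRE = gre)

# Fit the least squares line for predicting GRE from CGPA.
fit <- lm(GRE ~ CGPA, data = sim)
coef(fit)  # named vector: (Intercept) and CGPA (the slope)

# Scatterplot with the fitted line overlaid in red.
plot(sim$CGPA, sim$GRE,
     xlab = "CGPA Scores", ylab = "GRE Scores",
     main = "Simulated CGPA and GRE Scatterplot")
abline(fit, col = "red")
```

Passing the fitted model straight to abline avoids retyping the intercept and slope by hand.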
[Figure: scatterplot titled "Height and Armspan Scatterplot"; x-axis "CGPA Scores" (ticks from 7.0 to 10.0), y-axis "GRE Scores" (ticks from 300 to 340)]

4. Calculate the coefficient of correlation (r) between CGPA and GRE. [1 mark]

    cor(Admissions$CGPA, Admissions$GRE)

    ## [1] 0.7766789

5. Calculate the square of the correlation (r^2) between CGPA and GRE. (Note: you can just use R as a calculator here and use your result from Question 4.) [1 mark]

    cor(Admissions$CGPA, Admissions$GRE)^2

    ## [1] 0.6032302

6. Provide an interpretation of the value of r^2 from Question 5 in the context of this data set. [2 marks]

In this dataset, the r^2 value of approximately 0.603 means that about 60.3% of the variability in GRE scores can be explained by the linear relationship with CGPA scores. This suggests that CGPA is a moderately strong predictor of GRE scores.

7. Give a fully worded interpretation of the slope of the least squares regression equation from Question 2, in the context of this data set. [2 marks]

The slope of the least squares regression equation is 14.14, which means that for every one-unit increase in CGPA, the predicted GRE score increases by about 14.14 points.
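The link between r, r^2, and the fitted line can be checked numerically. The sketch below uses simulated `x` and `y` vectors (assumed values, not the Admissions sample) to show that squaring the correlation gives the R-squared reported by the model, and that the slope equals r times the ratio of the standard deviations.

```r
# Simulated data for illustration only; not the Admissions sample.
set.seed(2)
x <- runif(200, min = 7, max = 10)
y <- 195 + 14 * x + rnorm(200, sd = 7)

r   <- cor(x, y)
fit <- lm(y ~ x)

# In simple linear regression, r^2 matches the model's R-squared.
all.equal(r^2, summary(fit)$r.squared)  # TRUE

# The slope equals r times sd(y) / sd(x).
all.equal(unname(coef(fit)[2]), r * sd(y) / sd(x))  # TRUE

# The value of r printed in Question 4, squared, agrees with
# Question 5 up to rounding of the printed digits.
0.7766789^2
```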
8. Narayan and Dhruv (not included in this dataset) have CGPAs of 5.5 and 8.3, respectively. Using the regression equation calculated earlier, determine the predicted GRE score of each of these students. [1 mark]

    195.15 + (14.14 * 5.5)

    ## [1] 272.92

    195.15 + (14.14 * 8.3)

    ## [1] 312.512

9. Is one of the two predicted values in the previous question more reliable than the other? If so, which one is more reliable, and why? [2 marks]

The predicted value for the CGPA of 8.3 is more reliable, because 8.3 lies within the range of CGPA values in the dataset, while 5.5 lies below that range. Predicting at a CGPA of 5.5 is extrapolation, so we cannot be sure the same linear relationship holds there.

10. Priyam (a student in this sample) has a CGPA of 9.17 and a GRE of 325 (knit this file to PDF to see the values). Use the regression equation from Question 2 to calculate the residual for this applicant. [1 mark]

    195.15 + (14.14 * 9.17)

    ## [1] 324.8138

    325 - 324.8138

    ## [1] 0.1862

11. What does the sign of the residual in Question 10 tell us? [1 mark]

The positive sign of the residual in Question 10 tells us that Priyam's actual GRE score (325) is higher than the score predicted by the regression line (about 324.81), so the line slightly underpredicts the GRE score for this applicant.

12. Two new applicants are to be added to this dataset. One applicant, Radha, has a CGPA score of 8.4 and a GRE score of 339, and the other, Shanti, has a CGPA score of 6.1 and a GRE score of 290. Which of these observations, if any, would be considered influential? Why? [2 marks]

Neither of these observations would be considered very influential, because each would have very little effect on the regression line.

13. Suppose that instead of measuring CGPA out of 10 and GRE out of 340, we measured them both as a percentage (out of 100). Without doing any calculations, what would be the value of the correlation between CGPA and GRE after making these changes? Why? [2 marks]

The value of the correlation would not change at all, because changing the units of x and y does not change the value of r.

14. Does the high value of the correlation between CGPA and GRE imply that there is a cause-and-effect relationship between the two variables? Why? If not, what else might explain the correlation between them? [2 marks]

No, the high value does not imply that there is a cause-and-effect relationship between the two variables. Correlation alone does not establish causation, and the association may instead be explained by lurking variables that affect both CGPA and GRE scores.
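The prediction, residual, and change-of-units ideas from Questions 8, 10, and 13 can be sketched in a few lines. The helper `predict_gre` is a hypothetical name introduced here for illustration; the coefficients are the ones printed in the Question 2 output, and the vectors `x` and `y` are simulated rather than taken from the Admissions sample.

```r
# Coefficients as printed in the Question 2 output.
intercept <- 195.15
slope     <- 14.14

# Hypothetical helper: predicted GRE for a given CGPA.
predict_gre <- function(cgpa) intercept + slope * cgpa

predict_gre(c(5.5, 8.3))  # about 272.92 and 312.512

# Residual = observed - predicted; a positive value means the
# line underpredicts the observed score.
325 - predict_gre(9.17)   # about 0.1862

# Correlation is unchanged when each variable is rescaled by a
# positive constant (simulated data, converted to percentages).
set.seed(3)
x <- runif(200, min = 7, max = 10)
y <- 195 + 14 * x + rnorm(200, sd = 7)
all.equal(cor(x, y), cor(x / 10 * 100, y / 340 * 100))  # TRUE
```

Wrapping the equation in a small function keeps the coefficients in one place, which makes it harder for later calculations to drift away from the fitted line.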