Stats-101A-HW4

School: University of California, Los Angeles
Course: 101A
Subject: Statistics
Date: Jan 9, 2024

Stats 101A HW 4
Ian Zhang
UID: 205702810
2023-04-28

Question 1

playbill <- read.csv("playbill.csv")
m1 <- lm(CurrentWeek ~ LastWeek, data = playbill)

#a
CI <- confint(m1, level = 0.95)[2, ]
#The 95% confidence interval for the slope is from 0.9514971 to 1.0126658.
#Since 1 lies inside the interval, 1 is a plausible value for B1.

#b
h0 <- 10000
b0h <- coef(m1)[1]
se <- summary(m1)$coef[1, 2]
tstat <- (b0h - h0) / se
deg <- df.residual(m1)
pvalue <- 2 * pt(abs(tstat), df = deg, lower.tail = FALSE)
pvalue
## (Intercept)
##   0.7517807
#Since the p-value of 0.75 is greater than the 0.05 significance level,
#we fail to reject the null hypothesis. There is not enough evidence to
#conclude that the intercept of the regression line differs from 10000.

#c
predict(m1, data.frame(LastWeek = 400000), interval = "prediction")
##        fit      lwr      upr
## 1 399637.5 359832.8 439442.2
#The estimated current-week gross when last week's gross was 400000 is 399637.5.
#The 95% prediction interval is from 359832.8 to 439442.2.
#450000 is not a feasible value for the gross box office results in the
#current week, as it is outside the prediction interval.

#Scatterplot with the line CurrentWeek = LastWeek (slope 1, intercept 0) in red
library(ggplot2)
ggplot(playbill, aes(x = LastWeek, y = CurrentWeek)) +
  geom_point() +
  geom_abline(slope = 1, intercept = 0, color = "red")

[Figure: scatterplot of CurrentWeek against LastWeek with the red line of slope 1 and intercept 0]

#The line follows the points closely, so from visual inspection it appears
#to be a reasonable model.

Question 1b

plot(m1$residuals ~ playbill$LastWeek, xlab = "Last Week", ylab = "residuals")

[Figure: residuals plotted against LastWeek]

#The residuals appear randomly scattered with no clear pattern, which
#suggests that the linear model is appropriate.

Question 2

ind <- read.table("indicators.txt", header = TRUE)
m2 <- lm(PriceChange ~ LoanPaymentsOverdue, data = ind)
m2
##
## Call:
## lm(formula = PriceChange ~ LoanPaymentsOverdue, data = ind)
##
## Coefficients:
##         (Intercept)  LoanPaymentsOverdue
##               4.514               -2.249
confint(m2)
##                         2.5 %     97.5 %
## (Intercept)         -2.532112 11.5611000
## LoanPaymentsOverdue -4.163454 -0.3335853
#The 95% confidence interval for the slope is [-4.163454, -0.3335853].
#Since the interval does not contain 0, there is evidence of a
#significant negative linear association.

predict(m2, data.frame(LoanPaymentsOverdue = 4), interval = "confidence")
##         fit       lwr       upr
## 1 -4.479585 -6.648849 -2.310322
#The 95% confidence interval for E(Y|X = 4) is from -6.648849 to -2.310322.
#0% is not a feasible value because it is not within the interval.

Question 2b

plot(m2$residuals ~ ind$LoanPaymentsOverdue, xlab = "loan payments overdue", ylab = "residuals")
[Figure: residuals plotted against LoanPaymentsOverdue]

#The residual plot shows random scatter with no clear pattern, which
#supports the linear relationship assumed by the model.

Question 3

#a
b0 <- 0.6417099
b0_se <- 0.122707
b0_t <- 5.248
#n = 30 observations and 2 estimated coefficients give 28 residual degrees of freedom
b0_margin <- qt(0.025, df = 28, lower.tail = FALSE) * b0_se
c_int <- c(b0 - b0_margin, b0 + b0_margin)
c_int
## [1] 0.3903560 0.8930638
#The 95% confidence interval for the intercept is from 0.3903560 to 0.8930638.

#b
h0 <- 0.01
b1 <- 0.0112916
b1_se <- 0.0008184
tstat2 <- (b1 - h0) / b1_se
round(2 * pt(abs(tstat2), df = 28, lower.tail = FALSE), 4)
## [1] 0.1258
#The p-value of 0.1258 is greater than the 0.05 significance level, so we
#fail to reject the null hypothesis. There is not enough evidence to say
#that the average processing time per additional invoice differs from
#0.01 hours.

#c
time <- b0 + 130 * b1
se <- 0.3298   #residual standard error
df <- 28
rss <- se^2 * df
mse <- rss / df   #RSS divided by its degrees of freedom, so sqrt(mse) equals se
#The (x* - xbar)^2/SXX term is omitted, which is exact only if 130 equals
#the mean number of invoices.
error <- qt(0.975, df) * sqrt(mse) * sqrt(1 + 1/30)
error
## [1] 0.6867318
lower <- time - error
upper <- time + error
lower
## [1] 1.422886
upper
## [1] 2.79635
time
## [1] 2.109618
#The point estimate is 2.109618 and the 95% prediction interval is from
#1.422886 to 2.79635.

Question 4

D is correct. RSS is the sum of the squared differences between the observed and fitted values, so it measures how much the points vary around the regression line. The residuals of model 1 clearly vary less than those of model 2, so the RSS for model 1 is smaller than the RSS for model 2. SSreg measures how much of the variation in the response the regression line explains. Since the line fits better in model 1, more of the variation is explained by the line, and SSreg is therefore higher for model 1.
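
The relationship between RSS and SSreg used in Question 4 can be checked numerically. The following is only a sketch: the data behind the two models in Question 4 are not part of this document, so R's built-in `cars` dataset is used as a stand-in, and the variable names `rss`, `ssreg`, and `sst` are chosen here for illustration.

```r
# Sketch only: `cars` is a built-in R dataset used as a stand-in,
# because the data for the models in Question 4 are not available here.
fit <- lm(dist ~ speed, data = cars)

y     <- cars$dist
rss   <- sum(residuals(fit)^2)            # variation left around the line
ssreg <- sum((fitted(fit) - mean(y))^2)   # variation explained by the line
sst   <- sum((y - mean(y))^2)             # total variation in the response

# For a least-squares fit with an intercept, SST = SSreg + RSS.
stopifnot(isTRUE(all.equal(sst, ssreg + rss)))

# Share of the total variation explained by the line (R-squared)
ssreg / sst
```

Because SST depends only on the response, two models fitted to the same response share the same SST, so the one with the smaller RSS must have the larger SSreg.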