ANguyen - Drills with R Week 10

Week 6 – Drills with R

Week 10 – Drills with R
An Nguyen
University of the Cumberlands
Statistics for Data Science (MSDS-531-M30) – Full Term
Dr. Ora Denton
February 14th, 2024
a)
- R code to do so:

    # Question a
    # Assign the Houses data to housesData
    housesData <- read.table("https://stat4ds.rwth-aachen.de/data/Houses.dat", header = TRUE)
    # Draw a scatterplot of selling price against tax bill
    plot(housesData$taxes, housesData$price, xlab = "Tax Bill (in $)", ylab = "Selling Price", main = "Scatter Plot")

- Results:
- From the scatter plot, it appears that the variability of the selling price (y) increases as the tax bill (x1) increases. This violates the assumption of constant variability in the normal generalized linear model (GLM) structure. The scatter plot shows a cone-shaped pattern, with the spread of the data points increasing as the tax bill increases. This pattern indicates heteroscedasticity, where the variance of the response variable (selling price) is not constant across the values of the predictor variable (tax bill).

b)
- R code to do so:

    # Question b
    # Fit the normal GLM with identity link function
    normGLM <- glm(price ~ taxes + new, data = housesData, family = gaussian(link = "identity"))
    # Fit the gamma GLM with identity link function
    gammaGLM <- glm(price ~ taxes + new, data = housesData, family = Gamma(link = "identity"))
    # Get the summary of each model to interpret the x2 (new) effect
    summary(normGLM)
    summary(gammaGLM)

- Result:
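The two families fitted in part b differ mainly in how they let the spread of price depend on its mean: the normal GLM assumes one constant standard deviation, while the gamma GLM assumes a standard deviation proportional to the mean. The sketch below illustrates this on simulated data, not on the Houses file. The data frame simData, the coefficient values, and the shape parameter are all assumptions chosen only to produce a cone-shaped pattern like the one described in part a.

```r
# Simulated stand-in for the Houses data (all values here are hypothetical)
set.seed(1)
n <- 200
simData <- data.frame(taxes = runif(n, 500, 5000), new = rbinom(n, 1, 0.2))
trueMean <- 20 + 0.08 * simData$taxes + 60 * simData$new
shape <- 8
# Gamma responses: the standard deviation grows in proportion to the mean
simData$price <- rgamma(n, shape = shape, rate = shape / trueMean)

# Same model formula and identity link as in part b
normSim <- glm(price ~ taxes + new, data = simData,
               family = gaussian(link = "identity"))
# Starting the gamma fit from the normal coefficients helps keep fitted means positive
gammaSim <- glm(price ~ taxes + new, data = simData,
                family = Gamma(link = "identity"), start = coef(normSim))

# Model-implied standard deviation of price at several mean values
meanGrid <- c(100, 200, 300, 400, 500)
normSD <- rep(sqrt(summary(normSim)$dispersion), length(meanGrid))  # constant
gammaSD <- sqrt(summary(gammaSim)$dispersion) * meanGrid            # proportional to mean
data.frame(mean = meanGrid, normal_sd = normSD, gamma_sd = gammaSD)
```

The normal_sd column repeats a single value, whereas gamma_sd rises with the mean, which is the behavior a fan-shaped scatter plot calls for.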
- For the normal GLM, the coefficient estimate for new is 86.200077 with a standard error of 27.244808. The associated t-value is 3.164, and the corresponding p-value is 0.00208. Holding taxes constant, a new home (new = 1) is estimated to sell for about 86.20 price units more than a home that is not new (new = 0). If price is recorded in thousands of dollars, as the $k labels in part c suggest, this is roughly $86,200.
- For the gamma GLM, the coefficient estimate for new is 80.545392 with a standard error of 35.971003. The associated t-value is 2.239, and the corresponding p-value is 0.0274. Holding taxes constant, a new home is estimated to sell for about 80.55 price units more than a home that is not new (roughly $80,550 under the same assumption about units).

c)
- R code to do so:

    # Question c
    # Note: predict() for a glm has no interval argument; with se.fit = TRUE
    # it returns fit, se.fit, and residual.scale
    # Scenario labeled $200k: taxes = 200, not new
    newHousesData1 <- data.frame(taxes = 200, new = 0)
    # Generate the prediction and its standard error
    predict(normGLM, newdata = newHousesData1, se.fit = TRUE)
    # Scenario labeled $150k: taxes = 150, not new
    newHousesData2 <- data.frame(taxes = 150, new = 0)
    predict(normGLM, newdata = newHousesData2, se.fit = TRUE)
    # Scenario labeled $300k: taxes = 300, not new
    newHousesData3 <- data.frame(taxes = 300, new = 0)
    predict(normGLM, newdata = newHousesData3, se.fit = TRUE)
    # Scenario labeled $450k: taxes = 450, not new
    newHousesData4 <- data.frame(taxes = 450, new = 0)
    predict(normGLM, newdata = newHousesData4, se.fit = TRUE)
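Since predict() applied to a glm object reports a fitted value and its standard error but no interval bounds, a confidence interval for the mean has to be assembled by hand. The following is a minimal sketch for a normal GLM with identity link, using simulated data rather than the Houses file. The names simData, ciFit, and newHouse, and every numeric value in the simulation, are assumptions made for illustration.

```r
# Simulated stand-in data (hypothetical values, not the Houses file)
set.seed(42)
simData <- data.frame(taxes = runif(100, 500, 5000), new = rbinom(100, 1, 0.2))
simData$price <- 20 + 0.08 * simData$taxes + 60 * simData$new + rnorm(100, sd = 40)

ciFit <- glm(price ~ taxes + new, data = simData,
             family = gaussian(link = "identity"))

# One hypothetical house to predict for
newHouse <- data.frame(taxes = 2000, new = 0)
pred <- predict(ciFit, newdata = newHouse, se.fit = TRUE)

# 95% confidence interval for the mean: fit +/- t quantile * standard error
fitVal <- as.numeric(pred$fit)
seVal <- as.numeric(pred$se.fit)
tCrit <- qt(0.975, df = df.residual(ciFit))
ci <- c(fit = fitVal, lower = fitVal - tCrit * seVal, upper = fitVal + tCrit * seVal)
ci
```

For a normal model with identity link this should agree with predict(lm(...), interval = "confidence"). For a gamma GLM the same construction is only a large-sample approximation.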
- Result:
- From the result, we can see that as the scenario moves from $150k to $450k, the estimated mean selling price (fit) increases accordingly. The standard error of the fit varies slightly across the scenarios, but the differences are relatively small. The residual scale remains constant across all scenarios, which is expected because the normal GLM assumes a single constant variance: the dispersion of the response variable (price) around the fitted values is the same regardless of the mean selling price. Overall, the estimated variability in selling prices under the normal GLM, as captured by the standard error of the fit and the residual scale, remains relatively stable across the different mean selling price scenarios.

d)
- R code to do so:

    # Question d
    # Compare the AIC values of the two models
    AIC(normGLM)
    AIC(gammaGLM)

- Result:
The analysis indicates that a gamma generalized linear model (GLM) might be a better fit for the data compared to a normal GLM. This conclusion is based on the Akaike Information Criterion (AIC) values. The gamma GLM has a lower AIC score (1106.705) compared to the normal GLM (1162.178). Lower AIC scores generally indicate a better balance between the model's ability to fit the data and its overall complexity.
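The balance between fit and complexity mentioned above comes from the definition AIC = -2 × log-likelihood + 2 × (number of estimated parameters). The sketch below recomputes AIC by hand from logLik() and compares it with AIC() for a normal and a gamma GLM. It runs on simulated data, so the resulting values are unrelated to the 1106.705 and 1162.178 reported for the Houses data. The helper manualAIC and all simulation settings are assumptions made for illustration.

```r
# Simulated positive responses whose spread grows with the mean (hypothetical values)
set.seed(7)
n <- 150
x <- runif(n, 500, 5000)
trueMean <- 30 + 0.08 * x
y <- rgamma(n, shape = 8, rate = 8 / trueMean)

normFit <- glm(y ~ x, family = gaussian(link = "identity"))
# Start the gamma fit from the normal coefficients to keep fitted means positive
gammaFit <- glm(y ~ x, family = Gamma(link = "identity"), start = coef(normFit))

# Hypothetical helper: AIC from the log-likelihood and its parameter count
manualAIC <- function(fit) {
  ll <- logLik(fit)
  # attr(ll, "df") counts the regression coefficients plus the dispersion parameter
  -2 * as.numeric(ll) + 2 * attr(ll, "df")
}

rbind(manual  = c(normal = manualAIC(normFit), gamma = manualAIC(gammaFit)),
      builtin = c(normal = AIC(normFit), gamma = AIC(gammaFit)))
```

Both rows should match, and the model with the smaller value is the one AIC prefers. The comparison is meaningful here because both models describe the same response on the same scale.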