Assignment 7.15 and 7.16

Q: 7.15 Refer to Commercial properties Problems 6.18 and 7.7. Calculate R²_{Y4}, R²_{Y1}, R²_{Y1|4}, R²_{14}, R²_{Y2|14}, R²_{Y3|124}, and R². Explain what each coefficient measures and interpret your results. How is the degree of marginal linear association between Y and X1 affected when adjusted for X4?

Answer: To compute these coefficients of multiple and partial determination, we fit the relevant regression models with R's lm() function, obtain error sums of squares with deviance(), and extract R² values with summary(). Here is the code:

Code:
# Read the data into a data frame
data <- read.csv("C:\\Users\\Aayush\\OneDrive\\Desktop\\MSDA\\Sem 2\\Linear Regression\\Assignements\\7.15\\Commercial_Properties.csv")
View(data)

# Total sum of squares
SST <- sum((data$Y - mean(data$Y))^2)
SST

# R2_{Y4}: coefficient of simple determination between Y and X4
Y4_model <- lm(Y ~ X4, data)
R2Y4 <- summary(Y4_model)$r.squared
R2Y4

# R2_{Y1}: coefficient of simple determination between Y and X1
Y1_model <- lm(Y ~ X1, data)
R2Y1 <- summary(Y1_model)$r.squared
R2Y1

# R2_{Y1|4}: coefficient of partial determination, SSR(X1|X4) / SSE(X4)
P1.modelX4 <- lm(Y ~ X4, data)
SSE.X4 <- deviance(P1.modelX4)
SSR.X4 <- SST - SSE.X4
P1.modelX14 <- lm(Y ~ X1 + X4, data)
SSE.X1X4 <- deviance(P1.modelX14)
SSR.X1X4 <- SST - SSE.X1X4
SSRX1_X4 <- SSR.X1X4 - SSR.X4
R2Y1_4 <- SSRX1_X4 / SSE.X4
R2Y1_4

# R2_{14}: coefficient of multiple determination for the model with X1 and X4
I4_model <- lm(Y ~ X1 + X4, data)
R214 <- summary(I4_model)$r.squared
R214

# R2_{Y2|14}: partial determination of X2 given X1 and X4, SSR(X2|X1,X4) / SSE(X1,X4)
P1.modelX14 <- lm(Y ~ X1 + X4, data)
SSE.X1X4 <- deviance(P1.modelX14)
SSR.X1X4 <- SST - SSE.X1X4
P1.modelX124 <- lm(Y ~ X1 + X2 + X4, data)
SSE.X1X2X4 <- deviance(P1.modelX124)
SSR.X1X2X4 <- SST - SSE.X1X2X4
SSRX2_X1X4 <- SSR.X1X2X4 - SSR.X1X4
R2Y2_14 <- SSRX2_X1X4 / SSE.X1X4
R2Y2_14

# R2_{Y3|124}: partial determination of X3 given X1, X2, and X4
P1.modelX124 <- lm(Y ~ X1 + X2 + X4, data)
SSE.X1X2X4 <- deviance(P1.modelX124)
SSR.X1X2X4 <- SST - SSE.X1X2X4
P1.modelX1234 <- lm(Y ~ X1 + X2 + X3 + X4, data)
SSE.X1X2X3X4 <- deviance(P1.modelX1234)
SSR.X1X2X3X4 <- SST - SSE.X1X2X3X4
SSRX3_X1X2X4 <- SSR.X1X2X3X4 - SSR.X1X2X4
R2Y3_124 <- SSRX3_X1X2X4 / SSE.X1X2X4
R2Y3_124

# R2: coefficient of multiple determination for the full model
full_model <- lm(Y ~ X1 + X2 + X3 + X4, data)
R2 <- summary(full_model)$r.squared
R2

Output:
> # Read the data into a data frame
> data <- read.csv("C:\\Users\\Aayush\\OneDrive\\Desktop\\MSDA\\Sem 2\\Linear Regression\\Assignements\\7.15\\Commercial_Properties.csv")
> View(data)
> SST <- sum((data$Y - mean(data$Y))^2)
> SST
[1] 236.5575
> # R2_{Y4}
> Y4_model <- lm(Y ~ X4, data)
> R2Y4 <- summary(Y4_model)$r.squared
> R2Y4
[1] 0.2865058
> # R2_{Y1}
> Y1_model <- lm(Y ~ X1, data)
> R2Y1 <- summary(Y1_model)$r.squared
> R2Y1
[1] 0.06264236
> # R2_{Y1|4}
> P1.modelX4 <- lm(Y ~ X4, data)
> SSE.X4 <- deviance(P1.modelX4)
> SSR.X4 <- SST - SSE.X4
> P1.modelX14 <- lm(Y ~ X1 + X4, data)
> SSE.X1X4 <- deviance(P1.modelX14)
> SSR.X1X4 <- SST - SSE.X1X4
> SSRX1_X4 <- SSR.X1X4 - SSR.X4
> R2Y1_4 <- SSRX1_X4 / SSE.X4
> R2Y1_4
[1] 0.2504679
> # R2_{14}
> I4_model <- lm(Y ~ X1 + X4, data)
> R214 <- summary(I4_model)$r.squared
> R214
[1] 0.4652132
> # R2_{Y2|14}
> P1.modelX14 <- lm(Y ~ X1 + X4, data)
> SSE.X1X4 <- deviance(P1.modelX14)
> SSR.X1X4 <- SST - SSE.X1X4
> P1.modelX124 <- lm(Y ~ X1 + X2 + X4, data)
> SSE.X1X2X4 <- deviance(P1.modelX124)
> SSR.X1X2X4 <- SST - SSE.X1X2X4
> SSRX2_X1X4 <- SSR.X1X2X4 - SSR.X1X4
> R2Y2_14 <- SSRX2_X1X4 / SSE.X1X4
> R2Y2_14
[1] 0.2202037
> # R2_{Y3|124}
> P1.modelX124 <- lm(Y ~ X1 + X2 + X4, data)
> SSE.X1X2X4 <- deviance(P1.modelX124)
> SSR.X1X2X4 <- SST - SSE.X1X2X4
> P1.modelX1234 <- lm(Y ~ X1 + X2 + X3 + X4, data)
> SSE.X1X2X3X4 <- deviance(P1.modelX1234)
> SSR.X1X2X3X4 <- SST - SSE.X1X2X3X4
> SSRX3_X1X2X4 <- SSR.X1X2X3X4 - SSR.X1X2X4
> R2Y3_124 <- SSRX3_X1X2X4 / SSE.X1X2X4
> R2Y3_124
[1] 0.004254889
> # R2
> full_model <- lm(Y ~ X1 + X2 + X3 + X4, data)
> R2 <- summary(full_model)$r.squared
> R2
[1] 0.5847496

In summary:
R²_{Y4} = .2865
R²_{Y1} = .0626
R²_{Y1|4} = .2505
R²_{14} = .4652
R²_{Y2|14} = .2202
R²_{Y3|124} = .0043
R² = .5848

Therefore:
1) R²_{Y4} = 0.2865058: the proportion of the variation in Y explained by X4 alone (coefficient of simple determination).
2) R²_{Y1} = 0.06264236: the proportion of the variation in Y explained by X1 alone.
3) R²_{Y1|4} = 0.2504679: a coefficient of partial determination — the proportion of the variation in Y left unexplained by X4 that is explained when X1 is added to the model.
4) R²_{14} = 0.4652132: the coefficient of multiple determination for the model containing X1 and X4, i.e., the proportion of the variation in Y explained jointly by X1 and X4.
5) R²_{Y2|14} = 0.2202037: the proportion of the variation in Y left unexplained by X1 and X4 that is explained when X2 is added.
6) R²_{Y3|124} = 0.004254889: the proportion of the variation in Y left unexplained by X1, X2, and X4 that is explained when X3 is added — essentially negligible.
7) R² = 0.5847496: the overall proportion of the variation in Y explained by all four predictors together.

Interpreting these results: X4 is the strongest single predictor of Y (R²_{Y4} = 0.2865). X1 alone explains little of the variation in Y (R²_{Y1} = 0.0626), but once X4 is in the model, adding X1 explains a quarter of the remaining variation (R²_{Y1|4} = 0.2505). X2 likewise adds appreciable explanatory power given X1 and X4 (R²_{Y2|14} = 0.2202), whereas X3 adds almost nothing given the other three predictors (R²_{Y3|124} = 0.0043). Taken together, the four predictors explain a moderate share of the variation in Y (R² = 0.5848).
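As a cross-check (not part of the original solution code), the same partial coefficient can be recovered from the sequential (Type I) sums of squares that anova() reports, or as a squared partial correlation computed from residuals. This minimal sketch assumes the same data frame `data` with columns Y, X1, and X4 is still loaded:

# Sequential sums of squares: with X4 entered first, the "X1" row of the
# anova table is the extra sum of squares SSR(X1|X4).
fit14 <- lm(Y ~ X4 + X1, data)
tab <- anova(fit14)
SSR.X1.given.X4 <- tab["X1", "Sum Sq"]
SSE.X4.only <- tab["X1", "Sum Sq"] + tab["Residuals", "Sum Sq"]
SSR.X1.given.X4 / SSE.X4.only   # should reproduce R2Y1_4 = 0.2504679

# Equivalently, R2_{Y1|4} is the squared correlation between the residuals
# of Y regressed on X4 and the residuals of X1 regressed on X4.
eY <- resid(lm(Y ~ X4, data))
e1 <- resid(lm(X1 ~ X4, data))
cor(eY, e1)^2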
Regarding the degree of marginal linear association between Y and X1: marginally, X1 explains only about 6.3% of the variation in Y (R²_{Y1} = 0.0626), but once the model is adjusted for X4, X1 explains about 25.0% of the variation that X4 leaves unexplained (R²_{Y1|4} = 0.2505). Adjusting for X4 therefore substantially strengthens the apparent linear association between Y and X1. This occurs because X4 accounts for variation in Y that otherwise masks the contribution of X1; in general, whether adjustment strengthens or weakens a marginal association depends on how the adjusting variable is correlated with both Y and the predictor of interest.

Q: 7.16 Refer to Brand preference Problem 6.5.
a. Transform the variables by means of the correlation transformation (7.44) and fit the standardized regression model (7.45).
b. Interpret the standardized regression coefficient b1*.
c. Transform the estimated standardized regression coefficients by means of (7.53) back to the ones for the fitted regression model in the original variables. Verify that they are the same as the ones obtained in Problem 6.5b.

Answer: I used the following code to produce the output for the three parts:

Code:
# Read the dataset
brand_data <- read.csv("C:\\Users\\Aayush\\OneDrive\\Desktop\\MSDA\\Sem 2\\Linear Regression\\Assignements\\7.16\\Brand_Prefrences.csv")

# Standardize the variables (the correlation transformation up to a constant
# factor of 1/sqrt(n - 1), which does not change the slope estimates)
standardized_brand_data <- as.data.frame(scale(brand_data))

# Fit the standardized regression model
standardized_model <- lm(Yi ~ Xi1 + Xi2, data = standardized_brand_data)

# Print the summary of the standardized model
summary(standardized_model)

# a) Yhat* = .885 X1* + .402 X2*   ---- Answer

b) Answer: In the standardized regression model Yhat* = .885 X1* + .402 X2*, each coefficient gives the change in Yhat*, in standard deviations of Y, per one-standard-deviation increase in the corresponding predictor, holding the other predictor constant. Specifically, a one-standard-deviation increase in X1 corresponds to a .885 standard deviation increase in Yhat, while a one-standard-deviation increase in X2 corresponds to a .402 standard deviation increase in Yhat. Because both predictors are on a common standardized scale, these coefficients can be compared directly to gauge the relative importance and direction of each predictor's relationship with the dependent variable.
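Note that scale() standardizes each variable but omits the 1/sqrt(n - 1) factor that transformation (7.44) includes. Since that constant rescales every column identically, the fitted slopes are the same either way; only the residual scale differs. A minimal sketch of the exact textbook transformation, assuming the same brand_data with columns Yi, Xi1, and Xi2:

# Exact correlation transformation (7.44):
#   transformed value = (1 / sqrt(n - 1)) * (value - mean) / sd
n <- nrow(brand_data)
ct <- function(v) (v - mean(v)) / (sqrt(n - 1) * sd(v))
brand_ct <- as.data.frame(lapply(brand_data, ct))

# Standardized model (7.45) has no intercept term; the fitted slopes
# match those from the scale()-based fit above.
ct_model <- lm(Yi ~ Xi1 + Xi2 - 1, data = brand_ct)
coef(ct_model)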
# Running through it again for part c (manually)
# c)
# Data
Y <- c(64, 73, 61, 76, 72, 80, 71, 83, 83, 89, 86, 93, 88, 95, 94, 100)
X1 <- c(4, 4, 4, 4, 6, 6, 6, 6, 8, 8, 8, 8, 10, 10, 10, 10)
X2 <- c(2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4)

# Combine into a data frame
data <- data.frame(Y = Y, X1 = X1, X2 = X2)

# Correlation transformation
cor_matrix <- cor(data)
X1_std <- (data$X1 - mean(data$X1)) / sd(data$X1)
X2_std <- (data$X2 - mean(data$X2)) / sd(data$X2)
Y_std <- (data$Y - mean(data$Y)) / sd(data$Y)

# Fit the standardized regression model
lm_std <- lm(Y_std ~ X1_std + X2_std, data = data)

# Summary of the model
summary(lm_std)

# Interpret the standardized regression coefficient b1*
std_coef <- summary(lm_std)$coefficients[, "Estimate"]
b1_star <- std_coef["X1_std"]
b1_star

# Transformation to original variables
b0 <- mean(data$Y) - b1_star * mean(data$X1) * sd(data$Y) / sd(data$X1) - std_coef["X2_std"] * mean(data$X2) * sd(data$Y) / sd(data$X2)
b1 <- b1_star * sd(data$Y) / sd(data$X1)
b2 <- std_coef["X2_std"] * sd(data$Y) / sd(data$X2)

# Display the coefficients
cat("Original Coefficients:\n")
cat("b0:", b0, "\n")
cat("b1:", b1, "\n")
cat("b2:", b2, "\n")

# Compute sY, s1, and s2
sY <- sd(data$Y)
s1 <- sd(data$X1)
s2 <- sd(data$X2)

# Output sY, s1, and s2
cat("sY:", sY, "\n")
cat("s1:", s1, "\n")
cat("s2:", s2, "\n")

Output:
> # Read the dataset
> brand_data <- read.csv("C:\\Users\\Aayush\\OneDrive\\Desktop\\MSDA\\Sem 2\\Linear Regression\\Assignements\\7.16\\Brand_Prefrences.csv")
>
> # Standardize the variables using correlation transformation
> standardized_brand_data <- as.data.frame(scale(brand_data))
>
> # Fit the standardized regression model
> standardized_model <- lm(Yi ~ Xi1 + Xi2, data = standardized_brand_data)
>
> # Print the summary of the standardized model
> summary(standardized_model)

Call:
lm(formula = Yi ~ Xi1 + Xi2, data = standardized_brand_data)

Residuals:
     Min       1Q   Median       3Q      Max
-0.46424 -0.18227  0.02285  0.15994  0.36661

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.00000    0.06294   0.000        1
Xi1          0.88503    0.06500  13.615 4.53e-09 ***
Xi2          0.40223    0.06500   6.188 3.28e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2518 on 13 degrees of freedom
Multiple R-squared:  0.9451,	Adjusted R-squared:  0.9366
F-statistic: 111.8 on 2 and 13 DF,  p-value: 6.439e-09

>
> # a) Yhat* = .885 X1* + .402 X2*   ---- Answer
>
> # b) Answer:
> # In the standardized regression model Yhat* = .885 X1* + .402 X2*, coefficients
> # represent the change in Yhat* for a one-standard-deviation increase in the
> # corresponding standardized independent variable. Specifically,
> # a one-standard-deviation increase in X1 corresponds to a .885 standard deviation
> # increase in Yhat, while a one-standard-deviation increase in X2 corresponds
> # to a .402 standard deviation increase in Yhat. These coefficients quantify the
> # relative importance and direction of the relationships between the independent
> # variables and the dependent variable.
>
> # Running through it again for part c (manually)
> # c)
> # Data
> Y <- c(64, 73, 61, 76, 72, 80, 71, 83, 83, 89, 86, 93, 88, 95, 94, 100)
> X1 <- c(4, 4, 4, 4, 6, 6, 6, 6, 8, 8, 8, 8, 10, 10, 10, 10)
> X2 <- c(2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4)
>
> # Combine into a data frame
> data <- data.frame(Y = Y, X1 = X1, X2 = X2)
>
> # Correlation transformation
> cor_matrix <- cor(data)
> X1_std <- (data$X1 - mean(data$X1)) / sd(data$X1)
> X2_std <- (data$X2 - mean(data$X2)) / sd(data$X2)
> Y_std <- (data$Y - mean(data$Y)) / sd(data$Y)
>
> # Fit the standardized regression model
> lm_std <- lm(Y_std ~ X1_std + X2_std, data = data)
>
> # Summary of the model
> summary(lm_std)

Call:
lm(formula = Y_std ~ X1_std + X2_std, data = data)

Residuals:
     Min       1Q   Median       3Q      Max
-0.38423 -0.15391  0.00218  0.13863  0.36677

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.220e-16  5.880e-02   0.000        1
X1_std       8.924e-01  6.073e-02  14.695 1.78e-09 ***
X2_std       3.946e-01  6.073e-02   6.498 2.01e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2352 on 13 degrees of freedom
Multiple R-squared:  0.9521,	Adjusted R-squared:  0.9447
F-statistic: 129.1 on 2 and 13 DF,  p-value: 2.658e-09

>
> # Interpret the standardized regression coefficient b1*
> std_coef <- summary(lm_std)$coefficients[, "Estimate"]
> b1_star <- std_coef["X1_std"]
> b1_star
   X1_std
0.8923929
>
> # Transformation to original variables
> b0 <- mean(data$Y) - b1_star * mean(data$X1) * sd(data$Y) / sd(data$X1) - std_coef["X2_std"] * mean(data$X2) * sd(data$Y) / sd(data$X2)
> b1 <- b1_star * sd(data$Y) / sd(data$X1)
> b2 <- std_coef["X2_std"] * sd(data$Y) / sd(data$X2)
>
> # Display the coefficients
> cat("Original Coefficients:\n")
Original Coefficients:
> cat("b0:", b0, "\n")
b0: 37.65
> cat("b1:", b1, "\n")
b1: 4.425
> cat("b2:", b2, "\n")
b2: 4.375
>
> # Compute sY, s1, and s2
> sY <- sd(data$Y)
> s1 <- sd(data$X1)
> s2 <- sd(data$X2)
>
> # Output sY, s1, and s2
> cat("sY:", sY, "\n")
sY: 11.45135
> cat("s1:", s1, "\n")
s1: 2.309401
> cat("s2:", s2, "\n")
s2: 1.032796
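For reference, the code above implements back-transformation (7.53), which recovers the original-scale coefficients from the standardized ones. Plugging in the values printed above (with means computed from the listed data: Ȳ = 81.75, X̄1 = 7, X̄2 = 3):

\[
b_k = \frac{s_Y}{s_k}\, b_k^{*} \quad (k = 1, 2), \qquad
b_0 = \bar{Y} - b_1 \bar{X}_1 - b_2 \bar{X}_2
\]

\[
b_1 = \frac{11.45135}{2.309401}(0.8924) = 4.425, \qquad
b_2 = \frac{11.45135}{1.032796}(0.3946) = 4.375,
\]

\[
b_0 = 81.75 - 4.425(7) - 4.375(3) = 37.65
\]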
Summary of Regression Analysis

We conducted a regression analysis to understand how two predictor variables (X1 and X2) relate to a response variable (Y). Our dataset included 16 observations.

Model Summary
We fit a standardized regression model and then transformed its coefficients back to the original variables. The fitted model in the original units is:

Predicted Y = b0 + b1*X1 + b2*X2

where b0, b1, and b2 are the regression coefficients. The estimated coefficients are:
- b0 = 37.650
- b1 = 4.425
- b2 = 4.375

These are the same as the coefficients obtained in Problem 6.5b, which completes the verification required in part c.

Interpretation
- A one-unit increase in X1 corresponds to a 4.425-unit increase in predicted Y, holding X2 constant.
- Similarly, a one-unit increase in X2 corresponds to a 4.375-unit increase in predicted Y, holding X1 constant.
- The intercept b0 = 37.650 is the predicted value of Y when both predictors equal zero; here that is an extrapolation, since X1 ranges from 4 to 10 and X2 from 2 to 4.

Standard Deviations
The standard deviations of the variables are:
- Standard deviation of Y: 11.45135
- Standard deviation of X1: 2.30940
- Standard deviation of X2: 1.03280

These values describe how spread out the data are around their averages; they are exactly the scale factors used in back-transformation (7.53).
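As a final cross-check (not required by the problem), fitting the regression directly on the original variables should reproduce the back-transformed coefficients. A minimal sketch, using the data frame `data` built in part c:

# Direct fit in the original units; the coefficients should match the
# back-transformed values b0 = 37.65, b1 = 4.425, b2 = 4.375.
lm_orig <- lm(Y ~ X1 + X2, data = data)
coef(lm_orig)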