MAE301_HW8

School: Arizona State University, Tempe
Course: MAE 301
Subject: Statistics
Date: Apr 3, 2024
Type: pdf
Pages: 9
B1

a)

% Load file
data = readtable('mariokart_short.csv');

% Get response variable and regressor
total_price = data.total_pr;
duration = data.duration;

% Simple linear regression model
mdl = fitlm(duration, total_price);

% Display the coefficients (slope and intercept)
fprintf('Linear regression model: Y = %.4fX + %.4f\n', ...
    mdl.Coefficients.Estimate(2), mdl.Coefficients.Estimate(1));

Linear regression model: Y = -1.3172X + 52.3736

b)

% R-squared
R_squared = mdl.Rsquared.Ordinary;
fprintf('R-squared value: %.4f\n', R_squared);

R-squared value: 0.1400

Since the R-squared value is far from 1, most of the variance in total price is left unexplained by the model, so the fit is poor.

c)

% Residuals
residuals = mdl.Residuals.Raw;
figure;
plot(duration, residuals, 'o');
xlabel('Duration (days)');
ylabel('Residuals');
title('Residuals Plot');
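The fitlm call above can be mirrored outside MATLAB. As a rough sketch in Python (using synthetic stand-in data, since mariokart_short.csv is not reproduced here), the same slope, intercept, R-squared, and raw residuals come from ordinary least squares:

```python
import numpy as np

# Synthetic stand-in for the auction data (the real mariokart_short.csv
# is not included here); y plays total price, x plays auction duration.
rng = np.random.default_rng(0)
x = rng.uniform(1, 10, 100)
y = 52.4 - 1.3 * x + rng.normal(0, 5, 100)

# Least-squares fit of y = b0 + b1*x, the same model fitlm builds.
A = np.column_stack([np.ones_like(x), x])
(b0, b1), *_ = np.linalg.lstsq(A, y, rcond=None)

# R-squared = 1 - SS_res / SS_tot, matching mdl.Rsquared.Ordinary.
y_hat = b0 + b1 * x
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

residuals = y - y_hat  # raw residuals, as in mdl.Residuals.Raw
print(f"Y = {b1:.4f}X + {b0:.4f}, R^2 = {r_squared:.4f}")
```

Plotting `residuals` against `x` then gives the same diagnostic picture as part (c).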
Yes, because the residuals have roughly constant variance and are spread evenly along the horizontal axis.

B2

a)

% Variables for multivariable linear regression
X = data{:, {'id', 'duration', 'n_bids', 'start_pr', 'wheels'}};
Y = data.total_pr;

% Multivariable linear regression model
mdl_multi = fitlm(X, Y);

% Slopes and intercept
coefficients = mdl_multi.Coefficients.Estimate;
fprintf(['Multivariable linear regression model: Y = %.4f(id) + %.4f(duration) + ' ...
    '%.4f(n_bids) + %.4f(start_pr) + %.4f(wheels) + %.4f\n'], ...
    coefficients(2), coefficients(3), coefficients(4), coefficients(5), ...
    coefficients(6), coefficients(1));

Multivariable linear regression model: Y = 0.0000(id) + -0.6388(duration) + 0.2713(n_bids) + 0.2031(start_pr) + 7.5606(wheels) + 34.7461

% Display R-squared and adjusted R-squared
fprintf('R-squared value: %.4f\n', mdl_multi.Rsquared.Ordinary);

R-squared value: 0.7272

fprintf('Adjusted R-squared value: %.4f\n', mdl_multi.Rsquared.Adjusted);

Adjusted R-squared value: 0.7171

b)

% Check p-values
disp(mdl_multi)

Linear regression model:
    y ~ 1 + x1 + x2 + x3 + x4 + x5

Estimated Coefficients:
                   Estimate        SE            tStat      pValue
    (Intercept)    34.746          2.134         16.282     1.1847e-33
    x1             4.3106e-12      4.7829e-12    0.90125    0.36906
    x2             -0.63883        0.1709        -3.7381    0.00027291
    x3             0.27133         0.093364      2.9061     0.0042782
    x4             0.20309         0.036239      5.6044     1.1309e-07
    x5             7.5606          0.5257        14.382     5.0988e-29

Number of observations: 141, Error degrees of freedom: 135
Root Mean Squared Error: 4.85
R-squared: 0.727, Adjusted R-Squared: 0.717
F-statistic vs. constant model: 72, p-value = 2.2e-36

I would remove x1 (id) because backward elimination drops the regressor with the highest p-value first, and id has by far the largest p-value (0.369), making it the least significant regressor.

c)

% Remove regressor
X_new = data{:, {'duration', 'n_bids', 'start_pr', 'wheels'}};
mdl_multi_new = fitlm(X_new, Y);

% New R-squared value
fprintf('New R-squared value after removing a regressor: %.4f\n', ...
    mdl_multi_new.Rsquared.Ordinary);

New R-squared value after removing a regressor: 0.7256

The value stayed fairly similar because the removed regressor (id) explained almost none of the variance in total price.

d)

% New adjusted R-squared value
fprintf('New adjusted R-squared value after removing a regressor: %.4f\n', ...
    mdl_multi_new.Rsquared.Adjusted);

New adjusted R-squared value after removing a regressor: 0.7175

The adjusted R-squared actually rose slightly (from 0.7171 to 0.7175) because it penalizes uninformative regressors, and id contributed essentially nothing.

B3

a)

% Convert categorical variables to categorical type
data.cond = categorical(data.cond);

% Multivariable linear regression model with stepwise regression
mdl_stepwise = stepwiselm(data, 'total_pr ~ duration + n_bids + start_pr + wheels');

1. Adding cond, FStat = 26.054, pValue = 1.10496e-06
2. Adding seller_rate, FStat = 4.9577, pValue = 0.027644
3. Adding cond:wheels, FStat = 4.1369, pValue = 0.043945
4. Removing duration, FStat = 0.010271, pValue = 0.91943

% Results
disp(mdl_stepwise)

Linear regression model:
    total_pr ~ 1 + n_bids + start_pr + seller_rate + cond*wheels

Estimated Coefficients:
                           Estimate       SE            tStat      pValue
    (Intercept)            35.126         2.1286        16.502     4.4225e-34
    n_bids                 0.17396        0.084378      2.0617     0.041167
    cond_used              -1.4265        1.7022        -0.838     0.40353
    start_pr               0.14638        0.033631      4.3524     2.6507e-05
    seller_rate            2.2525e-05     7.8243e-06    2.8789     0.0046472
    wheels                 9.0112         0.91305       9.8693     1.332e-17
    cond_used:wheels       -2.4609        1.0605        -2.3205    0.021822

Number of observations: 141, Error degrees of freedom: 134
Root Mean Squared Error: 4.32
R-squared: 0.785, Adjusted R-Squared: 0.775
F-statistic vs. constant model: 81.5, p-value = 2.85e-42
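The adjusted R-squared comparison in parts (c) and (d) can be reproduced directly from the formula R2_adj = 1 - (1 - R2)(n - 1)/(n - p - 1), using the R-squared values, n = 141 observations, and regressor counts reported by fitlm above. A minimal check in Python:

```python
# Adjusted R-squared: penalizes each additional regressor p for a
# fixed sample size n, unlike the ordinary R-squared.
def adjusted_r_squared(r2, n, p):
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Values reported by fitlm above: n = 141 observations.
full = adjusted_r_squared(0.7272, 141, 5)     # all five regressors
reduced = adjusted_r_squared(0.7256, 141, 4)  # id removed

print(f"full: {full:.4f}, reduced: {reduced:.4f}")  # full: 0.7171, reduced: 0.7175
```

The reduced model wins on adjusted R-squared despite its slightly lower ordinary R-squared, which is exactly why backward elimination drops id.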
% Coefficients and statistics
coefficients = mdl_stepwise.Coefficients.Estimate;
R_squared = mdl_stepwise.Rsquared.Ordinary;
R_squared_adj = mdl_stepwise.Rsquared.Adjusted;

% Best linear model (labels match the stepwise coefficient order above)
fprintf(['Best linear model: Y = %.4f(n_bids) + %.4f(cond_used) + %.4f(start_pr) + ' ...
    '%.4f(seller_rate) + %.4f(wheels) + %.4f(cond_used:wheels) + %.4f\n'], ...
    coefficients(2), coefficients(3), coefficients(4), coefficients(5), ...
    coefficients(6), coefficients(7), coefficients(1));

Best linear model: Y = 0.1740(n_bids) + -1.4265(cond_used) + 0.1464(start_pr) + 0.0000(seller_rate) + 9.0112(wheels) + -2.4609(cond_used:wheels) + 35.1264

% R-squared and adjusted R-squared values
fprintf('R-squared value: %.4f\n', R_squared);

R-squared value: 0.7849

fprintf('Adjusted R-squared value: %.4f\n', R_squared_adj);

Adjusted R-squared value: 0.7752

b)

The model from B3 is the best of the three because its R-squared and adjusted R-squared values are the highest (closest to 1), indicating the best fit.

c)

% Make plots to check conditions
figure;
plotResiduals(mdl_stepwise, 'probability');
title('Residuals Probability Plot');

figure;
plotDiagnostics(mdl_stepwise, 'leverage');
title('Leverage Plot');

The model does not appear to violate any conditions: the residuals follow the normal probability line closely, and the leverage plot shows only a few mildly high-leverage points.
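The leverage values that plotDiagnostics(mdl_stepwise, 'leverage') displays are the diagonal entries of the hat matrix H = X(X'X)^-1 X'. A small Python sketch (with a synthetic design matrix standing in for the auction regressors) shows how they are computed and the two properties that make them useful diagnostics:

```python
import numpy as np

# Synthetic design matrix: intercept column plus p random regressors,
# standing in for the real auction data (n matches the 141 observations).
rng = np.random.default_rng(1)
n, p = 141, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])

# Leverage of observation i is h_ii, the i-th diagonal of the hat matrix.
H = X @ np.linalg.inv(X.T @ X) @ X.T
leverage = np.diag(H)

# Standard properties: each h_ii lies in [0, 1], and the leverages
# sum to the number of fitted parameters (p + 1 here).
print(leverage.min(), leverage.max(), leverage.sum())
```

A common rule of thumb is to inspect points with leverage above about 2(p + 1)/n, which is what "few outliers" in the leverage plot refers to.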