MAE301_HW8

School

Arizona State University, Tempe

Course

301

Subject

Statistics

Date

Apr 3, 2024

B1

a)

% Load file
data = readtable('mariokart_short.csv');

% Get response variable and regressor
total_price = data.total_pr;
duration = data.duration;

% Linear regression model
mdl = fitlm(duration, total_price);

% Display the coefficients (slope and intercept)
fprintf('Linear regression model: Y = %.4fX + %.4f\n', mdl.Coefficients.Estimate(2), mdl.Coefficients.Estimate(1));

Linear regression model: Y = -1.3172X + 52.3736

b)

% R-squared
R_squared = mdl.Rsquared.Ordinary;
fprintf('R-squared value: %.4f\n', R_squared);

R-squared value: 0.1400

Since the R-squared value (0.14) is far from 1, the model explains only about 14% of the variance in total price, so the fit is poor.

c)

% Residuals
residuals = mdl.Residuals.Raw;
figure;
plot(duration, residuals, 'o');
xlabel('Duration (days)');
ylabel('Residuals');
title('Residuals Plot');

Yes, because the residuals have roughly constant variance and are spread evenly along the horizontal axis.

B2

a)

% Variables for multivariable linear regression
X = data{:, {'id', 'duration', 'n_bids', 'start_pr', 'wheels'}};
Y = data.total_pr;

% Multivariable linear regression model
mdl_multi = fitlm(X, Y);

% Slopes and intercept
coefficients = mdl_multi.Coefficients.Estimate;
fprintf('Multivariable linear regression model: Y = %.4f(id) + %.4f(duration) + %.4f(n_bids) + %.4f(start_pr) + %.4f(wheels) + %.4f\n', ...
    coefficients(2), coefficients(3), coefficients(4), coefficients(5), coefficients(6), coefficients(1));

Multivariable linear regression model: Y = 0.0000(id) + -0.6388(duration) + 0.2713(n_bids) + 0.2031(start_pr) + 7.5606(wheels) + 34.7461

% Display R-squared and adjusted R-squared

fprintf('R-squared value: %.4f\n', mdl_multi.Rsquared.Ordinary);

R-squared value: 0.7272

fprintf('Adjusted R-squared value: %.4f\n', mdl_multi.Rsquared.Adjusted);

Adjusted R-squared value: 0.7171

b)

% Check p values
disp(mdl_multi)

Linear regression model:
    y ~ 1 + x1 + x2 + x3 + x4 + x5

Estimated Coefficients:
                    Estimate          SE         tStat       pValue
                   __________    __________    _______    __________
    (Intercept)        34.746         2.134     16.282    1.1847e-33
    x1             4.3106e-12    4.7829e-12    0.90125       0.36906
    x2               -0.63883        0.1709    -3.7381    0.00027291
    x3                0.27133      0.093364     2.9061     0.0042782
    x4                0.20309      0.036239     5.6044    1.1309e-07
    x5                 7.5606        0.5257     14.382    5.0988e-29

Number of observations: 141, Error degrees of freedom: 135
Root Mean Squared Error: 4.85
R-squared: 0.727, Adjusted R-Squared: 0.717
F-statistic vs. constant model: 72, p-value = 2.2e-36

I would remove x1 (id). In backward elimination, the regressor with the highest p-value is removed first because it is the least significant, and x1 has by far the highest p-value in the table (0.369).

c)

% Remove regressor
X_new = data{:, {'duration', 'n_bids', 'start_pr', 'wheels'}};
mdl_multi_new = fitlm(X_new, Y);

% New R-squared value
fprintf('New R-squared value after removing a regressor: %.4f\n', mdl_multi_new.Rsquared.Ordinary);

New R-squared value after removing a regressor: 0.7256

R-squared barely changed (0.7272 to 0.7256) because the removed regressor (id) explained almost none of the variance in total price.
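The backward-elimination choice in part b) can be sketched in a few lines. This is only an illustration, not part of the submitted solution: the p-values are copied by hand from the disp(mdl_multi) table, and the threshold alpha = 0.05 and the variable names (names, pvals, to_remove) are assumptions introduced here.

```matlab
% p-values copied from the disp(mdl_multi) table (intercept excluded);
% x1..x5 correspond to the columns passed to fitlm in that order.
names = {'id', 'duration', 'n_bids', 'start_pr', 'wheels'};
pvals = [0.36906, 0.00027291, 0.0042782, 1.1309e-07, 5.0988e-29];

alpha = 0.05;  % assumed significance threshold, not stated in the source

% One backward-elimination step: drop the regressor with the largest
% p-value, but only if that p-value exceeds the threshold.
[p_max, idx] = max(pvals);
if p_max > alpha
    to_remove = names{idx};
    fprintf('Remove %s (p = %.4f)\n', to_remove, p_max);
else
    to_remove = '';
    fprintf('No regressor exceeds alpha = %.2f\n', alpha);
end
```

With a fitted model in the workspace, the same vector could be read from mdl_multi.Coefficients.pValue(2:end) instead of being typed in.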

d)

% New adjusted R-squared value
fprintf('New adjusted R-squared value after removing a regressor: %.4f\n', mdl_multi_new.Rsquared.Adjusted);

New adjusted R-squared value after removing a regressor: 0.7175

Adjusted R-squared also stayed nearly the same (0.7171 to 0.7175) because the removed regressor (id) had almost no effect on the variance explained.

B3

a)

% Convert categorical variables to categorical type
data.cond = categorical(data.cond);

% Multivariable linear regression model with stepwise regression
mdl_stepwise = stepwiselm(data, 'total_pr ~ duration + n_bids + start_pr + wheels');

1. Adding cond, FStat = 26.054, pValue = 1.10496e-06
2. Adding seller_rate, FStat = 4.9577, pValue = 0.027644
3. Adding cond:wheels, FStat = 4.1369, pValue = 0.043945
4. Removing duration, FStat = 0.010271, pValue = 0.91943

% Results
disp(mdl_stepwise)

Linear regression model:
    total_pr ~ 1 + n_bids + start_pr + seller_rate + cond*wheels

Estimated Coefficients:
                         Estimate          SE         tStat       pValue
                        __________    __________    _______    __________
    (Intercept)             35.126        2.1286     16.502    4.4225e-34
    n_bids                 0.17396      0.084378     2.0617      0.041167
    cond_used              -1.4265        1.7022     -0.838       0.40353
    start_pr               0.14638      0.033631     4.3524    2.6507e-05
    seller_rate         2.2525e-05    7.8243e-06     2.8789     0.0046472
    wheels                  9.0112       0.91305     9.8693     1.332e-17
    cond_used:wheels       -2.4609        1.0605    -2.3205      0.021822

Number of observations: 141, Error degrees of freedom: 134
Root Mean Squared Error: 4.32
R-squared: 0.785, Adjusted R-Squared: 0.775
F-statistic vs. constant model: 81.5, p-value = 2.85e-42
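As a quick sanity check, the adjusted R-squared values reported for the three multivariable models can be reproduced from the ordinary R-squared values using the standard formula R2_adj = 1 - (1 - R2)(n - 1)/(n - p - 1), where p is the number of regressors excluding the intercept. The sketch below is an illustration only: the inputs are the rounded values printed in this document, so agreement is limited to about three decimal places, and the variable names are assumptions introduced here.

```matlab
n  = 141;                        % number of observations (from the model summaries)
R2 = [0.7272, 0.7256, 0.7849];   % ordinary R-squared: B2 a), B2 c), B3 a)
p  = [5, 4, 6];                  % regressors excluding the intercept in each model

% Adjusted R-squared penalizes each extra regressor through the
% error degrees of freedom n - p - 1 (135, 136 and 134 here).
R2_adj = 1 - (1 - R2) .* (n - 1) ./ (n - p - 1);

fprintf('Adjusted R-squared: %.4f\n', R2_adj);
```

The results agree with the reported 0.7171, 0.7175 and 0.7752 to within rounding of the inputs.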

% Coefficients and statistics
coefficients = mdl_stepwise.Coefficients.Estimate;
R_squared = mdl_stepwise.Rsquared.Ordinary;
R_squared_adj = mdl_stepwise.Rsquared.Adjusted;

% Best linear model
% Labels follow the term order of the stepwise model shown by disp(mdl_stepwise):
% (Intercept), n_bids, cond_used, start_pr, seller_rate, wheels, cond_used:wheels
fprintf('Best linear model: Y = %.4f(n_bids) + %.4f(cond_used) + %.4f(start_pr) + %.4e(seller_rate) + %.4f(wheels) + %.4f(cond_used:wheels) + %.4f\n', ...
    coefficients(2), coefficients(3), coefficients(4), coefficients(5), coefficients(6), coefficients(7), coefficients(1));

Best linear model: Y = 0.1740(n_bids) + -1.4265(cond_used) + 0.1464(start_pr) + 2.2525e-05(seller_rate) + 9.0112(wheels) + -2.4609(cond_used:wheels) + 35.1264

% R-squared and adjusted R-squared values
fprintf('R-squared value: %.4f\n', R_squared);

R-squared value: 0.7849

fprintf('Adjusted R-squared value: %.4f\n', R_squared_adj);

Adjusted R-squared value: 0.7752

b)

The stepwise model from part B3 is the best because its R-squared (0.7849) and adjusted R-squared (0.7752) are the closest to 1 of the models fitted, so it has the best fit.

c)

% Make plots to check conditions
figure;
plotResiduals(mdl_stepwise, 'probability');
title('Residuals Probability Plot');