MAE301_HW8
School
Arizona State University, Tempe *
Course
301
Subject
Statistics
Date
Apr 3, 2024
Type
Pages
9
Uploaded by ColonelMusic7302
B1

a)

% Load the data file
data = readtable('mariokart_short.csv');

% Get the response variable and the regressor
total_price = data.total_pr;
duration = data.duration;

% Fit the simple linear regression model
mdl = fitlm(duration, total_price);

% Display the coefficients (slope and intercept)
fprintf('Linear regression model: Y = %.4fX + %.4f\n', mdl.Coefficients.Estimate(2), mdl.Coefficients.Estimate(1));

Linear regression model: Y = -1.3172X + 52.3736

b)

% R-squared
R_squared = mdl.Rsquared.Ordinary;
fprintf('R-squared value: %.4f\n', R_squared);

R-squared value: 0.1400

Since the R-squared value is far from 1, the model explains only about 14% of the variance in total price, so the fit is not good.

c)

% Residuals
residuals = mdl.Residuals.Raw;
figure;
plot(duration, residuals, 'o');
xlabel('Duration (days)');
ylabel('Residuals');
title('Residuals Plot');
Yes, because the residuals have a roughly constant variance and are spread evenly along the horizontal axis.

B2

a)

% Variables for multivariable linear regression
X = data{:, {'id', 'duration', 'n_bids', 'start_pr', 'wheels'}};
Y = data.total_pr;

% Multivariable linear regression model
mdl_multi = fitlm(X, Y);

% Slopes and intercept
coefficients = mdl_multi.Coefficients.Estimate;
fprintf('Multivariable linear regression model: Y = %.4f(id) + %.4f(duration) + %.4f(n_bids) + %.4f(start_pr) + %.4f(wheels) + %.4f\n', coefficients(2), coefficients(3), coefficients(4), coefficients(5), coefficients(6), coefficients(1));

Multivariable linear regression model: Y = 0.0000(id) + -0.6388(duration) + 0.2713(n_bids) + 0.2031(start_pr) + 7.5606(wheels) + 34.7461

% Display R-squared and adjusted R-squared
fprintf('R-squared value: %.4f\n', mdl_multi.Rsquared.Ordinary);

R-squared value: 0.7272

fprintf('Adjusted R-squared value: %.4f\n', mdl_multi.Rsquared.Adjusted);

Adjusted R-squared value: 0.7171

b)

% Check p-values
disp(mdl_multi)

Linear regression model:
    y ~ 1 + x1 + x2 + x3 + x4 + x5

Estimated Coefficients:
                    Estimate          SE         tStat       pValue
                   __________    __________    _______    __________
    (Intercept)        34.746         2.134     16.282    1.1847e-33
    x1             4.3106e-12    4.7829e-12    0.90125       0.36906
    x2               -0.63883        0.1709    -3.7381    0.00027291
    x3                0.27133      0.093364     2.9061     0.0042782
    x4                0.20309      0.036239     5.6044    1.1309e-07
    x5                 7.5606        0.5257     14.382    5.0988e-29

Number of observations: 141, Error degrees of freedom: 135
Root Mean Squared Error: 4.85
R-squared: 0.727, Adjusted R-Squared: 0.717
F-statistic vs. constant model: 72, p-value = 2.2e-36

Here x1 through x5 are the columns of X in order: id, duration, n_bids, start_pr, wheels.

I would remove x1 (id) because in backward elimination you remove the regressor with the highest p-value (0.36906 here), since it is the least significant.

c)

% Remove regressor
X_new = data{:, {'duration', 'n_bids', 'start_pr', 'wheels'}};
mdl_multi_new = fitlm(X_new, Y);

% New R-squared value
fprintf('New R-squared value after removing a regressor: %.4f\n', mdl_multi_new.Rsquared.Ordinary);

New R-squared value after removing a regressor: 0.7256

The value stayed nearly the same (0.7272 before, 0.7256 after) because the removed regressor (id) explained almost none of the variance.
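As a cross-check on the reported values, the adjusted R-squared can be recomputed from the ordinary R-squared with the standard textbook formula 1 - (1 - R^2)(n - 1)/(n - p - 1). The sketch below is not part of the original homework; it assumes n = 141 (from the fitlm output) and uses the rounded R-squared values printed above, so the last digit could differ slightly from what MATLAB reports. The helper name adjR2 is made up for illustration.

```matlab
% Sketch: adjusted R-squared from ordinary R-squared (standard formula).
% n = number of observations, p = number of regressors (intercept excluded).
adjR2 = @(R2, n, p) 1 - (1 - R2) .* (n - 1) ./ (n - p - 1);

n = 141;                              % "Number of observations" in the fitlm output

% Rounded R-squared values copied from the printed output above
adj_full    = adjR2(0.7272, n, 5);    % id, duration, n_bids, start_pr, wheels
adj_reduced = adjR2(0.7256, n, 4);    % same model with id removed

fprintf('Adjusted R-squared, full model:    %.4f\n', adj_full);
fprintf('Adjusted R-squared, reduced model: %.4f\n', adj_reduced);
```

With these inputs the full model comes out at about 0.7171 and the reduced model at about 0.7175: the ordinary R-squared falls slightly when id is dropped, while the adjusted value rises slightly because one fewer regressor is penalized.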
d)

% New adjusted R-squared value
fprintf('New adjusted R-squared value after removing a regressor: %.4f\n', mdl_multi_new.Rsquared.Adjusted);

New adjusted R-squared value after removing a regressor: 0.7175

The adjusted value rose slightly (0.7171 before, 0.7175 after) because the removed regressor (id) explained almost none of the variance, so dropping it removed the penalty for an extra regressor with essentially no loss of fit.

B3

a)

% Convert categorical variables to categorical type
data.cond = categorical(data.cond);

% Multivariable linear regression model with stepwise regression
mdl_stepwise = stepwiselm(data, 'total_pr ~ duration + n_bids + start_pr + wheels');

1. Adding cond, FStat = 26.054, pValue = 1.10496e-06
2. Adding seller_rate, FStat = 4.9577, pValue = 0.027644
3. Adding cond:wheels, FStat = 4.1369, pValue = 0.043945
4. Removing duration, FStat = 0.010271, pValue = 0.91943

% Results
disp(mdl_stepwise)

Linear regression model:
    total_pr ~ 1 + n_bids + start_pr + seller_rate + cond*wheels

Estimated Coefficients:
                         Estimate          SE         tStat       pValue
                        __________    __________    _______    __________
    (Intercept)             35.126        2.1286     16.502    4.4225e-34
    n_bids                 0.17396      0.084378     2.0617      0.041167
    cond_used              -1.4265        1.7022     -0.838       0.40353
    start_pr               0.14638      0.033631     4.3524    2.6507e-05
    seller_rate         2.2525e-05    7.8243e-06     2.8789     0.0046472
    wheels                  9.0112       0.91305     9.8693     1.332e-17
    cond_used:wheels       -2.4609        1.0605    -2.3205      0.021822

Number of observations: 141, Error degrees of freedom: 134
Root Mean Squared Error: 4.32
R-squared: 0.785, Adjusted R-Squared: 0.775
F-statistic vs. constant model: 81.5, p-value = 2.85e-42
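The cond*wheels term means the effect of each extra wheel depends on the item's condition. The sketch below is not from the original homework. It hard-codes the rounded coefficients from the table above and assumes that cond_used is a 0/1 indicator whose reference level is the other condition (presumably "new"; the level name is not shown in the output). The function handle predict_price and the listing values passed to it are made up for illustration.

```matlab
% Coefficients copied from the stepwise output above (rounded as printed)
b0            = 35.126;       % (Intercept)
b_bids        = 0.17396;      % n_bids
b_used        = -1.4265;      % cond_used
b_start       = 0.14638;      % start_pr
b_rate        = 2.2525e-05;   % seller_rate
b_wheels      = 9.0112;       % wheels
b_used_wheels = -2.4609;      % cond_used:wheels

% used = 1 for a used item, 0 for the reference condition (assumption)
predict_price = @(n_bids, used, start_pr, seller_rate, wheels) ...
    b0 + b_bids*n_bids + b_used*used + b_start*start_pr + ...
    b_rate*seller_rate + b_wheels*wheels + b_used_wheels*used.*wheels;

% Price change per extra wheel under each condition
slope_ref  = b_wheels;                   % reference condition
slope_used = b_wheels + b_used_wheels;   % used items

% Hypothetical listing: 10 bids, $1 start price, seller rating 1000, 2 wheels
price_ref  = predict_price(10, 0, 1, 1000, 2);
price_used = predict_price(10, 1, 1, 1000, 2);

fprintf('Per-wheel effect: %.4f (reference), %.4f (used)\n', slope_ref, slope_used);
fprintf('Predicted price:  %.2f (reference), %.2f (used)\n', price_ref, price_used);
```

Under these assumptions each extra wheel adds about 9.01 to the predicted price in the reference condition but only about 6.55 for a used item, which is what the negative interaction coefficient expresses.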
% Coefficients and statistics
coefficients = mdl_stepwise.Coefficients.Estimate;
coef_names = mdl_stepwise.CoefficientNames;
R_squared = mdl_stepwise.Rsquared.Ordinary;
R_squared_adj = mdl_stepwise.Rsquared.Adjusted;

% Best linear model: print each coefficient next to its own term name,
% so the labels always match the terms that stepwiselm kept
for k = 1:numel(coef_names)
    fprintf('%-18s %.5g\n', coef_names{k}, coefficients(k));
end

Best linear model (coefficients from the table above):
total_pr = 35.126 + 0.17396(n_bids) - 1.4265(cond_used) + 0.14638(start_pr) + 2.2525e-05(seller_rate) + 9.0112(wheels) - 2.4609(cond_used:wheels)

% R-squared and adjusted R-squared values
fprintf('R-squared value: %.4f\n', R_squared);

R-squared value: 0.7849

fprintf('Adjusted R-squared value: %.4f\n', R_squared_adj);

Adjusted R-squared value: 0.7752

b)

The model created in part B3 a) is the best because its R-squared (0.7849) and adjusted R-squared (0.7752) values are the closest to 1 of all the models fitted, so it has the best fit.

c)

% Make plots to check conditions
figure;
plotResiduals(mdl_stepwise, 'probability');
title('Residuals Probability Plot');

figure;
plotDiagnostics(mdl_stepwise, 'leverage');
title('Leverage Plot');

The model does not appear to violate any conditions: the residuals are approximately normally distributed in the probability plot, and the leverage plot shows only a few outlying points.
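For readers who want to see what a leverage plot measures, the sketch below computes leverages by hand in base MATLAB. It is not part of the original homework: it uses a made-up design matrix rather than the Mario Kart data, and the 2p/n cutoff is a common rule of thumb, not a value taken from the assignment.

```matlab
% Sketch: leverage values (diagonal of the hat matrix) on synthetic data
rng(0);                               % reproducible made-up data (assumption)
n = 141;                              % same sample size as the homework data
X = [ones(n, 1), randn(n, 2)];        % intercept plus two made-up regressors
p = size(X, 2);                       % number of estimated coefficients

[Q, ~] = qr(X, 0);                    % economy QR; columns of Q span col(X)
h = sum(Q.^2, 2);                     % h(i) = i-th diagonal entry of X*inv(X'*X)*X'

cutoff = 2 * p / n;                   % rule-of-thumb threshold for high leverage
flagged = find(h > cutoff);

fprintf('Sum of leverages: %.4f (should equal p = %d)\n', sum(h), p);
fprintf('%d of %d points exceed 2p/n = %.4f\n', numel(flagged), n, cutoff);
```

The leverages always sum to the number of coefficients, so the average leverage is p/n. Points well above that average sit far from the bulk of the regressor values and deserve a closer look.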