MAT 303 Module Two Problem Set Report
Interaction Terms and Qualitative Predictors
Brian Tynan
Brian.Tynan@snhu.edu
Southern New Hampshire University
1. Introduction
I am an analyst working for a car manufacturer, and I have been given a dataset to analyze vehicle fuel economy. The statistical analysis examines the association between a vehicle's fuel efficiency and several other variables. The first model examines how weight, horsepower, and rear axle (gear) ratio relate to fuel economy. The second model examines the relationship between fuel efficiency and a vehicle's weight, horsepower, and number of cylinders. Automobile manufacturers can use this data to evaluate the efficiency of each of their vehicle models and to understand how each factor, separately and in combination, affects fuel efficiency. I will fit two multiple regression models, each with its own set of predictors and with fuel economy, measured in miles per gallon (mpg), as the response. For each model, I will compute the fitted values and residuals, plot the residuals against the fitted values, create a Q-Q plot to check the normality assumption for the residuals, and compute confidence and prediction intervals. These results will be used to determine whether each variable in each model has a statistically significant influence on vehicle fuel economy.
2. Data Preparation
The key variables I will investigate in this dataset are rear axle ratio, weight, horsepower, and fuel economy. I will use my calculations to determine whether weight, horsepower, and rear axle ratio have a statistically significant impact on the fuel economy of the vehicles in this study. The dataset has 32 rows, one for each vehicle, and 12 columns, one for each variable recorded for that vehicle.
3. Model with Interaction Term
Correlation Analysis
The correlation matrix shows the relationships among fuel efficiency, weight, horsepower, and rear axle ratio. The correlation coefficient of -0.8677 indicates a strong negative (inverse) correlation between fuel efficiency and vehicle weight, and the coefficient of -0.7762 indicates a fairly strong negative correlation between fuel efficiency and horsepower. In other words, fuel economy tends to decline as weight increases, and it also tends to decline as horsepower increases. Rear axle ratio shows a moderate positive association with fuel efficiency, with a correlation coefficient of 0.6812.
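The coefficients quoted above are Pearson correlation coefficients. As a minimal sketch of how such a value is computed, the Python function below implements the standard formula; the function name `pearson_r` and the toy numbers are hypothetical illustrations, not the study's data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares about the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy values (NOT the study data): heavier cars, lower mpg
weight = [2.0, 2.5, 3.0, 3.5, 4.0]
mpg = [30.0, 27.0, 22.0, 19.0, 15.0]
print(round(pearson_r(weight, mpg), 4))
```

A value near -1 indicates a strong inverse linear relationship, which is how the weight and horsepower coefficients above are read.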
Reporting Results
The following is the general form of the regression model for fuel efficiency, with weight, horsepower, and rear axle ratio as predictors plus interaction terms for weight and horsepower and for weight and rear axle ratio. Here $X_1$ is weight, $X_2$ is horsepower, and $X_3$ is rear axle ratio.
$$E(Y) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_1 X_2 + \beta_5 X_1 X_3$$
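Fitting this model is ordinary least squares on a design matrix whose last two columns are the products $X_1 X_2$ and $X_1 X_3$. The sketch below assumes NumPy and uses hypothetical helper names (`design_matrix`, `fit_ols`); it checks itself on synthetic, noise-free data rather than on the report's dataset, which is not reproduced here.

```python
import numpy as np


def design_matrix(x1, x2, x3):
    """Columns: intercept, X1, X2, X3, X1*X2, X1*X3 (same order as the model)."""
    x1, x2, x3 = (np.asarray(v, dtype=float) for v in (x1, x2, x3))
    return np.column_stack([np.ones_like(x1), x1, x2, x3, x1 * x2, x1 * x3])


def fit_ols(X, y):
    """Least-squares estimates of the beta coefficients."""
    beta, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return beta


# Hypothetical, noise-free synthetic data generated from known coefficients,
# used only to check that the fit recovers them (NOT the study's data).
rng = np.random.default_rng(0)
x1 = rng.uniform(1.5, 5.5, size=32)   # weight-like values
x2 = rng.uniform(50, 335, size=32)    # horsepower-like values
x3 = rng.uniform(2.7, 4.9, size=32)   # axle-ratio-like values
true_beta = np.array([40.0, -5.0, -0.05, 1.0, 0.01, 0.5])
X = design_matrix(x1, x2, x3)
y = X @ true_beta
beta_hat = fit_ols(X, y)
print(np.round(beta_hat, 4))
```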
The fitted regression model for fuel economy from the dataset, with weight, horsepower, and rear axle ratio as predictors along with the interaction terms for the two pairs of variables, is:
$$E(Y) = 75.68431 - 16.12967 X_1 - 0.16480 X_2 - 5.44987 X_3 + 0.04069 X_1 X_2 + 1.70650 X_1 X_3$$
The $R^2$ value is 0.8907, so weight, horsepower, rear axle ratio, and the two interaction terms together account for about 89% of the variation in fuel economy. The adjusted $R^2$ is 0.8697; it modifies $R^2$ to account for the number of predictors in the model. The model can also be used to calculate the change in fuel efficiency for each one-unit increase in horsepower for a vehicle with a weight of 3.50. Because horsepower interacts with weight, only the terms that contain $X_2$ are needed:
$$\Delta E(Y) = (-0.16480 + 0.04069 X_1)\,\Delta X_2$$
$$\Delta E(Y) = (-0.16480 + 0.04069 \times 3.5)(1) = -0.022385$$
Per these calculations, for every one-unit increase in horsepower, the fuel efficiency of a vehicle with a weight of 3.50 decreases by about 0.022385 mpg. The same model can be used to estimate the change in fuel efficiency for a one-unit increase in rear axle ratio for a vehicle with a weight of 3.50. Again, because rear axle ratio interacts with weight, only the terms that contain $X_3$ are needed:
$$\Delta E(Y) = (-5.44987 + 1.70650 X_1)\,\Delta X_3$$
$$\Delta E(Y) = (-5.44987 + 1.70650 \times 3.5)(1) = 0.52288$$
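Both calculations apply the same rule: with an interaction term, the change in $E(Y)$ for a one-unit increase in a predictor is its own coefficient plus the interaction coefficient times the value of weight. A minimal Python sketch of that rule, using the fitted model 1 coefficients (the helper name `slope_given_weight` is hypothetical):

```python
def slope_given_weight(beta_main, beta_interaction, weight):
    """Change in E(Y) per one-unit increase in a predictor that interacts
    with weight: beta_main + beta_interaction * weight."""
    return beta_main + beta_interaction * weight


# Coefficients from the fitted model 1 equation
hp_effect = slope_given_weight(-0.16480, 0.04069, 3.5)    # horsepower (X2)
axle_effect = slope_given_weight(-5.44987, 1.70650, 3.5)  # rear axle ratio (X3)
print(round(hp_effect, 6), round(axle_effect, 5))
```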
Per these calculations, the fuel efficiency of a vehicle with a weight of 3.5 increases by about 0.52288 mpg for every one-unit increase in rear axle ratio. After obtaining the fitted values and residuals from the model, I created a plot of the residuals against the fitted values and a standard normal Q-Q plot.
The residuals-versus-fitted-values plot shows no discernible pattern, so the data appear to meet the condition of homoscedasticity (constant variance of the residuals). In the Q-Q plot, the points do not deviate significantly from the reference line, so I conclude that the residuals are approximately normally distributed.
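The residuals are the observed values minus the fitted values, and a normal Q-Q plot pairs the sorted residuals with standard normal quantiles. The sketch below uses only the Python standard library (`statistics.NormalDist`); the plotting positions (i - 0.5)/n are one common convention (statistical software may use a slightly different one), and the observed and fitted values are hypothetical.

```python
from statistics import NormalDist


def residuals(observed, fitted):
    """Residual = observed value minus fitted value."""
    return [o - f for o, f in zip(observed, fitted)]


def normal_qq_points(resids):
    """Pair each sorted residual with a standard normal quantile.

    Plotting positions (i - 0.5) / n are one common convention;
    statistical software may use slightly different positions.
    """
    sample = sorted(resids)
    n = len(sample)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sample))


# Hypothetical observed and fitted mpg values (NOT from the report)
observed = [20.0, 22.0, 18.0, 15.0, 24.0]
fitted = [21.0, 21.5, 18.5, 15.5, 23.0]
for theo, obs in normal_qq_points(residuals(observed, fitted)):
    print(f"{theo:7.3f}  {obs:6.2f}")
```

Plotting these pairs and checking that they fall close to a straight line is the visual normality check described above.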
Evaluating Model Significance
An overall F-test is performed to determine whether the regression model is significant at the 5% level of significance. The null and alternative hypotheses are stated as follows:
$$H_0: \beta_1 = \beta_2 = \cdots = \beta_n = 0$$
$$H_a: \text{at least one } \beta_i \neq 0 \text{ for } i = 1, 2, \ldots, n$$
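For a model with an intercept, the overall F statistic can be recovered from $R^2$, the number of observations n, and the number of predictor terms k as F = (R²/k) / ((1 - R²)/(n - k - 1)). A minimal sketch, assuming n = 32 vehicles and k = 5 terms for model 1; because the reported $R^2$ is rounded, the result is approximate.

```python
def overall_f_statistic(r_squared, n, k):
    """F statistic for H0: beta_1 = ... = beta_k = 0.

    n = number of observations, k = number of predictor terms
    (excluding the intercept). Under H0 it follows an F(k, n - k - 1)
    distribution.
    """
    df_model = k
    df_error = n - k - 1
    return (r_squared / df_model) / ((1 - r_squared) / df_error)


# Model 1: 32 vehicles, 5 predictor terms (X1, X2, X3, X1*X2, X1*X3)
f_model1 = overall_f_statistic(0.8907, n=32, k=5)
print(round(f_model1, 2))
```

The p-value is the upper-tail probability of the F(5, 26) distribution at this value; if SciPy is available, `scipy.stats.f.sf(f_model1, 5, 26)` computes it.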
The p-value is 1.503e-11, which is far smaller than the 5% level of significance. We therefore reject the null hypothesis in favor of the alternative hypothesis: fuel economy is statistically associated with at least one of the predictor variables. Individual t-tests on each coefficient (beta tests) are then used to establish whether each term in the model is significant at the 5% level of significance. The null and alternative hypotheses for each test are:
$$H_0: \beta_i = 0 \qquad H_a: \beta_i \neq 0 \qquad \text{for each } i = 1, 2, \ldots, n$$
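Each individual test reduces to comparing a term's p-value with α = 0.05. The sketch below applies that rule to the model 1 p-values discussed in the next paragraph; the dictionary layout and term labels are only an illustration.

```python
ALPHA = 0.05

# p-values reported for model 1's terms
p_values = {
    "weight": 0.02624,
    "rear axle ratio": 0.25886,
    "horsepower": 0.00146,
    "weight x rear axle ratio": 0.24447,
    "weight x horsepower": 0.00595,
}

for term, p in p_values.items():
    decision = "reject H0" if p < ALPHA else "fail to reject H0"
    print(f"{term:26s} p = {p:.5f}  -> {decision}")

# Terms whose p-value falls below the significance level
significant = sorted(t for t, p in p_values.items() if p < ALPHA)
```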
The p-value is 0.02624 for weight, 0.25886 for rear axle ratio, 0.00146 for horsepower, 0.24447 for the weight × rear axle ratio interaction, and 0.00595 for the weight × horsepower interaction. The p-values for weight, horsepower, and the weight × horsepower interaction are all below the 5% level of significance, so we conclude that each of these terms has a statistically significant influence on a vehicle's fuel economy. Because the p-values for rear axle ratio and the weight × rear axle ratio interaction are not below 5%, there is no statistically significant link between these terms and fuel efficiency.
Making Predictions Using the Model
This regression model is used to estimate the fuel efficiency of a vehicle with a weight of 2.965, a horsepower of 210, and a rear axle ratio of 2.91.
$$E(Y) = 75.68431 - 16.12967 X_1 - 0.16480 X_2 - 5.44987 X_3 + 0.04069 X_1 X_2 + 1.70650 X_1 X_3$$
$$E(Y) = 75.68431 - 16.12967(2.965) - 0.16480(210) - 5.44987(2.91) + 0.04069(2.965)(210) + 1.70650(2.965)(2.91) = 17.452$$
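The arithmetic above can be checked with a short Python function that encodes the fitted model 1 equation (the function name `predict_model1` is hypothetical):

```python
def predict_model1(weight, horsepower, axle_ratio):
    """Point estimate E(Y) in mpg from the fitted model 1 equation."""
    x1, x2, x3 = weight, horsepower, axle_ratio
    return (75.68431
            - 16.12967 * x1
            - 0.16480 * x2
            - 5.44987 * x3
            + 0.04069 * x1 * x2
            + 1.70650 * x1 * x3)


print(round(predict_model1(2.965, 210, 2.91), 3))
```

Rounded to three decimals, this reproduces the 17.452 mpg point estimate.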
The vehicle's estimated fuel efficiency is 17.452 miles per gallon (mpg). The 95% prediction interval for this car is (12.4462, 22.4577), so we can be 95% confident that an individual car with these attributes will have a fuel efficiency within this range. The 95% confidence interval for the mean fuel economy of vehicles with these attributes is (15.2024, 19.7016), so we can be 95% confident that the average fuel efficiency of all vehicles with these same attributes lies within this range.
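For reference, the standard least-squares forms of the two intervals at a new predictor vector $\mathbf{x}_0$ are shown below, where $s$ is the residual standard error, $\mathbf{X}$ is the design matrix, and $n - p$ is the error degrees of freedom ($p = 6$ coefficients here, so $n - p = 26$). These are the textbook formulas, stated as background rather than taken from the report's output:
$$\text{CI: } \hat{y}_0 \pm t_{\alpha/2,\,n-p}\, s\sqrt{\mathbf{x}_0^\top(\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{x}_0}$$
$$\text{PI: } \hat{y}_0 \pm t_{\alpha/2,\,n-p}\, s\sqrt{1 + \mathbf{x}_0^\top(\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{x}_0}$$
The extra 1 under the square root, which represents the variation of an individual vehicle around the mean, is why the prediction interval (12.4462, 22.4577) is wider than the confidence interval (15.2024, 19.7016).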
4. Model with Interaction Term and Qualitative Predictor
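Reporting Results
A second regression model was built with an interaction term and a qualitative predictor. Below is the general form of the regression model for fuel efficiency, with weight, horsepower, the interaction term for weight and horsepower, and the number of cylinders as predictors. Here $X_1$ is weight, $X_2$ is horsepower, and $X_3$ and $X_4$ are indicator (dummy) variables for 6-cylinder and 8-cylinder vehicles, respectively, with 4-cylinder vehicles as the baseline.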
$$E(Y) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + \beta_4 X_3 + \beta_5 X_4$$
Below is the fitted regression model for fuel efficiency from the imported dataset, with weight, horsepower, the weight × horsepower interaction, and the number of cylinders as predictors:
$$E(Y) = 47.337329 - 7.306337 X_1 - 0.103331 X_2 + 0.023951 X_1 X_2 - 1.25973 X_3 - 1.454339 X_4$$
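A minimal Python sketch of model 2 with indicator coding for the cylinder count. Treating $X_3$ as the 6-cylinder indicator and $X_4$ as the 8-cylinder indicator, with 4 cylinders as the baseline, is an inference: with that coding the rounded coefficients give about 17.628 mpg for the 2.965 / 210 / 6-cylinder vehicle, in line with the 17.6286 mpg reported for that vehicle, whereas swapping the indicators would give about 17.43. The helper names are hypothetical.

```python
def cylinder_indicators(cylinders):
    """Indicator (dummy) coding with 4 cylinders as the baseline:
    X3 = 1 for 6 cylinders, X4 = 1 for 8 cylinders."""
    if cylinders not in (4, 6, 8):
        raise ValueError("expected 4, 6, or 8 cylinders")
    return (1 if cylinders == 6 else 0, 1 if cylinders == 8 else 0)


def predict_model2(weight, horsepower, cylinders):
    """Point estimate E(Y) in mpg from the fitted model 2 equation."""
    x1, x2 = weight, horsepower
    x3, x4 = cylinder_indicators(cylinders)
    return (47.337329
            - 7.306337 * x1
            - 0.103331 * x2
            + 0.023951 * x1 * x2
            - 1.25973 * x3
            - 1.454339 * x4)


print(round(predict_model2(2.965, 210, 6), 2))
```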
The $R^2$ is 0.888 and the adjusted $R^2$ is 0.8664. Using weight, horsepower, number of cylinders, and the weight × horsepower interaction as predictors, the model accounts for almost 89% of the variation in vehicle fuel efficiency, as shown by the coefficient of determination, $R^2 = 0.888$.
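The adjusted $R^2$ values for both models can be checked from $R^2$, n = 32 vehicles, and k = 5 predictor terms using $R^2_{adj} = 1 - (1 - R^2)(n - 1)/(n - k - 1)$. A minimal sketch:

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1),
    with n observations and k predictor terms (excluding the intercept)."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)


# Both models: n = 32 vehicles, k = 5 predictor terms
print(round(adjusted_r_squared(0.8907, 32, 5), 4))  # model 1
print(round(adjusted_r_squared(0.888, 32, 5), 4))   # model 2
```

Model 1 gives 0.8697, matching the report; model 2 gives 0.8665 rather than 0.8664, a last-digit difference consistent with $R^2$ having been rounded to 0.888.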
Using model 2, I obtained the fitted values and residuals, then plotted the residuals against the fitted values and created a normal Q-Q plot. The residuals-versus-fitted-values plot shows no discernible pattern apart from one possible outlier, so the data meet the condition of homoscedasticity. The points in the Q-Q plot generally follow the reference line, indicating that the residuals are approximately normally distributed.
Evaluating Model Significance
An overall F-test is performed to determine whether the regression model is significant at the 5% level of significance. The null and alternative hypotheses are stated as follows:
$$H_0: \beta_1 = \beta_2 = \cdots = \beta_n = 0$$
$$H_a: \text{at least one } \beta_i \neq 0 \text{ for } i = 1, 2, \ldots, n$$
The p-value is 1.503e-11, which is far smaller than the 5% level of significance. We therefore reject the null hypothesis in favor of the alternative hypothesis: fuel economy is statistically associated with at least one of the predictor variables. Individual t-tests (beta tests) are then run on each coefficient to establish whether each term in the model is significant at the 5% level of significance. The null and alternative hypotheses are:
$$H_0: \beta_i = 0 \qquad H_a: \beta_i \neq 0 \qquad \text{for each } i = 1, 2, \ldots, n$$
The p-value is 0.02624 for weight, 0.25886 for rear axle ratio, 0.00146 for horsepower, 0.24447 for the weight × axle ratio interaction, and 0.00595 for the weight × horsepower interaction. The p-values for weight, horsepower, and the weight × horsepower interaction are all below the 5% level of significance, so we reject the null hypothesis for these terms and conclude that each has a statistically significant association with a vehicle's fuel efficiency. However, we fail to reject the null hypothesis for the 6-cylinder and 8-cylinder indicators, because their p-values are above the 5% level of significance. We therefore conclude that neither cylinder indicator shows a statistically significant association with fuel economy.
Making Predictions Using the Model
Using model 2, an estimate can be made of the fuel efficiency of a vehicle with a weight of 2.965, a horsepower of 210, and 6 cylinders. Per model 2, a car with these attributes should achieve 17.6286 miles per gallon (mpg). The 95% prediction interval for this individual vehicle's fuel efficiency is (12.3883, 22.869), so we can be 95% confident that the fuel efficiency, in miles per gallon, of an individual car with a weight of 2.965, a horsepower of 210, and 6 cylinders lies within this range. The 95% confidence interval for the mean fuel economy is (15.2024, 19.7016), so we can be 95% confident that the average fuel efficiency of all cars with a weight of 2.965, a horsepower of 210, and 6 cylinders lies within this range. The prediction interval is wider than the confidence interval because it must account for both the uncertainty in estimating the population mean (the regression parameters) and the random variation of individual values around that mean.
5. Conclusion
The results indicate that horsepower and weight have a significant effect on a vehicle's fuel economy. Rear axle ratio, including its interaction with weight, does not have a statistically significant effect on fuel economy in the first model. Likewise, the data do not show a statistically significant link between the 6-cylinder or 8-cylinder indicators in the second model and fuel economy. Based on these results, I would use a model that includes only weight, horsepower, and an interaction term between weight and horsepower, since these are the only variables with a significant effect on fuel efficiency and they are significant in both models. Additional models that combine these variables with other variables and interaction terms from this dataset would need to be developed to determine whether other variables are significantly linked to the fuel economy of these vehicles.
The dataset and the analysis completed here provide practical information that can help auto manufacturers understand which two characteristics significantly impact the fuel efficiency of a vehicle. This will allow them to balance the weight and horsepower of their vehicles to improve fuel efficiency.