Case Study 2 – Modeling the Sale Prices of Residential Properties in Four Neighborhoods
Group 8
Nardeen Abdulkareem (Student #500867209)
Jermaine Garrett (Student #501101292)
Dimitri Konstantopoulos (Student #500916405)
Lena Tarzi (Student #501097903)
Chen Yang (Student #501122207)
This group project was prepared for Professor Fu’s CQMS 442-DJ0 Multiple Regression for Business class and was submitted on April 8th, 2023.
The Problem
The purpose of this case study is to examine the relationships between the mean sale price, E(y), of a property and the following three independent variables:
1. The appraised land value of the property
2. The appraised value of the improvements on the property
3. The neighbourhood in which the property is listed
The objectives: This case study focuses on the correlation between the appraised value of a property and its sale price. The sale price of a property is subject to multiple factors, such as the seller's asking price, the property's appeal to buyers, and the state of the real estate market within specific neighbourhoods.
1. Determine whether or not there is sufficient evidence to indicate that these variables contribute information for predicting the sale price from the data supplied.
2. Determine whether appraisers use the same appraisal criteria for various types of neighbourhoods.
The Data – TAMSALES-ALL.xlsx
The data utilized for this case study are 351 randomly selected observations from a more extensive data set provided by the textbook. The data were supplied by the property appraiser's office of Hillsborough County, Florida, and consist of the appraised land and improvement values and sale prices for residential properties sold in Tampa, Florida, from May 2008 to June 2009.
● Four neighbourhoods were selected, each relatively similar but varying sociologically and in property types and values:
○ Town & Country (base level), Cheval, Avila, and Northdale
The subset of sales and appraisal data pertinent to these four neighbourhoods was used to develop a prediction equation relating sale price to appraised land and improvement values. These values were recorded in thousands of dollars.
The Theoretical Model
This activity aims to build a model whose predicted value, ŷ, accurately reflects the mean sale price, E(y). The problem we face is understanding how closely our independent variables (the appraised land value, the appraised improvements value, and the neighbourhood location) reflect the actual selling price of the homes on the market.
Our objective is to study different models and assess which are the most useful in predicting the selling price of the 351 randomly selected homes sold in Tampa, Florida, from May 2008 to June 2009.
The combination of the land value and the improvements value makes up the independent variable of appraised value. Ideally, the appraised value would be equal to the mean sale price of the homes; a straight line with a slope of 1 would illustrate this. A linear model yields a satisfactory fit for this problem, since when we plot the scatter plot of sale price versus appraised value, the points appear to follow a straight-line model (see the figure in the section on the hypothesized regression models; a short plotting sketch also follows the list below). A strong linear relationship exists between the sale price and the appraised values in thousands of dollars. However, the variation around the trend line may be attributed to the following reasons:
● The age of the appraisal data being swayed by inflation
● Over- or under-appraised home values due to certain biases by the real estate agent or due to the neighbourhood of the home
● The model used by real estate appraisers to appraise the value of the homes
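A minimal plotting sketch for this scatter plot, assuming hypothetical column names Sale_Price, Land_Value, and Improvements_Value in TAMSALES-ALL.xlsx:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Column names are hypothetical -- adjust to match TAMSALES-ALL.xlsx.
df = pd.read_excel("TAMSALES-ALL.xlsx")
appraised = df["Land_Value"] + df["Improvements_Value"]   # total appraised value

plt.scatter(appraised, df["Sale_Price"], s=10)
plt.plot([0, appraised.max()], [0, appraised.max()])      # slope-1 reference line
plt.xlabel("Appraised value (land + improvements)")
plt.ylabel("Sale price")
plt.title("Sale price versus appraised value")
plt.show()
```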
The research we are conducting will help us answer the following questions:
1. Are the three independent variables (appraised land value, appraised value of improvements on the property, and the neighbourhood in which the home is listed) good predictors of the actual selling price of homes within Tampa, Florida?
2. Which model is the strongest predictor of sale price using our three independent variables?
3. Can this relationship be applied to appraising the value of homes in neighbourhoods outside of Tampa, Florida?
The Hypothesized Regression Models
Our objective is to relate sale price, y, to three independent variables:
● The Qualitative Factor → Neighbourhood (four levels)
● The Quantitative Factors → Appraised land value and appraised improvement value
We consider the following four models as candidates for this relationship:
Model 1 is a first-order model that traces a single plane for the mean sale price as a function of appraised land value (x1) and appraised improvement value (x2). This model assumes that the relationship between E(y) and the appraised land and improvement values is identical across all four neighbourhoods, thus allowing a first-order model to be appropriate for relating the expected mean sale price to x1 and x2.
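Based on this description, Model 1 has the first-order form:
E(y) = β0 + β1x1 + β2x2
where x1 is the appraised land value and x2 is the appraised improvements value.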
Model 2 is an additional first-order model. It assumes that the relationship between E(y) and x1 and x2 is still first-order, but that the planes' y-intercepts differ depending on the neighbourhood.
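Written out with the neighbourhood dummy variables x3 (Cheval), x4 (Avila), and x5 (Northdale) that appear in the fitted equation in the Conclusions section, Model 2 is:
E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5x5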
The fourth neighbourhood, Town & Country, is the base level; the model therefore predicts E(y) for Town & Country when x3 = x4 = x5 = 0. Model 2 allows the y-intercept of the plane to differ by neighbourhood. There is, however, no interaction term between the independent variables x1 and x2 and the neighbourhood terms, so Model 2 assumes that the change in the sale price y for every dollar increase in x1 or x2 does not depend on the neighbourhood. This model would be appropriate if an appraiser established values based on a relationship between the mean sale price and x1 and x2 that differed in at least two neighbourhoods but remained constant for different values of x1 and x2.
Model 3 is also a first-order model. It is similar to Model 2, except that we have now added interaction terms between the dummy variables corresponding to each neighbourhood and the quantitative variables x1 and x2. This model allows the change in y for increases in x1 or x2 to vary with a given neighbourhood. It takes the following form.
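Using the same dummy variables, and the six interaction coefficients β6 through β11 that are tested in Test #2 below, Model 3 can be written as:
E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5x5 + β6x1x3 + β7x1x4 + β8x1x5 + β9x2x3 + β10x2x4 + β11x2x5
(The assignment of β6 through β11 to the individual interaction terms is a labelling choice; only the set of six interaction terms matters for the tests below.)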
Model 4 adds an x1x2 interaction term and its corresponding interactions with the neighbourhood dummy variables. It builds upon Model 3 by adding these interaction terms so that the effects of the independent variables x1 and x2 on the sale price can depend on one another, and so that this joint effect can also vary across the neighbourhoods we are interested in.
● This model comprises 15 β coefficients:
○ Main-effect terms for the land value and the improvements value (β1, β2)
○ Two-way interaction between land value and improvements value (β3)
○ Main-effect terms for the neighbourhoods (β4, β5, β6)
○ Two-way interactions between land value and the neighbourhoods (β7, β8, β9)
○ Two-way interactions between improvements value and the neighbourhoods (β10, β11, β12)
○ Three-way interactions between land value, improvements value, and the neighbourhoods (β13, β14, β15)
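Based on the 15 coefficients listed above, Model 4 can be written as:
E(y) = β0 + β1x1 + β2x2 + β3x1x2 + β4x3 + β5x4 + β6x5 + β7x1x3 + β8x1x4 + β9x1x5 + β10x2x3 + β11x2x4 + β12x2x5 + β13x1x2x3 + β14x1x2x4 + β15x1x2x5
All four candidate models can be fit directly from the spreadsheet. A minimal sketch in Python with pandas and statsmodels, assuming hypothetical column names Sale_Price, Land_Value, Improvements_Value, and Neighborhood:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Column and neighbourhood names are hypothetical -- adjust to match TAMSALES-ALL.xlsx.
df = pd.read_excel("TAMSALES-ALL.xlsx")
hoods = ["Town & Country", "Cheval", "Avila", "Northdale"]
df = df[df["Neighborhood"].isin(hoods)].copy()
# Make Town & Country the base level so that its dummy variables are all zero.
df["Neighborhood"] = pd.Categorical(df["Neighborhood"], categories=hoods)

# Model 1: first-order in land and improvements values only
m1 = smf.ols("Sale_Price ~ Land_Value + Improvements_Value", data=df).fit()
# Model 2: adds neighbourhood dummy variables (different intercepts)
m2 = smf.ols("Sale_Price ~ Land_Value + Improvements_Value + C(Neighborhood)", data=df).fit()
# Model 3: adds land/improvements x neighbourhood interactions (different slopes)
m3 = smf.ols("Sale_Price ~ (Land_Value + Improvements_Value) * C(Neighborhood)", data=df).fit()
# Model 4: adds the land x improvements interaction and its neighbourhood interactions
m4 = smf.ols("Sale_Price ~ Land_Value * Improvements_Value * C(Neighborhood)", data=df).fit()

for name, m in [("Model 1", m1), ("Model 2", m2), ("Model 3", m3), ("Model 4", m4)]:
    print(name, f"adj R^2 = {m.rsquared_adj:.4f}", f"s = {m.mse_resid ** 0.5:,.1f}")
```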
Summary of outputs
Model 1 Summary Output for Regression Statistics
Multiple R:
The multiple correlation coefficient (R) indicates the strength of the linear relationship between the dependent variable and the independent variables in the model. In this case, the multiple correlation coefficient is 0.9691, which indicates a strong positive correlation.
R Square:
The coefficient of determination (R-squared) represents the proportion of the variance in the dependent variable that the independent variables in the model can explain. In this case, the R-squared value is 0.9392, which means that the independent variables in the model can explain 93.92% of the variance in the dependent variable.
Adjusted R Square:
This is a modified version of the R-squared value that considers the number of independent variables in the model. It is generally used when there are multiple independent variables in the model. In this case, the adjusted R-squared value is 0.9388.
Standard Error:
The standard error measures the variability of the dependent variable around the regression line. In this case, the standard error is 63,630.84.
ANOVA Table
df:
The degrees of freedom represent the number of independent pieces of information used to estimate the model parameters. In this case, there are 2 degrees of freedom for the regression and 347 degrees of freedom for the residual, which gives a total of 349 degrees of freedom.
SS:
The sum of squares is a measure of the dependent variable variation explained by the regression or residual. In this case, the sum of squares for the regression is 2.17E+13, the sum of squares for the residual is 1.40E+12, and the total sum of squares is 2.31E+13.
MS:
The mean square is calculated by dividing the sum of squares by the degrees of freedom. It represents the average amount of variation explained by the regression or residual. In this case, the mean square for the regression is 1.09E+13, and the mean square for the residual is 4,048,883,939.
F:
The F-statistic is a ratio of the regression's mean square to the residual's mean square. It tests the null hypothesis that all the regression coefficients are equal to zero, indicating that the independent variables do not affect the dependent variable. In this case, the F-statistic is 2680.0252, which suggests that the model has a significant overall fit.
Significance F:
The p-value associated with the F-statistic provides a measure of the statistical significance of the model. In this case, the p-value is 1.02E-211, which is very small and indicates that the model is highly significant.
Overall, the ANOVA table suggests that the regression model has a good overall fit and that the independent variables included in the model are highly significant in explaining the variation in the dependent variable. The regression statistics and the ANOVA results show values that would lead us to believe that Model 1 is statistically significant. In addition, both variables have p-values indicating statistical significance.
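As a quick check, the F-statistic and Significance F can be recomputed from the sums of squares and degrees of freedom above; a minimal sketch (the small difference from the reported F comes from the rounded sums of squares):

```python
from scipy import stats

# Model 1 ANOVA figures from the summary output above (rounded)
ss_reg, ss_res = 2.17e13, 1.40e12   # regression and residual sums of squares
df_reg, df_res = 2, 347             # regression and residual degrees of freedom

ms_reg = ss_reg / df_reg                      # mean square for the regression
ms_res = ss_res / df_res                      # mean square for the residual (MSE)
f_stat = ms_reg / ms_res                      # global F-statistic
p_value = stats.f.sf(f_stat, df_reg, df_res)  # Significance F (upper-tail p-value)

print(f"F = {f_stat:.1f}, Significance F = {p_value:.2e}")
```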
Model 2
Summary Output for Regression Statistics
Multiple R:
The multiple correlation coefficient (R) indicates the strength of the linear relationship between the dependent variable and the independent variables in the model. In this case, the multiple correlation coefficient is 0.9703, which indicates a strong positive correlation.
R Square:
The coefficient of determination (R-squared) represents the proportion of the variance in the dependent variable that the independent variables in the model can explain. In this case, the R-squared value is 0.9415, which means that the independent variables in the model can explain 94.15% of the variance in the dependent variable.
Adjusted R Square:
This is a modified version of the R-squared value that considers the number of independent variables in the model. It is generally used when there are multiple independent variables in the model. In this case, the adjusted R-squared value is 0.9407.
Standard Error:
The standard error measures the variability of the dependent variable around the regression line. In this case, the standard error is 62,662.43.
ANOVA Table
df:
The degrees of freedom represent the number of independent pieces of information used to estimate the model parameters. In this case, there are 5 degrees of freedom for the regression and 344 degrees of freedom for the residual, which gives a total of 349 degrees of freedom.
SS:
The sum of squares is a measure of the dependent variable variation explained by the regression or residual. In this case, the sum of squares for the regression is 2.18E+13, the sum of squares for the residual is 1.35E+12, and the total sum of squares is 2.31E+13.
MS:
The mean square is calculated by dividing the sum of squares by the degrees of freedom. It represents the average amount of variation explained by the regression or residual. In this
case, the mean square for the regression is 4.35E+12, and the mean square for the residual is 3926580397.
F:
The F-statistic is a ratio of the regression's mean square to the residual's mean square. It tests the null hypothesis that all the regression coefficients are equal to zero, indicating that the independent variables do not affect the dependent variable. In this case, the F-statistic is 1108.1623, which suggests that the model has a significant overall fit.
Significance F:
The p-value associated with the F-statistic provides a measure of the statistical significance of the model. In this case, the p-value is 1.23E-209, which is very small and indicates that the model is highly significant.
Overall, the ANOVA table suggests that the regression model has a good overall fit and that the independent variables included in the model are highly significant in explaining the variation in the dependent variable. The regression statistics and ANOVA table results lead us to believe that Model 2 is significant. In addition, a few, but not all, of the beta coefficients are statistically significant.
Model 3
Summary Output for Regression Statistics
Multiple R:
The multiple correlation coefficient (R) indicates the strength of the linear relationship between the dependent variable and the independent variables in the model. In this case, the multiple correlation coefficient is 0.9710, which indicates a strong positive correlation.
R Square:
The coefficient of determination (R-squared) represents the proportion of the variance in the dependent variable that the independent variables in the model can explain. In this case, the R-squared value is 0.9428, which means that the independent variables in the model can explain 94.28% of the variance in the dependent variable.
Adjusted R Square:
This is a modified version of the R-squared value that considers the number of independent variables in the model. It is generally used when there are multiple independent variables in the model. In this case, the adjusted R-squared value is 0.9409.
Standard Error:
The standard error measures the variability of the dependent variable around the regression line. In this case, the standard error is 62,552.35.
ANOVA Table
df:
The degrees of freedom represent the number of independent pieces of information used to estimate the model parameters. In this case, there are 11 degrees of freedom for the regression
and 338 degrees of freedom for the residual, which gives a total of 349 degrees of freedom.
SS:
The sum of squares is a measure of the dependent variable variation explained by the regression or residual. In this case, the sum of squares for the regression is 2.18E+13, the sum of squares for the residual is 1.32E+12, and the total sum of squares is 2.31E+13.
MS:
The mean square is calculated by dividing the sum of squares by the degrees of freedom. It represents the average amount of variation explained by the regression or residual. In this case, the mean square for the regression is 1.98E+12, and the mean square for the residual is 3.91E+09.
F:
The F-statistic is a ratio of the regression's mean square to the residual's mean square. It is used to test the null hypothesis that all the regression coefficients are equal to zero, indicating that the independent variables do not affect the dependent variable. In this case, the F-statistic is 506.1402, which indicates that the model has a significant overall fit.
Significance F:
The p-value associated with the F-statistic provides a measure of the statistical significance of the model. In this case, the p-value is 1.85E-202, which is very small and indicates that the model is highly significant.
Overall, the ANOVA table suggests that the regression model has a good overall fit and that the independent variables included in the model are highly significant in explaining the variation in the dependent variable.
The regression statistics and ANOVA results lead us to believe that this model is significant. However, only one beta coefficient seems to be significant as indicated by the green highlight.
Model 4
Summary Output for Regression Statistics
Multiple R:
The multiple correlation coefficient (R) indicates the strength of the linear relationship between the dependent variable and the independent variables in the model. In this case, the multiple correlation coefficient is 0.9739, which indicates a strong positive correlation.
R Square:
The coefficient of determination (R-squared) represents the proportion of the variance in the dependent variable that the independent variables in the model can explain. In this case, the R-squared value is 0.9486, which means that the independent variables in the model can explain 94.86% of the variance in the dependent variable.
Adjusted R Square:
This is a modified version of the R-squared value that considers the number of independent variables in the model. It is generally used when there are multiple independent variables in the model. In this case, the adjusted R-squared value is 0.9463.
Standard Error:
The standard error measures the variability of the dependent variable around the regression line. In this case, the standard error is 59650.47.
ANOVA Table
df:
The degrees of freedom represent the number of independent pieces of information used to estimate the model parameters. In this case, there are 15 degrees of freedom for the regression
and 334 degrees of freedom for the residual, which gives a total of 349 degrees of freedom.
SS:
The sum of squares is a measure of the dependent variable variation explained by the regression or residual. In this case, the sum of squares for the regression is 2.1919E+13, the sum of squares for the residual is 1.1884E+12, and the total sum of squares is 2.3107E+13.
MS:
The mean square is calculated by dividing the sum of squares by the degrees of freedom. It represents the average amount of variation explained by the regression or residual. In this case, the mean square for the regression is 1.4613E+12, and the mean square for the residual is 3558178663.
F:
The F-statistic is a ratio of the regression's mean square to the residual's mean square. It is used to test the null hypothesis that all the regression coefficients are equal to zero, indicating that the independent variables do not affect the dependent variable. In this case, the F-statistic is 410.6736, which suggests that the model has a significant overall fit.
Significance F:
The p-value associated with the F-statistic provides a measure of the statistical significance of the model. In this case, the p-value is 7.3256E-205, which is very small and indicates that the model is highly significant.
Overall, the ANOVA table suggests that the regression model has a good overall fit and that the independent variables included in the model are highly significant in explaining the variation in the dependent variable.
From the previous output, we know the overall model seems like a good fit for the data. However, most of the beta coefficients are not statistically significant.
● There are several reasons why this is the case:
○ The model might be overfitting
○ There might be multicollinearity between the independent variables (a quick check is sketched below)
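One common way to check the multicollinearity concern is to compute variance inflation factors (VIFs) for the fitted design matrix. A minimal sketch, reusing the hypothetical m4 fit from the earlier statsmodels sketch:

```python
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Design matrix of Model 4 (m4 was fit in the earlier sketch).
X = pd.DataFrame(m4.model.exog, columns=m4.model.exog_names)

# VIFs well above ~10 suggest strong multicollinearity, which is expected here
# because the interaction terms are all built from the same x1 and x2 columns.
for i, name in enumerate(X.columns):
    if name != "Intercept":
        print(name, round(variance_inflation_factor(X.values, i), 1))
```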
Model Comparisons
With our four models, we use nested-model F-tests to compare the models and find the most appropriate model for predicting sale price. We conduct these tests at α = 0.05. We can hypothesize that Models 3 and 4 may build redundancy and multicollinearity into their fits through interaction terms that are not necessary if we already have a strong and simple model. The following table shows that Model 4 has the lowest MSE and s and the highest adjusted R² (R²a).
Model    MSE              R²a      s
1        4,048,883,939    0.939    63,630.8
2        3,926,580,397    0.941    62,662.4
3        3,910,000,000    0.941    62,552.4
4        3,558,178,663    0.946    59,650.5
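These comparison figures can be reproduced from each model's residual sum of squares; a quick sketch using the rounded SSE values from the ANOVA tables above (small differences from the table come from that rounding), with 349 total degrees of freedom:

```python
import math

SST = 2.3107e13          # total sum of squares (from the Model 4 ANOVA table)
n_minus_1 = 349          # total degrees of freedom
models = {               # model number: (residual SSE, number of beta parameters k)
    1: (1.40e12, 2),
    2: (1.35e12, 5),
    3: (1.32e12, 11),
    4: (1.1884e12, 15),
}

for m, (sse, k) in models.items():
    df_res = n_minus_1 - k                                  # residual degrees of freedom
    mse = sse / df_res                                      # mean squared error
    s = math.sqrt(mse)                                      # standard error of the estimate
    r2_adj = 1 - (sse / df_res) / (SST / n_minus_1)         # adjusted R-squared
    print(f"Model {m}: MSE = {mse:,.0f}, adj R^2 = {r2_adj:.3f}, s = {s:,.1f}")
```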
Test #1: Model 1 versus Model 2 (F-Test for Comparing Nested Models)
● H0: β3 = β4 = β5 = 0
● Ha: At least one of the β coefficients being tested does not equal 0
○ We wish to test the null hypothesis that the type of neighbourhood does not contribute statistically significant information to the sale price
● F = [(SSE_R − SSE_C) / (number of β parameters in H0)] / MSE_C
○ SSE_R = SSE of the reduced model
○ SSE_C = SSE of the complete model
○ MSE_C = MSE of the complete model
● F = [(1,404,960,000,000 − 1,350,740,000,000) / 3] / 3,926,580,397 = 89.49432614
● Rejection Region: (-∞, 0.0672] ∪ [7.7636, ∞)
● Conclusion: Reject H0; there is significant evidence to suggest that β3, β4, and β5 contribute to the prediction of y. To put this into context, the result implies that the appraiser is not assigning the same appraised values to properties across all neighbourhoods, which means there is variation in the first-order relationship between sale price (y) and the appraised values (x1 and x2).
Test #2: Model 2 versus Model 3
● H0: β6 = β7 = β8 = β9 = β10 = β11 = 0
● Ha: At least one of the β coefficients being tested does not equal 0
● F = [(SSE_R − SSE_C) / (number of β parameters in H0)] / MSE_C
○ SSE_R = SSE of the reduced model
○ SSE_C = SSE of the complete model
○ MSE_C = MSE of the complete model
● F = [(1,350,740,000,000 − 1,322,530,000,000) / 6] / 3,910,000,000 = 1.202472293
● Rejection Region: (-∞, 0.1523] ∪ [4.044, ∞)
● Conclusion: We fail to reject H0. There is insufficient evidence to suggest that the neighbourhood interaction terms of Model 3 contribute information for the prediction of y. In this context, this tells us we do not have enough evidence to suggest that the effect of the land value and improvements value on the sale price depends on the type of neighbourhood.
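The partial (nested-model) F-statistic can be computed directly from the quantities above; a minimal sketch, shown here with the Test #2 figures:

```python
from scipy import stats

def nested_f_test(sse_reduced, sse_complete, num_betas_tested, mse_complete, df_res_complete):
    """Partial (nested-model) F-test comparing a reduced model to a complete model."""
    f_stat = ((sse_reduced - sse_complete) / num_betas_tested) / mse_complete
    p_value = stats.f.sf(f_stat, num_betas_tested, df_res_complete)  # upper-tail p-value
    return f_stat, p_value

# Test #2: Model 2 (reduced) versus Model 3 (complete, 338 residual degrees of freedom)
f_stat, p_value = nested_f_test(1_350_740_000_000, 1_322_530_000_000, 6,
                                3_910_000_000, 338)
print(f"F = {f_stat:.4f}, p = {p_value:.3f}")  # F ≈ 1.2025, p > 0.05, so we fail to reject H0
```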
Test #3: Model 3 versus Model 4
● H0: β3 = β13 = β14 = β15 = 0
● Ha: At least one of the β parameters being tested does not equal 0
● F = [(SSE_R − SSE_C) / (number of β parameters in H0)] / MSE_C
○ SSE_R = SSE of the reduced model
○ SSE_C = SSE of the complete model
○ MSE_C = MSE of the complete model
● F = [(1,322,530,000,000 − 1,188,430,000,000) / 4] / 3,558,178,663 = 9.421955212
● Rejection Region: (-∞, 0.3003] ∪ [3.0078, ∞)
● Conclusion: Reject H0; there is significant evidence to suggest that the x1, x2 interaction terms in Model 4 contribute significantly to the prediction of y. To put this into context, we have enough evidence to reject the null hypothesis and accept the alternative hypothesis, concluding that the variability of the sale price is not the same for all properties across the neighbourhoods.
Conclusions
● The results of our model comparisons lead us to believe that Model 2 is the best of the four models for modelling the sale price of a home on the housing market. The results of the global F-test show a high level of significance of the independent variables to the dependent variable. In addition, the adjusted R² (R²a) was also relatively high: it indicated that ~94.1% of the variability in sale price was explained by Model 2's quantitative variables, land value and improvements value, and its qualitative variable, neighbourhood.
Ŷ = 33366.82971 + 1.810093704x1 + 0.81177063x2 + 38726.00396x3 + 40761.17325x4 + 16216.33924x5
● Ŷ is the predicted value of the dependent variable
● Quantitative Independent Variables:
○ x1 is the value of the independent variable "Land_Value"
○ x2 is the value of the independent variable "Improvements_Value"
● Qualitative Independent Variables:
○ x3 is the value of the independent variable neighbourhood "CHEVAL": x3 = 1 if "CHEVAL", x3 = 0 if not
○ x4 is the value of the independent variable neighbourhood "AVILA": x4 = 1 if "AVILA", x4 = 0 if not
○ x5 is the value of the independent variable neighbourhood "NORTHDALE": x5 = 1 if "NORTHDALE", x5 = 0 if not
● The predicted sale value for a property in the Town & Country (base) neighbourhood, with a land value of $55,160 and an improvements value of $148,453, is ~$253,721. We are 95% confident that the sale price of such a property falls within the range of $130,024 to $377,418.
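A minimal sketch of producing this point prediction from the fitted Model 2 coefficients (the 95% interval itself comes from the regression output):

```python
# Fitted Model 2 coefficients from the equation above
b0, b1, b2 = 33366.82971, 1.810093704, 0.81177063   # intercept, land value, improvements value
b3, b4, b5 = 38726.00396, 40761.17325, 16216.33924  # Cheval, Avila, Northdale dummies

def predict_sale_price(land_value, improvements_value, cheval=0, avila=0, northdale=0):
    """Point prediction from Model 2; the base neighbourhood has all dummies equal to zero."""
    return (b0 + b1 * land_value + b2 * improvements_value
            + b3 * cheval + b4 * avila + b5 * northdale)

print(round(predict_sale_price(55_160, 148_453)))    # ~253,721 for the example above
```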