MAT 240 Project One Template
School: Southern New Hampshire University
Course: 240
Subject: Statistics
Date: Jan 9, 2024
Type: docx
Pages: 8
Uploaded by HighnessWolf3860
Median Housing Price Prediction Model for D. M. Pan National Real Estate Company
Report: Housing Price Prediction Model for D. M. Pan National Real Estate Company
Madison Jones
Southern New Hampshire University
Introduction
This report analyzes the relationship between the square footage of a home and its listing
price, using data reported in 2019. Real estate agents currently use square footage as a gauge
when setting the listing price of a home. The report aims to determine whether the correlation
between square feet and listing price is strong enough to predict the listing price of a home
given a specific square footage.
Linear regression is appropriate when the predictor variable and the response variable have a
linear relationship, that is, when a change in one is accompanied by a proportional change in
the other. In a scatterplot, such a relationship appears as points clustered around a straight
line; for a positive relationship, the response variable increases as the predictor variable
increases. The predictor variable (x) is the variable used to predict another variable. The
response variable (y) is the variable that depends on changes in the predictor. In this report,
square feet is the predictor variable (x) and listing price is the response variable (y),
because it can reasonably be assumed that as the square footage of a home changes so will its
listing price, which makes listing price dependent on square feet.
Data Collection
The scatterplot below shows the sample of 50 houses chosen from the population of 1,000. To
obtain a random sample, the RAND function in Microsoft Excel was used. As seen in the Excel
spreadsheet, the column labeled “random” contains numbers generated by the RAND function, one
for each of the 1,000 homes in the population. The entire table was then sorted by that column
using the sort function in the Data tab of Microsoft Excel, and the top 50 rows were taken as
the random sample. Because the sort order depended only on the randomly generated numbers,
every home had the same chance of being selected.
Predictor Variable (x) = Square feet
Response Variable (y) = Listing price
[Figure: scatterplot “Square Feet and Listing Price of a Home”; x-axis: Square Feet (0 to 7,000); y-axis: Listing Price ($0 to $900,000)]
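The sampling procedure described in this section can be sketched in Python. This is a minimal sketch and not the Excel workbook itself: the function name `draw_random_sample`, the seed value, and the list of row numbers standing in for the 1,000 listings are all assumptions made for illustration.

```python
import random

def draw_random_sample(population, sample_size, seed=None):
    """Tag each row with a random number, sort by it, and keep the top rows.

    This mirrors the spreadsheet workflow: a RAND-style column, a sort on
    that column, and a selection of the first `sample_size` rows.
    """
    rng = random.Random(seed)
    # Pair every row with a uniform random key (the "random" column).
    keyed_rows = [(rng.random(), row) for row in population]
    # Sort on the random key only, so the original order has no influence.
    keyed_rows.sort(key=lambda pair: pair[0])
    return [row for _, row in keyed_rows[:sample_size]]

# Hypothetical stand-in for the population: row numbers 1 through 1,000.
population = list(range(1, 1001))
sample = draw_random_sample(population, 50, seed=240)
print(len(sample))  # 50 rows, each drawn without replacement
```

Sorting on a random key selects without replacement, so no home can appear in the sample twice.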
Data Analysis
Square Footage Interpretation
As depicted in the graph above, the mean square footage is 2,225; that is, the average home in
the sample of 50 has 2,225 square feet. The median is also represented in the histogram above.
The histogram shows that the majority of the sample falls within the first two bins, ranging
from 1,001 – 2,501 square feet. Overall, 41 of the 50 data points lie within that range, which
is consistent with a central value of about 2,225. The sample standard deviation is 1,097.804
square feet. This is fairly high, roughly half the mean, indicating that some data points are
spread farther from the mean. The histogram shows nine data points lying away from the main
cluster, which widens the spread and raises the standard deviation. However, the majority of
the data (41 points) lie within the main cluster, suggesting the sample is generally reliable.
Square footage ranges from 1,001 – 6,001, and there are no gaps in the data set.
Listing Price Interpretation
The graph above shows that the mean listing price of the sample is $351,416; this is the
average listing price for the sample data. The histogram represents this information visually.
Over 75% of the data lies within the first three bins, ranging from $171,600 – $611,600, which
is consistent with a mean listing price of $351,416. The standard deviation of listing price is
$130,100.91. Relative to the mean listing price, this standard deviation is comparatively
small, indicating that the majority of data points are close together. However, among the 50
data points there appear to be two outliers in listing price. Both fall in the last bin and are
separated from the rest of the data by at least $110,000. Although the gap is substantial, it
has little effect on the validity of the sample: across all listing prices, the majority of the
data falls within the range of $171,600 – $611,600.
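One way to put the two standard deviations on a common footing is to divide each by its mean. The sketch below does this with the figures reported above. The coefficient of variation is not a statistic used in the report itself; it is introduced here only as an illustration, and the function name is hypothetical.

```python
# Summary statistics as reported in the text for the sample of 50 homes.
mean_sqft, sd_sqft = 2225, 1097.804
mean_price, sd_price = 351416, 130100.91

def coefficient_of_variation(sd, mean):
    """Return the standard deviation as a fraction of the mean."""
    return sd / mean

cv_sqft = coefficient_of_variation(sd_sqft, mean_sqft)
cv_price = coefficient_of_variation(sd_price, mean_price)

print(f"Square feet:   {cv_sqft:.2f}")   # about 0.49
print(f"Listing price: {cv_price:.2f}")  # about 0.37
```

On this measure the spread of listing price is smaller, relative to its mean, than the spread of square footage.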
Sample vs National Statistics
As depicted in the graphs above, the sample data is comparable to the national population
statistics. The mean listing price is $351,416 for the sample and $342,365 nationally.
Similarly, the mean square footage is 2,225 for the sample and 2,111 nationally. Across the
measures compared, the datasets are very similar. Comparing the histograms for the two
variables, the sample data and the national data appear to have a similar shape. The only major
contrast between the national data and the sample data is the gap in listing price: the sample
histogram has a gap between $611,600 – $721,600, whereas the national data has none. The
similarity between the datasets indicates that the randomly generated sample of 50 is a good
representation of the national population averages.
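The size of the difference between the sample means and the national means can be made concrete with a short calculation. The sketch below uses only the four means quoted above; the helper name `percent_difference` is an assumption for illustration.

```python
def percent_difference(sample_value, national_value):
    """Return how far the sample value sits from the national value, in percent."""
    return (sample_value - national_value) / national_value * 100

price_gap = percent_difference(351416, 342365)  # mean listing price
sqft_gap = percent_difference(2225, 2111)       # mean square footage

print(f"Listing price: {price_gap:.1f}% above the national mean")  # about 2.6%
print(f"Square feet:   {sqft_gap:.1f}% above the national mean")   # about 5.4%
```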
Develop Regression Model
[Figure: scatterplot “Square Feet and Listing Price of a Home” with fitted regression line f(x) = 102.76x + 122742.5, R² = 0.75; x-axis: Square Feet (0 to 7,000); y-axis: Listing Price ($0 to $900,000)]
Based on the scatterplot, a regression model is appropriate. As seen in the scatterplot above,
the correlation between the two variables is positive: as the x values (square feet) increase,
the y values (listing price) increase. The majority of the data points are plotted close to the
regression line, indicating a strong correlation. The only data point considered an outlier is
(5,108, $822,200). This point lies above the regression line, indicating that the listing price
is above what the model predicts for a home of that square footage. If too many outliers appear
in a dataset, the correlation between the two variables could be greatly skewed. In this
dataset, however, only one major outlier is present, and every other data point lies near the
regression line. For this reason, removing the outlier would not be significantly beneficial,
because the correlation is already very strong. Based on the dataset, the correlation
coefficient (r) is calculated to be 0.867137101. A correlation coefficient of 0.8 or above
indicates a strong correlation between the two variables. Thus, the calculated correlation
coefficient supports the observation made above: even with the outlier included, the dataset
supports a strong linear relationship between square feet and listing price.
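For reference, a correlation coefficient of this kind can be computed directly from paired values. The following is a minimal sketch of the Pearson formula. The five data points are made up for illustration and are not taken from the report's sample, so the resulting value will not match 0.867137101. The function name `pearson_r` is also an assumption.

```python
import math

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient for paired values."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares of deviations from the means.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)

# Hypothetical values, for illustration only.
square_feet = [1200, 1500, 1800, 2400, 3100]
listing_price = [250000, 270000, 320000, 365000, 450000]

r = pearson_r(square_feet, listing_price)
print(f"r = {r:.3f}")
```

Values of r close to 1 indicate a strong positive linear relationship, and values close to -1 a strong negative one.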
Determine the Line of Best Fit
Regression equation:
y = 102.76x + 122742; the slope is 102.76 and the intercept is 122,742
In the regression equation above, the slope is 102.76, indicating that for every one-square-foot
increase in square footage the listing price is predicted to increase by $102.76. The slope can
be interpreted as the price of one square foot. The intercept is 122,742, meaning that when
square feet (x) is zero the predicted listing price is $122,742. The intercept can be
interpreted as the cost of the land alone, although zero square feet lies outside the sampled
range of 1,001 – 6,001.
R-squared is a statistic that indicates how much of the variation in y can be explained by the
variation in x. For this scatterplot, R-squared represents how much of the variation in listing
price is explained by the variation in square feet. Squaring the correlation coefficient gives
an R-squared of 0.7519, indicating that 75.19% of the variation in listing price can be
explained by the variation in square feet.
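The step from the correlation coefficient to R-squared can be checked in a couple of lines. In a simple linear regression with a single predictor, R-squared equals the square of r. The sketch below uses the r value reported earlier in the report.

```python
# Correlation coefficient as reported in the text.
r = 0.867137101

# With one predictor, R-squared is the square of the correlation coefficient.
r_squared = r ** 2

print(round(r_squared, 4))  # 0.7519
```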
Assuming the square footage of a home is 1,500, the regression equation can be used to predict
the potential listing price. Substituting 1,500 for x gives y = 102.76*1500 + 122742 = 276,882.
The predicted listing price of a 1,500-square-foot home is therefore $276,882.
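The same substitution can be wrapped in a small function so that other square footages can be tried. This is a sketch using the slope and intercept stated in the regression equation. The function and constant names are assumptions for illustration.

```python
SLOPE = 102.76      # dollars per additional square foot, from the regression equation
INTERCEPT = 122742  # predicted listing price at zero square feet

def predict_listing_price(square_feet):
    """Apply y = 102.76x + 122742 and round to the nearest cent."""
    return round(SLOPE * square_feet + INTERCEPT, 2)

print(predict_listing_price(1500))  # approximately 276882
```

Predictions are most defensible for square footages inside the sampled range of 1,001 – 6,001; values outside that range are extrapolations.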
Conclusions
In this analysis, 50 random data points were taken from a population of 1,000. The intention
was to use the sample to determine the correlation between square feet and listing price and
whether square feet could be used to accurately judge the potential listing price of a home.
Based on the scatterplot created from the sample data, the relationship between square feet and
listing price was determined to be linear and positive: as the square footage of a home
increased, so did its listing price. The correlation coefficient further showed that the
relationship between the two variables was strong. The conclusion drawn from this data is that
square feet is a good benchmark for the potential listing price of a home. Based on previous
analysis, these results were expected and match how the data was anticipated to behave given
the variables. The findings of this analysis could change significantly if the data were drawn
only from large cities in each state. For example, if data were taken only from Miami, Los
Angeles, Las Vegas, D.C., New York, and Houston, the analysis would be biased toward higher
listing prices because cities are generally more expensive than rural areas. Finally, in
follow-up research it would be interesting to examine how the number of bedrooms influences
listing price and which variable is the stronger predictor, square footage or number of
bedrooms.