MAT 240 Project One

School: Southern New Hampshire University
Course: 240
Date: Feb 20, 2024

Median Housing Price Prediction Model for D. M. Pan National Real Estate Company

Report: Housing Price Prediction Model for D. M. Pan National Real Estate Company
Melanie Maxwell
MAT 240 – Applied Statistics
Dr. Reel
Project One
Southern New Hampshire University
Introduction

In this report we use graphs, scatterplots, tables, and linear regression to analyze data on the listing prices of homes in 2019. We pose the question: does the square footage of a home indicate what its listing price should be? To begin my research, I will create a regression model that uses square footage as the predictor of the median housing price. Linear regression is defined as “a way to model the linear relationship between two quantitative variables using a line drawn through those variables’ data points known as a regression line” (Zybooks, 2020). Linear regression is most appropriate when you are examining a linear relationship between two quantitative variables. In this instance, our variables are the listing price and the square footage. I would expect the scatterplot to show an upward trend, with a best-fit line that rises as square footage increases.

The distinction between the x and y variables is straightforward. According to Zybooks, “In a linear regression involving two variables, the response variable is the variable being modeled or predicted, while the predictor variable is the variable used to predict the response” (Zybooks, 2020). In this instance, the median listing price is the response variable and the median square footage is the predictor variable. In other words, the model treats listing price as depending on square footage.

Data Collection

My sample data was obtained by selecting 50 counties at random. To do this, I first inserted a new column titled Random into the Excel spreadsheet and entered the formula =RAND(), which generates a random number between 0 and 1. I then copied this formula down the entire column. Once all the random numbers populated, I removed the unneeded rows, because I only needed 50 counties for this data collection. In the resulting data set, the response variable Y is the median listing price and the predictor variable X is the median square footage.

Figure 1: Scatterplot of median listing price (Y) against median square footage (X)

The scatterplot shows an overall linear pattern, so it is appropriate for linear regression.

Data Analysis

Figure 2: Histogram
Figure 3: Histogram

Summary statistics:

Statistic    Listing Price    Square Feet
Mean         $319,111         1,946
Median       $307,900         1,822
Std Dev      $105,947.12      729.29
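The summary statistics in the table can be reproduced from a single column of values. The sketch below is a minimal Python version using the standard `statistics` module. It assumes that "Std Dev" is the sample standard deviation (the n - 1 version that Excel's STDEV.S computes); the function name `summarize` and the example prices are hypothetical and are not the report's 50-county data.

```python
import statistics


def summarize(values):
    """Return the mean, median, and sample standard deviation of a list of numbers.

    Assumption: the report's "Std Dev" is the sample standard deviation
    (n - 1 in the denominator), which is what statistics.stdev computes.
    """
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "std_dev": statistics.stdev(values),
    }


# Hypothetical listing prices for illustration only (not the report's sample).
example_prices = [250_000, 300_000, 310_000, 400_000]
stats = summarize(example_prices)
print(f"Mean: ${stats['mean']:,.0f}")
print(f"Median: ${stats['median']:,.0f}")
print(f"Std Dev: ${stats['std_dev']:,.2f}")
```

Calling `summarize` once on the listing-price column and once on the square-footage column would give the two columns of the table.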
The histograms above for square footage and listing price both show right-skewed (positively skewed) distributions for our variables, X and Y. The table gives the mean, median, and standard deviation of the real estate data. The histograms show no outliers, although gaps are visible. Comparing our sample data with the National Summary Statistics and Graphs Real Estate Data reveals several similarities. Our median listing price of $307,900 and median square footage of 1,822 are close to the national medians of $318,000 and 1,881 square feet. Our mean listing price of $319,111 and mean square footage of 1,946 are slightly lower than the national means of $342,365 and 2,111 square feet. Our standard deviations are also lower: about $105,947 for listing price and about 730 square feet for square footage, compared with national standard deviations of $125,914 and 921 square feet.

Develop Regression Model

Figure 4: Scatterplot
Scatterplot of Y vs. X with trendline f(x) = 110.6x + 103852.58 and R² = 0.58 (x-axis: Median Square Feet; y-axis: Median Listing Price)

To develop a meaningful linear regression, you need a scatterplot that is indicative of a linear relationship between the two variables. To determine whether the relationship between our variables is positive, we found the slope of the regression line. Our predictor and response variables showed a positive linear relationship, so we developed the regression model shown above. There were three outliers on the scatterplot. These outliers had unusually high listing prices for their very small square footage. Although they were consistent with the upward direction of the trend, they weakened the fit of the model, so we removed the three outliers from this graph to let the regression line better reflect the linear pattern in the remaining data. Further investigation showed that the outliers were in the Pacific and Northeast Regions, which are known for higher listing prices because of their location.

The correlation coefficient (r) for this scatterplot is 0.7613, and the coefficient of determination (R²) is 0.5796, which is r squared (0.7613² ≈ 0.5796). This indicates that about 58% of the variance in median listing price can be explained by its linear relationship with median square footage, which is more than half. Square footage is therefore a strong contributor when pricing is considered, but it shouldn’t be the only factor, as location plays a major role, as well as the condition of the home.

Determine the Line of Best Fit

For this random sample, the regression equation is y = 110.6x + 103,853. The slope, b1, is 110.6 and the intercept, b0, is 103,853. The slope means that for every additional square foot, the predicted listing price increases by about $110.60 (roughly $111). In this graph, Y is my dependent variable and X is my independent variable. To provide further explanation, I’ll present an example using the regression equation. The intercept of $103,853 is the predicted listing price at zero square feet; it anchors the line but is not a realistic home price on its own. For a home with 1,500 square feet, the equation predicts y = 110.6(1500) + 103,853 = $269,753.

Conclusions

In conclusion, the data in this report show that square footage is a useful indicator of listing price. This conclusion is supported by the regression model and equation, which show a strong positive linear relationship: listing price increases as square footage increases. Location also plays a significant role, and in those specific areas we can expect a higher listing price for homes of about the same size. If we continue building properties in outlying areas, we may decrease our value. It would be in the best interest of our company to continue selling properties priced along our trendline.
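The line-of-best-fit calculations in this report (slope, intercept, correlation, R², and the 1,500-square-foot prediction) can be sketched in Python with the ordinary least-squares formulas. This is a minimal sketch under stated assumptions: the function names `fit_line`, `correlation`, and `predict` are hypothetical, and because the 50-county data are not reproduced here, the example plugs in only the coefficients and r reported above.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = b1 * x + b0; returns (b1, b0)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: sum of cross-deviations divided by sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    # The least-squares line passes through (mean_x, mean_y).
    b0 = mean_y - b1 * mean_x
    return b1, b0


def correlation(xs, ys):
    """Pearson correlation coefficient r between xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def predict(square_feet, b1, b0):
    """Predicted median listing price for a given square footage."""
    return b1 * square_feet + b0


# Coefficients reported for the sample after removing the three outliers.
B1, B0 = 110.6, 103_853
print(f"Predicted price for 1,500 sq ft: ${predict(1500, B1, B0):,.0f}")  # prints $269,753

# In simple linear regression, R² is the square of r.
r = 0.7613
print(f"R² = {r ** 2:.4f}")  # prints R² = 0.5796
```

Running `fit_line` and `correlation` on the sampled square-footage and listing-price columns would be one way to double-check the Excel trendline; a gap such as 103,852.58 versus 103,853 is only rounding.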