MAT 240 Project One

School

Southern New Hampshire University

Course

240

Subject

Economics

Date

Feb 20, 2024

Type

docx

Pages

8

Uploaded by EarlField10342

Median Housing Price Prediction Model for D. M. Pan National Real Estate Company

Report: Housing Price Prediction Model for D. M. Pan National Real Estate Company
Joshua D. King
Southern New Hampshire University
Introduction

This report evaluates whether a linear regression model based on a home’s square footage is a good predictor of the home’s listing price. A regression equation is most appropriate when there is a linear relationship between the explanatory variable and the response variable. For a linear regression to fit well, the scatterplot should show data points that form a linear pattern, grouped tightly along a line of best fit. The predictor variable is plotted on the x axis and is the independent variable: a variable that can either be controlled or cause change in another variable. The response variable is plotted on the y axis and is the dependent variable: its value depends on the independent variable.

Data Collection

A random sample of 50 houses was drawn from the provided list of real estate data. To draw the sample, I first added a column with the heading random and used the Excel function =RAND() to assign a random value to the first home in the list. I then dragged the fill handle in the lower right corner of that cell to the bottom of the column, so that Excel assigned a random number to each house in the list. Next, I selected all the data in the sheet, opened the Data tab, and used the Sort function on the random column to put the homes in random order. Finally, I kept rows 2 through 51 and removed all other rows, leaving 50 randomly selected homes to use for the regression model.
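The Excel steps above (random key, sort, keep the first 50 rows) can be sketched in Python. This is only an illustration under stated assumptions: the `listings` data, the `sample_listings` helper, and the seed value are hypothetical and are not part of the original workbook.

```python
import random

def sample_listings(listings, k=50, seed=None):
    """Return k listings chosen at random without replacement."""
    rng = random.Random(seed)
    # Attach a random key to every row (like the =RAND() column),
    # sort by that key, then keep the first k rows (like rows 2 to 51).
    keyed = [(rng.random(), row) for row in listings]
    keyed.sort(key=lambda pair: pair[0])
    return [row for _, row in keyed[:k]]

# Hypothetical stand-in for the provided real estate list.
listings = [{"id": i} for i in range(1, 1001)]
sample = sample_listings(listings, k=50, seed=240)
```

Sorting on a random key gives every home the same chance of landing in the first 50 rows, which is what makes the result a simple random sample.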
The predictor variable in this situation is the median square footage, because a change in square footage should be relevant to a change in a home’s listing price. The response variable is the median home listing price, because its value should depend on the square footage.

[Figure: scatterplot, Median Listing Price vs Median Square Footage. x axis: Median Square Footage (ft2), 0 to 6,000. y axis: Median Listing Price ($), 0 to 900,000.]

Data Analysis
The scatterplot shows that as a home’s square footage increases, so does its listing price. The data points tend to follow a straight line closely. Most points fall below 3,000 square feet and below a listing price of $500,000. Homes between 4,500 and 6,500 square feet have listing prices between $550,000 and $900,000. These values lie above the rest of the data but still show a positive correlation. The data contains a few outliers at around 3,500 square feet, with listing prices between $450,000 and $800,000. Overall, the data shows that square footage is a good predictor of a home’s listing price.

            Listing Price ($)    Square Feet
Mean        382,948.00           2,418
Median      316,900.00           1,854
Std Dev     176,178.55           1,299.26

The sample has a mean listing price of $382,948.00 and a mean square footage of 2,418. The national mean listing price is $342,365, with a mean square footage of 2,111. The sample means are therefore higher than the national means, but both price and square footage are higher together, so the sample follows a similar pattern of price change versus square footage change.

The median listing price of the sample is $316,900, with a median square footage of 1,854. The national median listing price is $318,000, with a median square footage of 1,881. The sample medians for listing price and square footage closely match the national medians.

The standard deviation of the sample’s listing price is $176,178.55, and the standard deviation of its square footage is 1,299. The national standard deviation of listing price is $125,914, and the national standard deviation of square footage is 921. The sample is therefore more spread out than the national data on both variables.
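The comparison between the sample and the national figures in this section can be checked with a short calculation. The sketch below uses only the values quoted in the report; the `percent_difference` helper and the dictionary keys are illustrative names, not part of the original analysis.

```python
# Values copied from the report (sample statistics and quoted national statistics).
sample = {
    "mean_price": 382_948.00, "median_price": 316_900.00, "sd_price": 176_178.55,
    "mean_sqft": 2_418, "median_sqft": 1_854, "sd_sqft": 1_299.26,
}
national = {
    "mean_price": 342_365, "median_price": 318_000, "sd_price": 125_914,
    "mean_sqft": 2_111, "median_sqft": 1_881, "sd_sqft": 921,
}

def percent_difference(sample_value, national_value):
    """Signed difference of the sample from the national value, in percent."""
    return 100.0 * (sample_value - national_value) / national_value

differences = {
    key: round(percent_difference(sample[key], national[key]), 1)
    for key in sample
}

for key, value in differences.items():
    print(f"{key}: {value:+.1f}%")
```

Expressing each gap as a percentage makes it easier to see that the medians are nearly identical while the means and standard deviations of the sample sit well above the national values.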
Develop Regression Model

[Figure: scatterplot with linear trendline, Median Listing Price vs Median Square Footage. x axis: Median Square Footage (ft2), 0 to 6,000. y axis: Median Listing Price ($), 0 to 900,000. Trendline: f(x) = 120.39x + 91,895.72.]

A scatterplot was created to determine whether a regression model would be appropriate for assessing how strongly a home’s square footage influences its listing price. The scatterplot shows that a linear trendline fits the data well. There is one outlier, a home with 3,504 square feet and a listing price of $764,900, which is well above the other data points near that square footage. There are also some influential points with square footage between 4,500 and 6,000 and listing prices between $550,000 and $900,000.
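A linear trendline like the one on the scatterplot is an ordinary least-squares fit. The sketch below shows how the slope, intercept, and correlation coefficient can be computed by hand. The six data points are hypothetical illustrations, not the report’s 50-home sample, and `fit_line` is an illustrative helper name.

```python
from math import sqrt

def fit_line(xs, ys):
    """Least-squares slope, intercept, and correlation coefficient r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross products.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r

# Hypothetical points for illustration only (NOT the report's data).
square_feet = [1200, 1500, 1850, 2400, 3100, 4800]
listing_price = [230_000, 275_000, 310_000, 385_000, 470_000, 660_000]

slope, intercept, r = fit_line(square_feet, listing_price)
print(f"price = {slope:.2f} * sqft + {intercept:.2f}, r = {r:.3f}")
```

Running the same function on the actual 50-home sample would be expected to give values close to the trendline shown on the chart, assuming the chart was fitted by least squares as Excel trendlines are.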
No outliers will be removed in this situation, because more data points help to strengthen the equation and the outliers are important to the model. The one outlier at 3,504 square feet has an above-average listing price, but removing it does not change the strength of the equation, so it is not influential. The data points between 4,500 and 6,000 square feet are influential: they sit beyond a large gap in square footage and significantly affect the regression equation. There are enough of these points, and they lie close enough to the trendline, to show a strong correlation between increased square footage and increased listing price.

The correlation coefficient for the sample is r = 0.887812233. The r value supports the pattern seen in the scatterplot: it is positive, which indicates that listing price increases as square footage increases. It also indicates a strong correlation, because on the scale from -1 to 1 a value of 0.89 is close to 1.

Determine the Line of Best Fit

The regression equation for the sample is Y = 120.39x + 91,896. The slope of this equation is 120.39 and the intercept is 91,896. The slope represents the change in listing price for a one-unit change in square feet, so each additional square foot adds about $120.39 to the predicted listing price. Because the slope is positive, it also represents a positive relationship between the two variables. The intercept represents the predicted listing price if the square footage were 0. This would mean that the land, with no home, would be listed at $91,896.

The r-squared value of the sample is 0.788210561. The r-squared value represents how much of the variation in the response variable, listing price, can be explained by using square footage as the predictor variable. This means that about 79% of the variation in listing price can be explained by the model.

Using the model to predict the listing price of a home with 1,500 square feet gives $272,481. This is found by substituting 1,500 into the equation Y = 120.39x + 91,896: Y = 120.39(1,500) + 91,896 = 272,481. Comparing this to the scatterplot shows that the value is in range with the other data points.

Conclusions

In conclusion, a linear regression model based on square footage predicts listing prices well. There is a strong correlation in the randomly selected data points, and the sample statistics for listing price and square footage compare well with the national averages. These results match my expectations. To see whether different results are possible, one would need to decide how best to select the data points used to calculate the regression equation. If a stratified sampling method were used to select a set number of data points from each region, and a regression equation were fitted to those points, would the results still compare to the national averages?
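The 1,500-square-foot prediction and the r-squared value reported in this document can be reproduced from the stated slope, intercept, and correlation coefficient. The sketch below uses only numbers from the report; the constant and function names are illustrative.

```python
# Coefficients and correlation as stated in the report.
SLOPE = 120.39
INTERCEPT = 91_896
R = 0.887812233

def predict_price(square_feet):
    """Predicted listing price ($) from the report's regression equation."""
    return SLOPE * square_feet + INTERCEPT

predicted = predict_price(1_500)
r_squared = R ** 2  # share of price variation explained by square footage

print(f"Predicted price at 1,500 sq ft: ${predicted:,.0f}")
print(f"r-squared: {r_squared:.4f}")
```

Predictions are most trustworthy inside the range of square footage covered by the sample; a value such as 1,500 square feet falls where most of the data points lie.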