Hinchcliff_Megan_MAT_240_Project_One

School

Southern New Hampshire University *

Course

240

Subject

Economics

Date

Feb 20, 2024

Median Housing Price Prediction Model for D. M. Pan National Real Estate Company

Report: Housing Price Prediction Model for D. M. Pan National Real Estate Company
Megan Hinchcliff
Southern New Hampshire University

Introduction

This report establishes a model to predict listing prices for homes sold in 2019. The analysis uses the 2019 Real Estate County Data and compares it with the National Statistics for housing markets. The goal is to give agents at D. M. Pan National Real Estate Company a model for using square footage as a benchmark for the listing prices of homes.

Linear regression is used to describe the relationship between two quantitative variables. A simple linear regression model is most appropriate when establishing how strong the linear relationship between two variables is. This report determines whether there is a linear relationship between the listing prices of homes and their square footage. When linear regression is appropriate, the scatterplot is expected to be linear, with the data points falling along a straight or nearly straight line.

The independent and dependent variables are treated as cause and effect: the independent variable is the cause, and the dependent variable is the effect. The independent variable is plotted on the x-axis and is not affected by the other variable, while the dependent variable on the y-axis changes in response to it. This makes the independent variable, also known as the predictor, helpful in making predictions. The predictor variable (X) in this report is the square footage of homes because it is not influenced by the listing price. The listing price depends on and is affected by the square footage, making it the response variable (Y).

Data Collection

To ensure the sample is random with minimal bias, a formula in Excel was used. The sample was created by selecting the entire data set of homes, which is divided into counties throughout the country, and applying the =RAND() function. Pasting the random sample column back as values ensures that the random numbers do not recalculate during the analysis. The first 50 rows of data were then selected to represent the sample of homes in the country.

In the scatterplot below, the predictor variable (X) represents the square feet of the homes. The response variable (Y) represents the home listing prices.

[Figure: "50 Home Sample" scatterplot. x-axis: Square Feet; y-axis: Listing Price; trend line f(x) = 98.99x + 132339.34, R² = 0.8]

Data Analysis
Statistic            Square Footage   Listing Price
Mean                 2,133            $343,510
Median               1,766            $328,950
Standard Deviation   1,222            $135,213

The relationship between the predictor and response variables is illustrated in the scatterplot above. When the square footage of homes increases, the listing price also increases. This demonstrates a positive linear relationship between the two variables, shown in the scatterplot by the rising trend line.

An outlier in a scatterplot is a data point that lies far from the regression line. Outliers arise from natural deviation in the population or from errors, and they represent observations outside the linear relationship. The sample contains five outliers. They occurred because the square footage of these houses is significantly higher than in the rest of the sample, which increases the listing price.

The histograms above show the frequency of homes by square footage and by listing price for the 50-home sample. Both graphs are skewed to the right, with a high peak of frequency data on the left and a tail of low-frequency data on the right. Square footage ranges from 1,159 to 6,659 across five bins. Listing price ranges from $165,800 to $825,800 across six bins. The square footage histogram has one unusual characteristic: a gap where no homes are listed with square footage between 3,263 and 4,316.

The table above shows the mean, median, and standard deviation of the 50 sample homes. The mean estimates the average square footage and listing price, and the median estimates the middle or center of the data. The standard deviation measures the typical distance between the data points and the mean. In the sample of homes, the median square footage is 1,766, and the median listing price is $328,950. The average square footage is 2,133, with an average listing price of $343,510. The standard deviation is 1,222 for square footage and $135,213 for the listing price. This indicates a low standard deviation where the data points are clustered around the mean or average.

50-home sample group and the National Statistics comparison table

Statistic            Sample Square Feet   Sample Listing Price   National Square Feet   National Listing Price
Mean                 2,133                $343,510               2,111                  $342,365
Median               1,766                $328,950               1,881                  $318,000
Standard Deviation   1,222                $135,213               921                    $125,914

The data showed minor differences when comparing the 50-home sample to the National Statistics. The table above shows that the mean, median, and standard deviation for the square footage and listing price were very similar. This signifies that the sample data's average and middle points are a useful representation of the nation. Additionally, the average distance between data points is low, indicating the data is clustered close to the mean and median for both the sample group and the National Statistics. The National Statistics histograms are also skewed to the right, with a high peak of frequency data on the left and a tail of low-frequency data on the right. On the other hand, the National Statistics histograms do not show gaps between the ranges or bins and have a significantly larger spread of listing prices. The listing prices in the National Statistics range from $100,000 to $100,000,000, compared to $165,800 to $825,800 in the sample group.

Develop Regression Model

[Figure: "Linear Regression Scatterplot". x-axis: Square Feet; y-axis: Listing Price; trend line f(x) = 98.99x + 132339.34, R² = 0.8]

In the scatterplot above, the line of best fit, also known as the trend line, shows a relationship between the square footage and the listing price. Because the data set shows a linear trend, a regression model is appropriate for estimating the listing prices of homes based on their square footage.
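As a rough illustration of how a trend line and its R² value like those in the scatterplot can be computed outside of Excel, the following is a minimal Python sketch using NumPy. The function name `fit_line` and the example homes are assumptions made for illustration; they are not the report's 50-home sample, and the report itself used Excel's trend line feature.

```python
import numpy as np


def fit_line(square_feet, listing_price):
    """Least-squares line: returns (slope, intercept, r_squared)."""
    x = np.asarray(square_feet, dtype=float)
    y = np.asarray(listing_price, dtype=float)

    # Degree-1 polynomial fit; coefficients come back highest power first.
    slope, intercept = np.polyfit(x, y, 1)

    # R^2 = 1 - (residual sum of squares / total sum of squares).
    predicted = slope * x + intercept
    ss_res = np.sum((y - predicted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r_squared = 1.0 - ss_res / ss_tot
    return float(slope), float(intercept), float(r_squared)


# Hypothetical homes (NOT the report's sample), for illustration only.
example_sqft = [1200, 1500, 1800, 2400, 3100, 4500]
example_price = [250000, 285000, 300000, 380000, 430000, 590000]

slope, intercept, r_squared = fit_line(example_sqft, example_price)
print(f"price = {slope:.2f} * sqft + {intercept:.0f}, R^2 = {r_squared:.3f}")
```

Applied to the actual sample, a fit of this kind would be expected to reproduce the trend line shown in the scatterplot, up to rounding.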
The relationship between the predictor variable (X) and the response variable (Y) is quantified by the correlation coefficient r. When r is between -1 and 0, the correlation is negative, which means that as one variable increases, the other decreases. When r is between 0 and +1, the correlation is positive, which means that as one variable increases, the other also increases. The Excel =CORREL function was used to calculate the strength of the correlation between the square footage and the listing price. For the sample of 50 homes nationwide, the correlation between the square footage and the listing price was r = 0.8947.

The r value supports what the scatterplot above suggests: when square footage increases, the listing price also increases. Because r is close to 1, the relationship is a strong correlation. The direction of the correlation is positive because the listing price rises with the square footage, as seen in the scatterplot, where the trend line rises to the right.

Outliers arise from natural deviation in the population or from errors, and they represent observations outside the linear relationship. As seen in the sample group scatterplot, there are five outliers in the analysis. They are the homes with a square footage of 4,479, 5,290, 8,817, 5,894, and 6,420. These outliers occurred because the square footage of these houses is significantly higher than in the rest of the sample, which increases the listing price. Outliers can affect the relationship between square footage and the listing price by making the data more scattered, which weakens the correlation and moves r closer to 0.
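The correlation calculation described above can be sketched in Python. This is a minimal illustration of the Pearson correlation coefficient, which is the quantity Excel's =CORREL returns; the function name `pearson_r` and the example values are assumptions for illustration and are not the report's data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sum of cross-deviations and sums of squared deviations from the means.
    # Note: a constant sequence makes sxx or syy zero, so r is undefined.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values (NOT the report's sample), for illustration only.
sqft = [1200, 1500, 1800, 2400, 3100]
price = [250000, 285000, 300000, 380000, 430000]

r = pearson_r(sqft, price)
print(f"r = {r:.4f}, r^2 = {r * r:.4f}")
```

In simple linear regression, the square of r equals the coefficient of determination. For the report's r = 0.8947, squaring gives about 0.8005, which agrees with the R² shown on the trend line once rounding of r is taken into account.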
Removing the outliers would be expected to strengthen the relationship between square footage and the listing price and increase r. In this analysis, r = 0.8947, which indicates a strong correlation between the two variables even with the outliers present. It is recommended to keep the outliers in this data set since they have a minimal effect on the assessment.

Determine the Line of Best Fit

As mentioned earlier in this report, the predictor variable (X) represents the square feet of the homes. The response variable (Y) represents the home listing prices. In the scatterplot of 50 homes nationwide, the regression equation is y = 98.992x + 132339. The slope of a regression line indicates the rate of change in listing price per unit change in the square footage. The Y-intercept is the point where the regression line crosses the Y-axis, that is, the value of Y when X is 0. Together, these two values describe the relationship between the square footage and the listing price of homes.

The slope of the regression line in the above scatterplot is 98.992, and the intercept is 132339. From the slope, we can conclude that when the square footage increases by one square foot, the predicted listing price increases by $98.99. From the intercept, we can conclude that when the square footage of a home is 0, the listing price of the land is $132,339. The intercept makes sense based on the line of best fit because it shows the listing price of land when a house is not present. This information can be used as a model for agents at D. M. Pan National Real Estate Company to better determine the use of square footage as a benchmark for the listing prices of homes.

The R-squared coefficient, also known as the coefficient of determination, measures the amount of variation in the listing price that the square footage can explain, which shows how well the data fit the regression line. The coefficient of determination for the sample of 50 homes nationwide is R² = 0.8006. Multiplying by 100 gives the percentage of variation, which is 80.06%. This indicates that the variation in the square footage explains 80.06% of the variation in the listing price.

The regression equation can be used to predict the price at which to list a house based on its square footage. For example, a house that is 1,500 square feet would be listed at $280,827. This predicted price is calculated from the linear regression equation for the 50-home nationwide sample group, y = 98.992x + 132339. Substituting 1,500 for x gives y = 98.992(1,500) + 132339 = 280,827. This equation provides a systematic way to estimate home prices and housing costs.

Conclusions

The purpose of this report is to give agents at D. M. Pan National Real Estate Company a model for using square footage as a benchmark for the listing prices of homes. Using a sample group of 50 homes nationwide, an analysis was conducted to create a regression model that demonstrated a positive linear correlation between the square footage of homes and the listing price. This model indicates that as the square footage increases, the listing prices also increase. When comparing the information from the sample group to the National Statistics, we found the sample group to be a valuable representation of the entire nation.

After conducting the analysis, the results were as expected. The correlation showed a strong relationship between square footage and listing prices, with r = 0.8947. In any sample group, some data can be expected to fall outside the average. Here, this appears as five outlier homes that have greater square footage than most homes in the sample. As anticipated, the regression model and equation were found to be useful in
determining the listing prices based on a home's square footage. Agents at D. M. Pan National Real Estate Company can implement them effectively.

The data provided on homes in all counties in the United States show that the listing price for a given square footage differs by location. This is due to differences in the cost of living, depending on the county in which a home is sold. To better determine an appropriate listing price, the model could be changed by creating a regression model for each county instead of using a sample of 50 homes nationwide. Although building multiple regression models will be time-consuming, the results will better illustrate a suitable listing price based on the cost of living and differences in market values. A home in a small town in the Midwest will cost considerably less than a home in New York City. An interesting question for further research is how the square footage of the entire property, in addition to that of the home, affects the listing price.
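The per-county approach suggested above could be sketched as follows. This is only an illustration under stated assumptions: the function name `fit_by_county`, the record layout (county, square feet, listing price), and the county names and values are all hypothetical and do not come from the report's data set.

```python
from collections import defaultdict

import numpy as np


def fit_by_county(records):
    """Fit one least-squares line per county.

    records: iterable of (county, square_feet, listing_price) tuples.
    Returns {county: (slope, intercept)}.
    """
    grouped = defaultdict(list)
    for county, sqft, price in records:
        grouped[county].append((sqft, price))

    models = {}
    for county, rows in grouped.items():
        if len(rows) < 2:
            continue  # a line cannot be fitted to fewer than two homes
        x, y = np.array(rows, dtype=float).T
        slope, intercept = np.polyfit(x, y, 1)
        models[county] = (float(slope), float(intercept))
    return models


# Hypothetical records (NOT from the report), for illustration only.
example_records = [
    ("County A", 1000, 150000), ("County A", 2000, 250000),
    ("County A", 3000, 350000),
    ("County B", 1000, 400000), ("County B", 2000, 700000),
]

for county, (slope, intercept) in fit_by_county(example_records).items():
    print(f"{county}: price = {slope:.2f} * sqft + {intercept:.0f}")
```

With separate slopes and intercepts per county, an agent would look up the model for the county in question before estimating a listing price, at the cost of maintaining many small models that each rest on fewer homes.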