MAT 240 Project One Template

School

Southern New Hampshire University

Course

240

Subject

Statistics

Date

Jan 9, 2024

Type

docx

Pages

8

Uploaded by HighnessWolf3860

Median Housing Price Prediction Model for D. M. Pan National Real Estate Company

Report: Housing Price Prediction Model for D. M. Pan National Real Estate Company
Madison Jones
Southern New Hampshire University

Introduction

The following report analyzes the relationship between the square footage of a home and its listing price, using data reported in 2019. Currently, real estate agents use square footage as a gauge in determining the listing price of a home. This report aims to determine whether the correlation between square feet and listing price is strong enough to predict the potential listing price of a home given a specific square footage.

Linear regression is appropriate for analyzing a dataset when the predictor variable and the response variable have a linear relationship. A linear relationship is observed between two variables when their change is proportional. Hence, for a positive linear relationship, the scatterplot is expected to show that as the predictor variable increases, so does the response variable. The predictor variable (x) is data that can be used to predict another piece of data. The response variable (y) is data that depends on the change of another piece of data. For this report, square feet is defined as the predictor variable (x) and listing price as the response variable (y). This is because it can reasonably be assumed that as the square footage of a home changes, so will the listing price, indicating that the listing price is dependent on the change in square feet.

Data Collection

The scatterplot below exhibits the sample of 50 houses chosen from the population of 1,000. To obtain a random sample, the RAND function in Microsoft Excel was used. As seen in the Excel spreadsheet, the column labeled “random” contains numbers generated by the RAND function. All 1,000 homes in the population were given a randomly generated number. The entire table was then sorted by that random column using the sort function in the data
tab of Microsoft Excel. The top 50 rows were then chosen as the random sample. Sorting on the numbers generated by the RAND function ensured the dataset was ordered at random.

Predictor Variable (x) = Square feet
Response Variable (y) = Listing price

[Figure: scatterplot "Square Feet and Listing Price of a Home"; x-axis: Square Feet (0 to 7,000); y-axis: Listing Price ($0 to $900,000)]

Data Analysis
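
The sampling procedure described under Data Collection (attach a random number to every home, sort on it, keep the first 50 rows) can be sketched in Python. This is only an illustration of the same idea, not the Excel workbook used for the report: the function name `draw_random_sample`, the seed, and the placeholder home IDs are assumptions, since the actual 1,000 records are not reproduced here.

```python
import random


def draw_random_sample(population, sample_size, seed=None):
    """Mimic the RAND-and-sort approach: key each record, sort, keep the top rows."""
    rng = random.Random(seed)
    # Pair every record with a random key, like filling a "random" column with RAND().
    keyed = [(rng.random(), record) for record in population]
    # Sort the whole table on the random key.
    keyed.sort(key=lambda pair: pair[0])
    # Keep the first `sample_size` rows as the sample.
    return [record for _, record in keyed[:sample_size]]


# Hypothetical stand-in for the 1,000 homes: IDs 1 through 1000.
population = list(range(1, 1001))
sample = draw_random_sample(population, 50, seed=240)
print(len(sample))  # 50
```

Because every record receives its own key and the table is sorted once, no home can appear in the sample twice, which matches sampling without replacement.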

Square Footage Interpretation

As depicted in the graph above, the mean square footage is 2,225, indicating that the average square footage of a home in the sample of 50 is 2,225. The median is also represented in the above histogram. The histogram shows that the majority of the sample falls within the first two bins, ranging from 1,001 to 2,501 square feet. Overall, 41 out of 50 data points lie within that range, which is consistent with the median of 2,225. In the dataset, the standard deviation of the sample is 1,097.804 square feet. The standard deviation is somewhat high, indicating that some data points are spread further from the mean. By looking at the histogram we can see that nine
data points are further away from the main cluster of data, leading to a larger spread and a higher standard deviation. However, the majority of the data (41 points) lie within the main cluster, indicating the sample is generally reliable. Square footage of homes ranges from 1,001 to 6,001, and there are no gaps in the dataset.

Listing Price Interpretation

The graph above shows that the mean listing price of the sample is $351,416; this number represents the average listing price for the sample data. In the histogram this information is visually represented. Over 75% of the data lies within the first three bins, ranging from $171,600 to $611,600, which is consistent with the mean listing price of $351,416. The standard deviation of listing price is $130,100.91. Comparatively, the standard deviation in listing price is small, indicating the majority of data points are close together. However, among the 50 data points there appear to be two outliers in the listing price. These two outliers are in the very last bin and are separated from the rest of the data by at least $110,000. Although these two values are separated by a substantial gap in the data, they have little effect on the validity of the sample. When considering the totality of all values of listing price, the majority of the data falls within the range of $171,600 to $611,600.

Sample vs National Statistics

As depicted in the above graphs, the sample data is comparable to the national population statistics. The mean listing price for the sample is $351,416 and the national mean is $342,365. Similarly, the sample mean square footage is 2,225, whereas the national mean square footage is 2,111. In every area of comparison, the datasets are very similar. When comparing the histograms for the two variables, the sample data and national data appear to have a similar shape. The only major contrast between the national data and the sample data is the gap in
listing price. In the histogram for sample listing price there is a gap between $611,600 and $721,600, whereas in the national statistics there is no gap. The similarity between the datasets indicates that the randomly generated sample of 50 is a good representation of national population averages.

Develop Regression Model

[Figure: scatterplot "Square Feet and Listing Price of a Home" with fitted regression line f(x) = 102.76x + 122742.5, R² = 0.75; x-axis: Square Feet (0 to 7,000); y-axis: Listing Price ($0 to $900,000)]

Based on the scatterplot, it can be determined that a regression model is appropriate. As seen in the scatterplot above, the correlation between the two variables is positive: when the x values of square feet increase, the y values of listing price increase. The majority of the data points are plotted close to the regression line, indicating a strong correlation. The only data point considered an outlier would be (5,108, $822,200). This data point lies above the regression line, indicating that, given the square footage of the home, the listing price is above average. If too many outliers appear in a dataset, the correlation of the two variables could be greatly skewed. However, in this dataset only one major outlier is present, and every other data point lies near the regression line. Due to this fact, it would not be significantly beneficial to remove the outlier
because the correlation is already very strong. Based on the dataset, the correlation coefficient (r) is calculated to be 0.867137101. A correlation coefficient of 0.8 or above is indicative of a strong correlation between the two variables. Thus, the calculated correlation coefficient supports the observations made above: even with the outlier included, the dataset supports a strong linear relationship between square feet and listing price.

Determine the Line of Best Fit

Regression equation: y = 102.76x + 122742; the slope is 102.76 and the intercept is 122,742.

In the above regression equation the slope is 102.76, indicating that for every one-square-foot increase in square footage the listing price should increase by $102.76. The slope can be interpreted as the price of one additional square foot. In the regression equation the intercept is 122,742, meaning that when square feet (x) is zero the listing price is $122,742. The intercept can be interpreted as just the cost of the land.

R-squared is a statistic that indicates how much of the variation in y can be explained by the variation in x. In this scatterplot, R-squared represents how much of the variation in listing price is explained by the variation in square feet. Based on the correlation coefficient, R-squared is determined to be 0.7519 (the square of r = 0.867137101), indicating that 75.19% of the variation in listing price can be explained by the variation in square feet.

Assuming the square footage of a home is 1,500, the regression equation can be used to predict the potential listing price. Substituting 1500 for x gives y = 102.76(1500) + 122742 = 276,882. The result indicates that the potential listing price of a 1,500 square foot home is $276,882.

Conclusions

In this analysis, 50 random data points were taken from a population of 1,000. With the information obtained through the sample, the intention was to determine the correlation between square feet and listing price and whether these two variables could be used to accurately judge the potential listing price of a home. Based on the scatterplot created using the sample data, it was determined that the relationship between square feet and listing price was linear. Thus, when the square footage of a home increased, so did its listing price. It was further determined through the correlation coefficient that the relationship between these two variables was strong. The conclusion drawn from this data is that square footage is a good benchmark variable for the potential listing price of a home. These results were expected and match how the data was anticipated to behave given the variables.

The findings of this analysis could change significantly if the data were chosen only from large cities in each state. For example, if data were taken only from Miami, Los Angeles, Las Vegas, D.C., New York, and Houston, the analysis would be biased toward larger listing prices because cities are generally more expensive than rural areas. Finally, in follow-up research it would be interesting to examine how the number of bedrooms influences listing price and which variable is more significant: square footage or number of bedrooms.
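
The arithmetic in the Determine the Line of Best Fit section can be checked with a short Python sketch. It uses only the slope, intercept, and correlation coefficient reported above. The constant and function names are illustrative assumptions, and the model is not refitted here because the underlying 50 records are not reproduced in this report.

```python
# Values reported in the regression section of this report.
SLOPE = 102.76        # dollars per square foot
INTERCEPT = 122_742   # dollars at x = 0
R = 0.867137101       # sample correlation coefficient


def predict_listing_price(square_feet):
    """Evaluate the reported regression line y = 102.76x + 122742."""
    return SLOPE * square_feet + INTERCEPT


predicted = predict_listing_price(1500)
# In simple linear regression, R-squared equals the square of r.
r_squared = R ** 2

print(f"Predicted price for 1,500 sq ft: ${predicted:,.0f}")  # $276,882
print(f"R-squared: {r_squared:.4f}")                           # 0.7519
```

A prediction of this kind is most trustworthy for square footages inside the sampled range of 1,001 to 6,001, since the line was fitted only to homes of those sizes.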