MAT 240 Project One

School

Southern New Hampshire University

Course

240

Subject

Economics

Date

Feb 20, 2024

Median Housing Price Prediction Model for D. M. Pan National Real Estate Company

Report: Housing Price Prediction Model for D. M. Pan National Real Estate Company
Jessie Busch
Southern New Hampshire University
MAT 240: Applied Statistics
Chuck Holbrook
February 4, 2024

Introduction

This report examines the relationship between the listing price and the square footage of properties in the United States. The data is appropriate for a linear regression because it shows a connection between the listing price and the square footage of a house. A regression also makes it possible to predict the listing price of a property with a given square footage. When using linear regression, we expect the scatterplot to show a linear relationship between the two variables, meaning that the data points tend to cluster around the straight regression line. Here the scatterplot should show a positive relationship: as the square footage of a house, the predictor variable (X), increases, so does the home's listing price, the response variable (Y), and the line slopes upward. If the relationship were negative, the line would instead slope downward, with one variable decreasing as the other increases. The degree of scatter indicates the strength of the linear relationship: less scatter means a stronger linear relationship.

Data Collection

I drew a random sample of 50 properties from all regions of the United States. To do this, I used the random number formula =RAND() in Microsoft Excel, which shuffled the listings from all regions into a random order. I then took 50 properties from the shuffled list so that they were chosen at random and without bias. The X variable is square feet, the predictor, and the Y variable is the listing price, the response.
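
The shuffle-and-select procedure described above can be sketched in Python. This is only an illustration of the idea, not the workbook used for this report: the `listings` rows, the `draw_sample` helper, and the seed are hypothetical placeholders, and the random key plays the role of an Excel =RAND() helper column.

```python
import random


def draw_sample(rows, k=50, seed=None):
    """Shuffle rows by a random key (like a =RAND() column) and keep the first k."""
    rng = random.Random(seed)
    # Attach a random number to every row, as =RAND() would in a helper column.
    keyed = [(rng.random(), row) for row in rows]
    # Sorting by the random key puts the rows in a random order.
    keyed.sort(key=lambda pair: pair[0])
    return [row for _, row in keyed[:k]]


# Hypothetical stand-in for the full listing data; the real data is not reproduced here.
listings = [{"id": i} for i in range(1000)]
sample = draw_sample(listings, k=50, seed=1)
print(len(sample))
```

Because every row receives its own random key, each property has the same chance of landing in the first 50 positions, and no property can be selected twice.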

[Figure: Scatter Plot, Regions in the US (x-axis: Square Footage; y-axis: Listing Price)]

Data Analysis

            Square Footage    Listing Price
Mean        2,085             334,398
Median      1,852             308,500
Std Dev     923.1340378       144912.514

In each of these graphs, most of the square footage and listing price values sit close together, followed by a gap and three data points that lie far from the rest. In the scatter plot, the three dots furthest from all the others are considered outliers. They are shown at 4,736 sq ft with a listing price of $633,300, 4,929 sq ft with a listing price of $854,40, and 5,774 sq ft with a listing price of $869,200. The three outliers cause the gap because those properties have much higher square footage and listing prices than the other data points. Overall, the scatterplot shows an upward, positive linear pattern.

The gap and outliers can also be seen in the histograms for square feet and listing price. Both histograms are right (positively) skewed, meaning the peak of the data is to the left of the center. This occurs because most of the data points are located on the left side of the histogram, with fewer, extremely high values appearing as you move to the right. Regarding the spread, the histograms show that most square footage and listing price values are tightly clustered around the mean, indicating that most of the properties have similar sizes and prices.
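
As a rough numeric check on the right skew described above, the summary statistics can be plugged into Pearson's median skewness coefficient, 3 × (mean − median) / standard deviation. This coefficient is not part of the original analysis, which relied on the histograms; it is added here only as a sketch, and the function name is an assumption. A positive value is consistent with a right skew.

```python
def pearson_median_skewness(mean, median, std_dev):
    """Pearson's median skewness: positive when the mean sits above the median."""
    return 3 * (mean - median) / std_dev


# Summary statistics taken from the table in this report.
sqft_skew = pearson_median_skewness(2085, 1852, 923.1340378)
price_skew = pearson_median_skewness(334_398, 308_500, 144_912.514)

print(f"Square footage skewness: {sqft_skew:.2f}")
print(f"Listing price skewness:  {price_skew:.2f}")
```

Both coefficients come out positive because each mean is larger than its median, which is what a few very large, very expensive properties would produce.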

When comparing the square footage histograms, the center (mean) of our sample is 2,085 square feet, and the national mean is 2,111 square feet. That is a difference of 26 square feet, so our sample mean is about 1.2% smaller than the national average. For listing price, the sample mean is $334,398, and the national mean is $342,365. That is a difference of $7,967, so our sample mean is about 2.3% below the national average listing price.

Regarding the spread, the standard deviation of the listing price is 125,914 nationally and 144,912 in our sample. This means that our sample listing prices are somewhat more varied than the national listing prices, because listing prices are spread more widely around the average in our sample than they are nationally. This could be due to factors such as listings in regions with very high or very low prices. The standard deviation of square footage is 921 nationally and 923 in our sample, so the square footage of properties in our sample is slightly more spread out from the mean than that of properties nationally. However, the difference is very small, suggesting that the spread of square footage in our sample is similar to the spread nationally. The histograms for both our sample and the national data are right (positively) skewed.
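
The comparison of the sample means with the national means comes down to a percent difference relative to the national figure. A minimal sketch follows, assuming the rounded means quoted in this report as inputs; the function name is hypothetical, and results computed from unrounded data may differ slightly in the second decimal place.

```python
def percent_below(sample: float, national: float) -> float:
    """Return how far `sample` falls below `national`, as a percent of `national`."""
    return (national - sample) / national * 100


# Rounded means quoted in the report.
sqft_gap = percent_below(2085, 2111)
price_gap = percent_below(334_398, 342_365)

print(f"Square footage: {sqft_gap:.2f}% below the national mean")
print(f"Listing price:  {price_gap:.2f}% below the national mean")
```

Dividing by the national mean, rather than the sample mean, matches the wording "smaller than the national average": the national figure is the baseline being compared against.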

Develop Regression Model

[Figure: Scatter Plot, Regions in the US, with fitted line f(x) = 133.14x + 56840.73 and R² = 0.72 (x-axis: Square Footage; y-axis: Listing Price)]

A regression model is appropriate in this situation because, as the scatter plot above shows, the listing price increases as the square footage of a home increases. The upward slope of the line confirms a positive, linear association. The slope of the line indicates the rate at which the listing price increases for each
additional square foot of property. There are three outliers located at or near the top of the line. They are considered outliers because of how far they are from the other data points, which means these properties are significantly larger and more expensive than most of the other properties in the data set. This could be due to a variety of factors, such as the property having unique features or being in a particularly desirable location.

Calculate r

To calculate the correlation coefficient, r, I used the =CORREL formula in Microsoft Excel. With this formula and the data provided, r is 0.84813844. The correlation coefficient measures the direction and strength of the linear relationship between the two variables on the scatter plot. Because r is close to 1 and falls in the range 0.80 < |r| ≤ 1.00, the linear relationship is considered strong. The data points on the scatter plot trend upward, which shows that there is a positive relationship between the two variables.

Determine the Line of Best Fit

For this sample, the regression equation is y = 133.14x + 56841, where x is the square footage and y is the listing price. The slope is 133.14 and the y-intercept is 56841; these values come from fitting the regression line to the sample. Based on this model, each additional square foot is associated with an increase of $133.14 in a property's listing price. The y-intercept is the point at which the line crosses the y-axis, which corresponds to a property with zero square feet (x = 0). Because a property cannot have zero square feet, the y-intercept has no practical interpretation in this scenario. Because the slope is positive, there is a positive correlation between the two variables. Based on the data in the scatter plot above, R squared is 0.7193, meaning that about 72% of the variation in the listing price of properties in these regions can be explained by the square footage of those properties. Using the regression
equation y = 133.14(1500) + 56841, the predicted listing price of a property with 1,500 square feet is $256,551.

Conclusions

This report shows that the average square footage in our sample is similar to the national average. There is only about a 1.2% difference, with our sample mean of 2,085 square feet slightly smaller than the national mean of 2,111. The listing prices also differ only slightly: our sample mean of $334,398 is about 2.3% below the national average of $342,365. Linear regression was used to model the relationship between two variables. In this case, the dependent or response variable is the listing price (y), and the independent or predictor variable is the square footage (x). Given the data, we expected the scatterplot to show a positive correlation between the X and Y variables, meaning that as the square footage of a property increases, so does the property's listing price. The scatterplot confirmed this with a strong, positive linear trend. The slope shows how much the listing price increases per additional square foot according to the given data. With this information, we can estimate the listing price for a given square footage by using the equation y = 133.14 * (square footage) + 56841. Overall, given the small percentage differences, it is reasonable to conclude that our sample is similar to the national sample. The small differences could be due to factors such as location, property type, or market conditions. The results are what I expected and were not surprising. I found this information helpful for understanding the relationship between the listing price of a property and its square footage. Different results would follow if the data points formed a curve or a cluster, because the relationship would then not be linear.
If the data points form a curve, it suggests that the response variable does not change at a constant rate as the predictor changes, so a straight line would fit poorly. A cluster indicates that there isn't a clear relationship between the variables.

One question that would be interesting for follow-up research is: How does the average square footage of homes correlate with the average listing prices in different regions over a 10-year period? This question would allow for an in-depth analysis of housing market trends over time, and the data could be used to identify any correlations or trends. This could provide valuable insights into the housing market and be useful for real estate companies and home buyers.
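
The regression results reported in this paper can be checked with a short Python sketch. It takes the slope, intercept, and correlation coefficient as given in the report rather than refitting the model, since the underlying 50-property sample is not reproduced here; the constant and function names are hypothetical.

```python
# Values reported in this paper (not refit from the raw data).
SLOPE = 133.14             # dollars per additional square foot
INTERCEPT = 56_841         # y-intercept of the fitted line, in dollars
R = 0.84813844             # correlation coefficient from =CORREL


def predict_listing_price(square_feet: float) -> float:
    """Apply the fitted line y = 133.14x + 56841 to a given square footage."""
    return SLOPE * square_feet + INTERCEPT


# For a simple linear regression, R squared is the square of r.
r_squared = R ** 2

print(f"Predicted price at 1,500 sq ft: ${predict_listing_price(1500):,.0f}")
print(f"R squared: {r_squared:.4f}")
```

The prediction for 1,500 square feet and the R squared value should agree with the figures quoted in the report. Predictions are most trustworthy for square footage within the range of the sampled properties.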