Beteta - MAT 303 Project One

School

Southern New Hampshire University

Course

303

Subject

Mathematics

Date

Apr 3, 2024

MAT 303 Project One Summary Report
Diego Beteta
diego.beteta@snhu.edu
Southern New Hampshire University

1. Introduction

In this project, we analyze a dataset of historical housing sales in Seattle to understand how attributes of a house, such as its size, age, and location, affect its selling price. The primary goal is to build regression models that predict a house's selling price from these attributes, so that our real estate company can set appropriate listing prices. We will use multiple regression to assess the combined effect of several factors, and we will explore interactions between qualitative variables (such as the presence of a backyard or renovation status) to see how they jointly influence price. We may also use quadratic regression to examine non-linear relationships, such as whether increases in square footage lead to a disproportionate increase in price. The results will help our company make informed pricing decisions and remain competitive and profitable in the real estate market.

2. Data Preparation

The dataset consists of 2,692 rows and 23 columns. In this project, we focus on a subset of key variables, each representing a different aspect of a home or its surroundings that is relevant to predicting its sale price:

1. price: The home's sale price, the primary variable we aim to predict.
2. bedrooms: The number of bedrooms in the home, indicating its accommodation capacity.
3. bathrooms: The number of bathrooms, which contributes to the home's convenience and comfort.
4. sqft_living: The size of the living area in square feet, a direct measure of the home's size.
5. sqft_above: The size of the upper level in square feet, giving an idea of the additional living space apart from the main level.
6. sqft_lot: The size of the lot on which the house sits, in square feet, indicating the amount of outdoor space.
7. age: The age of the home, which can affect its style, condition, and appeal.
8. grade: A measure of craftsmanship and the quality of materials used in the home, reflecting its overall build quality.
9. appliance_age: The average age of all appliances in the home, indicating the need for updates or replacements.
10. crime: The crime rate per 100,000 people in the area, a factor that can influence the desirability of a neighborhood.
11. backyard: Whether the home has a backyard (1) or not (0), an important feature for many buyers.
12. school_rating: The average rating of schools in the area, often a significant consideration for families.
13. view: Whether the home backs out to a lake (2), trees (1), or a road (0), which affects its aesthetics and potentially its value.

Analyzing how these variables interact and influence the home's sale price will be central to developing effective predictive models in this project.
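The qualitative variables listed above are stored as numeric codes, and view has three levels. One common way to use such a variable in a regression is to replace it with indicator (dummy) variables, one per level other than a chosen baseline. The sketch below illustrates that idea only; it is not taken from the report. The function name encode_view, the choice of road as the baseline, and the sample rows are all assumptions made for illustration, and the rows are not values from the dataset.

```python
# Coding of `view` as described in the variable list: road (0), trees (1), lake (2).
VIEW_LABELS = {0: "road", 1: "trees", 2: "lake"}


def encode_view(view_code, baseline=0):
    """Return indicator variables for `view`, omitting the baseline level.

    Hypothetical helper: the baseline (road by default) is an assumption,
    not something stated in the report.
    """
    if view_code not in VIEW_LABELS:
        raise ValueError(f"unexpected view code: {view_code!r}")
    return {
        f"view_{label}": int(view_code == code)
        for code, label in VIEW_LABELS.items()
        if code != baseline
    }


# Made-up rows for illustration only; these are NOT values from the dataset.
sample_homes = [
    {"sqft_living": 1800, "backyard": 1, "view": 2},
    {"sqft_living": 1200, "backyard": 0, "view": 0},
    {"sqft_living": 2400, "backyard": 1, "view": 1},
]

encoded = []
for home in sample_homes:
    # Keep every column except the raw code, then add the indicator columns.
    row = {name: value for name, value in home.items() if name != "view"}
    row.update(encode_view(home["view"]))
    encoded.append(row)

for row in encoded:
    print(row)
```

backyard is already a 0/1 indicator, so it needs no further encoding. With road as the baseline, the coefficients on view_trees and view_lake would be read as price differences relative to a home that backs out to a road.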

3. Model #1 - First Order Regression Model with Quantitative and Qualitative Variables

Correlation Analysis

The scatterplot of home prices versus living area in square feet shows a positive trend: as the living area increases, the price of the home tends to increase as well. Despite this overall trend, homes with similar living areas vary considerably in price, which may reflect factors not displayed in this plot, such as location, home condition, or additional amenities. The data points are also concentrated at the lower end of living area sizes, so most homes in the dataset have smaller living spaces, and their prices vary widely within this range.

The correlation coefficient between home price and living area (sqft_living) is approximately 0.69. This is a moderate to strong positive correlation: as the living area of a home increases, its price tends to increase. The relationship is far from perfect, however. Squaring 0.69 gives about 0.48, so in a simple linear model living area alone would account for only about 48% of the variation in price, and other factors must also play a role. A coefficient closer to 1 would indicate a stronger, more nearly linear relationship; 0.69 suggests that living area is a notable, but not exclusive, predictor of home prices.
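For readers who want to see how a correlation coefficient like the one above is obtained, here is a minimal sketch of the Pearson correlation computed from first principles. The report does not show its own computation, so this is only an illustration: the function name pearson_r and the six sample homes are assumptions, and the sample values are made up rather than drawn from the dataset, so the printed value will not match 0.69.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Made-up values for illustration only; these are NOT rows from the dataset.
sqft_living = [850, 1200, 1500, 1900, 2400, 3100]
price = [310000, 295000, 420000, 510000, 480000, 760000]

r = pearson_r(sqft_living, price)
print(f"r = {r:.2f}, r squared = {r * r:.2f}")
```

The squared value printed alongside r is the share of variation in price that a simple linear fit on living area would explain, which is why a correlation well below 1 leaves room for the other variables in the dataset.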
