The following dataset is a historic record of 14 houses that were sold in a small town in BC. The dataset is used to predict whether a new house in the same town will be sold in 10 days if listed with a specific price based on certain attributes. We are considering only four attributes (price, number of bedrooms, size, and distance to bus stop) just to simplify the calculations in this assignment but more attributes should be considered in real applications. House Price Number of Bedrooms Size (sqft) Distance to Bus-Stop House sold in 10 days? House 1 $300,000 1 3,500 sqft far No House 2 $300,000 1 3,500 sqft near No House 3 $250,000 1 3,500 sqft far Yes House 4 $350,000 2 3,500 sqft far Yes House 5 $350,000 3 5,000 sqft far Yes House 6 $350,000 3 5,000 sqft near No House 7 $250,000 3 5,000 sqft near Yes House 8 $300,000 2 3,500 sqft far No House 9 $300,000 3 5,000 sqft far Yes House 10 $350,000 2 5,000 sqft far Yes House 11 $300,000 2 5,000 sqft near Yes House 12 $250,000 2 3,500 sqft near Yes House 13 $250,000 1 5,000 sqft far Yes House 14 $350,000 2 3,500 sqft near No Build a decision tree to predict whether a new house listing in the same town will be sold in 10 days based on the given attributes. Use ID3 algorithm. To answer this question, you need to complete the following steps: Calculate the entropy of the whole dataset. Identify the first attribute to split on such that the corresponding information gain is maximal. To do that, you need to calculate the information gain for each one of the four attributes. Draw the final tree.
The following dataset is a historic record of 14 houses that were sold in a small town in BC. The dataset is used to predict whether a new house in the same town will be sold in 10 days if listed with a specific price based on certain attributes. We are considering only four attributes (price, number of bedrooms, size, and distance to bus stop) just to simplify the calculations in this assignment but more attributes should be considered in real applications.
House |
Price |
Number of Bedrooms |
Size (sqft) |
Distance to Bus-Stop |
House sold in 10 days? |
House 1 |
$300,000 |
1 |
3,500 sqft |
far |
No |
House 2 |
$300,000 |
1 |
3,500 sqft |
near |
No |
House 3 |
$250,000 |
1 |
3,500 sqft |
far |
Yes |
House 4 |
$350,000 |
2 |
3,500 sqft |
far |
Yes |
House 5 |
$350,000 |
3 |
5,000 sqft |
far |
Yes |
House 6 |
$350,000 |
3 |
5,000 sqft |
near |
No |
House 7 |
$250,000 |
3 |
5,000 sqft |
near |
Yes |
House 8 |
$300,000 |
2 |
3,500 sqft |
far |
No |
House 9 |
$300,000 |
3 |
5,000 sqft |
far |
Yes |
House 10 |
$350,000 |
2 |
5,000 sqft |
far |
Yes |
House 11 |
$300,000 |
2 |
5,000 sqft |
near |
Yes |
House 12 |
$250,000 |
2 |
3,500 sqft |
near |
Yes |
House 13 |
$250,000 |
1 |
5,000 sqft |
far |
Yes |
House 14 |
$350,000 |
2 |
3,500 sqft |
near |
No |
Build a decision tree to predict whether a new house listing in the same town will be sold in 10 days based on the given attributes. Use ID3 algorithm.
To answer this question, you need to complete the following steps:
-
Calculate the entropy of the whole dataset.
-
Identify the first attribute to split on such that the corresponding information gain is maximal. To do that, you need to calculate the information gain for each one of the four attributes.
-
Draw the final tree.
The ID3 (Iterative Dichotomiser 3) algorithm is a classic decision tree learning algorithm used for classification tasks in machine learning and data mining. It was developed by Ross Quinlan in the 1980s and is one of the foundational algorithms in the field of supervised learning. Decision trees are a popular type of model used for both classification and regression tasks.
Step by step
Solved in 9 steps with 16 images