Ensemble Asg
School: Utah Valley University
Course: 4130
Subject: Industrial Engineering
Date: Jan 9, 2024
Dongdi Zhao
Ensemble Analysis Report
Introduction
This analysis focuses on predicting the FY04Giving amount using a regression
dataset. The dataset includes information about individuals, such as gender, class,
year, marital status, major, next degree, and historical giving amounts from 2000 to
2003. The target variable, FY04Giving, represents the giving amount in the fiscal
year 2004.
Data Preprocessing
Handling Missing Values:
The provided dataset contains no missing values, so no imputation or row removal was required.
Encoding Categorical Variables:
Categorical variables like gender, class, marital status, major, and next degree were
encoded using one-hot encoding for model compatibility.
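The encoding step can be sketched as follows. This is a minimal illustration, assuming pandas is used; the column names and rows are hypothetical stand-ins, since the report lists the fields but not their exact names or values.

```python
import pandas as pd

# Hypothetical rows; column names are assumed from the field list in the report.
df = pd.DataFrame({
    "Gender": ["M", "F", "F"],
    "Marital Status": ["S", "M", "M"],
    "FY03Giving": [0.0, 100.0, 25.0],
})

# Expand each categorical column into one 0/1 indicator column per category.
encoded = pd.get_dummies(df, columns=["Gender", "Marital Status"], dtype=int)
print(encoded.columns.tolist())
```

Each category becomes its own indicator column (for example Gender_F and Gender_M), while numerical columns such as the giving history pass through unchanged.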
Feature Scaling:
Numerical features were scaled so that features measured on different ranges contribute comparably to the model.
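The report does not say which scaling method was used. The sketch below assumes standardization (zero mean, unit variance) with scikit-learn's StandardScaler, and the numbers are hypothetical. The scaler is fitted on the training rows only so that no information from the test rows leaks into preprocessing.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical giving history (rows = donors, columns = fiscal years).
X_train = np.array([[0.0, 50.0], [100.0, 150.0], [200.0, 250.0]])
X_test = np.array([[100.0, 100.0]])

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std from training rows only
X_test_scaled = scaler.transform(X_test)        # reuse the training statistics
print(X_train_scaled.mean(axis=0), X_train_scaled.std(axis=0))
```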
Exploratory Data Analysis (EDA)
The distribution of the target variable and the relationships between features were analyzed.
Outliers were identified, and their potential impact on the model was considered.
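The report does not state how outliers were identified. One common approach, shown here purely as an assumption, is the 1.5 x IQR rule applied to the giving amounts; the values below are hypothetical.

```python
import numpy as np

# Hypothetical giving amounts with one very large gift.
giving = np.array([0.0, 10.0, 25.0, 50.0, 75.0, 100.0, 5000.0])

# Interquartile range and the conventional 1.5 * IQR fences.
q1, q3 = np.percentile(giving, [25, 75])
iqr = q3 - q1
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Values outside the fences are flagged as potential outliers.
outliers = giving[(giving < lower) | (giving > upper)]
print(outliers)
```

Donation data is typically right-skewed, so a few very large gifts can dominate a squared-error metric such as RMSE.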
Model Selection
Ensemble methods were chosen to maximize predictive power:
Random Forest (RF):
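The RF model averages the predictions of many decision trees, each trained on a bootstrap sample of the data, which reduces variance and makes it less prone to overfitting than a single tree.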
Training RMSE: $3,176.62
Testing RMSE: $5,578.57
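A minimal sketch of how training and testing RMSE can be obtained for a random forest, assuming scikit-learn. The data is synthetic, and the split ratio and hyperparameters are illustrative assumptions rather than the settings used in the report.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; the real features and target are not reproduced here.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = 100 * X[:, 0] + 50 * X[:, 1] ** 2 + rng.normal(scale=20, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

rf = RandomForestRegressor(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)

def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    return float(np.sqrt(mean_squared_error(y_true, y_pred)))

train_rmse = rmse(y_train, rf.predict(X_train))
test_rmse = rmse(y_test, rf.predict(X_test))
print(f"train RMSE: {train_rmse:.2f}, test RMSE: {test_rmse:.2f}")
```

Because RMSE is in the units of the target, the values in the report can be read directly as dollar amounts.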
XGBoost (Gradient Boosting):
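XGBoost is an efficient implementation of gradient boosting, which builds trees sequentially so that each new tree corrects the errors of the current ensemble.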
Training RMSE: $2,952.44
Testing RMSE: $5,811.93
Stacking:
The predictions from RF and XGBoost were combined in a stacked model.
Stacking RMSE: $4,927.28
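The report does not describe how the stacked model was built. The sketch below is one possible setup, assuming scikit-learn's StackingRegressor with a linear meta-learner. scikit-learn's GradientBoostingRegressor is used here as a stand-in for XGBoost to keep the example dependency-free, and the data is synthetic.

```python
import numpy as np
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = 100 * X[:, 0] + 50 * X[:, 1] ** 2 + rng.normal(scale=20, size=300)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Base models produce out-of-fold predictions (cv=5); the meta-learner
# is then fitted on those predictions rather than on the raw features.
stack = StackingRegressor(
    estimators=[
        ("rf", RandomForestRegressor(n_estimators=100, random_state=0)),
        ("gb", GradientBoostingRegressor(random_state=0)),  # stand-in for XGBoost
    ],
    final_estimator=LinearRegression(),  # assumed meta-learner
    cv=5,
)
stack.fit(X_train, y_train)

pred = stack.predict(X_test)
stack_rmse = float(np.sqrt(np.mean((y_test - pred) ** 2)))
print(f"stacking test RMSE: {stack_rmse:.2f}")
```

Training the meta-learner on out-of-fold predictions keeps it from simply rewarding whichever base model overfits the training data most.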
Results and Discussion
Random Forest (RF):
Predicts FY04Giving with an RMSE of $5,578.57 on the testing set.
XGBoost:
Achieves an RMSE of $5,811.93 on the testing set.
Stacking:
Combining RF and XGBoost predictions results in an RMSE of $4,927.28.
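Comparing the reported values, the stacked model has a lower RMSE ($4,927.28) than either RF ($5,578.57) or XGBoost ($5,811.93), a reduction of roughly 12% and 15% respectively. Both base models also show a testing RMSE well above their training RMSE, which suggests some overfitting.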
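The difference between the stacked model and each base model can be expressed as a percentage reduction in RMSE. The sketch below uses only the values reported above; the function name is hypothetical.

```python
# RMSE values as reported (in dollars).
rf_rmse = 5578.57
xgb_rmse = 5811.93
stack_rmse = 4927.28

def pct_reduction(baseline, new):
    """Percentage by which `new` is lower than `baseline`."""
    return 100.0 * (baseline - new) / baseline

print(f"vs RF:      {pct_reduction(rf_rmse, stack_rmse):.1f}%")
print(f"vs XGBoost: {pct_reduction(xgb_rmse, stack_rmse):.1f}%")
```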
Conclusion
Stacking RF and XGBoost gave the lowest reported RMSE for predicting FY04Giving. Further
hyperparameter optimization and feature engineering may improve model accuracy.
Regression is an appropriate framing because FY04Giving is a continuous monetary
amount. The models can assist in understanding and forecasting donation amounts
for fiscal year 2004 based on historical giving data.