w8 A (pdf)

School: Stevens Institute Of Technology
Course: MIS637
Subject: Industrial Engineering
Date: Feb 20, 2024
Pages: 3

DS-630 WEEK-8 ASSIGNMENT (SPIRIT ID-1023554)

The goal of this assignment was to assess how Principal Component Analysis (PCA) affects the Random Forest classifier's accuracy and training time when dimensionality is reduced. The dataset was transformed with PCA so that 95% of the variance was preserved, which reduced the number of features, and a fresh Random Forest model was then trained on this smaller dataset.

Analysis: With PCA, the model trained in 824.71 seconds and reached 68.98% accuracy on the test set. However, the accuracy and training time of the baseline Random Forest model were not recorded, so no direct comparison is possible. With fewer features, training time is generally expected to drop after applying PCA.

Training Effectiveness: The post-PCA training time reflects the impact of dimensionality reduction on computational cost. If the baseline model, trained without PCA, took longer, then PCA sped up training. Conversely, a longer post-PCA training time would suggest that the overhead of the transformation outweighed the benefit of fewer features, and PCA did not accelerate the Random Forest's training.

Model Precision: The observed accuracy of 68.98% must be compared against the baseline model's accuracy. A drop in accuracy would suggest that information important for classification was discarded during PCA, limiting the model's ability to generalize.
Similar or better accuracy would demonstrate how well PCA reduces dimensionality while preserving the integrity of the dataset.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
from sklearn.decomposition import PCA
import time

# Create a mock dataset similar in size and features to MNIST
X, y = make_classification(n_samples=70000, n_features=784, n_informative=10,
                           n_redundant=10, n_classes=10,
                           n_clusters_per_class=1, random_state=42)

# Split the dataset into a training set and a test set
X_train, X_test, y_train, y_test = X[:60000], X[60000:], y[:60000], y[60000:]

# Apply PCA to reduce dimensionality, keeping 95% of the explained variance
pca = PCA(n_components=0.95)
X_train_reduced = pca.fit_transform(X_train)
X_test_reduced = pca.transform(X_test)

# Create and train different classifiers on the reduced dataset
classifiers = {
    'Random Forest': RandomForestClassifier(n_jobs=1),
    # Raised iteration cap so logistic regression converges without warnings
    'Logistic Regression': LogisticRegression(max_iter=1000),
    'Decision Tree': DecisionTreeClassifier(),
    'SVM': SVC()
}

for name, model in classifiers.items():
    start_time = time.time()
    model.fit(X_train_reduced, y_train)
    training_time_reduced = time.time() - start_time

    # Evaluate the classifier on the reduced test set
    y_pred_reduced = model.predict(X_test_reduced)
    accuracy_reduced = accuracy_score(y_test, y_pred_reduced)
    print(f"{name} Performance:")
    print("Training Time with PCA: {:.2f} seconds".format(training_time_reduced))
    print("Accuracy on Test Set with PCA: {:.2f}%".format(accuracy_reduced * 100))
    print()
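The analysis above notes that the baseline (no-PCA) numbers were never recorded. A minimal sketch of that side-by-side comparison, using a small synthetic dataset so it runs quickly (sample counts and hyperparameters here are illustrative assumptions, not the assignment's actual configuration), could look like:

```python
import time
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Small synthetic dataset so the sketch runs in seconds (sizes are illustrative)
X, y = make_classification(n_samples=2000, n_features=100, n_informative=15,
                           random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

# Baseline: Random Forest on the full feature set
start = time.time()
rf = RandomForestClassifier(random_state=42).fit(X_tr, y_tr)
t_base = time.time() - start
acc_base = accuracy_score(y_te, rf.predict(X_te))

# PCA keeping 95% of the variance, then the same model on the reduced data
pca = PCA(n_components=0.95)
X_tr_red = pca.fit_transform(X_tr)
X_te_red = pca.transform(X_te)

start = time.time()
rf_pca = RandomForestClassifier(random_state=42).fit(X_tr_red, y_tr)
t_pca = time.time() - start
acc_pca = accuracy_score(y_te, rf_pca.predict(X_te_red))

print(f"Baseline: {t_base:.2f}s, accuracy {acc_base:.2%}")
print(f"With PCA ({pca.n_components_} components): {t_pca:.2f}s, accuracy {acc_pca:.2%}")
```

Recording both rows this way is what would let the report settle whether PCA helped or hurt training time and accuracy, rather than speculating about the missing baseline.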
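Since the analysis hinges on the n_components=0.95 setting, it can also help to confirm what the fitted PCA actually retained. A short self-contained check (the dataset here is a stand-in, not the assignment's data): scikit-learn interprets a float n_components as "keep the smallest number of components whose cumulative explained variance reaches that fraction", and exposes the result via n_components_ and explained_variance_ratio_.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA

# Stand-in dataset purely to demonstrate the attributes
X, y = make_classification(n_samples=1000, n_features=50, n_informative=10,
                           random_state=0)
pca = PCA(n_components=0.95).fit(X)

# Number of components needed to retain 95% of the variance
print("components kept:", pca.n_components_)

# Cumulative explained variance of the kept components (>= 0.95 by construction)
retained = np.cumsum(pca.explained_variance_ratio_)[-1]
print(f"variance retained: {retained:.4f}")
```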