[Figure 4-9. Features selected by SelectPercentile (black-and-white mask of the selected features, plotted over sample index)]

As you can see from the visualization of the mask, most of the selected features are the original features, and most of the noise features were removed. However, the recovery of the original features is not perfect. Let's compare the performance of logistic regression on all features against the performance using only the selected features:

In[41]:

    from sklearn.linear_model import LogisticRegression

    # transform the training and test data
    X_train_selected = select.transform(X_train)
    X_test_selected = select.transform(X_test)

    lr = LogisticRegression()
    lr.fit(X_train, y_train)
    print("Score with all features: {:.3f}".format(lr.score(X_test, y_test)))
    lr.fit(X_train_selected, y_train)
    print("Score with only selected features: {:.3f}".format(
        lr.score(X_test_selected, y_test)))

Out[41]:

    Score with all features: 0.930
    Score with only selected features: 0.940

In this case, removing the noise features improved performance, even though some of the original features were lost. This was a very simple synthetic example, and outcomes on real data are usually mixed. Univariate feature selection can still be very helpful, though, if there is such a large number of features that building a model on them is infeasible, or if you suspect that many features are completely uninformative.

Model-Based Feature Selection

Model-based feature selection uses a supervised machine learning model to judge the importance of each feature, and keeps only the most important ones. The supervised model that is used for feature selection doesn't need to be the same model that is used for the final supervised modeling. The feature selection model needs to provide some measure of importance for each feature, so that the features can be ranked by this measure. Decision trees and decision tree-based models provide a feature_importances_ attribute, which directly encodes the importance of each feature.

238 | Chapter 4: Representing Data and Engineering Features
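As a sketch of this idea, scikit-learn's SelectFromModel wraps any estimator that exposes a feature_importances_ (or coef_) attribute and keeps the features scoring above a threshold. The dataset and model choices below are illustrative assumptions, not taken from the text above:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Keep features whose random-forest importance is at or above the median
# importance, so roughly half of the features survive selection.
select = SelectFromModel(
    RandomForestClassifier(n_estimators=25, random_state=42),
    threshold="median")
select.fit(X_train, y_train)

X_train_selected = select.transform(X_train)
print("Original shape:", X_train.shape)
print("Selected shape:", X_train_selected.shape)
```

Note that the same fitted selector must also be applied to the test set (select.transform(X_test)) so that training and test data use identical feature subsets.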