report_lab01

School: University of Victoria
Course: 403
Subject: Industrial Engineering
Date: Jan 9, 2024

Report
1. Objective

In this experiment, I build a linear regression model to predict fuel consumption for automobiles from several features. To train the model, a dataset of 392 samples is used; each sample is described by 6 input features and 1 output measure (see Table 1):

Table 1 - Features in the D_mpg dataset

#           Name               Description
1           cylinders          multi-valued discrete
2           displacement       continuous
3           horsepower         continuous
4           weight             continuous
5           acceleration       continuous
6           model year         multi-valued discrete
7 (output)  fuel consumption   continuous

To evaluate the model, the dataset is split randomly into a training set (80%) and a test set (20%). After training the model, the mean square error (MSE) is computed on both sets.

2. Introduction

Linear regression is the simplest form of machine learning for finding the relationship between a continuous dependent variable and one or more independent variables. The independent variables are called features, and the dependent variable is called the output. Suppose we have X ∈ R^(d×N), representing N data samples with d features each, and let Y denote the output (the dependent variable). We define a linear relationship between these two variables as follows:

Y = wX + b

This equation states that the dependent variable is a linear combination of the independent variables. Here w and b are the parameters of the linear model: w contains the weight of each independent variable, and b is the bias. The formula can be written in matrix form:

Y = W X̂,  where  W = [w, b]  and  X̂ = [X; 1]

To find the optimal parameters, we solve the normal equations of the least-squares problem over the training set (subscript tr):

Wᵀ = (X̂_tr X̂_trᵀ)⁻¹ X̂_tr Y_trᵀ
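The closed-form solution above can be sanity-checked on toy data. The following is a small illustrative sketch in MATLAB (all names here are made up for the example and are not part of the report's code):

```matlab
% Toy data: d = 3 features, N = 50 samples
d = 3; N = 50;
X = rand(d, N);                      % features, d x N
w_true = [2; -1; 0.5]; b_true = 4;   % ground-truth parameters
Y = w_true'*X + b_true;              % 1 x N outputs

% Augment with a row of ones so the bias is absorbed into W
Xh = [X; ones(1, N)];                % (d+1) x N

% Normal equations: W' = (Xh*Xh')^(-1) * Xh * Y'
W = (Xh*Xh') \ (Xh*Y');             % (d+1) x 1; last entry recovers the bias
```

Using the backslash operator instead of an explicit matrix inverse is the standard, numerically stabler way to solve this linear system in MATLAB.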
3. Implementation and Results

3.1. Implementation

There are 7 steps in implementing linear regression:

1. Loading the data
2. Separating the dependent and independent variables
3. Appending a constant feature of 1 (for the bias)
4. Computing the best parameters
5. Computing predictions using the optimal parameters
6. Computing the MSE for the training and test data
7. Visualization

3.2. MATLAB code

The code for my implementation is as follows:

% Loading data
D_mpg = load("D_mpg.mat");
D_mpg = D_mpg.D_mpg;

% Extracting the output (row 7, transposed to a column)
y = D_mpg(7,:)';

% Number of samples
M = length(y);
% Number of training samples
P = 314;
% Number of test samples
T = M - P;

% Extract features
X = D_mpg(1:6,:);
% Add a row of ones (for the bias term)
Xh = [X; ones(1,M)];

% Splitting the data into training and test sets
Xh_tr = Xh(:,1:P);
y_tr = y(1:P);
Xh_te = Xh(:,P+1:M);
y_te = y(P+1:M);
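Note that the Objective describes a random 80/20 split, while the code above simply takes the first 314 samples as the training set. A randomized split could be sketched as follows (an illustrative variant using the variable names above, not the report's original code):

```matlab
rng(0);                    % fix the seed for reproducibility
idx = randperm(M);         % random permutation of sample indices
tr_idx = idx(1:P);         % 80% of the samples for training
te_idx = idx(P+1:M);       % remaining 20% for testing
Xh_tr = Xh(:, tr_idx);  y_tr = y(tr_idx);
Xh_te = Xh(:, te_idx);  y_te = y(te_idx);
```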
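The preview cuts off before step 4, so the computation of w_star used below is missing from the excerpt. Based on the normal equation in the Introduction, it was presumably along these lines (a reconstruction, not the original code):

```matlab
% Step 4: optimal parameters from the normal equations,
% w_star = (Xh_tr*Xh_tr')^(-1) * Xh_tr * y_tr
w_star = (Xh_tr*Xh_tr') \ (Xh_tr*y_tr);   % 7x1: six weights plus the bias
```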
% Computing predictions for the training and test data
y_pred_tr = w_star'*Xh_tr;
y_pred_te = w_star'*Xh_te;

% Computing the error
% (note: with the sqrt, these are strictly RMSE values,
% i.e. the square root of the MSE)
tr_error = y_tr - y_pred_tr';
te_error = y_te - y_pred_te';
mse_tr = sqrt((tr_error'*tr_error)/length(y_pred_tr));
mse_te = sqrt((te_error'*te_error)/length(y_pred_te));

% Display output
fprintf("==========\nMSE REPORT\n==========\n")
fprintf("MSE for Train Data : %f\n", mse_tr);
fprintf("MSE for Test Data  : %f\n", mse_te);

% Plot predictions against the true labels
plot(1:length(y_pred_te), y_pred_te, '-bo');
hold on
plot(1:length(y_pred_te), y_te, '-r*');
legend('Prediction', 'True Label');   % order matches the plot calls above
title('Prediction on D\_mpg Dataset');

% Error histograms for the training and test sets
figure;
[hr, bins] = hist(tr_error, 10);
bar(bins, hr, 'r')
hold on
[he, bins] = hist(te_error, 10);
bar(bins, he, 'g')
legend('Train', 'Test')

% Pearson correlation between true and predicted test values
C = cov(y_te, y_pred_te);
pearson_coeff = C(2)/(std(y_te)*std(y_pred_te));
figure;
plot(y_te, y_pred_te, 'o')
hold on
plot([min(y_te)-5, max(y_te)+5], [min(y_te)-5, max(y_te)+5], '--')
xlabel('Y true label')
ylabel('Y predicted')
title(sprintf('C = %f', pearson_coeff))

% Bar chart of the learned weights (excluding the bias)
figure;
bar(w_star(1:end-1))
title("Feature Importance")
ylabel("Weight")
% (continuation of the feature-importance plot)
xlabel("Feature")
xticklabels({"#cylinders", "displacement", "horsepower", "weight", "acceleration", "model year"})

3.3. Results

After training the model (finding the optimal parameters), predictions for the test data are computed. Figure 1 shows the target values alongside the predicted values; the accuracy of the model appears acceptable.

Figure 1 - The target and predicted values

I also computed the MSE for the training and test data, shown in Table 2:

Table 2 - MSE on the training and test sets

MSE for Train Set: 3.566768
MSE for Test Set:  2.666201

In this case, the error on the training data is higher than the error on the test data. The error distributions for the training and test data are presented in Figure 2. As can be seen, for around 50% of the samples the absolute error is less than 3, and the maximum error is about 10. Furthermore, Figure 3 shows the Pearson coefficient and the regression line between target values and predicted values. In this figure, the closer the data points lie to the line x = y, the higher the accuracy of the model.
Figure 2 - Distribution of errors for the training and test data

Figure 3 - Correlation line between target and predicted values
4. Discussion

The results show that the model is trained well and can predict the dependent variable from the features with reasonable accuracy. In general, a trained model performs better on its training data; in this case, however, the model has a lower MSE on the test data, probably because the test samples happen to be easier to predict. After finding the model parameters, the weight of each feature can be read as its importance in predicting the output. Figure 4 shows the weights of the optimal model. The most important feature is model year, the second most important is the number of cylinders, and after that acceleration is the largest contributor to the prediction.

Figure 4 - Feature Importance

5. Conclusion

In conclusion, the linear model can capture the relationship between fuel consumption and the other six features, which implies that there is an approximately linear relationship between them. After training the model, the parameter
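One caveat to reading raw regression weights as feature importance (as in Figure 4): the weights are only directly comparable when the features share a common scale, since a feature measured in large units (e.g. weight in kilograms) will receive a much smaller coefficient than one measured in small units (e.g. number of cylinders). A hedged sketch of standardizing the features before the fit, using illustrative variable names alongside those from the report's code:

```matlab
% z-score each feature row so all weights are on a common scale
mu = mean(X, 2);
sigma = std(X, 0, 2);
Xz = (X - mu) ./ sigma;        % requires implicit expansion (MATLAB R2016b+)
Xh_z = [Xz; ones(1, M)];

% Refit; now abs(w_z(1:end-1)) is directly comparable across features
w_z = (Xh_z*Xh_z') \ (Xh_z*y);
```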