report_lab01
School: University of Victoria
Course: 403
Subject: Industrial Engineering
Date: Jan 9, 2024
Uploaded by CorporalMousePerson1028
1. Objective
In this experiment, I build a linear regression model to predict the fuel consumption of automobiles from several features. To train the model, a dataset of 392 samples is used; each sample is described by 6 input features and 1 output measure (see Table 1):
Table 1 - Features in the D_mpg dataset

#            Name              Type
1            cylinders         multi-valued discrete
2            displacement      continuous
3            horsepower        continuous
4            weight            continuous
5            acceleration      continuous
6            model year        multi-valued discrete
7 (output)   fuel consumption  continuous
To evaluate the model, the dataset is split randomly into a training set (80%) and a test set (20%). After training the model, the mean squared error (MSE) is computed on both sets.
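Concretely, writing $\hat{y}_i$ for the model's prediction on sample $i$, the mean squared error over $N$ samples is:

```latex
\mathrm{MSE} \;=\; \frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2
```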
2. Introduction
Linear regression is one of the simplest machine learning methods for finding the relationship between a continuous dependent variable and one or more independent variables. The independent variables are called features, and the dependent variable is called the output.
Suppose that we have $X \in \mathbb{R}^{d \times N}$, which represents $N$ data samples, each with $d$ features. Also, let $Y$ denote the output, which is the dependent variable. We define a linear relationship between these two variables as follows:

$$Y = wX + b$$

This equation implies that the dependent variable is a linear combination of the independent variables. In this formula, $w$ and $b$ are the parameters of the linear model: $w$ defines the weight of each independent variable, and $b$ is the bias. The formula can be written in matrix form:

$$Y = \hat{w}\hat{X}$$

in which:

$$\hat{w} = [w, b], \qquad \hat{X} = [X; \vec{1}]$$
To find the optimal parameters, we can solve a linear equation system (the normal equations):

$$w^{*} = \left(\hat{X}\hat{X}^{T}\right)^{-1} \hat{X}\, Y^{T}$$
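This closed form is obtained by minimizing the mean squared error with respect to the parameters; a brief sketch of the derivation:

```latex
E(\hat{w}) = \frac{1}{N}\,\lVert \hat{w}\hat{X} - Y \rVert^{2},
\qquad
\frac{\partial E}{\partial \hat{w}}
  = \frac{2}{N}\left(\hat{w}\hat{X} - Y\right)\hat{X}^{T} = 0
\;\Rightarrow\;
\hat{w}^{*} = Y\hat{X}^{T}\left(\hat{X}\hat{X}^{T}\right)^{-1}
```

Transposing the row-vector solution gives the column-vector form used in the implementation.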
3. Implementation and Results
3.1. Implementation
There are seven steps in implementing linear regression:
1. Loading data
2. Separating dependent and independent variables
3. Adding a constant feature of 1 as the last feature (for the bias)
4. Computing the best parameters
5. Computing predictions using the optimal parameters
6. Computing the MSE for train and test data
7. Visualization
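For readers without MATLAB, the seven steps above can be sketched in NumPy; the synthetic data, the true coefficients, and the noise level below are illustrative assumptions, not the actual D_mpg values (only the array layout and the 314/78 split match the report):

```python
import numpy as np

rng = np.random.default_rng(0)

# 1. "Load" synthetic data laid out like the report: 6 features x 392 samples
d, M = 6, 392
X = rng.normal(size=(d, M))

# 2. Build a dependent variable from assumed coefficients, bias 4.0, small noise
true_w = np.array([1.0, -2.0, 0.5, 3.0, 0.0, 1.5])
y = true_w @ X + 4.0 + 0.1 * rng.normal(size=M)

# 3. Append a row of ones so the bias is learned as the last weight
Xh = np.vstack([X, np.ones((1, M))])

# Split: first 314 samples for training, the rest for testing
P = 314
Xh_tr, Xh_te = Xh[:, :P], Xh[:, P:]
y_tr, y_te = y[:P], y[P:]

# 4. Best parameters via the normal equations: solve (Xh Xh^T) w = Xh y
w_star = np.linalg.solve(Xh_tr @ Xh_tr.T, Xh_tr @ y_tr)

# 5. Predictions for train and test data
y_pred_tr = w_star @ Xh_tr
y_pred_te = w_star @ Xh_te

# 6. MSE for train and test data
mse_tr = np.mean((y_tr - y_pred_tr) ** 2)
mse_te = np.mean((y_te - y_pred_te) ** 2)
print(f"train MSE: {mse_tr:.4f}, test MSE: {mse_te:.4f}")
```

Solving the linear system with `np.linalg.solve` is preferred over forming the explicit inverse, as in the MATLAB backslash operator.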
3.2. MATLAB code
The code for my implementation is as below:
% Loading data
D_mpg = load("D_mpg.mat");
D_mpg = D_mpg.D_mpg;
% Extracting the output column
y = D_mpg(7,:)';
% Number of samples
M = length(y);
% Number of training samples
P = 314;
% Number of test samples
T = M - P;
% Extract features
X = D_mpg(1:6,:);
% Add a row of ones (for the bias)
Xh = [X; ones(1,M)];
% Splitting data into train and test sets
Xh_tr = Xh(:,1:P);
y_tr = y(1:P);
Xh_te = Xh(:,P+1:M);
y_te = y(P+1:M);
% Computing the best parameters (normal equations)
w_star = (Xh_tr*Xh_tr') \ (Xh_tr*y_tr);
% Computing the prediction for train and test data
y_pred_tr = w_star'*Xh_tr;
y_pred_te = w_star'*Xh_te;
% Computing the MSE for train and test data
% (note: the sqrt makes these root mean squared errors)
tr_error = y_tr - y_pred_tr';
te_error = y_te - y_pred_te';
mse_tr = sqrt((tr_error'*tr_error)/length(y_pred_tr));
mse_te = sqrt((te_error'*te_error)/length(y_pred_te));
% Display output
fprintf("==========\nMSE REPORT\n==========\n")
fprintf("MSE for Train Data : %f\n", mse_tr);
fprintf("MSE for Test Data : %f\n", mse_te);
% Plot predictions against targets
plot(1:length(y_pred_te), y_pred_te, '-bo'); hold on
plot(1:length(y_pred_te), y_te, '-r*');
legend('Prediction', 'True Label');
title('Prediction on D\_mpg Dataset');
% Error histograms for train and test data
figure;
[hr,bins] = hist(tr_error,10);
bar(bins, hr, 'r')
hold on
[he,bins] = hist(te_error,10);
bar(bins, he, 'g')
legend('Train', 'Test')
% Pearson correlation between targets and predictions
C = cov(y_te, y_pred_te);
pearson_coeff = C(2)/(std(y_te)*std(y_pred_te));
figure;
plot(y_te, y_pred_te, 'o')
hold on
plot([min(y_te)-5, max(y_te)+5], [min(y_te)-5, max(y_te)+5], '--')
xlabel('Y true label')
ylabel('Y predicted')
title(sprintf('C = %f', pearson_coeff))
% Feature importance (learned weights, excluding the bias)
figure;
bar(w_star(1:end-1))
title("Feature Importance")
ylabel("Weight")
3.3. Results
After training the model (finding the optimal parameters), predictions for the test data are computed. Figure 1 shows the targets as well as the predicted values; the accuracy of the model appears acceptable.
Figure 1- The target and predicted values
Also, I computed the MSE for the train and test data, shown in Table 2.

Table 2 - MSE for the train and test sets

MSE for Train Set    MSE for Test Set
3.566768             2.666201
In this case, the error for the training data is higher than the error for the test data. The error distributions for the train and test data are presented in Figure 2. As can be seen, for around 50% of the samples the absolute error is less than 3, and the maximum error is about 10. Furthermore, Figure 3 shows the Pearson coefficient and the regression line between target and predicted values. In this figure, the closer the data points lie to the line x = y, the higher the accuracy of the model.
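The Pearson coefficient reported in the figure title is, for targets $y$ and predictions $\hat{y}$:

```latex
C \;=\; \frac{\operatorname{cov}(y, \hat{y})}{\sigma_{y}\,\sigma_{\hat{y}}}
```

This matches the computation in the code, where the off-diagonal entry of the covariance matrix is divided by the product of the standard deviations.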
xlabel("Feature")
xticklabels({"#cylinders","displacement","horsepower","weight","acceleration","model year"})
Figure 2- Distribution of error for train and test data
Figure 3- Correlation line between target and predicted values
4. Discussion
The results show that the trained model can predict the dependent variable from the features reasonably accurately. In general, a trained model performs better on the training data; in this case, however, the model has a lower MSE on the test data, which is probably due to the simplicity of the test samples.
After finding the model parameters, the weight of each feature can be read as its importance in predicting the output. Figure 4 shows the weights of the optimal model. We can see that the most important feature is the model year, the second most important is the number of cylinders, followed by acceleration.
Figure 4- Feature Importance
5. Conclusion
In conclusion, the linear model can capture the relationship between fuel consumption and the other six features, which implies that the relationship between them is approximately linear.
After training the model, the parameter