hw2
pdf
keyboard_arrow_up
School
University of California, Riverside *
*We aren’t endorsed by this school
Course
240
Subject
Electrical Engineering
Date
Apr 3, 2024
Type
Pages
5
Uploaded by LieutenantCapybara2644
EE 240: Pattern Recognition and Machine Learning
Homework 2
Due date:
May 6, 2023
Description:
These questions will explore multiple topics in logistic regression, perceptron learning, and
support vector machines.
Reading assignment:
ESL: Ch. 2, 3, 4, & 12, AML: Ch. 3 & 8
Homework and lab assignment submission policy:
All homework and lab assignments must be submitted online via
https://eLearn.ucr.edu
.
Submit your homeworks as a single Python notebook that is free of any typos or errors. Talk to your
TA to make sure that the Python version match.
Homework solutions should be written and submitted individually, but discussions among students are
encouraged.
All assignments should be submitted by the due date. You will incur 25% penalty for every late day.
H2.1
Logistic regression: Suppose we have labeled data
(
x
i
, y
i
)
, where
x
i
∈
R
d
,
y
∈ {
0
,
1
}
. We want to
compute
w
that minimizes the following negative log-likelihood function:
minimize
w
N
i
=1
y
i
log(1 + exp(
-
w
T
x
i
)) + (1
-
y
i
) log(1 + exp(
w
T
x
i
))
.
(1)
Let us denote the objective of this function as
J
(
w
)
. (Note that we could also define
y
i
∈ {-
1
,
1
}
and rewrite the objective in (1) as
J
(
w
) =
∑
N
i
=1
log(1 + exp(
-
y
i
w
T
x
i
)
.)
(a) Show the the following log-logistic function is convex:
(
10 pts
)
f
(
w
) = log(1 + exp(
-
w
T
x
))
.
You may assume that
x
is a scalar and show that the second derivative of
f
(
w
)
w.r.t.
w
is
always positive.
(b)
Next you will write a Python script for logistic regression that solves
(1)
using gradient descent
algorithm with a fixed step size
α
. You will then use your code to learn to classify images of
digits from the MNIST dataset.
(
40 pts
)
You need to implement the logistic regression routine using basic Python functions.
Do
not use any built-in function for logistic regression.
You can show that the expression for
∇
w
J
=
-
∑
N
i
=1
(
y
i
-
h
w
(
x
i
))
x
i
, where
h
w
(
x
) =
1
1 +
e
-
w
T
x
.
MNIST data set contains 70,000 handwritten digits, each of size
28
×
28
pixels. You can learn
more about this dataset at
http://yann.lecun.com/exdb/mnist/
. The first 60,000 samples
are training data and the last 10,000 are test data. You can load the data and divide into
training and test sets using code below.
from sklearn.datasets import fetch_openml
# load data
# mnist = fetch_openml(’mnist_784’)
X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
1
# note that images are stored as rows in the data array (70000,784)
print(X.shape)
print(y.shape)
# split data into training and testing set
train_x = X[0:60000,:]
train_y = y[0:60000]
test_x = X[60000:70000,:]
test_y = y[60000:70000]
Then choose the data corresponding to digits "0" and "1".
# Choose only two digits
class_0 = ’3’
class_1 = ’1’
X0 = train_x[train_y==class_0,:]
X1 = train_x[train_y==class_1,:]
y0 = np.zeros(X0.shape[0],int)
y1 = np.ones(X1.shape[0],int)
train_x = np.concatenate((X0,X1),axis=0)
train_y = np.concatenate((y0,y1),axis=0)
print(’Number of training samples is {0} with {1} labels 0 and {2} labels 1’
.format(train_x.shape[0],y0.shape[0],y1.shape[0]))
X0 = test_x[test_y==class_0,:]
X1 = test_x[test_y==class_1,:]
y0 = np.zeros(X0.shape[0],int)
y1 = np.ones(X1.shape[0],int)
test_x = np.concatenate((X0,X1),axis=0)
test_y = np.concatenate((y0,y1),axis=0)
print(’Number of test samples is {0} with {1} labels 0 and {2} labels 1’
.format(test_x.shape[0],y0.shape[0],y1.shape[0]))
After that you need to append
1
before the feature vectors.
# append 1 for the constant term b (remember our model is w^T x + b)
train_x = np.insert(train_x,0,1,axis = 1)
test_x = np.insert(test_x,0,1,axis = 1)
Then write your codes for logistic regression.
# Define a sigmoid function
def sigmoid(x):
return 1/(1+np.exp(-x))
sigmoid_vec = np.vectorize(sigmoid)
# Define the gradient function
Your code goes here ...
2
# Write your code for logistic regression
# with gradient descent with a fixed step size
Your code goes here ...
# w is the output of logistic regression (includes weights and constant term)
# length of w for MNIST case will be 785
predict = np.round(sigmoid_vec(np.matmul(test_x.astype(float),w)))
acc =100.0*np.sum(test_y == predict)/test_y.shape[0]
print(acc)
H2.2
Perceptron learning algorithm (PLA) and kernel extension: In this problem you will implement PLA
to build a simple binary classification system. Suppose we have labeled data
(
x
i
, y
i
)
, where
x
i
∈
R
d
+1
with
x
i
(1) = 1
and
y
∈ {-
1
,
+1
}
. Let us define
h
w
(
x
i
) =
sign
w
T
x
i
. We want to compute
w
using
PLA.
(
50 pts
)
(a)
Data Processing: Generate two examples of 2D
linearly separable
dataset with
N
= 100
samples
each. (To do this, you will first generate a weight vector and constant term,
w
, and then assign
±
1
labels to your data samples as
y
i
=
h
w
(
x
i
)
.) Let us call the two datasets “Data1” and
“Data2”. For Data1, randomly select
80%
of the samples for training and the remaining
20%
for
testing on Data1 (80/20). For Data2, randomly select
30%
of the samples for training and the
remaining
70%
for testing (30/70).
(b)
Implementation: Write a script for PLA in Python by initializing
w
= 0
and using the following
update rule:
for i=1,...,N
w
=
w
+
1
2
(
y
i
-
h
w
(
x
i
))
x
i
.
(c)
Training: Use your PLA implementation to train two linear classifiers for Data1 and Data2
using the respective training samples. Plot the training samples, test samples, and separation
lines for both examples in two separate plots.
(d)
Testing: Using the test samples in each example, compute precision, recall, and f1-score for
your classifiers.
Precision is defined as
tp
tp
+
fp
;
tp
and
fp
denote the number of
true
and
false
positives.
Recall is defined as
tp
tp
+
fn
;
fn
denotes the number of
false
negatives.
f1-score is defined as
2
precision
×
recall
precision
+
recall
; it can be viewed as a measure of accuracy and ranges
between 0 (worst) and 1 (best).
(e)
Kernel perceptron: Note that we can define the weight vector as
w
=
∑
N
i
=1
α
i
y
i
x
i
. We can view
this as the dual perceptron in which we initialize
α
i
= 0
for
i
= 1
, . . . , N
and update them as
follows.
for j=1,...,N
ˆ
y
= sign
N
i
=1
α
i
y
i
x
T
i
x
j
if
ˆ
y
=
y
j
,
update
α
j
=
α
j
+ 1
Implement the kernel perceptron and demonstrate that the classifier you learn is identical to
the one in step (b).
3
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
H2.3
Use a second degree polynomial kernel to classify four samples in 2D space, distributed in an XOR
pattern (i.e., points on opposite diagonals of a rectangle have same labels, as shown in the table and
figure below.)
(
40 pts
)
y
x
(1)
x
(2)
1
-1
-1
-1
-1
1
-1
1
-1
1
1
1
Table 1:
Table for class labels and sample
values in an XOR pattern
Figure 1:
Blue diamonds and red circles
denote samples for two classes
(a)
Map 2D feature vectors
x
=
x
(1)
x
(2)
T
using kernel function
K
(
x
1
,
x
2
) = (1+
x
T
1
x
2
)
2
, which
results in 6D feature vectors:
φ
(
x
) =
1
√
2
x
(1)
√
2
x
(2)
x
(1)
2
x
(2)
2
√
2
x
(1)
x
(2)
T
1
.
Show that the features vectors are linearly separable in the 5D space.
(b) The dual objective for SVM can be written as
L
(
α
) =
-
1
2
α
T
Gα
+
4
i
=1
α
i
,
where
G
is a
4
×
4
matrix with
G
(
i, j
) =
y
i
y
j
φ
(
x
i
)
T
φ
(
x
j
)
.
Compute
G
matrix and
α
that
maximize
L
(
α
)
.
Hint: You will get a simple linear equation that the optimal solution must satisfy. Compute the
G
matrix and solve the linear problem to get your answer.
(c)
Show that
α
computed in the previous step satisfies the following constrains:
α
i
≥
0
and
∑
4
i
=1
α
i
y
i
= 0
.
(d)
Compute the SVM weight vector
w
=
∑
i
α
i
y
i
φ
(
x
i
)
and an expression for
h
w
(
x
) =
w
T
φ
(
x
) =
∑
i
α
i
y
i
φ
(
x
i
)
T
φ
(
x
)
. Plot
w
and discuss which features in the 6D space have highest effect on
classification.
H2.4
Support vector machines (SVM):
(
60 pts
)
In this problem you will use SVMs to build a simple text classification system. This data set contains
70,000 handwritten digits, each of size
28
×
28
pixels
2
. To begin, you need to get the MNIST dataset.
from sklearn.datasets import fetch_openml
# mnist = fetch_openml(’mnist_784’)
X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
Each row of
X
corresponds to one image. You can use the following to plot the
j
th
image:
import matplotlib.pyplot as plt
plt.title(’The jth image is a {label}’.format(label=int(y[j])))
plt.imshow(X[j].reshape((28,28)), cmap=’gray’)
plt.show()
1
You
can
also
use
kernel
function
K
(
x
1
,
x
2
)
=
(
x
T
1
x
2
)
2
,
which
results
in
3D
feature
vectors:
φ
(
x
)
=
x
(1)
2
x
(2)
2
√
2
x
(1)
x
(2)
T
.
2
You can learn more about this dataset at http://yann.lecun.com/exdb/mnist/.
4
In this problem you will build a classifier to classify between the (sometimes very similar) images of
the digits “4” and “9”. To get just this data, you can use
X4 = X[y==’4’,:]
X9 = X[y==’9’,:]
There are a little under 7,000 examples from each class. You should begin by using this data to form
three distinct datasets: the first 4,000 points from each class will be used for designing the classifier
(this is the
training
set), and the remainder will be used to evaluate the performance of our classifier
(this is the
testing
set).
Since SVMs involve tuning a parameter
C
, you will also need to set this parameter. We will do this
in a principled way using the so-called “holdout method”. This means that you take the training set
and divide it into two parts: you use the first to fit the classifier, and the second—which we call the
holdout
set—to gauge performance for a given value of
C
. You will want to decide on a finite set of
“grid points” on which to test
C
(I would suggest a logarithmic grid of values to test both
C
1
and
C
1
). For each value of
C
, you will train an SVM on the first part of the set, and then compute
the error rate on the holdout set. In the end, you can then choose the value of
C
that results in a
classifier that gives the smallest error on the holdout set
Note:
Once you have selected
C
, you may then retrain on all the entire training set (including the
holdout set), but you should never use the testing set until every parameter in your algorithm is set.
These are used exclusively for computing the final test error.
To train the SVM , you can use the built in solver from scikit-learn. As an example, to train a linear
SVM on the full dataset with a value of
C
= 1
, we would use
from sklearn import svm
clf= svm.SVC(C=1.0,kernel=’linear’)
clf.fit(X,y)
Once you have trained the classifier, you can calculate the probability of error for the resulting
classifier on the full dataset via
Pe = 1 - clf.score(X,y)
(a)
Train an SVM using the kernels
k
(
u, v
) = (
u
T
v
+ 1)
p
,
p
= 1
,
2
, that is, the inhomogeneous
linear and quadratic kernel. To do this, you will want to set the parameters
kernel=’poly’
and
degree=1
or
degree=2
.
For each kernel, report the best value of
C
, the test error, and the number of data points that
are support vectors (returned via
clf.support_vectors_
). Turn in your code.
(b)
Repeat the above using the radial basis function kernel
k
(
u, v
) =
e
-
γ u
-
v
2
(by setting the
parameters
kernel=’rbf’, gamma=gamma
. You will now need to determine the best value for
both
C
and
γ
. Report the best value of
C
and
γ
, the test error, and the number of support
vectors.
(c)
For each kernel, turn in a
4
×
4
subplot showing images of the 16 support vectors that violate
the margin by the greatest amount (these are, in a sense, the “hardest” examples to classify),
and explain how these are determined. Above each subplot, indicate the true label (4 or 9).
To help get started with this, you can use the following code:
f, axarr = plt.subplots(4, 4)
axarr[0, 0].imshow(X[j].reshape((28,28)), cmap=’gray’)
axarr[0, 0].set_title(’{label}’.format(label=int(y[j])))
...
plt.show()
Maximum points: 200
5
Related Documents
Related Questions
Question 1 Digital Electronics and Combinational Logic
la) Analog and Digital Electronics
i. Write either "digital" or "analog" in this to indicate whether the property in that row is
typical of digital electronics or analog electronics. The first row has been completed as an
example.
Property
Difficult, manual circuit design
Continuous valued signals
Tolerant of electrical noise
Circuit state tends to leak
Intolerant of component variations
Digital/Analog
Analog
ii. In older cars the timing of the electrical pulses to the spark plugs was controlled by a
mechanical distributor. This contained a rotating contact that was mechanically linked to the
rotation of the engine. Newer cars use electronic ignition. Electrical sensors detect the position
and speed of the motor and a digital controller sends ignition pulses to the spark plugs.
Briefly describe 2 likely benefits of the digital electronic ignition system over the mechanical
one. An example is given first.
More flexible control: the…
arrow_forward
Please, I do not want a theoretical solution or using artificial intelligence. I want a solution on paper using the mathematical laws of the topic
arrow_forward
I need the solution to chapter 10, #19
this is from Matlab a practical introduction to programming and problem solving
arrow_forward
Q1: A sequential circuit has one inputs X and one output (Z) is used to detect the sequence 1011. The output Z=1 when
the circuit detect the sequence 1011, otherwise Z=0. Sketch the stated diagrams of the following specifications:
1. Mealy model and the overlap is allowed.
2. Mealy model and the overlap is not allowed.
I need to solve the question step by step and in clear handwriting.
arrow_forward
I can't tell from your solution how the signal was graphed. How do you get from the mathematical expressions to the graph?
This follow up question was rejected and no explanation was given as to why. Do you need more information?
arrow_forward
HW Matlab 1) Create a variable fiemp to store a temperature in degrees Fahrenheit (F). Write m-file to convert this to degrees Celsius and store the result in a variable ctemp. The conversion factor is C = (F —32) * 5/9. 2) Write m-file to generate a matrix of random integers of size 100 by 100 their values between 15 to 80. 3) Free fall of objects is given by y =5mgt? where a is the acceleration, v is the velocity, y is the distance, m is the mass of the object, g is the gravitational acceleration. Plot the distance and velocity of the object for 15 seconds after its fall from rest (y = 0). Take m = 0.2 kg.
arrow_forward
You want to design a grading machine. The input is 3-bit score (0-7). The output is a 7-seg controller that shows the grade. You want to display
A. when the score is 7
B. when the score is 5,6
C. when the score is 3,4,
D. when the score is 1,2
F. when the score is 0
arrow_forward
3. The stability of the plant depands on its:
a. Gain matrix.
b. Sate matrix.
c. Input matrix.
d. Control effort.
e. All above.
4. In control systems, there are
problems.
a. two
b. three
c. four
d. five
types of optimal control
arrow_forward
please solve number 1
arrow_forward
Please help solve with explanation
arrow_forward
SEE MORE QUESTIONS
Recommended textbooks for you

EBK ELECTRICAL WIRING RESIDENTIAL
Electrical Engineering
ISBN:9781337516549
Author:Simmons
Publisher:CENGAGE LEARNING - CONSIGNMENT

Power System Analysis and Design (MindTap Course ...
Electrical Engineering
ISBN:9781305632134
Author:J. Duncan Glover, Thomas Overbye, Mulukutla S. Sarma
Publisher:Cengage Learning
Related Questions
- Question 1 Digital Electronics and Combinational Logic la) Analog and Digital Electronics i. Write either "digital" or "analog" in this to indicate whether the property in that row is typical of digital electronics or analog electronics. The first row has been completed as an example. Property Difficult, manual circuit design Continuous valued signals Tolerant of electrical noise Circuit state tends to leak Intolerant of component variations Digital/Analog Analog ii. In older cars the timing of the electrical pulses to the spark plugs was controlled by a mechanical distributor. This contained a rotating contact that was mechanically linked to the rotation of the engine. Newer cars use electronic ignition. Electrical sensors detect the position and speed of the motor and a digital controller sends ignition pulses to the spark plugs. Briefly describe 2 likely benefits of the digital electronic ignition system over the mechanical one. An example is given first. More flexible control: the…arrow_forwardPlease, I do not want a theoretical solution or using artificial intelligence. I want a solution on paper using the mathematical laws of the topicarrow_forwardI need the solution to chapter 10, #19 this is from Matlab a practical introduction to programming and problem solvingarrow_forward
- Q1: A sequential circuit has one inputs X and one output (Z) is used to detect the sequence 1011. The output Z=1 when the circuit detect the sequence 1011, otherwise Z=0. Sketch the stated diagrams of the following specifications: 1. Mealy model and the overlap is allowed. 2. Mealy model and the overlap is not allowed. I need to solve the question step by step and in clear handwriting.arrow_forwardI can't tell from your solution how the signal was graphed. How do you get from the mathematical expressions to the graph? This follow up question was rejected and no explanation was given as to why. Do you need more information?arrow_forwardHW Matlab 1) Create a variable fiemp to store a temperature in degrees Fahrenheit (F). Write m-file to convert this to degrees Celsius and store the result in a variable ctemp. The conversion factor is C = (F —32) * 5/9. 2) Write m-file to generate a matrix of random integers of size 100 by 100 their values between 15 to 80. 3) Free fall of objects is given by y =5mgt? where a is the acceleration, v is the velocity, y is the distance, m is the mass of the object, g is the gravitational acceleration. Plot the distance and velocity of the object for 15 seconds after its fall from rest (y = 0). Take m = 0.2 kg.arrow_forward
- You want to design a grading machine. The input is 3-bit score (0-7). The output is a 7-seg controller that shows the grade. You want to display A. when the score is 7 B. when the score is 5,6 C. when the score is 3,4, D. when the score is 1,2 F. when the score is 0arrow_forward3. The stability of the plant depands on its: a. Gain matrix. b. Sate matrix. c. Input matrix. d. Control effort. e. All above. 4. In control systems, there are problems. a. two b. three c. four d. five types of optimal controlarrow_forwardplease solve number 1arrow_forward
arrow_back_ios
arrow_forward_ios
Recommended textbooks for you
- EBK ELECTRICAL WIRING RESIDENTIALElectrical EngineeringISBN:9781337516549Author:SimmonsPublisher:CENGAGE LEARNING - CONSIGNMENTPower System Analysis and Design (MindTap Course ...Electrical EngineeringISBN:9781305632134Author:J. Duncan Glover, Thomas Overbye, Mulukutla S. SarmaPublisher:Cengage Learning

EBK ELECTRICAL WIRING RESIDENTIAL
Electrical Engineering
ISBN:9781337516549
Author:Simmons
Publisher:CENGAGE LEARNING - CONSIGNMENT

Power System Analysis and Design (MindTap Course ...
Electrical Engineering
ISBN:9781305632134
Author:J. Duncan Glover, Thomas Overbye, Mulukutla S. Sarma
Publisher:Cengage Learning