Fall 2023 CS4641/CS7641 A Homework 3
Instructor: Dr. Mahdi Roozbahani
Deadline: Friday, November 10th, 11:59 pm EST
No unapproved extension of the deadline is allowed. Submission past our 48-hour
penalized acceptance period will lead to 0 credit.
Discussion is encouraged on Ed as part of the Q/A. However, all assignments should be
done individually.
Plagiarism is a serious offense. You are responsible for completing your own work. You are not allowed to copy and paste, or paraphrase, or submit materials created or published by others, as if you created the materials. All materials submitted must be your own.
All incidents of suspected dishonesty, plagiarism, or violations of the Georgia Tech Honor Code will be subject to the institute's Academic Integrity procedures. If we observe any (even small) similarities/plagiarisms detected by Gradescope or our TAs, WE WILL DIRECTLY REPORT ALL CASES TO OSI, which may, unfortunately, lead to a very harsh outcome. Consequences can be severe, e.g., academic probation or dismissal, grade penalties, a 0 grade for assignments concerned, and prohibition from withdrawing from the class.
Instructions for the assignment
This assignment consists of both programming and theory questions.
Unless a theory question explicitly states that no work is required to be shown, you
must provide an explanation, justification, or calculation for your answer.
To switch a cell between code and markdown, see the menu -> Cell -> Cell Type.
You can directly type LaTeX equations into markdown cells.
If a question requires a picture, you can use the syntax <img src="" style="width: 300px;"/> to include it within your IPython notebook.
Your write-up must be submitted in PDF form. You may use either LaTeX, markdown, or any word processing software. We will **NOT** accept handwritten work. Make sure that your work is formatted correctly, for example submit $\sum_{i=0} x_i$ instead of \text{sum_{i=0} x_i}.
When submitting the non-programming part of your assignment, you must correctly map pages of your PDF to each question/subquestion to reflect where they appear. **Improperly mapped questions may not be graded correctly and/or will result in point deductions for the error.**
All assignments should be done individually, and each student must write up and
submit their own answers.
Graduate Students
: You are required to complete any sections marked as Bonus for
Undergrads
Using the autograder
Grads will find three assignments on Gradescope that correspond to HW3: "Assignment
3 Programming", "Assignment 3 - Non-programming" and "Assignment 3 Programming
- Bonus for all". Undergrads will find an additional assignment called "Assignment 3
Programming - Bonus for Undergrads".
You will submit your code for the autograder in the Assignment 3 Programming
sections. Please refer to the Deliverables and Point Distribution section for what parts
are considered required, bonus for undergrads, and bonus for all.
We have provided you with different .py files and added library imports in those files. Please DO NOT remove those lines; add your code after them. Note that these are the only allowed libraries that you can use for the homework.
You are allowed to make as many submissions as you like until the deadline.
Additionally, note that the autograder tests each function separately, therefore it can serve as a useful tool to help you debug your code if you are not sure what part of your implementation might have an issue.
For the "Assignment 3 - Non-programming" part, you will need to submit to Gradescope a PDF copy of your Jupyter Notebook with the cells run. See this EdStem Post for multiple ways on how to convert your .ipynb into a .pdf file.
Please
refer to the Deliverables and Point Distribution
section for an outline of the non-
programming questions.
When submitting to Gradescope, please make sure to mark the page(s)
corresponding to each problem/sub-problem. The pages in the PDF should be of
size 8.5" x 11", otherwise there may be a deduction in points for extra long sheets.
Using the local tests
For some of the programming questions we have included a local test using a small toy
dataset to aid in debugging. The local test sample data and outputs are stored in .py
files in the local_tests_folder
. The actual local tests are stored in localtests.py
.
There are no points associated with passing or failing the local tests; you must still pass the autograder to get points.
It is possible to fail the local test and pass the autograder, since the autograder has a certain allowed error tolerance while the local test's allowed error may be smaller. Likewise, passing the local tests does not guarantee passing the autograder.
You do not need to pass both the local and autograder tests to get points; passing the Gradescope autograder is sufficient for credit.
It might be helpful to comment out the tests for functions that have not been completed yet.
It is recommended to test each function as it is completed instead of completing the whole class and then testing. This may help in isolating errors. Do not rely solely on the local tests; continue to test on the autograder regularly as well.
Deliverables and Points Distribution
Q1: Image Compression [30pts]
Deliverables: imgcompression.py and printed results
1.1 Image Compression
[20 pts] - programming
svd [4pts]
compress [4pts]
rebuild_svd [4pts]
compression_ratio [4pts]
recovered_variance_proportion [4pts]
1.2 Black and White
[5 pts] non-programming
1.3 Color Image
[5 pts] non-programming
Q2: Understanding PCA [20pts]
Deliverables: pca.py and written portion
2.1 PCA Implementation
[10 pts] - programming
fit [5pts]
transform [2pts]
transform_rv [3pts]
2.2 Visualize
[5 pts] programming and non-programming
2.3 PCA Reduced Facemask Dataset Analysis
[5 pts] non-programming
2.4 PCA Exploration
[0 pts]
Q3: Regression and Regularization [80pts: 50pts + 20pts
Bonus for Undergrads + 12pts Bonus for All]
Deliverables: regression.py and Written portion
3.1 Regression and Regularization Implementations
[50pts: 30pts + 20pts Bonus for
Undergrad] - programming
RMSE [5pts]
Construct Poly Features 1D [2pts]
Construct Poly Features 2D [3pts]
Prediction [5pts]
Linear Fit Closed Form [5pts]
Ridge Fit Closed Form [5pts]
Ridge Cross Validation [5pts]
Linear Gradient Descent [5pts] Bonus for Undergrad
Linear Stochastic Gradient Descent [5pts] Bonus for Undergrad
Ridge Gradient Descent [5pts] Bonus for Undergrad
Ridge Stochastic Gradient Descent [5pts] Bonus for Undergrad
3.2 About RMSE [3 pts] non-programming
3.3 Testing: General Functions and Linear Regression [5 pts] non-programming
3.4 Testing: Ridge Regression [5 pts + 2 pts Bonus for All] non-programming
3.5 Cross Validation [7 pts] non-programming
3.6 Noisy Input Samples in Linear Regression [10 pts] non-programming BONUS FOR ALL
Q4: Naive Bayes and Logistic Regression [35pts]
Deliverables: logistic_regression.py and Written portion
4.1 Llama Breed Problem using Naive Bayes
[5 pts] non-programming
4.2 News Data Sentiment Classification Using Logistic Regression
[30 pts] -
programming
sigmoid [2 pts]
bias_augment [3 pts]
predict_probs [5 pts]
predict_labels [2 pts]
loss [3 pts]
gradient [3 pts]
accuracy [2 pts]
evaluate [5 pts]
fit [5 pts]
Q5: Noise in PCA and Linear Regression [15pts]
Deliverables: Written portion
5.1 Slope Functions
[5 pts] non-programming
5.2 Error in Y and Error in X and Y
[5 pts] non-programming
5.3 Analysis
[5 pts] non-programming
Q6: Feature Reduction [25pts Bonus for All]
Deliverables: feature_reduction.py and Written portion
6.1 Feature Reduction
[18 pts] - programming
forward_selection [9pts]
backward_elimination [9pts]
6.2 Feature Selection - Discussion
[7 pts] non-programming
Q7: Movie Recommendation with SVD [10pts Bonus for All]
Deliverables: svd_recommender.py and Written portion
7.1 SVD Recommender
recommender_svd [5pts]
predict [5pts]
7.2 Visualize Movie Vectors
[0pts]
0 Set up
This notebook is tested under Python 3, and the corresponding packages can be downloaded from miniconda. You may also want to familiarize yourself with several packages:
jupyter notebook
numpy
matplotlib
sklearn
There is also a VS Code and Anaconda Setup Tutorial on Ed under the "Links" category
Please implement the functions that have raise NotImplementedError, and after you finish the coding, please delete or comment out raise NotImplementedError.
Library imports
Version information
python: 3.11.4 (main, Jul 5 2023, 08:54:11) [Clang 14.0.6 ]
matplotlib: 3.7.1
numpy: 1.24.3
Q1: Image Compression [30 pts] **[P]** | **[W]**
Load images data and plot
In [ ]:
###############################
### DO NOT CHANGE THIS CELL ###
###############################
# This is cell which sets up some of the modules you might need
# Please do not change the cell or import any additional packages.
import numpy as np
import pandas as pd
import matplotlib
from matplotlib import pyplot as plt
from sklearn.feature_extraction import text
from sklearn.datasets import load_diabetes, load_breast_cancer, load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, accuracy_score
import warnings
import sys

print('Version information')
print('python: {}'.format(sys.version))
print('matplotlib: {}'.format(matplotlib.__version__))
print('numpy: {}'.format(np.__version__))

warnings.filterwarnings('ignore')

%matplotlib inline
%load_ext autoreload
%autoreload 2

STUDENT_VERSION = 1
EO_TEXT, EO_FONT, EO_COLOR = 'TA VERSION', 'Chalkduster', 'gray'
EO_ALPHA, EO_SIZE, EO_ROT = 0.7, 90, 40

# Render types : 'browser', 'png', 'plotly_mimetype', 'jupyterlab', pdf
rndr_type = 'png'
In [ ]:
###############################
### DO NOT CHANGE THIS CELL ###
###############################
# load Image
image = plt.imread("./data/hw3_image_compression.jpeg") / 255

# plot image
fig = plt.figure(figsize=(10, 10))
if not STUDENT_VERSION:
    fig.text(
        0.5,
        0.5,
        EO_TEXT,
        transform=fig.transFigure,
        fontsize=EO_SIZE,
        color=EO_COLOR,
        alpha=EO_ALPHA,
        fontname=EO_FONT,
        ha="center",
        va="center",
        rotation=EO_ROT,
    )
plt.imshow(image)

Out[ ]:
<matplotlib.image.AxesImage at 0x10772b450>
In [ ]:
###############################
### DO NOT CHANGE THIS CELL ###
###############################
def rgb2gray(rgb):
    return np.dot(rgb[..., :3], [0.299, 0.587, 0.114])

fig = plt.figure(figsize=(10, 10))
if not STUDENT_VERSION:
    fig.text(
        0.5,
        0.5,
        EO_TEXT,
        transform=fig.transFigure,
        fontsize=EO_SIZE,
        color=EO_COLOR,
        alpha=EO_ALPHA,
        fontname=EO_FONT,
        ha="center",
        va="center",
        rotation=EO_ROT,
    )
# plot several images
plt.imshow(rgb2gray(image), cmap=plt.cm.bone)

Out[ ]:
<matplotlib.image.AxesImage at 0x14e299050>
1.1 Image compression [20pts] **[P]**
SVD is a dimensionality reduction technique that allows us to compress images by throwing away the least important information.
Higher singular values capture greater variance and, thus, capture greater information from the corresponding singular vector. To perform image compression, apply SVD on each matrix and get rid of the small singular values to compress the image. The loss of information through this process is negligible, and the difference between the images can hardly be spotted.
11/10/23, 11:49 PM
FALL2023_HW3_Student_Bharat
file:///C:/Users/Arpit/Downloads/FALL2023_HW3_Student_Bharat.html
9/71
For example, the proportion of variance captured by the first component is $\frac{\sigma_1^2}{\sum_{i} \sigma_i^2}$, where $\sigma_i$ is the $i$-th singular value.
In the imgcompression.py file, complete the following functions:
svd: You may use np.linalg.svd in this function, and although the function defaults this parameter to true, you may explicitly set full_matrices=True using the optional full_matrices parameter. Hint 2 may be useful.
compress
rebuild_svd
compression_ratio: Hint 1 may be useful
recovered_variance_proportion: Hint 1 may be useful
HINT 1: http://timbaumann.info/svd-image-compression-demo/ is a useful article on image compression and compression ratio. You may find this article useful for implementing the functions compression_ratio and recovered_variance_proportion.
HINT 2: If you have never used np.linalg.svd, it might be helpful to read Numpy's SVD documentation and note the particularities of the $V$ matrix and that it is returned already transposed.
HINT 3: The shape of $S$ resulting from SVD may change depending on if N > D, N < D, or N = D. Therefore, when checking the shape of $S$, note that min(N,D) means the value should be equal to whichever is lower between N and D. (A small end-to-end illustration of truncated SVD on a toy matrix is sketched below.)
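To make these pieces concrete, here is a minimal, self-contained sketch of rank-k reconstruction and recovered variance using np.linalg.svd. This is an illustration only, not the imgcompression.py implementation; the toy matrix and variable names are made up.

```python
import numpy as np

# Toy "image": a 4x3 matrix (N=4, D=3), so S has min(N, D) = 3 singular values.
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.1],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 2.0]])

U, S, Vt = np.linalg.svd(A, full_matrices=True)  # note: V comes back already transposed

k = 2  # keep only the k largest singular values
A_k = U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]      # rank-k reconstruction of A

recovered_variance = np.sum(S[:k] ** 2) / np.sum(S ** 2)
print(np.round(A_k, 2))
print(f"recovered variance with k={k}: {recovered_variance:.4f}")
```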
1.1.1 Local Tests for Image Compression Black and White Case [No Points]
You may test your implementation of the functions contained in imgcompression.py
in the
cell below. Feel free to comment out tests for functions that have not been completed yet.
See Using the Local Tests for more details.
UnitTest passed successfully for "SVD calculation - black and white images"!
UnitTest passed successfully for "Image compression - black and white images"!
UnitTest passed successfully for "SVD reconstruction - black and white images"!
UnitTest passed successfully for "Compression ratio - black and white images"!
UnitTest passed successfully for "Recovered variance proportion - black and white images"!
In [ ]:
###############################
### DO NOT CHANGE THIS CELL ###
###############################
from utilities.localtests import TestImgCompression

unittest_ic = TestImgCompression()
unittest_ic.test_svd_bw()
unittest_ic.test_compress_bw()
unittest_ic.test_rebuild_svd_bw()
unittest_ic.test_compression_ratio_bw()
unittest_ic.test_recovered_variance_proportion_bw()
1.1.2 Local Tests for Image Compression Color Case [No Points]
You may test your implementation of the functions contained in imgcompression.py
in the
cell below. Feel free to comment out tests for functions that have not been completed yet.
See Using the Local Tests for more details.
UnitTest passed successfully for "SVD calculation - color images"!
UnitTest passed successfully for "Image compression - color images"!
UnitTest passed successfully for "SVD reconstruction - color images"!
UnitTest passed successfully for "Compression ratio - color images"!
UnitTest passed successfully for "Recovered variance proportion - color images"!
1.2.1 Black and white [5 pts] **[W]**
This question will use your implementation of the functions from Q1.1 to generate a set of
images compressed to different degrees. You can simply run the below cell without making
any changes to it, assuming you have implemented the functions in Q1.1.
Make sure these images are displayed when submitting the PDF version of the Jupyter notebook as part of the non-programming submission of this assignment.
In [ ]:
###############################
### DO NOT CHANGE THIS CELL ###
###############################
from utilities.localtests import TestImgCompression

unittest_ic = TestImgCompression()
unittest_ic.test_svd_color()
unittest_ic.test_compress_color()
unittest_ic.test_rebuild_svd_color()
unittest_ic.test_compression_ratio_color()
unittest_ic.test_recovered_variance_proportion_color()
In [ ]:
###############################
### DO NOT CHANGE THIS CELL ###
###############################
from imgcompression import ImgCompression

imcompression = ImgCompression()
bw_image = rgb2gray(image)
U, S, V = imcompression.svd(bw_image)
component_num = [1, 2, 5, 10, 20, 40, 80, 160, 256]

fig = plt.figure(figsize=(18, 18))
# plot several images
i = 0
for k in component_num:
    U_compressed, S_compressed, V_compressed = imcompression.compress(U, S, V, k)
    img_rebuild = imcompression.rebuild_svd(U_compressed, S_compressed, V_compressed)
    c = np.around(imcompression.compression_ratio(bw_image, k), 4)
    r = np.around(imcompression.recovered_variance_proportion(S, k), 3)
    ax = fig.add_subplot(3, 3, i + 1, xticks=[], yticks=[])
    ax.imshow(img_rebuild, cmap=plt.cm.bone)
    ax.set_title(f"{k} Components")
    if not STUDENT_VERSION:
        ax.text(
            0.5,
            0.5,
            EO_TEXT,
            transform=ax.transAxes,
            fontsize=EO_SIZE / 2,
            color=EO_COLOR,
            alpha=EO_ALPHA,
            fontname=EO_FONT,
            ha="center",
            va="center",
            rotation=EO_ROT,
        )
    ax.set_xlabel(f"Compression: {c},\nRecovered Variance: {r}")
    i = i + 1
1.2.2 Black and White Compression Savings [No Points]
This question will use your implementation of the functions from Q1.1 to compare the
number of bytes required to represent the SVD decomposition for the original image to the
compressed image using different degrees of compression. You can simply run the below
cell without making any changes to it, assuming you have implemented the functions in
Q1.1.
Running this cell is primarily for your own understanding of the compression process.
1 components: Original Image: 10.986 MB -> Compressed Image: 18.758 KB, Savings: 10.968 MB, Compression Ratio 599.8:1
2 components: Original Image: 10.986 MB -> Compressed Image: 37.516 KB, Savings: 10.95 MB, Compression Ratio 299.9:1
5 components: Original Image: 10.986 MB -> Compressed Image: 93.789 KB, Savings: 10.895 MB, Compression Ratio 120.0:1
10 components: Original Image: 10.986 MB -> Compressed Image: 187.578 KB, Savings: 10.803 MB, Compression Ratio 60.0:1
20 components: Original Image: 10.986 MB -> Compressed Image: 375.156 KB, Savings: 10.62 MB, Compression Ratio 30.0:1
40 components: Original Image: 10.986 MB -> Compressed Image: 750.312 KB, Savings: 10.254 MB, Compression Ratio 15.0:1
80 components: Original Image: 10.986 MB -> Compressed Image: 1.465 MB, Savings: 9.521 MB, Compression Ratio 7.5:1
160 components: Original Image: 10.986 MB -> Compressed Image: 2.931 MB, Savings: 8.055 MB, Compression Ratio 3.7:1
256 components: Original Image: 10.986 MB -> Compressed Image: 4.689 MB, Savings: 6.297 MB, Compression Ratio 2.3:1
1.3.1 Color image [5 pts] **[W]**
This section will use your implementation of the functions from Q1.1 to generate a set of
images compressed to different degrees. You can simply run the below cell without making
any changes to it, assuming you have implemented the functions in Q1.1.
Make sure these images are displayed when submitting the PDF version of the Jupyter notebook as part of the non-programming submission of this assignment.
NOTE:
You might get warning "Clipping input data to the valid range for imshow with RGB
data ([0..1] for floats or [0..255] for integers)." This warning is acceptable since some of the
pixels may go above 1.0 while rebuilding. You should see similar images to original even
with such clipping.
HINT 1:
Make sure your implementation of recovered_variance_proportion
returns an
array of 3 floats for a color image.
In [ ]:
###############################
### DO NOT CHANGE THIS CELL ###
###############################
from imgcompression import ImgCompression

imcompression = ImgCompression()
bw_image = rgb2gray(image)
U, S, V = imcompression.svd(bw_image)
component_num = [1, 2, 5, 10, 20, 40, 80, 160, 256]

# Compare memory savings for BW image
for k in component_num:
    og_bytes, comp_bytes, savings = imcompression.memory_savings(bw_image, U, S, V, k)
    comp_ratio = og_bytes / comp_bytes
    og_bytes = imcompression.nbytes_to_string(og_bytes)
    comp_bytes = imcompression.nbytes_to_string(comp_bytes)
    savings = imcompression.nbytes_to_string(savings)
    print(
        f"{k} components: Original Image: {og_bytes} -> Compressed Image: {comp_bytes}, "
        f"Savings: {savings}, Compression Ratio {comp_ratio:.1f}:1"  # tail reconstructed; truncated in the source
    )
HINT 2: Try performing SVD on the individual color channels and then stack the individual channel U, S, V matrices (a rough sketch follows below).
HINT 3:
You may need separate implementations for a color or grayscale image in the same
function.
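As a rough illustration of HINT 2 (a sketch only, assuming the color image is passed with the channel axis first, as np.moveaxis(image, -1, 0) produces in the cells below; this is not the required imgcompression.py signature):

```python
import numpy as np

rng = np.random.default_rng(0)
color = rng.random((3, 64, 48))  # (channels, N, D), e.g. after np.moveaxis(image, -1, 0)

# Run SVD on each channel independently, then stack the per-channel results.
Us, Ss, Vts = zip(*(np.linalg.svd(color[c], full_matrices=True) for c in range(3)))
U = np.stack(Us)    # shape (3, N, N)
S = np.stack(Ss)    # shape (3, min(N, D))
Vt = np.stack(Vts)  # shape (3, D, D)
print(U.shape, S.shape, Vt.shape)
```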
In [ ]:
###############################
### DO NOT CHANGE THIS CELL ###
###############################
from imgcompression import ImgCompression

imcompression = ImgCompression()
image_rolled = np.moveaxis(image, -1, 0)
U, S, V = imcompression.svd(image_rolled)
component_num = [1, 2, 5, 10, 20, 40, 80, 160, 256]

fig = plt.figure(figsize=(18, 18))
# plot several images
i = 0
for k in component_num:
    U_compressed, S_compressed, V_compressed = imcompression.compress(U, S, V, k)
    img_rebuild = np.clip(
        imcompression.rebuild_svd(U_compressed, S_compressed, V_compressed), 0, 1
    )
    img_rebuild = np.moveaxis(img_rebuild, 0, -1)
    c = np.around(imcompression.compression_ratio(image_rolled, k), 4)
    r = np.around(imcompression.recovered_variance_proportion(S, k), 3)
    ax = fig.add_subplot(3, 3, i + 1, xticks=[], yticks=[])
    ax.imshow(img_rebuild)
    ax.set_title(f"{k} Components")
    if not STUDENT_VERSION:
        ax.text(
            0.5,
            0.5,
            EO_TEXT,
            transform=ax.transAxes,
            fontsize=EO_SIZE / 2,
            color=EO_COLOR,
            alpha=EO_ALPHA,
            fontname=EO_FONT,
            ha="center",
            va="center",
            rotation=EO_ROT,
        )
    ax.set_xlabel(
        f"Compression: {np.around(c, 4)},\nRecovered Variance: R: {r[0]} G: {r[1]} B: {r[2]}"  # label tail reconstructed; truncated in the source
    )
    i = i + 1
1.3.2 Color Compression Savings [No Points]
This question will use your implementation of the functions from Q1.1 to compare the
number of bytes required to represent the SVD decomposition for the original image to the
compressed image using different degrees of compression. You can simply run the below
cell without making any changes to it, assuming you have implemented the functions in
Q1.1.
Running this cell is primarily for your own understanding of the compression process.
In [ ]:
###############################
### DO NOT CHANGE THIS CELL ###
###############################
from imgcompression import ImgCompression

imcompression = ImgCompression()
U, S, V = imcompression.svd(image_rolled)
component_num = [1, 2, 5, 10, 20, 40, 80, 160, 256]

# Compare the memory savings of the color image
i = 0
for k in component_num:
    og_bytes, comp_bytes, savings = imcompression.memory_savings(image_rolled, U, S, V, k)
    comp_ratio = og_bytes / comp_bytes
    og_bytes = imcompression.nbytes_to_string(og_bytes)
    comp_bytes = imcompression.nbytes_to_string(comp_bytes)
    savings = imcompression.nbytes_to_string(savings)
    print(
        f"{k} components: Original Image: {og_bytes} -> Compressed Image: {comp_bytes}, "
        f"Savings: {savings}, Compression Ratio {comp_ratio:.1f}:1"  # tail reconstructed; truncated in the source
    )
1 components: Original Image: 32.959 MB -> Compressed Image: 56.273 KB, Savings: 32.904 MB, Compression Ratio 599.8:1
2 components: Original Image: 32.959 MB -> Compressed Image: 112.547 KB, Savings: 32.849 MB, Compression Ratio 299.9:1
5 components: Original Image: 32.959 MB -> Compressed Image: 281.367 KB, Savings: 32.684 MB, Compression Ratio 120.0:1
10 components: Original Image: 32.959 MB -> Compressed Image: 562.734 KB, Savings: 32.409 MB, Compression Ratio 60.0:1
20 components: Original Image: 32.959 MB -> Compressed Image: 1.099 MB, Savings: 31.86 MB, Compression Ratio 30.0:1
40 components: Original Image: 32.959 MB -> Compressed Image: 2.198 MB, Savings: 30.761 MB, Compression Ratio 15.0:1
80 components: Original Image: 32.959 MB -> Compressed Image: 4.396 MB, Savings: 28.563 MB, Compression Ratio 7.5:1
160 components: Original Image: 32.959 MB -> Compressed Image: 8.793 MB, Savings: 24.166 MB, Compression Ratio 3.7:1
256 components: Original Image: 32.959 MB -> Compressed Image: 14.068 MB, Savings: 18.891 MB, Compression Ratio 2.3:1
Q2: Understanding PCA [20 pts] **[P]** | **[W]**
Principal Component Analysis (PCA) is another dimensionality reduction technique that
reduces dimensions or features while still preserving the maximum (or close-to) amount of
information. This is useful when analyzing large datasets that contain a high number of
dimensions or features that may be correlated. PCA aims to eliminate features that are
highly correlated and only retain the important/uncorrelated ones that can describe most or
all the variance in the data. This enables better interpretability and visualization of the multi-
dimensional data. In this problem, we will investigate how PCA can be used to improve
features for regression and classification tasks and how the data itself affects the behavior of
PCA.
Here, we will employ Singular Value Decomposition (SVD) for PCA. In PCA, we first center the
data by subtracting the mean of each feature. SVD is well suited for this task since each
singular value tells us the amount of variance captured in each component for a given matrix
(e.g. image). Hence, we can use SVD to extract data only in directions with high variances
using either a threshold of the amount of variance or the number of bases/components.
Here, we will reduce the data to a set number of components.
Recall from class that in PCA, we project the original matrix $X$ onto new components, each one corresponding to an eigenvector of the covariance matrix $X^TX$. We know that SVD decomposes $X$ into three matrices $U$, $S$, and $V^T$. We can find the SVD decomposition of $X^TX$ using the decomposition for $X$ as follows:
$X^TX = (USV^T)^T(USV^T) = VSU^TUSV^T = VS^2V^T$
This means two important things for us:
The matrix $V$, often referred to as the right singular vectors of $X$, is equivalent to the eigenvectors of $X^TX$.
$S^2$ is equivalent to the eigenvalues of $X^TX$.
So the first $k$ principal components are obtained by projecting $X$ onto the first $k$ vectors from $V$. Similarly, $S^2$ gives a measure of the variance retained.
2.1 Implementation [10 pts] **[P]**
Implement PCA. In the pca.py
file, complete the following functions:
fit
: You may use np.linalg.svd
. Set full_matrices=False
. Hint 1 may be useful.
transform
transform_rv
: You may find np.cumsum
helpful for this function.
Assume a dataset is composed of N datapoints, each of which has D features with D < N.
The dimension of our data would be D. However, it is possible that many of these
dimensions contain redundant information. Each feature explains part of the variance in our
dataset, and some features may explain more variance than others.
HINT 1:
Make sure you remember to first center your data by subtracting the mean of each
feature.
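As a minimal sketch of the fit/transform idea (centering, SVD, projection) on synthetic data; the names below are illustrative and do not match the pca.py API:

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 5))        # N=100 datapoints, D=5 features

X_centered = X - X.mean(axis=0)      # center each feature first (HINT 1)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

K = 2
X_reduced = X_centered @ Vt[:K].T    # project onto the first K right singular vectors

explained = np.cumsum(S ** 2) / np.sum(S ** 2)
print(X_reduced.shape)               # (100, 2)
print(explained)                     # cumulative retained variance per component
```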
2.1.1 Local Tests for PCA [No Points]
You may test your implementation of the functions contained in pca.py
in the cell below.
Feel free to comment out tests for functions that have not been completed yet. See Using
the Local Tests for more details.
UnitTest passed successfully for "PCA fit"!
UnitTest passed successfully for "PCA transform"!
UnitTest passed successfully for "PCA transform with recovered variance"!
2.2 Visualize [5 pts] **[W]**
PCA is used to transform multivariate data tables into smaller sets so as to observe the
hidden trends and variations in the data. It can also be used as a feature extractor for
images. Here you will visualize two datasets using PCA, first is the iris dataset and then a
dataset of masked and unmasked images.
In [ ]:
###############################
### DO NOT CHANGE THIS CELL ###
###############################
from utilities.localtests import TestPCA

unittest_pca = TestPCA()
unittest_pca.test_pca()
unittest_pca.test_transform()
unittest_pca.test_transform_rv()
In the pca.py
, complete the following function:
visualize
: Use your implementation of PCA and reduce the datasets such that they
contain only two features. Using Plotly Express, make a 2D and 3D scatterplot of the data points using these features. Make sure to differentiate the data points according to
their true labels using color. We recommend converting the data to a pandas dataframe
before plotting.
The datasets have already been loaded for you in the subsequent cells.
NOTE:
Here, we won't be testing for accuracy. Even with correct implementations of PCA,
the accuracy can differ from the TA solution. That is fine as long as the visualizations come
out similar.
Iris Dataset
In [ ]:
###############################
### DO NOT CHANGE THIS CELL ###
###############################
# Use PCA for visualization of iris dataset
from pca import PCA

iris_data = load_iris(return_X_y=True)
X = iris_data[0]
y = iris_data[1]
fig_title = "Iris Dataset with Dimensionality Reduction"
PCA().visualize(X, y, fig_title)
[Figures: Iris Dataset with Dimensionality Reduction (2D and 3D)]
2.3 PCA Reduced Facemask Dataset Analysis [5 pts] **[W]**
Facemask Dataset
The masked and unmasked dataset is made up of grayscale images of human faces facing
forward. Half of these images are faces that are completely unmasked, and the remaining
images show half of the face covered with an artificially generated face mask. The images
have already been preprocessed: they are reduced to a small size of 64x64 pixels and then reshaped into a feature vector of 4096 pixels. Below is a sample of some of the images in the dataset.
In [ ]:
###############################
### DO NOT CHANGE THIS CELL ###
###############################
X = np.load("./data/smallflat_64.npy")
y = np.load("./data/masked_labels.npy").astype("int")

i = 0
fig = plt.figure(figsize=(18, 18))
for idx in [0, 1, 2, 150, 151, 152]:
    ax = fig.add_subplot(6, 6, i + 1, xticks=[], yticks=[])
    image = (
        np.rot90(X[idx].reshape(64, 64), k=1)
        if idx % 2 == 1 and idx < 150
        else X[idx].reshape(64, 64)
    )
    m_status = "Unmasked" if idx < 150 else "Masked"
    ax.imshow(image, cmap="gray")
    ax.set_title(f"{m_status} Image at i = {idx}")
    i += 1
In [ ]:
###############################
### DO NOT CHANGE THIS CELL ###
###############################
# Use PCA for visualization of masked and unmasked images
X = np.load("./data/smallflat_64.npy")
y = np.load("./data/masked_labels.npy")
fig_title = "Facemask Dataset Visualization with Dimensionality Reduction"
PCA().visualize(X, y, fig_title)
print("*In this plot, the 0 points are unmasked images and the 1 points are masked images.")
[Figures: Facemask Dataset Visualization with Dimensionality Reduction (2D and 3D)]
*In this plot, the 0 points are unmasked images and the 1 points are masked images.
What do you think of this 2-dimensional plot, knowing that the original dataset was a set of flattened image vectors with 4096 pixels/features?
1. Look at the 2-dimensional plot above. If the facemask
dataset that has been reduced to
2 features was fed into a classifier, do you think the classifier would produce high
accuracy or low accuracy in comparison to the original dataset which had 4096
pixels/features? Why? You can refer to the 2D visualization made above (One or two
sentences will suffice for this question) (3 pts)
Answer
- The 2D plot shows some distinct clustering, which suggests that the two
features derived from dimensionality reduction may capture significant aspects for
classification. However, because dimensionality reduction from 4096 to 2 features loses
a considerable amount of detail, a classifier using only these 2 features will have lower
accuracy compared to one using all original features.
2. Assuming an equal rate of accuracy, what do you think is the main advantage in feeding a classifier a dataset with 2 features vs a dataset with 4096 features? (One sentence will suffice for this question.) (2 pts)
Answer - The main advantage of feeding a classifier a dataset with 2 features versus 4096 features is the significant reduction in computational complexity and resource usage, leading to faster training and prediction times.
2.4 PCA Exploration [No Points]
Note
The accuracy can differ from the TA solution and this section is not graded.
Emotion Dataset [No Points]
Now you will use PCA on an actual real-world dataset. We will use your implementation of the PCA function to reduce the dataset with 99% retained variance and use it to obtain the reduced features. On the reduced dataset, we will use logistic and linear regression to compare results between PCA and non-PCA datasets. Run the following cells to see how PCA works on regression and classification tasks.
The first dataset we will use is an emotion dataset made up of grayscale images of human faces that are visibly happy and visibly sad. Note how accuracy increases after reducing the number of features used.
In [ ]:
###############################
### DO NOT CHANGE THIS CELL ###
###############################
X = np.load("./data/emotion_features.npy")
y = np.load("./data/emotion_labels.npy").astype("int")

i = 0
fig = plt.figure(figsize=(18, 18))
for idx in [0, 1, 2, 150, 151, 152]:
    ax = fig.add_subplot(6, 6, i + 1, xticks=[], yticks=[])
    image = (
        np.rot90(X[idx].reshape(64, 64), k=1)
        if idx % 2 == 1 and idx < 150
        else X[idx].reshape(64, 64)
    )
    m_status = "Unmasked" if idx < 150 else "Masked"
    ax.imshow(image, cmap="gray")
    m_status = "Sad" if idx < 150 else "Happy"
    ax.set_title(f"{m_status} Image at i = {idx}")
    i += 1
In [ ]:
###############################
### DO NOT CHANGE THIS CELL ###
###############################
X = np.load("./data/emotion_features.npy")
y = np.load("./data/emotion_labels.npy").astype("int")
print("Not Graded - Data shape before PCA ", X.shape)

pca = PCA()
pca.fit(X)
X_pca = pca.transform_rv(X, retained_variance=0.99)
print("Not Graded - Data shape with PCA ", X_pca.shape)
Not Graded - Data shape before PCA (600, 4096)
Not Graded - Data shape with PCA (600, 150)
Not Graded - Accuracy before PCA: 0.95000
Not Graded - Accuracy after PCA: 0.95556
Now we will explore sklearn's Diabetes dataset using PCA dimensionality reduction and
regression. Notice the RMSE score reduction after we apply PCA.
In [ ]:
###############################
### DO NOT CHANGE THIS CELL ###
###############################
# Train, test splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# Use logistic regression to predict classes for test set
clf = LogisticRegression()
clf.fit(X_train, y_train)
preds = clf.predict_proba(X_test)
print(
    "Not Graded - Accuracy before PCA: {:.5f}".format(
        accuracy_score(y_test, preds.argmax(axis=1))
    )
)
In [ ]:
###############################
### DO NOT CHANGE THIS CELL ###
###############################
# Train, test splits
X_train, X_test, y_train, y_test = train_test_split(
    X_pca, y, test_size=0.3, stratify=y, random_state=42
)

# Use logistic regression to predict classes for test set
clf = LogisticRegression()
clf.fit(X_train, y_train)
preds = clf.predict_proba(X_test)
print(
    "Not Graded - Accuracy after PCA: {:.5f}".format(
        accuracy_score(y_test, preds.argmax(axis=1))
    )
)
In [ ]:
###############################
### DO NOT CHANGE THIS CELL ###
###############################
from sklearn.linear_model import RidgeCV

def apply_regression(X_train, y_train, X_test):
    ridge = RidgeCV(alphas=[1e-3, 1e-2, 1e-1, 1])
    clf = ridge.fit(X_train, y_train)
    y_pred = ridge.predict(X_test)
    return y_pred
In [ ]:
###############################
### DO NOT CHANGE THIS CELL ###
###############################
# load the dataset
diabetes = load_diabetes()
X = diabetes.data
y = diabetes.target
print(X.shape, y.shape)

pca = PCA()
pca.fit(X)
X_pca = pca.transform_rv(X, retained_variance=0.9)
print("Not Graded - data shape with PCA ", X_pca.shape)
(442, 10) (442,)
Not Graded - data shape with PCA (442, 7)
Not Graded - RMSE score using Ridge Regression before PCA: 53.101
Not Graded - RMSE score using Ridge Regression after PCA: 53.024
Q3 Polynomial regression and regularization [80pts:
50pts + 20pts Bonus for Undergrads + 10pts Bonus
for All] **[P]** | **[W]**
In [ ]:
###############################
### DO NOT CHANGE THIS CELL ###
###############################
# Train, test splits
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Ridge regression without PCA
y_pred = apply_regression(X_train, y_train, X_test)

# calculate RMSE
rmse_score = np.sqrt(mean_squared_error(y_pred, y_test))
print("Not Graded - RMSE score using Ridge Regression before PCA: {:.5}".format(rmse_score))
In [ ]:
###############################
### DO NOT CHANGE THIS CELL ###
###############################
# Ridge regression with PCA
X_train, X_test, y_train, y_test = train_test_split(X_pca, y, test_size=0.3, random_state=42)

# use Ridge Regression for getting predicted labels
y_pred = apply_regression(X_train, y_train, X_test)

# calculate RMSE
rmse_score = np.sqrt(mean_squared_error(y_pred, y_test))
print("Not Graded - RMSE score using Ridge Regression after PCA: {:.5}".format(rmse_score))
3.1 Regression and regularization implementations [50pts: 30
pts + 20 pts bonus for CS 4641] **[P]**
We have three methods to fit linear and ridge regression models: 1) closed form solution; 2) gradient descent (GD); 3) stochastic gradient descent (SGD). Some of the functions are bonus; see the function list below for what is required to be implemented for graduate and undergraduate students. We use the term weight in the following code. Weights and parameters ($\theta$) have the same meaning here. We used parameters ($\theta$) in the lecture slides.
In the regression.py file, complete the Regression class by implementing the functions listed below. We have provided the loss function associated with the GD and SGD functions for Linear and Ridge Regression for deriving the gradient update.
rmse
construct_polynomial_feats
predict
linear_fit_closed: You should use np.linalg.pinv in this function
linear_fit_GD (bonus for undergrad, required for grad)
linear_fit_SGD (bonus for undergrad, required for grad)
ridge_fit_closed: You should adjust your I matrix to handle the bias term differently than the rest of the terms (see the sketch after this list)
ridge_fit_GD (bonus for undergrad, required for grad)
ridge_fit_SGD (bonus for undergrad, required for grad)
ridge_cross_validation: Use ridge_fit_closed for this function
IMPORTANT NOTE:
Use your RMSE function to calculate actual loss when coding GD and SGD, but use the loss listed above to derive the gradient update.
In ridge_fit_GD and ridge_fit_SGD, you should avoid applying regularization to the bias term in the gradient update.
The points for each function are in the Deliverables and Points Distribution section.
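For reference, a generic textbook-style sketch of the closed-form ridge solution with an unregularized bias term. This is illustrative only; the synthetic data, shapes, and names below are assumptions and do not match the regression.py signatures.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 50, 3
X = np.hstack([np.ones((N, 1)), rng.normal(size=(N, D))])   # first column is the bias feature
y = X @ np.array([[1.0], [2.0], [-1.0], [0.5]]) + 0.1 * rng.normal(size=(N, 1))

c_lambda = 10.0
I = np.eye(X.shape[1])
I[0, 0] = 0.0                                                # do not regularize the bias term
weight = np.linalg.pinv(X.T @ X + c_lambda * I) @ X.T @ y    # (X^T X + lambda I)^{-1} X^T y
print(weight.ravel())
```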
3.1.1 Local Tests for Helper Regression Functions [No Points]
You may test your implementation of the functions contained in regression.py
in the cell
below. Feel free to comment out tests for functions that have not been completed yet. See
Using the Local Tests for more details.
UnitTest passed successfully for "RMSE"!
UnitTest passed successfully for "Polynomial feature construction"!
UnitTest passed successfully for "Linear regression prediction"!
3.1.2 Local Tests for Linear Regression Functions [No Points]
You may test your implementation of the functions contained in regression.py
in the cell
below. Feel free to comment out tests for functions that have not been completed yet. See
Using the Local Tests for more details.
UnitTest passed successfully for "Closed form linear regression"!
3.1.3 Local Tests for Ridge Regression Functions [No Points]
You may test your implementation of the functions contained in regression.py
in the cell
below. Feel free to comment out tests for functions that have not been completed yet. See
Using the Local Tests for more details.
UnitTest passed successfully for "Closed form ridge regression"!
ridge
UnitTest passed successfully for "Ridge regression cross validation"!
Ridge
3.1.4 Local Tests for Gradient Descent and SGD (Bonus for Undergrad Tests)
[No Points]
In [ ]:
###############################
### DO NOT CHANGE THIS CELL ###
###############################
from utilities.localtests import TestRegression

unittest_reg = TestRegression()
unittest_reg.test_rmse()
unittest_reg.test_construct_polynomial_feats()
unittest_reg.test_predict()
In [ ]:
###############################
### DO NOT CHANGE THIS CELL ###
###############################
from utilities.localtests import TestRegression

unittest_reg = TestRegression()
unittest_reg.test_linear_fit_closed()
In [ ]:
###############################
### DO NOT CHANGE THIS CELL ###
###############################
from utilities.localtests import TestRegression

unittest_reg = TestRegression()
unittest_reg.test_ridge_fit_closed()
unittest_reg.test_ridge_cross_validation()
You may test your implementation of the functions contained in regression.py
in the cell
below. Feel free to comment out tests for functions that have not been completed yet. See
Using the Local Tests for more details.
UnitTest passed successfully for "Gradient descent linear regression"!
UnitTest passed successfully for "Stochastic gradient descent linear regression"!
UnitTest passed successfully for "Gradient descent ridge regression"!
3.2 About RMSE [3 pts] **[W]**
What is a good RMSE value?
If we normalize our labels such that the true labels and the model outputs can only be
between 0 and 1, what does it mean when the RMSE = 1? Please provide an example with
your explanation.
Answer
- In general, a "good" RMSE value is one that is low relative to the range of the
dependent variable in your dataset. It's ideally closer to 0, which would indicate that the
predicted values are very close to the actual values.
When both the true labels and the model outputs are normalized to be between 0 and
1, an RMSE value of 1 indicates the worst possible prediction accuracy. This means that on
average, the predicted values are at the maximum possible error distance from the actual
values.
For example, if the true value of an observation is 0 (the minimum in the normalized scale),
and the model predicts 1 (the maximum in the normalized scale), the error for this
prediction is 1 - 0 = 1. If every prediction made by the model is off by the maximum amount
(1 in this case), then the RMSE would be 1. This scenario would occur if the model always
predicts the exact opposite of the true value, which indicates that the model has essentially
learned nothing and is performing as poorly as possible.
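A quick numeric check of this reasoning (a throwaway sketch; the label vectors are made up): with labels and outputs in [0, 1], RMSE reaches 1 only when every prediction is off by the maximum distance of 1.

```python
import numpy as np

def rmse(pred, label):
    return np.sqrt(np.mean((pred - label) ** 2))

label = np.array([0.0, 1.0, 0.0, 1.0])
print(rmse(np.array([1.0, 0.0, 1.0, 0.0]), label))  # 1.0: every prediction maximally wrong
print(rmse(np.array([0.0, 1.0, 0.0, 1.0]), label))  # 0.0: perfect predictions
```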
3.3 Testing: General Functions and Linear Regression [5 pts] **[W]**
In this section, we will test the performance of the linear regression. As long as your test RMSE score is close to the TA's answer, you can get full points. Let's first construct a dataset for polynomial regression.
In this case, we construct the polynomial features up to degree 5. Each data sample consists of two features $(x_1, x_2)$. We compute the polynomial features of both $x_1$ and $x_2$ in order to yield the vectors $[x_1, x_1^2, \ldots, x_1^{\text{degree}}]$ and $[x_2, x_2^2, \ldots, x_2^{\text{degree}}]$. We train our model with the cartesian product of these polynomial features. The cartesian product generates a new feature vector consisting of all polynomial combinations of the features with degree less than or equal to the specified degree.
For example, if degree = 2, we will have the polynomial features $[x_1, x_1^2]$ and $[x_2, x_2^2]$ for the datapoint $(x_1, x_2)$. The cartesian product of these two vectors will be $[x_1, x_2, x_1 x_2, x_1^2, x_2^2]$. We do not generate $x_1^2 x_2$ and $x_1 x_2^2$ since their degree is greater than 2 (specified degree). A small illustration of this cross-feature construction is sketched below.
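As a small illustration of this cross-feature idea (a sketch with made-up values, not the construct_polynomial_feats implementation), for degree = 2 and one datapoint (x1, x2):

```python
from itertools import product

x1, x2 = 2.0, 3.0
degree = 2

feats_x1 = [x1 ** d for d in range(1, degree + 1)]   # [x1, x1^2]
feats_x2 = [x2 ** d for d in range(1, degree + 1)]   # [x2, x2^2]

# Cartesian product of exponents, keeping only terms with total degree <= 2.
cross = [
    (x1 ** d1) * (x2 ** d2)
    for d1, d2 in product(range(degree + 1), repeat=2)
    if 0 < d1 + d2 <= degree
]
print(feats_x1, feats_x2)
print(cross)  # values of x2, x2^2, x1, x1*x2, x1^2 (all terms of total degree 1 or 2)
```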
In [ ]:
###############################
### DO NOT CHANGE THIS CELL ###
###############################
from utilities.localtests import TestRegression

unittest_reg = TestRegression()
unittest_reg.test_linear_fit_GD()
unittest_reg.test_linear_fit_SGD()
unittest_reg.test_ridge_fit_GD()
# unittest_reg.test_ridge_fit_SGD()
x_all: 700 (rows/samples) 2 (columns/features)
y_all: 700 (rows/samples) 1 (columns/features)
In [ ]:
###############################
### DO NOT CHANGE THIS CELL ###
###############################
from regression import Regression
from plotter import Plotter
In [ ]:
###############################
### DO NOT CHANGE THIS CELL ###
###############################
# Generate a sample regression dataset with polynomial features
# using the student's regression implementation.
POLY_DEGREE = 8
reg = Regression()
plotter = Plotter(
    regularization=reg,
    poly_degree=POLY_DEGREE,
    student_version=STUDENT_VERSION,
    eo_params=(EO_TEXT, EO_FONT, EO_COLOR, EO_ALPHA, EO_SIZE, EO_ROT),
)
x_all, y_all, p, x_all_feat, x_cart_flat = plotter.create_data()
In [ ]:
###############################
### DO NOT CHANGE THIS CELL ###
###############################
# Visualize simulated regression data
plotter.plot_all_data(x_all, y_all, p)
In the figure above, the red curve is the true function we want to learn, while the blue dots are the noisy data points. The data points are generated by $y = f(x) + \epsilon$, where $\epsilon$ are i.i.d. generated noise.
Now let's split the data into two parts, the training set and the testing set. The yellow dots are for training, while the black dots are for testing.
In [ ]:
###############################
### DO NOT CHANGE THIS CELL ###
###############################
xtrain, ytrain, xtest, ytest, train_indices, test_indices = plotter.split_data(x_all, y_all)
plotter.plot_split_data(xtrain, xtest, ytrain, ytest)
Now let us train our model using the training set and see how our model performs on the
testing set. Observe the red line, which is our model's learned function.
Linear (closed) RMSE: 1.0072
In [ ]:
###############################
### DO NOT CHANGE THIS CELL ###
###############################
# Required for both Grad and Undergrad
weight = reg.linear_fit_closed(x_all_feat[train_indices], y_all[train_indices])
y_test_pred = reg.predict(x_all_feat[test_indices], weight)
test_rmse = reg.rmse(y_test_pred, y_all[test_indices])
y_pred = reg.predict(x_all_feat, weight)
print("Linear (closed) RMSE: %.4f" % test_rmse)
plotter.plot_linear_closed(xtrain, xtest, ytrain, ytest, x_all, y_pred)
HINT:
If your RMSE is off, make sure to follow the instruction given for
linear_fit_closed
in the list of functions to implement above.
Now let's use our linear gradient descent function with the same setup. Observe that the
trendline is now less optimal, and our RMSE increased. Do not be alarmed.
Linear (GD) RMSE: 5.8381
In [ ]:
###############################
### DO NOT CHANGE THIS CELL ###
###############################
# Required for Grad Only
# This cell may take more than 1 minute
weight, _ = reg.linear_fit_GD(
    x_all_feat[train_indices], y_all[train_indices], epochs=50000,
    learning_rate=1e-8,  # value truncated in the source; 1e-8 matches the printed RMSE below
)
y_test_pred = reg.predict(x_all_feat[test_indices], weight)
test_rmse = reg.rmse(y_test_pred, y_all[test_indices])
print("Linear (GD) RMSE: %.4f" % test_rmse)
y_pred = reg.predict(x_all_feat, weight)
y_pred = np.reshape(y_pred, (y_pred.size,))
plotter.plot_linear_gd(xtrain, xtest, ytrain, ytest, x_all, y_pred)
We must tune our epochs and learning_rate. As we tune these parameters, our trendline will approach the trendline generated by the linear closed form solution. Observe how we slowly tune (increase) the epochs and learning_rate below to create a better model.
Note that the closed form solution will always give the most optimal/overfit results. We cannot outperform the closed form solution with GD; we can only approach the closed form's level of optimality/overfitness. We leave the reasoning behind this as an exercise to the reader.
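To see why the learning rate matters so much, here is a bare-bones gradient descent sketch for linear least squares on synthetic data. It is not the regression.py implementation (whose loss scaling and stopping rule may differ); it only illustrates that smaller learning rates converge more slowly for a fixed number of epochs.

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.hstack([np.ones((200, 1)), rng.normal(size=(200, 1))])
y = X @ np.array([[0.5], [2.0]]) + 0.1 * rng.normal(size=(200, 1))

for lr in [1e-4, 1e-2, 1e-1]:
    w = np.zeros((2, 1))
    for _ in range(5000):
        grad = X.T @ (X @ w - y) / len(y)   # gradient of 0.5 * mean squared error
        w -= lr * grad                      # gradient descent weight update
    rmse = np.sqrt(np.mean((X @ w - y) ** 2))
    print(f"learning_rate={lr}: RMSE={rmse:.4f}")
```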
In [ ]:
###############################
### DO NOT CHANGE THIS CELL ###
###############################
# Required for Grad Only
# This cell may take more than 1 minute
learning_rates = [1e-8, 1e-6, 1e-4]
weights = np.zeros((3, POLY_DEGREE**2 + 2))
for ii in range(len(learning_rates)):
    weights[ii, :] = reg.linear_fit_GD(
        x_all_feat[train_indices],
        y_all[train_indices],
        epochs=50000,
        learning_rate=learning_rates[ii],
    )[0].ravel()
    y_test_pred = reg.predict(
        x_all_feat[test_indices], weights[ii, :].reshape((POLY_DEGREE**2 + 2, 1))
    )
    test_rmse = reg.rmse(y_test_pred, y_all[test_indices])
    print("Linear (GD) RMSE: %.4f (learning_rate=%s)" % (test_rmse, learning_rates[ii]))
plotter.plot_linear_gd_tuninglr(xtrain, xtest, ytrain, ytest, x_all, x_all_feat, learning_rates, weights)
Linear (GD) RMSE: 5.8381 (learning_rate=1e-08)
Linear (GD) RMSE: 2.7259 (learning_rate=1e-06)
Linear (GD) RMSE: 1.2373 (learning_rate=0.0001)
And what if we just use the first 10 data points to train?
[Figure: Linear Closed 10 Samples]
In [ ]:
###############################
### DO NOT CHANGE THIS CELL ###
###############################
rng = np.random.RandomState(seed=3)
y_all_noisy = np.dot(x_cart_flat, np.zeros((POLY_DEGREE**2 + 2, 1))) + rng.randn(x_all_feat.shape[0], 1)
sub_train = train_indices[10:20]
In [ ]:
###############################
### DO NOT CHANGE THIS CELL ###
###############################
# Required for both Grad and Undergrad
weight = reg.linear_fit_closed(x_all_feat[sub_train], y_all_noisy[sub_train])
y_pred = reg.predict(x_all_feat, weight)
y_test_pred = reg.predict(x_all_feat[test_indices], weight)
test_rmse = reg.rmse(y_test_pred, y_all_noisy[test_indices])
print("Linear (closed) 10 Samples RMSE: %.4f" % test_rmse)
plotter.plot_linear_closed_10samples(x_all, y_all_noisy, sub_train, y_pred)
Linear (closed) 10 Samples RMSE: 2207393.1407
Did you see a worse performance? Let's take a closer look at what we have learned.
3.4 Testing: Testing ridge regression [5 pts] ridge
**[W]**
3.4.1 [3pts] **[W]**
Now let's try ridge regression. Like before, undergraduate students need to implement the
ridge
closed form, and graduate students need to implement all three methods. We will call the
prediction function from linear regression part. As long as your test RMSE score is close to
the TA's answer (TA's answer ), you can get full points.
Again, let's see what we have learned. You only need to run the cell corresponding to
your specific implementation.
weight =
reg
.
linear_fit_closed
(
x_all_feat
[
sub_train
], y_all_noisy
[
sub_train
])
y_pred =
reg
.
predict
(
x_all_feat
, weight
)
y_test_pred =
reg
.
predict
(
x_all_feat
[
test_indices
], weight
)
test_rmse =
reg
.
rmse
(
y_test_pred
, y_all_noisy
[
test_indices
])
print
(
"Linear (closed) 10 Samples RMSE: %.4f" %
test_rmse
)
plotter
.
plot_linear_closed_10samples
(
x_all
, y_all_noisy
, sub_train
, y_pred
)
In [ ]:
###############################
### DO NOT CHANGE THIS CELL ###
###############################
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
11/10/23, 11:49 PM
FALL2023_HW3_Student_Bharat
file:///C:/Users/Arpit/Downloads/FALL2023_HW3_Student_Bharat.html
35/71
Ridge Regression (closed) RMSE: 1.8283
Ridge
HINT:
Make sure to follow the instruction given for ridge_fit_closed
ridge
in the list of
functions to implement above.
# Required for both Grad and Undergrad
weight =
reg
.
ridge_fit_closed
ridge
(
x_all_feat
[
sub_train
], y_all_noisy
[
sub_train
], c_lambda
=
10
)
y_pred =
reg
.
predict
(
x_all_feat
, weight
)
y_test_pred =
reg
.
predict
(
x_all_feat
[
test_indices
], weight
)
test_rmse =
reg
.
rmse
(
y_test_pred
, y_all_noisy
[
test_indices
])
print
(
"Ridge Regression (closed) RMSE: Ridge
%.4f" %
test_rmse
)
plotter
.
plot_ridge_closed_10samples
ridge
(
x_all
, y_all_noisy
, sub_train
, y_pred
)
In [ ]:
###############################
### DO NOT CHANGE THIS CELL ###
###############################
# Required for Grad Only
weight, _ = reg.ridge_fit_GD(
    x_all_feat[sub_train], y_all_noisy[sub_train], c_lambda=20, learning_rate=1e-5
)
y_pred = reg.predict(x_all_feat, weight)
y_test_pred = reg.predict(x_all_feat[test_indices], weight)
test_rmse = reg.rmse(y_test_pred, y_all_noisy[test_indices])
print("Ridge Regression (GD) RMSE: %.4f" % test_rmse)
plotter.plot_ridge_gd_10samples(x_all, y_all_noisy, sub_train, y_pred)

Ridge Regression (GD) RMSE: 1.0416

In [ ]:
###############################
### DO NOT CHANGE THIS CELL ###
###############################
# Required for Grad Only
weight, _ = reg.ridge_fit_SGD(
    x_all_feat[sub_train], y_all_noisy[sub_train], c_lambda=20, learning_rate=1e-5
)
y_pred = reg.predict(x_all_feat, weight)
y_test_pred = reg.predict(x_all_feat[test_indices], weight)
test_rmse = reg.rmse(y_test_pred, y_all_noisy[test_indices])
print("Ridge Regression (SGD) RMSE: %.4f" % test_rmse)
plotter.plot_ridge_sgd_10samples(x_all, y_all, sub_train, y_pred)

Ridge Regression (SGD) RMSE: 1.0433
Linear vs. Ridge Regression
Regression technique comparison
Analyze the difference in performance between the linear and ridge regression methods given the output RMSE from the testing on 10 samples and their corresponding approximation plots.
3.4.2 Why does ridge regression achieve a lower RMSE than linear regression on 10 sample points? [1pts] **[W]**
3.4.3 Describe and contrast two scenarios (real life applications): one where linear regression is more suitable than ridge, and one in which ridge is a better choice than linear. Explain why. [1 pts] **[W]**
3.4.4 [Bonus pts] What is the impact of having some highly correlated features in the data set in terms of linear algebra? Mathematically explain (include expressions) how ridge regression has an advantage over linear regression here. Include the idea of numerical stability. [2pts Bonus For All] **[W]**
3.4.2. Answer
- Ridge regression achieves a lower RMSE than linear regression on a small dataset of 10 sample points because it includes a regularization term that prevents overfitting. This regularization makes the model more robust to noise and fluctuations in the data, leading to better generalization and lower error on unseen data.
3.4.3. Answer
- Linear Regression Suitable Scenario: Linear regression is well-suited for predicting a company's sales based on straightforward, linearly related variables such as advertising spend and market size. In this scenario, the relationship is direct and clear, with minimal risk of overfitting due to the simplicity of the factors involved.
- Ridge Regression Suitable Scenario: Ridge regression is more appropriate for financial risk modeling where many correlated variables (like various market indicators and economic factors) might influence the outcome. Here, the regularization in ridge regression helps manage the multicollinearity and complexity of the data, providing more reliable predictions.
3.4.4. Answer
Linear Regression
In linear regression, the closed-form coefficient calculation uses the formula:
$$\hat{\theta} = (X^T X)^{-1} X^T y$$
where:
$X$: feature matrix, $y$: target vector, $\hat{\theta}$: coefficients.
Issues arise when features in $X$ are highly correlated, leading to $X^T X$ becoming nearly singular (determinant close to zero). This causes numerical instability in computing $(X^T X)^{-1}$ and results in unreliable coefficient estimates.
Ridge Regression
Ridge regression modifies this formula to:
$$\hat{\theta} = (X^T X + \lambda I)^{-1} X^T y$$
where $\lambda I$ is the regularization term ($\lambda$ is the regularization parameter and $I$ is the identity matrix).
This addition prevents the matrix from being singular by ensuring that $X^T X + \lambda I$ remains invertible. It increases numerical stability and reduces the impact of multicollinearity.
So in summary, ridge regression's regularization term enhances numerical stability in the presence of correlated features, making the model estimation more robust compared to standard linear regression.
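To make the numerical-stability point concrete, here is a small self-contained numpy sketch (not part of the assignment code) that builds two nearly collinear features and compares the closed-form linear and ridge solves; the feature values and the choice of lambda are made-up illustrations.

import numpy as np

rng_demo = np.random.RandomState(0)
x1 = rng_demo.rand(100, 1)
x2 = x1 + 1e-6 * rng_demo.randn(100, 1)        # nearly identical (highly correlated) feature
X_demo = np.hstack([x1, x2])
y_demo = x1 + 0.1 * rng_demo.randn(100, 1)

XtX = X_demo.T @ X_demo
lam = 1.0
print("cond(X^T X)          =", np.linalg.cond(XtX))                    # huge -> unstable inverse
print("cond(X^T X + lam*I)  =", np.linalg.cond(XtX + lam * np.eye(2)))  # much smaller

theta_linear = np.linalg.solve(XtX, X_demo.T @ y_demo)                   # wild, unreliable coefficients
theta_ridge = np.linalg.solve(XtX + lam * np.eye(2), X_demo.T @ y_demo)  # shrunk, stable coefficients
print(theta_linear.ravel(), theta_ridge.ravel())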
3.5 Cross validation [7 pts] **[W]**
Let's use Cross Validation to search for the best value for c_lambda in ridge regression.
Imagine we have a dataset of 10 points [1,2,3,4,5,6,7,8,9,10] and we want to do 5-fold cross validation.
The first iteration we would train with [3,4,5,6,7,8,9,10] and test (validate) with [1,2]
The second iteration we would train with [1,2,5,6,7,8,9,10] and test (validate) with [3,4]
The third iteration we would train with [1,2,3,4,7,8,9,10] and test (validate) with [5,6]
The fourth iteration we would train with [1,2,3,4,5,6,9,10] and test (validate) with [7,8]
The fifth iteration we would train with [1,2,3,4,5,6,7,8] and test (validate) with [9,10]
We provided a list of possible values for $\lambda$, and you will use them in cross validation. For cross validation, use the 10-fold method and only use it on your training data (you already have train_indices to get the training data). Split the training data into 10 folds, which means using 10 percent of the training data for testing and 90 percent for training in each fold. For each $\lambda$, you will have calculated 10 RMSE values. Compute the mean of the 10 RMSE values. Then pick the $\lambda$ with the lowest mean RMSE.
HINTS:
np.concatenate is your friend.
Make sure to follow the instruction given for ridge_fit_closed in the list of functions to implement above.
To use the 10-fold method, loop over all the data 10 times, splitting off a different 10% of the data at every iteration. The first iteration extracts the first 10% for testing and the remaining 90% for training. The second iteration splits off the second 10% of the data for testing and the (different) remaining 90% for training. If we have the array of elements 1 - 10, the second iteration would extract the number "2" because that's in the second 10% of the array.
The hyperparameter_search function will handle averaging the errors, so don't average the errors in ridge_cross_validation. We've done this so you can see your error across every fold when using the gradescope tests.
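As a rough illustration of the fold construction (a sketch only; the graded ridge_cross_validation and hyperparameter_search in regression.py may use different signatures or return formats), one way to split the training data into k folds and collect per-fold RMSEs looks like this:

import numpy as np

def ridge_cv_sketch(X, y, c_lambda, kfold=10):
    # Return the RMSE of each fold for one lambda (illustrative, not the graded code).
    N = X.shape[0]
    fold_size = N // kfold
    errors = []
    for k in range(kfold):
        test_idx = np.arange(k * fold_size, (k + 1) * fold_size)
        train_idx = np.concatenate([np.arange(0, k * fold_size),
                                    np.arange((k + 1) * fold_size, N)])
        weight = reg.ridge_fit_closed(X[train_idx], y[train_idx], c_lambda=c_lambda)
        pred = reg.predict(X[test_idx], weight)
        errors.append(reg.rmse(pred, y[test_idx]))
    return errors

# A hyperparameter search would then average these per-fold errors for every lambda in
# lambda_list and keep the lambda with the smallest mean RMSE.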
In [ ]:
###############################
### DO NOT CHANGE THIS CELL ###
###############################
lambda_list = [0.0001, 0.001, 0.1, 1, 5, 10, 50, 100, 1000, 10000]
kfold = 10
best_lambda, best_error, error_list = reg.hyperparameter_search(
    x_all_feat[train_indices], y_all[train_indices], lambda_list, kfold
)
for lm, err in zip(lambda_list, error_list):
    print("Lambda: %.4f" % lm, "RMSE: %.6f" % err)
print("Best Lambda: %.4f" % best_lambda)
weight = reg.ridge_fit_closed(
    x_all_feat[train_indices], y_all_noisy[train_indices], c_lambda=best_lambda
)
y_test_pred = reg.predict(x_all_feat[test_indices], weight)
test_rmse = reg.rmse(y_test_pred, y_all_noisy[test_indices])
print("Best Test RMSE: %.4f" % test_rmse)
Lambda: 0.0001 RMSE: 0.957365
Lambda: 0.0010 RMSE: 0.955850
Lambda: 0.1000 RMSE: 0.953591
Lambda: 1.0000 RMSE: 0.951137
Lambda: 5.0000 RMSE: 0.949576
Lambda: 10.0000 RMSE: 0.949279
Lambda: 50.0000 RMSE: 0.949647
Lambda: 100.0000 RMSE: 0.951808
Lambda: 1000.0000 RMSE: 1.162351
Lambda: 10000.0000 RMSE: 3.065832
Best Lambda: 10.0000
Best Test RMSE: 1.0463
3.6 Noisy Input Samples in Linear Regression [10 pts Bonus for All] **[W]**
Consider a linear model of the form:
$$y(x_n, w) = w_0 + \sum_{i=1}^{D} w_i x_{ni}$$
where $x_n = (x_{n1}, \dots, x_{nD})$ and the weights are $w = (w_0, w_1, \dots, w_D)$. Given the D-dimensional input sample set $X = \{x_1, \dots, x_N\}$ with corresponding target values $t = \{t_1, \dots, t_N\}$, the sum-of-squares error function is:
$$E_D(w) = \frac{1}{2}\sum_{n=1}^{N}\big(y(x_n, w) - t_n\big)^2$$
Now, suppose that Gaussian noise $\epsilon_{ni}$ is added independently to each of the input samples to generate a new sample set $\tilde{X} = \{\tilde{x}_1, \dots, \tilde{x}_N\}$. Here, $\epsilon_{ni}$ (an entry of $\epsilon_n$) has zero mean and variance $\sigma^2$. For each sample $x_n$, let $\tilde{x}_{ni} = x_{ni} + \epsilon_{ni}$, where $\epsilon_{ni} \sim \mathcal{N}(0, \sigma^2)$ and is independent across both $n$ and $i$ indices.
1. (3pts) Show that $y(\tilde{x}_n, w) = y(x_n, w) + \sum_{i=1}^{D} w_i \epsilon_{ni}$.
2. (7pts) Assume the sum-of-squares error function of the noisy sample set $\tilde{X}$ is $\tilde{E}_D(w)$. Prove that the expectation of $\tilde{E}_D(w)$ is equivalent to the sum-of-squares error $E_D(w)$ for noise-free input samples with the addition of a weight-decay regularization term (i.e. an $L_2$ norm), in which the bias parameter $w_0$ is omitted from the regularizer. In other words, show that
$$\mathbb{E}\big[\tilde{E}_D(w)\big] = E_D(w) + \frac{N\sigma^2}{2}\sum_{i=1}^{D} w_i^2$$
N.B. You should be incorporating your solution from the first part of this problem into the given sum-of-squares equation for the second part.
Write your responses below using LaTeX in Markdown.
HINT:
During the class, we have discussed how to solve for the weights in ridge regression; the objective function looks like this:
$$\frac{1}{N}\sum_{n=1}^{N}\big(y(x_n, w) - t_n\big)^2 + \lambda\,\|w\|_2^2$$
where the first term is the sum-of-squares error and the second term is the regularization term. N is the number of samples. In this question, we use another form of the ridge regression, which is:
$$\frac{1}{2}\sum_{n=1}^{N}\big(y(x_n, w) - t_n\big)^2 + \frac{\lambda}{2}\sum_{i=1}^{D} w_i^2$$
For the Gaussian noise $\epsilon_{ni}$, we have $\mathbb{E}[\epsilon_{ni}] = 0$ and $\mathbb{E}[\epsilon_{ni}^2] = \sigma^2$. Assuming the noise terms are independent of each other, we have $\mathbb{E}[\epsilon_{ni}\,\epsilon_{mj}] = \sigma^2\,\delta_{nm}\,\delta_{ij}$.
1. Answer
After adding Gaussian noise to the input sample $x_n$, we have
$$y(\tilde{x}_n, w) = w_0 + \sum_{i=1}^{D} w_i(x_{ni} + \epsilon_{ni}) = y(x_n, w) + \sum_{i=1}^{D} w_i \epsilon_{ni}$$
2. Answer
According to the previous question, we have
$$\tilde{E}_D(w) = \frac{1}{2}\sum_{n=1}^{N}\Big(y(x_n, w) + \sum_{i=1}^{D} w_i \epsilon_{ni} - t_n\Big)^2$$
For simplicity, put $d_n = y(x_n, w) - t_n$, then we can get
$$\mathbb{E}\big[\tilde{E}_D(w)\big] = \frac{1}{2}\sum_{n=1}^{N}\Big(d_n^2 + 2 d_n \sum_{i} w_i\,\mathbb{E}[\epsilon_{ni}] + \sum_{i}\sum_{j} w_i w_j\,\mathbb{E}[\epsilon_{ni}\epsilon_{nj}]\Big) = E_D(w) + \frac{N\sigma^2}{2}\sum_{i=1}^{D} w_i^2$$
which is the noise-free sum-of-squares error plus an $L_2$ weight-decay term (with $\lambda = N\sigma^2$) that omits the bias $w_0$.
Q4: Naive Bayes and Logistic Regression [35pts] **[P]** | **[W]**
In Bayesian classification, we're interested in finding the probability of a label $y$ given some observed feature vector $x = (x_1, \dots, x_D)$, which we can write as $P(y \mid x_1, \dots, x_D)$. Bayes's theorem tells us how to express this in terms of quantities we can compute more directly:
$$P(y \mid x_1, \dots, x_D) = \frac{P(x_1, \dots, x_D \mid y)\,P(y)}{P(x_1, \dots, x_D)}$$
The main assumption in Naive Bayes is that, given the label, the observed features are conditionally independent, i.e.
$$P(x_1, \dots, x_D \mid y) = \prod_{i=1}^{D} P(x_i \mid y)$$
Therefore, we can rewrite Bayes rule as
$$P(y \mid x_1, \dots, x_D) = \frac{P(y)\prod_{i=1}^{D} P(x_i \mid y)}{P(x_1, \dots, x_D)}$$
Training Naive Bayes
One way to train a Naive Bayes classifier uses a frequentist approach to calculate probability, which is simply going over the training data and calculating the frequency of different observations in the training set given different labels. For example,
$$P(x_i = 1 \mid y = c) = \frac{\text{number of training samples with label } c \text{ and } x_i = 1}{\text{number of training samples with label } c}$$
Testing Naive Bayes
During the testing phase, we try to estimate the probability of a label given an observed feature vector. We combine the probabilities computed from training data to estimate the probability of a given label. For example, if we are trying to decide between two labels $y_1$ and $y_2$, then we compute the ratio of the posterior probabilities for each label:
$$\frac{P(y_1 \mid x_1, \dots, x_D)}{P(y_2 \mid x_1, \dots, x_D)} = \frac{P(y_1)\prod_{i=1}^{D} P(x_i \mid y_1)}{P(y_2)\prod_{i=1}^{D} P(x_i \mid y_2)}$$
All we need now is to compute $P(y_1)\prod_i P(x_i \mid y_1)$ and $P(y_2)\prod_i P(x_i \mid y_2)$ for each label by plugging in the numbers we got during training. The label with the higher posterior probability is the one that is selected.
4.1 Llama Breed Problem using Naive Bayes [5pts] [W]
Above are images of two different breeds of llamas – the Suri and the Wooly. The difference
between these two breeds is subtle, as these two breeds are often mixed up. However the
Suri Llama is vastly more valuable than the Wooly llama. You devise a way to determine with
some confidence, which is which – without the need for expensive genetic testing.
You look at four key features of the llama: {curly hair, over 14 inch tail, over 400 pounds,
extremely shy}.
You only have 7 randomly chosen llamas to work with, and their breed as the ground truth.
You record the data as vectors with the entry 1 if true and 0 if false. For example a llama with
vector {1,1,0,1} would have curly hair, a tail over 14 inches, be less
than 400 pounds, and be
extremely shy.
The Suri Llamas
yield the following data: {1, 0, 1, 0}, {0, 1, 0, 1}, {1, 1, 1, 1}, {0, 0, 0, 1}
The Wooly Llamas
yield the following data: {0, 0, 1, 0}, {1, 1, 0, 0}, {1, 0, 1, 1}.
Now is the time to test your method!
You see a new llama you are interested in that has
curly hair, does
have a tail over 14 inches,
is more than
400 pounds, and is not
shy.
Using Naive Bayes, is this a Suri or a Wooly Llama?
NOTE: We expect students to show their work (Naive Bayes calculations) and not just
the final answer.
Answer
From the dataset, we can calculate the prior probabilities of Suri Llamas and Wooly Llamas:
$$P(\text{Suri}) = \frac{4}{7}, \qquad P(\text{Wooly}) = \frac{3}{7}$$
Also, we can calculate the conditional probabilities of the four observed feature values (curly hair = 1, tail over 14 inches = 1, over 400 pounds = 1, extremely shy = 0).
For Suri Llamas:
$$P(\text{curly}=1 \mid S) = \frac{2}{4}, \quad P(\text{tail}=1 \mid S) = \frac{2}{4}, \quad P(\text{weight}=1 \mid S) = \frac{2}{4}, \quad P(\text{shy}=0 \mid S) = \frac{1}{4}$$
Therefore, the probability that the test llama is a Suri Llama is proportional to:
$$P(S)\prod_i P(x_i \mid S) = \frac{4}{7}\cdot\frac{1}{2}\cdot\frac{1}{2}\cdot\frac{1}{2}\cdot\frac{1}{4} = \frac{1}{56} \approx 0.018$$
For Wooly Llamas:
$$P(\text{curly}=1 \mid W) = \frac{2}{3}, \quad P(\text{tail}=1 \mid W) = \frac{1}{3}, \quad P(\text{weight}=1 \mid W) = \frac{2}{3}, \quad P(\text{shy}=0 \mid W) = \frac{2}{3}$$
Therefore, the probability that the test llama is a Wooly Llama is proportional to:
$$P(W)\prod_i P(x_i \mid W) = \frac{3}{7}\cdot\frac{2}{3}\cdot\frac{1}{3}\cdot\frac{2}{3}\cdot\frac{2}{3} = \frac{8}{189} \approx 0.042$$
Since $P(W)\prod_i P(x_i \mid W) > P(S)\prod_i P(x_i \mid S)$, this new llama is classified as a Wooly Llama.
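As a quick sanity check (not required by the assignment), the same frequentist Naive Bayes computation can be reproduced in a few lines of numpy; the array layout and function name below are just an illustration.

import numpy as np

suri = np.array([[1, 0, 1, 0], [0, 1, 0, 1], [1, 1, 1, 1], [0, 0, 0, 1]])
wooly = np.array([[0, 0, 1, 0], [1, 1, 0, 0], [1, 0, 1, 1]])
x_new = np.array([1, 1, 1, 0])   # curly hair, long tail, over 400 lbs, not shy

def posterior_score(data, prior, x):
    # P(feature value | class) estimated by frequency, multiplied across features
    p_feat = np.where(x == 1, data.mean(axis=0), 1 - data.mean(axis=0))
    return prior * np.prod(p_feat)

total = len(suri) + len(wooly)
print("Suri score :", posterior_score(suri, len(suri) / total, x_new))    # ~0.018
print("Wooly score:", posterior_score(wooly, len(wooly) / total, x_new))  # ~0.042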
4.2 News Data Sentiment Classification via Logistic Regression
[30pts] **[P]**
This dataset contains the sentiments for financial news headlines from the perspective of a
retail investor. The sentiment of news has 3 classes, negative, positive and neutral. In this
problem, we only use the negative (class label = 0) and positive (class label = 1) classes for
binary logistic regression. For data preprocessing, we remove the duplicate headlines and
remove the neutral class to get 1967 unique news headlines. Then we randomly split the
1967 headlines into training set and evaluation set with 8:2 ratio. We use the training set to
fit a binary logistic regression model.
The code which is provided loads the documents, preprocesses the data, and builds a "bag of words" representation of each document. Your task is to complete the missing portions of the code in logistic_regression.py to determine whether a news headline is negative or positive.
In logistic_regression.py file, complete the following functions:
sigmoid: transform the raw score into the probability of being positive using the sigmoid function $\sigma(s) = \frac{1}{1 + e^{-s}}$
bias_augment: augment $x$ with 1's to account for the bias term in $\theta$
predict_probs: predicts the probability of the positive label
predict_labels: predicts labels
loss: calculates binary cross-entropy loss
gradient: calculate the gradient of the loss function with respect to the parameters $\theta$
accuracy: calculate the accuracy of predictions
evaluate: gives loss and accuracy for a given set of points
fit: fit the logistic regression model on the training data
Logistic Regression Overview:
1. In logistic regression, we model the conditional probability $P(y = 1 \mid x)$ using parameters $\theta$, which include a bias term b:
$$P(y = 1 \mid x; \theta) = \sigma(\theta^T x)$$
where $\sigma(\cdot)$ is the sigmoid function as follows:
$$\sigma(s) = \frac{1}{1 + e^{-s}}$$
2. The conditional probabilities of the positive class $P(y = 1 \mid x)$ and the negative class $P(y = 0 \mid x)$ of the sample attributes are combined into one equation as follows:
$$P(y \mid x; \theta) = \sigma(\theta^T x)^{y}\,\big(1 - \sigma(\theta^T x)\big)^{1 - y}$$
3. Assuming that the samples are independent of each other, the likelihood of the entire dataset is the product of the probabilities of all samples. We use maximum likelihood estimation to estimate the model parameters $\theta$. The negative log likelihood (scaled by the dataset size $N$) is given by:
$$L(\theta) = -\frac{1}{N}\sum_{i=1}^{N}\Big[y_i \log\big(\sigma(\theta^T x_i)\big) + (1 - y_i)\log\big(1 - \sigma(\theta^T x_i)\big)\Big]$$
where:
$N$: number of training samples
$x_i$: bag of words features of the i-th training sample
$y_i$: label of the i-th training sample
Note that this will be our model's loss function.
4. Then calculate the gradient and use gradient descent to optimize the loss function:
$$\theta \leftarrow \theta - \alpha\,\nabla_\theta L(\theta)$$
where $\alpha$ is the learning rate and the gradient is given by:
$$\nabla_\theta L(\theta) = \frac{1}{N}\sum_{i=1}^{N}\big(\sigma(\theta^T x_i) - y_i\big)\,x_i$$
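The following is a minimal numpy sketch of these pieces (sigmoid, bias augmentation, loss, gradient, and a gradient-descent loop). It only illustrates the equations above; the function names and signatures in the graded logistic_regression.py may differ.

import numpy as np

def sigmoid_sketch(s):
    return 1.0 / (1.0 + np.exp(-s))

def bias_augment_sketch(x):
    # prepend a column of ones so theta[0] acts as the bias b
    return np.hstack([np.ones((x.shape[0], 1)), x])

def loss_sketch(y, h):
    # binary cross-entropy (negative log likelihood scaled by N)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

def gradient_sketch(x, y, h):
    return x.T @ (h - y) / x.shape[0]

def fit_sketch(x_train, y_train, lr=0.05, epochs=1000):
    x_aug = bias_augment_sketch(x_train)
    theta = np.zeros((x_aug.shape[1], 1))
    for _ in range(epochs):
        h = sigmoid_sketch(x_aug @ theta)          # predicted probabilities
        theta -= lr * gradient_sketch(x_aug, y_train, h)
    return theta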
4.2.1 Local Tests for Logistic Regression [No Points]
You may test your implementation of the functions contained in logistic_regression.py
in
the cell below. Feel free to comment out tests for functions that have not been completed
yet. See Using the Local Tests for more details.
In [ ]:
###############################
### DO NOT CHANGE THIS CELL ###
###############################
from utilities.localtests import TestLogisticRegression

unittest_lr = TestLogisticRegression()
unittest_lr.test_sigmoid()
unittest_lr.test_bias_augment()
unittest_lr.test_loss()
unittest_lr.test_predict_probs()
unittest_lr.test_predict_labels()
unittest_lr.test_loss()
unittest_lr.test_accuracy()
unittest_lr.test_evaluate()
unittest_lr.test_fit()
UnitTest passed successfully for "Logistic Regression sigmoid"!
UnitTest passed successfully for "Logistic Regression bias_augment"!
UnitTest passed successfully for "Logistic Regression loss"!
UnitTest passed successfully for "Logistic Regression predict_probs"!
UnitTest passed successfully for "Logistic Regression predict_labels"!
UnitTest passed successfully for "Logistic Regression loss"!
UnitTest passed successfully for "Logistic Regression accuracy"!
UnitTest passed successfully for "Logistic Regression evaluate"!
Epoch 0:
train loss: 0.675
train acc: 0.7
val loss: 0.675
val acc: 0.7
UnitTest passed successfully for "Logistic Regression fit"!
4.2.2 Logistic Regression Model Training [No Points]
Fit the model to the training data. Try different learning rates lr and numbers of epochs to achieve >80% test accuracy.
In [ ]:
###############################
### DO NOT CHANGE THIS CELL ###
###############################
from logistic_regression import LogisticRegression as LogReg
In [ ]:
###############################
### DO NOT CHANGE THIS CELL ###
###############################
news_data = pd.read_csv("./data/news-data.csv", encoding="cp437", header=None)
class_to_label_mappings = {"negative": 0, "positive": 1}
label_to_class_mappings = {0: "negative", 1: "positive"}
news_data.columns = ["Sentiment", "News"]
news_data.drop_duplicates(inplace=True)
news_data = news_data[news_data.Sentiment != "neutral"]
news_data["Sentiment"] = news_data["Sentiment"].map(class_to_label_mappings)
vectorizer = text.CountVectorizer(stop_words="english")
X = news_data["News"].values
y = news_data["Sentiment"].values.reshape(-1, 1)
RANDOM_SEED = 5
BOW = vectorizer.fit_transform(X).toarray()
indices = np.arange(len(news_data))
X_train, X_test, y_train, y_test, indices_train, indices_test = train_test_split(
    BOW, y, indices, test_size=0.2, random_state=RANDOM_SEED
)
In [ ]:
###############################
### DO NOT CHANGE THIS CELL ###
###############################
model = LogReg()
lr = 0.05
epochs = 10000
theta = model.fit(X_train, y_train, X_test, y_test, lr, epochs)
Epoch 0:
train loss: 0.69
train acc: 0.7
val loss: 0.691
val acc: 0.665
Epoch 1000:
train loss: 0.436
train acc: 0.794
val loss: 0.532
val acc: 0.701
Epoch 2000:
train loss: 0.364
train acc: 0.846
val loss: 0.484
val acc: 0.746
Epoch 3000:
train loss: 0.318
train acc: 0.873
val loss: 0.456
val acc: 0.761
Epoch 4000:
train loss: 0.286
train acc: 0.896
val loss: 0.438
val acc: 0.772
Epoch 5000:
train loss: 0.262
train acc: 0.914
val loss: 0.425
val acc: 0.782
Epoch 6000:
train loss: 0.242
train acc: 0.926
val loss: 0.416
val acc: 0.789
Epoch 7000:
train loss: 0.226
train acc: 0.933
val loss: 0.409
val acc: 0.797
Epoch 8000:
train loss: 0.212
train acc: 0.943
val loss: 0.404
val acc: 0.802
Epoch 9000:
train loss: 0.2
train acc: 0.95
val loss: 0.4
val acc: 0.799
4.2.3 Logistic Regression Model Evaluation [No Points]
Evaluate the model on the test dataset

In [ ]:
###############################
### DO NOT CHANGE THIS CELL ###
###############################
test_loss, test_acc = model.evaluate(X_test, y_test, theta)
print(f"Test Dataset Accuracy: {round(test_acc, 3)}")

Test Dataset Accuracy: 0.807

Plotting the loss function on the training data and the test data for every 100th epoch

In [ ]:
###############################
### DO NOT CHANGE THIS CELL ###
###############################
model.plot_loss()
Plotting the accuracy function on the training data and the test data for each epoch

In [ ]:
###############################
### DO NOT CHANGE THIS CELL ###
###############################
model.plot_accuracy()
In [ ]:
np.reshape(X_test[0], (1, X_test.shape[1])).shape

Out[ ]:
(1, 5286)

Check out sample evaluations from the test set.
In [ ]:
###############################
### DO NOT CHANGE THIS CELL ###
###############################
num_samples = 10
for i in range(10):
    rand_index = np.random.randint(0, len(X_test))
    x_test = np.reshape(X_test[rand_index], (1, X_test.shape[1]))
    prob = model.predict_probs(model.bias_augment(x_test), theta)
    pred = model.predict_labels(prob)
    print(f"Input News: {X[indices_test[rand_index]]}\n")
    print(f"Predicted Sentiment: {label_to_class_mappings[pred[0][0]]}")
    print(f"Actual Sentiment: {label_to_class_mappings[y_test[rand_index][0]]}\n")
Input News: Operating profit for the 12-month period decreased from EUR5 .4 m while net sales increased from EUR62 .0 m , as compared to the financial year 2004 .
Predicted Sentiment: negative
Actual Sentiment: negative

Input News: It is a disappointment to see the plan folded .
Predicted Sentiment: positive
Actual Sentiment: negative

Input News: Finnish Bore that is owned by the Rettig family has grown recently through the acquisition of smaller shipping companies .
Predicted Sentiment: positive
Actual Sentiment: positive

Input News: Meanwhile , Nokia said that it will be able to deliver a complete range of services from deployment operations to consulting and integration to managed services as a result of the buyout .
Predicted Sentiment: positive
Actual Sentiment: positive

Input News: Mika Stahlberg , VP F-Secure Labs , said , `` We are excited and proud that F-Secure has been recognized by AV-Comparatives as the Product of the Year .
Predicted Sentiment: positive
Actual Sentiment: positive

Input News: Besides we have increased the share of meat in various sausages and are offering a number of new tastes in the grill products and shish kebabs segment , '' Paavel said .
Predicted Sentiment: positive
Actual Sentiment: positive

Input News: Consumption is forecast to grow by about 2 % .
Predicted Sentiment: positive
Actual Sentiment: positive

Input News: In September alone , the market declined by 10.2 percent year-on-year to 19.28 million liters .
Predicted Sentiment: positive
Actual Sentiment: negative

Input News: However , the suspect stole his burgundy Nissan Altima .
Predicted Sentiment: positive
Actual Sentiment: negative

Input News: ADP News - May 29 , 2009 - Bank of America BofA downgraded today its ratings on Swedish-Finnish paper maker Stora Enso Oyj HEL : STERV and on Finnish sector player UPM-Kymmene Oyj HEL : UPM1V to `` underperf
Predicted Sentiment: negative
Actual Sentiment: negative
Q5: Noise in PCA and Linear Regression [15pts] **[W]**
Both PCA and least squares regression can be viewed as algorithms for inferring (linear)
relationships among data variables. In this part of the assignment, you will develop some
intuition for the differences between these two approaches and develop an understanding
of the settings that are better suited to using PCA or better suited to using the least squares
fit.
The high level bit is that PCA is useful when there is a set of latent (hidden/underlying)
variables, and all the coordinates of your data are linear combinations (plus noise) of those
variables. The least squares fit is useful when you have direct access to the independent
variables, so any noisy coordinates are linear combinations (plus noise) of known variables.
5.1 Slope Functions [5 pts] **[W]**
In the following cell, complete the following:
1. pca_slope: For this function, assume that $x$ is the first feature and $y$ is the second feature for the data. Write a function that takes in the first feature vector $x$ and the second feature vector $y$. Stack these two feature vectors into a single N x 2 matrix and use this to determine the first principal component vector of this dataset. Be careful of how you are stacking the two vectors. You can check the output by printing it, which should help you debug. Finally, return the slope of this first component. You should use the PCA implementation from Q2.
2. lr_slope: Write a function that takes $x$ and $y$ and returns the slope of the least squares fit. You should use the Linear Regression implementation from Q3 but do not use any kind of regularization. Think about how the weight could relate to slope.
In later subparts, we consider the case where our data consists of noisy measurements of $x$ and $y$. For each part, we will evaluate the quality of the relationship recovered by PCA, and that recovered by standard least squares regression.
As a reminder, least squares regression minimizes the squared error of the dependent variable from its prediction. Namely, given $(x_i, y_i)$ pairs, least squares returns the line $y = mx + b$ that minimizes $\sum_i \big(y_i - (m x_i + b)\big)^2$.
In [ ]:
import numpy as np
from pca import PCA
from regression import Regression

def pca_slope(X, y):
    """
    Calculates the slope of the first principal component given by PCA

    Args:
        x: N x 1 array of feature x
        y: N x 1 array of feature y
    Return:
        slope: (float) scalar slope of the first principal component
    """
    pca = PCA()
    data = np.concatenate((X, y), axis=1)
    pca.fit(data)
    pc_1 = pca.get_V()
    slope = pc_1[0, 1] / pc_1[0, 0]
    return slope

def lr_slope(X, y):
    """
    Calculates the slope of the best fit returned by linear_fit_closed()

    For this function don't use any regularization

    Args:
        X: N x 1 array corresponding to a dataset
        y: N x 1 array of labels y
    Return:
        slope: (float) slope of the best fit
    """
    reg = Regression()
    weight = reg.linear_fit_closed(X, y)
    return weight[0, 0]

We will consider a simple example with two variables, $x$ and $y$, where the true relationship between the variables is $y = 3x$. Our goal is to recover this relationship—namely, recover the coefficient "3". We set $x = (0, 0.02, 0.04, \dots, 1)$ and $y = 3x$. Make sure both functions return 3.
In [ ]:
###############################
### DO NOT CHANGE THIS CELL ###
###############################
x = np.arange(0, 1.02, 0.02).reshape(-1, 1)
y = 3 * np.arange(0, 1.02, 0.02).reshape(-1, 1)

print("Slope of first principal component", pca_slope(x, y))
print("Slope of best linear fit", lr_slope(x, y))

fig = plt.figure()
plt.scatter(x, y)
plt.xlabel("x")
plt.ylabel("y")
if not STUDENT_VERSION:
    fig.text(
        0.5,
        0.5,
        EO_TEXT,
        transform=fig.transFigure,
        fontsize=EO_SIZE / 2,
        color=EO_COLOR,
        alpha=EO_ALPHA,
        fontname=EO_FONT,
        ha="center",
        va="center",
        rotation=EO_ROT * 0.8,
    )
plt.show()
Slope of first principal component 2.9999999999999987
Slope of best linear fit 3.0
5.2 Analysis Setup [5 pts] **[W]**
Error in y
In this subpart, we consider the setting where our data consists of the actual values of $x$, and noisy estimates of $y$. Run the following cell to see how the data looks when there is error in $y$.
In [ ]:
###############################
### DO NOT CHANGE THIS CELL ###
###############################
base = np.arange(0.001, 1.001, 0.001).reshape(-1, 1)
c = 0.5
X = base
y = 3 * base + np.random.normal(loc=[0], scale=c, size=base.shape)

fig = plt.figure()
plt.scatter(X, y)
plt.xlabel("x")
plt.ylabel("y")
if not STUDENT_VERSION:
    fig.text(
        0.5,
        0.5,
        EO_TEXT,
        transform=fig.transFigure,
        fontsize=EO_SIZE / 2,
        color=EO_COLOR,
        alpha=EO_ALPHA,
        fontname=EO_FONT,
        ha="center",
        va="center",
        rotation=EO_ROT,
    )
plt.show()

In the following cell, you will implement the addNoise function:
1. Create a vector $X = (0.001, 0.002, \dots, 1)$.
2. For a given noise level $c$, set $\hat{Y} = 4X + \epsilon$, where each entry of $\epsilon$ is drawn independently from a normal distribution with mean 0 and standard deviation $c$. You can use the np.random.normal function, where scale is equal to the noise level, to add noise to your points.
3. Notice the parameter x_noise in the addNoise function. When this parameter is set to True, you will have to add noise to $X$ as well. For a given noise level c, let $\hat{X} = X + \epsilon'$, where $\epsilon'$ is drawn independently of the noise added to $\hat{Y}$.
4. Return the pca_slope and lr_slope values of the $X$ and $\hat{Y}$ dataset you have created, where the noise is added to $Y$ only or to both $X$ and $Y$ depending on the problem.
Hint 1: Refer to the above example on how to add noise to $X$ or $Y$.
Hint 2: Be careful not to add double noise to $X$ or $Y$.
In [ ]:
def addNoise(c, x_noise=False, seed=1):
    """
    Creates a dataset with noise and calculates the slope of the dataset
    using the pca_slope and lr_slope functions implemented in this class.

    Args:
        c: (float) scalar, a given noise level to be used on Y and/or X
        x_noise: (Boolean) When set to False, X should not have noise added
                 When set to True, X should have noise.
                 Note that the noise added to X should be different from the
                 noise added to Y. You should NOT use the same noise you add
                 to Y here.
        seed: (int) Random seed
    Return:
        pca_slope_value: (float) slope value of dataset created using pca_slope
        lr_slope_value: (float) slope value of dataset created using lr_slope
    """
    np.random.seed(seed)  #### DO NOT CHANGE THIS ####
    ############# START YOUR CODE BELOW #############
    X = np.arange(0.001, 1 + 0.001, 0.001)
    Y_cap = 4 * X + np.random.normal(loc=0.0, scale=c, size=(X.shape))
    if x_noise:
        X = X + np.random.normal(loc=0.0, scale=c, size=(X.shape))
    X = X.reshape((-1, 1))
    Y_cap = Y_cap.reshape((-1, 1))
    pca_slope_value = pca_slope(X, Y_cap)
    lr_slope_value = lr_slope(X, Y_cap)
    ############# END YOUR CODE ABOVE #############
    return pca_slope_value, lr_slope_value

A scatter plot with $c$ on the horizontal axis and the output of pca_slope and lr_slope on the vertical axis has already been implemented for you. A sample has been taken for each $c$ in $\{0, 0.05, 0.10, \dots, 1\}$. The output of pca_slope is plotted as a red dot, and the output of lr_slope as a blue dot. This has been repeated 30 times, so you can see that we end up with a plot of 1260 dots, in 21 columns of 60, half red and half blue. Note that the plot you get might not look exactly like the TA version and that is fine, because you might have randomized the noise slightly differently than how we did it.
NOTE: Here, x_noise = False since we only want Y to have noise.
In [ ]:
###############################
### DO NOT CHANGE THIS CELL ###
###############################
pca_slope_values = []
linreg_slope_values = []
c_values = []
s_idx = 0
for i in range(30):
    for c in np.arange(0, 1.05, 0.05):
        # Calculate pca_slope_value (psv) and lr_slope_value (lsv)
        psv, lsv = addNoise(c, seed=s_idx)
        # Append pca and lr slope values to list for plot function
        pca_slope_values.append(psv)
        linreg_slope_values.append(lsv)
        # Append c value to list for plot function
        c_values.append(c)
        # Increment random seed index
        s_idx += 1

fig = plt.figure()
plt.scatter(c_values, pca_slope_values, c="r")
plt.scatter(c_values, linreg_slope_values, c="b")
plt.xlabel("c")
plt.ylabel("slope")
if not STUDENT_VERSION:
    fig.text(
        0.6,
        0.4,
        EO_TEXT,
        transform=fig.transFigure,
        fontsize=EO_SIZE / 2,
        color=EO_COLOR,
        alpha=EO_ALPHA * 0.5,
        fontname=EO_FONT,
        ha="center",
        va="center",
        rotation=EO_ROT,
    )
plt.show()

Error in x and y
We will now examine the case where our data consists of noisy estimates of both $x$ and $y$. Run the following cell to see how the data looks when there is error in both.
In [ ]:
###############################
### DO NOT CHANGE THIS CELL ###
###############################
base = np.arange(0.001, 1, 0.001).reshape(-1, 1)
c = 0.5
X = base + np.random.normal(loc=[0], scale=c, size=base.shape)
y = 3 * base + np.random.normal(loc=[0], scale=c, size=base.shape)

fig = plt.figure()
plt.scatter(X, y)
plt.xlabel("x")
plt.ylabel("y")
if not STUDENT_VERSION:
    fig.text(
        0.5,
        0.5,
        EO_TEXT,
        transform=fig.transFigure,
        fontsize=EO_SIZE / 2,
        color=EO_COLOR,
        alpha=EO_ALPHA * 0.8,
        fontname=EO_FONT,
        ha="center",
        va="center",
        rotation=EO_ROT,
    )
plt.show()

In the below cell, we graph the predicted PCA and LR slopes on the vertical axis against the value of c on the horizontal axis. Note that the graph you get might not look exactly like the TA version and that is fine, because you might have randomized the noise slightly differently than how we did it.
In [ ]:
###############################
### DO NOT CHANGE THIS CELL ###
###############################
pca_slope_values = []
linreg_slope_values = []
c_values = []
s_idx = 0
for i in range(30):
    for c in np.arange(0, 1.05, 0.05):
        # Calculate pca_slope_value (psv) and lr_slope_value (lsv), notice x_noise
        psv, lsv = addNoise(c, x_noise=True, seed=s_idx)
        # Append pca and lr slope values to list for plot function
        pca_slope_values.append(psv)
        linreg_slope_values.append(lsv)
        # Append c value to list for plot function
        c_values.append(c)
        # Increment random seed index
        s_idx += 1

fig = plt.figure()
plt.scatter(c_values, pca_slope_values, c="r")
plt.scatter(c_values, linreg_slope_values, c="b")
plt.xlabel("c")
plt.ylabel("slope")
if not STUDENT_VERSION:
    fig.text(
        0.5,
        0.5,
        EO_TEXT,
        transform=fig.transFigure,
        fontsize=EO_SIZE / 2,
        color=EO_COLOR,
        alpha=EO_ALPHA * 0.5,
        fontname=EO_FONT,
        ha="center",
        va="center",
        rotation=EO_ROT,
    )
plt.show()
5.3. Analysis [5 pts] **[W]**
Based on your observations from the previous subsections, answer the following questions about the two cases (error in $y$, and error in both $x$ and $y$) in 2-3 lines.
NOTE:
1. You don't need to provide a mathematical proof for this question.
2. Understanding how PCA and Linear Regression work should help you decipher which case was better for which algorithm. Base your answer on this understanding of how either algorithm works.
QUESTIONS:
1. Based on the obtained plots, how can you determine which technique (PCA/Linear regression) is performing better in comparison? (1 Pt)
2. In case-1, where there is error in $y$, which technique gave better performance and why do you think it performed better in this case? (2 Pts)
3. In case-2, where there is error in both $x$ and $y$, which technique gave better performance and why do you think it performed better in this case? (2 Pts)
Answer
1. The closer the value of the calculated slope is to the actual slope ("4" here), the better the algorithm is performing. In the above 2 cases, Linear Regression performs better when we have ground-truth X data and noisy Y data, and PCA performs better when X and Y are both noisy.
2. In case-1, where there is error in $y$, Linear Regression performs better than PCA because it is specifically designed to predict the dependent variable (in this case, $y$) from the independent variable (here, $x$), and is robust to noise in the dependent variable. PCA, on the other hand, may be misled by the variance introduced by the noise in $y$, as it treats all variables equally and aims to capture the direction of maximum variance in the data, regardless of whether this variance reflects the underlying relationship between the variables.
3. In case-2, where there is error in both $x$ and $y$, PCA performs better than Linear Regression, because the noise affects both dimensions in a similar way, allowing PCA to identify the principal component that captures the core variance in the original $x$ and $y$ data. In contrast, linear regression, which relies on the assumption of a less noisy independent variable, struggles under these conditions.
Q6 Feature Reduction Implementation [25pts Bonus for All] **[P]** | **[W]**
6.1 Implementation [18 Points] **[P]**
Feature selection is an integral aspect of machine learning. It is the process of selecting a
subset of relevant features that are to be used as the input for the machine learning task.
Feature selection may lead to simpler models for easier interpretation, shorter training times,
avoidance of the curse of dimensionality, and better generalization by reducing overfitting.
In the feature_reduction.py
file, complete the following functions:
forward_selection
backward_elimination
These functions should each output a list of features.
Forward Selection:
In forward selection, we start with a null model, fit the model with one individual feature at a time, and select the feature with the minimum p-value. We continue adding one feature at a time in this way as long as the most recently selected feature's p-value is below the significance level.
Steps to implement it:
1. Choose a significance level (given to you).
2. Fit all possible simple regression models by considering one feature at a time.
3. Select the feature with the lowest p-value.
4. Fit all possible models with one extra feature added to the previously selected feature(s).
5. Select the feature with the minimum p-value again. If p_value < significance, go to Step 4. Otherwise, terminate.
Backward Elimination:
In backward elimination, we start with a full model, and then remove the insignificant feature with the highest p-value (that is greater than the significance level). We continue to do this until we have a final set of significant features.
Steps to implement it:
1. Choose a significance level (given to you).
2. Fit a full model including all the features.
3. Select the feature with the highest p-value. If (p-value > significance level), go to Step 4, otherwise terminate.
4. Remove the feature under consideration.
5. Fit a model without this feature. Repeat the entire process from Step 3 onwards.
HINT 1: The p-value is known as the observed significance value for a null hypothesis. In our case, the p-value of a feature is associated with the null hypothesis $H_0: \beta_i = 0$. If $\beta_i = 0$, then this feature contributes no predictive power to our model and should be dropped. We reject the null hypothesis if the p-value is smaller than our significance level. More briefly, a p-value is a measure of how much the given feature significantly represents an observed change. A lower p-value represents higher significance. Some more information about p-values can be found here: https://towardsdatascience.com/what-is-a-p-value-b9e6c207247f
HINT 2: For this function, you will have to install statsmodels if not installed already. To do this, run pip install statsmodels in command line/terminal. In the case that you are using an Anaconda environment, run conda install -c conda-forge statsmodels in the command line/terminal. For more information about installation, refer to https://www.statsmodels.org/stable/install.html. The statsmodels library is a Python module that provides classes and functions for the estimation of many different statistical models, as well as for conducting statistical tests and statistical data exploration. You will have to use this library to choose a regression model to fit your data against. Some more information about this module can be found here: https://www.statsmodels.org/stable/index.html
HINT 3: For step 2 in each of the forward and backward selection functions, you can use the sm.OLS function as your regression model. Also, do not forget to add a bias to your regression model. A function that may help you is the sm.add_constant function.
TIP 4: You should be able to implement these functions using only the libraries provided in the cell below.
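For orientation, here is a rough sketch of forward selection driven by statsmodels OLS p-values. It is a sketch only; the graded forward_selection in feature_reduction.py may use a different signature or return format, and backward elimination follows the same pattern in reverse (start full, drop the largest p-value above the threshold).

import statsmodels.api as sm

def forward_selection_sketch(X, y, significance_level=0.05):
    # Greedy forward selection on a pandas DataFrame X and target y (illustrative).
    selected = []
    remaining = list(X.columns)
    while remaining:
        pvals = {}
        for feature in remaining:
            model = sm.OLS(y, sm.add_constant(X[selected + [feature]])).fit()
            pvals[feature] = model.pvalues[feature]   # p-value of the candidate feature
        best = min(pvals, key=pvals.get)
        if pvals[best] < significance_level:
            selected.append(best)
            remaining.remove(best)
        else:
            break
    return selected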
In [ ]:
###############################
### DO NOT CHANGE THIS CELL ###
###############################
from feature_reduction import FeatureReduction

In [ ]:
###############################
### DO NOT CHANGE THIS CELL ###
###############################
bc_dataset = load_breast_cancer()
bc = pd.DataFrame(bc_dataset.data, columns=bc_dataset.feature_names)
bc["Diagnosis"] = bc_dataset.target
# print(bc)
X = bc.drop("Diagnosis", axis=1)
y = bc["Diagnosis"]
featureselection = FeatureReduction()

# Run the functions to make sure two lists are generated, one for each method
print(
    "Features selected by forward selection:",
    FeatureReduction.forward_selection(X, y),
)
print(
    "Features selected by backward elimination:",
    FeatureReduction.backward_elimination(X, y),
)

Features selected by forward selection: ['worst concave points', 'worst radius', 'worst texture', 'worst area', 'smoothness error', 'worst symmetry', 'compactness error', 'radius error', 'worst fractal dimension', 'mean compactness', 'mean concave points', 'worst concavity', 'concavity error', 'area error']
Features selected by backward elimination: ['mean radius', 'mean compactness', 'mean concave points', 'radius error', 'smoothness error', 'concavity error', 'concave points error', 'worst radius', 'worst texture', 'worst area', 'worst concavity', 'worst symmetry', 'worst fractal dimension']
6.2 Feature Selection - Discussion [7pts] **[W]**
Question 6.2.1:
We have seen two regression methods, namely Lasso and Ridge regression, earlier in this assignment. Another extremely important and common use-case of these methods is to perform feature selection. Considering there are no restrictions set on the dataset, according to you, which of these two methods is more appropriate for feature selection generally (choose one method)? Why? (3 pts)
Answer
- For feature selection purposes, Lasso regression is typically the preferred method. Its key advantage lies in its capability to reduce the coefficients of less influential features to zero, thereby excluding them from the equation. This characteristic is especially valuable for pinpointing key features in a large dataset. Ridge regression, on the other hand, usually only decreases the coefficients close to zero but doesn't set any of them exactly to zero, making it less suitable for direct feature reduction.
Question 6.2.2:
We have seen that we use different subsets of features to get different regression models. These models depend on the relevant features that we have selected. Using forward selection, what fraction of the total possible models can we explore? Assume that the total number of features that we have at our disposal is $N$. Remember that in stepwise feature selection (like forward selection and backward elimination), we always include an intercept in our model, so you only need to consider the $N$ features. (4 pts)
Answer
In forward selection with $N$ features, the fraction of total possible models explored is determined by the number of models considered during forward selection divided by the total number of possible models.
The total number of models with $N$ features is $2^N$. Now if we consider the number of models we take into account at each step, we would have one less model to choose from in every subsequent step:
a. At the first step, we have all N features, hence N models to evaluate, and pick one.
b. At the second step, we now have N-1 features, hence N-1 models.
c. This process goes on until the Nth step, where we consider just the 1 model that's left.
Therefore, the total number of models explored is $N + (N-1) + \dots + 1 = \frac{N(N+1)}{2}$.
Therefore, the fraction of total models we explore would be: fraction $= \frac{N(N+1)/2}{2^N} = \frac{N(N+1)}{2^{N+1}}$.
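As a quick numerical illustration (using 30 features, the size of the breast cancer dataset above, which is an assumption rather than part of the question), the fraction is tiny:

N = 30                                    # e.g., number of features in the breast cancer dataset
explored = N * (N + 1) // 2               # models visited by forward selection
total = 2 ** N                            # all possible feature subsets
print(explored, total, explored / total)  # 465 of 1073741824, ~4.3e-7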
Q7: Netflix Movie Recommendation Problem Solved using SVD [10pts Bonus for All] **[P]**
Let us try to tackle the famous problem of movie recommendation using just our SVD functions that we have implemented. We are given a table of reviews that 600+ users have provided for close to 10,000 different movies. Our challenge is to predict how much a user would rate a movie that they have not seen (or rated) yet. Once we have these ratings, we would then be able to predict which movies to recommend to that user.
Understanding How SVD Helps in Movie Recommendation
We are given a dataset of user-movie ratings ($R$) that looks like the following:
Ratings in the matrix range from 1-5. In addition, the matrix contains nan
wherever there is
no rating provided by the user for the corresponding movie. One simple way to utilize this
matrix to predict movie ratings for a given user-movie pair would be to fill in each row /
column with the average rating for that row / column. For example: For each movie, if any
rating is missing, we could just fill in the average value of all available ratings and expect this
to be around the actual / expected rating.
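For instance, a minimal version of that averaging baseline (not part of the assignment code; here R is assumed to be a numpy array with nan for missing ratings) might look like:

import numpy as np

# Replace each nan with the mean of the available ratings for that movie (column)
col_means = np.nanmean(R, axis=0)                 # per-movie average rating
R_baseline = np.where(np.isnan(R), col_means, R)  # broadcast the column means into the gaps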
While this may sound like a good approximation, it turns out that by just using SVD we can
improve the accuracy of the predicted rating.
How does SVD fit into this picture?
Recall how we previously used SVD to compress images by throwing out less important information. We could apply the same idea to our above matrix ($R$) to generate another matrix ($\hat{R}$) which will provide the same information, i.e. ratings for any user-movie pairs, but by combining only the most important features.
Let's look at this with an example:
Assume that the decomposition of the ratings matrix $R$ looks like:
$$R = U S V^T$$
We can re-write this decomposition as follows:
$$R = (U S^{1/2})(S^{1/2} V^T)$$
If we were to take only the top $K$ singular values from this matrix, we could again write this as:
$$R \approx (U_{:,1:K}\, S_K^{1/2})(S_K^{1/2}\, V_{:,1:K}^T)$$
Thus we have now effectively separated our ratings matrix into two matrices given by:
$$U_k = U_{:,1:K}\, S_K^{1/2} \quad (\text{users} \times K) \qquad \text{and} \qquad V_k = S_K^{1/2}\, V_{:,1:K}^T \quad (K \times \text{movies})$$
There are many ways to visualize the importance of the $U_k$ and $V_k$ matrices, but with respect to our context of movie ratings, we can visualize these matrices as follows:
We can imagine each row of $U_k$ to be holding some information about how much each user likes a particular feature (feature 1, feature 2, feature 3 ... feature K). On the contrary, we can imagine each column of $V_k$ to be holding some information about how much each movie relates to the given features (feature 1, feature 2, feature 3 ... feature K).
Let's denote the $i^{th}$ row of $U_k$ by $u_i$ and the $j^{th}$ column of $V_k$ by $v_j$. Then the dot-product $u_i \cdot v_j$ can provide us with information on how much a user i likes movie j.
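A rough numpy sketch of this factorization step is shown below (illustrative only; the graded recommender_svd in svd_recommender.py may define U_k and V_k slightly differently, for example in how the singular values are absorbed into the two factors).

import numpy as np

def recommender_svd_sketch(R_filled, k):
    # Return user-feature and feature-movie matrices from a truncated SVD (illustrative).
    U, S, Vt = np.linalg.svd(R_filled, full_matrices=False)
    S_root = np.sqrt(S[:k])                  # split the top-k singular values between the factors
    U_k = U[:, :k] * S_root                  # (num_users x k) user feature vectors
    V_k = S_root[:, None] * Vt[:k, :]        # (k x num_movies) movie feature vectors
    return U_k, V_k

# A predicted rating for user i and movie j is then the dot product U_k[i, :] @ V_k[:, j].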
What have we achieved by doing this?
Starting with a matrix containing very few ratings, we have been able to summarize the sparse matrix of ratings into matrices $U_k$ and $V_k$, which each contain feature vectors about the Users and the Movies. Since these feature vectors are summarized from only the most important K features (by our SVD), we can predict any User-Movie rating that is closer to the actual value than just taking any average rating of a row / column (recall our brute-force solution discussed above).
Now this method in practice is still not close to the state-of-the-art, but for a naive and simple method, we can still build some powerful visualizations, as we will see in part 3.
We have divided the task into 3 parts:
1. Implement recommender_svd to return matrices $U_k$ and $V_k$
2. Implement predict to predict the top 3 movies a given user would watch
3. (Ungraded) Feel free to run the final cell to see some visualizations of the feature vectors you have generated

Hint: Movie IDs are IDs assigned to the movies in the dataset and can be greater than the
number of movies. This is why we have also given movies_index and users_index, which map
between the movie IDs and the indices in the ratings matrix. Please make sure to use these as
well (see the short sketch below).
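For intuition only, here is a tiny hypothetical example of how such ID-to-index mappings are typically used; the dictionaries below are made up, and the real ones are returned by create_ratings_matrix:

# Hypothetical mappings from raw IDs to positions in the ratings matrix R
users_index = {1: 0, 4: 1, 660: 2}        # userId  -> row index of R
movies_index = {2294: 0, 59315: 1}        # movieId -> column index of R

user_id, movie_id = 660, 59315
if movie_id in movies_index:
    u, m = users_index[user_id], movies_index[movie_id]
    # R[u, m] (or U_k[u, :] @ V_k[:, m]) refers to this user-movie pair
    print(u, m)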
In [ ]:
###############################
### DO NOT CHANGE THIS CELL ###
###############################
from svd_recommender import SVDRecommender
from regression import Regression
In [ ]:
###############################
### DO NOT CHANGE THIS CELL ###
###############################
recommender = SVDRecommender()
recommender.load_movie_data()
regression = Regression()

# Read the data into the respective train and test dataframes
train, test = recommender.load_ratings_datasets()
print("---------------------------------------------")
print("Train Dataset Stats:")
print("Shape of train dataset: {}".format(train.shape))
print("Number of unique users (train): {}".format(train["userId"].unique().shape[0]))
print("Number of unique movies (train): {}".format(train["movieId"].unique().shape[0]))
print("Sample of Train Dataset:")
print("------------------------------------------")
print(train.head())
print("------------------------------------------")
print("Test Dataset Stats:")
print("Shape of test dataset: {}".format(test.shape))
print("Number of unique users (test): {}".format(test["userId"].unique().shape[0]))
print("Number of unique movies (test): {}".format(test["movieId"].unique().shape[0]))
print("Sample of Test Dataset:")
print("------------------------------------------")
print(test.head())
print("------------------------------------------")

# We will first convert our dataframe into a matrix of Ratings: R
# R[i][j] will indicate rating for movie:(j) provided by user:(i)
# users_index, movies_index will store the mapping between array indices and actual IDs
R, users_index, movies_index = recommender.create_ratings_matrix(train)
print("Shape of Ratings Matrix (R): {}".format(R.shape))

# Replacing `nan` with average rating given for the movie by all users
# Additionally, zero-centering the array to perform SVD
mask = np.isnan(R)
masked_array = np.ma.masked_array(R, mask)
r_means = np.array(np.mean(masked_array, axis=0))
R_filled = masked_array.filled(r_means)
R_filled = R_filled - r_means
---------------------------------------------
Train Dataset Stats:
Shape of train dataset: (88940, 4)
Number of unique users (train): 671
Number of unique movies (train): 8370
Sample of Train Dataset:
------------------------------------------
userId movieId rating timestamp
0 1 2294 2.0 1260759108
1 1 2455 2.5 1260759113
2 1 3671 3.0 1260759117
3 1 1339 3.5 1260759125
4 1 1343 2.0 1260759131
------------------------------------------
Test Dataset Stats:
Shape of test dataset: (10393, 4)
Number of unique users (test): 671
Number of unique movies (test): 4368
Sample of Test Dataset:
------------------------------------------
userId movieId rating timestamp
0 1 2968 1.0 1260759200
1 1 1405 1.0 1260759203
2 1 1172 4.0 1260759205
3 2 52 3.0 835356031
4 2 314 4.0 835356044
------------------------------------------
Shape of Ratings Matrix (R): (671, 8370)
7.1.1 Implement the recommender_svd method to use SVD for Recommendation [5pts] **[P]**
In the svd_recommender.py file, complete the following function:
recommender_svd: Use the above equations to output $U_k$ and $V_k$. You can utilize the svd
and compress methods from imgcompression.py to retrieve your initial $U$, $\Sigma$, and $V^T$
matrices. Then, calculate $U_k$ and $V_k$ based on the decomposition example above.
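For reference, here is a plain-numpy sketch of what recommender_svd needs to produce. It is only an illustration of the idea: it calls np.linalg.svd directly instead of the svd / compress helpers the assignment asks you to reuse, and the split of $\Sigma_K$ between the two factors is just one common convention.

import numpy as np

def recommender_svd_sketch(R, k):
    # Illustrative only: factor the filled, zero-centered ratings matrix R
    # into U_k (users x k) and V_k (k x movies).
    U, S, Vt = np.linalg.svd(R, full_matrices=False)
    U_k = U[:, :k] * S[:k]   # fold the top-k singular values into the user factors
    V_k = Vt[:k, :]          # k-dimensional profile per movie (columns)
    return U_k, V_k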
Local Test for recommender_svd Function [No Points]
You may test your implementation of the function in the cell below. See Using the Local
Tests for more details.
In [ ]:
###############################
### DO NOT CHANGE THIS CELL ###
###############################
from utilities.localtests import TestSVDRecommender

unittest_svd_rec = TestSVDRecommender()
unittest_svd_rec.test_recommender_svd()
UnitTest passed successfully for "recommender_svd() function"!
In [ ]:
###############################
### DO NOT CHANGE THIS CELL ###
###############################
# Implement the method `recommender_svd` and run it for the following values of features
no_of_features = [2, 3, 8, 15, 18, 25, 30]
test_errors = []
for k in no_of_features:
    U_k, V_k = recommender.recommender_svd(R_filled, k)
    pred = []  # to store the predicted ratings
    for _, row in test.iterrows():
        user = row["userId"]
        movie = row["movieId"]
        u_index = users_index[user]
        # If we have a prediction for this movie, use that
        if movie in movies_index:
            m_index = movies_index[movie]
            pred_rating = np.dot(U_k[u_index, :], V_k[:, m_index]) + r_means[m_index]
        # Else, use an average of the users ratings
        else:
            pred_rating = np.mean(np.dot(U_k[u_index], V_k)) + r_means[m_index]
        pred.append(pred_rating)
    test_error = regression.rmse(test["rating"], pred)
    test_errors.append(test_error)
    print("RMSE for k = {} --> {}".format(k, test_error))

RMSE for k = 2 --> 1.0223035413708281
RMSE for k = 3 --> 1.022526649417955
RMSE for k = 8 --> 1.0182709203352787
RMSE for k = 15 --> 1.017307118738714
RMSE for k = 18 --> 1.0166562048687973
RMSE for k = 25 --> 1.0182856984912254
RMSE for k = 30 --> 1.0186282488126601
Plot the Test Error over the different values of k
In [ ]:
###############################
### DO NOT CHANGE THIS CELL ###
###############################
fig = plt.figure()
plt.plot(no_of_features, test_errors, "bo")
plt.plot(no_of_features, test_errors)
plt.xlabel("Value for k")
plt.ylabel("RMSE on Test Dataset")
plt.title("SVD Recommendation Test Error with Different k values")
if not STUDENT_VERSION:
    fig.text(
        0.5,
        0.5,
        EO_TEXT,
        transform=fig.transFigure,
        fontsize=EO_SIZE / 2,
        color=EO_COLOR,
        alpha=EO_ALPHA * 0.5,
        fontname=EO_FONT,
        ha="center",
        va="center",
        rotation=EO_ROT,
    )
plt.show()
7.1.2 Implement the predict method to find which movie a user is interested in watching next [5pts] **[P]**
Our goal here is to predict movies that a user would be interested in watching next. Since
our dataset contains a large list of movies and our model is very naive, filtering this huge set
for the top 3 movies can produce results that may not be immediately intuitive.
Therefore, we'll restrict this prediction to only the movies in a subset given by movies_pool.
Let us consider a user (ID: 660) who has already watched and rated highly (>3) the
following movies:
Iron Man (2008)
Thor: The Dark World (2013)
Avengers, The (2012)
The following cell tries to predict which of the movies in the list below the user
would be most interested in watching next:
movies_pool:
Ant-Man (2015)
Iron Man 2 (2010)
Avengers: Age of Ultron (2015)
Thor (2011)
Captain America: The First Avenger (2011)
Man of Steel (2013)
Star Wars: Episode IV - A New Hope (1977)
Ladybird Ladybird (1994)
Man of the House (1995)
Jungle Book, The (1994)
In the svd_recommender.py file, complete the following function:
predict: Predict the next 3 movies that the user would be most interested in watching
among the ones above.
HINT: You can use the method get_movie_id_by_name to convert movie names into
movie IDs and vice-versa.
NOTE: The user may have already watched and rated some of the movies in movies_pool.
Remember to filter these out before returning the output. The original Ratings Matrix $R$
might come in handy here, along with np.isnan.
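As a rough sketch of the prediction logic described above (the function name, signature, and arguments below are assumptions made for illustration; the real predict is defined in svd_recommender.py and may differ):

import numpy as np

def predict_sketch(R, U_k, V_k, users_index, movies_index,
                   user_id, movies_pool, get_movie_id_by_name, top_n=3):
    # Illustrative only: score each candidate movie for one user, skip movies
    # the user has already rated (non-nan in R), and return the top-n titles.
    u = users_index[user_id]
    scores = []
    for title in movies_pool:
        m = movies_index[get_movie_id_by_name(title)]
        if not np.isnan(R[u, m]):   # already watched/rated -> filter out
            continue
        scores.append((np.dot(U_k[u, :], V_k[:, m]), title))
    scores.sort(reverse=True)
    return [title for _, title in scores[:top_n]]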
Local Test for predict Function [No Points]
You may test your implementation of the function in the cell below. See Using the Local
Tests for more details.
In [ ]:
###############################
### DO NOT CHANGE THIS CELL ###
###############################
unittest_svd_rec.test_predict()

Top 3 Movies the User would want to watch:
Captain America: The First Avenger (2011)
Ant-Man (2015)
Avengers: Age of Ultron (2015)
--------------------------------------------------------------
UnitTest passed successfully for "predict() function"!
7.2 Visualize Movie Vectors [No Points]
Our model is still a very naive model, but it can still be used for some powerful analysis, such
as clustering similar movies together based on users' ratings.
We have said that the $V_k$ matrix we generated above contains information about
movies. That is, each column in $V_k$ contains (feature 1, feature 2, ..., feature $K$) for each
movie. In other words, $V_k$ gives us a feature vector (of length $K$) for each movie that we
can visualize in a $K$-dimensional space. For example, using this feature vector, we can find
out which movies are similar to each other and which differ.
While we would love to visualize a $K$-dimensional space, the constraints of our 2D screen
wouldn't really allow us to do so. Instead, let us set $K = 2$ and try to plot the feature vectors
for just a couple of these movies.
As a fun activity, run the following cell to visualize how our model separates the two sets of
movies given below.
NOTE:
There are 2 possible visualizations. Your plot could be the one that's given on the
expected PDF or the one where the y-coordinates are inverted (the SVD is only determined up
to the sign of each singular-vector pair, so a flipped component is equally valid).
In [ ]:
###############################
### DO NOT CHANGE THIS CELL ###
###############################
marvel_movies = [
    "Thor: The Dark World (2013)",
    "Avengers: Age of Ultron (2015)",
    "Ant-Man (2015)",
    "Iron Man 2 (2010)",
    "Avengers, The (2012)",
    "Thor (2011)",
    "Captain America: The First Avenger (2011)",
]
marvel_labels = ["Blue"] * len(marvel_movies)
star_wars_movies = [
    "Star Wars: Episode IV - A New Hope (1977)",
    "Star Wars: Episode V - The Empire Strikes Back (1980)",
    "Star Wars: Episode VI - Return of the Jedi (1983)",
    "Star Wars: Episode I - The Phantom Menace (1999)",
    "Star Wars: Episode II - Attack of the Clones (2002)",
    "Star Wars: Episode III - Revenge of the Sith (2005)",
]
star_wars_labels = ["Green"] * len(star_wars_movies)
movie_titles = star_wars_movies + marvel_movies
genre_labels = star_wars_labels + marvel_labels
movie_indices = [
    movies_index[recommender.get_movie_id_by_name(str(x))] for x in movie_titles
]
_, V_k = recommender.recommender_svd(R_filled, k=2)
x, y = V_k[0, movie_indices], V_k[1, movie_indices]
fig = plt.figure()
plt.scatter(x, y, c=genre_labels)
for i, movie_name in enumerate(movie_titles):
    plt.annotate(movie_name, (x[i], y[i]))
if not STUDENT_VERSION:
    fig.text(
        0.5,
        0.5,
        EO_TEXT,
        transform=fig.transFigure,
        fontsize=EO_SIZE / 2,
        color=EO_COLOR,
        alpha=EO_ALPHA * 0.5,
        fontname=EO_FONT,
        ha="center",
        va="center",
        rotation=EO_ROT,
    )