Okafor_Odera.HW2_report
pdf
keyboard_arrow_up
School
Georgia Institute Of Technology *
*We aren’t endorsed by this school
Course
6740
Subject
Mathematics
Date
Feb 20, 2024
Type
Pages
1
Uploaded by ElderJaguarPerson412
1.Conceptual Questions - photo
2 PCA Food Consuption in European Countries
The dataset represent food consumption. Each rows is a country and each column represents a specific food.Each cell in every column represents a specific food item to each country row.In the above
scatterplot you can see that PCA was performed by treating each country's food consumption as "feature".You can see cluster patterns with Denmark,Swedwn,Norway,Finland,Sweden are clustered
together.Spain,Italy,Portugul are also clustered together.Theses countries may have the same dietary consumption patterns.
The dataset treated country consumptions as feature vectors and performs PCA on them.In the scatterplot view the data is roughly clustered into two-three larger groupings.The one side seems to represent
fresher food and the other side seems to represent frozen foods or refrigerated foods with the exception of potatoes.The middle clusters seems to group non-perishable with outlier for garlic and olive oil.I think
the outliers are tied to consumption of garlic in differnt regions.Garlic and olive oil may be consumed alot in some countires and not nearly as much on others.
3 Order of faces using ISOMAP
In this question we will approach nonlinear dimensionality reduction using ISOMAP.Our goal is to visualize the image data in a low-dimensional space and build insite from the mappings.Since the dataset is non-
linear PCA and MDS cannot be used until reduced.The ISOMAP proccess requires three name steps.First we would create the Adjacency matrix and Distance Matrix.Then we would create a centering matrix
that will modify our distance matrix.From there we could create a kernal matrix from our centered matrix and distance matrix. Our last step would use eigendecomposition to find our eignvalues and eignvectors
from our kernal matrix.
A.)The visualization of the weighted adjacency matrix is shown above.After creating and zero matrix and implementing the Euclidean distance between each point.The adjacency matrix is created by setting a
threshold for each face so that it connects to at least the desired nearest neighbors.The threshold use 13.9
B.)The photos are cluster by face position.The scatterplot represents an natural change of direction.Moving across the horizantal axis, the images show headmovment from left to right.Moving acrosss the
vertical axis,it shows head movement looking up and down.The distinguishes are clearer towards the edge of the scatterplot.
Perform PCA (you can now use your implementation written in Question 1) on the images and project them into the top 2 principal components. Again show them on a scatter plot. Explain whether or you see a
more meaningful projection using ISOMAP than PCA.
C.)Comparing the two plots,PCA show similar results but has more outliers on its plot.ISOMAP provides more meaningful projections.The outliers show faces with differnt directions facing each other.PCA looks
like an arrangement by saturated color and doesnt correctly represent the geodesic distance between images.
1. Eigenfaces and Simple Face Recognition
The above plots show the top 6 eigenvalues.The images are from left to right.It seems like some of the eigenfaces have some correlation to one another and some do not.The first two eigenfaces look very
similar with lighting on the right vs left side for Subject 1.For Subject 2 the first two eigenfaces look the same with different levels of saturation. Larger eigenvalues have more recognizable features and lower
eigenvalues create an image with more distortion and become visibly less pleasing.
Projected Residual
S11: 106442692264.84364
S21: 151902624321.72995
S12: 176443617103.59607
S22: 228289385704.42532
The projection residual score measures the similarity between the two vectors.The cosine angle of the two vectors are check for directional similarities.We can compare perfomace by reviewing these
scores.Since S22 is great than S21 we can confirm that it belong to subject 2.Therefore since S12 is larger than S11,we can assume that it belongs to subject 1.
Part C
The results seems higher than i was expecting.I think the accuracy could have been improved.Using more images or more eignfaces or even more subject could have help.More test data could have help or
maybe a different scoring method.
5.To subtract or not to subtract, that is the question -photo
In [1]:
import
numpy as
np
import
math
import
matplotlib.pyplot as
plt
import
scipy.io as
spio
import
scipy.sparse.linalg as
ll
import
sklearn.preprocessing as
skpp
import
pandas as
pd
#Part A
#Read in the data
food =
pd
.
read_csv
(
"data/food-consumption.csv"
)
f =
np
.
array
(
food
[
food
.
columns
[
1
:]])
countries =
food
[
"Country"
]
# print(food)
#indicator matrix
# print(Inew)
m
,
n =
f
.
shape
#normalize the data
# print(f.T)
f
=
f
.
T
#PCA
mu =
np
.
mean
(
f
,
axis =
1
,
keepdims =
True
)
f =
f -
mu
C =
np
.
dot
(
f
,
f
.
T
)
/
m
K
=
2
#Find k eigenvalues and eigenvectors of the square matrix A.
S
,
W =
ll
.
eigs
(
C
,
k =
K
)
S =
S
.
real
W =
W
.
real
dim1 =
np
.
dot
(
W
[:,
0
]
.
T
,
f
)
/
math
.
sqrt
(
S
[
0
]) # extract the 1st principal component
dim2 =
np
.
dot
(
W
[:,
1
]
.
T
,
f
)
/
math
.
sqrt
(
S
[
1
]) # extract the 2nd principal component
fig
, ax =
plt
.
subplots
(
figsize
=
(
10
, 10
))
ax
.
scatter
(
dim1
,
dim2
)
for
i in
range
(
len
(
countries
)):
ax
.
annotate
(
countries
[
i
], (
dim1
[
i
], dim2
[
i
]))
plt
.
title
(
"Q4PartA"
)
plt
.
show
()
In [2]:
import
numpy as
np
import
math
import
matplotlib.pyplot as
plt
import
scipy.io as
spio
import
scipy.sparse.linalg as
ll
import
sklearn.preprocessing as
skpp
import
pandas as
pd
#PartB
food =
pd
.
read_csv
(
"data/food-consumption.csv"
)
f =
(
np
.
array
(
food
[
food
.
columns
[
1
:]]))
food_items =
food
.
columns
[
1
:]
# print(food)
#indicator matrix
# print(Inew)
m
,
n =
f
.
shape
# print(f.T)
#PCA
mu =
np
.
mean
(
f
,
axis =
1
,
keepdims =
True
)
f =
f -
mu
C =
np
.
dot
(
f
,
f
.
T
)
/
m
K
=
2
#Find k eigenvalues and eigenvectors of the square matrix A.
S
,
W =
ll
.
eigs
(
C
,
k =
K
)
S =
S
.
real
W =
W
.
real
dim1 =
np
.
dot
(
W
[:,
0
]
.
T
,
f
)
/
math
.
sqrt
(
S
[
0
]) # extract the 1st principal component
dim2 =
np
.
dot
(
W
[:,
1
]
.
T
,
f
)
/
math
.
sqrt
(
S
[
1
]) # extract the 2nd principal component
fig
, ax =
plt
.
subplots
(
figsize
=
(
10
, 10
))
ax
.
scatter
(
dim1
,
dim2
)
for
i in
range
(
len
(
food_items
)):
ax
.
annotate
(
food_items
[
i
], (
dim1
[
i
], dim2
[
i
]))
plt
.
title
(
"Q4PartB"
)
plt
.
show
()
In [3]:
#Part A
import
numpy as
np
from
matplotlib import
pyplot as
plt
from
sklearn.metrics import
pairwise_distances
from
matplotlib.offsetbox import
OffsetImage
, AnnotationBbox
import
scipy.io
import
pandas as
pd
import
networkx as
nx
from
scipy.spatial.distance import
cdist
#3A.Visualize the nearest neighbor graph
#no epsilon adjustment
images =
scipy
.
io
.
loadmat
(
"data/isomap.mat"
)[
"images"
]
.
T
#print(x["images"].shape)
m
,
n =
images
.
shape
#l2 norm
dist =
cdist
(
images
,
images
,
metric =
"euclidean"
)
plt
.
imshow
(
dist
,)
plt
.
colorbar
()
plt
.
title
(
"No Epsilon Adjustment"
)
plt
.
show
()
In [ ]:
In [4]:
#Part A,B
#Implement the ISOMAP algorithm
import
scipy.io as
spio
import
sklearn.utils.graph_shortest_path as
sp
import
matplotlib.pyplot as
plt
from
matplotlib.offsetbox import
OffsetImage
, AnnotationBbox
import
random as
rand from
scipy.spatial.distance import
cdist
from
sklearn.utils.graph_shortest_path import
graph_shortest_path
images =
scipy
.
io
.
loadmat
(
"data/isomap.mat"
)[
"images"
]
.
T
m
,
n =
images
.
shape
#Creating weighted Adjacency Matrix A =
np
.
zeros
(
shape =
(
m
,
m
))
dist =
cdist
(
images
,
images
,
metric =
"euclidean"
)
for
i in
range
(
m
):
# find the threshold (epsilon) threshold =
np
.
partition
(
dist
[
i
],
101
)[
100
] #print(threshold)
A_ij =
dist <
threshold
A
[
A_ij
] =
dist
[
A_ij
]
#Visualize the matrix
plt
.
imshow
(
A
,
cmap =
'YlGnBu'
)
plt
.
title
(
"Eplison at {}"
.
format
(
threshold
))
plt
.
colorbar
()
plt
.
show
()
#shortest path distance matrix
D =
graph_shortest_path
(
A
)
#Compute Centering matrix, H = I-(1/m)11.T, C = (-1/2)HD^2H
H =
np
.
eye
(
m
) -
np
.
ones
((
m
,
m
))
/
m
C =
np
.
matmul
(
H
,
D
*
D
)
C =
np
.
matmul
(
C
,
H
)
C =
-
C
/
2
C =
(
C +
C
.
T
)
/
2
#Find eigenvalues and eigenvectors S
,
W =
ll
.
eigs
(
C
,
k =
2
)
S =
S
.
real
W =
W
.
real
dim1 =
W
[:,
0
]
*
math
.
sqrt
(
S
[
0
]) # extract the 1st principal component
dim2 =
W
[:,
1
]
*
math
.
sqrt
(
S
[
1
]) # extract the 2nd principal component
#Graph ISOMAP fig
,
ax fig
, ax =
plt
.
subplots
(
figsize
=
(
10
,
10
))
ax
.
scatter
(
dim1
,
dim2
)
# Add photos
sample =
rand
.
sample
(
range
(
m
), 40
)
for
i in
sample
:
img =
images
[
i
,:]
.
reshape
(
64
, 64
)
.
T
ab =
AnnotationBbox
(
OffsetImage
(
img
, cmap =
'gray_r'
,
zoom
=
0.5
,), (
dim1
[
i
],
dim2
[
i
]) ,
pad
=
0.1
)
ax
.
add_artist
(
ab
)
ax
.
scatter
(
dim1
,
dim2
) In [5]:
#Part C
import
scipy.io as
spio
import
sklearn.utils.graph_shortest_path as
sp
import
matplotlib.pyplot as
plt
from
matplotlib.offsetbox import
OffsetImage
, AnnotationBbox
import
random as
rand from
scipy.spatial.distance import
cdist
from
sklearn.utils.graph_shortest_path import
graph_shortest_path
#perform PCA
img =
images
.
T
mu =
np
.
mean
(
img
,
axis =
1
,
keepdims =
True
)
img =
img -
mu
C =
np
.
dot
(
img
,
img
.
T
)
/
m
K
=
2
#Find k eigenvalues and eigenvectors
S
,
W =
ll
.
eigs
(
C
,
k =
K
)
S =
S
.
real
W =
W
.
real
dim1 =
np
.
dot
(
W
[:,
0
]
.
T
,
img
)
/
math
.
sqrt
(
S
[
0
]) # extract the 1st principal component
dim2 =
np
.
dot
(
W
[:,
1
]
.
T
,
img
)
/
math
.
sqrt
(
S
[
1
]) # extract the 2nd principal component
#Graph
fig
,
ax fig
, ax =
plt
.
subplots
(
figsize
=
(
10
,
10
))
ax
.
scatter
(
dim1
,
dim2
)
# Add photos
sample =
rand
.
sample
(
range
(
m
), 40
)
for
i in
sample
:
img =
images
[
i
,:]
.
reshape
(
64
, 64
)
.
T
ab =
AnnotationBbox
(
OffsetImage
(
img
, cmap =
'gray_r'
,
zoom
=
0.6
,), (
dim1
[
i
],
dim2
[
i
]) ,
pad
=
0.1
)
ax
.
add_artist
(
ab
)
ax
.
scatter
(
dim1
,
dim2
) In [6]:
#Part A
import
numpy as
np
from
scipy.linalg import
svd
import
math
import
matplotlib.pyplot as
plt
import
matplotlib.image as
mpl_img
from
PIL import
Image
#Read in data for part A
files =
[
"data/yalefaces/subject01.glasses.gif"
,
"data/yalefaces/subject01.happy.gif"
,
"data/yalefaces/subject01.leftlight.gif"
,
"data/yalefaces/subject01.noglasses.gif"
,
"data/yalefaces/subject01.normal.gif"
,
"data/yalefaces/subject01.rightlight.gif"
,
"data/yalefaces/subject01.sad.gif"
,
"data/yalefaces/subject01.sleepy.gif"
,
"data/yalefaces/subject01.surprised.gif"
,
"data/yalefaces/subject01.wink.gif"
,
"data/yalefaces/subject02.glasses.gif"
,
"data/yalefaces/subject02.happy.gif"
,
"data/yalefaces/subject02.leftlight.gif"
,
"data/yalefaces/subject02.noglasses.gif"
,
"data/yalefaces/subject02.normal.gif"
,
"data/yalefaces/subject02.rightlight.gif"
,
"data/yalefaces/subject02.sad.gif"
,
"data/yalefaces/subject02.sleepy.gif"
,
"data/yalefaces/subject02.wink.gif"
]
S1 =
[]
S2 =
[]
S1_D =
[]
S2_D =
[]
for
file_name in
files
: image =
plt
.
imread
(
file_name
)
if
'1' in
file_name
:
S1
.
append
(
image
)
else
:
S2
.
append
(
image
)
def
downsample
(
image
):
m =
image
.
shape
[
0
]
n =
image
.
shape
[
1
]
width =
n //
4
height =
m //
4
downsample =
np
.
zeros
(
shape
=
(
height
, width
), dtype
=
np
.
uint8
)
for
i in
range
(
0
, m
-
4
, 4
):
for
j in
range
(
0
, n
-
4
, 4
):
downsample
[
i
//
4
, j
//
4
] =
np
.
mean
(
image
[
i
:
i +
4
, j
:
j
+
4
], axis
=
(
0
, 1
))
downsample =
downsample
.
astype
(
np
.
uint8
)
return
downsample
for
image in
S1
:
down_image1 =
downsample
(
image
)
down_image1 =
down_image1
.
flatten
()
S1_D
.
append
(
down_image1
)
for
image in
S2
:
down_image2 =
downsample
(
image
)
down_image2 =
down_image2
.
flatten
()
S2_D
.
append
(
down_image2
)
S1_D =
np
.
array
(
S1_D
)
S2_D =
np
.
array
(
S2_D
)
m1
,
n1 =
S1_D
.
shape
m2
,
n2 =
S2_D
.
shape
X1 =
S1_D
.
T X2 =
S2_D
.
T
mu1 =
np
.
mean
(
X1
,
axis =
1
, keepdims =
True
)
mu2 =
np
.
mean
(
X2
,
axis =
1
, keepdims =
True
)
X1 =
X1 -
mu1
X2 =
X2 -
mu2
U1
,
S1
,
V1 =
svd
(
X1
)
U2
,
S2
,
V2 =
svd
(
X2
)
#top 6 eigenfaces
W1 =
U1
[:,
0
:
6
]
W2 =
U2
[:,
0
:
6
]
def
pltfaces
(
W
,
sub
):
k =
W
.
shape
[
1
]
fig
,
axs =
plt
.
subplots
(
1
,
k
, figsize
=
(
14
,
2
),
facecolor =
'w'
,
edgecolor =
'k'
)
axs =
axs
.
ravel
()
for
i in
range
(
k
):
image =
W
[:,
i
]
.
reshape
((
60
,
80
))
axs
[
i
]
.
imshow
(
image
,
cmap =
'gray'
)
plt
.
title
(
label =
"Eigenface for Subject "
+
sub
,
fontsize =
20
,
loc =
"right"
) plt
.
show
()
pltfaces
(
W1
,
str
(
1
))
pltfaces
(
W2
,
str
(
2
))
In [7]:
#Part B
S1_DT =
downsample
(
plt
.
imread
(
"data/yalefaces/subject01-test.gif"
))
.
flatten
()
S2_DT =
downsample
(
plt
.
imread
(
"data/yalefaces/subject02-test.gif"
))
.
flatten
()
S1_DT =
np
.
array
(
S1_DT
)
S2_DT =
np
.
array
(
S2_DT
)
# Mean Center
S1_DT =
(
S1_DT -
mu1
)
S2_DT =
(
S2_DT -
mu2
)
# Compute Residuals
s11 =
np
.
linalg
.
norm
(
S1_DT -
(
np
.
dot
(
W1
,
W1
.
T
))
@S1_DT
)
**
2
s12 =
np
.
linalg
.
norm
(
S2_DT -
(
np
.
dot
(
W1
,
W1
.
T
))
@S2_DT
)
**
2
s21 =
np
.
linalg
.
norm
(
S1_DT -
(
np
.
dot
(
W2
,
W2
.
T
))
@S1_DT
)
**
2
s22 =
np
.
linalg
.
norm
(
S2_DT -
(
np
.
dot
(
W2
,
W2
.
T
))
@S2_DT
)
**
2
In [8]:
print
(
"Projected Residual"
)
print
(
"S11:"
,
s11
)
print
(
"S21:"
,
s21
)
print
(
"S12:"
,
s12
)
print
(
"S22:"
,
s22
)
In [ ]:
Discover more documents: Sign up today!
Unlock a world of knowledge! Explore tailored content for a richer learning experience. Here's what you'll get:
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help