Okafor_Odera.HW2_report

pdf

School

Georgia Institute Of Technology *

*We aren’t endorsed by this school

Course

6740

Subject

Mathematics

Date

Feb 20, 2024

Type

pdf

Pages

Uploaded by ElderJaguarPerson412

1.Conceptual Questions - photo 2 PCA Food Consuption in European Countries The dataset represent food consumption. Each rows is a country and each column represents a specific food.Each cell in every column represents a specific food item to each country row.In the above scatterplot you can see that PCA was performed by treating each country's food consumption as "feature".You can see cluster patterns with Denmark,Swedwn,Norway,Finland,Sweden are clustered together.Spain,Italy,Portugul are also clustered together.Theses countries may have the same dietary consumption patterns. The dataset treated country consumptions as feature vectors and performs PCA on them.In the scatterplot view the data is roughly clustered into two-three larger groupings.The one side seems to represent fresher food and the other side seems to represent frozen foods or refrigerated foods with the exception of potatoes.The middle clusters seems to group non-perishable with outlier for garlic and olive oil.I think the outliers are tied to consumption of garlic in differnt regions.Garlic and olive oil may be consumed alot in some countires and not nearly as much on others. 3 Order of faces using ISOMAP In this question we will approach nonlinear dimensionality reduction using ISOMAP.Our goal is to visualize the image data in a low-dimensional space and build insite from the mappings.Since the dataset is non- linear PCA and MDS cannot be used until reduced.The ISOMAP proccess requires three name steps.First we would create the Adjacency matrix and Distance Matrix.Then we would create a centering matrix that will modify our distance matrix.From there we could create a kernal matrix from our centered matrix and distance matrix. Our last step would use eigendecomposition to find our eignvalues and eignvectors from our kernal matrix. A.)The visualization of the weighted adjacency matrix is shown above.After creating and zero matrix and implementing the Euclidean distance between each point.The adjacency matrix is created by setting a threshold for each face so that it connects to at least the desired nearest neighbors.The threshold use 13.9 B.)The photos are cluster by face position.The scatterplot represents an natural change of direction.Moving across the horizantal axis, the images show headmovment from left to right.Moving acrosss the vertical axis,it shows head movement looking up and down.The distinguishes are clearer towards the edge of the scatterplot. Perform PCA (you can now use your implementation written in Question 1) on the images and project them into the top 2 principal components. Again show them on a scatter plot. Explain whether or you see a more meaningful projection using ISOMAP than PCA. C.)Comparing the two plots,PCA show similar results but has more outliers on its plot.ISOMAP provides more meaningful projections.The outliers show faces with differnt directions facing each other.PCA looks like an arrangement by saturated color and doesnt correctly represent the geodesic distance between images. 1. Eigenfaces and Simple Face Recognition The above plots show the top 6 eigenvalues.The images are from left to right.It seems like some of the eigenfaces have some correlation to one another and some do not.The first two eigenfaces look very similar with lighting on the right vs left side for Subject 1.For Subject 2 the first two eigenfaces look the same with different levels of saturation. Larger eigenvalues have more recognizable features and lower eigenvalues create an image with more distortion and become visibly less pleasing. Projected Residual S11: 106442692264.84364 S21: 151902624321.72995 S12: 176443617103.59607 S22: 228289385704.42532 The projection residual score measures the similarity between the two vectors.The cosine angle of the two vectors are check for directional similarities.We can compare perfomace by reviewing these scores.Since S22 is great than S21 we can confirm that it belong to subject 2.Therefore since S12 is larger than S11,we can assume that it belongs to subject 1. Part C The results seems higher than i was expecting.I think the accuracy could have been improved.Using more images or more eignfaces or even more subject could have help.More test data could have help or maybe a different scoring method. 5.To subtract or not to subtract, that is the question -photo In [1]: import numpy as np import math import matplotlib.pyplot as plt import scipy.io as spio import scipy.sparse.linalg as ll import sklearn.preprocessing as skpp import pandas as pd #Part A #Read in the data food = pd . read_csv ( "data/food-consumption.csv" ) f = np . array ( food [ food . columns [ 1 :]]) countries = food [ "Country" ] # print(food) #indicator matrix # print(Inew) m , n = f . shape #normalize the data # print(f.T) f = f . T #PCA mu = np . mean ( f , axis = 1 , keepdims = True ) f = f - mu C = np . dot ( f , f . T ) / m K = 2 #Find k eigenvalues and eigenvectors of the square matrix A. S , W = ll . eigs ( C , k = K ) S = S . real W = W . real dim1 = np . dot ( W [:, 0 ] . T , f ) / math . sqrt ( S [ 0 ]) # extract the 1st principal component dim2 = np . dot ( W [:, 1 ] . T , f ) / math . sqrt ( S [ 1 ]) # extract the 2nd principal component fig , ax = plt . subplots ( figsize = ( 10 , 10 )) ax . scatter ( dim1 , dim2 ) for i in range ( len ( countries )): ax . annotate ( countries [ i ], ( dim1 [ i ], dim2 [ i ])) plt . title ( "Q4PartA" ) plt . show () In [2]: import numpy as np import math import matplotlib.pyplot as plt import scipy.io as spio import scipy.sparse.linalg as ll import sklearn.preprocessing as skpp import pandas as pd #PartB food = pd . read_csv ( "data/food-consumption.csv" ) f = ( np . array ( food [ food . columns [ 1 :]])) food_items = food . columns [ 1 :] # print(food) #indicator matrix # print(Inew) m , n = f . shape # print(f.T) #PCA mu = np . mean ( f , axis = 1 , keepdims = True ) f = f - mu C = np . dot ( f , f . T ) / m K = 2 #Find k eigenvalues and eigenvectors of the square matrix A. S , W = ll . eigs ( C , k = K ) S = S . real W = W . real dim1 = np . dot ( W [:, 0 ] . T , f ) / math . sqrt ( S [ 0 ]) # extract the 1st principal component dim2 = np . dot ( W [:, 1 ] . T , f ) / math . sqrt ( S [ 1 ]) # extract the 2nd principal component fig , ax = plt . subplots ( figsize = ( 10 , 10 )) ax . scatter ( dim1 , dim2 ) for i in range ( len ( food_items )): ax . annotate ( food_items [ i ], ( dim1 [ i ], dim2 [ i ])) plt . title ( "Q4PartB" ) plt . show () In [3]: #Part A import numpy as np from matplotlib import pyplot as plt from sklearn.metrics import pairwise_distances from matplotlib.offsetbox import OffsetImage , AnnotationBbox import scipy.io import pandas as pd import networkx as nx from scipy.spatial.distance import cdist #3A.Visualize the nearest neighbor graph #no epsilon adjustment images = scipy . io . loadmat ( "data/isomap.mat" )[ "images" ] . T #print(x["images"].shape) m , n = images . shape #l2 norm dist = cdist ( images , images , metric = "euclidean" ) plt . imshow ( dist ,) plt . colorbar () plt . title ( "No Epsilon Adjustment" ) plt . show () In [ ]: In [4]: #Part A,B #Implement the ISOMAP algorithm import scipy.io as spio import sklearn.utils.graph_shortest_path as sp import matplotlib.pyplot as plt from matplotlib.offsetbox import OffsetImage , AnnotationBbox import random as rand from scipy.spatial.distance import cdist from sklearn.utils.graph_shortest_path import graph_shortest_path images = scipy . io . loadmat ( "data/isomap.mat" )[ "images" ] . T m , n = images . shape #Creating weighted Adjacency Matrix A = np . zeros ( shape = ( m , m )) dist = cdist ( images , images , metric = "euclidean" ) for i in range ( m ): # find the threshold (epsilon) threshold = np . partition ( dist [ i ], 101 )[ 100 ] #print(threshold) A_ij = dist < threshold A [ A_ij ] = dist [ A_ij ] #Visualize the matrix plt . imshow ( A , cmap = 'YlGnBu' ) plt . title ( "Eplison at {}" . format ( threshold )) plt . colorbar () plt . show () #shortest path distance matrix D = graph_shortest_path ( A ) #Compute Centering matrix, H = I-(1/m)11.T, C = (-1/2)HD^2H H = np . eye ( m ) - np . ones (( m , m )) / m C = np . matmul ( H , D * D ) C = np . matmul ( C , H ) C = - C / 2 C = ( C + C . T ) / 2 #Find eigenvalues and eigenvectors S , W = ll . eigs ( C , k = 2 ) S = S . real W = W . real dim1 = W [:, 0 ] * math . sqrt ( S [ 0 ]) # extract the 1st principal component dim2 = W [:, 1 ] * math . sqrt ( S [ 1 ]) # extract the 2nd principal component #Graph ISOMAP fig , ax fig , ax = plt . subplots ( figsize = ( 10 , 10 )) ax . scatter ( dim1 , dim2 ) # Add photos sample = rand . sample ( range ( m ), 40 ) for i in sample : img = images [ i ,:] . reshape ( 64 , 64 ) . T ab = AnnotationBbox ( OffsetImage ( img , cmap = 'gray_r' , zoom = 0.5 ,), ( dim1 [ i ], dim2 [ i ]) , pad = 0.1 ) ax . add_artist ( ab ) ax . scatter ( dim1 , dim2 ) In [5]: #Part C import scipy.io as spio import sklearn.utils.graph_shortest_path as sp import matplotlib.pyplot as plt from matplotlib.offsetbox import OffsetImage , AnnotationBbox import random as rand from scipy.spatial.distance import cdist from sklearn.utils.graph_shortest_path import graph_shortest_path #perform PCA img = images . T mu = np . mean ( img , axis = 1 , keepdims = True ) img = img - mu C = np . dot ( img , img . T ) / m K = 2 #Find k eigenvalues and eigenvectors S , W = ll . eigs ( C , k = K ) S = S . real W = W . real dim1 = np . dot ( W [:, 0 ] . T , img ) / math . sqrt ( S [ 0 ]) # extract the 1st principal component dim2 = np . dot ( W [:, 1 ] . T , img ) / math . sqrt ( S [ 1 ]) # extract the 2nd principal component #Graph fig , ax fig , ax = plt . subplots ( figsize = ( 10 , 10 )) ax . scatter ( dim1 , dim2 ) # Add photos sample = rand . sample ( range ( m ), 40 ) for i in sample : img = images [ i ,:] . reshape ( 64 , 64 ) . T ab = AnnotationBbox ( OffsetImage ( img , cmap = 'gray_r' , zoom = 0.6 ,), ( dim1 [ i ], dim2 [ i ]) , pad = 0.1 ) ax . add_artist ( ab ) ax . scatter ( dim1 , dim2 ) In [6]: #Part A import numpy as np from scipy.linalg import svd import math import matplotlib.pyplot as plt import matplotlib.image as mpl_img from PIL import Image #Read in data for part A files = [ "data/yalefaces/subject01.glasses.gif" , "data/yalefaces/subject01.happy.gif" , "data/yalefaces/subject01.leftlight.gif" , "data/yalefaces/subject01.noglasses.gif" , "data/yalefaces/subject01.normal.gif" , "data/yalefaces/subject01.rightlight.gif" , "data/yalefaces/subject01.sad.gif" , "data/yalefaces/subject01.sleepy.gif" , "data/yalefaces/subject01.surprised.gif" , "data/yalefaces/subject01.wink.gif" , "data/yalefaces/subject02.glasses.gif" , "data/yalefaces/subject02.happy.gif" , "data/yalefaces/subject02.leftlight.gif" , "data/yalefaces/subject02.noglasses.gif" , "data/yalefaces/subject02.normal.gif" , "data/yalefaces/subject02.rightlight.gif" , "data/yalefaces/subject02.sad.gif" , "data/yalefaces/subject02.sleepy.gif" , "data/yalefaces/subject02.wink.gif" ] S1 = [] S2 = [] S1_D = [] S2_D = [] for file_name in files : image = plt . imread ( file_name ) if '1' in file_name : S1 . append ( image ) else : S2 . append ( image ) def downsample ( image ): m = image . shape [ 0 ] n = image . shape [ 1 ] width = n // 4 height = m // 4 downsample = np . zeros ( shape = ( height , width ), dtype = np . uint8 ) for i in range ( 0 , m - 4 , 4 ): for j in range ( 0 , n - 4 , 4 ): downsample [ i // 4 , j // 4 ] = np . mean ( image [ i : i + 4 , j : j + 4 ], axis = ( 0 , 1 )) downsample = downsample . astype ( np . uint8 ) return downsample for image in S1 : down_image1 = downsample ( image ) down_image1 = down_image1 . flatten () S1_D . append ( down_image1 ) for image in S2 : down_image2 = downsample ( image ) down_image2 = down_image2 . flatten () S2_D . append ( down_image2 ) S1_D = np . array ( S1_D ) S2_D = np . array ( S2_D ) m1 , n1 = S1_D . shape m2 , n2 = S2_D . shape X1 = S1_D . T X2 = S2_D . T mu1 = np . mean ( X1 , axis = 1 , keepdims = True ) mu2 = np . mean ( X2 , axis = 1 , keepdims = True ) X1 = X1 - mu1 X2 = X2 - mu2 U1 , S1 , V1 = svd ( X1 ) U2 , S2 , V2 = svd ( X2 ) #top 6 eigenfaces W1 = U1 [:, 0 : 6 ] W2 = U2 [:, 0 : 6 ] def pltfaces ( W , sub ): k = W . shape [ 1 ] fig , axs = plt . subplots ( 1 , k , figsize = ( 14 , 2 ), facecolor = 'w' , edgecolor = 'k' ) axs = axs . ravel () for i in range ( k ): image = W [:, i ] . reshape (( 60 , 80 )) axs [ i ] . imshow ( image , cmap = 'gray' ) plt . title ( label = "Eigenface for Subject " + sub , fontsize = 20 , loc = "right" ) plt . show () pltfaces ( W1 , str ( 1 )) pltfaces ( W2 , str ( 2 )) In [7]: #Part B S1_DT = downsample ( plt . imread ( "data/yalefaces/subject01-test.gif" )) . flatten () S2_DT = downsample ( plt . imread ( "data/yalefaces/subject02-test.gif" )) . flatten () S1_DT = np . array ( S1_DT ) S2_DT = np . array ( S2_DT ) # Mean Center S1_DT = ( S1_DT - mu1 ) S2_DT = ( S2_DT - mu2 ) # Compute Residuals s11 = np . linalg . norm ( S1_DT - ( np . dot ( W1 , W1 . T )) @S1_DT ) ** 2 s12 = np . linalg . norm ( S2_DT - ( np . dot ( W1 , W1 . T )) @S2_DT ) ** 2 s21 = np . linalg . norm ( S1_DT - ( np . dot ( W2 , W2 . T )) @S1_DT ) ** 2 s22 = np . linalg . norm ( S2_DT - ( np . dot ( W2 , W2 . T )) @S2_DT ) ** 2 In [8]: print ( "Projected Residual" ) print ( "S11:" , s11 ) print ( "S21:" , s21 ) print ( "S12:" , s12 ) print ( "S22:" , s22 ) In [ ]:

Discover more documents: Sign up today!

Unlock a world of knowledge! Explore tailored content for a richer learning experience. Here's what you'll get:

Access to all documents
Unlimited textbook solutions
24/7 expert homework help

Okafor_Odera.HW2_report

Related Documents