We will create this vocabulary by randomly selecting tens or hundreds of thousands of local features from our training set and clustering them with k-means. The number of k-means clusters determines the size of our visual vocabulary. For example, you could begin by clustering a large number of SIFT descriptors into k=50 clusters, which partitions the 128-dimensional continuous SIFT feature space into 50 regions. As long as we keep the centroids of these clusters, we can determine which region any new SIFT feature belongs to. These centroids make up our visual word vocabulary.
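As a rough sketch of the clustering step (assuming the sampled descriptors are already stacked in a NumPy array; the vocabulary size and the random stand-in data here are illustrative, not part of the assignment):

```python
import numpy as np
from sklearn.cluster import KMeans

def extract_vocabulary(descriptors, n_vocab=50):
    """Cluster sampled SIFT descriptors; the centroids are the visual words.

    descriptors: (N, 128) array of SIFT descriptors sampled from the
    training images. Returns an (n_vocab, 128) array of centroids.
    """
    kmeans = KMeans(n_clusters=n_vocab, n_init=10, random_state=56)
    kmeans.fit(descriptors)
    return kmeans.cluster_centers_

# toy example: 1000 random "descriptors" instead of real SIFT features
rng = np.random.default_rng(0)
vocab = extract_vocabulary(rng.random((1000, 128)).astype(np.float32), n_vocab=50)
print(vocab.shape)  # (50, 128)
```

Keeping only the (50, 128) centroid array is what lets us later map any new descriptor to a visual word without re-running the clustering.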
Work with histograms. We will densely sample many SIFT descriptors for each image. Rather than storing hundreds of SIFT descriptors per image, we simply count how many of them fall into each cluster of our visual word vocabulary. This is accomplished by locating the nearest k-means centroid for each SIFT feature. Thus, if we have a visual vocabulary of 50 words and detect 200 SIFT features in an image, our bag of SIFT representation is a 50-dimensional histogram in which each bin counts how many times a SIFT descriptor was assigned to that cluster; the bin counts sum to 200. The histogram should be normalized so that the magnitude of the bag of features does not change dramatically with image size.
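The assign-and-count step might be sketched like this (pure NumPy; `vocabulary` stands for the centroid array produced by clustering, and the random inputs are stand-ins for real SIFT descriptors):

```python
import numpy as np

def bag_of_sift(descriptors, vocabulary):
    """L2-normalized histogram of visual-word assignments.

    descriptors: (M, 128) SIFT descriptors from one image.
    vocabulary:  (K, 128) k-means centroids (the visual words).
    Returns a length-K normalized histogram.
    """
    # squared distance from every descriptor to every centroid
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)  # index of the nearest centroid
    hist = np.bincount(nearest, minlength=len(vocabulary)).astype(np.float64)
    # before normalization the bins sum to the number of descriptors (M)
    return hist / np.linalg.norm(hist)

rng = np.random.default_rng(1)
feat = bag_of_sift(rng.random((200, 128)), rng.random((50, 128)))
print(feat.shape)  # (50,)
```

Any consistent normalization (L1 or L2) works here; the point is that two images of different sizes, and hence different descriptor counts, end up with comparable feature magnitudes.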
Instead of using the SIFT detector to find invariant keypoints, which is slow, you should densely sample keypoints on a regular grid with a specific step size (sampling density) and scale.
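A minimal grid sampler might look like the following (NumPy only; the step and keypoint size are illustrative, and in the actual pipeline each (x, y) pair would be wrapped in a cv2.KeyPoint before calling sift.compute()):

```python
import numpy as np

def sample_grid(height, width, step=8, size=8):
    """Return (x, y) coordinates of densely sampled keypoints.

    Points are laid out on a regular grid with the given step;
    `size` is the keypoint diameter you would pass to cv2.KeyPoint.
    """
    xs = np.arange(step // 2, width, step)
    ys = np.arange(step // 2, height, step)
    grid = [(float(x), float(y)) for y in ys for x in xs]
    # with OpenCV available, each point would become a keypoint:
    #   kp = [cv2.KeyPoint(x, y, size) for (x, y) in grid]
    #   _, desc = sift.compute(image, kp)
    return grid

pts = sample_grid(64, 64, step=8)
print(len(pts))  # 8 x 8 = 64 keypoints
```

Because the grid is fixed, the same keypoint list can be reused for every image of the same size instead of being recomputed per image.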
Because the bag of SIFT representation has many design decisions and free parameters (number of clusters, sampling density, sampling scales, SIFT parameters, etc.), accuracy may vary.
- Use KMeans from scikit-learn to do the clustering and to find the nearest cluster centroid for each SIFT feature;
- Use cv2.xfeatures2d.SIFT_create() to create a SIFT object;
- Use cv2.KeyPoint() to generate keypoints;
- Use sift.compute() to compute SIFT descriptors given densely sampled keypoints.
- Be mindful of RAM usage. Make your code memory-efficient; otherwise it can easily exceed the RAM limit in Colab, at which point your session will crash.
- If you are about to run out of RAM, call gc.collect() so the garbage collector reclaims unused objects in memory and frees some space.
- Store data and features as NumPy arrays instead of lists. Computation on NumPy arrays is much more efficient than on lists.
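As a small illustration of the last two tips (the shapes and the random histograms are arbitrary stand-ins), preallocating the feature matrix avoids holding a growing Python list of per-image arrays:

```python
import gc
import numpy as np

n_images, n_vocab = 1000, 50

# preallocate one float32 array instead of appending per-image
# histograms to a list and stacking them at the end
feat = np.zeros((n_images, n_vocab), dtype=np.float32)
for i in range(n_images):
    hist = np.random.rand(n_vocab)  # stand-in for a real bag-of-SIFT histogram
    feat[i] = hist / np.linalg.norm(hist)

del hist      # drop references to temporaries...
gc.collect()  # ...and ask the garbage collector to reclaim them
print(feat.nbytes)  # 1000 * 50 * 4 bytes = 200000
```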
CODE

import gc
import cv2
import numpy as np
from sklearn import neighbors
from sklearn.cluster import KMeans

np.random.seed(56)

##########--WRITE YOUR CODE HERE--##########
# The following steps are just for your reference;
# you can write it in your own way.

# # densely sample keypoints
# def sample_kp(shape, stride, size):
#     return kp

# # extract vocabulary of SIFT features
# def extract_vocabulary(raw_data, key_point):
#     return vocabulary

# # extract Bag of SIFT representation of images
# def extract_feat(raw_data, vocabulary, key_point):
#     return feat

# # sample dense keypoints
# skp = sample_kp((train_data[0].shape[0], train_data[0].shape[1]), (64, 64), 8)
# vocabulary = extract_vocabulary(train_data, skp)
# train_feat = extract_feat(train_data, vocabulary, skp)
# test_feat = extract_feat(test_data, vocabulary, skp)

train_feat =
test_feat =
##########-------END OF CODE-------##########
# this block should generate train_feat and test_feat
# corresponding to train_data and test_data