Question

Can you please do questions 1 and 2? Thank you

3. Classification with DBSCAN of generated data
Let's start by generating a dataset of 500 observations in 2D space. We will use a function
built into scikit-learn to produce circular point clouds.
import matplotlib.pyplot as plt
import numpy as np
import sklearn.datasets
from sklearn.utils import shuffle

# Let's generate a scatter plot composed of two circles.
# The cloud contains 500 observations ('n_samples'), perturbed by
# Gaussian noise of standard deviation 0.1 ('noise').
# The ratio between the radius of the small circle and that of the
# large circle is 0.3 ('factor').
data, labels = sklearn.datasets.make_circles(n_samples=500, noise=0.1,
                                             factor=0.3, random_state=0)
print(data.shape)

# Random permutation of the rows of the matrix (we mix the observations)
data, labels = shuffle(data, labels)

# Point cloud display
plt.scatter(data[:, 0], data[:, 1], c=labels)
plt.show()
Question 1: How many groups does this dataset have?
Question 2: Perform a clustering of this dataset using k-means. What can we expect? What
do you notice?
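As a hint for Question 2, here is a minimal sketch of what k-means does on this dataset (the choice of n_clusters=2 and the random_state value are my own illustrative assumptions, not prescribed by the exercise). Because k-means builds convex clusters around centroids, it cannot follow the ring shapes: each of its clusters ends up mixing points from both circles.

```python
# Illustrative sketch: k-means with k=2 on the two-circles data.
# k-means partitions the plane into convex (Voronoi) cells, so it
# roughly cuts the data with a straight line through the middle;
# each resulting cluster contains points from BOTH rings.
import sklearn.datasets
from sklearn.cluster import KMeans

data, labels = sklearn.datasets.make_circles(n_samples=500, noise=0.1,
                                             factor=0.3, random_state=0)

km = KMeans(n_clusters=2, n_init=10, random_state=0)
pred = km.fit_predict(data)

# Show which true circles appear in each k-means cluster:
for k in (0, 1):
    print("cluster", k, "contains true labels:", set(pred_labels := labels[pred == k].tolist()))
```

Plotting `data` colored by `pred` instead of `labels` makes the failure visible: the coloring splits the figure into two half-planes rather than two rings.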
Since the two circles are separated by an area with no data, a density-based method seems appropriate. We can create a clustering model using DBSCAN by importing it from scikit-learn:
from sklearn.cluster import DBSCAN
db = DBSCAN()
The constructor arguments of DBSCAN are as follows:
- eps: the radius of the neighborhood, i.e. the maximum distance between two observations for them to be considered neighbors of each other;
- min_samples: the minimum number of neighbors that a core point must have;
- metric: the distance to use (by default, the Euclidean distance).
You can call the following methods:
- .fit(X): performs an automatic classification of the observation matrix X using the DBSCAN method. The results are stored in the .labels_ attribute.
- .fit_predict(X): same as .fit(X), but returns the group labels directly.
The following attributes are available after calling the .fit() method:
- core_sample_indices_: the indices of the core points.
- labels_: the group numbers of the points in the observation matrix (noise points are labeled -1).
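Putting the pieces above together, here is a minimal sketch of DBSCAN applied to the generated data. The values eps=0.2 and min_samples=5 are illustrative guesses, not values prescribed by the exercise; eps in particular may need tuning depending on the noise level.

```python
# Hedged sketch: DBSCAN on the two-circles data.
# eps=0.2 and min_samples=5 are assumed, illustrative settings.
import sklearn.datasets
from sklearn.cluster import DBSCAN

data, labels = sklearn.datasets.make_circles(n_samples=500, noise=0.1,
                                             factor=0.3, random_state=0)

db = DBSCAN(eps=0.2, min_samples=5)
pred = db.fit_predict(data)      # equivalent to db.fit(data) then db.labels_

# Points labeled -1 are noise; the remaining labels are the clusters found.
n_clusters = len(set(pred.tolist()) - {-1})
print("clusters found:", n_clusters)
print("number of core points:", len(db.core_sample_indices_))
```

Plotting `data` colored by `pred` shows how the density-based grouping compares with the two true rings; if the rings are merged or fragmented, adjusting eps is the first thing to try.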