CSCI_5080_Assignment_07
Austin Peay State University, CSCI 5080 (Computer Science), Jan 9, 2024
Question 1

Points: A1(2, 10), A2(2, 5), A3(8, 4), B1(5, 8), B2(7, 5), B3(6, 4), C1(1, 2), C2(4, 9).

Using R to compute the Euclidean distance matrix:

> data_hmwk7 <- matrix(c(2,10, 2,5, 8,4, 5,8, 7,5, 6,4, 1,2, 4,9), byrow = T, nrow = 8)
> hmwk7 <- dist(data_hmwk7, method = "euclidean")
> hmwk7
          1        2        3        4        5        6        7
2  5.000000
3  8.485281 6.082763
4  3.605551 4.242641 5.000000
5  7.071068 5.000000 1.414214 3.605551
6  7.211103 4.123106 2.000000 4.123106 1.414214
7  8.062258 3.162278 7.280110 7.211103 6.708204 5.385165
8  2.236068 4.472136 6.403124 1.414214 5.000000 5.385165 7.615773

First iteration (initial centers A1(2, 10), B1(5, 8), C1(1, 2)):

Point      d(A1)      d(B1)      d(C1)      Cluster
A1 (2,10)  0          3.605551   8.062258   1
A2 (2,5)   5.000000   4.242641   3.162278   3
A3 (8,4)   8.485281   5.000000   7.280110   2
B1 (5,8)   3.605551   0          7.211103   2
B2 (7,5)   7.071068   3.605551   6.708204   2
B3 (6,4)   7.211103   4.123106   5.385165   2
C1 (1,2)   8.062258   7.211103   0          3
C2 (4,9)   2.236068   1.414214   7.615773   2

(a.) After the first iteration, the three clusters are (1) {A1}, (2) {A3, B1, B2, B3, C2}, and (3) {A2, C1}. Each new center is the mean of its members' x- and y-values, giving (1) (2, 10), (2) (6, 6), and (3) (1.5, 3.5).
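Individual pairwise distances can be pulled out of the dist object programmatically rather than read off the printed triangle, by coercing it to a full symmetric matrix. A short sketch (object names as in the appendix; the matrix variable dm is introduced here for illustration):

```r
# Rebuild the distance object from Question 1
data_hmwk7 <- matrix(c(2,10, 2,5, 8,4, 5,8, 7,5, 6,4, 1,2, 4,9),
                     byrow = TRUE, nrow = 8)
hmwk7 <- dist(data_hmwk7, method = "euclidean")

# Coerce to an 8 x 8 symmetric matrix; entry [i, j] is d(point i, point j)
dm <- as.matrix(hmwk7)

dm[1, 4]  # A1 to B1 (rows 1 and 4): 3.605551
dm[2, 7]  # A2 to C1 (rows 2 and 7): 3.162278
```

These two entries are exactly the d(B1) and d(C1) values used for A1 and A2 in the first-iteration table.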
(b.) Second iteration (centers (2, 10), (6, 6), (1.5, 3.5)):

Point      d(c1)      d(c2)      d(c3)      Cluster
A1 (2,10)  0          5.656854   6.519202   1
A2 (2,5)   5.000000   4.123106   1.581139   3
A3 (8,4)   8.485281   2.828427   6.519202   2
B1 (5,8)   3.605551   2.236068   5.700877   2
B2 (7,5)   7.071068   1.414214   5.700877   2
B3 (6,4)   7.211103   2.000000   4.527693   2
C1 (1,2)   8.062258   6.403124   1.581139   3
C2 (4,9)   2.236068   3.605551   6.041523   1

The clusters become (1) {A1, C2}, (2) {A3, B1, B2, B3}, and (3) {A2, C1}, with updated centers (1) (3, 9.5), (2) (6.5, 5.25), and (3) (1.5, 3.5).

Third iteration (centers (3, 9.5), (6.5, 5.25), (1.5, 3.5)):

Point      d(c1)      d(c2)      d(c3)      Cluster
A1 (2,10)  1.118034   6.543126   6.519202   1
A2 (2,5)   4.609772   4.506939   1.581139   3
A3 (8,4)   7.433034   1.952562   6.519202   2
B1 (5,8)   2.500000   3.132491   5.700877   1
B2 (7,5)   6.020797   0.559017   5.700877   2
B3 (6,4)   6.264982   1.346291   4.527693   2
C1 (1,2)   7.762087   6.388466   1.581139   3
C2 (4,9)   1.118034   4.506939   6.041523   1

The final three clusters are (1) {A1, B1, C2}, (2) {A3, B2, B3}, and (3) {A2, C1}.

Question 2:

Clustering-Based SVM (CB-SVM) is designed for very large datasets, where a traditional Support Vector Machine (SVM) cannot feasibly be trained on the entire dataset. CB-SVM applies a hierarchical micro-clustering algorithm that scans the entire dataset only once, providing the SVM with high-quality samples that carry statistical summaries of the data. This maximizes the learning benefit to the SVM while keeping training scalable.

The core idea is to use the hierarchical micro-clusters to generate fine-grained descriptions of the data close to the decision boundary and coarser descriptions farther away. The algorithm begins by constructing two micro-cluster trees from the positive and negative training data; each higher-level node in a tree is a summarized representation of its children. Once the trees are built, CB-SVM trains an initial SVM from the root nodes only. After this "rough" boundary is established, it selectively declusters only the data summaries near the boundary into lower (finer) levels of the tree. The hierarchical representation of the data summaries is what makes this selective declustering efficient, and the process repeats until the leaf level is reached.

CB-SVM is valuable for analyzing very large datasets, including streaming data and large data warehouses, where random sampling can hurt performance because important data may occur infrequently or the patterns may be irregular. The algorithm greatly reduces the total number of data points used for SVM training while preserving high-quality support vectors (SVs) that describe the boundary well. Whereas traditional selective sampling must scan the entire dataset at each round, CB-SVM operates on the CF tree, constructed in a single scan of the data, whose statistical summaries support efficient and effective construction of the SVM boundary.

The CB-SVM algorithm can be outlined as follows:
1. Construct two CF trees independently from the positive and negative datasets.
2. Train an SVM boundary function from the centroids of the root entries of the CF trees. If the root node contains too few entries, train from the entries at the second level instead.
3. Decluster the entries near the boundary into the next level, accumulating the children entries declustered from parent entries into the training set together with the non-declustered parent entries.
4. Construct another SVM from the centroids of the entries in the training set, and repeat from step 3 until no further entries are accumulated.

Time spent: 2 hrs.
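The decluster-and-retrain loop in steps 2 to 4 can be sketched in base R. This is only a toy illustration under stand-in assumptions: kmeans() plays the role of the CF-tree micro-clusters, and a logistic boundary (glm) stands in for the SVM; all data, thresholds, and object names here are hypothetical, and a real CB-SVM uses actual CF trees and an SVM solver:

```r
set.seed(1)

# Hypothetical 2-D training data: 200 positive and 200 negative points
pos <- cbind(rnorm(200, mean =  1.5), rnorm(200, mean =  1.5))
neg <- cbind(rnorm(200, mean = -1.5), rnorm(200, mean = -1.5))

# Step 1 (stand-in): micro-cluster each class; the centroids act as the
# coarse statistical summaries a CF tree's root entries would provide
k <- 5
pos_mc <- kmeans(pos, centers = k)
neg_mc <- kmeans(neg, centers = k)

# Step 2: train a boundary from the centroids only
# (logistic regression via glm stands in for the SVM)
centroids <- data.frame(x1 = c(pos_mc$centers[, 1], neg_mc$centers[, 1]),
                        x2 = c(pos_mc$centers[, 2], neg_mc$centers[, 2]),
                        y  = c(rep(1, k), rep(0, k)))
fit <- glm(y ~ x1 + x2, family = binomial, data = centroids)

# Step 3: decluster summaries near the rough boundary: a centroid whose
# predicted probability is close to 0.5 is replaced by its raw member points
p    <- predict(fit, newdata = centroids, type = "response")
near <- abs(p - 0.5) < 0.45          # illustrative threshold

expand_near <- function(data, mc, near_mask) {
  fine   <- data[mc$cluster %in% which(near_mask), , drop = FALSE]
  coarse <- mc$centers[!near_mask, , drop = FALSE]
  rbind(coarse, fine)  # raw points near the boundary, summaries far away
}
pos_fine <- expand_near(pos, pos_mc, near[1:k])
neg_fine <- expand_near(neg, neg_mc, near[(k + 1):(2 * k)])

# Step 4: retrain on the refined training set (one refinement round shown;
# CB-SVM repeats until no new entries are accumulated)
refined <- data.frame(x1 = c(pos_fine[, 1], neg_fine[, 1]),
                      x2 = c(pos_fine[, 2], neg_fine[, 2]),
                      y  = c(rep(1, nrow(pos_fine)), rep(0, nrow(neg_fine))))
fit2 <- glm(y ~ x1 + x2, family = binomial, data = refined)
```

The key property the sketch preserves is that the second fit sees fine-grained data only where the first boundary passes, which is what keeps CB-SVM's training set small.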
PART 2

Lesson 3: [screenshot]
Lesson 4

TM_Decision Tree Model:
Decision Tree Tab: [screenshot]
Dependency Network Tab: [screenshot]
TM_Clustering Model:
Cluster Diagram Tab: [screenshot]
Cluster Profiles Tab: [screenshot]
Cluster Characteristics Tab: [screenshot]
Cluster Discrimination Tab: [screenshot]

TM_Naïve Bayes Model:
Dependency Network Tab: [screenshot]
Attribute Profile Tab: [screenshot]
Attribute Characteristics Tab: [screenshot]
Attribute Discrimination Tab: [screenshot]

Time spent: 4 hours.
APPENDIX

R code for Question 1

> data_hmwk7 <- matrix(c(2,10, 2,5, 8,4, 5,8, 7,5, 6,4, 1,2, 4,9), byrow = T, nrow = 8)
> hmwk7 <- dist(data_hmwk7, method = "euclidean")
> hmwk7
          1        2        3        4        5        6        7
2  5.000000
3  8.485281 6.082763
4  3.605551 4.242641 5.000000
5  7.071068 5.000000 1.414214 3.605551
6  7.211103 4.123106 2.000000 4.123106 1.414214
7  8.062258 3.162278 7.280110 7.211103 6.708204 5.385165
8  2.236068 4.472136 6.403124 1.414214 5.000000 5.385165 7.615773
> # 2nd iteration
> a  = c(6, 6)      # cluster 2 center
> a1 = c(2, 10)
> a2 = c(2, 5)
> a3 = c(8, 4)
> a4 = c(5, 8)
> a5 = c(7, 5)
> a6 = c(6, 4)
> a7 = c(1, 2)
> a8 = c(4, 9)
> b  = c(1.5, 3.5)  # cluster 3 center
> dist(rbind(a, a1), method = "euclidean")
          a
a1 5.656854
> dist(rbind(a, a2), method = "euclidean")
          a
a2 4.123106
> dist(rbind(a, a3), method = "euclidean")
          a
a3 2.828427
> dist(rbind(a, a4), method = "euclidean")
          a
a4 2.236068
> dist(rbind(a, a5), method = "euclidean")
          a
a5 1.414214
> dist(rbind(a, a6), method = "euclidean")
   a
a6 2
> dist(rbind(a, a7), method = "euclidean")
          a
a7 6.403124
> dist(rbind(a, a8), method = "euclidean")
          a
a8 3.605551
> # Distances to the cluster 3 center b = (1.5, 3.5)
> dist(rbind(b, a1), method = "euclidean")
          b
a1 6.519202
> dist(rbind(b, a2), method = "euclidean")
          b
a2 1.581139
> dist(rbind(b, a3), method = "euclidean")
          b
a3 6.519202
> dist(rbind(b, a4), method = "euclidean")
          b
a4 5.700877
> dist(rbind(b, a5), method = "euclidean")
          b
a5 5.700877
> dist(rbind(b, a6), method = "euclidean")
          b
a6 4.527693
> dist(rbind(b, a7), method = "euclidean")
          b
a7 1.581139
> dist(rbind(b, a8), method = "euclidean")
          b
a8 6.041523
> # 3rd iteration
> c1 = c(3, 9.5)
> c2 = c(6.5, 5.25)
> c3 = c(1.5, 3.5)
> # Distances to c1
> dist(rbind(c1, a1), method = "euclidean")
         c1
a1 1.118034
> dist(rbind(c1, a2), method = "euclidean")
         c1
a2 4.609772
> dist(rbind(c1, a3), method = "euclidean")
         c1
a3 7.433034
> dist(rbind(c1, a4), method = "euclidean")
    c1
a4 2.5
> dist(rbind(c1, a5), method = "euclidean")
         c1
a5 6.020797
> dist(rbind(c1, a6), method = "euclidean")
         c1
a6 6.264982
> dist(rbind(c1, a7), method = "euclidean")
         c1
a7 7.762087
> dist(rbind(c1, a8), method = "euclidean")
         c1
a8 1.118034
> # Distances to c2
> dist(rbind(c2, a1), method = "euclidean")
         c2
a1 6.543126
> dist(rbind(c2, a2), method = "euclidean")
         c2
a2 4.506939
> dist(rbind(c2, a3), method = "euclidean")
         c2
a3 1.952562
> dist(rbind(c2, a4), method = "euclidean")
         c2
a4 3.132491
> dist(rbind(c2, a5), method = "euclidean")
         c2
a5 0.559017
> dist(rbind(c2, a6), method = "euclidean")
         c2
a6 1.346291
> dist(rbind(c2, a7), method = "euclidean")
         c2
a7 6.388466
> dist(rbind(c2, a8), method = "euclidean")
         c2
a8 4.506939
> # Distances to c3
> dist(rbind(c3, a1), method = "euclidean")
         c3
a1 6.519202
> dist(rbind(c3, a2), method = "euclidean")
         c3
a2 1.581139
> dist(rbind(c3, a3), method = "euclidean")
         c3
a3 6.519202
> dist(rbind(c3, a4), method = "euclidean")
         c3
a4 5.700877
> dist(rbind(c3, a5), method = "euclidean")
         c3
a5 5.700877
> dist(rbind(c3, a6), method = "euclidean")
         c3
a6 4.527693
> dist(rbind(c3, a7), method = "euclidean")
         c3
a7 1.581139
> dist(rbind(c3, a8), method = "euclidean")
         c3
a8 6.041523
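As a cross-check on the hand-traced iterations in Question 1, base R's kmeans() started from the same initial centers (A1, B1, C1) reproduces the final clustering. A short sketch (the variable names init and km are introduced here for illustration):

```r
data_hmwk7 <- matrix(c(2,10, 2,5, 8,4, 5,8, 7,5, 6,4, 1,2, 4,9),
                     byrow = TRUE, nrow = 8)
rownames(data_hmwk7) <- c("A1", "A2", "A3", "B1", "B2", "B3", "C1", "C2")

# Initial centers A1(2, 10), B1(5, 8), C1(1, 2), as in Question 1
init <- rbind(c(2, 10), c(5, 8), c(1, 2))

# algorithm = "Lloyd" matches the assign-then-update scheme traced by hand
km <- kmeans(data_hmwk7, centers = init, algorithm = "Lloyd")

split(rownames(data_hmwk7), km$cluster)  # cluster memberships
km$centers                               # final centers
```

The memberships returned should match the hand result: {A1, B1, C2}, {A3, B2, B3}, and {A2, C1}.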