Final 5080_2024 — Course 5080 (Computer Science), Austin Peay State University, Jan 9, 2024
Q1. For the following values: 200, 400, 800, 1000, 2000
1. Calculate the mean and variance.
2. Normalize the group by min-max normalization to the new range [0, 10] (new_min = 0, new_max = 10).
3. In z-score normalization, what value should the first value (200) be transformed to?

Answer:
1. Mean = 880; variance = 492,000 (sample variance, dividing by n - 1).
2. Min-max normalization with new_min = 0 and new_max = 10:
v' = (v - min) / (max - min) x (new_max - new_min) + new_min
For example, for v = 400: (400 - 200) / (2000 - 200) x (10 - 0) + 0 ≈ 1.11

Value    Min-max normalized
200      0
400      1.11
800      3.33
1000     4.44
2000     10

3. Z-score normalization: z = (x - mean) / σ, where σ = sqrt(492,000) ≈ 701.43.
For the first value: z = (200 - 880) / 701.43 ≈ -0.9695
(A short Python sketch that re-checks these numbers follows Q2 below.)

Q2. A database has four transactions. Let min_sup = 60% and min_conf = 80%.

(a) At the granularity of item category (e.g., item_i could be "Milk"), for the following rule template,
∀X ∈ transaction, buys(X, item1) ^ buys(X, item2) => buys(X, item3) [s, c]
list the frequent k-itemset for the largest k and all of the strong association rules (with their support s and confidence c) containing the frequent k-itemset for the largest k.

Answer: The largest k is 3, and the frequent 3-itemset is {Bread, Milk, Cheese}. The strong rules are:
Bread ^ Cheese => Milk [75%, 100%]
Cheese ^ Milk => Bread [75%, 100%]
Cheese => Milk ^ Bread [75%, 100%]

(b) At the granularity of brand-item category (e.g., item_i could be "Sunset-Milk"), for the following rule template,
∀X ∈ customer, buys(X, item1) ^ buys(X, item2) => buys(X, item3) [s, c]
list the frequent k-itemset for the largest k.

Answer: The largest k is 3. The frequent 3-itemsets are {Wonder-Bread, Dairyland-Milk, Tasty-Pie} and {Wonder-Bread, Sunset-Milk, Dairyland-Cheese}. (A small support/confidence helper is sketched below.)
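As a cross-check on the Q1 arithmetic, here is a minimal Python sketch (an illustration added for clarity, not part of the original answer). It assumes the sample variance (division by n - 1), which is what the 492,000 figure corresponds to.

```python
# Re-check the Q1 numbers: mean, sample variance, min-max scaling to [0, 10],
# and the z-score of the first value.
import math

values = [200, 400, 800, 1000, 2000]

n = len(values)
mean = sum(values) / n                                        # 880.0
sample_var = sum((v - mean) ** 2 for v in values) / (n - 1)   # 492000.0
std = math.sqrt(sample_var)                                   # ~701.43

lo, hi = min(values), max(values)
new_lo, new_hi = 0, 10
min_max = [(v - lo) / (hi - lo) * (new_hi - new_lo) + new_lo for v in values]
# ~[0.0, 1.11, 3.33, 4.44, 10.0]

z_first = (values[0] - mean) / std                            # ~-0.9695

print(mean, sample_var, [round(x, 2) for x in min_max], round(z_first, 4))
```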
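Since the transaction table itself is not included in this preview, the following Python sketch only illustrates how support and confidence are computed for Q2-style rules. The four transactions are hypothetical placeholders (chosen so the example rule reproduces the [75%, 100%] figures quoted above); substitute the real table to verify the answer.

```python
# Generic support/confidence helper for association rules.
transactions = [                 # hypothetical placeholder data, NOT the exam's table
    {"Bread", "Milk", "Cheese"},
    {"Bread", "Milk", "Cheese"},
    {"Bread", "Milk", "Cheese"},
    {"Bread", "Milk"},
]

def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """support(antecedent ∪ consequent) / support(antecedent)."""
    return support(set(antecedent) | set(consequent)) / support(antecedent)

# Example: the rule Bread ^ Cheese => Milk
s = support({"Bread", "Cheese", "Milk"})
c = confidence({"Bread", "Cheese"}, {"Milk"})
print(f"support={s:.0%}, confidence={c:.0%}")   # support=75%, confidence=100%
```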
Q3. Suppose you are asked to classify microarray data with 100 tissues and 10,000 genes. Which of the following algorithms would you recommend, and why?
1. Decision tree induction
2. Piece-wise linear regression
3. SVM
4. Associative classification
5. Genetic algorithm
6. Bayesian belief network

Answer: I would recommend the Support Vector Machine (SVM). With only 100 samples but 10,000 genes the data is extremely high-dimensional, and SVM is well suited to this setting: it handles high-dimensional data, can capture non-linear relationships through kernels, and produces a clear separation while penalizing misclassifications through its soft-margin formulation.

Q4. The Table below shows the Decision….. see text
Question 5:
* Why does an SVM algorithm have high classification accuracy in high-dimensional space?
* Why is an SVM algorithm slow on large data sets?
* Outline an extended SVM algorithm that is scalable to large data sets.

Answer:
i. SVMs classify high-dimensional data well because they (implicitly) map it into an even higher-dimensional feature space in which a separating hyperplane between the classes can be found. The "kernel trick" performs this mapping without ever computing the transformed features explicitly, which keeps the search for the separating hyperplane tractable.
ii. As data sets grow, training an SVM becomes increasingly demanding in both computation and memory: the optimization considers all data points when searching for the optimal decision boundary, so the cost scales sharply with data-set size, leading to slow training and, when memory limits are reached, outright failure.
iii. Outline of a clustering-based SVM (CB-SVM) method (a code sketch of this idea follows the list):
1. Micro-cluster formation: use a CF-tree (Clustering Feature tree) to break the large data set into smaller, more manageable micro-clusters.
2. Representative centroids: train an SVM on the centroids of the micro-clusters as representatives of the original points, reducing computational cost compared with training on the entire data set.
3. Declustering near the boundary: expand the micro-clusters that lie near the current decision boundary back into their constituent points, so the boundary region is modeled in full detail while points far from the boundary stay summarized, keeping redundancy low and training efficient.
4. Data augmentation: add points from the original data set that were not captured by the initial micro-clusters, so a more comprehensive range of information is covered.
5. Iterative training: repeat the process, updating the micro-clusters based on the current SVM model and retraining the SVM on the updated representatives, until the model converges to a stable state.
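The outline above can be sketched with standard scikit-learn components. This is a rough, hypothetical illustration of the idea, not the published CB-SVM implementation: Birch stands in for the CF-tree, a single refinement pass stands in for the iterative loop, and a binary label vector is assumed. The function name `cluster_based_svm` is made up for this sketch.

```python
# Sketch of a clustering-based SVM: micro-cluster each class, train on centroids,
# then refine using only the original points that fall near the decision boundary.
import numpy as np
from sklearn.cluster import Birch
from sklearn.svm import SVC

def cluster_based_svm(X, y, threshold=0.5, margin=1.0):
    # 1. Micro-cluster each class separately with Birch (a CF-tree-based method);
    #    the subcluster centroids act as representatives of the raw points.
    centroids, centroid_labels = [], []
    for label in np.unique(y):
        birch = Birch(threshold=threshold, n_clusters=None).fit(X[y == label])
        centroids.append(birch.subcluster_centers_)
        centroid_labels.append(np.full(len(birch.subcluster_centers_), label))
    C = np.vstack(centroids)
    c_y = np.concatenate(centroid_labels)

    # 2. Train an initial SVM on the (much smaller) set of centroids.
    svm = SVC(kernel="linear").fit(C, c_y)

    # 3. "Decluster" near the boundary: pull in the original points whose decision
    #    values fall inside the margin (assumes a binary problem, so
    #    decision_function returns a 1-D array) and retrain on them plus the centroids.
    near = np.abs(svm.decision_function(X)) < margin
    X_ref = np.vstack([C, X[near]])
    y_ref = np.concatenate([c_y, y[near]])
    return SVC(kernel="linear").fit(X_ref, y_ref)
```

Usage would be `model = cluster_based_svm(X, y)` with `X` an (n_samples, n_features) NumPy array and `y` a binary label vector; repeating steps 2-3 would give the iterative refinement described in point 5.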
Question 6: Different data types may need different similarity or distance measures. State the expected similarity measure for each of the following tasks.
1. Clustering stars in the universe
2. Clustering text documents
3. Clustering clinical data
4. Clustering houses to find delivery centers in a city with rivers and bridges

Answer (minimal implementations of these measures are sketched below):
1. Clustering stars in the universe: Euclidean distance
2. Clustering text documents: Euclidean distance, cosine similarity, or the Jaccard coefficient
3. Clustering clinical data: cosine similarity
4. Clustering houses to find delivery centers: cosine similarity
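For reference, here are minimal Python implementations of the measures named above. They are purely illustrative (libraries such as SciPy provide the same functions); the choice of measure per task follows the answer above.

```python
# Reference implementations of the Q6 distance / similarity measures.
import math

def euclidean(a, b):
    """Straight-line distance, e.g. between star coordinates."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Angle-based similarity, common for term-frequency vectors of documents."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def jaccard(set_a, set_b):
    """Overlap of two sets of terms: |A ∩ B| / |A ∪ B|."""
    return len(set_a & set_b) / len(set_a | set_b)

print(euclidean((0, 0), (3, 4)))                              # 5.0
print(round(cosine_similarity((1, 0, 1), (1, 1, 1)), 3))      # ~0.816
print(round(jaccard({"data", "mining"}, {"data", "science"}), 3))  # ~0.333
```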