LabF_netid-2

xlsx

School

Johns Hopkins University

Course

210

Subject

Computer Science

Date

Dec 6, 2023

Type

xlsx

Pages

15

Uploaded by seedmaster101

name                                    calories  protein  fat  sodium  fiber  carbo  sugars  potass
All-Bran_with_Extra_Fiber                     50        4    0     140     14      8       0     330
100%_Bran                                     70        4    1     130     10      5       6     280
All-Bran                                      70        4    1     260      9      7       5     320
Post_Nat._Raisin_Bran                        120        3    1     200      6     11      14     260
Bran_Flakes                                   90        3    0     210      5     13       5     190
Fruitful_Bran                                120        3    0     240      5     14      12     190
Fruit_&_Fibre_Dates,_Walnuts,_and_Oats       120        3    2     160      5     12      10     200
Raisin_Bran                                  120        3    1     210      5     14      12     240

Description of the data:
1. name: name of cereal
2. calories: calories per serving
3. protein: grams of protein
4. fat: grams of fat
5. sodium: milligrams of sodium
6. fiber: grams of dietary fiber
7. carbo: grams of complex carbohydrates
8. sugars: grams of sugars
9. potass: milligrams of potassium
Variables
# Selected Variables: 8
Selected Variables: calories, protein, fat, sodium, fiber, carbo, sugars, potass

Hierarchical Clustering: Fitting Parameters
Clustering Method: WARD
Distance Measure: Euclidian
Normalized?: 1

Output-Dendrogram
[Dendrogram figure, annotated with (Group 1) and (Group 2)]

Output-Clustering Stages
Stage    Sub-Cluster  +Sub-Cluster  Distance
Stage1   4            8             1.0992063421
Stage2   5            6             1.9383688093
Stage3   4+8          7             2.4511513613
Stage4   1            2             2.6688169914
Stage5   1+2          3             3.4505484927
Stage6   4+8+7        5+6           3.9379612141
Stage7   1+2+3        4             10.80388824

Legend for Dendrogram
Cluster/Obs  Cereal
1            All-Bran_with_Extra_Fiber
2            100%_Bran
3            All-Bran
4            Post_Nat._Raisin_Bran
5            Bran_Flakes
6            Fruitful_Bran
7            Fruit_&_Fibre_Dates,_Walnuts,_and_Oats
8            Raisin_Bran
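As a rough check on what "Normalized" and "Euclidian" mean in the fitting parameters, the sketch below z-scores each of the eight variables and computes the Euclidean distance between two cereals. It assumes normalization means subtracting the column mean and dividing by the sample standard deviation (n - 1); the worksheet does not state this. The helper names `zscores` and `normalized_distance` are illustrative and not part of the lab. Under that assumption, the distances for the pairs merged in Stage1 (4 and 8) and Stage2 (5 and 6) agree with the values in the stage table.

```python
from math import sqrt
from statistics import mean, stdev

# Rows copied from the cereal table, in the order:
# calories, protein, fat, sodium, fiber, carbo, sugars, potass.
# Keys are the observation numbers from the dendrogram legend.
cereals = {
    1: [50, 4, 0, 140, 14, 8, 0, 330],
    2: [70, 4, 1, 130, 10, 5, 6, 280],
    3: [70, 4, 1, 260, 9, 7, 5, 320],
    4: [120, 3, 1, 200, 6, 11, 14, 260],
    5: [90, 3, 0, 210, 5, 13, 5, 190],
    6: [120, 3, 0, 240, 5, 14, 12, 190],
    7: [120, 3, 2, 160, 5, 12, 10, 200],
    8: [120, 3, 1, 210, 5, 14, 12, 240],
}

# Per-variable mean and spread across the eight cereals.
columns = list(zip(*cereals.values()))
means = [mean(col) for col in columns]
# Assumption: the sample standard deviation (n - 1) is used for scaling.
stdevs = [stdev(col) for col in columns]


def zscores(row):
    """Standardize one cereal's values variable by variable."""
    return [(x - m) / s for x, m, s in zip(row, means, stdevs)]


def normalized_distance(a, b):
    """Euclidean distance between observations a and b on z-scored data."""
    za, zb = zscores(cereals[a]), zscores(cereals[b])
    return sqrt(sum((p - q) ** 2 for p, q in zip(za, zb)))


print(round(normalized_distance(4, 8), 4))  # 1.0992, cf. Stage1
print(round(normalized_distance(5, 6), 4))  # 1.9384, cf. Stage2
```

Only merges of two single observations are checked here; distances for later stages depend on how the Ward method scores merged clusters and are not reproduced by this sketch.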
Question 1: Use the Cereals data to answer the questions below.
a) When using the hierarchical method for clustering, how many clusters do we start with?
b) How many distinct clusters do you see in the dendrogram?
c) Which observations or cereals are members of each of the clusters you identified? (Fill out the table below.)

Group 1                         Group 2
Cluster # or name   Cereals     Cluster # or name   Cereals

Observation     Cereal
Observation 1   All-Bran_with_Extra_Fiber
Question 2:

Term       DF (Document Freq)  N (Number of Documents)  TF (Term Frequency)  N/DF  TFIDF  Interpretation
employee   1295                1431
operat     410                 1431
"the "     1431                1431
our        987                 1431
pandemic   19                  1431

Term    Description
DF      DF is the number of documents containing this term (token).
N       This is the total number of documents in the corpus.
TF      This is the number of times the term appears in the document of interest.
N/DF    This is the N/DF ratio, which represents the inverse of the relative frequency of a token across documents.
TFIDF   Measures how often a term appears in a text in relation to how common it is in other texts.
Question 1 Solution: Use the Cereals data to answer the questions below.
a) When using the hierarchical method for clustering, how many clusters do we start with?
b) How many distinct clusters do you see in the dendrogram?
c) Which observations or cereals are members of each of the clusters you identified? (Fill out the table below.)

Group 1                                     Group 2
Cluster # or name   Cereals                 Cluster # or name   Cereals
4                   Post_Nat._Raisin_Bran   1                   All-Bran_with_Extra_Fiber
8                                           2

Observation     Cereal
Observation 1   All-Bran_with_Extra_Fiber
Observation 2   100%_Bran
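The Output-Clustering Stages table can be replayed to see how the cluster count changes as merges are applied. The sketch below is illustrative and not part of the lab: it starts with each of the eight observations in its own cluster and applies the seven stages in order. Each sub-cluster is identified by one of its members, and it assumes that the "4" listed for Stage7 stands for the cluster containing observation 4. Stopping before the final merge, where the distance jumps from about 3.94 to about 10.80, is also an assumption about where to cut; the function name `replay` is hypothetical.

```python
# Merge stages from the Output-Clustering Stages table.
# Each sub-cluster is identified by one of its member observations.
stages = [
    (4, 8, 1.0992063421),   # Stage1: 4 with 8
    (5, 6, 1.9383688093),   # Stage2: 5 with 6
    (4, 7, 2.4511513613),   # Stage3: 4+8 with 7
    (1, 2, 2.6688169914),   # Stage4: 1 with 2
    (1, 3, 3.4505484927),   # Stage5: 1+2 with 3
    (4, 5, 3.9379612141),   # Stage6: 4+8+7 with 5+6
    (1, 4, 10.80388824),    # Stage7: 1+2+3 with the cluster holding 4 (assumption)
]


def replay(stages, n_obs):
    """Return the clustering before any merge and after each stage."""
    clusters = [frozenset([i]) for i in range(1, n_obs + 1)]
    history = [list(clusters)]
    for a, b, _distance in stages:
        # Find the current clusters that contain the two representatives.
        ca = next(c for c in clusters if a in c)
        cb = next(c for c in clusters if b in c)
        # Replace the two clusters with their union.
        clusters = [c for c in clusters if c not in (ca, cb)] + [ca | cb]
        history.append(list(clusters))
    return history


history = replay(stages, 8)
print(len(history[0]))                           # 8 clusters before any merge
before_last = sorted(sorted(c) for c in history[-2])
print(before_last)                               # [[1, 2, 3], [4, 5, 6, 7, 8]]
```

Read this way, the clustering after Stage6 has two clusters: observations 1, 2, 3 in one and observations 4 through 8 in the other. Which of these the worksheet calls Group 1 and which Group 2 is taken from the partial table above, not from the code.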
Question 2 Solution:

Term       DF (Document Freq)  N (Number of Documents)  TF (Term Frequency)  N/DF    TFIDF
employee   1295                1431                     7                    1.105   0.304
operat     410                 1431                     2                    3.490   1.086
"the "     1431                1431                     19                   1.000   0.000
our        987                 1431                     15                   1.450   2.420
pandemic   19                  1431                     6                    75.316  11.261

Term    Description
DF      DF is the number of documents containing this term (token).
N       This is the total number of documents in the corpus.
TF      This is the number of times the term appears in the document of interest.
N/DF    This is the N/DF ratio, which represents the inverse of the relative frequency of a token across documents.
TFIDF   Measures how often a term appears in a text in relation to how common it is in other texts.

Interpretation
employee   "employee" has a weak presence in the document based on TFIDF.
operat     The "operat" token has a stronger presence than "employee".
"the "     "the" has little or no importance: it is prevalent in all documents.
our        "our" seems to be used substantially by Caterpillar.
pandemic   "pandemic" is used much more commonly than most words in this newest Caterpillar document, based on TFIDF.
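The worksheet does not spell out the TFIDF formula. The values in the table are consistent with TFIDF = TF x log10(N/DF), so the sketch below assumes that formula and recomputes the N/DF and TFIDF columns from DF, N, and TF. The function name `tfidf` and the dictionary layout are illustrative only.

```python
from math import log10

N = 1431  # total number of documents in the corpus

# term -> (DF, TF), copied from the table above
terms = {
    "employee": (1295, 7),
    "operat": (410, 2),
    "the ": (1431, 19),
    "our": (987, 15),
    "pandemic": (19, 6),
}


def tfidf(tf, df, n=N):
    """Assumed formula: term frequency times log base 10 of N/DF."""
    return tf * log10(n / df)


for term, (df, tf) in terms.items():
    print(f"{term!r:12} N/DF={N / df:7.3f}  TFIDF={tfidf(tf, df):7.3f}")
```

Under this formula a term that occurs in every document, such as "the", has N/DF = 1 and therefore a TFIDF of 0 no matter how often it appears, while a rare term such as "pandemic" is weighted heavily even with a modest TF.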