DATA MINING Assignment 4 - Sowmya Gonugunta

Georgia State University · Course 320 (Mechanical Engineering) · Dec 6, 2023
1) Calculating the information gain of each attribute.

First compute the expected information (entropy) of the whole data set D, which has 7 Wet and 9 Dry tuples out of 16:

Info(D) = -(7/16)*log2(7/16) - (9/16)*log2(9/16)
        = (0.438)*(1.193) + (0.563)*(0.830)
        = 0.522 + 0.467
        = 0.989

The expected information needed to classify a data object in D after partitioning by attribute A is the weighted average of the partition entropies.

Rain (4 Yes, 12 No):
Info_Rain(D) = (4/16)*(-(3/4)*log2(3/4) - (1/4)*log2(1/4)) + (12/16)*(-(4/12)*log2(4/12) - (8/12)*log2(8/12))
             = (0.25)*(0.811) + (0.75)*(0.918)
             = 0.892
Gain(Rain) = 0.989 - 0.892 = 0.097

Sprinkler (6 Yes, 10 No):
Info_Sprinkler(D) = (6/16)*(-(5/6)*log2(5/6) - (1/6)*log2(1/6)) + (10/16)*(-(2/10)*log2(2/10) - (8/10)*log2(8/10))
                  = (0.375)*(0.650) + (0.625)*(0.722)
                  = 0.695
Gain(Sprinkler) = 0.989 - 0.695 = 0.294

Since Sprinkler has the highest gain, it becomes the root node of the decision tree. The only remaining attribute is Rain, so Rain becomes the child of Sprinkler. Split the tuples based on the labels of Sprinkler.
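The arithmetic above can be reproduced with a short Python sketch. The `entropy` helper is an illustrative name, not part of the assignment; the class counts per partition are read off the data set used in part 1.

```python
import math

def entropy(counts):
    """Shannon entropy (base 2) of a list of class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# Whole data set D: 7 Wet, 9 Dry out of 16 tuples.
info_d = entropy([7, 9])                                               # ~0.989

# Partition by Rain: Rain=Yes has 3 Wet / 1 Dry, Rain=No has 4 Wet / 8 Dry.
info_rain = (4/16) * entropy([3, 1]) + (12/16) * entropy([4, 8])       # ~0.892
gain_rain = info_d - info_rain                                         # ~0.097

# Partition by Sprinkler: Yes has 5 Wet / 1 Dry, No has 2 Wet / 8 Dry.
info_sprinkler = (6/16) * entropy([5, 1]) + (10/16) * entropy([2, 8])  # ~0.695
gain_sprinkler = info_d - info_sprinkler                               # ~0.294
```

The script confirms Gain(Sprinkler) > Gain(Rain), matching the choice of Sprinkler as the root.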
For Sprinkler = Yes:

Index  Rain  Grass
1      No    Wet
2      No    Wet
3      No    Wet
4      Yes   Wet
5      No    Dry
6      No    Wet

For Sprinkler = No:

Index  Rain  Grass
1      No    Dry
2      No    Dry
3      Yes   Wet
4      No    Dry
5      No    Dry
6      Yes   Dry
7      No    Dry
8      No    Dry
9      Yes   Wet
10     No    Dry

Again, split the tuples based on the labels of Rain.
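The two partitions can be rebuilt and sanity-checked in Python. The tuple list below is just the 16 rows transcribed from the tables above; the variable names are illustrative.

```python
# Each row is (rain, sprinkler, grass), transcribed from the two tables above.
rows = [
    # Sprinkler = Yes
    ("No", "Yes", "Wet"), ("No", "Yes", "Wet"), ("No", "Yes", "Wet"),
    ("Yes", "Yes", "Wet"), ("No", "Yes", "Dry"), ("No", "Yes", "Wet"),
    # Sprinkler = No
    ("No", "No", "Dry"), ("No", "No", "Dry"), ("Yes", "No", "Wet"),
    ("No", "No", "Dry"), ("No", "No", "Dry"), ("Yes", "No", "Dry"),
    ("No", "No", "Dry"), ("No", "No", "Dry"), ("Yes", "No", "Wet"),
    ("No", "No", "Dry"),
]

# Split on the Sprinkler label, as the tree construction requires.
sprinkler_yes = [r for r in rows if r[1] == "Yes"]
sprinkler_no = [r for r in rows if r[1] == "No"]

wet_total = sum(r[2] == "Wet" for r in rows)   # 7, matching the 7/16 in part 1
```

The split sizes (6 and 10) and the overall class count (7 Wet of 16) agree with the fractions used in the entropy calculation.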
[Figure: final decision tree, with Sprinkler at the root and Rain as the child node under each branch.]

2)
2.1) Classification accuracy = (TP+TN)/(P+N) = (4+3)/10 = 0.7 => 70%
2.2) Error rate = (FP+FN)/(P+N) = (2+1)/10 = 0.3 => 30%
2.3) Sensitivity = TP/P = 4/5 = 0.8 => 80%
2.4) Specificity = TN/N = 3/5 = 0.6 => 60%
2.5) Precision = TP/(TP+FP) = 4/6 = 0.667 => 66.7%
2.6) Recall = TP/(TP+FN) = 4/5 = 0.8 => 80%
2.7) F-score = (2*Precision*Recall)/(Precision+Recall) = (2*(4/6)*(4/5))/((4/6)+(4/5)) = 0.7273 => 72.73%

3) Prior calculation:
P(Grass = dry) = 5/10 = 0.5
P(Grass = wet) = 5/10 = 0.5

Likelihoods:
P(Rain = no | Grass = dry) = 5/5 = 1
P(Rain = no | Grass = wet) = 3/5 = 0.6
P(Sprinkler = yes | Grass = dry) = 2/5 = 0.4
P(Sprinkler = yes | Grass = wet) = 3/5 = 0.6
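As a quick check on the part 2 arithmetic, the metrics can be computed from the confusion-matrix counts used above (TP = 4, FP = 2, FN = 1, TN = 3):

```python
# Confusion-matrix counts from part 2: P = TP + FN = 5, N = TN + FP = 5.
TP, FP, FN, TN = 4, 2, 1, 3
P, N = TP + FN, TN + FP

accuracy = (TP + TN) / (P + N)        # 0.7
error_rate = (FP + FN) / (P + N)      # 0.3
sensitivity = TP / P                  # 0.8 (same as recall)
specificity = TN / N                  # 0.6
precision = TP / (TP + FP)            # ~0.667
recall = TP / (TP + FN)               # 0.8

# F-score is the harmonic mean of precision and recall.
f_score = 2 * precision * recall / (precision + recall)   # ~0.727
```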
Posterior calculation:

P(Grass = wet | Rain = no, Sprinkler = yes) = P(Rain = no | Grass = wet) * P(Sprinkler = yes | Grass = wet) * P(Grass = wet) / P(Rain = no, Sprinkler = yes)
P(Grass = dry | Rain = no, Sprinkler = yes) = P(Rain = no | Grass = dry) * P(Sprinkler = yes | Grass = dry) * P(Grass = dry) / P(Rain = no, Sprinkler = yes)

Since the evidence P(Rain = no, Sprinkler = yes) is the same for both classes, it can be ignored for ranking. Then:

score(wet) = P(Rain = no | Grass = wet) * P(Sprinkler = yes | Grass = wet) * P(Grass = wet) = (3/5)*(3/5)*0.5 = 0.18
score(dry) = P(Rain = no | Grass = dry) * P(Sprinkler = yes | Grass = dry) * P(Grass = dry) = 1*(2/5)*0.5 = 0.2

These scores do not sum to 1, but they can be converted into probabilities by normalizing so that their sum equals 1:

P(Grass = wet | Rain = no, Sprinkler = yes) = 0.18 / (0.18 + 0.2) = 0.47
P(Grass = dry | Rain = no, Sprinkler = yes) = 0.2 / (0.18 + 0.2) = 0.53

Since P(Grass = dry | Rain = no, Sprinkler = yes) is greater than P(Grass = wet | Rain = no, Sprinkler = yes), the Naive Bayes classifier predicts the grass label is dry.

4) The conditional probability tables can be defined as given below:

P(Rain):
True: 4/16 = 0.25    False: 12/16 = 0.75

P(Sprinkler | Rain):
Rain     Sprinkler=True   Sprinkler=False
False    5/12 = 0.417     7/12 = 0.583
True     1/4 = 0.25       3/4 = 0.75
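The Naive Bayes steps in part 3 can be sketched in a few lines of Python, using the priors and likelihoods computed above (the dictionary names are illustrative):

```python
# Priors and likelihoods from part 3 (10 training tuples).
prior = {"wet": 0.5, "dry": 0.5}
p_rain_no = {"wet": 3/5, "dry": 5/5}          # P(Rain=no | Grass)
p_sprinkler_yes = {"wet": 3/5, "dry": 2/5}    # P(Sprinkler=yes | Grass)

# Unnormalized scores: likelihoods times prior (shared evidence term dropped).
score = {c: p_rain_no[c] * p_sprinkler_yes[c] * prior[c] for c in prior}
# score == {"wet": 0.18, "dry": 0.2}

# Normalize so the two posteriors sum to 1.
total = sum(score.values())
posterior = {c: s / total for c, s in score.items()}   # ~{"wet": 0.47, "dry": 0.53}

prediction = max(posterior, key=posterior.get)          # "dry"
```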
P(Grass | Rain, Sprinkler):
Sprinkler  Rain    Grass=Wet       Grass=Dry
False      False   0               1
False      True    2/3 = 0.67      1/3 = 0.33
True       False   4/5 = 0.8       1/5 = 0.2
True       True    1               0
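With the three CPTs in hand, any joint probability in this network factors by the chain rule as P(Rain) * P(Sprinkler | Rain) * P(Grass | Rain, Sprinkler). A small sketch, where the dictionary layout is just one illustrative encoding of the tables above:

```python
# CPTs from part 4, keyed by the parent values.
p_rain = {True: 0.25, False: 0.75}
p_sprinkler = {False: 5/12, True: 0.25}   # P(Sprinkler=True | Rain)
p_grass_wet = {                           # P(Grass=wet | Sprinkler, Rain)
    (False, False): 0.0,
    (False, True): 2/3,
    (True, False): 0.8,
    (True, True): 1.0,
}

def joint(rain, sprinkler, grass_wet):
    """P(Rain=rain, Sprinkler=sprinkler, Grass wet=grass_wet) via the chain rule."""
    pr = p_rain[rain]
    ps = p_sprinkler[rain] if sprinkler else 1 - p_sprinkler[rain]
    pg = p_grass_wet[(sprinkler, rain)]
    if not grass_wet:
        pg = 1 - pg
    return pr * ps * pg

# Example: P(Rain=True, Sprinkler=True, Grass=wet) = 0.25 * 0.25 * 1 = 0.0625
p = joint(True, True, True)
```

Summing `joint` over all eight value combinations gives 1, which is a useful consistency check on the tables.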