Question
Assume you have two documents
D1 = abccacab
D2 = acacabca
a.) Find the 2-shingles of both documents. (Hints: a 2-shingle is a k-shingle with k = 2. Also, remember that sets cannot contain duplicates.)
b.) Find the Jaccard similarity of D1 and D2.
The question has been answered, but isn’t the Jaccard similarity 0.8, not 0.4?
Isn’t |A ∪ B| = 5 {ab, bc, cc, ca, ac} and |A ∩ B| = 4 {ab, bc, ca, ac}?
So:
4/5 = 0.8
Right? Because the sets cannot contain duplicates.
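One way to check the arithmetic is to compute the shingle sets directly. The sketch below assumes character-level shingles (every run of k consecutive characters) and the set-based Jaccard definition |A ∩ B| / |A ∪ B|; the helper names `shingles` and `jaccard` are made up for illustration and do not come from the posted solution.

```python
def shingles(text, k=2):
    """Return the set of all k-character substrings of text (duplicates collapse)."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def jaccard(a, b):
    """Set-based Jaccard similarity: |a ∩ b| / |a ∪ b|."""
    union = a | b
    if not union:
        # Assumption: two empty sets are treated as identical.
        return 1.0
    return len(a & b) / len(union)


d1 = "abccacab"
d2 = "acacabca"

A = shingles(d1)
B = shingles(d2)

print(sorted(A))      # ['ab', 'ac', 'bc', 'ca', 'cc']
print(sorted(B))      # ['ab', 'ac', 'bc', 'ca']
print(jaccard(A, B))  # 0.8
```

Under these assumptions D2 contributes no shingle that D1 lacks, so the union is just the 5 shingles of D1, the intersection is the 4 shingles of D2, and the similarity is 4/5 = 0.8.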