Consider a dataset from an online store consisting of selected sets of five ebooks, which are representative of the following categories: thriller, education and comedy. Using a document comparison technique, we can calculate the similarity of a new ebook b which is added to the store. The similarity represents the distance between the new ebook b and the five ebooks within each set for the categories that are within the store. The table of similarity values of the new ebook b to the sample sets is shown below.

Category    Sample 1  Sample 2  Sample 3  Sample 4  Sample 5
thriller    0.43      0.11      0.03      0.23      0.34
education   0.67      0.12      0.36      0.12      0.72
comedy      0.08      0.31      0.8       0.1       0.53
Consider a k-NN problem for k=3 with distance weighted average voting. What is the genre of the
new ebook b?
Hint: The weight for each vote of a sample of a given category with distance d is calculated by
w = 1/d. For example, for Sample 1 in thriller, the weight is w = 1/0.43 ≈ 2.33. Distance-weighted
average voting weights each vote according to its distance from the new data point,
using this formula for w. To make a prediction, take the k nearest neighbours of the new
sample, sum the votes independently for each class, and predict the class with
the highest total vote.
Options:
  • thriller
  • comedy
  • it is not decidable
  • education
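
The procedure described in the hint can be sketched in Python as follows. This is a minimal sketch, not a confirmed solution: it assumes the table values are distances (smaller means closer), that the k = 3 nearest neighbours are picked across all 15 samples regardless of category, and that no distance is zero. The names `distances` and `weighted_knn` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Distances from the new ebook b to each sample, copied from the table.
distances = {
    "thriller":  [0.43, 0.11, 0.03, 0.23, 0.34],
    "education": [0.67, 0.12, 0.36, 0.12, 0.72],
    "comedy":    [0.08, 0.31, 0.8, 0.1, 0.53],
}


def weighted_knn(distances, k=3):
    """Return (predicted_class, votes) using w = 1/d weighted voting.

    Assumes every distance is strictly positive, so 1/d is defined.
    """
    # Flatten to (distance, category) pairs and keep the k smallest distances.
    neighbours = sorted(
        (d, category) for category, ds in distances.items() for d in ds
    )[:k]

    # Sum the weights independently for each class among the k neighbours.
    votes = defaultdict(float)
    for d, category in neighbours:
        votes[category] += 1.0 / d

    # The class with the highest total vote is the prediction.
    prediction = max(votes, key=votes.get)
    return prediction, dict(votes)


prediction, votes = weighted_knn(distances, k=3)
print(prediction, {c: round(v, 2) for c, v in votes.items()})
```

Under these assumptions the three nearest neighbours are thriller Sample 3 (0.03), comedy Sample 1 (0.08) and comedy Sample 4 (0.1). The totals are 1/0.03 ≈ 33.33 for thriller and 1/0.08 + 1/0.1 = 22.5 for comedy, so this reading of the question predicts thriller.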