
This
The field of information retrieval is concerned with finding relevant electronic documents based upon a query. For example, given a group of keywords (the query), a search engine retrieves Web pages (documents)and displays them sorted by relevance to the query. This technology requires a way to compare a document with the query to see which is most relevant to the query.
A simple way to make this comparison is to compute the binary cosine coefficient. The coefficient is a value between 0 and 1, where 1 indicates that the query is very similar to the document and 0 indicates that the query has no keywords in common with the document. This approach treats each document as a set of words. For example, given the following sample document:
“Chocolate ice cream, chocolate milk, and chocolate bars are delicious.” This document would be parsed into keywords where case is ignored and punctuation discarded and turned into the set containing the words {chocolate, ice, cream, milk, and, bars, are, delicious}. An identical processis performed on the query to turn it into a set of strings. Once we have a query Q represented as a set of words and a document D represented as a set of words, the similarity between Q and D is computed by:
Sim=| Q∩D || Q | | D |
Modify the StringSet from Programming Project 12 by adding an additional member function that computes the similarity between the current StringSet and an input parameter of type StringSet. The sqrt function is in the cmath library.
Create two text files on your disk named Document1.txt and Document2.txt. Write some text content of your choice in each file, but make sure that each file contains different content. Next, write a program that allows the user to input from the keyboard a set of strings that represents a query. The program should then compare the query to both text files on the disk and output the similarity to each one using the binary cosine coefficient. Test your program with different queries to see if the similarity metric is working correctly.

Want to see the full answer?
Check out a sample textbook solution
Chapter 11 Solutions
Problem Solving with C++ plus MyProgrammingLab with Pearson eText-- Access Card Package (9th Edition)
Additional Engineering Textbook Solutions
Starting Out with Programming Logic and Design (5th Edition) (What's New in Computer Science)
Database Concepts (8th Edition)
Modern Database Management
Java: An Introduction to Problem Solving and Programming (8th Edition)
Electric Circuits. (11th Edition)
Thinking Like an Engineer: An Active Learning Approach (4th Edition)
- Please answer both Exercise 1 and2(these questions are not GRADED)arrow_forwardDiscussion 1. Comment on your results. 2. Compare between the practical and theoretical results. 3. Find VB, Vc on the figure below: 3V V₁₁ R₁ B IR, R, IR, R www ΙΚΩ www www I 1.5KQ 18₁ 82002 R₁ 3.3KQ R₂ 2.2KQ E Darrow_forwardAgile1. a. Describe it and how it differs from other SDLC approachesb. List and describe the two primary terms for the agile processc. What are the three activities in the Construction phasearrow_forward
- how are youarrow_forwardneed help with thi Next, you are going to combine everything you've learned about HTML and CSS to make a static site portfolio piece. The page should first introduce yourself. The content is up to you, but should include a variety of HTML elements, not just text. This should be followed by an online (HTML-ified) version of your CV (Resume). The following is a minimum list of requirements you should have across all your content: Both pages should start with a CSS reset (imported into your CSS, not included in your HTML) Semantic use of HTML5 sectioning elements for page structure A variety other semantic HTML elements Meaningful use of Grid, Flexbox and the Box Model as appropriate for different layout components A table An image Good use of CSS Custom Properties (variables) Non-trivial use of CSS animation Use of pseudeo elements An accessible colour palette Use of media queries The focus of this course is development, not design. However, being able to replicate a provided design…arrow_forwardUsing the notationarrow_forward
- you can select multipy optionsarrow_forwardFor each of the following, decide whether the claim is True or False and select the True ones: Suppose we discover that the 3SAT can be solved in worst-case cubic time. Then it would mean that all problems in NP can also be solved in cubic time. If a problem can be solved using Dynamic Programming, then it is not NP-complete. Suppose X and Y are two NP-complete problems. Then, there must be a polynomial-time reduction from X to Y and also one from Y to X.arrow_forwardMaximum Independent Set problem is known to be NP-Complete. Suppose we have a graph G in which the maximum degree of each node is some constant c. Then, is the following greedy algorithm guaranteed to find an independent set whose size is within a constant factor of the optimal? 1) Initialize S = empty 2) Arbitrarily pick a vertex v, add v to S delete v and its neighbors from G 3) Repeat step 2 until G is empty Return S Yes Noarrow_forward
- C++ Programming: From Problem Analysis to Program...Computer ScienceISBN:9781337102087Author:D. S. MalikPublisher:Cengage LearningC++ for Engineers and ScientistsComputer ScienceISBN:9781133187844Author:Bronson, Gary J.Publisher:Course Technology PtrNew Perspectives on HTML5, CSS3, and JavaScriptComputer ScienceISBN:9781305503922Author:Patrick M. CareyPublisher:Cengage Learning
- Programming Logic & Design ComprehensiveComputer ScienceISBN:9781337669405Author:FARRELLPublisher:CengageEBK JAVA PROGRAMMINGComputer ScienceISBN:9781337671385Author:FARRELLPublisher:CENGAGE LEARNING - CONSIGNMENTMicrosoft Visual C#Computer ScienceISBN:9781337102100Author:Joyce, Farrell.Publisher:Cengage Learning,




