
This
The field of information retrieval is concerned with finding relevant electronic documents based upon a query. For example, given a group of keywords (the query), a search engine retrieves Web pages (documents)and displays them sorted by relevance to the query. This technology requires a way to compare a document with the query to see which is most relevant to the query.
A simple way to make this comparison is to compute the binary cosine coefficient. The coefficient is a value between 0 and 1, where 1 indicates that the query is very similar to the document and 0 indicates that the query has no keywords in common with the document. This approach treats each document as a set of words. For example, given the following sample document:
“Chocolate ice cream, chocolate milk, and chocolate bars are delicious.” This document would be parsed into keywords where case is ignored and punctuation discarded and turned into the set containing the words {chocolate, ice, cream, milk, and, bars, are, delicious}. An identical processis performed on the query to turn it into a set of strings. Once we have a query Q represented as a set of words and a document D represented as a set of words, the similarity between Q and D is computed by:
Sim=| Q∩D || Q | | D |
Modify the StringSet from Programming Project 12 by adding an additional member function that computes the similarity between the current StringSet and an input parameter of type StringSet. The sqrt function is in the cmath library.
Create two text files on your disk named Document1.txt and Document2.txt. Write some text content of your choice in each file, but make sure that each file contains different content. Next, write a program that allows the user to input from the keyboard a set of strings that represents a query. The program should then compare the query to both text files on the disk and output the similarity to each one using the binary cosine coefficient. Test your program with different queries to see if the similarity metric is working correctly.

Want to see the full answer?
Check out a sample textbook solution
Chapter 11 Solutions
Problem Solving with C++ (10th Edition)
Additional Engineering Textbook Solutions
Starting Out with Programming Logic and Design (5th Edition) (What's New in Computer Science)
Database Concepts (8th Edition)
Modern Database Management
Java: An Introduction to Problem Solving and Programming (8th Edition)
Electric Circuits. (11th Edition)
Thinking Like an Engineer: An Active Learning Approach (4th Edition)
- Briefly describe the issues involved in using ATM technology in Local Area Networksarrow_forwardFor this question you will perform two levels of quicksort on an array containing these numbers: 59 41 61 73 43 57 50 13 96 88 42 77 27 95 32 89 In the first blank, enter the array contents after the top level partition. In the second blank, enter the array contents after one more partition of the left-hand subarray resulting from the first partition. In the third blank, enter the array contents after one more partition of the right-hand subarray resulting from the first partition. Print the numbers with a single space between them. Use the algorithm we covered in class, in which the first element of the subarray is the partition value. Question 1 options: Blank # 1 Blank # 2 Blank # 3arrow_forward1. Transform the E-R diagram into a set of relations. Country_of Agent ID Agent H Holds Is_Reponsible_for Consignment Number $ Value May Contain Consignment Transports Container Destination Ф R Goes Off Container Number Size Vessel Voyage Registry Vessel ID Voyage_ID Tonnagearrow_forward
- I want to solve 13.2 using matlab please helparrow_forwarda) Show a possible trace of the OSPF algorithm for computing the routing table in Router 2 forthis network.b) Show the messages used by RIP to compute routing tables.arrow_forwardusing r language to answer question 4 Question 4: Obtain a 95% standard normal bootstrap confidence interval, a 95% basic bootstrap confidence interval, and a percentile confidence interval for the ρb12 in Question 3.arrow_forward
- using r language Obtain a bootstrap t confidence interval estimate for the correlation statistic in Example 8.2 (law data in bootstrap).arrow_forwardusing r language Compute a jackknife estimate of the bias and the standard error of the correlation statistic in Example 8.2.arrow_forwardusing r languagearrow_forward
- C++ Programming: From Problem Analysis to Program...Computer ScienceISBN:9781337102087Author:D. S. MalikPublisher:Cengage LearningC++ for Engineers and ScientistsComputer ScienceISBN:9781133187844Author:Bronson, Gary J.Publisher:Course Technology PtrNew Perspectives on HTML5, CSS3, and JavaScriptComputer ScienceISBN:9781305503922Author:Patrick M. CareyPublisher:Cengage Learning
- Programming Logic & Design ComprehensiveComputer ScienceISBN:9781337669405Author:FARRELLPublisher:CengageEBK JAVA PROGRAMMINGComputer ScienceISBN:9781337671385Author:FARRELLPublisher:CENGAGE LEARNING - CONSIGNMENTMicrosoft Visual C#Computer ScienceISBN:9781337102100Author:Joyce, Farrell.Publisher:Cengage Learning,




