The field of information retrieval is concerned with finding relevant electronic documents based upon a query. For example, given a group of keywords (the query), a search engine retrieves Web pages (documents) and displays them sorted by relevance to the query. This technology requires a way to compare each document with the query to determine which documents are most relevant.
A simple way to make this comparison is to compute the binary cosine coefficient. The coefficient is a value between 0 and 1, where 1 indicates that the query is very similar to the document and 0 indicates that the query has no keywords in common with the document. This approach treats each document as a set of words. For example, given the following sample document:
“Chocolate ice cream, chocolate milk, and chocolate bars are delicious.”
This document would be parsed into keywords, ignoring case and discarding punctuation, and turned into the set containing the words {chocolate, ice, cream, milk, and, bars, are, delicious}. An identical process is performed on the query to turn it into a set of strings. Once we have a query Q represented as a set of words and a document D represented as a set of words, the similarity between Q and D is computed by:
$$\mathrm{Sim} = \frac{|Q \cap D|}{\sqrt{|Q|}\,\sqrt{|D|}}$$
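As a worked instance of the coefficient, take the sample document above, whose set D has 8 words, together with a hypothetical query "chocolate milk" (chosen here only for illustration), so that Q = {chocolate, milk}. Both query words occur in D, which gives |Q ∩ D| = 2:

$$\mathrm{Sim} = \frac{|Q \cap D|}{\sqrt{|Q|}\,\sqrt{|D|}} = \frac{2}{\sqrt{2}\,\sqrt{8}} = \frac{2}{\sqrt{16}} = 0.5$$

A query with no words in common with D would have |Q ∩ D| = 0 and therefore a similarity of 0, and a query identical to D would give 8 / (√8 · √8) = 1.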
Modify the StringSet from Programming Project 12 by adding an additional member function that computes the similarity between the current StringSet and an input parameter of type StringSet. The sqrt function is in the cmath library.
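The member function could be sketched as follows. The StringSet class from Programming Project 12 is not shown here, so the class below is a hypothetical stand-in: the vector storage and the names `add`, `contains`, `size`, and `similarity` are assumptions, and returning 0 for an empty set is a design choice made only to avoid dividing by zero.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the StringSet class of Programming Project 12;
// the real class may store and expose its strings differently.
class StringSet {
public:
    // Adds a word only if it is not already stored, so the vector acts as a set.
    void add(const std::string& word) {
        if (!contains(word)) {
            words.push_back(word);
        }
    }

    // Returns true if the word is a member of this set.
    bool contains(const std::string& word) const {
        return std::find(words.begin(), words.end(), word) != words.end();
    }

    std::size_t size() const { return words.size(); }

    // Binary cosine coefficient: |Q n D| / (sqrt(|Q|) * sqrt(|D|)),
    // where Q is this set and D is the parameter.
    // Assumption: the similarity is 0 when either set is empty.
    double similarity(const StringSet& other) const {
        if (words.empty() || other.words.empty()) {
            return 0.0;
        }
        std::size_t common = 0;  // size of the intersection
        for (const std::string& word : words) {
            if (other.contains(word)) {
                ++common;
            }
        }
        return static_cast<double>(common) /
               (std::sqrt(static_cast<double>(words.size())) *
                std::sqrt(static_cast<double>(other.words.size())));
    }

private:
    std::vector<std::string> words;
};
```

Because the intersection is symmetric, `q.similarity(d)` and `d.similarity(q)` yield the same value. The linear search in `contains` keeps the sketch short; a sorted container would scale better for large documents.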
Create two text files on your disk named Document1.txt and Document2.txt. Write some text content of your choice in each file, but make sure that each file contains different content. Next, write a program that allows the user to input from the keyboard a set of strings that represents a query. The program should then compare the query to both text files on the disk and output the similarity to each one using the binary cosine coefficient. Test your program with different queries to see if the similarity metric is working correctly.
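One possible shape for the program is sketched below. To stay self-contained it uses `std::set` in place of the StringSet class, and the helper names `readWords`, `similarity`, and `compareQuery` are hypothetical. It also assumes that a word is a run of non-whitespace characters, that every non-alphanumeric character is discarded, and that the query is typed on a single line.

```cpp
#include <cctype>
#include <cmath>
#include <cstddef>
#include <fstream>
#include <iostream>
#include <set>
#include <sstream>
#include <string>

// Reads whitespace-separated tokens, lowercases them, drops every
// non-alphanumeric character, and collects the results into a set.
std::set<std::string> readWords(std::istream& in) {
    std::set<std::string> words;
    std::string token;
    while (in >> token) {
        std::string word;
        for (unsigned char ch : token) {
            if (std::isalnum(ch)) {
                word += static_cast<char>(std::tolower(ch));
            }
        }
        if (!word.empty()) {
            words.insert(word);
        }
    }
    return words;
}

// Binary cosine coefficient; defined as 0 here when either set is empty.
double similarity(const std::set<std::string>& q, const std::set<std::string>& d) {
    if (q.empty() || d.empty()) {
        return 0.0;
    }
    std::size_t common = 0;
    for (const std::string& word : q) {
        common += d.count(word);  // count() is 1 if present, 0 otherwise
    }
    return static_cast<double>(common) /
           (std::sqrt(static_cast<double>(q.size())) *
            std::sqrt(static_cast<double>(d.size())));
}

// Reads one query line from queryIn and writes its similarity
// to each of the two documents on disk to out.
void compareQuery(std::istream& queryIn, std::ostream& out) {
    std::string line;
    std::getline(queryIn, line);
    std::istringstream queryStream(line);
    const std::set<std::string> query = readWords(queryStream);

    const std::string names[] = {"Document1.txt", "Document2.txt"};
    for (const std::string& name : names) {
        std::ifstream file(name);
        if (!file) {
            out << "Could not open " << name << '\n';
            continue;
        }
        out << name << ": " << similarity(query, readWords(file)) << '\n';
    }
}
```

A `main` function that simply calls `compareQuery(std::cin, std::cout)` turns this into the keyboard-driven program. Passing the streams as parameters is a design choice that lets the same function be exercised with string streams when testing different queries.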
Chapter 11, Problem Solving with C++ (10th Edition)