Scala: complete the functions prod, overlap and similarity

Question
In Scala, complete the functions (prod, overlap and similarity).
//(1) Complete the clean function below. It should find
// all words in a string using the regular expression
// \w+ and the library function
//
// some_regex.findAllIn(some_string)
//
// The words should be returned as a list of strings.


def clean(s: String): List[String] = {
  val reg = """\w+""".r
  reg.findAllIn(s).toList
}
//clean("list of strings")
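As a quick check of (1), the sketch below runs clean on a sample sentence. The input string and the name words are made up for illustration; the point is that \w+ keeps letters, digits and underscores together and splits on everything else, including the apostrophe.

```scala
// Same logic as (1): collect every match of \w+ into a list.
def clean(s: String): List[String] = {
  val reg = """\w+""".r
  reg.findAllIn(s).toList
}

// Hypothetical sample input (not from the question).
val words = clean("Hello, world! It's Scala_3.")
// words == List("Hello", "world", "It", "s", "Scala_3")
```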


//(2) The function occurrences calculates the number of times
// strings occur in a list of strings. These occurrences should
// be calculated as a Map from strings to integers.


def occurrences(xs: List[String]): Map[String, Int] = {
  val cleaned = xs.distinct
  cleaned.map(x => (x, xs.count(_ == x))).toMap
}
//occurrences(List("a","a","b"))
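To make the result of (2) concrete, here is a small sketch that runs occurrences on two lists. The names d1 and d2 and their contents are assumptions, not given in the question text; they were chosen so that their counts match the terms of the worked dot-product example in the assignment description.

```scala
// Same logic as (2): count how often each distinct string occurs in xs.
def occurrences(xs: List[String]): Map[String, Int] =
  xs.distinct.map(x => (x, xs.count(_ == x))).toMap

// Hypothetical example documents (an assumption, not from the question).
val d1 = List("a", "b", "b", "c", "d")
val d2 = List("d", "b", "d", "b", "d")

val occ1 = occurrences(d1) // a -> 1, b -> 2, c -> 1, d -> 1
val occ2 = occurrences(d2) // d -> 3, b -> 2
```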


//(3) This function calculates the dot-product of two documents
// (list of strings). For this it calculates the occurrence
// maps from (2) and then multiplies the corresponding occurrences.
// If a string does not occur in a document, the product is zero.
// The function finally sums up all products.


def prod(lst1: List[String], lst2: List[String]) : Int = ???


//(4) Complete the functions overlap and similarity. The overlap of
// two documents is calculated by the formula given in the assignment
// description. The similarity of two strings is given by the overlap
// of the cleaned strings (see (1)).


def overlap(lst1: List[String], lst2: List[String]) : Double = ???

def similarity(s1: String, s2: String) : Double = ???



You can think of the Maps calculated under (2) as memory-efficient representations of sparse "vectors". In this subtask you need to implement the product of two such vectors, sometimes also called the dot product of two vectors.

For this dot product, implement a function that takes two documents (List[String]) as arguments. The function first calculates the (unique) strings in both. For each string, it multiplies the corresponding occurrences in each document. If a string does not occur in one of the documents, then the product for this string is zero. At the end you need to add up all products. For the two documents in (2) the dot product is 7, because

$$\underbrace{1 \cdot 0}_{\text{"a"}} + \underbrace{2 \cdot 2}_{\text{"b"}} + \underbrace{1 \cdot 0}_{\text{"c"}} + \underbrace{1 \cdot 3}_{\text{"d"}} = 7$$
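The description above could be implemented roughly as follows. This is one possible sketch, not necessarily the intended solution: it looks up each unique string in both occurrence maps and uses getOrElse with a default of 0 for strings missing from a document. The two example lists are an assumption, picked to reproduce the counts 1, 2, 1, 1 and 0, 2, 0, 3 from the worked example.

```scala
// Occurrence map from (2), repeated so the sketch is self-contained.
def occurrences(xs: List[String]): Map[String, Int] =
  xs.distinct.map(x => (x, xs.count(_ == x))).toMap

// Dot product: for every unique string in either document, multiply
// its counts in both documents (0 if absent) and sum the products.
def prod(lst1: List[String], lst2: List[String]): Int = {
  val occ1 = occurrences(lst1)
  val occ2 = occurrences(lst2)
  val words = (lst1 ++ lst2).distinct
  words.map(w => occ1.getOrElse(w, 0) * occ2.getOrElse(w, 0)).sum
}

// Hypothetical documents matching the counts in the worked example.
val dot = prod(List("a", "b", "b", "c", "d"), List("d", "b", "d", "b", "d"))
// dot == 7
```

Iterating only over the keys of one map would give the same sum, since a string missing from either document contributes zero; the version above follows the wording of the task more literally.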
Implement first a function that calculates the overlap between two documents, say $d_1$ and $d_2$, according to the formula

$$\mathrm{overlap}(d_1, d_2) = \frac{d_1 \cdot d_2}{\max(d_1^2, d_2^2)}$$

where $d_1^2$ means $d_1 \cdot d_1$ and so on. You can expect this function to return a Double between 0 and 1. The overlap between the lists in (2) is 0.5384615384615384.

Second, implement a function that calculates the similarity of two strings, by first extracting the substrings using the clean function from (1) and then calculating the overlap of the resulting documents.
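A minimal sketch of overlap and similarity under the formula above might look like this. The guard for a zero denominator (both documents empty) is an assumption, since the task does not say what to return in that case, and the example lists are hypothetical values chosen to match the counts in the worked example.

```scala
// Helpers from (1), (2) and (3), repeated so the sketch is self-contained.
def clean(s: String): List[String] = """\w+""".r.findAllIn(s).toList

def occurrences(xs: List[String]): Map[String, Int] =
  xs.distinct.map(x => (x, xs.count(_ == x))).toMap

def prod(lst1: List[String], lst2: List[String]): Int = {
  val occ1 = occurrences(lst1)
  val occ2 = occurrences(lst2)
  (lst1 ++ lst2).distinct
    .map(w => occ1.getOrElse(w, 0) * occ2.getOrElse(w, 0)).sum
}

// overlap(d1, d2) = (d1 . d2) / max(d1 . d1, d2 . d2)
def overlap(lst1: List[String], lst2: List[String]): Double = {
  val denom = math.max(prod(lst1, lst1), prod(lst2, lst2))
  // Assumption: return 0.0 for two empty documents to avoid 0 / 0.
  if (denom == 0) 0.0 else prod(lst1, lst2).toDouble / denom
}

// Similarity of two strings: overlap of their cleaned word lists.
def similarity(s1: String, s2: String): Double =
  overlap(clean(s1), clean(s2))

// Hypothetical documents: dot product 7, squared norms 7 and 13.
val ov = overlap(List("a", "b", "b", "c", "d"), List("d", "b", "d", "b", "d"))
// ov is 7.0 / 13
```

Converting the numerator with toDouble before dividing matters here: dividing two Int values would truncate the result to 0 for any pair of documents that are not identical in their counts.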