Question
In Scala, complete the functions prod, overlap and similarity.
//(1) Complete the clean function below. It should find
// all words in a string using the regular expression
// \w+ and the library function
//
// some_regex.findAllIn(some_string)
//
// The words should be returned as a list of strings.
def clean(s: String) : List[String] = {
  val reg = """\w+""".r
  reg.findAllIn(s).toList
}
//clean("list of strings")
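A runnable check of the clean function from (1). The object name CleanSketch and the second input string are hypothetical. It also shows two properties of the pattern that matter later for similarity: in Java/Scala regexes `\w` matches only `[a-zA-Z_0-9]` unless Unicode character classes are enabled, so apostrophes and hyphens split words, and no lowercasing takes place.

```scala
object CleanSketch {
  // (1): collect every maximal run of word characters [a-zA-Z_0-9].
  def clean(s: String): List[String] = {
    val reg = """\w+""".r
    reg.findAllIn(s).toList
  }

  def main(args: Array[String]): Unit = {
    println(clean("list of strings"))     // List(list, of, strings)
    println(clean("Don't stop, 2-step!")) // List(Don, t, stop, 2, step)
  }
}
```

Because case is preserved, "Don" and "don" would count as different words in (2) to (4).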
//(2) The function occurrences calculates the number of times
// strings occur in a list of strings. These occurrences should
// be calculated as a Map from strings to integers.
def occurrences(xs: List[String]): Map[String, Int] = {
  val cleaned = xs.distinct
  cleaned.map(x => (x, xs.count(_ == x))).toMap
}
//occurrences(List("a","a","b"))
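The occurrences function in (2) rescans the whole list once per distinct string. A sketch of a single-pass alternative using the standard-library groupBy is below; the names OccurrencesSketch and occurrencesGrouped are hypothetical, and both functions should build the same Map.

```scala
object OccurrencesSketch {
  // (2) as in the question: for each distinct string, count it in xs.
  def occurrences(xs: List[String]): Map[String, Int] = {
    val cleaned = xs.distinct
    cleaned.map(x => (x, xs.count(_ == x))).toMap
  }

  // Hypothetical alternative: group equal strings in one pass,
  // then record the size of each group.
  def occurrencesGrouped(xs: List[String]): Map[String, Int] =
    xs.groupBy(identity).map { case (word, group) => (word, group.length) }

  def main(args: Array[String]): Unit = {
    println(occurrences(List("a", "a", "b")) == Map("a" -> 2, "b" -> 1))        // true
    println(occurrencesGrouped(List("a", "a", "b")) == Map("a" -> 2, "b" -> 1)) // true
  }
}
```

The difference only matters for long documents with many distinct words; either version can back prod in (3).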
//(3) This function calculates the dot-product of two documents
// (list of strings). For this it calculates the occurrence
// maps from (2) and then multiplies the corresponding occurrences.
// If a string does not occur in a document, the product is zero.
// The function finally sums up all products.
def prod(lst1: List[String], lst2: List[String]) : Int = ???
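A minimal sketch of prod for (3), assuming occurrences behaves as in (2): each occurrence map is treated as a sparse vector, and a word missing from one map contributes 0. The object name ProdSketch is hypothetical, and the two test documents are made up so that their word counts match the worked example in the assignment description (a: 1, b: 2, c: 1, d: 1 and b: 2, d: 3).

```scala
object ProdSketch {
  // (2) as given in the question.
  def occurrences(xs: List[String]): Map[String, Int] =
    xs.distinct.map(x => (x, xs.count(_ == x))).toMap

  // (3) Dot product of two documents viewed as sparse count vectors.
  def prod(lst1: List[String], lst2: List[String]): Int = {
    val m1 = occurrences(lst1)
    val m2 = occurrences(lst2)
    // Unique strings in both documents. Convert to a List before mapping:
    // mapping over a Set would collapse equal products, e.g. two words
    // that each contribute 1 would be summed only once.
    val words = (m1.keySet ++ m2.keySet).toList
    words.map(w => m1.getOrElse(w, 0) * m2.getOrElse(w, 0)).sum
  }

  def main(args: Array[String]): Unit = {
    val d1 = List("a", "b", "b", "c", "d") // hypothetical document
    val d2 = List("d", "b", "d", "b", "d") // hypothetical document
    println(prod(d1, d2)) // 7
  }
}
```

Since missing words contribute 0, iterating over the first map alone (`m1.toList.map { case (w, n) => n * m2.getOrElse(w, 0) }.sum`) gives the same result; the union above just follows the wording of the assignment.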
//(4) Complete the functions overlap and similarity. The overlap of
// two documents is calculated by the formula given in the assignment
// description. The similarity of two strings is given by the overlap
// of the cleaned strings (see (1)).
def overlap(lst1: List[String], lst2: List[String]) : Double = ???
def similarity(s1: String, s2: String) : Double = ???
You can think of the Maps calculated under (2) as memory-efficient representations of sparse "vectors". In this subtask you need to implement the product of two such vectors, sometimes also called the dot product of two vectors.³

For this dot product, implement a function that takes two documents (List[String]) as arguments. The function first calculates the (unique) strings in both. For each string, it multiplies the corresponding occurrences in each document. If a string does not occur in one of the documents, then the product for this string is zero. At the end you need to
add up all products. For the two documents in (2) the dot product is 7, because

$$\underbrace{1 \cdot 0}_{\texttt{"a"}} + \underbrace{2 \cdot 2}_{\texttt{"b"}} + \underbrace{1 \cdot 0}_{\texttt{"c"}} + \underbrace{1 \cdot 3}_{\texttt{"d"}} = 7$$
First, implement a function that calculates the overlap between two documents, say $d_1$ and $d_2$, according to the formula

$$\operatorname{overlap}(d_1, d_2) = \frac{d_1 \cdot d_2}{\max(d_1^2, d_2^2)}$$

where $d_1^2$ means $d_1 \cdot d_1$, and so on. You can expect this function to return a Double between 0 and 1. The overlap between the lists in (2) is 0.5384615384615384.
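Using notation introduced here rather than in the assignment, let $W_i$ be the set of distinct words of document $i$ and $c_i(w)$ the count of word $w$ in it (its value in the occurrence map from (2), or 0 if absent). With the per-word counts from the worked dot-product example (a: 1, b: 2, c: 1, d: 1 in the first document; b: 2, d: 3 in the second), the dot product and the stated overlap work out as:

$$d_1 \cdot d_2 = \sum_{w \in W_1 \cup W_2} c_1(w)\, c_2(w) = 7$$

$$d_1^2 = 1^2 + 2^2 + 1^2 + 1^2 = 7, \qquad d_2^2 = 2^2 + 3^2 = 13$$

$$\operatorname{overlap}(d_1, d_2) = \frac{7}{\max(7, 13)} = \frac{7}{13} \approx 0.5384615384615384$$

Only words in $W_1 \cap W_2$ contribute nonzero terms, so summing over the intersection alone gives the same dot product.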
Second, implement a function that calculates the similarity of two strings by first extracting the words with the clean function from (1) and then calculating the overlap of the resulting documents.
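A minimal sketch of overlap and similarity for (4), assuming clean, occurrences and prod behave as in (1) to (3); they are repeated so the block compiles on its own, and the object name SimilaritySketch is hypothetical. The assignment does not say what to return when both documents are empty, where the formula gives 0/0 (NaN); returning 0.0 in that case is an assumption of this sketch.

```scala
object SimilaritySketch {
  def clean(s: String): List[String] = """\w+""".r.findAllIn(s).toList

  def occurrences(xs: List[String]): Map[String, Int] =
    xs.distinct.map(x => (x, xs.count(_ == x))).toMap

  def prod(lst1: List[String], lst2: List[String]): Int = {
    val m1 = occurrences(lst1)
    val m2 = occurrences(lst2)
    (m1.keySet ++ m2.keySet).toList.map(w => m1.getOrElse(w, 0) * m2.getOrElse(w, 0)).sum
  }

  // overlap(d1, d2) = (d1 . d2) / max(d1 . d1, d2 . d2)
  def overlap(lst1: List[String], lst2: List[String]): Double = {
    val denom = math.max(prod(lst1, lst1), prod(lst2, lst2))
    // Assumption: two empty documents get 0.0 rather than NaN.
    if (denom == 0) 0.0 else prod(lst1, lst2).toDouble / denom
  }

  // Similarity of two strings: overlap of their cleaned word lists.
  def similarity(s1: String, s2: String): Double =
    overlap(clean(s1), clean(s2))

  def main(args: Array[String]): Unit = {
    println(overlap(List("a", "b", "b", "c", "d"), List("d", "b", "d", "b", "d"))) // 0.5384615384615384
    println(similarity("a b b c d", "d b d b d"))                                  // 0.5384615384615384
  }
}
```

The result stays between 0 and 1 because, by the Cauchy-Schwarz inequality, $d_1 \cdot d_2 \le \sqrt{d_1^2 \, d_2^2} \le \max(d_1^2, d_2^2)$.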