Question
In Scala, complete the functions prod, overlap and similarity.
//(1) Complete the clean function below. It should find
// all words in a string using the regular expression
// \w+ and the library function
//
// some_regex.findAllIn(some_string)
//
// The words should be returned as a list of strings.
def clean(s: String) : List[String] = {
  val reg = """\w+""".r
  reg.findAllIn(s).toList
}
//clean("list of strings")
//(2) The function occurrences calculates the number of times
// strings occur in a list of strings. These occurrences should
// be calculated as a Map from strings to integers.
def occurrences(xs: List[String]): Map[String, Int] = {
  val cleaned = xs.distinct
  cleaned.map(x => (x, xs.count(_ == x))).toMap
}
//occurrences(List("a","a","b"))
//(3) This function calculates the dot-product of two documents
// (lists of strings). For this it calculates the occurrence
// maps from (2) and then multiplies the corresponding occurrences.
// If a string does not occur in a document, the product is zero.
// The function finally sums up all products.
def prod(lst1: List[String], lst2: List[String]) : Int = ???
//(4) Complete the functions overlap and similarity. The overlap of
// two documents is calculated by the formula given in the assignment
// description. The similarity of two strings is given by the overlap
// of the cleaned strings (see (1)).
def overlap(lst1: List[String], lst2: List[String]) : Double = ???
def similarity(s1: String, s2: String) : Double = ???
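As a quick sanity check of parts (1) and (2), the sketch below wraps the two given functions in an object and applies them to the example inputs from the comments. The object name `Warmup` is hypothetical and only there so the block compiles on its own; the function bodies are the ones from the question with the spacing in `val reg` and `val cleaned` restored.

```scala
object Warmup {
  // (1) Extract all maximal runs of word characters (\w+) from s.
  def clean(s: String): List[String] = {
    val reg = """\w+""".r
    reg.findAllIn(s).toList
  }

  // (2) Count how often each distinct string occurs in xs.
  def occurrences(xs: List[String]): Map[String, Int] = {
    val cleaned = xs.distinct
    cleaned.map(x => (x, xs.count(_ == x))).toMap
  }
}

// Warmup.clean("list of strings")          // List("list", "of", "strings")
// Warmup.occurrences(List("a", "a", "b"))  // Map("a" -> 2, "b" -> 1)
```

Note that `occurrences` rescans the whole list once per distinct string, which is fine for short documents like the ones in this exercise.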
You can think of the Maps calculated under (2) as memory-efficient representations of sparse "vectors". In this subtask you need to implement the product of two such vectors, sometimes also called dot product of two vectors.

For this dot product, implement a function that takes two documents (List[String]) as arguments. The function first calculates the (unique) strings in both. For each string, it multiplies the corresponding occurrences in each document. If a string does not occur in one of the documents, then the product for this string is zero. At the end you need to add up all products. For the two documents in (2) the dot product is 7, because

$$\underbrace{1 \cdot 0}_{\texttt{"a"}} + \underbrace{2 \cdot 2}_{\texttt{"b"}} + \underbrace{1 \cdot 0}_{\texttt{"c"}} + \underbrace{1 \cdot 3}_{\texttt{"d"}} = 7$$
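One possible way to implement `prod` along the lines described above is sketched here; it is not the hidden expert solution. The object name `DotProductSketch` is hypothetical, and the two example lists are an assumption: the documents from the assignment description are not reproduced on this page, so they were chosen only to match the occurrence counts in the worked sum (1, 2, 1, 1 for "a" to "d" in the first document and 0, 2, 0, 3 in the second).

```scala
object DotProductSketch {
  // Same counting as occurrences in (2), repeated so the block stands alone.
  def occurrences(xs: List[String]): Map[String, Int] =
    xs.distinct.map(x => (x, xs.count(_ == x))).toMap

  def prod(lst1: List[String], lst2: List[String]): Int = {
    val occ1 = occurrences(lst1)
    val occ2 = occurrences(lst2)
    // All unique strings that appear in either document.
    val words = (lst1 ++ lst2).distinct
    // A string missing from one document counts as 0, so its product is 0.
    words.map(w => occ1.getOrElse(w, 0) * occ2.getOrElse(w, 0)).sum
  }

  // Assumed example documents, reconstructed from the counts in the sum.
  val list1 = List("a", "b", "b", "c", "d")
  val list2 = List("d", "b", "d", "b", "d")
}

// DotProductSketch.prod(DotProductSketch.list1, DotProductSketch.list2)  // 7
```

Using `getOrElse(w, 0)` is what encodes the rule that a string absent from one document contributes zero to the sum.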
Implement first a function that calculates the overlap between two documents, say $d_1$ and $d_2$, according to the formula

$$\mathrm{overlap}(d_1, d_2) = \frac{d_1 \cdot d_2}{\max(d_1^2, d_2^2)}$$

where $d_1^2$ means $d_1 \cdot d_1$ and so on. You can expect this function to return a Double between 0 and 1. The overlap between the lists in (2) is 0.5384615384615384.

Second, implement a function that calculates the similarity of two strings, by first extracting the substrings using the clean function from (1) and then calculating the overlap of the resulting documents.
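The overlap formula and the similarity function could be sketched as follows. This is a minimal illustration under stated assumptions rather than the official solution: the object name `SimilaritySketch` is hypothetical, `clean`, `occurrences` and `prod` are repeated so the block is self-contained, and returning 0.0 for two empty documents is an assumption, since the task text does not say what should happen when the denominator is zero.

```scala
object SimilaritySketch {
  // (1) All \w+ matches in s, as a list of strings.
  def clean(s: String): List[String] = """\w+""".r.findAllIn(s).toList

  // (2) Occurrence count for each distinct string.
  def occurrences(xs: List[String]): Map[String, Int] =
    xs.distinct.map(x => (x, xs.count(_ == x))).toMap

  // (3) Dot product of the two occurrence "vectors".
  def prod(lst1: List[String], lst2: List[String]): Int = {
    val occ1 = occurrences(lst1)
    val occ2 = occurrences(lst2)
    (lst1 ++ lst2).distinct
      .map(w => occ1.getOrElse(w, 0) * occ2.getOrElse(w, 0))
      .sum
  }

  // (4) overlap(d1, d2) = (d1 . d2) / max(d1 . d1, d2 . d2)
  def overlap(lst1: List[String], lst2: List[String]): Double = {
    val denom = math.max(prod(lst1, lst1), prod(lst2, lst2))
    // Assumption: both documents empty gives 0.0 instead of dividing by zero.
    if (denom == 0) 0.0 else prod(lst1, lst2).toDouble / denom
  }

  // Similarity of two strings is the overlap of their cleaned word lists.
  def similarity(s1: String, s2: String): Double =
    overlap(clean(s1), clean(s2))
}
```

With the assumed documents List("a", "b", "b", "c", "d") and List("d", "b", "d", "b", "d"), the numerator is 7 and the denominator is max(7, 13) = 13, which agrees with the value 0.5384615384615384 quoted in the task.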