Answered: IN SCALA COMPLETE THE FUNCTIONS (prod,…

Question

In Scala, complete the functions prod, overlap and similarity.

//(1) Complete the clean function below. It should find
// all words in a string using the regular expression
// \w+ and the library function
// some_regex.findAllIn(some_string)
// The words should be returned as a list of strings.

def clean(s: String) : List[String] = {
  val reg = """\w+""".r
  reg.findAllIn(s).toList
}

//clean("list of strings")

//(2) The function occurrences calculates the number of times
// strings occur in a list of strings. These occurrences should
// be calculated as a Map from strings to integers.

def occurrences(xs: List[String]): Map[String, Int] = {
  val cleaned = xs.distinct
  cleaned.map(x => (x, xs.count(_ == x))).toMap
}

//occurrences(List("a","a","b"))

//(3) This function calculates the dot-product of two documents
// (lists of strings). For this it calculates the occurrence
// maps from (2) and then multiplies the corresponding occurrences.
// If a string does not occur in a document, the product is zero.
// The function finally sums up all products.

def prod(lst1: List[String], lst2: List[String]) : Int = ???
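
One way prod might be completed, following the description in (3), is sketched below. This is a sketch under assumptions, not the assignment's reference solution. The example documents doc1 and doc2 are hypothetical: they are chosen only so that their occurrence counts are a:1, b:2, c:1, d:1 and b:2, d:3.

```scala
// Occurrence map as in (2): each distinct string mapped to its count.
def occurrences(xs: List[String]): Map[String, Int] =
  xs.distinct.map(x => (x, xs.count(_ == x))).toMap

// Dot product of two documents viewed as sparse vectors.
def prod(lst1: List[String], lst2: List[String]): Int = {
  val occ1 = occurrences(lst1)
  val occ2 = occurrences(lst2)
  // A string missing from lst2 contributes 0 via getOrElse; strings that
  // occur only in lst2 would also contribute 0, so they can be skipped.
  occ1.toList.map { case (w, n) => n * occ2.getOrElse(w, 0) }.sum
}

// Hypothetical example documents (an assumption, see the lead-in).
val doc1 = List("a", "b", "b", "c", "d")
val doc2 = List("d", "b", "d", "b", "d")

println(prod(doc1, doc2)) // 1*0 + 2*2 + 1*0 + 1*3 = 7
```

Iterating over only one of the two maps is enough because every term with a zero count on either side vanishes.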

//(4) Complete the functions overlap and similarity. The overlap of
// two documents is calculated by the formula given in the assignment
// description. The similarity of two strings is given by the overlap
// of the cleaned strings (see (1)).

def overlap(lst1: List[String], lst2: List[String]) : Double = ???

def similarity(s1: String, s2: String) : Double = ???

You can think of the Maps calculated under (2) as memory-efficient
representations of sparse "vectors". In this subtask you need to implement
the product of two such vectors, sometimes also called dot product of two
vectors.

For this dot product, implement a function that takes two documents
(List[String]) as arguments. The function first calculates the (unique)
strings in both. For each string, it multiplies the corresponding
occurrences in each document. If a string does not occur in one of the
documents, then the product for this string is zero. At the end you need to
add up all products. For the two documents in (2) the dot product is 7,
because

$$\underbrace{1 \cdot 0}_{\texttt{"a"}} + \underbrace{2 \cdot 2}_{\texttt{"b"}} + \underbrace{1 \cdot 0}_{\texttt{"c"}} + \underbrace{1 \cdot 3}_{\texttt{"d"}} = 7$$

Implement first a function that calculates the overlap between two
documents, say $d_1$ and $d_2$, according to the formula

$$\operatorname{overlap}(d_1, d_2) = \frac{d_1 \cdot d_2}{\max(d_1^2, d_2^2)}$$

where $d_1^2$ means $d_1 \cdot d_1$ and so on. You can expect this function
to return a Double between 0 and 1. The overlap between the lists in (2) is
0.5384615384615384.

Second, implement a function that calculates the similarity of two strings,
by first extracting the substrings using the clean function from (1) and
then calculating the overlap of the resulting documents.
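
The overlap formula and the similarity function could be sketched as follows. This is a minimal sketch rather than the official solution. Two points are assumptions not stated in the task: two empty documents are given an overlap of 0.0 to avoid dividing by zero, and the example strings are hypothetical, chosen so that the occurrence counts match the dot-product example (a:1, b:2, c:1, d:1 against b:2, d:3).

```scala
// (1) Extract all words matching \w+ from a string.
def clean(s: String): List[String] = """\w+""".r.findAllIn(s).toList

// (2) Map each distinct string to the number of times it occurs.
def occurrences(xs: List[String]): Map[String, Int] =
  xs.distinct.map(x => (x, xs.count(_ == x))).toMap

// (3) Dot product of the two occurrence maps.
def prod(lst1: List[String], lst2: List[String]): Int = {
  val occ2 = occurrences(lst2)
  occurrences(lst1).toList.map { case (w, n) => n * occ2.getOrElse(w, 0) }.sum
}

// (4) overlap(d1, d2) = (d1 . d2) / max(d1 . d1, d2 . d2)
def overlap(lst1: List[String], lst2: List[String]): Double = {
  val norm = math.max(prod(lst1, lst1), prod(lst2, lst2))
  // Assumption: define the overlap of two empty documents as 0.0.
  if (norm == 0) 0.0 else prod(lst1, lst2).toDouble / norm
}

// Similarity of two strings: overlap of their cleaned word lists.
def similarity(s1: String, s2: String): Double =
  overlap(clean(s1), clean(s2))

// Hypothetical inputs: 7 / max(7, 13) = 7 / 13, roughly 0.538.
println(similarity("a b b c d", "d b d b d"))
// Identical strings overlap completely.
println(similarity("hello world", "hello world")) // 1.0
```

Dividing by the larger of the two self-products keeps the result in the range 0 to 1 for documents of different lengths.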

