Sample Midterm 2 - With Answers, Updated

pdf

School

University of Pennsylvania *

*We aren’t endorsed by this school

Course

5450

Subject

Computer Science

Date

Feb 20, 2024

Type

pdf

Pages

Uploaded by momoshen17

CIS 450/550 Database and Information Systems Sample Midterm 2 - With Answers Instructions: The exam is closed book (and closed friend/other source of advice). Electronic devices (such as cell phones) must be turned off and stored away; you cannot use the internet, except to download the exam and upload your answers to Gradescope. However you may use a “cheat sheet” prepared according to the guidelines given earlier, which should be uploaded to Gradescope. The cheat sheet is worth 3 points. Part# Topic Max. Points Your Score 1 General MC 10 2 Transactions 20 3 B+ Trees 11 4 Relational Algebra 13 5 Query Cost Estimation 16 6 MongoDB 17 7 Neo4J 10 8 Cheat Sheet 3 Total: 100

Part 1. (10 points, General MC) Select the best answer or answers for each of the following questions. 1. (2 points) Which of the following are true about indexes? a. A search key does not have to be a candidate key. b. Every sparse index is clustered. c. Indexes do not impose additional overhead for record updates. d. Every search key in the index must be the search key of some record in the data file. ANSWER: a, b Indices must be updated as the tuples in the indexed relation are updated, and therefore create overhead. It is possible for a search key to appear in the index but not the file, e.g. a B+ tree where the tuple was deleted but the index was not updated. 2. (2 points) Which of the following are true about Block Nested Loop joins, where B is the number of buffered pages? a. It is always better than the naive Simple Nested Loop join b. The cost is the same no matter which relation is used as outer c. B-2 pages should be used for the larger relation d. The cost decreases as B increases ANSWER: a, d A Simple Nested Loop join does a scan of the inner relation per tuple of the outer relation, and a Page Oriented Nested Loop join does a scan of the inner relation per page of the outer relation. Block Nested Loop (BNL) does one scan of the inner relation per block of the outer relation (where a block is (B-2) pages). Assuming that there is more than one tuple, Simple Nested Loop join is always worse than BNL. Assuming that there are more than 3 buffers, Page Oriented Nested Loop is always worse than BNL. As B increases, there are fewer “chunks” and the cost decreases. The cost of any flavor of nested loop join is better if the smaller relation is used as outer.

3. (2 points) Which of the following are more true of a relational database than of MongoDB? a. Flexible schema b. Strong consistency guarantees c. Distributed query processing d. Joins that are native to the language ANSWER: b, d RDBMS have stronger consistency guarantees and join is one of the algebraic operators. The schema is fixed, so it is not flexible. It is also harder to parallelize the operation. 4. (2 points) You develop a database application in which relation R is only accessed using equality predicates over attribute A. Which file organization is best for R? a. Heap file with B+ tree index on A. b. Sorted file, where the sort field is A. c. Hash file, with hash key A. d. None of the above. ANSWER: c 5. (2 points) Consider a graph with nodes of type Student with the following properties (firstName, LastName, ID, Degree, GPA). Which of the following queries is equivalent to MATCH (s: Student) WHERE s.GPA = 3.0 RETURN s a. MATCH result= (s: Student {GPA:3.0}) RETURN result b. MATCH (s: Student {GPA:3.0}) RETURN s c. MATCH (s: Student) WHERE s.GPA>=3.0 AND s.GPA=<3.1 RETURN s d. MATCH (s: Student) WHERE s.GPA>2.99 RETURN s ANSWER: b

Your preview ends here

Eager to read complete document? Join bartleby learn and gain access to the full version

Access to all documents
Unlimited textbook solutions
24/7 expert homework help

Part 2. (20 points, Transactions) (a) (4 points) Consider the following schedule, in which locks have been omitted: T 1 : R ( A ) , T 2 : R ( A ) , T 2 : W ( A ) , T 2 : COMMIT, T 1 : R ( B ) , T 1 : W ( B ) , T 1 : COMMIT T 2 has isolation level SERIALIZABLE. What isolation level can T 1 have for this schedule to have occurred? Explain your answer. ANSWER: T1 must either not acquire a lock on A or must release it before the end of the transaction. Since T1 is READ-WRITE, it cannot be READ UNCOMMITTED. Therefore it must be READ COMMITTED in which the lock on A can be released early. (b) (4 points) Insert lock requests and releases into the schedule above to reflect the isolation level you chose for T 1 and SERIALIZABLE for T 2. ANSWER: T1 : SLOCK(A), T1 : R(A), T1 : REL(A), T2 : XLOCK(A), T2 : R(A), T2 : W(A), T2 : REL(A), T2 : COMMIT, T1 : XLOCK(B), T1 : R(B), T1 : W(B), T1 : REL(B), T1 : COMMIT (c) (4 points) Suppose that T 1, T 2 and T 3 are READ ONLY transactions with isolation level REPEATABLE READ, each of which reads data items A , B and C . Assuming that there are no other transactions in the system, is it possible for a deadlock to arise? Explain your answer. ANSWER: Under REPEATABLE READ, each transaction would acquire a shared lock on A, B and C and hold it until the end of the transaction. However, shared locks do not block each other so deadlock is not possible without other transactions in the system. (d) (4 points) Given the following transactions, shown with lock requests and releases: T 4 : SLOCK ( A ) , R ( A ) , XLOCK ( B ) , XLOCK ( C ) , REL ( A ) , R ( B ) , R ( C ) , W ( B ) , W ( C ) , REL ( B, C ) T 5 : XLOCK ( B ) , R ( B ) , XLOCK ( C ) , W ( B ) , W ( C ) , REL ( B, C ) T 6 : XLOCK ( A ) , R ( A ) , SLOCK ( C ) , R ( C ) , REL ( C ) , W ( A ) , REL ( A ) Is there any legal schedule of T 4, T 5 and T 6 in which deadlock can arise? Explain your answer. Note that the schedule must execute actions of the transactions in the order shown. ANSWER: No, because locks on the data items are always requested in the order A, B, C and there is no circular waiting possible.

(e) (4 points) Is there an legal execution of T 4, T 5, T 6 as shown above which is not conflict serializable? Explain your answer. ANSWER: Since T 4, T 5 and T 6 are all well-formed and two-phase, by the Two-Phase Locking Theorem any legal schedule is serializable. Note that neither T 4 nor T 6 are strict, since they release some lock early. However, they are two phase since they never acquire a lock after releasing a lock. Part 3. (11 points, B+ Tree) Consider the following B+ Tree This B+ tree has order 2 meaning that each non-root internal node must have between 2 and 4 children) and height 2 (meaning that there are 2 levels of index nodes). Consider now a tree of order 3, with a maximum of 3 data entries that can be stored on a leaf block (as in the previous question). (a) (3 points) What is the minimum number of children each internal node can have? ANSWER: Each node can have a minimum of 3 children, except for the root which can have only 2 children. (b) (4 points) What is the maximum number of data entry records that can be stored in a tree of order 3 and height 2? ANSWER: There are 6 children of the root, each of which can point to 6 leaf blocks, for a total of 36 leaf blocks. Each leaf block can hold 3 data entries, for a total of 108 data entries. (c) (4 points) What is the minimum number of records that can be stored in a tree of order 3 and height 2? ANSWER: There are 2 children of the root, each of which can point to a minimum of 3 leaf blocks for a total of 6 leaf blocks. Each leaf must hold at least 2 data entries, for a total of 12 data entries.

Part 4. (13 points, Relational Algebra) Consider the following schema: Person(PName, State), Votes(PName,CName), Candidate(CName, Position) 1. (5 points) Give a translation of the following query into the relational algebra, in which the projections and selections are pushed down as far as possible towards the base relations. Keep the order of relations the same as in the from clause. select p.PName from Person p, Votes v, Candidate c where p.PName= v.PName and p.State= "PA" and v.CName=c.CName and c.Position= "President"; ANSWER: 2. (4 points) For each of the following join orders of Person, Votes and Candidate say whether or not it would be considered by the System R query optimizer. If not, briefly explain your answer (1 sentence). a. 𝑃????? ⋈ 𝑃𝑁𝑎?? 𝑉???? ( ) ⋈ 𝐶𝑁𝑎?? 𝐶𝑎??𝑖?𝑎?? b. 𝑃????? ⋈ 𝑃?𝑎?? (𝑉???? ⋈ 𝐶𝑁𝑎?? 𝐶𝑎??𝑖?𝑎??) c. 𝑃????? × 𝐶𝑎??𝑖?𝑎?? ( ) ⋈ 𝑃𝑁𝑎??, 𝐶𝑁𝑎?? { } 𝐶𝑎??𝑖?𝑎?? ANSWER: a. Yes, it is left deep. b. No, it is not left deep. c. No, there are no join attributes between Person and Candidate Not covered in F2022, ignore

Your preview ends here

Eager to read complete document? Join bartleby learn and gain access to the full version

Access to all documents
Unlimited textbook solutions
24/7 expert homework help

3. (4 points) Assume the following: a. 52 available buffers b. The Person relation fits on 5000 blocks c. The Votes relation fits on 2,000 blocks What is the optimal cost in terms of block I/Os of evaluating the join expression below using Block Nested Loop join? State which relation is the outer and which is inner. 𝑃????? ⋈ 𝑃𝑁𝑎?? 𝑉???? ANSWER: Votes is smaller, so use it as outer. The cost is 2000+(2000/50)5000= 2000+(40*5000)=202,000 Using Person as outer: 5000+(5000/50)2000= 205,000 Part 5. (16 points, Query Cost Estimation) Consider a database with a fixed page (block) size of 4,000 bytes. Suppose the size of table R is 4,000,000 bytes while the size of table S is 8,000,000 bytes. Each tuple of table R is 100 bytes while each tuple of table S is 400 bytes. (a) (2 points) How many tuples of table R can be stored in one page (block)? Calculate the same for table S . ANSWER: table R : 4,000 / 100 = 40 table S : 4,000 / 400 = 10 (b) (3 points) How many tuples of table R x S can be stored in one page (block). ANSWER: size of tuple in table R x S = size of tuple in table R + size of tuple in table S = 500 bytes. number of tuples of table R x S can be stored in one page = 4000 / 500 = 8.

( c) (3 points) How many page reads are needed to retrieve all tuples in R ? Calculate the same for table S . ANSWER: table R : 4,000,000 / 4,000 = 1 , 000 pages table S : 8,000,000 / 4,000 = 2 , 000 pages (d) (3 points) Calculate the I/0 cost (ignoring the output cost) of computing R ⋈ S using Block-Nested Loops using one bu ff er for R , one bu ff er for S , and one bu ff er for output. State at the beginning of your calculation which table you use in the outer loop. ANSWER: Use table R as the outer loop: number of I/0 = b(R) + b(R) * b(S) = 1,000 + 1,000 * 2,000 = 2,001,000 Use table S as the outer loop: number of I/0 = b(S) + b(S) * b(R) = 2,000 + 2,000 * 1,000 = 2,002,000 (e) (3 points) Calculate the I/0 cost (ignoring the output cost) of computing R ⋈ S using Block-Nested Loops join with 20 bu ff ers allocated for R and 5 bu ff ers allocated for S . State at the beginning of your calculation which table you use in the outer loop. ANSWER: Use table R as the outer loop: number of I/0 = b(R) + (b(R)/B(R)) * b(S) = 1,000 + (1,000/20) * 2,000 = 101,000 Use table S as the outer loop: number of I/0 = b(S) + (b(S)/B(S)) * b(R) = 2,000 + (2,000/5) * 1,000 = 402,000 (f) (2 points) Explain the difference between the results in (d) and (e). ANSWER: The use of additional bu ff ers reduces the number of times needed to read the inner

table blocks from disk. Part 6. (17 points, MongoDB and Map Reduce) Use the following “schema" for the below queries. Business { ‘type’: ‘business’, ‘business_id’: (encrypted business id), ‘name’: (business name), ‘neighborhoods’: [(hood names)], ‘full_address’: (localized address), ‘city’: (city), ‘state’: (state), ‘latitude’: latitude, ‘longitude’: longitude, ‘stars’: (star rating, rounded to half-stars), ‘review_count’: review count, ‘categories’: [(localized category names)] ‘open’: True / False (corresponds to closed, not business hours), ‘hours’: { (day_of_week) : { ‘open’: (HH:MM), ‘close’:(HH:MM) }, ... }, ‘attributes’: { (attribute_name): (attribute_value), ... }, } Check-in { ‘type’: ‘checkin’, ‘business_id’: (encrypted business id), ‘checkin_info’: { ‘0-0’: (number of checkins from 00:00 to 01:00 on all Sundays), ‘1-0’: (number of checkins from 01:00 to 02:00 on all Sundays), ... ‘14-4’: (number of checkins from 14:00 to 15:00 on all Thursdays), ... ‘23-6’: (number of checkins from 23:00 to 00:00 on all Saturdays) }, # if there was no checkin for a hour-day block it will not be in the dict }

Your preview ends here

Eager to read complete document? Join bartleby learn and gain access to the full version

Access to all documents
Unlimited textbook solutions
24/7 expert homework help

(a) (2 points) Print the number of businesses in the dataset. db.business.count({}) (b) (3 points) Return all businesses in California (CA) that accept credit cards. For this, you will use "attributes" in Business (an unspecified set of key-value pairs, where the value is always "true" meaning that it has the property represented by the key). db.business.find({"attributes.Accepts Credit Cards": true,state:’CA’}) (c) (3 points) Return 2 sports bars located in Arizona (AZ). db.business.find({categories:{$in: [’SportsBars’]}, state:’AZ’}).limit(2) or equally well since you have one value you are testing for: db.business.find({categories: "Sports Bars", state:’AZ’}).limit(2) (d) (4 points) Return the number of restaurants that are either in Phoenix or Scottsdale (Arizona, AZ). db.business.find( {categories: {$in:["Restaurants"]}, state: "AZ", $or: [{city: "Phoenix"}, {city: "Scottsdale"}]}).count() or equally well since you have one value you are testing for: db.business.find( {categories: "Restaurants", state: "AZ", $or: [{city: "Phoenix"}, {city: "Scottsdale"}]} ).count() (e) (5 points) Write the following query using Map Reduce: For each city, return the average, maximum and minimum star rating over all businesses that are “Good for Kids". var mapfunc = function(){ emit(this.city, {sum:this.stars, min:this.stars, max:this.stars, count:1});} var redfunc = function(key,values){ var reduceVal= values[0]; for(var i=1;i<values.length;++i){ var temp = values[i]; reduceVal.sum+=temp.sum; reduceVal.count+=temp.count; countDocuments( ) Use the aggregation pipeline db.business.aggregate([ {$match: {"attributes.GoodForKids": true} }, {$group: {_id: "$city", min_star: {$min: "$star"}, max_star: {$max: "$star"}, avg_star: {$avg: "$star"} } ])

reduceVal.min = Math.min(reduceVal.min,temp.min); reduceVal.max = Math.max(reduceVal.max,temp.max); } return reduceVal;} var finalize = function(key,reduceVal){ reduceVal.avg = reduceVal.sum/reduceVal.count; return reduceVal;} db.business.mapReduce(mapfunc,redfunc,{ out: {inline:1}, query:{"attributes.Good for Kids": true}, finalize:finalize}) Part 7. (10 points, Neo4J) Consider the following graph database: create(:User{name:"Alice"})-[:LAST_POST]-> (:Post{topic:"science"})-[:PREV_POST]->(:Post{topic:"fitness"} ) create(:User{name:"Bob"})-[:LAST_POST]->(:Post{topic:"art"}) create(:User{name:"Charles"})-[:LAST_POST]-> (:Post{topic:"science"})-[:PREV_POST]-> (:Post{topic:"art"})-[:PREV_POST]->(:Post{topic:"health"}) match(p1:User{name:"Bob"}) match(p2:User{name:"Alice"}) create(p1)-[:KNOWS]->(p2) match(p1:User{name:"Alice"}) match(p2:User{name:"Charles"}) create(p1)-[:KNOWS]->(p2) match(p1:User{name:"Bob"}) match(p2:User{name:"Charles"}) create(p1)-[:KNOWS]->(p2)

(a) (2 points) Write a query to return all recent (i.e. last) posts by Bob’s friends. MATCH (p1:User{name:"Bob"})-[:KNOWS]->(p)-[:LAST_POST]->(q) RETURN q.title (b) (4 points) Write a query to return the names of all bloggers who have ever blogged about “science” topics. MATCH (p1:User)-[:LAST_POST]->(q:Post{title:”science”}) RETURN p1.name UNION MATCH (p2:User)-[:LAST_POST]->(q2:Post)-[:PREV_POST*]->(q3{title:”science”}) RETURN p2.name (c) (4 points) Write a query to return all previous (i.e. non-last) posts from users that both Alice and Bob know (i.e. a user that only Alice or only Bob knows does not qualify). MATCH (p1:User{name:"Bob"})-[:KNOWS]->(p)<-[:KNOWS]-(p2:User{name:"Alice"}) MATCH (p)-[:LAST_POST]->(q)-[:PREV_POST*]->(q2) RETURN q2; Part 8 (3 points, Cheat Sheet) Please upload your cheatsheet here. It can be either a PDF file, an image, or a Word document.

Your preview ends here

Eager to read complete document? Join bartleby learn and gain access to the full version

Access to all documents
Unlimited textbook solutions
24/7 expert homework help

Sample Midterm 2 - With Answers, Updated

Related Documents