Sample Midterm 2 - With Answers, Updated
pdf
keyboard_arrow_up
School
University of Pennsylvania *
*We aren’t endorsed by this school
Course
5450
Subject
Computer Science
Date
Feb 20, 2024
Type
Pages
12
Uploaded by momoshen17
CIS 450/550 Database and Information Systems
Sample Midterm 2 - With Answers
Instructions:
The exam is closed book (and closed friend/other source of advice).
Electronic
devices (such as cell phones) must be turned off and stored away; you cannot use the
internet,
except to download the exam and upload your answers to Gradescope.
However you may use a “cheat sheet” prepared according to the guidelines given
earlier, which should be uploaded to Gradescope.
The cheat sheet is worth 3 points.
Part#
Topic
Max. Points
Your Score
1
General MC
10
2
Transactions
20
3
B+ Trees
11
4
Relational Algebra
13
5
Query Cost Estimation
16
6
MongoDB
17
7
Neo4J
10
8
Cheat Sheet
3
Total:
100
Part 1. (10 points, General MC)
Select the best answer or answers for each of the following questions.
1.
(2 points)
Which of the following are true about indexes?
a.
A search key does not have to be a candidate key.
b.
Every sparse index is clustered.
c.
Indexes do not impose additional overhead for record updates.
d.
Every search key in the index must be the search key of some record in the data file.
ANSWER: a, b
Indices must be updated as the tuples in the indexed relation are updated, and therefore
create overhead.
It is possible for a search key to appear in the index but not the file, e.g. a B+ tree where
the tuple was deleted but the index was not updated.
2.
(2 points)
Which of the following are true about Block Nested Loop joins, where B is the
number of buffered pages?
a.
It is always better than the naive Simple Nested Loop join
b.
The cost is the same no matter which relation is used as outer
c.
B-2 pages should be used for the larger relation
d.
The cost decreases as B increases
ANSWER: a, d
A Simple Nested Loop join does a scan of the inner relation
per tuple
of the outer relation,
and a Page Oriented Nested Loop join does a scan of the inner relation
per page
of the
outer relation.
Block Nested Loop (BNL) does one scan of the inner relation
per block
of
the outer relation (where a block is (B-2) pages). Assuming that there is more than one
tuple, Simple Nested Loop join is always worse than BNL. Assuming that there are more
than 3 buffers, Page Oriented Nested Loop is always worse than BNL.
As B increases,
there are fewer “chunks” and the cost decreases.
The cost of any flavor of nested loop join is better if the smaller relation is used as outer.
3.
(2 points)
Which of the following are more true of a relational database than of MongoDB?
a.
Flexible schema
b.
Strong consistency guarantees
c.
Distributed query processing
d.
Joins that are native to the language
ANSWER: b, d
RDBMS have stronger consistency guarantees and join is one of the algebraic operators.
The schema is fixed, so it is not flexible. It is also harder to parallelize the operation.
4.
(2 points)
You develop a database application in which relation R is only accessed using
equality predicates over attribute A. Which file organization is best for R?
a. Heap file with B+ tree index on A.
b. Sorted file, where the sort field is A.
c. Hash file, with hash key A.
d. None of the above.
ANSWER: c
5.
(2 points)
Consider a graph with nodes of type Student with the following properties
(firstName, LastName, ID, Degree, GPA). Which of the following queries is equivalent to
MATCH (s: Student) WHERE s.GPA = 3.0 RETURN s
a. MATCH result= (s: Student {GPA:3.0}) RETURN result
b. MATCH (s: Student {GPA:3.0}) RETURN s
c. MATCH (s: Student) WHERE s.GPA>=3.0 AND s.GPA=<3.1 RETURN s
d. MATCH (s: Student) WHERE s.GPA>2.99 RETURN s
ANSWER: b
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
Part 2. (20 points, Transactions)
(a) (4 points)
Consider the following schedule, in which locks have been omitted:
T
1 :
R
(
A
)
, T
2 :
R
(
A
)
, T
2 :
W
(
A
)
, T
2 :
COMMIT, T
1 :
R
(
B
)
, T
1 :
W
(
B
)
, T
1 :
COMMIT
T
2 has isolation level SERIALIZABLE. What isolation level can
T
1 have for this schedule to
have occurred? Explain your answer.
ANSWER: T1 must either not acquire a lock on A or must release it before the end of the
transaction. Since T1 is READ-WRITE, it cannot be READ UNCOMMITTED. Therefore it must
be READ COMMITTED in which the lock on A can be released
early.
(b) (4 points)
Insert lock requests and releases into the schedule above to reflect the isolation
level you chose for
T
1 and SERIALIZABLE for
T
2.
ANSWER:
T1 : SLOCK(A), T1 : R(A), T1 : REL(A), T2 : XLOCK(A), T2 : R(A), T2 : W(A), T2 : REL(A), T2 :
COMMIT, T1 : XLOCK(B), T1 : R(B), T1 : W(B), T1 :
REL(B), T1 : COMMIT
(c) (4 points)
Suppose that
T
1,
T
2 and
T
3 are READ ONLY transactions with isolation level
REPEATABLE READ, each of which reads data items
A
,
B
and
C
. Assuming that there are no
other transactions in the system, is it possible for a deadlock to arise? Explain your answer.
ANSWER: Under REPEATABLE READ, each transaction would acquire a shared lock on A, B
and C and hold it until the end of the transaction. However, shared locks do not block each other
so deadlock is not possible without other transactions in the system.
(d) (4 points)
Given the following transactions, shown with lock requests and releases:
T
4 :
SLOCK
(
A
)
, R
(
A
)
, XLOCK
(
B
)
, XLOCK
(
C
)
, REL
(
A
)
, R
(
B
)
, R
(
C
)
, W
(
B
)
, W
(
C
)
, REL
(
B, C
)
T
5 :
XLOCK
(
B
)
, R
(
B
)
, XLOCK
(
C
)
, W
(
B
)
, W
(
C
)
, REL
(
B, C
)
T
6 :
XLOCK
(
A
)
, R
(
A
)
, SLOCK
(
C
)
, R
(
C
)
, REL
(
C
)
, W
(
A
)
, REL
(
A
)
Is there any legal schedule of
T
4,
T
5 and
T
6 in which deadlock can arise? Explain your answer.
Note that the schedule must execute actions of the transactions in the order shown.
ANSWER: No, because locks on the data items are always requested in the order A, B, C and
there is no circular waiting possible.
(e) (4 points)
Is there an legal execution of
T
4,
T
5,
T
6 as shown above which is not conflict
serializable? Explain your answer.
ANSWER: Since
T
4,
T
5 and
T
6 are all well-formed and two-phase, by the Two-Phase Locking
Theorem any legal schedule is serializable. Note that neither
T
4 nor
T
6 are strict, since they
release some lock early. However, they are two phase since they never acquire a lock after
releasing a lock.
Part 3. (11 points, B+ Tree)
Consider the following B+ Tree
This B+ tree has
order
2 meaning that each non-root internal node must have between 2 and 4
children) and
height
2 (meaning that there are 2 levels of index nodes). Consider now a tree of
order 3, with a maximum of 3 data entries that can be stored on a leaf block (as in the previous
question).
(a) (3 points)
What is the
minimum
number of children each internal node can have?
ANSWER: Each node can have a minimum of 3 children, except for the root which can have
only 2 children.
(b) (4 points)
What is the maximum number of data entry records that can be stored in a
tree of order 3 and height 2?
ANSWER: There are 6 children of the root, each of which can point to 6 leaf blocks, for a
total of 36 leaf blocks. Each leaf block can hold 3 data entries, for a total of 108 data entries.
(c) (4 points)
What is the minimum number of records that can be stored in a tree of order 3
and height 2?
ANSWER: There are 2 children of the root, each of which can point to a minimum of 3 leaf
blocks for a total of 6 leaf blocks. Each leaf must hold at least 2 data entries, for a total of 12
data entries.
Part 4. (13 points, Relational Algebra)
Consider the following schema:
Person(PName, State), Votes(PName,CName), Candidate(CName, Position)
1.
(5 points)
Give a translation of the following query into the relational algebra, in which the
projections and selections are pushed down as far as possible towards the base relations.
Keep the order of relations the same as in the from clause.
select p.PName
from Person p, Votes v, Candidate c
where p.PName= v.PName and p.State= "PA"
and v.CName=c.CName and c.Position= "President";
ANSWER:
2.
(4 points)
For each of the following join orders of Person, Votes and Candidate say
whether or not it would be considered by the System R query optimizer.
If not, briefly
explain your answer (1 sentence).
a.
𝑃????? ⋈
𝑃𝑁𝑎??
𝑉????
(
)
⋈
𝐶𝑁𝑎??
𝐶𝑎??𝑖?𝑎??
b.
𝑃????? ⋈
𝑃?𝑎??
(𝑉???? ⋈
𝐶𝑁𝑎??
𝐶𝑎??𝑖?𝑎??)
c.
𝑃????? × 𝐶𝑎??𝑖?𝑎??
(
) ⋈
𝑃𝑁𝑎??, 𝐶𝑁𝑎??
{
}
𝐶𝑎??𝑖?𝑎??
ANSWER:
a.
Yes, it is left deep.
b.
No, it is not left deep.
c.
No, there are no join attributes between Person and Candidate
Not covered in F2022, ignore
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
3.
(4 points)
Assume the following:
a.
52 available buffers
b.
The Person relation fits on 5000 blocks
c.
The Votes relation fits on 2,000 blocks
What is the optimal cost in terms of block I/Os of evaluating the join expression below
using Block Nested Loop join? State which relation is the outer and which is inner.
𝑃????? ⋈
𝑃𝑁𝑎??
𝑉????
ANSWER:
Votes is smaller, so use it as outer. The cost is 2000+(2000/50)5000=
2000+(40*5000)=202,000
Using Person as outer: 5000+(5000/50)2000= 205,000
Part 5. (16 points, Query Cost Estimation)
Consider a database with a fixed page (block) size of 4,000 bytes. Suppose the size of
table
R
is 4,000,000 bytes while the size of table
S
is 8,000,000 bytes. Each tuple of
table
R
is 100 bytes while each tuple of table
S
is 400 bytes.
(a) (2 points)
How many tuples of table
R
can be stored in one page (block)?
Calculate the same for table
S
.
ANSWER:
table
R
: 4,000 / 100 = 40
table
S
: 4,000 / 400 = 10
(b) (3 points)
How many tuples of table
R
x
S
can be stored in one page (block).
ANSWER:
size of tuple in table
R
x
S
= size of tuple in table
R
+ size of tuple in table
S
= 500
bytes. number of tuples of table
R
x
S
can be stored in one page = 4000 / 500 = 8.
(
c) (3 points)
How many page reads are needed to retrieve all tuples in
R
?
Calculate the same for table
S
.
ANSWER:
table
R
: 4,000,000 / 4,000 = 1
,
000
pages
table
S
: 8,000,000 / 4,000 = 2
,
000
pages
(d) (3 points)
Calculate the I/0 cost (ignoring the output cost) of computing
R
⋈
S
using
Block-Nested Loops using one bu
ff
er for
R
, one bu
ff
er for
S
, and one bu
ff
er for
output. State at the beginning of your calculation which table you use in the outer
loop.
ANSWER:
Use table
R
as the outer loop:
number of I/0 = b(R) + b(R) * b(S) = 1,000 + 1,000 * 2,000 = 2,001,000
Use table
S
as the outer loop:
number of I/0 = b(S) + b(S) * b(R) = 2,000 + 2,000 * 1,000 = 2,002,000
(e) (3 points)
Calculate the I/0 cost (ignoring the output cost) of computing
R
⋈
S
using
Block-Nested Loops join with 20 bu
ff
ers allocated for
R
and 5 bu
ff
ers allocated for
S
.
State at the beginning of your calculation which table you use in the outer loop.
ANSWER:
Use table
R
as the outer loop:
number of I/0 = b(R) + (b(R)/B(R)) * b(S) = 1,000 + (1,000/20) * 2,000 =
101,000 Use table
S
as the outer loop:
number of I/0 = b(S) + (b(S)/B(S)) * b(R) = 2,000 + (2,000/5) * 1,000 = 402,000
(f) (2 points)
Explain the difference between the results in (d) and (e).
ANSWER:
The use of additional bu
ff
ers reduces the number of times needed to read the inner
table blocks from disk.
Part 6. (17 points, MongoDB and Map Reduce)
Use the following “schema" for the below queries.
Business
{
‘type’: ‘business’,
‘business_id’: (encrypted business id),
‘name’: (business name),
‘neighborhoods’: [(hood names)],
‘full_address’: (localized address),
‘city’: (city),
‘state’: (state),
‘latitude’: latitude,
‘longitude’: longitude,
‘stars’: (star rating, rounded to half-stars),
‘review_count’: review count,
‘categories’: [(localized category names)]
‘open’: True / False (corresponds to closed, not business hours),
‘hours’: {
(day_of_week) : {
‘open’: (HH:MM),
‘close’:(HH:MM) },
...
},
‘attributes’: {
(attribute_name): (attribute_value),
...
},
}
Check-in
{
‘type’: ‘checkin’,
‘business_id’: (encrypted business id),
‘checkin_info’: {
‘0-0’: (number of checkins from 00:00 to 01:00 on all Sundays),
‘1-0’: (number of checkins from 01:00 to 02:00 on all Sundays), ...
‘14-4’: (number of checkins from 14:00 to 15:00 on all Thursdays), ...
‘23-6’: (number of checkins from 23:00 to 00:00 on all Saturdays)
}, # if there was no checkin for a hour-day block it will not be in the dict
}
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
(a) (2 points)
Print the number of businesses in the dataset.
db.business.count({})
(b) (3 points)
Return all businesses in California (CA) that accept credit cards. For this, you
will use
"attributes" in Business (an unspecified set of key-value pairs, where the value
is always "true" meaning that it has the property represented by the key).
db.business.find({"attributes.Accepts Credit Cards": true,state:’CA’})
(c) (3 points)
Return 2 sports bars located in Arizona (AZ).
db.business.find({categories:{$in: [’SportsBars’]}, state:’AZ’}).limit(2)
or equally well since you have one value you are testing for:
db.business.find({categories: "Sports Bars", state:’AZ’}).limit(2)
(d) (4 points)
Return the number of restaurants that are either in Phoenix or Scottsdale
(Arizona, AZ).
db.business.find( {categories: {$in:["Restaurants"]}, state: "AZ",
$or: [{city: "Phoenix"}, {city: "Scottsdale"}]}).count()
or equally well since you have one value you are testing for:
db.business.find( {categories: "Restaurants", state: "AZ",
$or: [{city: "Phoenix"}, {city: "Scottsdale"}]} ).count()
(e) (5 points)
Write the following query using Map Reduce: For each city, return the average,
maximum and minimum star rating over all businesses that are “Good for Kids".
var mapfunc = function(){
emit(this.city, {sum:this.stars, min:this.stars,
max:this.stars, count:1});}
var redfunc = function(key,values){
var reduceVal= values[0];
for(var i=1;i<values.length;++i){
var temp = values[i];
reduceVal.sum+=temp.sum;
reduceVal.count+=temp.count;
countDocuments( )
Use the aggregation pipeline
db.business.aggregate([
{$match: {"attributes.GoodForKids": true} },
{$group: {_id: "$city",
min_star: {$min: "$star"},
max_star: {$max: "$star"},
avg_star: {$avg: "$star"} }
])
reduceVal.min = Math.min(reduceVal.min,temp.min);
reduceVal.max = Math.max(reduceVal.max,temp.max);
}
return reduceVal;}
var finalize = function(key,reduceVal){
reduceVal.avg = reduceVal.sum/reduceVal.count;
return reduceVal;}
db.business.mapReduce(mapfunc,redfunc,{
out: {inline:1},
query:{"attributes.Good for Kids": true}, finalize:finalize})
Part 7. (10 points, Neo4J)
Consider the following graph database:
create(:User{name:"Alice"})-[:LAST_POST]->
(:Post{topic:"science"})-[:PREV_POST]->(:Post{topic:"fitness"}
) create(:User{name:"Bob"})-[:LAST_POST]->(:Post{topic:"art"})
create(:User{name:"Charles"})-[:LAST_POST]->
(:Post{topic:"science"})-[:PREV_POST]->
(:Post{topic:"art"})-[:PREV_POST]->(:Post{topic:"health"})
match(p1:User{name:"Bob"})
match(p2:User{name:"Alice"})
create(p1)-[:KNOWS]->(p2)
match(p1:User{name:"Alice"})
match(p2:User{name:"Charles"})
create(p1)-[:KNOWS]->(p2)
match(p1:User{name:"Bob"})
match(p2:User{name:"Charles"})
create(p1)-[:KNOWS]->(p2)
(a) (2 points)
Write a query to return all recent (i.e. last) posts by Bob’s friends.
MATCH (p1:User{name:"Bob"})-[:KNOWS]->(p)-[:LAST_POST]->(q)
RETURN q.title
(b) (4 points)
Write a query to return the names of all bloggers who have ever blogged about
“science” topics.
MATCH (p1:User)-[:LAST_POST]->(q:Post{title:”science”})
RETURN p1.name
UNION
MATCH (p2:User)-[:LAST_POST]->(q2:Post)-[:PREV_POST*]->(q3{title:”science”})
RETURN p2.name
(c) (4 points)
Write a query to return all previous (i.e. non-last) posts from users that
both
Alice
and Bob know (i.e. a user that only Alice or only Bob knows does not qualify).
MATCH (p1:User{name:"Bob"})-[:KNOWS]->(p)<-[:KNOWS]-(p2:User{name:"Alice"})
MATCH (p)-[:LAST_POST]->(q)-[:PREV_POST*]->(q2)
RETURN q2;
Part 8 (3 points, Cheat Sheet)
Please upload your cheatsheet here. It can be either a PDF file, an image, or a Word document.
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
Related Documents
Recommended textbooks for you
data:image/s3,"s3://crabby-images/450c0/450c059fcf295507a821eaa4ed2d99dbde539a42" alt="Text book image"
Enhanced Discovering Computers 2017 (Shelly Cashm...
Computer Science
ISBN:9781305657458
Author:Misty E. Vermaat, Susan L. Sebok, Steven M. Freund, Mark Frydenberg, Jennifer T. Campbell
Publisher:Cengage Learning
Microsoft Windows 10 Comprehensive 2019
Computer Science
ISBN:9780357392607
Author:FREUND
Publisher:Cengage
Np Ms Office 365/Excel 2016 I Ntermed
Computer Science
ISBN:9781337508841
Author:Carey
Publisher:Cengage
data:image/s3,"s3://crabby-images/d875a/d875ad727e1dd57e27bb26bd93706ed7d02d4918" alt="Text book image"
Recommended textbooks for you
- Enhanced Discovering Computers 2017 (Shelly Cashm...Computer ScienceISBN:9781305657458Author:Misty E. Vermaat, Susan L. Sebok, Steven M. Freund, Mark Frydenberg, Jennifer T. CampbellPublisher:Cengage LearningMicrosoft Windows 10 Comprehensive 2019Computer ScienceISBN:9780357392607Author:FREUNDPublisher:CengageNp Ms Office 365/Excel 2016 I NtermedComputer ScienceISBN:9781337508841Author:CareyPublisher:Cengage
data:image/s3,"s3://crabby-images/450c0/450c059fcf295507a821eaa4ed2d99dbde539a42" alt="Text book image"
Enhanced Discovering Computers 2017 (Shelly Cashm...
Computer Science
ISBN:9781305657458
Author:Misty E. Vermaat, Susan L. Sebok, Steven M. Freund, Mark Frydenberg, Jennifer T. Campbell
Publisher:Cengage Learning
Microsoft Windows 10 Comprehensive 2019
Computer Science
ISBN:9780357392607
Author:FREUND
Publisher:Cengage
Np Ms Office 365/Excel 2016 I Ntermed
Computer Science
ISBN:9781337508841
Author:Carey
Publisher:Cengage
data:image/s3,"s3://crabby-images/d875a/d875ad727e1dd57e27bb26bd93706ed7d02d4918" alt="Text book image"