final-S20

docx

School

New York University *

*We aren’t endorsed by this school

Course

6083

Subject

Computer Science

Date

Jan 9, 2024

Type

docx

Pages

9

Uploaded by BaronManateeMaster979

Report
Q3 Problem 1 17 Points Suppose we have an database modeling users and tweets in Twitter, consisting of the following four relations: User( uid , uname, uemail, ucity, country); Tweet( tid , uid, ttext, ttime, date); Follows( uid , followeruid}); Retweet( uid , tid , rttime, rdate); A tweet is identified by a tid. Each tweet is made by a particular user at a certain time and date, and consists of a text of up to 140 characters (varchar(140)). Users can be followed by other users that will then receive the first user’s tweets. Users may also retweet (basically forward) tweets made by other users. This is, of course, a simplified schema, so we do not model issues such as blocking or hashtags. Q3.1 Problem 1 (a) 3 Points List all the foreign keys in the above schema. Q3.2 Problem 1 (b) 2 Points In the above schema, can one user retweet the same tweet several times? Why or why not? Q3.3 Problem 1 (c) 8 Points Write SQL statements for the following queries: (i) For each user, list her uid and name, the number of tweets she has sent, and the number of followers she has. (ii) For each tweet, list the tid, the uid of the user that wrote it, and the number of distinct users who received the tweet. (Users receive a tweet if they either follow the user who wrote it, or follow another user that retweeted it.)
(iii) For each country, list the city with the most tweets sent by people in that city. (iv) List the uid and name of any user who has at least 1000 followers, but where less than ten of the followers live in the United States. You can upload file(s) here for Problem 1(c) if you prefer to upload screenshot/scan of handwriting instead of the text box. No files uploaded Q3.4 Problem 1 (d) 4 Points Write statements in Relational Algebra for the first and second query. (i) For each user, list her uid and name, the number of tweets she has sent, and the number of followers she has. (ii) For each tweet, list the tid, the uid of the user that wrote it, and the number of distinct users who received the tweet. (Users receive a tweet if they either follow the user who wrote it, or follow another user that retweeted it.) You can upload file(s) here for Problem 1(d ) if you prefer to upload screenshot/scan of handwriting instead of text box. No files uploaded Q4 Problem 2 14 Points Suppose you have a database modeling an online auction site somewhat similar to eBay, given by the following schema: USER ( uid , uname, city, state) ITEMTYPE ( iid , itemname, description) AUCTION ( seller, iid, starttime , endtime, minbid, condition) Note: seller references uid in USER, and iid references iid in ITEMTYPE BID ( bidder, seller, iid, starttime, bidtime , bidprice) Note: bidder references uid in USER, and (seller, iid, starttime) references AUCTION In this schema, users can sell items at auction, and can also place bids on auctions by other users. An auction has a start time, when bidding begins, and an end time (both of type datetime). The auction is won by the highest
bid made before the end time, provided the bid is higher than the minimum bid. The description of the item that is being sold in the auction is kept in a separate table ITEMTYPE, because several items of the same type (say, a certain book or model of sneakers) may be sold in different auctions by the same or different users, and this way we avoid storing the same information repeatedly. (Note that the condition of the item, such as new , like new , slightly damaged etc., is kept in the AUCTION table since the same type of item may come in different conditions.) Finally, a bid by a user is for a particular auction, and contains the time of the bid and the bid price (which should be higher than the minimum bid and all previous bids for this item, to have a chance to be successful, but this is not enforced by the schema). SELECT seller FROM AUCTION WHERE minbid > 100 AND condition = "NEW" SELECT U.uid, U.uname FROM USER U, AUCTION A JOIN BID B WHERE U.uid = AUCTION.seller AND B.bidprice > 10000 AND U.state = TX In the following, assume 100 million users, 20 million itemtypes, 100 million auctions, and 1 billion bids. 10% of auctions have minbid > 100, 20% of auctions are for new items, 2% of users are in Texas, and 0.01% of bid prices are > 10000. Assume independence between attributes otherwise. For simplicity, assume that all records are of size 100 bytes, and any IDs and timestamps are 16 bytes. Any index entries are of size 20 bytes, and any index structures have a node size of 4 KB and a height of 4 (the root, the leaf level, and two levels in between). There are 200 MB of main memory available for processing the query, and you are given a disk with 100 MB/s transfer rate and 10ms latency (seek time plus average rotational latency). Q4.1 Problem 2 (a) 6 Points Consider the first query, and estimate its running time under the scan (latency transfer-rate) model of disk performance, for the following three cases: (i) There are no index structures available. (ii) There is a dense clustered B+-tree index on the condition attribute in the Auction table that is used for the selection.
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
(iii) There is an unclustered B+-tree index on the minbid attribute in the Auction table that is used for the selection. You can upload file(s) here for Problem 2(a) if you prefer to upload screenshot/scan of handwriting instead of text box. No files uploaded Q4.2 Problem 2 (b) 8 Points Now consider the second query. (i) Draw an optimized query evaluation plan for this query, assuming no index structures are available. For each join, identify the join algorithm that is being used by the plan. No files uploaded (ii) Estimate the running time of the query evaluation plan under the latency transfer-rate model of disk performance. (iii) Suppose you can create two index structure to make this query faster. Which index structures would you choose and why? How would this change the query evaluation plan, in terms of which methods are used for selection and for joins? No files uploaded You can upload file(s) here for Problem 2(b) if you prefer to upload screenshot/scan of handwriting instead of text box. No files uploaded Q5 Problem 3 12 Points In each of the following questions, choose the correct answer for each statement. Q5.1 Problem 3 (a) 3 Points • Index-based joins are not efficient when both relations are very large. TrueFalse • It is always fastest to use an index-based join when an index on the join attribute is available. TrueFalse • Sort-based joins are the most commonly executed joins in database systems. TrueFalse
• Blocked Nested-Loop joins require an index structure on the join attribute. TrueFalse • Clustered and unclustered B+-trees result in the same running time when used for index-based joins. TrueFalse Q5.2 Problem 3 (b) 3 Points Which of the following statements about concurrency control and serializability are true, and which are false? • Every view-serializable schedule is also conflict serializable. TrueFalse • Every view-serializable schedule can be generated by a 2PL scheme. TrueFalse • Every schedule generated by 2PL is conflict-serializable. TrueFalse • Every conflict-serializable schedule is serializable. TrueFalse • 2PL avoids cascading aborts. TrueFalse Q5.3 Problem 3 (c) 3 Points Which of the following statements about database design theory are true, and which are false? • Every schema can be transformed into 3NF. TrueFalse • Every schema can be transformed into BCNF, but the resulting schema may sometimes not be dependency-preserving. TrueFalse • Given a functional dependency that violates BCNF, it is sometimes impossible to find a lossless join decomposition. TrueFalse
• Given a set of functional dependencies F, it is always possible to find a canonical cover of F. TrueFalse • Any schema that is in BCNF is also in 3NF. TrueFalse Q5.4 Problem 3 (d) 3 Points Which of the following statements about index structures are true, and which are false? • Hash indexes are usually good for inequality (range) selections. TrueFalse • Clustered B + -trees are usually better than unclustered B + -trees for inequality selections. TrueFalse • Using an index for an equality selection is always better than scanning the table. TrueFalse • For equality selections, clustered and unclustered B + -trees always have the same performance. TrueFalse • Composite indexes combine ideas from hash indexes and B + -trees. TrueFalse Q6 Problem 4 10 Points Consider the two schedules shown below. Schedule 1 Schedule 2
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
Q6.1 Problem 4 (a) 3 Points For each schedule, draw the conflict graphs! (i) Schedule 1 Type your answer in the box and/or upload an image for this question No files uploaded (ii) Schedule 2 Type your answer in the box and/or upload an image for this question No files uploaded Q6.2 Problem 4 (b) 3 Points For each schedule, decide if it is conflict-serializable. (i) Schedule 1 (ii) Schedule 2 You can upload file(s) here for Problem 4(b) if you prefer to upload screenshot/scan of handwriting instead of text box. No files uploaded Q6.3 Problem 4 (c) 4 Points Are the schedules guaranteed to avoid cascading aborts? Are the schedules recoverable? Explain! (i) Schedule 1 (ii) Schedule 2 You can upload file(s) here for Problem 4(c) if you prefer to upload screenshot/scan of handwriting instead of text box. No files uploaded Q7 Problem 5
6 Points The Video Hut chain of home electronics stores maintains a very simple database of sales data that keeps a record of every item that was sold to a customer in one of the stores. The database consists of two relations SALES and STORES, which currently contain the following fields and records: Q7.1 Problem 5 (a) 2 Points Consider the following SQL query: SELECT S.customer, sum(S.price) FROM SALES S, STORES ST WHERE S.storenumber = ST.storenumber AND ST.state = Texas GROUP BY S.customer HAVING sum(S.price) > 200 Explain in one sentence what this query computes. Q7.2 Problem 5 (b) 4 Points Now consider the following four SQL statements, executed consecutively on the above relations. Give the result after the second and after the last query. Also, describe in one sentence what each of the queries does. (You can assume that all of these statements are correct SQL statements - no tricks.)
CREATE VIEW X AS SELECT S.storenum, S.product, avg(S.price) AS avgprice FROM SALES S GROUP BY S.storenum, S.product SELECT * FROM X WHERE X.product = ’TV’ INSERT INTO SALES VALUES (“Bala Murthy”, “TV”, 349, 1082) SELECT * FROM X WHERE X.storenum = 1082 You can upload file(s) here for Problem 5(b) if you prefer to upload screenshot/scan of handwriting instead of text box. No files uploaded
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help