homework 1(DBMS)

docx

School

California State University, East Bay *

*We aren’t endorsed by this school

Course

610

Subject

Statistics

Date

Feb 20, 2024

Type

docx

Pages

Uploaded by DrWorld13108

BAN 610 Problem set 1 – Normalization, ERD, Consistency, and Recovery Edit your submission in this word document, attaching the screenshots of the codes used for each question. Include narrative descriptions, outputs screenshot, or short answers when requested.

Task 1 Suppose we are processing 100 data records on a CPU cache. Each data record is 32 bytes in size. And each I/O operation will fetch a 64-byte cache line. Suppose the 100 data records on the cache are contiguously stored. In this case, is sequential reading faster than random reading? How much faster? What if each data record is 64 bytes in size instead of 32 bytes? Sequential reading is faster than random reading. As when we read sequentially, we will utilize each cache line but, random reading will incur more cache misses as it is not efficiently using the cache lines. So, while reading 64 bytes for every 32 bytes we need, we expect to max out the throughput at least 2x, it will fetch includes two consecutive data records at once. If each data record is 64 bytes in size instead of 32 bytes, sequential and random reading would perform similarly in terms of cache utilization. The main difference would be related to the order in which data records are accessed.

Task 2 Suppose you are hosting a database server. A table in your database is accessed 20 times / sec on average. The size of the table is 1 GB. The hard drive allocated to store this table cost $10, and the throughput of the hard drive is 100 MB / sec. The memory of your server cost $20 / MB. Should you keep the table on the hard disk or in memory? Hint: calculate or identify what is the D, I, X, M, and P in the five-minutes rule formula. Cdisk = D/(IX) D = $10 I = 20 (database is accessed 20 times/sec meaning it can do 20 operations per second) X = 1/20 (database is accessed 20 times/sec meaning it’s getting accessed in every 1/20 seconds) Cdisk =10/ (20 * 1/20) = $10 Cmem = M/P M = $20/MB P = 1 (assuming it holds 1 page) Cmem = 20/1 = $20 As, Cmem > Cdisk we should not keep the table in memory. As the cost of storing the table in memory is more than the cost of storing it on the hard disk, it will be more cost-effective to store the table on the hard disk. Task 3 We are looking to optimize our database performance by delaying the output of frequently used objects. In the following chart, the object X is frequently accessed by many transactions, and we must repeatedly read X from the disk to memory, update the value of X in memory, and then flush the updated value of X to the disk.

Your preview ends here

Eager to read complete document? Join bartleby learn and gain access to the full version

Access to all documents
Unlimited textbook solutions
24/7 expert homework help

As you can imagine, this is very inefficient because it results in a lot of I/Os, which takes a long time. Please design a better workflow to reduce the number of I/Os for frequently accessed values (considering using pseudocodes). Hint: think about pre-scanning the transactions before executing it, and based on that, optimize the transactions . A better workflow to access object X is storing x into memory (cache). This way we can reduce the number of I/O operations.  When object X is accessed by transaction, the database checks if it is in the cache.  If X is present in the cache, it will directly return to the transaction, remove the need for a disk I/O and if X is not present in the cache, database recovers it from disk and keeps it in the cache before returning it to transaction.  When transaction updates the value of X, the database updates the cache and marks X as dirty, this indicates that block has modified but has not saved, it needs to be flushed to disk at some point.  The database regularly flushes dirty values from the cache to disk, this reduces the number of unnecessary I/O operations. Task 4 We have a table below:

Note that branch No is the primary key of this table. Please answer the following questions: 1. What is the normal form of the table, why? To determine the normal form, we need to check the table to see that the intersection of every column and record contains only one value or atomic value (1NF), and transitive dependencies and partial dependencies (2NF, 3NF). This the table is not in normal form. separating the table into two parts branch and branch telephone. 2. Normalize the table to 3NF. 3. Identify the primary keys and foreign keys in the 3NF relations. The primary key in the branch table is branch no. In the branch address table branch no. primary key in the tables. Task 5 We have a table below:

Note that staffNo and branchNo are the composite primary key of this table. Also note that each member of staff works in each branch for a different hoursPerWeek. 1.What is the normal form of the table? Why? To determine the normal form, we need to check the table to see that the intersection of every column and record contains only one value or atomic value (1NF) the table is in 1NF, but there is transitive dependencies and partial dependencies. This the table is not in (2NF, 3NF). To normalize this table we need to make three separate tables. 2.Normalize this table to 3NF and mark the primary keys and foreign keys.

Your preview ends here

Eager to read complete document? Join bartleby learn and gain access to the full version

Access to all documents
Unlimited textbook solutions
24/7 expert homework help

Task 6 We have a table below (telNo is the branch telephone number): Please answer the following questions: 1. What is the normal form of the table? Why To determine the normal form, we need to check the table to see that the intersection of every column and record contains only one value or atomic value (1NF) the table is in 1NF, partial dependencies 2NF. This the table is not in (3NF) in which all non-primary-key column can be worked out from only the primary key column(s) and no other columns. 2. Normalize this table to 3NF and mark the primary keys and foreign keys

Task 7 Please normalize the in the book loan table in Excel file, and then screenshot the output. In this book loan system. Each unique Book_Title has a unique Book_ISBN, but many different Book_Call_No (a title in the library usually has multiple copies). It is possible for a student to borrow multiple book copies (identified by Book_Call_No) at one time or borrow the same book copies across different times. This normalization eliminates data redundancy and allows for efficient storage and management of student and book-related information.

Task 8 Please draw ER diagram and convert it into relational schema: 1. Each supermarket branch restocks products with multiple suppliers. Each supplier supplies products to multiple supermarket branch. The supermarket branch has attributes BID, and Location; each supplier has attribute SID, and location. 2. Each supermarket branch hires many inventory managers to manage the product restocking. One inventory manager can only be hired by one supermarket branch. Each inventory manager is responsible for coordinating with only one supplier to restock the products. However, each supplier may work with multiple inventory managers at the same time. The supermarket branch has attributes BID, and Location; each inventory manager has attribute MID and name; each supplier has attribute SID, and location. Many to many relationships: the supermarket branch restocks products with multiple suppliers. Each supplier supplies products to multiple supermarket branchs.

Your preview ends here

Eager to read complete document? Join bartleby learn and gain access to the full version

Access to all documents
Unlimited textbook solutions
24/7 expert homework help

Task 9 We store 8 same-sized data blocks (A1, A2, … A8) in three different storage architectures: (1) In a single disk A1 A2 A3 A4 A5 A6 A7 A8 (2) In RAID 0 using two disks A1 A3 A5 A7 A2 A4 A6 A8 (3) In RAID 1 using two disks A1 A2 A3 A4 A5 A6 A7 A8 A1 A2 A3 A4 A5 A6 A7 A8

Suppose reading one block from the disk to the memory takes 1ms, what is the time needed to read A1, A2, A3, and A4 to the memory in each of the three-storage architecture? Ignore seeking time rotary latency. In a single disk - it would take 4ms to read A1, A2, A3, and A4. In RAID 0 - it would take 2ms to read A1, A2, A3, and A 4 simultaneously. In RAID 1- it would take 2ms to read A1, A2, A3, and A 4 because we read from one of the mirrored disks. Task 10: We are using undo/redo logging to recover the database after a system crash. The log on the disk looks like the following: ... <T1, A, 10, 15> <T1, end> <checkpoint> <T2, start> <T2, A, 39, 10> <T3, start> <T3, B, 20,14> <T3, commit> <T2, C, 50, 20> <T4, start> <T3, end> <T4, D, 12, 13> <T2, commit> Crash How to recover the database (do you redo, undo, or ignore T1, T2, T3, and T4)? and what are the values of A, B, C, and D before and after the recovery (if the value is unsure, type “unknown”)? Please fill the two tables below. Transaction Recover Plan (redo, undo, or ignore) T1 ignore T2 Redo