09 Indexes-Key (2)

docx

School

Ohio State University *

*We aren’t endorsed by this school

Course

3241

Subject

Computer Science

Date

Dec 6, 2023

Type

docx

Pages

3

Uploaded by JusticeSeahorse1284

Report
CSE 3241 In-Class Assignment – 08 Indexes Names Date For the following questions, suppose we have the following relational database schema. Assume that it takes a constant amount of time C to perform an access through a hash table. 1. For the following query: SELECT Ssn, Fname, Lname, Dname, Salary FROM EMPLOYEE JOIN DEPARTMENT ON Dno=Dnumber; Assume the EMPLOYEE table has 1 million records in it and the DEPARTMENT table has 10 thousand records in it. Assume that EMPLOYEES are unrealistically evenly distributed across departments (100 employees per department) a. If there are no indexes in this database at all, how many operations will need to be performed for the above query to return a value? Show your work – don’t just give a number. 1 million x 10,000 = 100 billion operations b. Suppose the DEPARTMENT table was indexed on Dnumber, but that was the only index. How many operations would need to be performed for the above query? Show your work – don’t just give a number. 1 million x C operations c. Suppose that the EMPLOYEE table was indexed on Dno, but that was the only index. How many operations would need to be performed for the above query? Show your work. 10,000 x C x 100 operations (1 million x C operations)
d. Which index would be better for this query to use? Would there be any benefit to using both? If so, how? If not, why not? No benefit to using both – the best we’re going to do is 1 million lookups if we need all employees. The two indexes are equivalent for this query. 2. For the following query: SELECT Fname, Lname, Pname, Salary FROM EMPLOYEE JOIN WORKS_ON ON SSN=ESSN JOIN PROJECT ON Pno=Pnumber WHERE Salary > 30000; Assume the EMPLOYEE table has a million records in it, the PROJECT table has 2 million records in it, and the WORKS_ON table has 10 million records in it. Assume that half of the employees make more than $30K in salary. Assume that EMPLOYEES are (unrealistically) distributed across PROJECTS so that each employee is working on exactly 10 PROJECTS, and each PROJECT has exactly 5 employees working on it. a. If there are no indexes in the database at all, how many operations will need to be performed for this query to return a value? Show your work. 1 million x 10 million x 2 million = 20^18 (or 2e19) b. Suppose there is a tree index on EMPLOYEE.Salary and it is the only index. How many operations will need to be performed for this query to return a value? Show your work. 500,000 x 10 million x 2 million = 10^18 (or half of what we determined in part a) c. For each of the following indexes, assume that it is the only index and determine how many operations it will take for the query to return a value. Show your work in each case: a. Employee.Ssn 2 million x 10 million x (1 x C) = 20^12 x C b. Works_on.Essn 1 million x (10 x C) x 2 million = 20^12 x C c. Works_on.Pno 1 million x (5 x C) x 2 million = 10^12 x C d. Project.Pnumber 1 million x 10 million x (1x C) = 10^12 x C d. Assume that each table will have only one index on it used by a given query. What is the best set of indexes to have on these tables for this query? Best in this case means “indexes that make the query run with the fewest operations”. Justify your answer using numbers and a decent argument for what those numbers mean. The “best” one here would be to use the tree index on Salary, the point index on Works_on.Essn, and the Project.Pnumber index. This would give us:
500K x (10 x C) x (1xC) = 5 million x C operations (If anyone comes up with fewer operations, let me know, they might be right!)
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help