Final_Exam_FALL_2023_6083B

docx

School

New York University *

*We aren’t endorsed by this school

Course

6083

Subject

Computer Science

Date

Jan 9, 2024

Type

docx

Pages

13

Uploaded by DrDugong3665

Report
Final Exam CS-GY 6083 - B, FALL-2023. Principles of Database Systems. 16th Dec. 2023
FINAL EXAM [100 points with 20% weight] Saturday, 12/16/2023 11:00 AM to 1:30 PM EST (2 ½ HRS. ) Please read the instructions carefully before writing the exam Write your name, student ID, and net ID below. Last Name: Lei First Name: Zhao Net ID: lz2956 Student ID: N16575091 THIS IS AN ONLINE – OPEN BOOK EXAM. PLEASE LOG IN TO THE ZOOM MEETING USING YOUR NET ID (DO NOT LOG IN WITH YOUR PERSONAL EMAIL ACCOUNT). Find the Zoom meeting details under the ZOOM top bar menu of the course website with the title “Final Exam”. Join the Zoom meeting at least 15 minutes before the exam time. Write your answers under each question in this Word document and submit it on or before 1:30 pm to course site > assignments > Final Exam . Save and submit the exam submission document in format <your net id>_final_fall_2023_6083b.pdf. Unlimited submission is allowed before the deadline. The latest submission will be considered for grading . Please mute your microphone during the entire exam duration and keep video on since it is an online exam. This exam has 5 sections A, B, C, D, and E. There are no negative points for any wrong answers. If you have any questions during the exam, please send your question privately to the course assistant on zoom meeting chat window. Do not speak in the microphone. Please keep saving your exam document frequently. Please keep ready the practice schema (that we used for the SQL tutorial) with all tables with your initial. GOOD LUCK! 1 | P a g e
A) Answer the following questions briefly [10 points] i) Explain the difference between Data, Metadata, and Information with an example of each. Data: This refers to raw, unprocessed facts and figures. Data can be quantitative or qualitative and does not carry any specific meaning on its own. It's the raw material that can be analyzed or processed to extract information. Example of Data: A series of numbers like 34, 22, 15, 78, 56. Without context, these numbers don't tell us much. Metadata: This is data about data. Metadata provides context or additional information about the primary data, making it easier to understand and use effectively. It can describe various aspects of the data, like when it was created, by whom, what format it's in, and other characteristics. Example of Metadata: If we consider a digital photograph as our primary data, the metadata would include information such as the date and time the photo was taken, the type of camera used, the file format (e.g., JPEG), and the image resolution. Information: This is what we obtain when data is processed, organized, or presented in a given context to make it meaningful. Information is data that has been interpreted and understood by the recipient in some meaningful way. Example of Information: If the series of numbers (34, 22, 15, 78, 56) represents the ages of a group of employees in a department, when processed and analyzed, we might extract information such as the average age of employees in that department. ii) Explain the difference between Delete, Truncate, and Drop SQL statements. Delete: Removes specific rows from a table. Can use a WHERE clause for targeted deletion. It's slower as it logs each row deletion and is transaction safe (can be rolled back). Truncate: Quickly deletes all rows from a table, without logging individual row deletions. It's faster but not selective (can't use WHERE clause). Resets identity columns and is usually transaction safe but can't roll back individual deletions.
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
Drop: Completely removes the entire table and its schema from the database. It's irreversible and not transaction safe. Once dropped, the table and its data are permanently deleted.
B) For the HRD database in the relational model above, consider the following transaction and answer the question [10 points] 1. CONNECT apatel/Fa1l2021@HRD 2. UPDATE AP_EMP SET COMM= COMM + SAL*0.12 WHERE DEPTNO=20; 3. DELETE * FROM AP_DEPT WEHRE DEPTNO=40; 4. ALTER TABLE AP_DEPT MODIFY (DNAME VARCHAR2 (40) ); 5. ROLLBACK; 6. CREATE INDEX index_elname ON ap_emp(substr(elname,-2,2)); 7. COMMIT; 8. ROLLBACK; i) For the above set of database activities, list all transactions with transaction numbers e.g., TX1, TX2, etc., along with which activity number each transaction starts and when it ends. [5 points] Transaction Name Started at Activity Number Ended at Activity Number TX1 1 4 TX2 4 5 TX3 5 6 TX4 6 7 TX5 7 8 ii) At the end of these transactions what changes will take place in the HRD database and why? [5 points] At the end of these transactions, the only permanent change to the HRD database will be the creation of the index index_elname on the AP_EMP table. All other attempted changes will not be applied due to rollbacks . C) For the HRD database in the relational model above, correct each SQL [30 points] The following are incorrect SQLs. For each of these SQL, identify and list all mistakes and then write. corrected SQL. State the purpose of the corrected query (what business question that query answers) i) ALTER TABLE AP_DEPT ADD CONSTRAINT ck_dept_loc CHECK (LOC = (‘NEW YORK’,’CHICAGO’,’BOSTON’))
WHERE DEPTNO=10 OR 20 OR 30 ; Syntax for CHECK Constraint: The check constraint syntax is incorrect. The CHECK clause doesn't accept a list of values with the = operator. Instead, the IN clause should be used for specifying multiple possible values for a column. WHERE Clause: The WHERE clause is not used with the ALTER TABLE statement when adding a constraint. Constraints inherently apply to all rows in a table and cannot be conditionally applied to certain rows based on a WHERE clause. Quotation Marks: SQL statements typically use single quotes for string literals. The quotation marks used (‘ and ’) are not standard single quotes ('), which might be a typographical issue that could cause syntax errors. Corrected SQL statement ALTER TABLE AP_DEPT ADD CONSTRAINT ck_dept_loc CHECK (LOC IN ('NEW YORK', 'CHICAGO', 'BOSTON')); Purpose: The purpose of the corrected query is to enforce a business rule that the location of the department (LOC) must be one of the specified cities: New York, Chicago, or Boston. This constraint ensures data integrity by only allowing these values in the LOC column for all rows in the AP_DEPT table. ii) SELECT JOB, DNAME, AVG(SAL) ‘AVERGE SALARY’ FROM AP_EMP A JOIN AP_DEPT B A.DEPTNO=B.DEPTNO WHERE DEPTNO<>50 AND AVG(SAL)>=2000 ORDER BY 4; Alias Syntax: There is a missing ON keyword for the JOIN clause which is required to specify the condition on which the tables AP_EMP and AP_DEPT are to be joined. Grouping Function: The AVG(SAL) function is an aggregate function and should be used in conjunction with a GROUP BY clause to group the results by the non- aggregated columns selected (JOB, DNAME). HAVING Clause: Conditions on aggregate functions should be placed in a HAVING clause, not in the WHERE clause. Alias for Column: The alias for the average salary is specified using quotes that are not standard SQL. It should be enclosed in double quotes or backticks depending on the SQL dialect. Also, there's a typographical error in 'AVERAGE'.
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
ORDER BY Clause: ORDER BY 4 refers to ordering by the fourth column in the SELECT statement, but there are only three columns. If we want to order by the average salary, we should use the alias of that column. Corrected SQL statement: SELECT JOB, DNAME, AVG(SAL) AS "AVERAGE SALARY" FROM AP_EMP A JOIN AP_DEPT B ON A.DEPTNO = B.DEPTNO WHERE A.DEPTNO <> 50 GROUP BY JOB, DNAME HAVING AVG(SAL) >= 2000 ORDER BY "AVERAGE SALARY"; Purpose: The purpose of the corrected query is to retrieve a list of jobs and department names along with the average salary for those jobs, excluding departments with DEPTNO 50, and only including those jobs where the average salary is at least 2000. It also orders the results by the average salary. iii) SELECT EMPNO, EFNAME, DEPTNO, SAL FROM AP_EMP WHERE SAL>=(SELECT DEPTNO, MIN(SAL) FROM AP_EMP GROUP BY DEPTNO); Subquery Returning Multiple Columns: The subquery within the WHERE clause is attempting to return two columns (DEPTNO, MIN(SAL)), but for comparison purposes in the WHERE clause, it should only return one column. Comparison Logic: The outer query is comparing SAL with a subquery. The intent seems to be to select employees whose salaries are greater than or equal to the minimum salary in their respective departments. This requires correlating the subquery with the outer query. Corrected SQL statement: SELECT EMPNO, EFNAME, DEPTNO, SAL FROM AP_EMP A WHERE SAL >= ( SELECT MIN(SAL) FROM AP_EMP B WHERE A.DEPTNO = B.DEPTNO); Purpose: The purpose of the corrected query is to retrieve the employee number, first name, department number, and salary for those employees whose
salaries are greater or equal to the minimum salary within their respective departments. iv) DELETE * FROM AP_PROEMP WHERE PROJID=100, HOURS<30; DELETE Syntax: The DELETE statement does not use an asterisk (*). The DELETE statement is used to delete rows from a table, and the asterisk is unnecessary and incorrect in this context. Condition Syntax: The WHERE clause uses a comma to separate conditions, which is incorrect. SQL uses logical operators like AND or OR to combine conditions. Corrected SQL statement: DELETE FROM AP_PROEMP WHERE PROJID = 100 AND HOURS < 30; Purpose: The purpose of the corrected query is to delete rows from the AP_PROEMP table where the PROJID is 100 and the HOURS are less than 30. v) SELECT EMPNO, ENAME, DEPTNO, SAL FROM AP_EMP WHERE JOB= 'MANAGER' ORDER BY SAL INTERSECT SELECT ENAME, EMPNO, DEPTNO, SAL FROM AP_EMP WHERE DEPTNO=20 AND COMM=NULL; Inconsistent Column Order and Selection: For an INTERSECT operation, the number and order of columns in both SELECT statements must be the same, and the data types of the columns in each select statement must match. NULL Comparison: We cannot use = to check for NULL values. Instead, the IS NULL operator should be used.
Column Selection Mismatch: The order and columns selected in both queries must match for INTERSECT to work correctly. The first query selects EMPNO, ENAME, DEPTNO, SAL, and the second should select the same columns in the same order. Corrected SQL statement: SELECT EMPNO, ENAME, DEPTNO, SAL FROM AP_EMP WHERE JOB = 'MANAGER' ORDER BY SAL INTERSECT SELECT EMPNO, ENAME, DEPTNO, SAL FROM AP_EMP WHERE DEPTNO = 20 AND COMM IS NULL; Purpose: The purpose of the corrected query is to find employees who are managers and employees in department 20 who have null commission. The INTERSECT operation will return only those rows common to both conditions, i.e., managers in department 20 with null commission. D) Consider the following scenario of transactions in the HRD database . [24 points]
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
Tc is the time when database has checkpoint. Tf is the time when database has system failure T1, T2, T3, and T4 are database transactions. Transactions T1, T2, T3, and T4 occurred in chronological order. The checkpoint in the database happened at a given time Tc and later on time at Tf, the system crashed due to power failure. Assume that the employee MELVIN with employee ID 7799 is working in department number 20 with a monthly salary of $7000. MELVIN is currently not eligible to earn the commission. The following are the details of the work done in each transaction . T1 : CONNECT apatel/N3wY0rk23@HRD SELECT deptno, sal, comm FROM ap_emp WHERE EMPNO=’7799’; UPDATE ap_emp SET sal=sal+500, comm=sal*0.1 WHRE EMPNO=’7799’; CREATE TABLE AP_EMP_TEST AS SELECT * FROM EMP WHERE “T1”=”TX 1”; T2 : DELETE FROM ap_emp WHERE SAL BETWEEN 8000 and 9000; ALTER TABLE EMP_TEST ADD CONSTRAINT pk_emp_test PRIMARY KEY(EMPNO); T3: UPDATE ap_emp SET deptno=30 WHERE empno=’7799’; UPDATE ap_emp SET sal=sal+sal*0.1 WHERE deptno=30; COMMIT: T4: UPDATE ap_emp SET comm=nvl(comm,0)+100 , sal=sal+300 WHERE empno=7799;
i) Upon system recovery, which transaction(s) will undergo REDO operations, and which transactions will undergo UNDO operations and why? [ 8 points] REDO: T2 and T3. T2 started but didn’t complete at checkpoint Tc. T3 completed before the Tf. UNDO: T4, because it didn’t complete before the Tf. ii) For transaction(s) that will undergo UNDO, what will be written out in the transaction log? [8 points] <T4, COMM, 700> <T4, SAL, 7700> <T4, abort> iii) What will the department number, salary, and commission of MELVIN be after the system is recovered? [8 points] After the system is recovered MELVIN’s department number will be 30, salary will be 7700 and commission will be 700. E) Consider the following dataset . [26 points] This dataset is about employee churn (decision to leave the company) based on gender, age, and income criteria. a) Draw bitmaps for the attributes GENDER, AGE, INCOME, and CHURN. Submit each bitmap. [8 points] Gender Bitmap:
Female: [1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0] Male: [0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1] Age Bitmap: <50: [1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0] >50: [0, 1, 0, 0, 1, 1, 0, 0, 0 , 0, 0, 1, 1] Income Bitmap: <100K: [1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0] >100K: [0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1] Churn Bitmap: NO: [1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1] YES: [0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0] b) Using the bitmaps created in step a above, answer the following questions by resolving the bitmaps. Total number of male employees of the age less than 50 and with salary less than 100Kwho left (CHURN) the company. c) Submit intermediate and final bitmaps, and a total number of employees as per given criteria . [8 points] [0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1] (Male) [1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0] (<50) [1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0] (<100K) [0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0] (Churn) Performing the AND operation: [0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0] Now, count the number of 1s in the final bitmap. The total number of male employees who are less than 50 years old, have a salary less than 100K, and left the company (churned) is 4. Use the attached DDL and DML to create a table and data by replacing asp13 with your Net ID. Write a SQL query that finds the answer to the same question (Total number of male employees of the age less than 50 and with salary less than 100K who left (CHURN) the company). Submit the SQL and screenshot of the result. [8 points] SELECT COUNT(*) AS TotalMaleChurned
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
FROM lz2956_emp_churn WHERE GENDER = 'Male' AND AGE = '<50' AND INCOME = '<100K' AND CHURN = 'YES'; d) Write DDL code to create bitmap indexes for each attribute. Submit the DDL code. [2 points] -- Create a bitmap index for the GENDER attribute CREATE BITMAP INDEX idx_gender ON lz2956_emp_churn (GENDER); -- Create a bitmap index for the AGE attribute CREATE BITMAP INDEX idx_age ON lz2956_emp_churn (AGE); -- Create a bitmap index for the INCOME attribute CREATE BITMAP INDEX idx_income ON lz2956_emp_churn (INCOME); -- Create a bitmap index for the CHURN attribute CREATE BITMAP INDEX idx_churn ON lz2956_emp_churn (CHURN);