Fall2023-MidtermAnswers

pdf

School

Dalhousie University *

*We aren’t endorsed by this school

Course

1110

Subject

Computer Science

Date

Jan 9, 2024

Type

pdf

Pages

14

Uploaded by CommodoreGalaxy10657

Report
Name: CSCI 2141 Fall 2023 Midterm Banner ID: Page 1 of 14 CSCI 2141 Introduction to Database Systems Fall 2023 Midterm (November 2) Three midterms were used, each with a different colour code. : - / You have 75 minutes to complete the test (2:40 PM – 3:55PM). You may not leave before 2:55 PM. The following will result in removal of your test and a score of zero: - Any use of mobile phones or other electronics - Use of written notes – this test is closed book - Failure to show valid student ID The test is graded out of 100 points. Part 1: Multiple Choice (10 questions; 2 points each; 20% total). Circle the correct answer. (1a) [JOINs] Given a pair of tables T1 and T2, which of the following statements about JOINs is incorrect ? Assume SELECT * and no WHERE clause restrictions. Right answers (i.e., incorrect statements): - T1 INNER JOIN T2 USING ( colname ) and T1 LEFT OUTER JOIN T2 USING ( colname ) are guaranteed to return the same results if every value in T2. colname has a corresponding value in T1. colname - T1 INNER JOIN T2 ON (T1. colname = T2. colname ) will return the same results as T1 INNER JOIN T2 USING ( colname ) - T1 CROSS JOIN T2 will fail if there are no shared columns between T1 and T2 Wrong answers (i.e., correct statements): - T1 INNER JOIN T2 and T2 INNER JOIN T1 will differ only in the ordering of the columns in the joined table - T1 NATURAL JOIN T2 will return the same results as T1 INNER JOIN T2 USING ( colname ) if colname is the only column present in both tables
Name: CSCI 2141 Fall 2023 Midterm Banner ID: Page 2 of 14 - INNER JOINs can sometimes return the same results as FULL OUTER JOINs - The ON clause can be applied to any columns of T1 and T2, regardless of whether they are primary keys or not - The number of rows returned by T1 CROSS JOIN T2 must be equal to or exceed the number of rows returned by T1 LEFT OUTER JOIN T2 (1b) [File-based databases] What of the following is an inherent limitation of file-based data management? Correct answers: - It is more difficult to ensure referential integrity amongst linked data sets - Attribute constraints such as primary keys are more difficult to enforce - Changes in filesystem structure may require updating scripts and software used to access the files Incorrect answers: - The storage requirements are larger in file-based systems as each conceptual table is stored in a separate file - File-based data systems are more difficult to manipulate using simple Unix commands like ‘grep’ and ‘sort’ - File-based systems are more difficult to embed in complex workflows and server-side computation - The data contained in the rows and columns of a database table cannot be represented in a text file - File-based systems are more difficult to set up initially than database tables (1c) [Conceptual schema] Which of the following statements about conceptual database schemas is false? Correct answers (i.e., false) - The end user needs to understand the conceptual schema in order to interact with the database. - Each entity in the conceptual schema must map to one table in the database.
Name: CSCI 2141 Fall 2023 Midterm Banner ID: Page 3 of 14 - The conceptual schema always specifies the attributes for each conceptual entity. Incorrect answers (i.e., true) - Conceptual schemas can show many-to-many relationships. - Business rules determine the relationships shown in the conceptual schema. Changes in business rules can impact the structure of the conceptual schema. - The conceptual schema should be independent from the file structure that underpins the database. - The conceptual schema can be visualized using an entity-relationship diagram. - The conceptual schema can have fewer entities than there are tables in the database. (1d) [Data definitions] Which of the following actions will likely require a change to the structure of the database, or a table in the database (i.e., a data-definition command)? Correct answers: - Adding a new attribute to a table - Adding a new constraint such as NOT NULL to an attribute - A change in the relationships between entities in the conceptual schema Incorrect answers: - CASTing a DECIMAL(x,y) value to FLOAT in a SELECT query - Changing the value of the primary key for a row in a table - Executing an INNER JOIN on two tables in the database - Using INSERT IGNORE INTO to add a new row of data to a database table - Removing a row from a table that has a foreign-key constraint (1e) [Data manipulation] Which of the following statements about the GROUP BY clause is true: Correct answers: - The HAVING clause can be used only if GROUP BY is present in a query - It can be applied to multiple columns at the same time - It will return as many rows as there are rows in the queried table, if the grouped attribute contains no repeating values Incorrect answers:
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
Name: CSCI 2141 Fall 2023 Midterm Banner ID: Page 4 of 14 - The WHERE clause can be used only if GROUP BY is present in a query - The attribute used for grouping cannot be specified in the ORDER BY clause - GROUP BY MAX( colname ) will return the maximum value of each group in the table - The attribute used for grouping must not be referenced in the SELECT clause - GROUP ON JOINED (T1. colname , T2. colname ) can be used to group on two attributes in a JOIN operation (1f) [LIKE conditions] Which text strings will be matched by the clause: WHERE country LIKE “c%a” OR country LIKE “%_ni_”; Assume matching is case insensitive. Correct answers: - Canada and Estonia, but not Eswatini - Cambodia and Romania, but not San Marino (oops) - China and North Macedonia, but not Cameroon - Colombia and Albania, but not Nicaragua Incorrect answers: - Cuba and Dominica, but not Nicaragua - Cape Verde and Lithuania, but not Dominican Republic - Cambodia and Romania, but not San Marino - South Africa and Tanzania, but not Burkina Faso - Central African Republic and Mauritania, but not Namibia (1g) [Self join] Which of the following statements about self joins of T1 with itself is false : Correct answers (i.e., false): - If an SQL query contains a self join, it cannot include joins to other tables as well - You cannot use a NATURAL JOIN because each column name in T1 will trivially match with itself - Any self join on T1 with itself will return double the number of columns in T1 Incorrect answers (i.e., true):
Name: CSCI 2141 Fall 2023 Midterm Banner ID: Page 5 of 14 - T1 INNER JOIN T1…USING ( colname ) will return the same results as T1 LEFT OUTER JOIN T1 USING ( colname ) if all other clauses are unchanged - You must use table aliases to distinguish the columns that will be returned by the query - Any column name can be specified in the USING() clause - Any JOIN…ON () clause can be recreated with an equivalent WHERE clause - WHERE (table1.colname = table2.colname + 1) can be used to connect different rows in the results (1h) [Keys] Which of the following statements about candidate keys is true : Correct: - Any candidate key not chosen as the primary key is an alternate key - A candidate can have the same number of attributes as a superkey that includes it - The candidate keys in a table may change as a result of the normalization process Incorrect: - If one candidate key in a table is comprised of a single attribute, all candidate keys must be comprised of a single attribute - An attribute in a multi-attribute candidate key can have NULL values as long as other attributes in the same key do not contain NULLs - One attribute in a candidate key can determine the value of another attribute in the same key - Two candidate keys cannot be derived from the same superkey - Candidate keys should be determined by the present values in a table, independent of the business rules (1i) [Database normalization] Which of the following actions can be necessary during the normalization process: Correct: - Assigning attributes to a new table to remove unwanted dependencies - Changing the choice of primary key - Splitting of multi-valued attributes into separate rows to ensure atomicity Incorrect:
Name: CSCI 2141 Fall 2023 Midterm Banner ID: Page 6 of 14 - Creating derived attributes - Replacing a dependency relationship (A B) with the relationship (B A) - Removing attributes from the database - Adding a NOT NULL constraint to an attribute that was previously not in a candidate key - Removing rows from a table (1j) [Obvious question] Which of the following constraints is imposed by the NOT NULL clause? (a) The column cannot contain any NULL values (b) Most values in the column should be greater than π (c) The column must not updated after midnight (d) The column must be named for a general from the Later Han Dynasty
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
Name: CSCI 2141 Fall 2023 Midterm Banner ID: Page 7 of 14 Part 2: Definitions (5 points each; 20% total) Define the following terms: (2a) [Data models] Logical data independence / Physical data independence / End-user model “A condition in which the internal model can be changed without affecting the conceptual model.” “A condition in which the physical model can be explained without changing the internal model”. The portion of the database schema that is visible to the end user of a database. (2b) [DB design theory] Database normalization / First normal form / Transitive dependency "A process that assigns attributes to entities so that data redundancies are reduced or eliminated." “A relation depicted in tabular format, with no repeating groups and a primary key identified.” A dependency where an attribute is determined by a nonkey attribute . (2c) [SQL terminology] The HAVING clause / RIGHT OUTER JOIN / the DECIMAL(x,y) datatype “A clause applied to the output of a GROUP BY operation to restrict selected rows.” “A join operation that yields all of the rows in the right table, including the ones with no matching values in the other table.” An exact numeric datatype where x is the total number of digits, and y is the total number of digits after the decimal point. (2d) [Keys] Primary key / Candidate key / Alternate key “In the relational model, an identifier composed of one or more attributes that uniquely identifies a row. Also, a candidate key selected as a unique entity identifier.” “A minimal superkey; that is, a key that does not contain a subset of attributes that is itself a superkey” “Any candidate key not selected as a primary key”
Name: CSCI 2141 Fall 2023 Midterm Banner ID: Page 8 of 14 Part 3: The Relational Model (Each question 5 points; 15% total) Here is a small database schema and the corresponding tables: 1: 2: 3:
Name: CSCI 2141 Fall 2023 Midterm Banner ID: Page 9 of 14 (3a) [relationship cardinality] What type of relationship exists between `province` and `city` / `neighbourhoods` and `city` / `infrastructure` and `city` ? One to many / One to many / One to One (3b) [foreign key] What restrictions does the foreign-key constraint impose on the affected attributes in the parent and dependent table? You must include the names of the attributes in your answer. “A primary key from one table [the parent table ] that is placed into another table [the dependent table ] to create a common attribute.” “The foreign key must be constrained to ensure referential integrity.” (answer should reference the columns specific to the student’s exam) (3c) [***joins] How would the results differ between `province` INNER JOIN `city` USING (province_name) versus `province` LEFT OUTER JOIN `city` USING (province_name)? They would be the same `neighbourhoods` INNER JOIN `city` USING (city_name) versus `neighbourhoods` LEFT OUTER JOIN `city` USING (city_name)? They would be the same `infrastructure` INNER JOIN `city` USING (city_name) versus `infrastructure` LEFT OUTER JOIN `city` USING (province_name)? Would return an error since `province_name` is not defined in `infrastructure` or `city` (part marks if student misreads `province_name` as `city_name`) How many rows and columns would we obtain from each of these? 3 rows, 4 columns 3 rows, 5 columns 0 rows, 0 columns
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
Name: CSCI 2141 Fall 2023 Midterm Banner ID: Page 10 of 14 Part 4: Database Normalization: (each question 5 points; 15% total) Here is a dependency diagram showing a database table. Assume the table is in at least first normal form. 1: 2: 3: (a) Is this table in second normal form? Why or why not? No, because it contains a partial dependency (`city_pop` on only one attribute in the composite primary key `city`) Yes, because a table with a single-attribute primary key is automatically in 2NF if it is already in 1NF
Name: CSCI 2141 Fall 2023 Midterm Banner ID: Page 11 of 14 Yes. Although the table contains a composite primary key, all dependent attributes depend on the entire key. (b) What needs to be done to convert these tables into third normal form? We must put the table in second normal form by creating a new table with `city` as primary key and `city_pop` as dependent attribute. Remove `city_pop` from the parent table. Then, to put the table in third normal form we must create a new table with `customers` as primary key and `sales` as dependent attribute. Then we remove `sales` from the parent table. We must create a new table with `employee ID` as primary key and `employee title` as dependent attribute. Nothing needs to be done – this table is already in 3NF (c) ***What is the key property of third normal form that distinguishes it from second normal form? Draw a dependency diagram that shows the structure of the table(s) once third normal form has been reached. A table is in third normal form if it is in second normal form and additionally contains no transitive dependencies. 1) Customers Sales
Name: CSCI 2141 Fall 2023 Midterm Banner ID: Page 12 of 14 2) 3) Employee ID Employee Title
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
Name: CSCI 2141 Fall 2023 Midterm Banner ID: Page 13 of 14 Part 5: SELECT fraud FROM supermarkets (Each question 10 points; 30 points total) Acting on a tip from a major supplier of salad dressing and salted cashews, Canada’s Competition Bureau is investigating the major grocery chains for evidence of price fixing during a supply-chain interruption in May 2022. The allegations suggest that many grocery companies all raised the prices of a few key products by a similar amount, several days before the interruption began. As per usual, determination of guilt or innocence will be based on a series of SQL queries across three database tables. Here is an E-R diagram showing the database tables that are being used to carry out the investigation. (5a) The following query is meant to [ compare the shortage date versus the price change data for all products offered by all companies / compute the profit margins for all products offered by all companies at the beginning and end of May / identify the companies with the largest jumps in price over all products ], but it will not work correctly due to a syntax or logic error. Why will it not work? Write out an example of a query that will correctly address the question. SELECT company, product, TIMEDIFF(price_change_date, shortage_date_in_may) AS "Price vs Shortage" FROM products INNER JOIN company_products; There is no `ON` or `USING` clause for the INNER JOIN, so all row combinations will be returned. The query can be corrected by adding “…USING(product)” or “…ON(products.product = company_products.product). Note that if using the ON clause, one specific `product` column will need to be specified in the SELECT query. SELECT company, products.product, (price_may1 - avg_supplier_cost_may1) AS "Profit May 1", (price_may31 - avg_supplier_cost_may31) AS "Profit May 31" FROM products INNER JOIN company_products ON (companies.product = products.product); `companies` is not specified in the FROM clause, this should be `company_products.product`.
Name: CSCI 2141 Fall 2023 Midterm Banner ID: Page 14 of 14 (it sounds like some students thought the problem was a logic error involving the profit margin; if they changed to division or other mathematical operator this is fine as long as they also fixed the “companies” error.) SELECT company, MAX(price_may31 - price_may1) AS "Max Price Difference" FROM companies INNER JOIN company_products USING(company); The query is missing a GROUP BY clause. Changing the query to include “GROUP BY company” will make it valid. (5b) What are three possible datatypes you could use for the `product` / `company` / `price_may1` attribute? Explain the advantages and disadvantages of each. Which of these would you choose and why? Basic principles from class: - CHAR(x) reserves enough space for the entirety of x, so can be quite inefficient in terms of memory usage. - VARCHAR(x) can have the same maximum size, but is of variable length and doesn’t reserve the entire length of the string. - TEXT is like VARCHAR(), although its external storage can make it much more inefficient - FLOAT and DOUBLE can be used for prices, but their rounded nature makes them prone to small imprecisions. - DECIMAL or NUMERIC (which do the same thing in MySQL) are exact and suitable for storing prices. - The various types of INT are great but not suitable to represent floating-point values. (5c) Write an ALTER TABLE statement that will add a column (you choose the name) of datatype [***INT / VARCHAR(20) / FLOAT ) to the `companies` table. The column must include a constraint on the range of possible values, or on the set of allowed values. ALTER TABLE whatever ADD colname INT / VARCHAR(20) / FLOAT CONSTRAINT ` something` (CHECK ( colname < 0.5)); - or – CONSTRAINT `something` (CHECK ( colname IN (“A”,”B”,”C”);