RajbirSinghKhasria_Assignment1

docx

School

Northeastern University *

*We aren’t endorsed by this school

Course

6130

Subject

Computer Science

Date

Dec 6, 2023

Type

docx

Pages

11

Uploaded by JudgeRiver8743

Report
Assignment 1 — Tech Crunch By Rajbir Singh Khasria College of Professional Studies: Northeastern University Prof.: Shahram Sattar Subject: ALY6030 Data Warehousing & SQL Date: October 1 st , 2023
Part 1: Normalization Question 1: Finding the Primary key? To find an appropriate primary key for this dataset, we must first investigate the columns and their properties. A primary key is an identifier that is unique to each record in a table. It should be non-null and have unique values across all records. Based on the information supplied, the column 'fund_id' looks to be a good option for a primary key. It includes unique identifiers for each financing event and meets the criteria for a primary key. An example of an attribute that would not be an acceptable primary key is 'company': While each company name is unique within the information, it may not be unique in a larger context. Different companies with the same name may exist, therefore it does not guarantee uniqueness across all entries. To test whether 'fund_id' is a valid primary key, I have sorted the dataset by 'fund_id' in Excel. This helped me to quickly identify any duplicate values or null entries in the 'fund_id' column. As there were no duplicates or null entries, 'fund_id' can be deemed a valid primary key for this dataset. Question 2: Is the table provided in 1 st normal form or not? Yes, the table fulfills the First Normal Form (1NF).
A table must meet the following criteria to be included in 1NF: 1. All column entries must be of the same data type. 2. Each column must have its own name. 3. The sequence in which data is saved is unimportant.] This table includes: 1. Each column's entries all have the same data type. For example, 'fund_id' is an integer, 'company' is a string, 'numEmps' is an integer, and so on. 2. Each column has its own name. There are no column names that are identical. 3. The order in which data is saved has no bearing on its structure or significance. The 'fund_id' column identifies each row as representing a distinct financing event. As a result, this table meets the standards of 1NF. Question 3: Is the table provided satisfy 2 nd normal form? Yes, the table fulfills the Second Normal Form (2NF). A table must meet the following criteria to be included in 2NF: 1. It has to be in 1NF already. 2. A primary key should be specified.
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
3. All non-key properties (columns) must be completely functionally dependent on the primary key as a whole. This table includes: 1. We've determined that it's in 1NF. 2. 'fund_id' is a good primary key because it uniquely identifies each financing event. 3. All non-key properties ('company', 'numEmps', 'category', 'city','state', 'fundedDate', 'raisedAmt', 'raisedCurrency', and 'round') are functionally dependent on the complete primary key ('fund_id'). That is, all other attributes are uniquely defined for each particular 'fund_id'. As a result, this table meets the standards of 2NF. Question 4: Is the table provided satisfy 3 rd Normal form? The table violates the Third Normal Form (3NF). A table must meet the following criteria to be included in 3NF: 1. It has to be in 2NF already. 2. The primary key must be functionally reliant on all non-key properties (columns). 3. No transitive dependencies should exist. This table includes:
1. We've determined that it's in 2NF. 2. The non-key properties ('company', 'numEmps', 'category', 'city','state', 'fundedDate', 'raisedAmt', 'raisedCurrency', and 'round' are not all functionally dependent on the primary key ('fund_id'). Some of these characteristics are interdependent. For example, 'city' and'state' are both reliant on 'fund_id', but they are also mutually dependent. This results in transitive dependence. The table does not meet 3NF standards since 'city' and'state' are functionally reliant on one other rather than just on the primary key 'fund_id'. To achieve 3NF, we would need to divide the characteristics that have transitive dependencies into different tables and create relationships between them. For example, we could construct a database for firms with characteristics such as 'company,' 'numEmps,' and 'category,' and another table for financing events with properties such as 'fund_id,' 'fundedDate,' 'raisedAmt,' 'raisedCurrency,' and 'round. We would erase the transitive dependency between 'city' and'state' in this manner. Question 5: Proposed ERD that would bring this dataset into 3 rd Normal Form
We would need to develop additional tables to remove transitive dependencies in order to bring the dataset into 3NF. Here's an example of an Entity-Relationship Diagram: Fig 1: ERD Diagram Table 1: Companies primary key: Company_id Attributes: company, numEmps, category, city, state Table 2: Funding Opportunities Primary Key: fund_id - Foreign Key (Company References): company_id Attributes: fundedDate, raisedAmt, increasedRound currency This proposed ERD includes: The "Companies" table covers company information such as their name, number of employees, category, city, and state. The primary key is the 'company_id'.
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
Each fundraising event is described in the "Funding Events" table, which includes the funding ID, date, amount, currency, and round. The 'fund_id' is the primary key, while the 'company_id' is a foreign key that refers to the company connected with the fundraising event. Because a company might have numerous funding events, there is a one-to-many relationship between the "Companies" table (one side) and the "Funding Events" table (many side). This ERD divides the data into two tables, removing the transitive relationship between 'city' and 'state'. Part 2: Case Study 1. Inserted two new recipes. Here I have inserted two new recipes "Spaghetti Carbonara" and "Chocolate Chip Cookies" using the following query: Fig 2: Query for inserting new recipes.
2. SQL query to show all the details of new added recipes This query pulls information from the recipes such as the title, category, description, preparation and cook times, servings, difficulty, directions, ingredient amounts, and ingredient names. It connects data from the recipe_main, categories, rec_ingredients, and ingredients databases using JOINS. Fig 3: Query to show new recipes. Here is the output: Fig 4: Detail of new added recipes 3. Select Query to show specific Columns
This query returns the recipe name, category name, ingredient name, and ingredient amount from the appropriate tables. It connects the tables through JOINS. The result is then organized according to your preferences: first by descending category name, then by ascending recipe name, and finally by descending ingredient name. Fig 5: Select Query Here is the Output:
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
Fig 6: Select query output References: Inserting multiple rows in a single SQL query? . (2023). Retrieved from https://stackoverflow.com: https://stackoverflow.com/questions/452859/inserting- multiple-rows-in-a-single-sql-query Normalization - 1NF, 2NF, 3NF and 4NF . (2023). Retrieved from https://www.youtube.com: https://www.youtube.com/watch? v=UrYLYV7WSHM SQL Joins . (2023). Retrieved from https://www.w3schools.com: https://www.w3schools.com/sql/sql_join.asp