IS480_Subqueries_and_CASE_Logic (1)

docx

School

California State University, Long Beach *

*We aren’t endorsed by this school

Course

480

Subject

Computer Science

Date

Feb 20, 2024

Type

docx

Pages

13

Uploaded by CorporalGoose3096

Report
Week 3 In-class activity, Advanced SQL Statements: Subqueries & CASE Logic Session Objectives: - Create both correlated and noncorrelated sub-queries. - Use subqueries in different clauses of the SQL statement. - Use CASE Logic to create IF-THEN-ELSE scenarios. PART 1: Comparing Basic Queries with Subqueries Basic statements & clauses in an SQL Query appearance of an SQL Query: SELECT * <-Select all columns from all tables of interests FROM student <- Specify which tables you would like to get the data from (for the SELECT statement) WHERE student.student_age > 19 <- clause used to filter records based on some condition. The condition is mentioned in the form of an equality GROUP BY student <- Specify how to categorize the results. It groups data by the specified value and gives a summarized view HAVING <- Specifies a filter for the groups. ORDER BY <- Sorts the results by some specified criteria ; <-finish your queries with a semicolon. This may or may not be necessary depending on the DBMS, but its best that we get used to including it so that our queries are more reusable regardless of the DBMS
SQL Subqueries: These subqueries can be differentiated into CORRELATED, and NONCORRELATED subqueries: 1) NONCORRELATED subqueries will run first, and the output will be used as an input for the outer query (the main query). These are the typical cases and represent all subqueries that can run on their own. Let’s review the outer query (main query containing the inner query within) and inner query (subquery): SELECT CustomerName, CustomerAddress, CustomerCity, CustomerState, CustomerPostalCode FROM Customer_T WHERE Customer_T.CustomerID = (SELECT Order_T.CustomerID FROM Order_T WHERE OrderID = 1008); Outer query in BLUE . Inner query in RED . In this case the Inner query is not making any references to the outer query and therefore it is a non-correlated query. The Inner query is looking into the Orders table (Order_T), and wants to get the column Customer ID (Order_T.CustomerID) as the output. Before doing that, it will filter the results so that we only see the customer ID of the customer that placed the order number 1008. Therefore, the output will be a column of showing the customer ID, with a single row of data corresponding to the customer who placed the order 1008. It will be a single row because an order is placed by 1 and only 1 customer and therefore only 1 customer can be associated with order 1008. Since it is a non-correlated query, it will run first, and only then the outer query will run. The outer query will then pick up data about the customer from the customer table (Customer_T), and will display the columns with the customer’s name (CustomerName), address (CustomerAddress), the city (CustomerCity), the state (CustomerState), and the postal code (CustomerPostalCode). It will then filter the results (WHERE clause) so that only those rows with a customer ID that matches the output of the Inner query will be shown. Combined, this means we will see the name and address information for the customer who placed the order # 1008.
2) CORRELATED subqueries will run at the same time as the outer query. They will also run for every row of data of the outer query. You can identify them because within the subquery there will be some reference to the outer query (for example, the subquery can use a WHERE clause with a mention of a column or table that is referenced in the outer query but not in the inner/sub query. Therefore, it cannot possibly run without the outer query also running. Now let’s review a CORRELATED Query: SELECT ProductDescription, ProductFinish, ProductStandardPrice FROM Product_T PA WHERE PA.ProductStandardPrice > ALL ( SELECT ProductStandardPrice FROM Product_T PB WHERE PB.ProductID ! = PA.ProductID); Outer query in BLUE . Inner query in RED . In this case we can tell the Inner query is a correlated query since it references the outer query by using the Product_T PA table. Let’s review this in detail: The inner query is getting data from the Product_T table nand calling it “ PB ” (an alias for referencing the table with ease). The output is meant to be the product standard price, and it filters it (where clause) so that the product standard price shown is only for those products with an ID that is present in both the PB table mentioned in the FROM statement, and a PA table that has not been mentioned yet (see PA .ProductID). So where is this PA table coming from? It is a reference to the outer query. The outer query is taking data from a different copy of the Product_T table that it calls PA . The output of the outer query is simply a set of columns with data on the product’s description, finish (wood, metal, ash, etc), and standard price. The output is filtered so that it shows the data for rows that have a higher standard price than all those shown in the Inner query ( WHERE PA.ProductStandardPrice > ALL (Inner Query)). Since the Inner query is listing all product standard prices, the end result is that we only get 1 row of data as an output of the outer query that contains the information of the product with the highest standard price.
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
SQL CASE Logic (Equivalent to IF-THEN-ELSE): CASE Clauses allow us to create IF-THEN-ELSE conditions in our queries. These clauses are used within the SELECT statement to create a new column of data with the result of each IF-THEN-ELSE inputted in the corresponding cell of each row. For example: SELECT ProductDescription, ProductFinish, ProductStandardPrice, CASE WHEN ProductStandardPrice > AVG(ProductStandardPrice) THEN ‘Expensive’ WHEN ProductStandardPrice < AVG(ProductStandardPrice) THEN ‘Affordable’ ELSE ‘Average’ END AS ‘Relative_Price’ FROM Product_T CASE Clause is in BLUE . Alias Given to the Column resulting from the Case Clause is in RED . If we desire to filter or group the rows based on the results of the case logic, we can repeat the code in the WHERE clause or the GROUP BY clause respectively. See: -- removing NULLS with WHERE CLAUSE SELECT date , hometeam_id , awayteam_id , CASE WHEN hometeam_id = 8455 AND home_goal > away_goal THEN 'Chelsea home win!' WHEN awayteam_id = 8455 AND home_goal < away_goal THEN 'Chelsea away win!' ELSE 'Loss or tie :(' END AS outcome FROM match WHERE ( CASE WHEN hometeam_id = 8455 AND home_goal > away_goal THEN 'Chelsea home win!' WHEN awayteam_id = 8455 AND home_goal < away_goal THEN 'Chelsea away win!' END ) IS NOT NULL ; -- GROUP by Case result
SELECT CASE WHEN home_goal > away_goal THEN 'Home win!' WHEN home_goal < away_goal THEN 'Home loss :(' ELSE 'Tie' END AS outcome , count ( t . team_long_name ) AS count_games FROM matches_spain AS m -- Left join teams_spain onto matches_spain LEFT JOIN teams_spain AS t ON m . awayteam_id = t . team_api_id GROUP BY ( CASE WHEN home_goal > away_goal THEN 'Home win!' WHEN home_goal < away_goal THEN 'Home loss :(' ELSE 'Tie' END AS outcome , count ( t . team_long_name )); While the code can grow fast, we just need to pay attention to the logic of what is happening: In the filter example, repeating the CASE code tells the system that we want to check if the result of the CASE passes the equality. If it does, we keep the row. If it doesn’t we disregard the row from the output. In the case of the group by, we simply repeat the CASE code in the GROUP BY section so that the database knows it needs to aggregate all rows to the level of the result of the CASE. That is, if there are 3 possible outputs of the CASE logic, there will be 3 rows in the final result of the query. We MUST also combine this with some aggregation function so that the database knows what we intend to do here: Are we counting by the result of the Logic? Are we summing the results? Etc.
PART 2: Examples for Creating Subqueries Available tables: student table: student_ID first_name last_name main_campus advisor 1 ~ ~ ~ ~ 2 ~ ~ ~ ~ 3 ~ ~ ~ ~ 4 ~ ~ ~ ~ course table : [simplified model to maintain 2 tables only, we’ll review joining more tables in the next class] course_ID course_name classroom student_ID 1 ~ ~ 1 2 ~ ~ 1 3 ~ ~ 2 4 ~ ~ 3 course_ID course_name classroom student_ID Student_ID first last main advisor 1 ~ ~ 1 1 2 ~ ~ 1 1 3 ~ ~ 2 2 4 ~ ~ 3 3 NULL NULL NULL NULL 4 ~ ~ ~ ~
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
1) Subqueries : We can use the results of a query as the basis for another query. To exemplify this, we 1) first need to create a query to identify the data we are interested in, 2) we need to provide an alias for the results so we can refer to them in the same way we normally refer to a table, and 3) we need to write the query that will use these results. Let us start with identifying the courses that student ID 1 has been taking (we will use this new table to match it with the student’s table and get more data on student 1). SELECT * FROM course WHERE course.student_ID = 1 ; This query will give us a table with all the course table’s attributes but only those rows with student ID 1. course_ID course_name classroom student_ID 1 ~ ~ 1 2 ~ ~ 1 Now we can give this new table an alias so that we can refer to it in other parts of the query: ( SELECT * FROM course WHERE course.student_ID = 1) AS c ; Or (SELECT * FROM course WHERE course.student_ID = 1) AS c ;
Finally, we can write another query that will use this table and match it with a different table (namely the student table) and join them so that we get all of the data for student 1 attached to the table above. SELECT * FROM ( SELECT * FROM course WHERE course.student_ID = 1) AS c <- we rename this table as c so we just need to write “c” to refer to it RIGHT JOIN student AS s <- we rename the student table as s so we just need to write “s” to refer to it ON c.student_ID = s.student_ID <- we match the student_ID in the newly created table, with the student ID in the student table ORDER BY c.course_ID ; ; course_ID course_name classroom student_ID first last Campus colleague 1 ~ ~ 1 ~ ~ ~ ~ 2 ~ ~ 1 ~ ~ ~ ~ 2) Self Join : SELECT * FROM student As s, student As c INNER JOIN course AS c ON s.student_ID = c.collegue_ID ORDER BY s.student_ID ; 3) Identify those with Null values (IS NULL) : SELECT * FROM student As s FULL JOIN course AS c ON s.student_ID = c.student_ID WHERE s.student_ID IS NULL OR c.student_ID IS NULL ; The table showing from this query will have only results with a NULL student ID in either the student table or the course table.
PART 3: Creating Your Own Subqueries Continuing with the “classic models” database, we will create sub-queries (i.e., Inner queries: To create new tables (i.e., derived tables) that are used in other queries (in FROM clauses). To aggregate data in multiple stages (in FROM clauses). To filter results in conditional logic (in WHERE clauses). To Join a table with a derived table (in FROM clauses combined with a Joined). To combine data from 2 separate queries (typically as a result of source data coming from a split dataset). Correlated Subqueries Activity: Q1. Simple subquery. Q1Part A . Create a query to explore the products available from the vendor "Second Gear Diecast". The output should show all attributes from the products table. Q1Part B . Create a query that uses the query from part A as its source data. Then filter the results so that only those with a buyprice higher than 40 are displayed. Q2. Subqueries can be more useful if we use them for more complex processing. For example: Q2Part A. Let us change the query in Q1 part A so that it shows the minimum quantity in stock instead of all attributes/columns. Q2Part B. Create an outer query that will look at the products from Vendor “Second Gear Diecast” and identify the information for that one which has the same quantityInStock as the result of the query in Part A.
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
This combination gives us all the information about the product with the lowest availability sold by “Second Gear Diecast”. get what the minimum add more complexity to the queries above by using each step to Q3. Combine 3 queries in 1. Non-Correlated subqueries are useful to add more control about the output of a query by braking up operations into steps. For example, how can we know which is the order number and the price of the most expensive order(s) (or orders if two or more orders tie on the highest price position). This is not a simple query to create, but we can break it into 3 parts to make it simpler. Part A. Create a query that will display the order number and the order total (sum of the multiplication of the price of each item in an order by quantity ordered) for each order. Calculate first column with the order number, and a second column of order totals by multiplying the quantity of each item in the order times its price and summing the results. Part A results in a list of orders with the order total as the first column, and the total spent on that order as the 2 nd column. To identify which of these is the most expensive order(s). Part B. Create a query that looks at the “Order total” (sum of the multiplication of the price of each item in an order by quantity ordered) alone. Calculate single column of order totals by multiplying the quantity of each item in the order times its price and summing the results. This query by itself just tells us what the totals are. Without identifying the orderNumber. To identify the most expensive order, we can combine it with a second query that will look at these results and identify the maximum order total. Part C. Create a query that will use a MAX function to identify the highest order total using the query from part B as its source table (derived Table in FROM clause). MAX order total from the order totals identified in PART B.
Combining a query with the Max Function and the query from Part B helps us identify the most expensive order(s)’ value. We can combine this with the query from Part A to identify orderNumber of any order with the same orderTotal amount. Part D. Use the query from Part A as your outer query. Then use a HAVING clause to filter the groups. The HAVING should filter by ordertotals that match the order total identified in Part C. In other words: <<Outer query from Part A>> HAVING [calculation for order total] = <<query from part C as Inner Query>> This results in 3 queries being combined to identify the information about the highest priced order(s). SUMMARY: Calculate the total spent in an order using the SUM function and multiplying the price of each item by the ordered quantity). Create a query that looks at the total spent on each order. Create an outer query that will use the MAX function to identify which of those totals is the maximum amount. Create one more outer query that will look at all order totals and their corresponding orderNumber by Grouping by the orderNumber. Using the query from the last step, use a HAVING clause to filter the groups by the max amount on a query.
PART 4: Practicing Case Logic Continuing with the “classic models” database, we will now use Case statements to modify the information displayed in the query output to make it more descriptive. Activity: Q5 . Create a query to explores the credit limit for each customer. Use the following logic to overwrite the credit limit with a descriptive word. For credit limits higher than 50,000, the cell should display the word “HIGH”. For credit limits lower than 10,000, the cell should display the word “LOW”. For anything between 10,000 and 50,000, the cell should display the word “MEDIUM”. For any other scenario, the cell should display the phrase “Not Available”. Q6 . Create a query that identifies the level of the employee. For context, our employee table includes a hierarchy of employees where a single (1) employee does not report to anyone since they are at the top of the hierarchy (a chief executive). All other employees are either managers, or regular staff. If an employee does not reports to someone, they are the ‘Top Manager’ of the company. If the job title mentions the keyword Manager, they should be categorized as a manager. Otherwise, the employee should be identified as a staff member. Key Takeaway -> This exercise is unique in that it shows how we can use data from 2 attributes (reportsTo, and jobTitle) to create a meaningful new attribute (ManagementLevel). SELECT employeeNumber, reportsTo, jobtitle, CASE WHEN reportsTo IS NULL THEN 'Top Management' WHEN jobtitle LIKE '%Manager%' THEN 'Manager' ELSE 'Staff' END AS ManagementLevel FROM employees;
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
Q7 . Create a query that identifies the status of orders in our database. Specifically, check the status attribute of the orders and classify the orders based on: If the status is ‘Shipped’ the cell should display the text ‘Order has been shipped’ If the status is ‘Cancelled’ the cell should display the text ‘Order was cancelled’ In any other case, the cell should display the text ‘Pending shipment’ SELECT status, CASE WHEN status = 'Shipped' THEN 'Shipment in progress.' WHEN status = 'Cancelled' THEN 'Order was cancelled' ELSE 'Pending shipment' END AS shipmentStatus FROM orders; Q8. Create a query to categorize the sales performance of different employees. If the number of orders for a given employee is: Higher than 30, the sales_performance should be “High”. Between 10 and 30, the sales_performance should be “Medium” Any other scenario should be “Low”.