HW1 - SQL
pdf
keyboard_arrow_up
School
University of Pennsylvania *
*We aren’t endorsed by this school
Course
550
Subject
Computer Science
Date
Feb 20, 2024
Type
Pages
11
Uploaded by ProfMaskOctopus37
CIS 5500 Spring 2024
CIS 5500: Database and Information Systems
Homework 1: SQL
This homework is based on material in Module 1 and Module 2 (corresponding questions denoted
with
M1
and
M2
hereafter).
Responses should be submitted via Gradescope using the template
file (
homework1.py
). More detailed submission instructions and autograder specifications can be
found below in the Submission and Autograder sections respectively.
Overview
This assignment comprises three parts.
Before starting Part 1, you should install DataGrip and any relevant drivers required to query an
Oracle SQL or MySQL database - please refer to the
DataGrip Handout
for further instructions.
In Part 1 you will query the IMDb dataset on an existing AWS RDS
1
Oracle SQL database.
In Part 2 you will then create your own MySQL database on RDS using the RDS handout.
You will
create a database instance by defining the tables and uploading the
OpenFlights dataset
.
In Part 3 you will query the instance that you created in Part 2.
Note: For Parts 1 and 3, you may NOT use DDL statements or views. You may, however, use the
‘WITH… AS…’ statement as needed.
Note: Make sure your answers match the provided schema. Schema is not case sensitive in Oracle
but is case sensitive in MySQL.
Advice to Students
While you should be familiar with SQL syntax from the lessons in
M1
and
M2
, you may need to
spend additional time setting up, troubleshooting, and debugging errors with the IDE (Integrated
Development Environment), AWS, data uploading, and so on. You should try to use internet
resources such as the official documentation or Stackoverflow to solve setup or other issues. If
these resources do not solve your problem, you can ask the course staff for assistance via Ed or by
coming to OH.
We understand that for many of you this is the first time using cloud resources, querying a
relational database, and setting up your own database - and these understandably have a steep
1
Relational Database Service
1
CIS 5500 Spring 2024
learning curve. The staff is happy to help you throughout this learning process but we do advise
you to *
please start early*
.
You should also spend time thinking about the correctness of your queries. While we are happy to
help you think through how to approach the query, we will not confirm the correctness of your
answers
Finally, be aware that some of the queries are difficult and may take lots of brainstorming to
complete!
2
CIS 5500 Spring 2024
Part 1: Querying an AWS Oracle Database
(Questions 1-7. 45 points)
IMDb Database.
The database you will use consists of a portion of the
Internet Movie Database
. It
has been uploaded to Amazon's AWS/RDS and conforms to the schema (where keys are underlined)
outlined in
Appendix A: The IMDb Database
of this document.
Please refer to Section 5 of the
DataGrip Handout
for instructions on connecting to the
database, if you have not already done so.
Question 1. (5 points,
M1
)
Print the name of all actors (of all genders) whose name starts with ‘S’ (case-sensitive) and ends
with a lower-case vowel (‘aeiou’, case-sensitive). Eliminate duplicates, and print the result in
alphabetical
order.
Hint
:
See resources on
like
,
substr
, and
in
.
Schema: (name)
Question 2. (5 points,
M1
)
For each movie whose runtime is longer than 100 and released after 2017 (2017 not included),
print all female actors (gender=1 in movie_cast).
Schema: (name)
Question 3. (6 points,
M2
)
For all directors who have directed at least one movie with more than a million ratings, print the
director’s name and the number of movies they directed that have more than a million ratings.
You may assume that director names are unique.
Schema: (name, num_movies)
Question 4. (5 points,
M2
)
Print the total number of crew members who have not participated in any movies since 1950
(1950 included).
Schema: (num)
3
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
CIS 5500 Spring 2024
Question 5. (8 points,
M2
)
For all movies that have a greater rating than at least one movie directed by 'Christopher Nolan'
and were released in the same year as “Inception”, print their title and rating. Do not hard code
the release year of “Inception”.
Hint
:
As a warmup, try writing a query to find all the ratings of movies directed by Christopher
Nolan.
Schema: (title, rating)
Question 6. (8 points,
M2
)
For each movie genre, print the genre name and a list of all the movies of this genre with more
than 3 costume designers (these are people with job ‘Costume Design’ in crew_in). The list should
separate movie titles by a semicolon and space (‘; ’), and should be in alphabetical order. Order
the query result by genre name (column name ‘genre’), and name the list of movie titles
‘movies’.
Note:
For those genres with no movies satisfying the condition, do not include the genre in the
result.
As an example, there are 3 history movies each with more than 3 costume designers: ‘Bang
Rajan’, ‘Napoleon’, ‘Samson and Delilah’, so the following tuple will appear in the result:
genre
movies
History
Bang Rajan; Napoleon; Samson and Delilah
To create the list of movie titles, use the Oracle built-in function
LISTAGG
. Note that Oracle 19c
allows
the
DISTINCT
keyword
to
be
included
directly
in
the
LISTAGG
function
call,
e.g.
LISTAGG(DISTINCT …).
Hint:
As a warm-up using just the material taught in class, you can write a query which returns
all the movies with more than 3 costume designers.
Schema: (genre, movies)
Question 7. (8 points,
M2
)
We define the experience level of a director as 'Novice' if they have directed fewer than 5 movies,
and 'Experienced' otherwise. Print out the average rating of movies directed by directors from
each experience level. Round the average ratings to two decimal points, and print the results in a
descending order based on the average rating.
Hint
:
See details on the
round
function and
case
.
Schema: (exp_level, avg_rating)
4
CIS 5500 Spring 2024
Part 2: Setting up an AWS MySQL Database
The goal of this part is to get experience with MySQL and AWS RDS (Amazon Web Services
Relational Database Service), one of the most popular open-source databases and cloud-based
relational DB services respectively.
The SQL syntax used in MySQL is similar to that used in Oracle in many ways, although there are
some notable
differences
(for example, case sensitivity for object names and or keywords). A
good reference for MySQL syntax can be found
here
.
OpenFlights Database.
You will be creating three tables corresponding to the OpenFlights
database that conform to the schema (where keys are underlined) outlined in
Appendix B: The
OpenFlights Database
of this document.
If you have already not done so, install the MySQL driver for DataGrip as instructed in Section 4a
of the
DataGrip Handout
.
You should follow the
AWS RDS Handout
to create an AWS account and launch a MySQL database
instance on RDS before proceeding. Follow this handout completely before proceeding.
Use the DataGrip console to interact with your AWS RDS instance for the following parts of the
assignment.
Note
: Please fill into db_config (in the template file) the connection details of the database you
set up above. Please leave the instance running while the autograder is still running, but you may
turn it off afterwards (i.e. when the autograder successfully runs all your queries) to avoid
exceeding free tier limits.
Question 8. (15 points,
M1
)
Open an editor console corresponding to your MySQL instance and run the command
CREATE
DATABASE flights
, and then
USE flights
to create and switch to a database named ‘flights’. Refer
to
Appendix B: The OpenFlights Database
, reading very carefully through it (including any
footnotes or remarks). Reading through the specifications a few times is a good starting point.
Write SQL CREATE TABLE statements to create the database schema outlined in the appendix,
enforcing
the
primary
and
foreign
key
constraints
indicated.
Fill
in
these
CREATE
TABLE
statements as
answer8a
-
answer8c
of the template file, assuming that they will be executed in
that order. Note that not all orders of creating tables work, and your response will receive a
deduction for an invalid order!
Note: While designing the schema for these tables (especially when designating keys or other
constraints), follow the relational model as closely as possible including primary/foreign keys and
other constraints (note the requirements for the code_share attribute, which can assume only
two values: ‘Y’ or ‘’ (empty string)).
5
CIS 5500 Spring 2024
Part 3: Querying an AWS MySQL Database
(Questions 9-12. 35 points)
Download and extract the openFlights Dataset zip file, which can be found on Canvas. Once
extracted, you will see three files,
Airports.csv
,
Airlines.csv
and
Routes.csv
corresponding to the
three tables in the database. Follow Section 5 of the
AWS RDS Handout
to import these files into
their respective tables.
You are encouraged to use CTEs (i.e. WITH … AS …) to solve the following problems.
Question 9. (8 points,
M2
)
Select the top 10 airlines whose country is “United States” by the number of routes by that airline
between airports that are not in the United States or Canada. Print the airline name and number
of such routes in decreasing order of number of routes.
Schema: (name, num_routes)
Question 10 (8 points,
M2
)
Consider only international routes, i.e. ones that go between two different countries.
For each
airline, count the number of distinct destinations of international routes (call this num_cities).
Return all airlines with more than 150 num_cities, ordered by num_cities from largest to smallest,
and secondarily by the airline’s name in alphabetical order.
Schema: (airline_name, num_cities)
Question 11 (9 points,
M2
)
For each city in Mexico, return the number of other cities in Mexico that it cannot fly to by a
direct flight (Routes). (By ‘other cities’, you should not count the city itself.) Order by the
number of such cities from smallest to largest, and secondarily by the city's name in alphabetical
order.
Schema: (city, num_cities)
6
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
CIS 5500 Spring 2024
Question 12. (10 points,
M2
)
For this question, you will restrict Routes to those using Airlines from the United Kingdom (call
this UKRoutes). This reduces the size of the relation to make the query below run faster.
You have a friend in New Jersey who says Newark (iata code “EWR”) is a “great airport”. To prove
it, he will pay for a free flight to take you from your home in Washington D.C. (city name
“Washington”) to Newark Airport. You want to know all cities that you can reach by either one
total (direct) flight or two
total flights
(where the target_id of the first is the same as the
source_id of the second).
For each city that is reachable from Washington (not including Newark or Washington) via one or
two
total
flights, return the city name and the minimum number of
paid
flights (1 or 2), where all
flights in the database are paid.
As an example, there is a direct flight from Washington to London, so the tuple (‘London’, 1)
should be in the result. There is also a direct, connecting flight from this London airport to
Brussels (but none from Newark or Washington) so the tuple (‘Brussels’, 2) should be in the result.
There is a direct flight from Newark to Paris (though none from Washington), so the tuple (‘Paris’,
1) should be in the result, as we take the free flight from Washington to Newark and then the
paid flight from Newark to Paris. However, even though there is a direct flight from the Paris
airport to Naples, (‘Naples’, 2) is not in the result since it would entail three flights total, the
first of which is free.
Hint: There are three cases that an eligible city can fall under while remaining at most two flights
away: Washington -> City, Washington -> Newark -> City, Washington -> (some other airport) ->
City. Consider how many paid flights are used for each.
Schema: (city, paid_flights)
Part 4: Generative AI Use
Question 13. (5 points)
Describe how, if at all, you used generative AI to assist with completing this homework. Recall
that you cannot use generative AI to give you the answers to homework questions, but can use it
to help understand the SQL constructs taught in class and/or introduced in this homework.
7
CIS 5500 Spring 2024
Submission
Please submit your responses using the template file (
homework1.py
).
Do not
rename this file.
You may add comments to the file as required or change the order of answers if needed, but you
may not
rename any variables corresponding to answers (this includes the key parameters in
db_config
- fill in the required values only).
You must also fill in your PennKey and your 8-digit Penn ID at the top of the file.
As mentioned in Part 2, once the autograder has finished running and displays the results of your
queries, you can terminate your RDS instance to avoid exceeding free tier limits. We will not be
able to reimburse you for any costs incurred due to exceeding free-tier limits.
Autograder
You may submit your responses multiple times.
Upon submission, the autograder will confirm if
your queries were executed successfully; if a query is executed successfully it will also tell you
the number of rows in the query result, and a (possibly condensed) preview of the result of
executing your query. It will
not
tell you whether or not the query itself is correct.
Please note that queries that do not execute successfully will be heavily penalized.
Manual grading
: Your responses are only partially autograded. Once the submission deadline is
past, we will do a round of manual checking to adjust the autograded score to adjust (usually,
add) credit based on the quality of the responses submitted.
8
CIS 5500 Spring 2024
Appendix A: The IMDb Database
Attribute
Type
genre
name
varchar(255)
Attribute
Type
movie
movie_id
2
number(7)
title
varchar(255)
runtime
number(7)
release_year
number(7)
rating
float(2)
num_ratings
number(7)
Attribute
Type
crew
id
number(7)
gender
number(7)
name
varchar(255)
Attribute
Type
movie_
cast
3
id
number(7)
gender
4
number(7)
name
varchar(255)
Attribute
Type
cast_in
movie_id
number(7)
cast_id
5
number(7)
charac
varchar(255)
Attribute
Type
crew_in
movie_id
number(7)
crew_id
6
number(7)
job
varchar(255)
Attribute
Type
movie_
genre
movie_id
number(7)
genre_name
7
varchar(255)
Attribute
Type
movie_k
eyword
kwd_name
varchar(255)
movie_id
number(7)
* Filled colors in the ‘Attribute’ columns indicate references between relations
* Keys are underlined
Notes
:
●
In this part of the homework, “crew” refers to staff members (e.g. directors, electricians,
music editors) whereas “cast” refers to actors and actresses in a movie.
●
The
relations
‘crew’
and
‘movie_cast’
contain
information
about
the
person
who
contributed as a staff or as an actor/actress, respectively.
●
The relations
‘crew_in’
and
‘cast_in’
describe the participation between a staff and a
movie, or between an actor/actress and a movie, respectively.
7
genre_name references
genre
(name)
6
crew_id references
crew
(id)
5
cast_id references
movie_cast
(id)
4
0 = Other/Not specified, 1 = Female, 2 = Male
3
Think of this as a relation of actors
2
cast_in
(movie_id),
crew_in
(movie_id),
movie_genre
(movie_id), and
movie_keyword
(movie_id) - all
reference
movie
(movie_id)
9
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
CIS 5500 Spring 2024
●
Recall the definition of primary key. In the relation
‘movie’
, the same movie name can
correspond to multiple release years. We always identify ‘a movie’ by its movie_id,
instead of the movie name, as suggested by the primary key constraint.
●
You can ignore the table
keyword
, as it is empty.
10
CIS 5500 Spring 2024
Appendix B: The OpenFlights Database
Attribute
Type
Airlines
id
int
name
varchar(255)
alias
varchar(255)
iata
char(2)
icao
char(3)
callsign
varchar(255)
country
varchar(255)
active
char(1)
Attribute
Type
Airports
id
int
name
varchar(255)
city
varchar(255)
country
varchar(255)
iata
char(3)
icao
char(4)
lat
decimal(8,6)
lon
decimal(9,6)
alt
int
timezone
decimal(3,1)
dst
char(1)
tz
varchar(255)
Attribute
Type
Routes
airline_iata
char(3)
airline_id
8
int
src_iata_icao
char(4)
source_id
9
int
target_iata_icao
char(4)
target_id
10
int
code_share
char(1)
equipment
char(20)
* Filled colors in the ‘Attribute’ columns indicate references between relations
* Keys are underlined
* Relation names are case-sensitive in MySQL
NOTE
: The attribute codeshare in the table Routes can only assume two values: ‘Y’ or ‘’ (empty string)
corresponding to whether or not the flight is a codeshare, that is, not operated by Airline but another
carrier.
Read up
on the ‘CHECK’ syntax and include it in your DDL statement(s).
10
target_id is a foreign key referencing
Airports
(id)
9
source_id is a foreign key referencing
Airports
(id)
8
airline_id is a foreign key referencing
Airlines
(id)
11
Related Documents
Recommended textbooks for you
data:image/s3,"s3://crabby-images/afea1/afea10491f15304b6bbfa1832aa7a5981316582f" alt="Text book image"
Programming with Microsoft Visual Basic 2017
Computer Science
ISBN:9781337102124
Author:Diane Zak
Publisher:Cengage Learning
COMPREHENSIVE MICROSOFT OFFICE 365 EXCE
Computer Science
ISBN:9780357392676
Author:FREUND, Steven
Publisher:CENGAGE L
Np Ms Office 365/Excel 2016 I Ntermed
Computer Science
ISBN:9781337508841
Author:Carey
Publisher:Cengage
data:image/s3,"s3://crabby-images/b907a/b907ada1f4be11d175260bd2a8acbc475b9f1fe1" alt="Text book image"
Systems Architecture
Computer Science
ISBN:9781305080195
Author:Stephen D. Burd
Publisher:Cengage Learning
data:image/s3,"s3://crabby-images/f69b6/f69b6127845775e68542aa44ed44f5dcebe26fad" alt="Text book image"
Microsoft Visual C#
Computer Science
ISBN:9781337102100
Author:Joyce, Farrell.
Publisher:Cengage Learning,
Recommended textbooks for you
- Programming with Microsoft Visual Basic 2017Computer ScienceISBN:9781337102124Author:Diane ZakPublisher:Cengage LearningCOMPREHENSIVE MICROSOFT OFFICE 365 EXCEComputer ScienceISBN:9780357392676Author:FREUND, StevenPublisher:CENGAGE LNp Ms Office 365/Excel 2016 I NtermedComputer ScienceISBN:9781337508841Author:CareyPublisher:Cengage
- Systems ArchitectureComputer ScienceISBN:9781305080195Author:Stephen D. BurdPublisher:Cengage LearningMicrosoft Visual C#Computer ScienceISBN:9781337102100Author:Joyce, Farrell.Publisher:Cengage Learning,
data:image/s3,"s3://crabby-images/afea1/afea10491f15304b6bbfa1832aa7a5981316582f" alt="Text book image"
Programming with Microsoft Visual Basic 2017
Computer Science
ISBN:9781337102124
Author:Diane Zak
Publisher:Cengage Learning
COMPREHENSIVE MICROSOFT OFFICE 365 EXCE
Computer Science
ISBN:9780357392676
Author:FREUND, Steven
Publisher:CENGAGE L
Np Ms Office 365/Excel 2016 I Ntermed
Computer Science
ISBN:9781337508841
Author:Carey
Publisher:Cengage
data:image/s3,"s3://crabby-images/b907a/b907ada1f4be11d175260bd2a8acbc475b9f1fe1" alt="Text book image"
Systems Architecture
Computer Science
ISBN:9781305080195
Author:Stephen D. Burd
Publisher:Cengage Learning
data:image/s3,"s3://crabby-images/f69b6/f69b6127845775e68542aa44ed44f5dcebe26fad" alt="Text book image"
Microsoft Visual C#
Computer Science
ISBN:9781337102100
Author:Joyce, Farrell.
Publisher:Cengage Learning,