hw04
December 2, 2023
[1]:
# Initialize Otter
import otter
grader = otter.Notebook("hw04.ipynb")
1 Homework 4: Functions, Tables, and Groups
Please complete this notebook by filling in the cells provided. Before you begin, execute the previous
cell to load the provided tests.
Helpful Resource:
- Python Reference: Cheat sheet of helpful array & table methods used in Data 8!
Recommended Readings:
• Visualizing Numerical Distributions
• Functions and Tables
Before you begin, execute the following cell to set up the notebook by importing some helpful libraries. Each time you start your server, you will need to execute this cell again.
For all problems that require written explanations, you must provide your answer in the designated space. Moreover, throughout this homework and all future ones, please be sure not to re-assign variables throughout the notebook! For example, if you use max_temperature in your answer to one question, do not reassign it later on. Otherwise, you will fail tests that you thought you were passing previously!
Note: This homework has hidden tests on it. That means even though the tests may
say 100% passed, it doesn’t mean your final grade will be 100%. We will be running
more tests for correctness once everyone turns in the homework.
Directly sharing answers is not okay, but discussing problems with the course staff or with other
students is encouraged.
You should start early so that you have time to get help if you’re stuck.
1.1 1. Burrito-ful San Diego
[2]:
# Run this cell to set up the notebook, but please don't change it.
# These lines import the Numpy and Datascience modules.
import numpy as np
from datascience import *
import d8error

# These lines do some fancy plotting magic.
import matplotlib
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
import warnings
warnings.simplefilter('ignore', FutureWarning)
warnings.filterwarnings("ignore")
Mira, Sofia, and Sara are trying to use Data Science to find the best burritos in San Diego! Their friends Jessica and Sonya provided them with two comprehensive datasets on many burrito establishments in the San Diego area taken from (and cleaned from): https://www.kaggle.com/srcole/burritos-in-san-diego/data
The following cell reads in a table called ratings which contains names of burrito restaurants, their Yelp rating, Google rating, as well as their overall rating. The Overall rating is not an average of the Yelp and Google ratings, but rather it is the overall rating of the customers that were surveyed in the study above.
It also reads in a table called burritos_types which contains names of burrito restaurants, their menu items, and the cost of the respective menu item at the restaurant.
[3]:
# Just run this cell
ratings = Table.read_table("ratings.csv")
ratings.show(5)
burritos_types = Table.read_table("burritos_types.csv").drop(0)
burritos_types.show(5)
<IPython.core.display.HTML object>
<IPython.core.display.HTML object>
Question 1. It would be easier if we could combine the information in both tables. Assign burritos to the result of joining the two tables together, so that we have a table with the ratings for every corresponding menu item from every restaurant. Each menu item has the same rating as the restaurant it comes from. (8 Points)
Note: It doesn’t matter which table you put in as the argument to the table method; either order will work for the autograder tests.
Hint: If you need help on using the join method, refer to the Python Reference Sheet or Section 8.4 in the textbook.
[4]:
burritos = burritos_types.join('Name', ratings)
burritos.show(5)
<IPython.core.display.HTML object>
[5]:
grader.check("q1_1")
[5]:
q1_1 results: All test cases passed!
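To make the join in Question 1 concrete, here is a minimal plain-Python sketch of what joining on a shared Name column does: every menu-item row is paired with the rating row that has the same restaurant name. The `inner_join` helper and all rows are invented for illustration; they are not the datascience implementation and not rows from ratings.csv or burritos_types.csv.

```python
# Plain-Python sketch of joining two tables on a shared "Name" column.
# All rows are invented placeholders, not data from the homework files.
ratings_rows = [
    {"Name": "Shop A", "Yelp": 4.0, "Google": 4.3, "Overall": 3.9},
    {"Name": "Shop B", "Yelp": 3.5, "Google": 4.1, "Overall": 3.2},
]
types_rows = [
    {"Name": "Shop A", "Menu_Item": "California", "Cost": 7.5},
    {"Name": "Shop A", "Menu_Item": "Carnitas", "Cost": 6.9},
    {"Name": "Shop B", "Menu_Item": "California", "Cost": 8.0},
    {"Name": "Shop C", "Menu_Item": "Veggie", "Cost": 6.0},  # no rating row, so it is dropped
]


def inner_join(left, right, key):
    """Pair every left row with every right row that has the same key value."""
    by_key = {}
    for row in right:
        by_key.setdefault(row[key], []).append(row)
    joined = []
    for row in left:
        for match in by_key.get(row[key], []):
            joined.append({**row, **match})
    return joined


burritos_sketch = inner_join(types_rows, ratings_rows, "Name")
for row in burritos_sketch:
    print(row)
```

Each joined row carries both the menu item and its restaurant's ratings, which is why every menu item ends up with the same rating as its restaurant.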
Question 2. Let’s look at how the Yelp scores compare to the Google scores in the burritos table. First, assign yelp_and_google to a table only containing the columns Yelp and Google. Then, make a scatter plot with Yelp scores on the x-axis and the Google scores on the y-axis. (8 Points)
[6]:
yelp_and_google = burritos.select('Yelp', 'Google')
yelp_and_google.scatter('Yelp', 'Google')
# Don't change/edit/remove the following line.
# To help you make conclusions, we have plotted a straight line on the graph (y=x).
plt.plot(np.arange(2.5, 5, .5), np.arange(2.5, 5, .5));
[7]:
grader.check("q1_2")
[7]:
q1_2 results: All test cases passed!
Question 3. Looking at the scatter plot you just made in Question 1.2, do you notice any pattern(s) (i.e. is one of the two types of scores consistently higher than the other one)? If so, describe them briefly in the cell below. (8 Points)
The pattern I noticed in the scatter plot is that the Google scores (y-axis) are generally higher than the Yelp scores (x-axis); that is, most points lie above the y = x line.
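One way to check a visual impression like this is to count how many points fall above, below, or on the y = x line. The sketch below uses made-up (Yelp, Google) pairs purely for illustration; they are not values from the burritos table.

```python
# Made-up (Yelp, Google) score pairs, for illustration only.
pairs = [(3.5, 4.1), (4.0, 4.3), (4.5, 4.4), (3.0, 3.9), (4.0, 4.0)]

above = sum(1 for yelp, google in pairs if google > yelp)  # Google score is higher
below = sum(1 for yelp, google in pairs if google < yelp)  # Yelp score is higher
on_line = len(pairs) - above - below                       # the two scores are equal

print(f"above y=x: {above}, below y=x: {below}, on the line: {on_line}")
```

If most points are counted as "above", the Google score tends to exceed the Yelp score for the same restaurant.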
Here’s a refresher on how .group works! You can read how .group works in the textbook, or you can view the video below. The video resource was made by a past staff member, Divyesh Chotai! You can also use the Table Functions Visualizer to get some more hands-on experience with the .group function.
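As a rough mental model of .group, the sketch below collects the values for each distinct key into a list and then applies a function to each list. The `group` helper and the rows are hypothetical stand-ins written in plain Python; they are not the datascience implementation.

```python
from statistics import mean


def group(rows, key, value, collect):
    """Collect `value` entries per distinct `key`, then apply `collect` to each list."""
    buckets = {}
    for row in rows:  # values are gathered in the order the rows appear
        buckets.setdefault(row[key], []).append(row[value])
    # One (key, collected value) pair per group, with keys in sorted order.
    return [(k, collect(buckets[k])) for k in sorted(buckets)]


# Invented rows for illustration.
rows = [
    {"Menu_Item": "California", "Overall": 4.0},
    {"Menu_Item": "Carnitas", "Overall": 3.0},
    {"Menu_Item": "California", "Overall": 3.0},
]

averages = group(rows, "Menu_Item", "Overall", mean)
print(averages)  # [('California', 3.5), ('Carnitas', 3.0)]
```

Swapping `mean` for another function (for example `max`, or a function that returns the list unchanged) changes what ends up in each group's cell without changing how the rows are bucketed.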
[8]:
from IPython.display import YouTubeVideo
YouTubeVideo("HLoYTCUP0fc")
[8]: (embedded video output)
Question 4. There are so many types of California burritos in the burritos table! Sara wants to know which type is the highest rated across all restaurants. For the sake of these questions, we are treating each menu item’s rating the same as its respective restaurant’s, as we do not have the rating of every single item at these restaurants. You do not need to worry about this fact, but we thought to mention it!
Create a table with two columns: the first column should include the names of the burritos and the second column should contain the average overall rating of that burrito across restaurants. In your calculations, you should only compare burritos that contain the word “California”. For example, there are “California” burritos, “California Breakfast” burritos, “California Surf And Turf” burritos, etc. (9 Points)
Hint: If multiple restaurants serve the “California - Chicken” burrito, what table method can we use to aggregate those together and find the average overall rating?
Note: For reference, the staff solution only used one line. However, feel free to break up the solution into multiple lines and steps; just make sure you assign the final output table to california_burritos!
[9]:
california_burritos = burritos.where('Menu_Item', are.containing('California')).select('Menu_Item', 'Overall').group('Menu_Item', np.average)
california_burritos
[9]:
Menu_Item                      | Overall average
California                     | 3.5242
California (Only Cheese)       | 4.1
California + Guac + Sour Cream | 3.4
California - Chicken           | 3.45839
California - Pork Adobada      | 3.26429
California - Steak             | 3.26429
California Breakfast           | 2.75833
California Chicken             | 3.54815
California Chipotle            | 4.36667
California Everything          | 4.1
… (9 rows omitted)
[10]:
grader.check("q1_4")
[10]:
q1_4 results: All test cases passed!
Question 5. Given this new table california_burritos, Sara can figure out the name of the California burrito with the highest overall average rating! Assign best_california_burrito to a line of code that outputs the string that represents the name of the California burrito with the highest overall average rating. If multiple burritos satisfy this criterion, you can output any one of them. (8 Points)
[11]:
best_california_burrito = california_burritos.sort('Overall average', descending=True).column('Menu_Item').item(0)
best_california_burrito
[11]:
'California Chipotle'
[12]:
grader.check("q1_5")
[12]:
q1_5 results: All test cases passed!
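The "sort descending, then take the first name" pattern from Question 5 can be sketched in plain Python. The dictionary below holds only the ten rows displayed in the california_burritos output (nine further rows are omitted there), so it illustrates the technique rather than re-deriving the full-table answer; the variable names are made up for this sketch.

```python
# The ten rows shown in the california_burritos output; nine more rows are omitted there.
shown = {
    "California": 3.5242,
    "California (Only Cheese)": 4.1,
    "California + Guac + Sour Cream": 3.4,
    "California - Chicken": 3.45839,
    "California - Pork Adobada": 3.26429,
    "California - Steak": 3.26429,
    "California Breakfast": 2.75833,
    "California Chicken": 3.54815,
    "California Chipotle": 4.36667,
    "California Everything": 4.1,
}

# Sort names by rating, highest first, then take the first name.
by_rating = sorted(shown, key=shown.get, reverse=True)
best_shown = by_rating[0]

# Taking the maximum directly gives the same name without a full sort.
best_by_max = max(shown, key=shown.get)

print(best_shown, best_by_max)
```

Among the displayed rows, both approaches pick the same name that the notebook cell returned.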
Question 6. Mira thinks that burritos in San Diego are cheaper (and taste better) than the burritos in Berkeley. Plot a histogram that visualizes the distribution of the costs of the burritos from San Diego in the burritos table. Also use the provided bins variable when making your histogram, so that the histogram is more visually informative. (8 Points)
[14]:
bins = np.arange(0, 15, 1)
# Please also use the provided bins
burritos.hist('Cost', bins=bins)
Question 7. What percentage of burritos in San Diego cost less than $6? Assign burritos_less_than_6 to your answer, which should be between 0 and 100. You should only use the histogram above to answer the question. Do not use code on the table to find the answer, just eyeball the heights and use Python to evaluate your arithmetic! (8 Points)
Note: Your answer does not have to be exact, but it should be within a couple of percentage points of the staff answer.
[15]:
burritos_less_than_6 = 23
[81]:
grader.check("q1_7")
[81]:
q1_7 results: All test cases passed!
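The arithmetic behind Question 7 relies on the fact that, in a density histogram, the area of each bar (height times bin width) is the percentage of values in that bin. A minimal sketch, assuming the vertical axis is in percent per dollar and using hypothetical bar heights that were not read from the actual plot:

```python
bin_width = 1  # each bin from np.arange(0, 15, 1) spans one dollar

# Hypothetical bar heights (percent per dollar) for the bins
# [0, 1), [1, 2), [2, 3), [3, 4), [4, 5), [5, 6).
# Placeholders only; these are not the heights of the real histogram.
heights_below_6 = [0, 0, 0, 2, 5, 10]

# Area of a bar = height * width = percent of burritos in that bin.
percent_below_6 = sum(height * bin_width for height in heights_below_6)
print(percent_below_6)
```

With a bin width of 1 the percentage is simply the sum of the bar heights, but keeping `bin_width` explicit shows what would change for wider bins.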
1.2 2. San Francisco City Employee Salaries
This exercise is designed to give you practice with using the Table methods .pivot and .group. Here is a link to the Python Reference Sheet in case you need a quick refresher. The Table Function Visualizer may also be a helpful tool.
Run the cell below to view a demo on how you can use pivot on a table. (Thank you to past staff Divyesh Chotai!)
[82]:
from IPython.display import YouTubeVideo
YouTubeVideo("4WzXo8eKLAg")
[82]: (embedded video output)
The data source we will use within this portion of the homework is publicly provided by the City of San Francisco. We have filtered it to retain just the relevant columns and restricted the data to the calendar year 2019. Run the following cell to load our data into a table called full_sf.
[83]:
full_sf = Table.read_table("sf2019.csv")
full_sf.show(5)
<IPython.core.display.HTML object>
The table has one row for each of the 44,525 San Francisco government employees in 2019.
The first four columns describe the employee’s job. For example, the employee in the third row of the table had a job called “IS Business Analyst-Senior”. We will call this the employee’s position or job title. The job was in a Job Family called Information Systems (hence the IS in the job title), and was in the Adult Probation Department that is part of the Public Protection Organization Group of the government. You will mostly be working with the Job column.
The next three columns contain the dollar amounts paid to the employee in the calendar year 2019 for salary, overtime, and benefits. Note that an employee’s salary does not include their overtime earnings.
The last column contains the total compensation paid to the employee. It is the sum of the previous three columns:
Total Compensation = Salary + Overtime + Benefits
For this homework, we will be using the following columns:
1. Organization Group: A group of departments. For example, the Public Protection Org. Group includes departments such as the Police, Fire, Adult Probation, District Attorney, etc.
2. Department: The primary organizational unit used by the City and County of San Francisco.
3. Job: The specific position that a given worker fills.
4. Total Compensation: The sum of a worker’s salary, overtime, and benefits in 2019.
Run the following cell to select the relevant columns and create a new table named sf.
[84]:
sf = full_sf.select("Job", "Department", "Organization Group", "Total Compensation")
sf.show(5)
<IPython.core.display.HTML object>
We want to use this table to generate arrays with the job titles of the members of each Organization Group.
Question 1. Set job_titles to a table with two columns. The first column should be called Organization Group and have the name of every “Organization Group” once, and the second column should be called Jobs with each row in that second column containing an array of the names of all the job titles within that “Organization Group”. Don’t worry if there are multiple of the same job titles. (9 Points)
Hint: Think about how group works: it collects values into an array and then applies a function to that array. We have defined two functions below for you, and you will need to use one of them in your call to group.
[85]:
# Pick one of the two functions defined below in your call to group.
def first_item(array):
    '''Returns the first item'''
    return array.item(0)

def full_array(array):
    '''Returns the array that is passed through'''
    return array

# Make a call to group using one of the functions above when you define job_titles
job_titles = sf.select('Organization Group', 'Job').group('Organization Group', collect=full_array).relabeled('Job full_array', 'Jobs')
job_titles
[85]:
Organization Group                       | Jobs
Community Health                         | ['Painter Supervisor 1' 'Painter' 'Painter' …, 'Nursin …
Culture & Recreation                     | ['Electrician' 'Executive Secretary 2' 'Bldgs & Grounds …
General Administration & Finance         | ['Painter' 'Painter' 'Electrician' …, 'Investigator, T …
Human Welfare & Neighborhood Development | ['Dept Head I' 'Administrative Analyst' 'Community Devel …
Public Protection                        | ['IS Trainer-Journey' 'IS Engineer-Assistant' 'IS Busine …
Public Works, Transportation & Commerce  | ['Heavy Equip Ops Asst Sprv' 'Heavy Equipment Ops Sprv' …
[86]:
grader.check("q2_1")
[86]:
q2_1 results: All test cases passed!
[87]:
temp = sf.select('Organization Group', 'Job').sort('Job').group('Organization Group', collect=full_array).relabeled('Job full_array', 'Jobs')
temp
[87]:
Organization Group                       | Jobs
Community Health                         | ['Account Clerk' 'Account Clerk' 'Account Clerk' …, 'U …
Culture & Recreation                     | ['Account Clerk' 'Account Clerk' 'Accountant II' …, ' …
General Administration & Finance         | ['ASR Operations Supervisor' 'ASR Operations Supervisor' …
Human Welfare & Neighborhood Development | ['Account Clerk' 'Account Clerk' 'Account Clerk' …, ' …
Public Protection                        | ['ACPO,JuvP, Juv Prob (SFERS)' 'Account Clerk' 'Account …
Public Works, Transportation & Commerce  | ['Account Clerk' 'Account Clerk' 'Account Clerk' …, ' …
Understanding the code you just wrote in 2.1 is important for moving forward with the class! If you made a lucky guess, take some time to look at the code, step by step. Office hours are always a great resource!
Question 2. At the moment, the Job column of the sf table is not sorted (no particular order). Would the arrays you generated in the Jobs column of the previous question be the same if we had sorted alphabetically instead before generating them? Explain your answer. To receive full credit, your answer should reference how the .group method works, and how sorting the Jobs column would affect this. (8 Points)
Note: Two arrays are the same if they contain the same number of elements and the elements located at corresponding indexes in the two arrays are identical. An example of arrays that are NOT the same: array([1,2]) != array([2,1]).
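No, the arrays would not be the same. The .group method collects all the job titles for each Organization Group into an array in the order the rows appear in the table. Sorting the Job column first puts the rows in ascending alphabetical order, so each array would hold the same job titles but in alphabetical order rather than in the original row order.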
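The effect of sorting before grouping can be seen in a small plain-Python sketch. It assumes, as the two outputs above suggest, that values are collected per group in row order; the `collect_jobs` helper and the rows are invented for illustration.

```python
def collect_jobs(rows):
    """Collect job titles per organization group, keeping them in row order."""
    groups = {}
    for org_group, job in rows:
        groups.setdefault(org_group, []).append(job)
    return groups


# Invented (Organization Group, Job) rows in no particular order.
rows = [
    ("Public Protection", "Police Officer"),
    ("Community Health", "Painter"),
    ("Public Protection", "Account Clerk"),
    ("Community Health", "Nurse"),
]

unsorted_groups = collect_jobs(rows)
sorted_groups = collect_jobs(sorted(rows, key=lambda row: row[1]))  # sort by job title first

print(unsorted_groups["Public Protection"])  # ['Police Officer', 'Account Clerk']
print(sorted_groups["Public Protection"])    # ['Account Clerk', 'Police Officer']
```

Both versions contain the same titles for each group, but the elements sit at different indexes, so the arrays are not "the same" under the definition in the note.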
Question 3. Set department_ranges to a table containing departments as the rows, and the organization groups as the columns. The values in the rows should correspond to a total compensation range, where range is defined as the difference between the highest total compensation and the lowest total compensation in the department for that organization group. (9 Points)
Hint 1: First you’ll need to define a new function compensation_range which takes in an array of compensations and returns the range of compensations in that array.
Hint 2: What table function allows you to specify the rows and columns of a new table? You probably watched a video on it earlier in the homework!
[88]:
# Define compensation_range first
def compensation_range(salary_range):
    temp = max(salary_range) - min(salary_range)
    return temp

department_ranges = sf.pivot('Organization Group', 'Department', 'Total Compensation', compensation_range)
department_ranges
[88]:
Department              | Community Health | Culture & Recreation | General Administration & Finance | Human Welfare & Neighborhood Development | Public Protection | Public Works, Transportation & Commerce
Academy Of Sciences     | 0 | 199121 | 0      | 0 | 0      | 0
Administrative Services | 0 | 0      | 478784 | 0 | 0      | 0
Adult Probation         | 0 | 0      | 0      | 0 | 303419 | 0
Airport Commission      | 0 | 0      | 0      | 0 | 0      | 445092
Art Commission          | 0 | 251823 | 0      | 0 | 0      | 0
Asian Art Museum        | 0 | 298230 | 0      | 0 | 0      | 0
Assessor                | 0 | 0      | 277385 | 0 | 0      | 0
Board Of Appeals        | 0 | 0      | 0      | 0 | 0      | 243582
Board Of Supervisors    | 0 | 0      | 293773 | 0 | 0      | 0
Building Inspection     | 0 | 0      | 0      | 0 | 0      | 340852
… (41 rows omitted)
[89]:
grader.check("q2_3")
[89]:
q2_3 results: All test cases passed!
Question 4. Give an explanation as to why some of the row values are 0 in the department_ranges table from the previous question. (8 Points)
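I believe some of the row values are 0 because each department belongs to only one organization group. For every other organization group there are no employees in that department, so pivot has no compensations to pass to compensation_range and fills the cell with 0. A cell could also be 0 if every employee in that department and group had the same total compensation.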
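To see where the 0 entries in department_ranges can come from, here is a plain-Python sketch of a pivot that applies a range function to each (department, organization group) cell and fills empty combinations with a zero value. The `pivot` helper, its `zero` parameter, and the rows are assumptions made for illustration, not the datascience implementation or real sf2019.csv data.

```python
def compensation_range(values):
    """Difference between the largest and the smallest value."""
    return max(values) - min(values)


def pivot(rows, column_key, row_key, value_key, collect, zero=0):
    """Build a row-by-column grid; cells with no matching rows get `zero`."""
    cells = {}
    for row in rows:
        cells.setdefault((row[row_key], row[column_key]), []).append(row[value_key])
    row_labels = sorted({row[row_key] for row in rows})
    column_labels = sorted({row[column_key] for row in rows})
    return {
        r: {c: (collect(cells[(r, c)]) if (r, c) in cells else zero) for c in column_labels}
        for r in row_labels
    }


# Invented rows: each department appears under exactly one organization group.
rows = [
    {"Department": "Dept A", "Organization Group": "Group X", "Total Compensation": 100000},
    {"Department": "Dept A", "Organization Group": "Group X", "Total Compensation": 160000},
    {"Department": "Dept B", "Organization Group": "Group Y", "Total Compensation": 90000},
]

ranges_sketch = pivot(rows, "Organization Group", "Department",
                      "Total Compensation", compensation_range)
for department, row in ranges_sketch.items():
    print(department, row)
```

In this sketch "Dept A" under "Group Y" is 0 because no such rows exist, while "Dept B" under "Group Y" is 0 because a single employee gives a range of zero.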
Question 5. Find the number of departments appearing in the sf table that have an average total compensation of greater than 125,000 dollars; assign this value to the variable num_over_125k. (9 Points)
Hint: The variable names provided are meant to help guide the intermediate steps and general thought process. Feel free to delete them if you’d prefer to start from scratch, but make sure your final answer is assigned to num_over_125k!
[102]:
depts_and_comp = sf.select('Department', 'Total Compensation')
avg_of_depts = depts_and_comp.group('Department', collect=np.mean).relabeled('Total Compensation mean', 'Average Compensation')
num_over_125k = avg_of_depts.where('Average Compensation', are.above(125000)).num_rows
num_over_125k
[102]:
23
[103]:
grader.check("q2_5")
[103]:
q2_5 results: All test cases passed!
You’re done with Homework 4!
Important submission steps:
1. Run the tests and verify that they all pass.
2. Choose Save Notebook from the File menu, then run the final cell.
3. Click the link to download the zip file.
4. Then submit the zip file to the corresponding assignment according to your instructor’s directions.
It is your responsibility to make sure your work is saved before running the last cell.
1.3 Submission
Make sure you have run all cells in your notebook in order before running the cell below, so that
all images/graphs appear in the output. The cell below will generate a zip file for you to submit.
Please save before exporting!
[104]:
# Save your notebook first, then run this cell to export your submission.
grader.export(pdf=False, run_tests=True)
Running your submission against local test cases…
Your submission received the following results when run against available test cases:
q1_1 results: All test cases passed!
q1_2 results: All test cases passed!
q1_4 results: All test cases passed!
q1_5 results: All test cases passed!
q1_7 results: All test cases passed!
q2_1 results: All test cases passed!
q2_3 results: All test cases passed!
q2_5 results: All test cases passed!
<IPython.core.display.HTML object>