hw04
pdf
keyboard_arrow_up
School
University of California, Berkeley *
*We aren’t endorsed by this school
Course
C8
Subject
Computer Science
Date
Dec 6, 2023
Type
Pages
12
Uploaded by AdmiralAtom103517
hw04
November 30, 2023
[1]:
# Initialize Otter
import
otter
grader
=
otter
.
Notebook(
"hw04.ipynb"
)
1
Homework 4: Functions, Tables, and Groups
Please complete this notebook by filling in the cells provided. Before you begin, execute the previous
cell to load the provided tests.
Helpful Resource:
-
Python Reference
: Cheat sheet of helpful array & table methods used in
Data 8!
Recommended Readings
:
•
Visualizing Numerical Distributions
•
Functions and Tables
Please complete this notebook by filling in the cells provided.
Before you begin, execute the
cell below to setup the notebook by importing some helpful libraries.
Each time you
start your server, you will need to execute this cell again.
For all problems that you must write explanations and sentences for, you
must
provide your
answer in the designated space. Moreover, throughout this homework and all future ones,
please
be sure to not re-assign variables throughout the notebook!
For example, if you use
max_temperature
in your answer to one question, do not reassign it later on. Otherwise, you will
fail tests that you thought you were passing previously!
Deadline:
This assignment is
due Wednesday, 9/20 at 11:00pm PT
. Turn it in by Tuesday, 9/19 at
11:00pm PT for 5 extra credit points. Late work will not be accepted as per the
policies
page.
Note: This homework has hidden tests on it. That means even though tests may say
100% passed, it doesn’t mean your final grade will be 100%. We will be running more
tests for correctness once everyone turns in the homework.
Directly sharing answers is not okay, but discussing problems with the course staff or with other
students is encouraged. Refer to the
policies
page to learn more about how to learn cooperatively.
You should start early so that you have time to get help if you’re stuck.
Offce hours are held
Monday through Friday in
Warren Hall
101B. The offce hours schedule appears on
our OH page
.
1
1.1
1. Burrito-ful San Diego
[2]:
# Run this cell to set up the notebook, but please don't change it.
# These lines import the Numpy and Datascience modules.
import
numpy
as
np
from
datascience
import
*
# These lines do some fancy plotting magic.
import
matplotlib
%
matplotlib
inline
import
matplotlib.pyplot
as
plt
plt
.
style
.
use(
'fivethirtyeight'
)
import
warnings
warnings
.
simplefilter(
'ignore'
,
FutureWarning
)
warnings
.
filterwarnings(
"ignore"
)
Eunice,
Ciara and Kanchana are trying to use Data Science to find the best burritos in
San Diego!
Their friends Jessica and Sonya provided them with two comprehensive datasets
on
many
burrito
establishments
in
the
San
Diego
area
taken
from
(and
cleaned
from):
https://www.kaggle.com/srcole/burritos-in-san-diego/data
The following cell reads in a table called
ratings
which contains names of burrito restaurants, their
Yelp rating, Google rating, as well as their overall rating. The
Overall
rating is not an average of
the
Yelp
and
Google
ratings, but rather it is the overall rating of the customers that were surveyed
in the study above.
It also reads in a table called
burritos_types
which contains names of burrito restaurants, their
menu items, and the cost of the respective menu item at the restaurant.
[3]:
# Just run this cell
ratings
=
Table
.
read_table(
"ratings.csv"
)
ratings
.
show(
5
)
burritos_types
=
Table
.
read_table(
"burritos_types.csv"
)
.
drop(
0
)
burritos_types
.
show(
5
)
<IPython.core.display.HTML object>
<IPython.core.display.HTML object>
Question 1.
It would be easier if we could combine the information in both tables.
Assign
burritos
to the result of joining the two tables together, so that we have a table with the ratings
for every corresponding menu item from every restaurant. Each menu item has the same rating as
the restaurant from which it is from.
(8 Points)
Note:
It doesn’t matter which table you put in as the argument to the table method, either order
will work for the autograder tests.
Hint:
Which function lets us combine information from two tables? Refer to the
Python Reference
Sheet
if you’re unsure!
2
[8]:
burritos
=
ratings
.
join(
"Name"
,burritos_types,
"Name"
)
burritos
.
show(
10
)
<IPython.core.display.HTML object>
[9]:
grader
.
check(
"q1_1"
)
[9]:
q1_1 results: All test cases passed!
Question 2.
Let’s look at how the Yelp scores compare to the Google scores in the
burritos
table.
First, assign
yelp_google_tbl
to a table only containing the columns
Yelp
and
Google
.
Then, make a scatter plot with Yelp scores on the x-axis and the Google scores on the y-axis.
(8
Points)
[10]:
yelp_google_tbl
=
burritos
.
select(
"Yelp"
,
"Google"
)
yelp_google_tbl
.
scatter(
"Yelp"
,
"Google"
)
# Don't change/edit/remove the following line.
# To help you make conclusions, we have plotted a straight line on the graph
␣
↪
(y=x).
plt
.
plot(np
.
arange(
2.5
,
5
,
.5
), np
.
arange(
2.5
,
5
,
.5
));
3
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
Question 3.
Looking at the scatter plot you just made in Question 1.2, do you notice any pattern(s)
or relationships between Yelp and Google ratings (i.e. is one of the two types of scores consistently
higher than the other one)? If so, describe them
briefly
in the cell below.
(8 Points)
One trend is that as the Yelp score increases, the Google score tends to increase. Another trend is
that the Google scores are consistently higher (only one score below 3.0)
Here’s a refresher on how
.group
works! You can read how
.group
works in the
textbook
, or you
can view the video below. The video resource was made by a past staff member, Divyesh Chotai!
You can also use the
Table Functions Visualizer
to get some more hands-on experience with the
.group
function.
[11]:
# Don't worry about what this code does! It simply embeds a YouTube video into
␣
↪
a code cell.
from
IPython.display
import
YouTubeVideo
YouTubeVideo(
"HLoYTCUP0fc"
)
[11]:
Question 4.
There are so many types of California burritos in the
burritos
table!
Kanchana
wants to consider her options for burritos based on rankings. For the sake of these questions, we
4
are treating each menu item’s rating the same as its respective restaurant’s, as we do not have the
rating of every single item at these restaurants. You do not need to worry about this fact, but we
thought to mention it!
Create a table with two columns: the first column include the
names of the burritos
and the
second column should contain the
average overall rating
of that burrito across restaurants.
In
your calculations, you should only compare burritos that contain the word “California”.
For example, there are “California” burritos, “California Breakfast” burritos, “California Surf And
Turf” burritos, etc.
(9 Points)
Hint:
“California” is case sensitive (“California” is not the same as “california”)!
Note:
For reference, the staff solution only used one line.
However, feel free to break up
the solution into multiple lines and steps; just make sure you assign the final output table to
california_burritos
!
[16]:
california_burritos
=
burritos
.
where(
"Menu_Item"
,are
.
containing(
"California"
))
.
↪
drop(
"Name"
,
"Yelp"
,
"Google"
,
"Cost"
)
.
group(
"Menu_Item"
, np
.
mean)
california_burritos
[16]:
Menu_Item
| Overall mean
California
| 3.5242
California (Only Cheese)
| 4.1
California + Guac + Sour Cream | 3.4
California - Chicken
| 3.45839
California - Pork Adobada
| 3.26429
California - Steak
| 3.26429
California Breakfast
| 2.75833
California Chicken
| 3.54815
California Chipotle
| 4.36667
California Everything
| 4.1
… (9 rows omitted)
[17]:
grader
.
check(
"q1_4"
)
[17]:
q1_4 results: All test cases passed!
Question 5.
Given this new table
california_burritos
, Ciara can figure out the name of the
California burrito with the highest overall average rating! Assign
best_california_burrito
to
a line of code that outputs the string that represents the name of the California burrito with the
highest overall average rating. If multiple burritos satisfy this criteria, you can output any one of
them.
(8 Points)
[19]:
best_california_burrito
=
california_burritos
.
sort(
"Overall mean"
,
␣
↪
descending
=
True
)
.
column(
"Menu_Item"
)
.
item(
0
)
best_california_burrito
[19]:
'California Chipotle'
5
[20]:
grader
.
check(
"q1_5"
)
[20]:
q1_5 results: All test cases passed!
Question 6.
Eunice thinks that burritos in San Diego are cheaper (and taste better) than the
burritos in Berkeley. Plot a histogram that visualizes that distribution of the costs of the burritos
from San Diego in the
burritos
table. Also use the provided
cost_bins
variable when making
your histogram, so that the histogram is more visually informative.
(8 Points)
[23]:
cost_bins
=
np
.
arange(
0
,
15
,
1
)
# Do not change this line
# Please also use the provided bins
burritos
.
hist(
"Cost"
,bins
=
cost_bins)
Question
7.
What
percentage
of
burritos
in
San
Diego
are
less
than
$6?
Assign
burritos_less_than_six
to your answer,
which should be between 0 and 100
. You should
only use the histogram above to answer the question. Do not use code on the table to to find the
answer, just eyeball the heights and use Python to evaluate your arithmetic!
(8 Points)
Note
: Your answer does not have to be exact, but it should be within a couple percentages of the
staff answer.
Hint:
It might be helpful to review
Section 7.2
of the textbook about calculating percentages from
histograms.
[29]:
burritos_less_than_six
= 1+4+17
6
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
[28]:
grader
.
check(
"q1_7"
)
[28]:
q1_7 results: All test cases passed!
1.2
2. San Francisco City Employee Salaries
This exercise is designed to give you practice with using the Table methods
.pivot
and
.group
.
Here is a link to the
Python Reference
in case you need a quick refresher.
The
Table Function
Visualizer
may also be a helpful tool.
Run the cell below to view a demo on how you can use pivot on a table. (Thank you to past staff
Divyesh Chotai!)
[30]:
# Don't worry about what this code does! It simply embeds a YouTube video into
␣
↪
a code cell.
from
IPython.display
import
YouTubeVideo
YouTubeVideo(
"4WzXo8eKLAg"
)
[30]:
The data source we will use within this portion of the homework is
publicly provided
by the City
of San Francisco. We have filtered it to retain just the relevant columns and restricted the data to
the calendar year 2019. Run the following cell to load our data into a table called
full_sf
.
7
[31]:
full_sf
=
Table
.
read_table(
"sf2019.csv"
)
full_sf
.
show(
5
)
<IPython.core.display.HTML object>
The table has one row for each of the 44,525 San Francisco government employees in 2019.
The first four columns describe the employee’s job. For example, the employee in the third row of
the table had a job called “IS Business Analyst-Senior”. We will call this the employee’s
position
or
job title
. The job was in a Job Family called Information Systems (hence the IS in the job title),
and was in the Adult Probation Department that is part of the Public Protection Organization
Group of the government. You will mostly be working with the
Job
column.
The next three columns contain the dollar amounts paid to the employee in the calendar year 2019
for salary, overtime, and benefits. Note that an employee’s salary does not include their overtime
earnings.
The last column contains the total compensation paid to the employee. It is the sum of the previous
three columns:
Total Compensation
=
Salary
+
Overtime
+
Benefits
For this homework, we will be using the following columns: 1.
Organization Group
: A group of
departments. For example, the Public Protection Org. Group includes departments such as the
Police, Fire, Adult Protection, District Attorney, etc. 2.
Department
: The primary organizational
unit used by the City and County of San Francisco.
3.
Job
: The specific position that a given
worker fills. 4.
Total Compensation
: The sum of a worker’s salary, overtime, and benefits in 2019.
Run the following cell to select the relevant columns and create a new table named
sf
.
[32]:
sf
=
full_sf
.
select(
"Job"
,
"Department"
,
"Organization Group"
,
"Total
␣
↪
Compensation"
)
sf
.
show(
5
)
<IPython.core.display.HTML object>
We want to use this table to generate arrays with the job titles of the members of each
Organiza-
tion Group
.
Question 1.
Set
job_titles
to a table with two columns.
The first column should be called
Organization Group
and have the name of every “Organization Group” each listed only once in
this column, and the second column should be called
Jobs
with each row in that second column
containing an
array
of the names of all the job titles within that “Organization Group”.
Don’t
worry if there are multiple of the same job titles.
(9 Points)
Hint 1:
Think about how
group
works: it collects values into an array and then applies a function
to that array. We have defined two functions below for you, and you will need to use one of them
in your call to
group
.
Hint 2:
You might need to rename one of the columns.
8
[55]:
# Pick one of the two functions defined below in your call to group.
def
first_item
(array):
'''Returns the first item'''
return
array
.
item(
0
)
def
full_array
(array):
'''Returns the array that is passed through'''
return
array
# Make a call to group using one of the functions above when you define
␣
↪
job_titles
job_titles
=
Table()
.
with_columns(
"Organization Group"
, sf
.
group(
"Organization
␣
↪
Group"
)
.
column(
"Organization Group"
),
"Jobs"
, sf
.
drop(
"Department"
,
"Total
␣
↪
Compensation"
)
.
group(
"Organization Group"
, full_array)
.
↪
column(
"Job full_array"
))
job_titles
[55]:
Organization Group
| Jobs
Community Health
| ['Painter Supervisor 1' 'Painter'
'Painter' … 'Nursing …
Culture & Recreation
| ['Electrician' 'Executive Secretary
2' 'Bldgs & Grounds
…
General Administration & Finance
| ['Painter' 'Painter' 'Electrician'
… 'Investigator, Ta …
Human Welfare & Neighborhood Development | ['Dept Head I' 'Administrative
Analyst' 'Community Devel …
Public Protection
| ['IS Trainer-Journey' 'IS Engineer-
Assistant' 'IS Busine …
Public Works, Transportation & Commerce
| ['Heavy Equip Ops Asst Sprv' 'Heavy
Equipment Ops Sprv'
…
[56]:
grader
.
check(
"q2_1"
)
[56]:
q2_1 results: All test cases passed!
Question 2.
At the moment, the
Job
column of the
sf
table is not sorted (no particular order).
Would the arrays you generated in the
Jobs
column of the previous question be the same if we had
sorted alphabetically instead before generating them? Explain your answer. To receive full credit,
your answer should reference
how
the
.group
method works, and how sorting the
Jobs
column
would affect this.
(8 Points)
Note:
Two arrays are the
same
if they contain the same number of elements and the elements
located at corresponding indexes in the two arrays are identical. An example of arrays that are
NOT the same:
array([1,2]) != array([2,1])
.
9
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
No, they would not be the same arrays because the .group method works by grouping all of the
same values in one column together (in this case the organization group column) and then applying
the inputted function to aggregate the values in the other columns of the table. In this case, we
used the function full_array which aggregates the other columns by creating an array with all of
their values for a given organization group, listing them from top to bottom, and thus if we sorted
the job titles alphabetically, the titles more towards the beginning of the alphabet would be listed
towards the top and then maintain this order when they are put into their respective arrays. Thus,
the elements of the arrays if we sorted alphabetically would be in a different order.
Question 3.
Set
department_ranges
to a table containing departments as the rows, and the orga-
nization groups as the columns. The values in the rows should correspond to a total compensation
range, where range is defined as the
difference between the highest total compensation
and the lowest total compensation in the department for that organization group
.
(9
Points)
Hint:
First you’ll need to define a new function
compensation_range
which takes in an array of
compensations and returns the range of compensations in that array.
[60]:
# Define compensation_range first
def
compensation_range
(arr):
return
(
max
(arr)
-
min
(arr))
department_ranges
=
sf
.
pivot(
"Organization Group"
,
"Department"
, values
=
␣
↪
"Total Compensation"
, collect
=
compensation_range)
department_ranges
[60]:
Department
| Community Health | Culture & Recreation | General
Administration & Finance | Human Welfare & Neighborhood Development | Public
Protection | Public Works, Transportation & Commerce
Academy Of Sciences
| 0
| 199121
| 0
| 0
| 0
| 0
Administrative Services | 0
| 0
| 478784
| 0
| 0
| 0
Adult Probation
| 0
| 0
| 0
| 0
| 303419
| 0
Airport Commission
| 0
| 0
| 0
| 0
| 0
| 445092
Art Commission
| 0
| 251823
| 0
| 0
| 0
| 0
Asian Art Museum
| 0
| 298230
| 0
| 0
| 0
| 0
Assessor
| 0
| 0
| 277385
| 0
| 0
| 0
Board Of Appeals
| 0
| 0
| 0
| 0
| 0
| 243582
Board Of Supervisors
| 0
| 0
| 293773
| 0
| 0
| 0
Building Inspection
| 0
| 0
| 0
10
| 0
| 0
| 340852
… (41 rows omitted)
[61]:
grader
.
check(
"q2_3"
)
[61]:
q2_3 results: All test cases passed!
Question 4.
Why might some of the row values be
0
in the
department_ranges
table from the
previous question.
(8 Points)
Not all of the departments have all of these organization groups, so the table just puts 0 as the
compensation range for organization groups that don’t exist in that department
Question 5.
Find the number of departments appearing in the
sf
table that have an average total
compensation of greater than 125,000 dollars; assign this value to the variable
num_over_125k
.
(9
Points)
Note:
The variable names provided are meant to help guide the intermediate steps and general
thought process. Feel free to delete them if you’d prefer to start from scratch, but make sure your
final answer is assigned to
num_over_125k
!
[77]:
sums
=
sf
.
select(
"Department"
,
"Total Compensation"
)
.
group(
"Department"
, np
.
sum)
lens
=
sf
.
select(
"Department"
,
"Job"
)
.
group(
"Department"
)
department_avg
=
sums
.
with_column(
"Department Avg"
, sums
.
column(
"Total
␣
↪
Compensation sum"
)
/
lens
.
column(
"count"
))
num_over_125k
=
department_avg
.
where(
"Department Avg"
, are
.
above(
125000
))
.
↪
num_rows
num_over_125k
[77]:
23
[78]:
grader
.
check(
"q2_5"
)
[78]:
q2_5 results: All test cases passed!
You’re done with Homework 4!
Important submission steps:
1. Run the tests and verify that they all pass. 2. Choose
Save
Notebook
from the
File
menu, then
run the final cell
. 3. Click the link to download the zip
file. 4. Go to
Gradescope
and submit the zip file to the corresponding assignment. The name of
this assignment is “Homework 4 Autograder”.
It is your responsibility to make sure your work is saved before running the last cell.
1.3
Pets of Data 8
Congrats on finishing Homework 4!
Pet of the week:
Gus
11
1.4
Submission
Make sure you have run all cells in your notebook in order before running the cell below, so that
all images/graphs appear in the output. The cell below will generate a zip file for you to submit.
Please save before exporting!
[79]:
# Save your notebook first, then run this cell to export your submission.
grader
.
export(pdf
=
False
, run_tests
=
True
)
Running your submission against local test cases…
Your submission received the following results when run against available test
cases:
q1_1 results: All test cases passed!
q1_4 results: All test cases passed!
q1_5 results: All test cases passed!
q1_7 results: All test cases passed!
q2_1 results: All test cases passed!
q2_3 results: All test cases passed!
q2_5 results: All test cases passed!
<IPython.core.display.HTML object>
12
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
Related Documents
Recommended textbooks for you
data:image/s3,"s3://crabby-images/1d7e7/1d7e7583d6f456277727f8d158d820c51233aa30" alt="Text book image"
C++ for Engineers and Scientists
Computer Science
ISBN:9781133187844
Author:Bronson, Gary J.
Publisher:Course Technology Ptr
data:image/s3,"s3://crabby-images/7459b/7459bf678b74427bda237ab38d4b5d3949952a7e" alt="Text book image"
C++ Programming: From Problem Analysis to Program...
Computer Science
ISBN:9781337102087
Author:D. S. Malik
Publisher:Cengage Learning
data:image/s3,"s3://crabby-images/76250/762503ef8bed15d929593c1ab492e2e2028e039d" alt="Text book image"
EBK JAVA PROGRAMMING
Computer Science
ISBN:9781337671385
Author:FARRELL
Publisher:CENGAGE LEARNING - CONSIGNMENT
data:image/s3,"s3://crabby-images/f69b6/f69b6127845775e68542aa44ed44f5dcebe26fad" alt="Text book image"
Microsoft Visual C#
Computer Science
ISBN:9781337102100
Author:Joyce, Farrell.
Publisher:Cengage Learning,
Np Ms Office 365/Excel 2016 I Ntermed
Computer Science
ISBN:9781337508841
Author:Carey
Publisher:Cengage
data:image/s3,"s3://crabby-images/b907a/b907ada1f4be11d175260bd2a8acbc475b9f1fe1" alt="Text book image"
Systems Architecture
Computer Science
ISBN:9781305080195
Author:Stephen D. Burd
Publisher:Cengage Learning
Recommended textbooks for you
- C++ for Engineers and ScientistsComputer ScienceISBN:9781133187844Author:Bronson, Gary J.Publisher:Course Technology PtrC++ Programming: From Problem Analysis to Program...Computer ScienceISBN:9781337102087Author:D. S. MalikPublisher:Cengage LearningEBK JAVA PROGRAMMINGComputer ScienceISBN:9781337671385Author:FARRELLPublisher:CENGAGE LEARNING - CONSIGNMENT
- Microsoft Visual C#Computer ScienceISBN:9781337102100Author:Joyce, Farrell.Publisher:Cengage Learning,Np Ms Office 365/Excel 2016 I NtermedComputer ScienceISBN:9781337508841Author:CareyPublisher:CengageSystems ArchitectureComputer ScienceISBN:9781305080195Author:Stephen D. BurdPublisher:Cengage Learning
data:image/s3,"s3://crabby-images/1d7e7/1d7e7583d6f456277727f8d158d820c51233aa30" alt="Text book image"
C++ for Engineers and Scientists
Computer Science
ISBN:9781133187844
Author:Bronson, Gary J.
Publisher:Course Technology Ptr
data:image/s3,"s3://crabby-images/7459b/7459bf678b74427bda237ab38d4b5d3949952a7e" alt="Text book image"
C++ Programming: From Problem Analysis to Program...
Computer Science
ISBN:9781337102087
Author:D. S. Malik
Publisher:Cengage Learning
data:image/s3,"s3://crabby-images/76250/762503ef8bed15d929593c1ab492e2e2028e039d" alt="Text book image"
EBK JAVA PROGRAMMING
Computer Science
ISBN:9781337671385
Author:FARRELL
Publisher:CENGAGE LEARNING - CONSIGNMENT
data:image/s3,"s3://crabby-images/f69b6/f69b6127845775e68542aa44ed44f5dcebe26fad" alt="Text book image"
Microsoft Visual C#
Computer Science
ISBN:9781337102100
Author:Joyce, Farrell.
Publisher:Cengage Learning,
Np Ms Office 365/Excel 2016 I Ntermed
Computer Science
ISBN:9781337508841
Author:Carey
Publisher:Cengage
data:image/s3,"s3://crabby-images/b907a/b907ada1f4be11d175260bd2a8acbc475b9f1fe1" alt="Text book image"
Systems Architecture
Computer Science
ISBN:9781305080195
Author:Stephen D. Burd
Publisher:Cengage Learning