School: Northeastern University
Course: 3000
Subject: Industrial Engineering
Date: Feb 20, 2024
Pages: 14
Uploaded by ngocminhphan02
2/8/24, 8:33 PM hw3 Page 1 of 14
DS 3000 HW 3
Due: Thursday Feb 8th @ 11:59 PM EST
Submission Instructions
Submit this ipynb file and a PDF file that includes the coding results to Gradescope (this can also be done via the assignment on Canvas). To ensure that your submitted files represent your latest code, run a fresh Kernel > Restart & Run All just before uploading the files to Gradescope.
Tips for success
- Start early (even though you have two weeks on this homework)
- Make use of Piazza
- Make use of office hours
- Remember to use cells and headings to make the notebook easy to read (if a grader cannot find the answer to a problem, you will receive no points for it)
- Under no circumstances may one student view or share their ungraded homework or quiz with another student (see also), though you are welcome to talk about (not show each other) the problems.
Part 1: Plotting Warm Up (18 points)
Plot each of the functions below over 100 evenly spaced points in the domain $[0, 10]$ on the same graph. Be sure to use the line specifications given below:

| Name | Value | Color | Line Width | Style |
| sinusoid | 3 * sin(2/3 x) | Red | 4 | dotted |
| polynomial | (x - 3) (x - 2) (x - 8) / 10 | Blue | 2 | solid |
| abs value | min(abs(x - 3), abs(x - 8)) | Green | 3 | dashed |

- add a legend which specifies the name of each function
- use seaborn's sns.set() before plotting to make the graph look nice
- Make sure that the axes are labeled x and f(x)

You may find the arithmetic functions needed in numpy (sin, abs, minimum)
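As a small illustration of the numpy hint, the sketch below evaluates the abs value function at a few hand-picked points (the sample points and the names x_demo and abs_value are assumptions for illustration, not part of the assignment). It shows that np.minimum compares two arrays element by element, which is what this piecewise definition needs, whereas np.min would reduce a single array to one number.

```python
import numpy as np

# hand-picked sample points (illustrative only, not the 100-point grid)
x_demo = np.array([0.0, 3.0, 5.5, 8.0, 10.0])

# elementwise minimum of the distance to 3 and the distance to 8
abs_value = np.minimum(np.abs(x_demo - 3), np.abs(x_demo - 8))
print(abs_value)

# the grid the assignment asks for: 100 points, both endpoints included
x_grid = np.linspace(0, 10, 100)
print(len(x_grid), x_grid[0], x_grid[-1])
```

The function touches zero at x = 3 and x = 8 and peaks halfway between them, which is a quick sanity check for the green dashed curve.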
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

sns.set()

# 100 evenly spaced points in [0, 10]
x = np.linspace(0, 10, 100)

plt.figure(figsize=(10, 6))

# format strings: 'r:' red dotted, 'b-' blue solid, 'g--' green dashed
# line widths follow the specification table above
plt.plot(x, 3 * np.sin(2 / 3 * x), 'r:', label='sinusoid', linewidth=4)  # Red
plt.plot(x, (x - 3) * (x - 2) * (x - 8) / 10, 'b-', label='polynomial', linewidth=2)
plt.plot(x, np.minimum(np.abs(x - 3), np.abs(x - 8)), 'g--', label='abs value', linewidth=3)

plt.legend()
plt.xlabel('x')
plt.ylabel('f(x)')
plt.show()
Part 2: FIFA Players (22 points)
Create a plotly scatter plot which shows the mean Overall rating for all soccer players (rows) of a particular Age. Color your scatter plot per Nationality of the player, focusing on three countries (England, Germany, Spain). Download the players_fifa23.csv from Canvas and make sure it is in the same directory as this notebook file.
Export your graph as an html file age_ratings_nationality.html and submit it with your completed homework ipynb to Gradescope.
Hints:
- There may be multiple ways/approaches to accomplish this task.
- One approach: you may use groupby() and boolean indexing to build these values in a loop which runs once per Nationality.
- px.scatter() will only graph data from columns (not the index). Some approaches may need to graph data from the index. You can use df.reset_index() to make your index a new column as shown in this example
- In some approaches you may need to pass multiple rows to df.append() as shown in this example
- In some approaches you may need to go from "wide" data to "long" data by using df.melt() as discussed here
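To make the groupby() and reset_index() hints concrete, here is a minimal sketch on a toy table. The rows in df_toy are invented for illustration and only mimic the three columns of players_fifa23.csv used in this part; the names df_toy, countries and df_mean are likewise assumptions.

```python
import pandas as pd

# toy stand-in for the FIFA data; every row below is made up
df_toy = pd.DataFrame({
    'Nationality': ['England', 'England', 'Spain', 'Spain', 'France'],
    'Age': [20, 20, 20, 21, 20],
    'Overall': [70, 80, 75, 77, 90],
})

countries = ['England', 'Germany', 'Spain']

# boolean indexing keeps only the countries of interest
is_selected = df_toy['Nationality'].isin(countries)

# mean Overall per (Nationality, Age); reset_index() moves the
# group keys out of the index and back into ordinary columns
df_mean = (df_toy[is_selected]
           .groupby(['Nationality', 'Age'])['Overall']
           .mean()
           .reset_index())
print(df_mean)
```

Because the group keys end up as regular columns, a table shaped like df_mean can be passed straight to px.scatter(). If you build the rows country by country instead, note that df.append() was removed in pandas 2.0; pd.concat() on a list of DataFrames is the replacement.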
The first few code cells below get you started with looking at the data set.
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

# use pandas to read in the data
import pandas as pd

df_fifa = pd.read_csv('players_fifa23.csv', index_col='ID')
df_fifa.head()
import plotly.express as px

# keep only players from the three countries of interest
filtered_df = df_fifa[df_fifa['Nationality'].isin(['England', 'Germany', 'Spain'])]

# mean Overall rating per (Age, Nationality); reset_index() turns the group keys back into columns
grouped_df = filtered_df.groupby(['Age', 'Nationality'])['Overall'].mean().reset_index()

fig = px.scatter(grouped_df, x='Age', y='Overall', color='Nationality',
                 labels={'Overall': 'Mean Overall Rating'},
                 title='Mean Overall Rating by Age and Nationality')

# export the interactive graph for submission
fig.write_html('age_ratings_nationality.html')
Part 3: Daylight through the year
The remainder of the homework asks you to complete a pipeline which, given the latitude / longitude and timezone of some cities:

loc_dict = {'Boston': (42.3601, -71.0589, 'US/Eastern'),
            'Lusaka': (-15.3875, 28.3228, 'Africa/Lusaka'),
            'Sydney': (-33.8688, 151.2093, 'Australia/Sydney')}

(the keys are the names of the cities and the values are tuples of lat, lon, timezone_name) is able to:
- query a sunrise / sunset API
- clean and process data (timezone management & building datetime objects)
Out[2] (output of df_fifa.head() from Part 2; the remaining columns are cut off in this export):

| ID | Name | FullName | Age | Height | Weight | ... |
| 165153 | K. Benzema | Karim Benzema | 34 | 185 | 81 | https://cdn.sofifa.net/players/16 |
| 158023 | L. Messi | Lionel Messi | 35 | 169 | 67 | https://cdn.sofifa.net/players/15 |
| 231747 | K. Mbappé | Kylian Mbappé | 23 | 182 | 73 | https://cdn.sofifa.net/players/2 |
| 192985 | K. De Bruyne | Kevin De Bruyne | 31 | 181 | 70 | https://cdn.sofifa.net/players/19 |
| 188545 | R. Lewandowski | Robert Lewandowski | 33 | 185 | 81 | https://cdn.sofifa.net/players/18 |

5 rows × 89 columns
For extra credit: produce the following graph of daylight through the year:
Part 3.1: Getting Sunrise Sunset via API (16 points)
Write the get_sunrise_sunset() function below so that it uses this sunrise sunset API to produce the output shown in the test case below.
It may be helpful to know that this particular API...
- requires no api key
- answers about 2.5 queries per second
- did not block me when I tried to make 100 consecutive calls as quickly as possible
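Before any request is sent, it can help to see what the query URL looks like. The sketch below builds it with the standard library only and makes no network call; build_query_url and BASE_URL are hypothetical names introduced for illustration, and the three query parameters are the ones the assignment's function takes as arguments.

```python
from urllib.parse import urlencode

# endpoint of the sunrise sunset API used in this part
BASE_URL = 'https://api.sunrise-sunset.org/json'

def build_query_url(lat, lng, date):
    """ builds the request URL for one location and date (hypothetical helper)

    Args:
        lat (float): latitude of interest
        lng (float): longitude of interest
        date (str): date of interest, formatted YYYY-MM-DD

    Returns:
        url (str): full URL including the encoded query string
    """
    params = {'lat': lat, 'lng': lng, 'date': date}
    # urlencode joins the key=value pairs with '&' and escapes unsafe characters
    return f'{BASE_URL}?{urlencode(params)}'

url_demo = build_query_url(lat=42.3601, lng=-71.0589, date='2022-02-15')
print(url_demo)
```

An equivalent route is to pass the same dictionary to requests.get() through its params argument and let requests do the encoding.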
# you will need to run pip install requests in the terminal
# no need to install json, it is built into python
import requests
import json

# make sure to write a good docstring! I will do this for you for the other def
def get_sunrise_sunset(lat, lng, date):
    """ fetches the sunrise sunset API information on a particular date for a location

    Args:
        lat (float): latitude of interest
        lng (float): longitude of interest
        date (str): date of interest

    Returns:
        gss_dict (dictionary): a dictionary that contains the API information
    """
    url = f'https://api.sunrise-sunset.org/json?lat={lat}&lng={lng}&date={date}'
    response = requests.get(url)
    data = response.json()

    # drop the timezone id (if present) and record the query inputs next to the results
    data.pop('tzid', None)
    data['lat-lng'] = (lat, lng)
    data['date'] = date
    return data

sun_dict = get_sunrise_sunset(lat=42.3601, lng=-71.0589, date='2022-02-15')
sun_dict_expected = \
    {'results': {'sunrise': '11:38:48 AM',
                 'sunset': '10:17:50 PM',
                 'solar_noon': '4:58:19 PM',
                 'day_length': '10:39:02',
                 'civil_twilight_begin': '11:11:30 AM',
                 'civil_twilight_end': '10:45:08 PM',
                 'nautical_twilight_begin': '10:38:37 AM',
                 'nautical_twilight_end': '11:18:00 PM',
                 'astronomical_twilight_begin': '10:06:05 AM',
                 'astronomical_twilight_end': '11:50:33 PM'},
     'status': 'OK',
     'lat-lng': (42.3601, -71.0589),
     'date': '2022-02-15'}

assert sun_dict == sun_dict_expected, 'get_sunrise_sunset() error'
Part 3.2: (14 points)
It may appear the test case above is in error, but a look at the API's documentation reminds us:
"NOTE: All times are in UTC and summer time adjustments are not included in the returned data."
Complete the change_tz() function below so that it passes the given test case.
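To see why the sunrise in the test case is not actually wrong, the sketch below shifts the reported UTC time into Boston's winter clock using only the standard library. It assumes a fixed UTC-5 offset as a stand-in for US/Eastern in February; a fixed offset ignores daylight saving, so it is not a substitute for the named-timezone conversion that change_tz() is meant to perform.

```python
from datetime import datetime, timedelta, timezone

# sunrise from the test case above, as reported by the API (UTC)
sunrise_utc = datetime(2022, 2, 15, 11, 38, 48, tzinfo=timezone.utc)

# assumption: a fixed UTC-5 offset stands in for US/Eastern in February
eastern_standard = timezone(timedelta(hours=-5), name='EST')

# same instant, expressed on the local clock
sunrise_local = sunrise_utc.astimezone(eastern_standard)
print(sunrise_local.isoformat())
```

An 11:38 AM sunrise in UTC is therefore a 6:38 AM sunrise in Boston, which is far more plausible for mid-February.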
import pytz
from datetime import datetime

def change_tz(dt, timezone_from, timezone_to):
    """ converts timezone of a timezone naive datetime object

    Args:
        dt (datetime): datetime (or time) object without timezone
        timezone_from (str): timezone of input
        timezone_to (str): timezone of output datetime

    Returns:
        dt (datetime): timezone-aware datetime object expressed in timezone_to
    """
    from_zone = pytz.timezone(timezone_from)
    to_zone = pytz.timezone(timezone_to)

    # attach the source timezone to the naive datetime, then convert
    dt_with_timezone = from_zone.localize(dt)
    converted_dt = dt_with_timezone.astimezone(to_zone)
    return converted_dt

dt_naive = datetime(2022, 2, 15, 11, 38, 48)  # This is a naive datetime obj
timezone_from = 'UTC'
timezone_to = 'America/New_York'
converted_dt = change_tz(dt_naive, timezone_from, timezone_to)
print(f"Converted datetime: {converted_dt}")
Converted datetime: 2022-02-15 06:38:48-05:00
Part 3.3: (20 points)
Build clean_sun_dict() to pass each of the two test cases below. Note that:
- sunrise and sunset are time objects which account for daylight saving:
  - include the date when building these objects
  - use change_tz() above to cast them to the proper timezone
  - build time objects by calling datetime.time() to discard the date of a datetime
  - importing pandas as pd and using pd.to_datetime may also be helpful
- sunrise_hr and sunset_hr are the hours since the day began in local timezone (more easily graphed)
  - you may use .strftime() and int() to cast time objects to strings and then integers (which may be helpful)
NOTE: There may be more than one way to accomplish writing this function; as long as the function passes both assert test cases, you may continue. Just be sure to comment and present your code as cleanly as possible.
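The two building blocks named in the notes above, parsing the API's 12-hour time strings and turning a time into hours since midnight, can be sketched with the standard library alone. hours_since_midnight is a hypothetical helper name, and the sample string is the sunset from the earlier test case; no timezone conversion happens here.

```python
from datetime import datetime

def hours_since_midnight(t):
    """ fractional hours since the day began (hypothetical helper)

    Args:
        t (datetime.time or datetime.datetime): clock time of interest

    Returns:
        hr (float): hours, with minutes and seconds as the fractional part
    """
    return t.hour + t.minute / 60 + t.second / 3600

# parse date + 12-hour clock string; %I is the 12-hour hour, %p is AM/PM
dt_demo = datetime.strptime('2022-02-15 10:17:50 PM', '%Y-%m-%d %I:%M:%S %p')

# .time() discards the date and keeps only the clock time
t_demo = dt_demo.time()
hr_demo = hours_since_midnight(t_demo)
print(t_demo, round(hr_demo, 4))
```

Including the date while parsing matters once timezones enter the picture, because whether daylight saving applies depends on the calendar day and not only on the clock time.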
from datetime import datetime, time
import pandas as pd
import pytz

def clean_sun_dict(sun_dict, timezone_to):
    """ builds pandas series and cleans output of API

    Args:
        sun_dict (dict): dict of json (see ex below)
        timezone_to (str): timezone of outputs (API returns UTC times)

    Returns:
        sun_series (pd.Series): all times converted to time objects

    example sun_series:
        date          2021-02-13 00:00:00
        lat-lng       (36.72016, -4.42034)
        sunrise       02:11:06
        sunrise_hr    2.185
        sunset        13:00:34
        sunset_hr     13.0094
        dtype: object
    """
    date_str = sun_dict['date']
    date_dt = datetime.strptime(date_str, '%Y-%m-%d')
    timezone_from = pytz.timezone('UTC')
    timezone_to = pytz.timezone(timezone_to)

    # Function to convert time string to timezone-aware datetime object
    def convert_time(time_str, date, tz_from, tz_to):
        dt_naive = datetime.strptime(f"{date} {time_str}", '%Y-%m-%d %I:%M:%S %p')
        dt_aware = tz_from.localize(dt_naive)
        dt_converted = dt_aware.astimezone(tz_to)
        return dt_converted

    sunrise_converted = convert_time(sun_dict['results']['sunrise'], date_str, timezone_from, timezone_to)
    sunset_converted = convert_time(sun_dict['results']['sunset'], date_str, timezone_from, timezone_to)

    # *_hr values are fractional hours since local midnight
    sun_series = pd.Series({
        'date': date_dt,
        'lat-lng': sun_dict['lat-lng'],
        'sunrise': sunrise_converted.time(),
        'sunrise_hr': sunrise_converted.hour + sunrise_converted.minute / 60 + sunrise_converted.second / 3600,
        'sunset': sunset_converted.time(),
        'sunset_hr': sunset_converted.hour + sunset_converted.minute / 60 + sunset_converted.second / 3600,
    })
    return sun_series
sun_dict = {'results': {'sunrise': '11:38:48 AM',
                        'sunset': '10:17:50 PM',
                        'solar_noon': '4:58:19 PM',
                        'day_length': '10:39:02',
                        'civil_twilight_begin': '11:11:30 AM',
                        'civil_twilight_end': '10:45:08 PM',
                        'nautical_twilight_begin': '10:38:37 AM',
                        'nautical_twilight_end': '11:18:00 PM',
                        'astronomical_twilight_begin': '10:06:05 AM',
                        'astronomical_twilight_end': '11:50:33 PM'},
            'status': 'OK',
            'lat-lng': (42.3601, -71.0589),
            'date': '2022-02-15'}

# test without timezone conversion
sun_series = clean_sun_dict(sun_dict, timezone_to='GMT')
sun_series_exp = pd.Series(
    {'date': datetime(year=2022, month=2, day=15),
     'lat-lng': (42.3601, -71.0589),
     'sunrise': time(hour=11, minute=38, second=48),
     'sunrise_hr': 11.646666666666667,
     'sunset': time(hour=22, minute=17, second=50),
     'sunset_hr': 22.297222222222224})
assert sun_series.eq(sun_series_exp).all(), 'clean_sun_dict() error (GMT)'

# test with timezone conversion
sun_series = clean_sun_dict(sun_dict, timezone_to='US/Eastern')
sun_series_exp = pd.Series(
    {'date': datetime(year=2022, month=2, day=15),
     'lat-lng': (42.3601, -71.0589),
     'sunrise': time(hour=6, minute=38, second=48),
     'sunrise_hr': 6.6466666666666665,
     'sunset': time(hour=17, minute=17, second=50),
     'sunset_hr': 17.297222222222224})
assert sun_series.eq(sun_series_exp).all(), 'clean_sun_dict() error (EST)'
Part 3.4: (10 points)
Write the get_annual_sun_data() function so that it produces the outputs shown below. This function should make use of:
- get_sunrise_sunset()
- clean_sun_dict()
as built above.
The following snippet:
loc_dict = {'Boston': (42.3601, -71.0589, 'US/Eastern'),
            'Lusaka': (-15.3875, 28.3228, 'Africa/Lusaka'),
            'Sydney': (-33.8688, 151.2093, 'Australia/Sydney')}
df_annual_sun = get_annual_sun_data(loc_dict, year=2021, period_day=30)
df_annual_sun.head(15)
should generate:
|    | city | date | lat-lng | sunrise | sunrise_hr | sunset | sunset_hr |
| 0 | Boston | 2021-01-01 | (42.3601, -71.0589) | 07:11:49 | 7.196944 | 16:24:12 | 16.403333 |
| 1 | Lusaka | 2021-01-01 | (-15.3875, 28.3228) | 05:38:33 | 5.642500 | 18:42:09 | 18.702500 |
| 2 | Sydney | 2021-01-01 | (-33.8688, 151.2093) | 05:46:24 | 5.773333 | 20:10:53 | 20.181389 |
| 3 | Boston | 2021-01-31 | (42.3601, -71.0589) | 06:56:43 | 6.945278 | 16:58:42 | 16.978333 |
| 4 | Lusaka | 2021-01-31 | (-15.3875, 28.3228) | 05:55:43 | 5.928611 | 18:44:35 | 18.743056 |
| 5 | Sydney | 2021-01-31 | (-33.8688, 151.2093) | 06:14:24 | 6.240000 | 20:02:42 | 20.045000 |
| 6 | Boston | 2021-03-02 | (42.3601, -71.0589) | 06:15:41 | 6.261389 | 17:36:50 | 17.613889 |
| 7 | Lusaka | 2021-03-02 | (-15.3875, 28.3228) | 06:06:23 | 6.106389 | 18:31:11 | 18.519722 |
| 8 | Sydney | 2021-03-02 | (-33.8688, 151.2093) | 06:42:34 | 6.709444 | 19:32:04 | 19.534444 |
| 9 | Boston | 2021-04-01 | (42.3601, -71.0589) | 06:24:21 | 6.405833 | 19:11:35 | 19.193056 |
| 10 | Lusaka | 2021-04-01 | (-15.3875, 28.3228) | 06:11:08 | 6.185556 | 18:09:54 | 18.165000 |
| 11 | Sydney | 2021-04-01 | (-33.8688, 151.2093) | 07:06:04 | 7.101111 | 18:52:05 | 18.868056 |
| 12 | Boston | 2021-05-01 | (42.3601, -71.0589) | 05:37:09 | 5.619167 | 19:45:25 | 19.756944 |
| 13 | Lusaka | 2021-05-01 | (-15.3875, 28.3228) | 06:16:13 | 6.270278 | 17:51:21 | 17.855833 |
| 14 | Sydney | 2021-05-01 | (-33.8688, 151.2093) | 06:28:28 | 6.474444 | 17:16:05 | 17.268056 |
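The dates in the expected table come from stepping through the year in period_day increments, and the row order comes from visiting every city for one date before moving to the next date. The sketch below reproduces just that scheduling logic with no API calls; sample_dates and the *_demo names are hypothetical and introduced only for illustration.

```python
from datetime import date, timedelta

def sample_dates(year, period_day):
    """ dates within year, starting Jan 1, spaced period_day days apart (hypothetical helper) """
    current = date(year, 1, 1)
    dates = []
    # stop as soon as the step carries us into the next year
    while current.year == year:
        dates.append(current)
        current += timedelta(days=period_day)
    return dates

dates_demo = sample_dates(year=2021, period_day=30)
print([d.isoformat() for d in dates_demo[:5]])

# dates in the outer loop, cities in the inner loop: matches the row order above
cities_demo = ['Boston', 'Lusaka', 'Sydney']
order_demo = [(city, d.isoformat()) for d in dates_demo for city in cities_demo]
print(order_demo[:4])
```

With year=2021 and period_day=30 this yields 13 dates per city, so a full run makes 39 API queries.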
from datetime import datetime, timedelta

def get_annual_sun_data(loc_dict, year=2021, period_day=30):
    """ pulls evenly spaced sunrise / sunsets from API over year per city

    Args:
        loc_dict (dict): keys are cities, values are tuples of (lat, lon, tz_str)
            where tz_str is a timezone string included in pytz.all_timezones
        year (int): year to query
        period_day (int): how many days between data queries
            (i.e. period_day=1 will get every day for the year)

    Returns:
        df_annual_sun (DataFrame): each row represents a sunrise / sunset
            datapoint, see get_sunrise_sunset()
    """
    data = []

    # note: looping over cities first groups the rows by city (as in the printed
    # output below); the expected table above lists every city for each date in turn
    for city, (lat, lon, tz_str) in loc_dict.items():
        current_date = datetime(year, 1, 1)  # Start date
        while current_date.year == year:
            date_str = current_date.strftime('%Y-%m-%d')
            sun_dict = get_sunrise_sunset(lat, lon, date_str)
            sun_series = clean_sun_dict(sun_dict, tz_str)
            data.append([
                city,
                current_date,
                sun_series['lat-lng'],
                sun_series['sunrise'],
                sun_series['sunrise_hr'],
                sun_series['sunset'],
                sun_series['sunset_hr']
            ])
            current_date += timedelta(days=period_day)

    df_annual_sun = pd.DataFrame(data, columns=['city', 'date', 'lat-lng', 'sunrise', 'sunrise_hr', 'sunset', 'sunset_hr'])
    return df_annual_sun
loc_dict = {
    'Boston': (42.3601, -71.0589, 'US/Eastern'),
    'Lusaka': (-15.3875, 28.3228, 'Africa/Lusaka'),
    'Sydney': (-33.8688, 151.2093, 'Australia/Sydney')
}
df_annual_sun = get_annual_sun_data(loc_dict, year=2021, period_day=30)
print(df_annual_sun.head())
     city       date              lat-lng   sunrise  sunrise_hr    sunset  \
0  Boston 2021-01-01  (42.3601, -71.0589)  07:11:49    7.196944  16:24:12
1  Boston 2021-01-31  (42.3601, -71.0589)  06:56:43    6.945278  16:58:42
2  Boston 2021-03-02  (42.3601, -71.0589)  06:15:41    6.261389  17:36:50
3  Boston 2021-04-01  (42.3601, -71.0589)  06:24:21    6.405833  19:11:35
4  Boston 2021-05-01  (42.3601, -71.0589)  05:37:09    5.619167  19:45:25

   sunset_hr
0  16.403333
1  16.978333
2  17.613889
3  19.193056
4  19.756944

Extra Credit: (+5 points)
Using plt.fill_between(), like this example (or like we did in class in Lecture notes), write the plot_daylight() function so that:

plot_daylight(df_annual_sun)

produces a similar graph to:
Be sure that your graph displays in Jupyter notebook (no need to save it in another form).
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.dates as mdates  # needed for the month locator / formatter below

sns.set(font_scale=1.2)

def plot_daylight(df_annual_sun):
    """ produces a plot of daylight seen across cities

    Args:
        df_annual_sun (DataFrame): each row represents a sunrise / sunset
            datapoint, see get_sunrise_sunset()
    """
    df_annual_sun['date'] = pd.to_datetime(df_annual_sun['date'])

    plt.figure(figsize=(12, 8))
    cities = df_annual_sun['city'].unique()

    for city in cities:
        city_data = df_annual_sun[df_annual_sun['city'] == city]
        # shade the band between sunrise and sunset; label feeds the legend,
        # alpha (an assumed value) keeps overlapping cities visible
        plt.fill_between(city_data['date'], city_data['sunrise_hr'], city_data['sunset_hr'],
                         label=city, alpha=0.5)

    plt.title('Daylight Hours Through the Year')
    plt.xlabel('Date')
    plt.ylabel('Hours of the Day')

    # one tick per month, labelled with the abbreviated month name
    plt.gca().xaxis.set_major_locator(mdates.MonthLocator())
    plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%b'))

    plt.ylim(0, 24)
    plt.yticks(range(0, 25, 3))
    plt.grid(True, which='both', linestyle='--', linewidth=0.5)
    plt.gcf().autofmt_xdate()
    plt.legend(title='City')
    plt.show()