hw3

2/8/24, 8:33 PM hw3 Page 1 of 14 about:srcdoc

DS 3000 HW 3

Due: Thursday Feb 8th @ 11:59 PM EST

Submission Instructions

Submit this ipynb file and a PDF file that includes the coding results to Gradescope (this can also be done via the assignment on Canvas). To ensure that your submitted files represent your latest code, make sure to give a fresh Kernel > Restart & Run All just before uploading the files to Gradescope.

Tips for success

  • Start early (even though you have two weeks on this homework)
  • Make use of Piazza
  • Make use of office hours
  • Remember to use cells and headings to make the notebook easy to read (if a grader cannot find the answer to a problem, you will receive no points for it)
  • Under no circumstances may one student view or share their ungraded homework or quiz with another student (see also), though you are welcome to talk about (not show each other) the problems.

Part 1: Plotting Warm Up (18 points)

Plot each of the functions below over 100 evenly spaced points in the domain $[0, 10]$ on the same graph. Be sure to use the line specifications given below:

Name        Value                        Color  Line Width  Style
sinusoid    3 * sin(2/3 x)               Red    4           dotted
polynomial  (x - 3)(x - 2)(x - 8) / 10   Blue   2           solid
abs value   min(abs(x - 3), abs(x - 8))  Green  3           dashed

  • add a legend which specifies the name of each function
  • use seaborn's sns.set() before plotting to make the graph look nice
  • Make sure that the axes are labeled x and f(x)
  • You may find the arithmetic functions needed in numpy (sin, abs, minimum)

    import numpy as np
    import matplotlib.pyplot as plt
    import seaborn as sns

    sns.set()

    # 100 evenly spaced points in [0, 10]
    x = np.linspace(0, 10, 100)

    plt.figure(figsize=(10, 6))
    # The last two plot() calls are cut off at the page margin in the source;
    # their endings are reconstructed from the line specifications in the table.
    plt.plot(x, 3 * np.sin(2 / 3 * x), 'r:', label='sinusoid', linewidth=4)  # red, dotted
    plt.plot(x, (x - 3) * (x - 2) * (x - 8) / 10, 'b-', label='polynomial', linewidth=2)  # blue, solid
    plt.plot(x, np.minimum(np.abs(x - 3), np.abs(x - 8)), 'g--', label='abs value', linewidth=3)  # green, dashed
    plt.legend()
    plt.xlabel('x')
    plt.ylabel('f(x)')
    plt.show()

Part 2: FIFA Players (22 points)
Create a plotly scatter plot which shows the mean Overall rating for all soccer players (rows) of a particular Age. Color your scatter plot per Nationality of the player, focusing on three countries (England, Germany, Spain).

Download the players_fifa23.csv from Canvas and make sure it is in the same directory as this notebook file.

Export your graph as an html file age_ratings_nationality.html and submit it with your completed homework ipynb to Gradescope.

Hints:

  • There may be multiple ways/approaches to accomplish this task.
  • One approach: you may use groupby() and boolean indexing to build these values in a loop which runs once per Nationality.
  • px.scatter() will only graph data from columns (not the index). Some approaches may need to graph data from the index. You can use df.reset_index() to make your index a new column as shown in this example
  • In some approaches you may need to pass multiple rows to df.append() as shown in this example
  • In some approaches you may need to go from "wide" data to "long" data by using df.melt() as discussed here

The first few code cells below get you started with looking at the data set.

    import warnings
    warnings.simplefilter(action='ignore', category=FutureWarning)

    # use pandas to read in the data
    import pandas as pd

    df_fifa = pd.read_csv('players_fifa23.csv', index_col='ID')
    df_fifa.head()
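A minimal sketch of the loop-based hint, run on a small made-up table rather than players_fifa23.csv. The toy values and the names df_toy, frames and df_long are illustrative assumptions, not part of the assignment. It stacks the per-country results with pd.concat(), since df.append() was removed in pandas 2.0.

```python
import pandas as pd

# Toy stand-in for players_fifa23.csv (values are made up for illustration)
df_toy = pd.DataFrame({
    'Age': [20, 20, 21, 20, 21, 21],
    'Nationality': ['England', 'England', 'England', 'Spain', 'Spain', 'France'],
    'Overall': [70, 80, 75, 90, 60, 85],
})

frames = []
for country in ['England', 'Spain']:
    # boolean indexing keeps only this country's rows
    df_country = df_toy[df_toy['Nationality'] == country]
    # mean Overall per Age; reset_index() turns the Age index back into a column
    df_mean = df_country.groupby('Age')['Overall'].mean().reset_index()
    df_mean['Nationality'] = country
    frames.append(df_mean)

# stack the per-country tables into one "long" table with a Nationality column
df_long = pd.concat(frames, ignore_index=True)
print(df_long)
```

A table in this shape (one row per Age and Nationality, everything in columns) is what px.scatter() expects when color='Nationality' is passed.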
Out[2] (output of df_fifa.head(); the rightmost visible column is cut off at the page margin):

ID      Name            FullName            Age  Height  Weight
165153  K. Benzema      Karim Benzema       34   185     81      https://cdn.sofifa.net/players/16
158023  L. Messi        Lionel Messi        35   169     67      https://cdn.sofifa.net/players/15
231747  K. Mbappé       Kylian Mbappé       23   182     73      https://cdn.sofifa.net/players/2
192985  K. De Bruyne    Kevin De Bruyne     31   181     70      https://cdn.sofifa.net/players/19
188545  R. Lewandowski  Robert Lewandowski  33   185     81      https://cdn.sofifa.net/players/18

5 rows × 89 columns

    import plotly.express as px

    # The next two lines are cut off at the page margin in the source; their endings
    # ('Spain' and .reset_index()) are reconstructed from the task statement and
    # from the column names passed to px.scatter().
    filtered_df = df_fifa[df_fifa['Nationality'].isin(['England', 'Germany', 'Spain'])]
    grouped_df = filtered_df.groupby(['Age', 'Nationality'])['Overall'].mean().reset_index()

    fig = px.scatter(grouped_df, x='Age', y='Overall', color='Nationality',
                     labels={'Overall': 'Mean Overall Rating'},
                     title='Mean Overall Rating by Age and Nationality')
    fig.write_html('age_ratings_nationality.html')

Part 3: Daylight through the year

The remainder of the homework asks you to complete the pipeline which, given the latitude / longitude and timezone of some cities:

    loc_dict = {'Boston': (42.3601, -71.0589, 'US/Eastern'),
                'Lusaka': (-15.3875, 28.3228, 'Africa/Lusaka'),
                'Sydney': (-33.8688, 151.2093, 'Australia/Sydney')}

(the keys are the names of the cities and the values are tuples of `lat, lon, timezone_name`)

is able to:

  • query a sunrise / sunset API
  • clean and process data (timezone management & building datetime objects)
  • For extra credit: produce the following graph of daylight through the year:

Part 3.1: Getting Sunrise Sunset via API (16 points)

Write the get_sunrise_sunset() function below so that it uses this sunrise sunset API to produce the output shown in the test case below. It may be helpful to know that this particular API...

  • requires no api key
  • returns about 2.5 queries per second
  • did not block me when I tried to make 100 consecutive calls as quickly as possible

    # you will need to run pip install requests in the terminal
    # no need to install json, it is built into python
    import requests
    import json

    # make sure to write a good docstring! I will do this for you for the other functions
    def get_sunrise_sunset(lat, lng, date):
        """ fetches the sunrise sunset API information for a location on a particular date

        Args:
            lat (float): latitude of interest
            lng (float): longitude of interest
            date (str): date of interest

        Returns:
            gss_dict (dictionary): a dictionary that contains the API information
        """
        # The URL is cut off at the page margin in the source; the trailing
        # date parameter is reconstructed from the function arguments.
        url = f'https://api.sunrise-sunset.org/json?lat={lat}&lng={lng}&date={date}'
        response = requests.get(url)
        data = response.json()

        # drop the timezone id (if present) so the output matches the test case
        data.pop('tzid', None)

        # record the query inputs alongside the API results
        data['lat-lng'] = (lat, lng)
        data['date'] = date
        return data

    sun_dict = get_sunrise_sunset(lat=42.3601, lng=-71.0589, date='2022-02-15')

    sun_dict_expected = \
        {'results': {'sunrise': '11:38:48 AM',
                     'sunset': '10:17:50 PM',
                     'solar_noon': '4:58:19 PM',
                     'day_length': '10:39:02',
                     'civil_twilight_begin': '11:11:30 AM',
                     'civil_twilight_end': '10:45:08 PM',
                     'nautical_twilight_begin': '10:38:37 AM',
                     'nautical_twilight_end': '11:18:00 PM',
                     'astronomical_twilight_begin': '10:06:05 AM',
                     'astronomical_twilight_end': '11:50:33 PM'},
         'status': 'OK',
         'lat-lng': (42.3601, -71.0589),
         'date': '2022-02-15'}

    assert sun_dict == sun_dict_expected, 'get_sunrise_sunset() error'

Part 3.2: (14 points)

It may appear the test case above is in error, but a look at the API's documentation reminds us:

"NOTE: All times are in UTC and summer time adjustments are not included in the returned data."

Complete the change_tz() below so that it passes the given test case.

    import pytz
    from datetime import datetime
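To see why the expected sunrise for Boston reads 11:38:48 AM, it may help to convert one API time by hand. The sketch below uses only the standard library and a fixed UTC-5 offset as a stand-in for US/Eastern in February. The helper name utc_string_to_local and the EST constant are illustrative assumptions, and a fixed offset ignores daylight saving, which is why the assignment itself works with named pytz timezones.

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed offset for Boston in winter (EST = UTC-5); not DST-aware
EST = timezone(timedelta(hours=-5), 'EST')

def utc_string_to_local(date_str, time_str, tz_local):
    """Parse an API-style UTC time string and convert it to tz_local."""
    # the API returns 12-hour clock strings such as '11:38:48 AM'
    dt_naive = datetime.strptime(f'{date_str} {time_str}', '%Y-%m-%d %I:%M:%S %p')
    # mark the naive value as UTC, then shift it to the local offset
    dt_utc = dt_naive.replace(tzinfo=timezone.utc)
    return dt_utc.astimezone(tz_local)

sunrise_local = utc_string_to_local('2022-02-15', '11:38:48 AM', EST)
print(sunrise_local)  # 2022-02-15 06:38:48-05:00
```

The 11:38 AM UTC sunrise corresponds to 6:38 AM local time in Boston, so the API output in the test case is consistent once the timezone is accounted for.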
    def change_tz(dt, timezone_from, timezone_to):
        """ converts timezone of a timezone naive datetime object

        Args:
            dt (datetime): datetime (or time) object without timezone
            timezone_from (str): timezone of input
            timezone_to (str): timezone of output datetime

        Returns:
            dt (datetime): timezone-aware datetime object in timezone_to
        """
        from_zone = pytz.timezone(timezone_from)
        to_zone = pytz.timezone(timezone_to)

        # attach the source timezone to the naive datetime, then convert
        dt_with_timezone = from_zone.localize(dt)
        converted_dt = dt_with_timezone.astimezone(to_zone)
        return converted_dt

    dt_naive = datetime(2022, 2, 15, 11, 38, 48)  # This is a naive datetime object
    timezone_from = 'UTC'
    timezone_to = 'America/New_York'
    converted_dt = change_tz(dt_naive, timezone_from, timezone_to)
    print(f"Converted datetime: {converted_dt}")

Converted datetime: 2022-02-15 06:38:48-05:00

Part 3.3: (20 points)

Build clean_sun_dict() to pass each of the two test cases below. Note that:

  • sunrise and sunset are time objects which account for daylight saving:
      • include the date when building these objects
      • use change_tz() above to cast them to the proper timezone
      • build time objects by calling datetime.time() to discard the date of a datetime
      • importing pandas as pd and using pd.to_datetime may also be helpful
  • sunrise_hr and sunset_hr are the hours since the day began in local timezone (more easily graphed)
      • you may use .strftime() and int() to cast time objects to strings and then integers (which may be helpful)

NOTE: There may be more than one way to accomplish writing this function; as long as the function passes both assert test cases, you may continue. Just be sure to comment and present your code as cleanly as possible.

    from datetime import datetime, time
    import pandas as pd

    def clean_sun_dict(sun_dict, timezone_to):
        """ builds pandas series and cleans output of API

        Args:
            sun_dict (dict): dict of json (see ex below)
            timezone_to (str): timezone of outputs (API returns UTC times)

        Returns:
            sun_series (pd.Series): all times converted to time objects

        example sun_series:
            date          2021-02-13 00:00:00
            lat-lng      (36.72016, -4.42034)
            sunrise                  02:11:06
            sunrise_hr                  2.185
            sunset                   13:00:34
            sunset_hr                 13.0094
            dtype: object
        """
        date_str = sun_dict['date']
        date_dt = datetime.strptime(date_str, '%Y-%m-%d')

        # pytz is imported in the Part 3.2 cell above
        timezone_from = pytz.timezone('UTC')
        timezone_to = pytz.timezone(timezone_to)

        # Several lines below are cut off at the page margin in the source. Their
        # endings (the '%S %p' format suffix, the convert_time() arguments and the
        # seconds term of the *_hr values) are reconstructed from the test cases.

        # Function to convert time string to timezone-aware datetime object
        def convert_time(time_str, date, tz_from, tz_to):
            dt_naive = datetime.strptime(f"{date} {time_str}", '%Y-%m-%d %I:%M:%S %p')
            dt_aware = tz_from.localize(dt_naive)
            dt_converted = dt_aware.astimezone(tz_to)
            return dt_converted

        sunrise_converted = convert_time(sun_dict['results']['sunrise'], date_str, timezone_from, timezone_to)
        sunset_converted = convert_time(sun_dict['results']['sunset'], date_str, timezone_from, timezone_to)

        sun_series = pd.Series({
            'date': date_dt,
            'lat-lng': sun_dict['lat-lng'],
            'sunrise': sunrise_converted.time(),
            'sunrise_hr': sunrise_converted.hour + sunrise_converted.minute / 60 + sunrise_converted.second / 3600,
            'sunset': sunset_converted.time(),
            'sunset_hr': sunset_converted.hour + sunset_converted.minute / 60 + sunset_converted.second / 3600,
        })
        return sun_series

    sun_dict = {'results': {'sunrise': '11:38:48 AM',
                            'sunset': '10:17:50 PM',
                            'solar_noon': '4:58:19 PM',
                            'day_length': '10:39:02',
                            'civil_twilight_begin': '11:11:30 AM',
                            'civil_twilight_end': '10:45:08 PM',
                            'nautical_twilight_begin': '10:38:37 AM',
                            'nautical_twilight_end': '11:18:00 PM',
                            'astronomical_twilight_begin': '10:06:05 AM',
                            'astronomical_twilight_end': '11:50:33 PM'},
                'status': 'OK',
                'lat-lng': (42.3601, -71.0589),
                'date': '2022-02-15'}

    # test without timezone conversion
    sun_series = clean_sun_dict(sun_dict, timezone_to='GMT')

    sun_series_exp = pd.Series(
        {'date': datetime(year=2022, month=2, day=15),
         'lat-lng': (42.3601, -71.0589),
         'sunrise': time(hour=11, minute=38, second=48),
         'sunrise_hr': 11.646666666666667,
         'sunset': time(hour=22, minute=17, second=50),
         'sunset_hr': 22.297222222222224})

    assert sun_series.eq(sun_series_exp).all(), 'clean_sun_dict() error (GMT)'

    # test with timezone conversion
    sun_series = clean_sun_dict(sun_dict, timezone_to='US/Eastern')

    sun_series_exp = pd.Series(
        {'date': datetime(year=2022, month=2, day=15),
         'lat-lng': (42.3601, -71.0589),
         'sunrise': time(hour=6, minute=38, second=48),
         'sunrise_hr': 6.6466666666666665,
         'sunset': time(hour=17, minute=17, second=50),
         'sunset_hr': 17.297222222222224})

    assert sun_series.eq(sun_series_exp).all(), 'clean_sun_dict() error (EST)'

Part 3.4: (10 points)
Write the get_annual_sun_data() function so that it produces the outputs shown below. This function should make use of:

  • get_sunrise_sunset()
  • clean_sun_dict()

as built above. The following snippet:

    loc_dict = {'Boston': (42.3601, -71.0589, 'US/Eastern'),
                'Lusaka': (-15.3875, 28.3228, 'Africa/Lusaka'),
                'Sydney': (-33.8688, 151.2093, 'Australia/Sydney')}

    df_annual_sun = get_annual_sun_data(loc_dict, year=2021, period_day=30)
    df_annual_sun.head(15)

should generate:

    city    date        lat-lng               sunrise   sunrise_hr  sunset    sunset_hr
0   Boston  2021-01-01  (42.3601, -71.0589)   07:11:49  7.196944    16:24:12  16.403333
1   Lusaka  2021-01-01  (-15.3875, 28.3228)   05:38:33  5.642500    18:42:09  18.702500
2   Sydney  2021-01-01  (-33.8688, 151.2093)  05:46:24  5.773333    20:10:53  20.181389
3   Boston  2021-01-31  (42.3601, -71.0589)   06:56:43  6.945278    16:58:42  16.978333
4   Lusaka  2021-01-31  (-15.3875, 28.3228)   05:55:43  5.928611    18:44:35  18.743056
5   Sydney  2021-01-31  (-33.8688, 151.2093)  06:14:24  6.240000    20:02:42  20.045000
6   Boston  2021-03-02  (42.3601, -71.0589)   06:15:41  6.261389    17:36:50  17.613889
7   Lusaka  2021-03-02  (-15.3875, 28.3228)   06:06:23  6.106389    18:31:11  18.519722
8   Sydney  2021-03-02  (-33.8688, 151.2093)  06:42:34  6.709444    19:32:04  19.534444
9   Boston  2021-04-01  (42.3601, -71.0589)   06:24:21  6.405833    19:11:35  19.193056
10  Lusaka  2021-04-01  (-15.3875, 28.3228)   06:11:08  6.185556    18:09:54  18.165000
11  Sydney  2021-04-01  (-33.8688, 151.2093)  07:06:04  7.101111    18:52:05  18.868056
12  Boston  2021-05-01  (42.3601, -71.0589)   05:37:09  5.619167    19:45:25  19.756944
13  Lusaka  2021-05-01  (-15.3875, 28.3228)   06:16:13  6.270278    17:51:21  17.855833
14  Sydney  2021-05-01  (-33.8688, 151.2093)  06:28:28  6.474444    17:16:05  17.268056

    from datetime import timedelta

    def get_annual_sun_data(loc_dict, year=2021, period_day=30):
        """ pulls evenly spaced sunrise / sunsets from API over year per city

        Args:
            loc_dict (dict): keys are cities, values are tuples of
                (lat, lon, tz_str) where tz_str is a timezone
                string included in pytz.all_timezones
            year (int): year to query
            period_day (int): how many days between data queries
                (i.e. period_day=1 will get every day for the year)

        Returns:
            df_annual_sun (DataFrame): each row represents a
                sunrise / sunset datapoint, see get_sunrise_sunset()
        """
        data = []
        for city, (lat, lon, tz_str) in loc_dict.items():
            current_date = datetime(year, 1, 1)  # Start date
            # step through the year in increments of period_day days
            while current_date.year == year:
                date_str = current_date.strftime('%Y-%m-%d')
                sun_dict = get_sunrise_sunset(lat, lon, date_str)
                sun_series = clean_sun_dict(sun_dict, tz_str)
                data.append([
                    city,
                    current_date,
                    sun_series['lat-lng'],
                    sun_series['sunrise'],
                    sun_series['sunrise_hr'],
                    sun_series['sunset'],
                    sun_series['sunset_hr']
                ])
                current_date += timedelta(days=period_day)

        # The column list is cut off at the page margin in the source; the names
        # after 'lat-lng' are reconstructed from the printed output below.
        df_annual_sun = pd.DataFrame(data, columns=['city', 'date', 'lat-lng', 'sunrise',
                                                    'sunrise_hr', 'sunset', 'sunset_hr'])
        return df_annual_sun

    loc_dict = {
        'Boston': (42.3601, -71.0589, 'US/Eastern'),
        'Lusaka': (-15.3875, 28.3228, 'Africa/Lusaka'),
        'Sydney': (-33.8688, 151.2093, 'Australia/Sydney')
    }

    df_annual_sun = get_annual_sun_data(loc_dict, year=2021, period_day=30)
    print(df_annual_sun.head())

     city       date              lat-lng   sunrise  sunrise_hr    sunset  \
0  Boston 2021-01-01  (42.3601, -71.0589)  07:11:49    7.196944  16:24:12
1  Boston 2021-01-31  (42.3601, -71.0589)  06:56:43    6.945278  16:58:42
2  Boston 2021-03-02  (42.3601, -71.0589)  06:15:41    6.261389  17:36:50
3  Boston 2021-04-01  (42.3601, -71.0589)  06:24:21    6.405833  19:11:35
4  Boston 2021-05-01  (42.3601, -71.0589)  05:37:09    5.619167  19:45:25

   sunset_hr
0  16.403333
1  16.978333
2  17.613889
3  19.193056
4  19.756944

Extra Credit: (+5 points)

Using plt.fill_between(), like this example (or like we did in class in Lecture notes), write the plot_daylight() function so that:

    plot_daylight(df_annual_sun)

produces a similar graph to:
Be sure that your graph displays in Jupyter notebook (no need to save it in another form).

    import seaborn as sns
    import matplotlib.pyplot as plt
    import matplotlib.dates as mdates  # needed for MonthLocator / DateFormatter below

    sns.set(font_scale=1.2)

    def plot_daylight(df_annual_sun):
        """ produces a plot of daylight seen across cities

        Args:
            df_annual_sun (DataFrame): each row represents a
                sunrise / sunset datapoint, see get_sunrise_sunset()
        """
        df_annual_sun['date'] = pd.to_datetime(df_annual_sun['date'])

        plt.figure(figsize=(12, 8))

        cities = df_annual_sun['city'].unique()
        for city in cities:
            city_data = df_annual_sun[df_annual_sun['city'] == city]
            # This call is cut off at the page margin in the source; the arguments
            # after sunrise_hr are reconstructed (label=city feeds plt.legend()).
            plt.fill_between(city_data['date'], city_data['sunrise_hr'], city_data['sunset_hr'], label=city)

        plt.title('Daylight Hours Through the Year')
        plt.xlabel('Date')
        plt.ylabel('Hours of the Day')

        # one tick per month, labeled with the abbreviated month name
        plt.gca().xaxis.set_major_locator(mdates.MonthLocator())
        plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%b'))

        plt.ylim(0, 24)
        plt.yticks(range(0, 25, 3))
        plt.grid(True, which='both', linestyle='--', linewidth=0.5)
        plt.gcf().autofmt_xdate()
        plt.legend(title='City')
        plt.show()
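The band drawn by fill_between() runs from sunrise_hr to sunset_hr, so its height at each date is the day length in hours. A small sketch of that arithmetic, using the first three Boston rows from the printed get_annual_sun_data() output. The names boston_rows and to_hms are illustrative assumptions, not part of the assignment.

```python
# (date, sunrise_hr, sunset_hr) for Boston, copied from the printed output
boston_rows = [
    ('2021-01-01', 7.196944, 16.403333),
    ('2021-01-31', 6.945278, 16.978333),
    ('2021-03-02', 6.261389, 17.613889),
]

def to_hms(hours):
    """Convert fractional hours to an (h, m, s) tuple, rounded to whole seconds."""
    total_seconds = round(hours * 3600)
    h, rem = divmod(total_seconds, 3600)
    m, s = divmod(rem, 60)
    return h, m, s

for date_str, sunrise_hr, sunset_hr in boston_rows:
    day_length_hr = sunset_hr - sunrise_hr  # height of the shaded band
    h, m, s = to_hms(day_length_hr)
    print(f'{date_str}: {day_length_hr:.3f} h of daylight ({h:d}:{m:02d}:{s:02d})')
```

For the first row this agrees with subtracting the clock times directly: 16:24:12 minus 07:11:49 is 9:12:23.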