Question

STEP 1: Begin work within your Jupyter Notebook by importing the following modules:

import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import re

Q1. Within your Jupyter Notebook, write the code for a Python function called

def parseWeatherByYear(year) :

This function will parse an HTML page containing a full year of weather data for the city of Toronto.

The HTML pages containing weather data can be downloaded from: https://www.extremeweatherwatch.com/cities/toronto/year-2023

The file to parse for this lab, however, can be downloaded here: https://matrix.senecacollege.ca/~danny.abesdris/prg550.232/labs/lab6/torontoWeather.2023.html

The HTML file itself contains markers indicating where to begin parsing the data to extract. The 3 pieces of data that must be extracted are the high and low temperatures (in degrees Celsius) and the amount of precipitation (in cm) for every day so far in the current year (2023).

A series of lines showing where to begin extracting data is listed below:

<td><div class='width-130'><a href='/cities/toronto/day/january-1'>January 1</a></div></td>

</tr>

Notice the marker in the lines above:

/cities/toronto/day/month-n

In the example above, the data to extract would be: 5.0, 2.7, and 0.15. The extraction can be achieved in several ways, but a carefully structured regular expression (using the match.group( ) method as well as the re.S and re.M flags) is recommended for speed and simplicity. The trick here is to match the text up to the point where the data begins (as groups) and then to form another regular expression that matches the data itself (again as a group).
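A minimal sketch of the matching approach described above. The sample fragment is an assumption: only the anchor cell comes from the excerpt shown earlier, while the three plain `<td>` data cells are a hypothetical layout, so the pattern will likely need adjusting against the real file. The names `sample_html`, `row_pattern`, and `records` are illustrative only.

```python
import re

# Hypothetical fragment: the anchor cell matches the marker shown above,
# but the three <td> data cells are an assumed layout, not the real file.
sample_html = """
<tr>
<td><div class='width-130'><a href='/cities/toronto/day/january-1'>January 1</a></div></td>
<td>5.0</td><td>2.7</td><td>0.15</td>
</tr>
<tr>
<td><div class='width-130'><a href='/cities/toronto/day/january-2'>January 2</a></div></td>
<td>5.6</td><td>3.5</td><td>0.00</td>
</tr>
"""

# Groups 1-2 capture the marker (month name, day number);
# groups 3-5 capture the data values that follow it.
row_pattern = re.compile(
    r"/cities/toronto/day/([a-z]+)-(\d+)'>.*?</td>"  # marker, then skip to the end of its cell
    r"\s*<td>(-?\d+\.\d+)</td>"                       # high temperature
    r"\s*<td>(-?\d+\.\d+)</td>"                       # low temperature
    r"\s*<td>(\d+\.\d+)</td>",                        # precipitation
    re.S | re.M,
)

# Values stay as strings so that "0.00" keeps its original formatting.
records = [
    (m.group(1), int(m.group(2)), m.group(3), m.group(4), m.group(5))
    for m in row_pattern.finditer(sample_html)
]
print(records)
```

With `re.S`, the lazy `.*?` can cross line breaks, which matters if the real cells are spread over several lines.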

As always, the website https://regex101.com will be invaluable in helping you develop your solution.

The data to be extracted must range from January 1, 2023 to the cutoff date for this file of March 16, 2023.

It would be helpful to create a Numpy array of the number of days in each month of the year and then to investigate the Pandas date_range( ) function and the Series.dt.month_name( ) method to allow you to programmatically capture the month names.

https://pandas.pydata.org/docs/reference/api/pandas.Series.dt.month_name.html
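One possible way to combine these pieces is sketched below. The variable names (`year`, `days_in_month`, `dates`, and so on) are illustrative assumptions, not part of the lab specification, and the month names are lower-cased to match the CSV format the lab expects.

```python
import numpy as np
import pandas as pd

year = 2023  # hypothetical argument value

# Days in each month for a non-leap year such as 2023.
days_in_month = np.array([31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31])

# One timestamp per day from January 1 to the lab's cutoff date.
dates = pd.Series(pd.date_range(start=f"{year}-01-01", end=f"{year}-03-16", freq="D"))

# month_name() is a method; lower-case the result to match the CSV format.
month_names = dates.dt.month_name().str.lower()
day_of_year = dates.dt.dayofyear
day_of_month = dates.dt.day

print(len(dates))                       # 75
print(month_names.unique().tolist())    # ['january', 'february', 'march']
print(days_in_month[:2].sum() + 16)     # 31 + 28 + 16 = 75
```

The array gives a quick cross-check on the record count, while `date_range` supplies the month name, day of month, and day of year for each record without hand-written calendar logic.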

The HTML file itself must be opened and its entire contents read into a string.

As the data from the HTML file is extracted, your function must also write the data into a CSV (comma separated values) file using the initial heading (title) of:

City,dayOfyear,month,dayOfMonth,Year,highTemp,lowTemp,precipitation

You are to write each record with its fields separated by commas (,) and followed by a newline.

The first 10 lines of the resultant file (the heading followed by the first 9 records) should be exactly as listed below:

City,dayOfyear,month,dayOfMonth,Year,highTemp,lowTemp,precipitation
Toronto,1,january,1,2023,5.0,2.7,0.15
Toronto,2,january,2,2023,5.6,3.5,0.00
Toronto,3,january,3,2023,4.4,2.8,0.33
Toronto,4,january,4,2023,4.4,2.5,2.11
Toronto,5,january,5,2023,4.8,3.2,0.02
Toronto,6,january,6,2023,5.1,2.9,0.00
Toronto,7,january,7,2023,3.2,-4.1,0.00
Toronto,8,january,8,2023,-1.5,-4.8,0.00
Toronto,9,january,9,2023,2.2,-1.7,0.01

There are exactly 75 records in the HTML file to extract (31 days in January, 28 in February, and 16 in March), and therefore 75 records are to be written to the CSV file.
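The CSV-writing step might look roughly like the sketch below. The helper name `write_weather_csv`, the record tuple layout, and the output file name are all hypothetical; the sample records are the first three from the listing above, kept as strings so that values such as 0.00 are written exactly as parsed.

```python
import os
import tempfile

# Hypothetical extracted records: (month, dayOfMonth, high, low, precipitation).
records = [
    ("january", 1, "5.0", "2.7", "0.15"),
    ("january", 2, "5.6", "3.5", "0.00"),
    ("january", 3, "4.4", "2.8", "0.33"),
]

def write_weather_csv(csv_path, year, records, city="Toronto"):
    """Write the heading line, then one comma-separated line per record."""
    with open(csv_path, "w", encoding="utf-8") as out:
        out.write("City,dayOfyear,month,dayOfMonth,Year,highTemp,lowTemp,precipitation\n")
        # dayOfyear is simply the 1-based position of the record in the year.
        for day_of_year, (month, day, high, low, precip) in enumerate(records, start=1):
            out.write(f"{city},{day_of_year},{month},{day},{year},{high},{low},{precip}\n")

# Hypothetical output file name, written to a temporary directory.
csv_path = os.path.join(tempfile.mkdtemp(), "torontoWeather.2023.csv")
write_weather_csv(csv_path, 2023, records)

with open(csv_path, encoding="utf-8") as f:
    lines = f.read().splitlines()
print(lines[1])
```

Python's `csv` module would work equally well here; plain `write` calls are used only because the lab describes the output in terms of commas and newlines.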

Once the file has been created and all records written, your function must load the CSV file into a Pandas data frame and display ALL records in the data frame using the functions:

pd.read_csv(csvFile) # read csv file into Data Frame

pd.set_option('display.max_rows', None) # set a flag to display all rows in the output

The data frame's shape attribute and describe( ) method must also be invoked and displayed.
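A short sketch of this final step, assuming the CSV has the heading given earlier. An in-memory `io.StringIO` object stands in for the file on disk (`pd.read_csv` accepts either a path or a file-like object), and the names `csv_file` and `df` are illustrative.

```python
import io
import pandas as pd

# Stand-in for the CSV file on disk: the heading plus the first three records.
csv_file = io.StringIO(
    "City,dayOfyear,month,dayOfMonth,Year,highTemp,lowTemp,precipitation\n"
    "Toronto,1,january,1,2023,5.0,2.7,0.15\n"
    "Toronto,2,january,2,2023,5.6,3.5,0.00\n"
    "Toronto,3,january,3,2023,4.4,2.8,0.33\n"
)

pd.set_option("display.max_rows", None)  # show every row instead of truncating
df = pd.read_csv(csv_file)               # read the CSV data into a DataFrame

print(df)             # all records
print(df.shape)       # (rows, columns); shape is an attribute, not a method
print(df.describe())  # summary statistics for the numeric columns
```

For the full lab file, `df.shape` would be expected to report 75 rows and 8 columns.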

The exact output on the command line should be as listed below:

       City  dayOfyear    month  dayOfMonth  Year  highTemp  lowTemp  precipitation
0   Toronto          1  january           1  2023       5.0      2.7           0.15
1   Toronto          2  january           2  2023       5.6      3.5           0.00
2   Toronto          3  january           3  2023       4.4      2.8           0.33
3   Toronto          4  january           4  2023       4.4      2.5           2.11
4   Toronto          5  january           5  2023       4.8      3.2           0.02
5   Toronto          6  january           6  2023       5.1      2.9           0.00

...TO 75
