Q1. Within your Jupyter Notebook, write the code for a Python function called
def parseWeatherByYear(year) :
This function will parse an HTML page containing a full year of weather data for the city of Toronto.
The HTML pages containing weather data can be downloaded from: https://www.extremeweatherwatch.com/cities/toronto/year-2023
The file to parse for this lab, however, can be downloaded here: https://matrix.senecacollege.ca/~danny.abesdris/prg550.232/labs/lab6/torontoWeather.2023.html
The HTML file itself contains markers indicating where to begin parsing the data to extract. The 3 pieces of data that must be extracted are the high and low temperatures (in degrees Celsius) and the amount of precipitation (in cm) for every day so far in the current year (2023).
A sample of the lines showing where to begin extracting data is listed below:
<td><div class='width-130'><a href='/cities/toronto/day/january-1'>January 1</a></div></td>
<td class='text-right temp40'>5.0</td>
<td class='text-right temp30'>2.7</td>
<td class='text-right rainsnow1'>0.15</td>
</tr>
Notice the marker in the lines above:
/cities/toronto/day/month-n
In the example above, the data to extract would be 5.0, 2.7, and 0.15. The extraction can be achieved in several ways, but a carefully structured regular expression (using the match.group() method together with the re.S and re.M flags) is recommended for speed and simplicity. The trick here is to match the text up to the point where the data begins (as groups) and then form another regular expression that matches the data itself (again as a group).
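One possible approach is sketched below. It is not a definitive solution and rests on several assumptions: that every row follows the layout of the sample lines above, that the file has been saved locally as torontoWeather.<year>.html, and that a list of (month, day, high, low, precipitation) tuples is an acceptable return value (the question does not specify one). The helper parse_weather_html is a hypothetical name introduced for illustration. The marker and the three data cells are captured as groups in a single compiled pattern and read back with match.group().

```python
import re

# Assumed row layout, based only on the sample lines in the question:
# a link containing /cities/toronto/day/<month>-<n>, then three <td> cells.
ROW_PATTERN = re.compile(
    r"/cities/toronto/day/([a-z]+)-(\d+)'>[^<]*</a>\s*</div>\s*</td>"  # marker: month, day
    r"\s*<td[^>]*>\s*(-?\d+(?:\.\d+)?)\s*</td>"  # high temperature
    r"\s*<td[^>]*>\s*(-?\d+(?:\.\d+)?)\s*</td>"  # low temperature
    r"\s*<td[^>]*>\s*(\d+(?:\.\d+)?)\s*</td>",   # precipitation
    re.S | re.M,  # flags suggested in the question
)


def parse_weather_html(html):
    """Hypothetical helper: return (month, day, high, low, precip) tuples."""
    rows = []
    for match in ROW_PATTERN.finditer(html):
        rows.append((
            match.group(1),          # month name as it appears in the marker
            int(match.group(2)),     # day of the month
            float(match.group(3)),   # high
            float(match.group(4)),   # low
            float(match.group(5)),   # precipitation
        ))
    return rows


def parseWeatherByYear(year):
    # Assumption: the lab file sits next to the notebook under this name.
    path = f"torontoWeather.{year}.html"
    with open(path, encoding="utf-8", errors="replace") as handle:
        return parse_weather_html(handle.read())
```

Calling parseWeatherByYear(2023) would then return one tuple per matched day. Rows whose cells are not plain numbers are skipped by this pattern, so the result may need checking against the page.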