

Hello, please help me solve this Python script.

Thanks for helping me

Requirements
In this exercise, the learner should write a mass downloader script. The script should
download all links of a certain type from a given website.
Therefore, the script should do the following:
Ask a user for a valid URL address.
Download the HTML file of that address.
Extract all URLs from the HTML file.
Ask the user to select a file type (use the .pdf file type).
Retrieve the objects of all links that match the selected file type.
Requirements:
Ask a user for a valid URL address
You have to validate the URL entered by the user. You can use urlparse, available in the urllib
package (urllib.parse in Python 3), to manipulate the URL string entered by the user.
Note: there is more than one way to validate a URL address, and you are free to use whichever
way you find easiest. (Regex can also be used.)
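
One possible way to do the validation step is sketched below. It assumes that "valid" means the URL has an http or https scheme and a host part; the exercise does not define validity, so that rule is an assumption. The function names is_valid_url and ask_for_url are made up for this illustration.

```python
from urllib.parse import urlparse


def is_valid_url(url):
    """Return True if url has an http(s) scheme and a network location."""
    try:
        parts = urlparse(url.strip())
    except ValueError:
        # urlparse raises ValueError for some malformed inputs,
        # for example an unterminated IPv6 host such as "http://[bad".
        return False
    # Assumption: only http and https count as valid for this exercise.
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def ask_for_url(prompt="Enter a URL: "):
    """Keep asking until the user types a URL that passes is_valid_url."""
    while True:
        url = input(prompt).strip()
        if is_valid_url(url):
            return url
        print("That does not look like a valid URL, please try again.")
```

Calling ask_for_url() from the main script would return the first input that passes the check. Note that this only checks the shape of the string, not whether the site actually exists.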
Download the HTML file of that address
For this step, you have to download the HTML file of the web page whose URL address the user
entered in the previous step. Note that the HTML file should be stored on your machine.
You can use urlretrieve and pass the URL address as its first parameter.
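
A minimal sketch of the download step, using urlretrieve from urllib.request as the exercise suggests. The helper name download_html and the default file name page.html are assumptions chosen for this example.

```python
from urllib.request import urlretrieve


def download_html(url, filename="page.html"):
    """Save the resource at url to filename and return the local path."""
    # urlretrieve takes the URL as its first parameter and an optional
    # local file name as its second; it returns (local_path, headers).
    local_path, _headers = urlretrieve(url, filename)
    return local_path
```

For example, download_html(url) would store the page as page.html in the current directory. urlretrieve raises an exception (a subclass of OSError) if the page cannot be fetched, so a real script may want to wrap the call in try/except.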
Extract all URLs from the HTML file
In HTML files, the href attribute specifies the URL of the page that the link goes to:
Example: <a href="data/test.pdf">test.pdf</a>
Some developers specify the URL of an object in the src attribute in the HTML file:
Example: <embed src="study/sample.pdf" type="application/pdf" >
In this exercise, for the sake of simplicity, you only have to take the href attribute into
consideration when extracting HTML links.
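
The extraction step could look roughly like the sketch below. The exercise mentions regex; this example uses the standard library's html.parser instead, since it copes with quoting and attribute order without a hand-written pattern. The names HrefCollector and extract_hrefs are hypothetical.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect the value of every href attribute seen in the document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; src attributes are
        # deliberately ignored, as the exercise asks for href only.
        for name, value in attrs:
            if name == "href" and value:
                self.links.append(value)


def extract_hrefs(html_text):
    """Return all href values found in html_text, in document order."""
    parser = HrefCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links
```

To use it on the downloaded page, read the saved HTML file into a string and pass that string to extract_hrefs. The returned links may be relative (such as data/test.pdf), so they still need to be combined with the page URL before downloading.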
Ask the user to select a file type (use .pdf file type).
The user should select the pdf file type for the objects they want to retrieve.
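
As a small illustration of this step, the user's answer can be normalized so that "pdf", "PDF" and ".pdf" are all treated the same. The helper name normalize_extension is an assumption made for this sketch.

```python
def normalize_extension(choice):
    """Turn user input such as 'PDF' or '.pdf' into the form '.pdf'."""
    ext = choice.strip().lower()
    if not ext:
        raise ValueError("no file type given")
    # Add the leading dot if the user left it out.
    return ext if ext.startswith(".") else "." + ext
```

In the script this might be called as normalize_extension(input("File type: ")).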
Retrieve the objects of all links that match the selected file type
All PDF files available on the web page should be retrieved and stored on your machine. (Regex
can be used to find all the links that match the user's selection.)
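
The final step might be sketched as follows. Instead of regex, this example checks the path part of each link with endswith, and it uses urljoin to turn relative links into absolute ones. The function names select_links and download_all and the folder name downloads are assumptions for this illustration.

```python
import os
from urllib.parse import urljoin, urlparse
from urllib.request import urlretrieve


def select_links(links, base_url, extension=".pdf"):
    """Return absolute URLs (no duplicates) whose path ends with extension."""
    selected = []
    for link in links:
        # Resolve relative links such as "data/test.pdf" against the page URL.
        absolute = urljoin(base_url, link)
        path = urlparse(absolute).path
        if path.lower().endswith(extension.lower()) and absolute not in selected:
            selected.append(absolute)
    return selected


def download_all(urls, target_dir="downloads"):
    """Download every URL into target_dir and return the saved file paths."""
    os.makedirs(target_dir, exist_ok=True)
    saved = []
    for url in urls:
        # Use the last path component as the local file name.
        name = os.path.basename(urlparse(url).path)
        destination = os.path.join(target_dir, name)
        try:
            urlretrieve(url, destination)
        except OSError as err:  # URLError and HTTPError are OSError subclasses
            print(f"Skipping {url}: {err}")
            continue
        saved.append(destination)
    return saved
```

Given the list of href values and the page URL from the earlier steps, download_all(select_links(links, page_url, ".pdf")) would store the matching files in the downloads folder. Two links with the same file name would overwrite each other in this sketch.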