Requirements

In this exercise, the learner should write a mass downloader script. The script should download all links of a certain type from a given website. To do so, the script should:

1. Ask the user for a valid URL.
2. Download the HTML file at that address.
3. Extract all URLs from the HTML file.
4. Ask the user to select a file type (use the .pdf file type).
5. Retrieve the objects of all links that match the selected file type.

Requirements:

Ask the user for a valid URL
You have to validate the URL entered by the user. You can use urlparse, available in the urllib package (urllib.parse in Python 3), to parse the URL string entered by the user. Note: there is more than one way to validate a URL, and you are free to use whichever way you find easiest (a regex can also be used).

Download the HTML file at that address
For this step, you have to download the HTML file of the web page whose URL the user entered in the previous step. Note that the HTML file should be stored on your machine. You can use urlretrieve (urllib.request in Python 3) and pass the URL as its first parameter.

Extract all URLs from the HTML file
In HTML files, the href attribute specifies the URL of the page that the link goes to.
Example: <a href="test.pdf">test.pdf</a>
Some developers specify the URL of an object in the src attribute instead. In this exercise, for the sake of simplicity, you only have to consider the href attribute when extracting HTML links.
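The steps above can be sketched roughly as follows, using only the standard library. This is a minimal sketch under stated assumptions, not a definitive solution: the names is_valid_url, HrefCollector, extract_hrefs, links_of_type and main are hypothetical, the local filename page.html is an arbitrary choice, and accepting only http/https URLs is one possible reading of "valid".

```python
import os
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlretrieve


def is_valid_url(url):
    """Return True if the URL has an http(s) scheme and a host part.

    Assumption: only http and https count as valid for this exercise.
    """
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


class HrefCollector(HTMLParser):
    """Collect the value of every href attribute (src is ignored)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with lowercased names.
        for name, value in attrs:
            if name == "href" and value:
                self.hrefs.append(value)


def extract_hrefs(html_text):
    """Return all href values found in the given HTML text."""
    collector = HrefCollector()
    collector.feed(html_text)
    return collector.hrefs


def links_of_type(hrefs, base_url, extension):
    """Resolve links against base_url; keep those whose path ends with extension."""
    matches = []
    for href in hrefs:
        absolute = urljoin(base_url, href)  # turns relative links into absolute ones
        if urlparse(absolute).path.lower().endswith(extension.lower()):
            matches.append(absolute)
    return matches


def main():
    """Interactive flow: validate, download the page, extract, filter, download."""
    url = input("Enter a URL: ").strip()
    while not is_valid_url(url):
        url = input("Invalid URL, try again: ").strip()

    # Store the HTML file locally (the filename is an assumption).
    urlretrieve(url, "page.html")
    with open("page.html", encoding="utf-8", errors="replace") as f:
        hrefs = extract_hrefs(f.read())

    extension = input("File type to download (e.g. .pdf): ").strip() or ".pdf"
    for link in links_of_type(hrefs, url, extension):
        filename = os.path.basename(urlparse(link).path)
        urlretrieve(link, filename)
        print("Downloaded", filename)
```

Calling main() runs the interactive flow. The helper functions are kept free of network access so they can be checked separately, and urljoin is used because href values such as test.pdf are often relative to the page's own address.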
Hello, please help me solve this Python script.
Thanks for helping me.