Problem D. Find URLs
HTML is a language for representing documents designed to be displayed by a web browser. In many browsers, you can see the HTML source code by right-clicking somewhere on the page and clicking “View Page Source”.
If you try this on a web page with links to other websites, you’ll notice that the URL of the link is usually formatted in the following way:
href="https://some.website.com/subfolder/more_stuff.txt"
Write a function find_url(html) that takes a string of HTML text containing exactly one external link URL formatted as above, and returns just the URL string (in the above example, that would be https://some.website.com/subfolder/more_stuff.txt).
You can assume that the only place in the string where the substring href=" occurs is right before the URL, and that the next quotation mark after that point denotes the end of the URL.
Hints:
● The .find method and string slicing will likely make this easier.
● Remember that in order to use a double quote mark (") in a string, you either need to escape it with a backslash ("\""), or just use single quotes to begin/end the string ('"').
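As a small illustration of the second hint, the two quoting styles produce the same string. The variable names here are made up for the example and are not part of the assignment:

```python
# Two equivalent ways to write the marker text href=" as a Python string.
marker_escaped = "href=\""   # double-quoted string, inner quote escaped with a backslash
marker_single = 'href="'     # single-quoted string, no escaping needed

# Both literals describe the same six characters: h, r, e, f, =, "
print(marker_escaped == marker_single)  # True
print(len(marker_single))               # 6
```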
Examples:
>>> find_url('title="Association for Computing Machinery">ACM DL</a>: <span class="uid"><a rel="nofollow" class="external text" href="https://dl.acm.org/profile/81100248871">81100248871</a></span></span></li>')
'https://dl.acm.org/profile/81100248871'
>>> find_url('</a><span class="mw-editsection-bracket">]</span></span></h2><ul><li><a rel="nofollow" class="external text" href="http://www.intactforests.org/">Intact Forest Landscapes</a></li>')
'http://www.intactforests.org/'
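One possible approach, following the hints, is sketched below. It is not necessarily the intended solution, and it relies on the problem's stated assumptions: href=" appears exactly once, and the next double quote after it ends the URL. The local names marker, start, and end are chosen for illustration.

```python
def find_url(html):
    """Return the URL that follows the single href=" marker in html."""
    marker = 'href="'
    # .find gives the index where the marker begins; skipping past the
    # marker's length lands on the first character of the URL.
    start = html.find(marker) + len(marker)
    # Search for the closing quote, starting from the beginning of the URL.
    end = html.find('"', start)
    # Slicing excludes the character at index end, so the quote is dropped.
    # Assumption: the marker is present. If it were missing, .find would
    # return -1 and this slice would not be meaningful.
    return html[start:end]


print(find_url('<a href="https://some.website.com/subfolder/more_stuff.txt">link</a>'))
# https://some.website.com/subfolder/more_stuff.txt
```

Passing start as the second argument to .find matters here: the sample inputs contain other quoted attributes (such as class="external text") before the link, so searching for a quote from the beginning of the string would find the wrong one.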