Info Finder Challenges
After completing the code-along, attempt the challenges below.
Optional Practice: Bug Fixing
Find and fix all of the bugs in the programs below. Note that the projects may have multiple bugs - fix all of them!
1. All Paragraphs
Get back into the Info Finder code. Update it so that instead of simply displaying the first paragraph, it displays all of them.
Updating the Function
- Find the get_information function in the code
- In the body of the function, find the for paragraph in paragraphs loop
- Above the loop, create a new variable named para_texts
- Set the variable equal to a new empty list: []
  - This will hold the text from each paragraph
- In the body of the loop, find the if clean_text: statement
- In the body of the if, remove the return statement
- In its place, add clean_text to the para_texts list
- Outside of the for loop, remove the return "No Info..." statement
- In its place, return the para_texts list
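The steps above could end up looking roughly like the sketch below. It only covers the loop portion of get_information. The function name collect_paragraph_texts and the FakeParagraph stand-in class are hypothetical, added so the example runs without downloading a page, and the way clean_text is computed here (get_text() followed by strip()) is an assumption about the code-along.

```python
class FakeParagraph:
    """Hypothetical stand-in for a BeautifulSoup paragraph tag."""

    def __init__(self, text):
        self.text = text

    def get_text(self):
        return self.text


def collect_paragraph_texts(paragraphs):
    # Hypothetical helper mirroring the loop inside get_information.
    para_texts = []  # Holds the text from each paragraph
    for paragraph in paragraphs:
        # Assumption: the code-along cleans the text in a similar way.
        clean_text = paragraph.get_text().strip()
        if clean_text:
            # Previously a return statement; now the text is collected.
            para_texts.append(clean_text)
    # Previously a fallback string was returned here; now the list is.
    return para_texts


paragraphs = [
    FakeParagraph("   "),
    FakeParagraph("First paragraph.\n"),
    FakeParagraph("Second paragraph."),
]
para_texts = collect_paragraph_texts(paragraphs)
print(para_texts)
```

Empty paragraphs are skipped by the if check, so an article with no usable text produces an empty list rather than a message string.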
The program won't work just yet... it will be necessary to update the call to the get_information function.
Updating the Call
- Find where the get_information function is called
- Change the variable to be named info_paras
- Create a new for loop to loop through each paragraph text
- For each text, wrap it in a call to the fill function and print it
- Above the loop, create an if statement
- For the condition, check if len(info_paras) is 0
- If it is, print a message saying "No Information Found"
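One possible shape for the updated call is sketched below. It assumes that the fill function is textwrap.fill from the standard library. The show_information wrapper and the sample list are hypothetical; in the real program, info_paras would come from calling get_information.

```python
from textwrap import fill


def show_information(info_paras):
    # Hypothetical wrapper around the printing steps described above.
    if len(info_paras) == 0:
        print("No Information Found")
    for text in info_paras:
        # fill wraps long text so each printed line stays short.
        print(fill(text))


# Sample data standing in for the result of get_information.
info_paras = [
    "Python is a programming language. " * 4,
    "It is often used for web scraping.",
]
show_information(info_paras)
show_information([])
```

When the list is empty, the for loop body never runs, so only the message is printed.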
Run the program and verify that all the paragraphs appear!
BONUS: Custom Number of Paragraphs
Instead of simply printing out all of the paragraphs, ask the user how many to print! In the for loop body, after printing each paragraph, ask the user if they would like to continue. If they choose not to continue, break out of the loop.
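A minimal sketch of the bonus is shown below. The function name print_with_prompt, the prompt wording, and the ask parameter are all hypothetical. The ask parameter defaults to input and exists only so the demo can run with canned answers instead of waiting for typing. The fill function is again assumed to be textwrap.fill.

```python
from textwrap import fill


def print_with_prompt(info_paras, ask=input):
    # Returns how many paragraphs were printed.
    shown = 0
    for text in info_paras:
        print(fill(text))
        shown += 1
        if shown == len(info_paras):
            break  # Nothing left to show, so there is no need to ask
        answer = ask("Show another paragraph? (y/n) ")
        if answer.strip().lower() != "y":
            break  # The user chose not to continue
    return shown


# Demo with canned answers: "y" after the first paragraph, "n" after the second.
canned_answers = iter(["y", "n"])
count = print_with_prompt(
    ["First paragraph.", "Second paragraph.", "Third paragraph."],
    ask=lambda prompt: next(canned_answers),
)
print("Paragraphs shown:", count)
```

In the real program, calling print_with_prompt(info_paras) without the ask argument would prompt the user at the keyboard.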
2. Printing a Table of Contents
In addition to finding the paragraphs of information on each page, find the table of contents information.
- In the get_information function, find the html_document variable
- Under that variable, create a variable named toc_search
- Set the toc_search variable to be a new dictionary: {}
- In the toc_search dictionary, add a key of "id" with a value of "toc"
- Under that, create a variable named toc
- Set the toc variable to equal a call to html_document.find
- For the find call, pass in "div" for the element type as the first parameter
- For the second parameter, pass in the toc_search variable
- Under that, create a variable named toc_text
- Set the toc_text variable to equal toc.get_text()
  - This will return the text content for the table of contents
- Finally, print out the toc_text variable
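The steps above might look something like the sketch below. To keep it runnable offline, it parses a small made-up HTML string instead of a downloaded page, and the get_toc_text helper is hypothetical. The None check is an addition that is not part of the steps: find returns None when no matching element exists, and calling get_text() on None would raise an error.

```python
from bs4 import BeautifulSoup


def get_toc_text(html_document):
    # Hypothetical helper wrapping the table of contents steps.
    toc_search = {"id": "toc"}
    toc = html_document.find("div", toc_search)
    if toc is None:
        # Added safeguard: the page has no div with id="toc".
        return ""
    return toc.get_text()


# Made-up HTML standing in for a downloaded article.
sample_html = """
<html><body>
<div id="toc"><ul><li>History</li><li>Usage</li></ul></div>
<p>Article text.</p>
</body></html>
"""
html_document = BeautifulSoup(sample_html, "html.parser")
toc_text = get_toc_text(html_document)
print(toc_text)
```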
Run the code, enter a search term, and verify that the table of contents text appears!
3. Philosophy Game
This challenge is quite difficult.
One interesting thing about Wikipedia articles is that if you click on the first non-parenthesized, non-italicized link on the page, and repeat the process, it will almost always lead to the Philosophy page. Check out this article for more information. There is also this video about the subject:
Recreate the philosophy game using a Python script! The program should work as follows:
- Ask the user for a page to visit
- Load the page using requests/BeautifulSoup
- Find the first link on the page
- Avoid references to the same page, and citation links
- For an extra challenge, avoid parenthetical links as well
- Load the page at the new URL
- Repeat the process!
Note that "Philosophy" may be unreachable if parenthetical links are included. It will still be interesting to see where the journey goes though!
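As a starting point, the link-picking step could be sketched as below. This is a rough heuristic, not a complete solution: it works on plain href strings, does not handle parenthetical or italicized links, and skips every link containing a colon, which also drops the few real articles with a colon in the title. The names is_article_link, first_article_link, and BASE_URL are hypothetical. In the real program, the hrefs could be gathered with something like [a.get("href") for a in html_document.find_all("a")].

```python
from urllib.parse import urljoin

BASE_URL = "https://en.wikipedia.org"


def is_article_link(href, current_path):
    # Citation links look like "#cite_note-1" and external links start
    # with "http", so neither begins with "/wiki/".
    if not href or not href.startswith("/wiki/"):
        return False
    # Pages such as /wiki/Help:... or /wiki/File:... are not articles.
    if ":" in href:
        return False
    # Ignore links that point back to a section of the current page.
    page_path = href.split("#")[0]
    if page_path == current_path:
        return False
    return True


def first_article_link(hrefs, current_path):
    # Return the full URL of the first usable link, or None if there is none.
    for href in hrefs:
        if is_article_link(href, current_path):
            return urljoin(BASE_URL, href)
    return None


hrefs = [
    "#cite_note-1",
    "/wiki/Help:IPA",
    "/wiki/Python_(programming_language)#History",
    "/wiki/Programming_language",
]
next_url = first_article_link(hrefs, "/wiki/Python_(programming_language)")
print(next_url)
```

Repeating the process means loading next_url, collecting its links, and calling first_article_link again until the Philosophy page is reached or a page repeats.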
4. Scraping Another Webpage
This challenge may be difficult, depending on the website.
Experiment with different websites! Try to extract data from an interesting place. Try to make it dynamic; create programs that ask the user what to do. Theoretically, any website could be scraped; it's all about looking at the HTML and figuring out how to grab the information.