Info Finder Challenges
After completing the code-along, attempt the challenges below.
Optional Practice: Bug Fixing
Find and fix all of the bugs in the programs below. Note that the projects may have multiple bugs - fix all of them!
1. All Paragraphs
Get back into the Info Finder code. Update it so that instead of simply displaying the first paragraph, it displays all of them.
Updating the Function
- Find the
get_information
function in the code - In the body of the function, find the
for paragraph in paragraphs
loop - Above the loop, create a new variable named
para_texts
- Set the variable equal to a new empty list:
[]
- This will hold the text from each paragraph
- In the body of the loop, find the
if clean_text:
statement - In the body of the
if
, remove thereturn
statement - In its place, add
clean_text
to thepara_texts
list - Outside of the
for
loop, remove thereturn "No Info..."
statement - In its place, return the
para_texts
list
The program won't work just yet... it will be necessary to update the call to the get_information
function.
Updating the Call
- Find where the
get_information
function is called - Change the variable to be named
info_paras
- Create a new
for
loop to loop through each paragraph text - For each text, wrap it in a call to the
fill
function and print it - Above the loop, create an
if
statement - For the condition, check if
len(info_paras)
is0
- If it is, print a message saying "No Information Found"
Run the program and verify that all the paragraphs appear!
BONUS: Custom Number of Paragraphs
Instead of simply printing out all of the paragraphs, ask the user how many to print! In the for
loop body, after printing each paragraph, ask the user if they would like to continue. If they choose not to continue, break
out of the loop.
2. Printing a Table of Contents
In addition to finding the paragraphs of information on each page, find the table of contents information.
- In the
get_information
function, find thehtml_document
variable - Under that variable, create a variable named
toc_search
- Set the
toc_search
variable to be a new dictionary:{}
- In the
toc_search
dictionary, add a key of"id"
with a value of"toc"
- Under that, create a variable named
toc
- Set the
toc
variable to equal a call tohtml_document.find
- For the
find
call, pass in"div"
for the element type as the first parameter - For the second parameter, pass in the
toc_search
variable - Under that, create a variable named
toc_text
- Set the
toc_text
variable to equaltoc.get_text()
- This will return the text content for the table of contents
- Finally, print out the
toc_text
variable
Run the code, enter a search term, and verify that the table of contents text appears!
3. Philosophy Game
This challenge will be quite challenging.
One interesting thing about Wikipedia articles is that if you click on the first non-parenthesized, non-italicized link on the page, and repeat the process, it will almost always lead to the Philosophy page. Check out this article for more information. There is also this video about the subject:
Recreate the philosophy game using a Python script! The program should work as follows:
- Ask the user for a page to visit
- Load the page using requests/BeautifulSoup
- Find the first link on the page
- Avoid references to the same page, and citation links
- For an extra challenge, avoid parenthetical links as well
- Load the page at the new URL
- Repeat the process!
Note that "Philosophy" may be unreachable if parenthetical links are included. It will still be interesting to see where the journey goes though!
4. Scraping Another Webpage
This challenge could be challenging depending on the website.
Experiment with different websites! Try to extract data from an interesting place. Try to make it dynamic; create programs that ask the user what to do. Theoretically, any website could be scraped; it's all about looking at the HTML and figuring out how to grab the information.