Chocolate Scraping with Beautiful Soup

Hey!
I’m doing the Chocolate Scraping with Beautiful Soup project in the Data Science path (https://www.codecademy.com/paths/data-science/tracks/learn-web-scraping/modules/beautiful-soup/projects/chocolate-scraping-with-beautiful-soup) and am stuck.
I was able to create a Beautiful Soup object and find the appropriate tag but now I don’t know how to loop through the tags and put the values in a list.

import requests
from bs4 import BeautifulSoup

webpage = requests.get('https://s3.amazonaws.com/codecademy-content/courses/beautifulsoup/cacao/index.html')
soup = BeautifulSoup(webpage.content, "html.parser")
# Find every tag whose class is "Rating"
all_td_elements = soup.find_all(attrs={"class": "Rating"})
ratings = []
print(all_td_elements)

This prints out a list of tags containing values like 3.75.
My instructions now are:

“Loop through the ratings tags and get the text contained in each one. Add it to the ratings list.
As you do this, convert the rating to a float, so that the ratings list will be numerical. This should help with calculations later.”

I’m grasping at straws with the loop and tried the following:

for rating in ratings:
  ratings = all_td_elements.select("Rating")[1].get_text()

I know that I have to convert the ratings into floats, but I don’t know how to do this or at what point in the code.
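A minimal sketch of the conversion step, using plain strings as hypothetical stand-ins for the text that `get_text()` returns: `float()` turns each string into a number, so the conversion can happen right when each value is appended. `float()` ignores surrounding whitespace but raises `ValueError` on non-numeric text such as a column heading.

```python
# Hypothetical stand-ins for the text returned by tag.get_text()
texts = ["3.75", " 2.75\n", "Rating"]

numbers = []
for text in texts:
    try:
        # float() ignores leading and trailing whitespace
        numbers.append(float(text))
    except ValueError:
        # Non-numeric text (e.g. a heading) cannot be converted
        print("skipped:", repr(text))

print(numbers)  # [3.75, 2.75]
```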

Can someone please point me in the right direction? There’s no walkthrough for this project, and the hints aren’t very instructive.

Thanks!

  • Ian

Hi, @jewsnackman,

One problem is that you are repeatedly overwriting ratings within this loop:

for rating in ratings:
  ratings = all_td_elements.select("Rating")[1].get_text()
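To see why overwriting is a problem, here is a small sketch with a hypothetical list of strings standing in for the scraped text: assigning to the same name inside the loop keeps only the last value, while `append` collects every value.

```python
values = ["3.75", "2.75", "3.0"]  # hypothetical scraped text

# Assigning on each pass replaces the previous value
last = None
for value in values:
    last = float(value)
print(last)  # 3.0

# Appending keeps every value
collected = []
for value in values:
    collected.append(float(value))
print(collected)  # [3.75, 2.75, 3.0]
```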

Edited on June 30, 2020 to add the following:

Instead of looping through ratings, you need to loop through all_td_elements. Take into account that at the top of the table there are column headings that do not contain actual data.
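Both points can be sketched with a plain list of strings as a hypothetical stand-in for the tags’ text: a loop over the empty `ratings` list never runs its body, while a slice that starts at index 1 skips the heading at index 0.

```python
# Hypothetical text of each tag; the first entry is the column heading
cell_texts = ["Rating", "3.75", "2.75"]

ratings = []
iterations = 0
for rating in ratings:  # ratings is empty, so the body never runs
    iterations += 1
print(iterations)  # 0

# Slicing from index 1 leaves out the heading
for text in cell_texts[1:]:
    ratings.append(float(text))
print(ratings)  # [3.75, 2.75]
```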

Hey, thanks for the response!

I’ve changed the loop to this:

for rating in ratings:
 all_td_elements = soup.select("Rating")[1].get_text() 
ratings = all_td_elements.append

The message I got after print(all_td_elements) is:

<built-in method append of ResultSet object at 0x7ff6ccd5cdb8>
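That output is the printed form of a method object: `.append` written without parentheses refers to the method itself instead of calling it, so nothing gets added. A small sketch with an ordinary list (a Beautiful Soup `ResultSet` is a list subclass, so it behaves the same way):

```python
items = []

# Without parentheses: a reference to the method, nothing is added
method = items.append
print(method)  # <built-in method append of list object at 0x...>
print(items)   # []

# With parentheses: the method is called and the value is added
items.append(3.75)
print(items)   # [3.75]
```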

Update: I realized that the last line of the loop wasn’t indented. Now I’m getting an empty list after calling print(ratings).

Here you initialized ratings to an empty list, which is the right thing to do:

ratings = []

However, here you are looping through that empty list, which does not accomplish anything:

for rating in ratings:

Instead, loop through a slice of all_td_elements that excludes the first element, at index 0, which contains the column headings. The loop header, which begins with the element at index 1, would be:

for rating in all_td_elements[1:]:

Inside the loop, append to the ratings list the text from each rating converted to a float, as follows:

  ratings.append(float(rating.get_text()))
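Putting the pieces together, here is a self-contained sketch that runs without a network connection. The HTML string is a hypothetical, much-simplified stand-in for the cacao page (only the `class="Rating"` cells matter here); for the real page, build `soup` from `requests.get(...).content` as in the original question.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the cacao ratings table
html = """
<table>
  <tr><td class="Rating">Rating</td></tr>
  <tr><td class="Rating">3.75</td></tr>
  <tr><td class="Rating">2.75</td></tr>
  <tr><td class="Rating">3</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
all_td_elements = soup.find_all(attrs={"class": "Rating"})

ratings = []
# Skip index 0, which holds the column heading, and convert the rest
for rating in all_td_elements[1:]:
    ratings.append(float(rating.get_text()))

print(ratings)  # [3.75, 2.75, 3.0]
```

The same loop can also be written as a list comprehension: `ratings = [float(td.get_text()) for td in all_td_elements[1:]]`.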

Thanks for the help! The looping process here makes much more sense now that I can see it and follow the steps.
Cheers!

  • Ian