Chocolate Scraping with BeautifulSoup

I need help with the BeautifulSoup project. Here is the link:
https://www.codecademy.com/paths/data-science/tracks/learn-web-scraping/modules/beautiful-soup/projects/chocolate-scraping-with-beautiful-soup
I don't know how to do #7: "Loop through the ratings tags and get the text contained in each one. Add it to the ratings list.

As you do this, convert the rating to a float, so that the ratings list will be numerical. This should help with calculations later."
Any advice on how to set up the loop to get all the ratings and change them to floats would be helpful.

Hi, welcome to the forums!

Without seeing your code, you would do something like this:

  • write a for-loop that iterates through the ratings tags
  • extract the desired text from each tag
  • convert the extracted text into a float (e.g. float('5.6'))
  • add the converted rating to the ratings list.

Which of these do you have trouble with?
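
The four steps above can be sketched roughly as follows. This is only a minimal illustration: rating_texts is a hypothetical list of already-extracted strings with made-up values, standing in for the text of the real tags on the page.

```python
# Hypothetical sample data: strings as they might come out of the page.
rating_texts = [" 3.75 ", "2.75", "3"]

ratings = []
for text in rating_texts:      # 1. iterate through the items
    cleaned = text.strip()     # 2. keep only the desired text
    value = float(cleaned)     # 3. convert the text to a float
    ratings.append(value)      # 4. add the float to the ratings list

print(ratings)  # [3.75, 2.75, 3.0]
```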

This is what I have so far on the project:
import requests
from bs4 import BeautifulSoup

webpage = requests.get('https://s3.amazonaws.com/codecademy-content/courses/beautifulsoup/cacao/index.html')
soup = BeautifulSoup(webpage.content, 'html.parser')
rating_links = soup.find_all(attrs={'class': 'Rating'})
ratings = []
rating = rating_links[1].get_text()
rating = float(rating)
print(rating)

That works, but I can't figure out how to write a loop that finds every rating after rating_links[1] and then inserts all of them into ratings.

One type of basic for-loop:

test = ['hi', 'how', 'are', 'you', 'doing?']

for i in test:
	print(i)
#output
'''
hi
how
are
you
doing?
'''

#or same thing a different way
for i in range(0, len(test)):
	print(test[i])

#output
'''
hi
how
are
you
doing?
'''

#note if you just print i here:

for i in range(0, len(test)):
	print(i)

#output
'''
0
1
2
3
4
'''

Actually, you need to find every rating, beginning with rating_links[1].

To do that, you need a for loop that looks at each rating within the slice rating_links[1:]. That slice excludes rating_links[0], which contains the column headings that should not be processed as numerical data.
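
As a small illustration of why the slice matters, here is a sketch using a plain list of strings. The values in cells are made up and only stand in for the text of each tag; the first entry plays the role of the column heading.

```python
# Hypothetical stand-in for the text of each tag; index 0 is the heading.
cells = ["Rating", "3.75", "2.75", "3.5"]

# The heading cannot be converted to a number.
try:
    float(cells[0])
    header_is_numeric = True
except ValueError:
    header_is_numeric = False

# Slicing from index 1 skips the heading and keeps everything after it.
data_cells = cells[1:]
ratings = [float(cell) for cell in data_cells]

print(header_is_numeric)  # False
print(ratings)            # [3.75, 2.75, 3.5]
```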

I can't figure out how to get "rating_links[1:]" to work.
I tried
for rating in rating_links:
  rating = rating_links[1].get_text()
  ratings[rating] = rating_links[1:].get_text().split('|')

and I know that's wrong, but I'm stuck. Any time I try to use range, it tells me I'm crazy.

Hi @presragan,

Your source of data for the iteration is rating_links[1:]. Accordingly, this would be a good loop header:

for rating in rating_links[1:]:

Given that header, your loop variable is rating, which holds one tag per iteration. The text of that tag needs to be converted to a float and appended to the list. Therefore, the loop should contain this:

  ratings.append(float(rating.get_text()))
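
Put together, the loop might look like the sketch below. To keep it runnable without a network request, it parses a small inline HTML string; that markup and its values are made up for illustration and are only assumed to resemble the real page.

```python
from bs4 import BeautifulSoup

# Made-up markup, assumed to resemble the project page:
# the first "Rating" cell is the column heading.
html = """
<table>
  <tr><td class="Rating">Rating</td></tr>
  <tr><td class="Rating">3.75</td></tr>
  <tr><td class="Rating">2.75</td></tr>
  <tr><td class="Rating">3</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
rating_links = soup.find_all(attrs={"class": "Rating"})

ratings = []
for rating in rating_links[1:]:                # skip the heading cell
    ratings.append(float(rating.get_text()))  # tag text -> float

print(ratings)  # [3.75, 2.75, 3.0]
```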

Thank you! That worked. I had been stuck on that forever. Nice to finally move on to the next part.

One last question, though. #14 of the same project asks you to do the same thing for the percentages, and I put
cocoa_links = soup.select(".CocoaPercent")
cocoa_percents = []
for td in cocoa_links[1:]:
  percent = int(td.get_text().strip('%'))
  cocoa_percents.append(percent)

and I get a ValueError when I try to use it. Can you see anything wrong with it?

Do any of the values contain decimal points? If so, you’ll need to convert them to float instead of directly to int.
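
A quick sketch of the difference, using made-up percentage strings rather than values from the real page: int() rejects a string that contains a decimal point, while float() accepts both forms.

```python
# Hypothetical sample strings; the real page's values may differ.
percent_texts = ["63%", "70%", "72.5%"]

# int() raises ValueError for a string with a decimal point.
try:
    int("72.5")
    int_accepts_decimal = True
except ValueError:
    int_accepts_decimal = False

cocoa_percents = []
for text in percent_texts:
    number = text.strip().strip("%")      # drop whitespace and the % sign
    cocoa_percents.append(float(number))  # float() handles "63" and "72.5"

print(int_accepts_decimal)  # False
print(cocoa_percents)       # [63.0, 70.0, 72.5]
```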

That's what I had to do. Thanks again.

Hi, I’m sorry to necro a board that’s a few months old, but I’m having trouble getting past the exact same step on the same problem myself.
Following the solutions in this thread, I’ve gotten this so far:

#1. Import libraries.
import codecademylib3_seaborn
from bs4 import BeautifulSoup
import requests
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

#2. Make request to site, GET raw HTML.
webpage_response = requests.get('https://content.codecademy.com/courses/beautifulsoup/cacao/index.html')

#2.5 Store CONTENT
webpage = webpage_response.content

#3. Create BeautifulSoup object from HTML.
soup = BeautifulSoup(webpage, "html.parser")
# (check)
#print(soup)

#4. Going to put all ratings into a list...
#Looks like the class "Rating" is in <td> tags.
td_ratings = soup.find_all(attrs={"class": "Rating"})
# (check)
#print(td_rating)

#5. Create empty list to store those ratings in.
ratings = []
for rating in td_ratings[1:]:
  ratings.append(float(td_ratings.get_text()))

print(ratings)

Parts 4 and 5 in my code are the relevant ones (I think) for this step. Anyway, when I try to run this code, it tells me that I'm treating a list of items as a single item, so I tried changing .find_all() to .find() (but I don't think this is the solution, since the hint for the previous step said to use .find_all()), and then I get an error saying
TypeError: unhashable type: 'slice'
for my
for rating in td_ratings[1:]: line.

I've tried making my for loop look like this too…

for rating in td_ratings:
  ratings_text = td_ratings[1:].get_text()
  ratings.append(float(ratings_text))

which results in an error. This works…

for rating in td_ratings:
  ratings_text = td_ratings[1].get_text()
  ratings.append(float(ratings_text))

but of course, I just get a list of several hundred 3.75s, which is not what I'm actually trying to do. Please help, I'm still learning. :disappointed_relieved:

Hi @joyce3216280928,

Check this for loop:

for rating in td_ratings[1:]:

  ratings.append(float(td_ratings.get_text()))

You are attempting to call the get_text method on td_ratings, which is the entire list of tags. Consider calling it on each rating, the loop variable, instead.
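
To make the difference concrete, here is a small sketch. It uses a made-up inline HTML string instead of the real page, so the markup and values are assumptions. Calling get_text on the whole result of find_all fails, while calling it on each tag inside the loop works.

```python
from bs4 import BeautifulSoup

# Made-up markup; the first "Rating" cell stands in for the heading.
html = """
<table>
  <tr><td class="Rating">Rating</td></tr>
  <tr><td class="Rating">3.75</td></tr>
  <tr><td class="Rating">2.5</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
td_ratings = soup.find_all(attrs={"class": "Rating"})

# The result of find_all is a list of tags, not a single tag,
# so it has no get_text method of its own.
try:
    td_ratings.get_text()
    list_call_failed = False
except AttributeError:
    list_call_failed = True

# Each item in the list is a tag, and a tag does have get_text.
ratings = []
for rating in td_ratings[1:]:
    ratings.append(float(rating.get_text()))

print(list_call_failed)  # True
print(ratings)           # [3.75, 2.5]
```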

Wow, thank you so much! These forums are the greatest. And more importantly I know where and how I went wrong. :blush:
