Chocolate Scraping with Beautiful Soup (Converting to Float)

https://www.codecademy.com/paths/data-science/tracks/learn-web-scraping/modules/beautiful-soup/projects/chocolate-scraping-with-beautiful-soup

I am stuck on step 7. I have successfully created the loop to get the text for all ratings in the table, but I cannot find a way to exclude the first element (the title of the column, “Rating”) from my ratings list. I have tried a remove command, tried excluding the element via “float(s([1:])”, and tried other wild ideas. I can create a list of text strings, but I need to convert all of the numeric strings to floats for the next steps of the project.

My current code for step 7 is:
for a in rating_links:
s = a.get_text()
s.remove(‘Ratings’) #this line will not remove the string
ratings.append(float(s)) #Cannot convert to float due to “Rating” string, with or without remove command

Any help you can provide would be appreciated.

Hi @rng009,

How about having the loop begin its iterations at index 1, so that it skips the title of the column, as follows?

r = soup.find_all(attrs={"class": "Rating"})
ratings = []
for rating in r[1:]:
  ratings.append(float(rating.get_text()))

Please let us know how this works out.
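For anyone who wants to try the slicing idea without fetching the page, here is a minimal sketch. The HTML string is a made-up stand-in for the real table (an assumption, not the page's actual markup), and the names `html` and `cells` are only illustrative.

```python
from bs4 import BeautifulSoup

# Made-up stand-in for the page's table; the real page is much longer.
html = """
<table>
  <tr><td class="Rating">Rating</td></tr>
  <tr><td class="Rating">3.75</td></tr>
  <tr><td class="Rating">2.75</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
cells = soup.find_all(attrs={"class": "Rating"})

# cells[0] is the header cell, so slice it off before converting to float.
ratings = [float(cell.get_text()) for cell in cells[1:]]
print(ratings)
```

The slice leaves the original list of tags untouched, so the header cell is still available at `cells[0]` if it is needed later.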

Edited on September 2, 2019 to add the following:

Your posted code is not formatted, so it does not exhibit its indentation. See “How to ask good questions (and get good answers)” for advice on formatting code for posting.

This worked very well. Thank you!

The hint for step 7 might lead some users astray. It reads, in part:

The first element of your tags list probably contains the header string "Ratings"

It is actually "Rating".

Since each of the columns on the Cacao Ratings page has a header, a good general approach for iterating through the actual data would be to skip the item at index 0.

The hint also includes this, which may be more helpful:

Start your loop at element 1 of the list instead.
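One way to apply the skip-index-0 approach to any column is a small helper, sketched below. The function name `column_text`, the `Company` class name, and the sample rows are all hypothetical and exist only for illustration; the real page's markup may differ.

```python
from bs4 import BeautifulSoup


def column_text(soup, class_name):
    """Return the text of every cell with the given class, minus the header cell."""
    cells = soup.find_all(attrs={"class": class_name})
    # Index 0 holds the column header, so start at index 1.
    return [cell.get_text() for cell in cells[1:]]


# Made-up sample table (not the real page's data).
html = """
<table>
  <tr><td class="Company">Company</td><td class="Rating">Rating</td></tr>
  <tr><td class="Company">Example Co</td><td class="Rating">3.75</td></tr>
  <tr><td class="Company">Sample Cacao</td><td class="Rating">2.75</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
companies = column_text(soup, "Company")
ratings = [float(text) for text in column_text(soup, "Rating")]
print(companies, ratings)
```

Keeping the float conversion outside the helper lets the same function serve text columns and numeric columns alike.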

I am trying the approach mentioned above but for some reason it is not adding anything to the ratings list. There has to be something I’m missing but just cannot figure out what it is. My code is below:

import codecademylib3_seaborn
from bs4 import BeautifulSoup
import requests
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

webpage = requests.get('https://s3.amazonaws.com/codecademy-content/courses/beautifulsoup/cacao/index.html')

soup = BeautifulSoup(webpage.content, 'html.parser')

#print(soup)

chocolate_ratings = soup.find_all(attrs={"class": "rating"})

ratings = []

for r in chocolate_ratings[1:]:
  ratings.append(float(r.get_text()))

print(ratings)

This just prints out an empty list.

Hi @method3835719473,

The name of the class with the data that you are attempting to access actually begins with an uppercase letter, but you have it in lowercase.
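A quick way to see the effect of the capitalization is sketched below, using a made-up two-row table rather than the real page. Matching a class by a plain string is case-sensitive, so the lowercase lookup finds nothing, and slicing an empty result still gives an empty list with no error raised.

```python
from bs4 import BeautifulSoup

# Made-up stand-in for the real table.
html = (
    '<table>'
    '<tr><td class="Rating">Rating</td></tr>'
    '<tr><td class="Rating">3.75</td></tr>'
    '</table>'
)
soup = BeautifulSoup(html, "html.parser")

lower = soup.find_all(attrs={"class": "rating"})  # wrong case: no matches
upper = soup.find_all(attrs={"class": "Rating"})  # matches both cells
print(len(lower), len(upper))

# With no matches, lower[1:] is empty too, so nothing is ever appended.
ratings = [float(cell.get_text()) for cell in lower[1:]]
print(ratings)
```

Printing the length of the `find_all` result before looping is a cheap check when a list unexpectedly comes back empty.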

Wow thanks. I’m an idiot.

@ method,
I experienced the same problem. It took me nearly an hour to figure out that I had left an ‘s’ in when setting the attrs argument.

Excuse me, everyone: why is it necessary to use [1:] in the for loop?
I noticed that this code didn’t work without it.

Thank you.

The first item in our list is the column title (“Rating” was one of them). [1:] allows us to skip over the column title and get to the data. You can see for yourself by printing rating_links[0], for example.
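To make the failure concrete, here is a small sketch that needs no web page at all. The list `cells` is a hypothetical stand-in for the text of each table cell; the point is only that `float()` cannot parse the header string.

```python
# Stand-in for the text of each cell in the Rating column (sample values).
cells = ["Rating", "3.75", "2.75"]

try:
    float(cells[0])  # the header is not a number
except ValueError as err:
    error_message = str(err)
    print(error_message)

# Skipping index 0 leaves only strings that float() can parse.
ratings = [float(text) for text in cells[1:]]
print(ratings)
```

Without the slice, the very first conversion raises a ValueError and the loop stops before any rating is appended.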