Chocolate Scraping with Beautiful Soup

https://www.codecademy.com/paths/data-science/tracks/learn-web-scraping/modules/beautiful-soup/projects/chocolate-scraping-with-beautiful-soup

In step 17, the prompt asks you to copy and paste a code snippet into the script, most likely to add a visualization enhancement of some kind. I had no problems up to this point, but I get a TypeError once the snippet is added.
The error is as follows:
Traceback (most recent call last):
  File "script.py", line 46, in <module>
    z = np.polyfit(df.CocoaPercentage, df.Rating, 1)
  File "<array_function internals>", line 6, in polyfit
  File "/usr/local/lib/python3.6/dist-packages/numpy/lib/polynomial.py", line 592, in polyfit
    x = NX.asarray(x) + 0.0
TypeError: must be str, not float

I understand why it raises an error: I forgot to wrap my cocoa percentages in float(). But that fix changes the values from str to float, while the TypeError seems to be asking for a str. I'm just curious why the error message seems to contradict what is actually needed.

import codecademylib3_seaborn
from bs4 import BeautifulSoup
import requests
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

webpage = requests.get('https://s3.amazonaws.com/codecademy-content/courses/beautifulsoup/cacao/index.html')

soup = BeautifulSoup(webpage.content, "html.parser")

#print(soup)
rating_attrb = soup.find_all(attrs={'class','Rating'})
ratings = []
for text in rating_attrb[1:]:
  ratings.append(float(text.get_text()))

plt.hist(ratings)
plt.show()

company_attrb = soup.find_all(attrs={'class','Company'})
company = []
for text in company_attrb[1:]:
  company.append(text.get_text())
d = {
  'Company': company, 
  'Rating': ratings
}
df = pd.DataFrame.from_dict(d)

company_rating = df.groupby('Company')['Rating'].mean()

print(company_rating.nlargest())

cocoa_attrb = soup.find_all(attrs={'class', 'CocoaPercent'})
cocoa_percents = []
for text in cocoa_attrb[1:]:
  cocoa_percents.append(text.get_text())
cocoa_percents = [x.split('%')[0] for x in cocoa_percents] #adding float(x.split('%')[0]) alleviates Type Error

plt.clf()
d.update({'CocoaPercentage':cocoa_percents})
df = pd.DataFrame.from_dict(d)

plt.scatter(df.CocoaPercentage, df.Rating)

#problem Code
z = np.polyfit(df.CocoaPercentage, df.Rating, 1)
line_function = np.poly1d(z)
plt.plot(df.CocoaPercentage, line_function(df.CocoaPercentage), "r--")

plt.show()

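The interpreter didn't know what you were actually trying to do. Because the column held unconverted str values instead of floats, numpy's NX.asarray(x) + 0.0 ended up adding a float to a str. With a str on the left, Python treats + as string concatenation, which requires the right operand to be a str as well, so the message complains that 0.0 is not a str. The message was generated from an incorrect assumption about your intention; converting the values to float is still the right fix.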
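A small sketch of why the message reads that way, using plain Python rather than the course's numpy version. The helper name describe_add is made up for illustration, and the exact wording of the first message depends on the Python version (3.6 prints "must be str, not float"; newer versions word it differently).

```python
def describe_add(left, right):
    """Return the TypeError message raised by left + right, or None if it works."""
    try:
        left + right
    except TypeError as err:
        return str(err)
    return None

# str on the left: Python tries concatenation and complains about the float
str_first = describe_add("70", 0.0)

# float on the left: the message is worded from the float's point of view
float_first = describe_add(0.0, "70")

# after conversion the addition simply works, so no message is produced
converted = describe_add(float("70"), 0.0)

print(str_first)
print(float_first)
print(converted)
```

The message is always written from the point of view of the left operand, which is why it appears to ask for a str even though the real fix is to convert to float.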

This part of your code can also be written more cleanly:

for text in cocoa_attrb[1:]:
  cocoa_percents.append(text.get_text())
cocoa_percents = [x.split('%')[0] for x in cocoa_percents] #adding float(x.split('%')[0]) alleviates Type Error

Use the str.strip() method instead, then convert to float, as follows:

for text in cocoa_attrb[1:]:
  percent = float(text.get_text().strip('%'))
  cocoa_percents.append(percent)
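To check the conversion without scraping the page, here is a minimal sketch. The sample strings are made up for illustration and are not taken from the site, and parse_percent is a hypothetical helper name.

```python
def parse_percent(text):
    """Turn a string such as '70%' into the float 70.0."""
    # strip surrounding whitespace first, then the trailing percent sign
    return float(text.strip().rstrip('%'))

# made-up sample values standing in for the scraped cell text
sample = ["63%", "70%", " 72.5% "]
cocoa_percents = [parse_percent(text) for text in sample]

print(cocoa_percents)  # [63.0, 70.0, 72.5]
```

With the values stored as floats, df.CocoaPercentage has a numeric dtype, so np.polyfit no longer ends up adding a float to a str.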