Solution: Chocolate Scraping with Beautiful Soup (Project)

I was doing this project and noticed that, unlike the other projects, there is no video walkthrough available for this one. So I thought posting my solution might help anyone who is having trouble completing it. :blush:

from bs4 import BeautifulSoup
import requests
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

webpage = requests.get('https://s3.amazonaws.com/codecademy-content/courses/beautifulsoup/cacao/index.html')
soup = BeautifulSoup(webpage.content, "html.parser")

ratings_data = soup.find_all(attrs={'class': 'Rating'})
ratings = []
# Skip the first match, which is the column header
for rating in ratings_data[1:]:
    ratings.append(float(rating.string))
print(ratings)

plt.hist(ratings)
plt.show()

company_data = soup.select('.Company')
companies = []
for company in company_data[1:]:
    companies.append(company.string)
print(companies)

# Named "data" rather than "dict" to avoid shadowing the built-in
data = {
    'Company': companies,
    'Rating': ratings
}
df = pd.DataFrame.from_dict(data)
print(df.head())

avg_ratings = df.groupby('Company').Rating.mean()
top_ten = avg_ratings.nlargest(10)
print(top_ten)

cocoa_data = soup.select('.CocoaPercent')
cocoa_pcts = []
for cocoa_pct in cocoa_data[1:]:
    # Drop the trailing "%" before converting to a number
    cocoa_pcts.append(int(float(cocoa_pct.string[:-1])))
print(cocoa_pcts)

df['CocoaPercentage'] = cocoa_pcts
print(df.head())

plt.cla()
plt.scatter(df.CocoaPercentage, df.Rating)
z = np.polyfit(df.CocoaPercentage, df.Rating, 1)
line_function = np.poly1d(z)
plt.plot(df.CocoaPercentage, line_function(df.CocoaPercentage), "r--")
plt.show()
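
If the live page is slow or you just want to check the parsing logic, here is a rough offline sketch of the same scraping steps. The HTML snippet is a made-up miniature of the table (I'm assuming the header cells carry the same classes as the data cells, which is why the first match gets skipped), and the `column` helper is my own name, not something from the project.

```python
from bs4 import BeautifulSoup

# Hypothetical miniature of the cacao table; the real page has many more rows.
SAMPLE_HTML = """
<table>
  <tr><td class="Company">Company</td><td class="Rating">Rating</td><td class="CocoaPercent">Cocoa Percent</td></tr>
  <tr><td class="Company">A. Morin</td><td class="Rating">3.75</td><td class="CocoaPercent">63%</td></tr>
  <tr><td class="Company">Acalli</td><td class="Rating">3.5</td><td class="CocoaPercent">70.5%</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")


def column(soup, css_class):
    """Return the text of every cell with the given class, minus the header cell."""
    cells = soup.select("." + css_class)
    # The first match is assumed to be the column header, so drop it.
    return [cell.get_text(strip=True) for cell in cells[1:]]


companies = column(soup, "Company")
ratings = [float(text) for text in column(soup, "Rating")]
# Strip the trailing "%" before converting; kept as float here instead of int.
cocoa_pcts = [float(text.rstrip("%")) for text in column(soup, "CocoaPercent")]

print(companies)
print(ratings)
print(cocoa_pcts)
```

The three lists line up row by row, so they can go straight into the DataFrame the same way as in the solution.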

Definitely love that! Thanks!

Thank you - much appreciated. This wasn’t the best-written of modules.

Add my thanks! When you do that, it helps give us a reference point to check our own efforts and promotes better learning.

Does this run super slow for anyone else? It pretty much breaks my browser at the histogram part. I really wish this module had a video solution :(

I ran into the same issue, it was super slow…
I don’t know, maybe it has something to do with the backend or the server the data is being fetched from.

Hey all,
If you run this module in a Jupyter notebook, it’s quick and you don’t need to go hunting for your graphs!

Life saving is what this is

It doesn’t run. LOL
Returns an error for the line:
ratings_data = soup.find_all(attrs={‘class’: ‘Rating’})
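
One possible cause, assuming the code was copied straight from the rendered post: the forum turns plain quotes into typographic ones (‘ ’ “ ”), and Python rejects those as invalid characters. A minimal sketch for straightening them in pasted code is below; `straighten_quotes` is a made-up helper name, not part of any library.

```python
# Map the typographic quotes that forum software tends to substitute
# back to the ASCII quotes Python expects.
TYPOGRAPHIC_TO_ASCII = str.maketrans({
    "\u2018": "'",  # left single quote
    "\u2019": "'",  # right single quote
    "\u201c": '"',  # left double quote
    "\u201d": '"',  # right double quote
})


def straighten_quotes(source):
    """Return the source text with typographic quotes replaced by ASCII ones."""
    return source.translate(TYPOGRAPHIC_TO_ASCII)


pasted = "ratings_data = soup.find_all(attrs={\u2018class\u2019: \u2018Rating\u2019})"
fixed = straighten_quotes(pasted)
print(fixed)
```

Retyping the quotes by hand works just as well. If the error persists after that, the full traceback would help narrow it down.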