Chocolate Scraping with BeautifulSoup scatter issue

Well, I can’t finish the exercise, even though I’m almost there.

Here’s the exercise:

https://www.codecademy.com/courses/learn-web-scraping/projects/chocolate-scraping-with-beautiful-soup

So, I couldn’t create a scatterplot of ratings; something is wrong with my code. Step 16 says:

plt.scatter(df.CocoaPercentage, df.Rating)
plt.show()

And here’s my whole code:

from bs4 import BeautifulSoup
import requests
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

webpage = requests.get("https://s3.amazonaws.com/codecademy-content/courses/beautifulsoup/cacao/index.html")

soup = BeautifulSoup(webpage.content, "html.parser")

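# Collect every rating; the first match is the header cell, so skip it.
ratings = []

chocolate_ratings = soup.find_all("td", {"class":"Rating"})
for rating in chocolate_ratings[1:]:
    ratings.append(float(rating.get_text()))

# Histogram of all ratings.
plt.hist(ratings)

# Collect the company names, again skipping the header cell.
company_tags = soup.select(".Company")
companies_names = []
for company in company_tags[1:]:
    companies_names.append(company.get_text())

# Collect the cocoa percentages as floats, without the trailing '%'.
cocoa_percents = []
cocoa_percent_tags = soup.select(".CocoaPercent")
for td in cocoa_percent_tags[1:]:
    percent = float(td.get_text().strip('%'))
    cocoa_percents.append(percent)

# The dictionary keys become the DataFrame's column names.
d = {"Empresa": companies_names, "Avaliação": ratings, "Porcentagem Cacau": cocoa_percents}
dataframe = pd.DataFrame.from_dict(d)

print(dataframe)

# Average rating and cocoa percentage per company.
mean_vals = dataframe.groupby("Empresa").mean()

# The ten companies with the highest average rating.
ten_best = mean_vals.nlargest(10, "Avaliação")
print(ten_best)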

How can I do this? When I try:

plt.scatter(dataframe.cocoa_percents, df.ratings)
plt.show()

It doesn’t work!

Thanks!

I have zero experience with what you are doing, but I know that your references to dataframe don’t look right. Have you tried:

plt.clf()
plt.scatter(dataframe["Porcentagem Cacau"], dataframe.Avaliação)
plt.show()

I ran your code with the above 3 lines added, and got this:
[image: scatter plot of Avaliação (rating) against Porcentagem Cacau (cocoa percentage)]

Hi @pauloeduardo52835740,

One problem with using the code suggested in step 16 as written is that it uses names that differ from the ones your code defines.

Note that step 15 offers this hint:

You can add the pairing "CocoaPercentage":cocoa_percents to the dictionary you used to create the DataFrame.

However, you have this:

d = {"Empresa": companies_names, "Avaliação": ratings, "Porcentagem Cacau": cocoa_percents}

Step 16 also assumes that your DataFrame is named df, but you have defined dataframe instead here:

dataframe = pd.DataFrame.from_dict(d)

@midlindner’s code succeeds by replacing names from the hint in step 16 with ones used in your code.
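
To make the name mismatch concrete, here is a minimal sketch of the two ways to reconcile your code with the hint. The three sample rows are made-up stand-ins for the scraped lists, and the variable names x_own, y_own, x_hint and y_hint are only for illustration. Either keep your own names and use them in the plot, or rename the columns so the hint's names exist.

```python
import pandas as pd

# Hypothetical sample values standing in for the scraped lists.
companies_names = ["A", "B", "C"]
ratings = [3.75, 4.0, 3.25]
cocoa_percents = [70.0, 75.0, 62.0]

d = {"Empresa": companies_names, "Avaliação": ratings, "Porcentagem Cacau": cocoa_percents}
dataframe = pd.DataFrame.from_dict(d)

# The DataFrame only knows the dictionary keys as column names,
# not the names of the Python lists that supplied the values.
print(list(dataframe.columns))

# Option 1: keep your own names and select the columns by those names.
x_own = dataframe["Porcentagem Cacau"]
y_own = dataframe["Avaliação"]

# Option 2: rename the columns so that the names from the hint exist.
df = dataframe.rename(
    columns={"Avaliação": "Rating", "Porcentagem Cacau": "CocoaPercentage"}
)
x_hint = df.CocoaPercentage
y_hint = df.Rating
```

Either pair can then be passed to plt.scatter(x, y), for example plt.scatter(x_own, y_own).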

Thank you so much! It works now.

Actually, I have another question. Why should I write this inside the scatter function:

dataframe.Avaliação

Shouldn’t I write it like this:

dataframe["Avaliação"]

I ask because the first argument is dataframe["Porcentagem Cacau"] and the second one is dataframe.Avaliação.

Thanks

dataframe.Avaliação works because the column name is a single term with no spaces, so it is a valid Python attribute name.
dataframe["Avaliação"] works as well. The bracket syntax is optional there, but it is required for dataframe["Porcentagem Cacau"] because “Porcentagem Cacau” contains a space.
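
Here is a small sketch of the difference between the two access styles. The sample rows are made up, and the second DataFrame (clash, with a column named count) is a hypothetical example added to show one more case where dot access does not give you the column.

```python
import pandas as pd

# Hypothetical sample rows; the real values come from the scraped page.
dataframe = pd.DataFrame({
    "Empresa": ["A", "B", "C"],
    "Avaliação": [3.75, 4.0, 3.25],
    "Porcentagem Cacau": [70.0, 75.0, 62.0],
})

# "Avaliação" is a valid Python identifier, so dot access and
# bracket access both return the same column.
print("Avaliação".isidentifier())
by_attribute = dataframe.Avaliação
by_bracket = dataframe["Avaliação"]
print(by_attribute.equals(by_bracket))

# "Porcentagem Cacau" contains a space, so it is not an identifier and
# cannot be written after a dot; bracket access is the only option.
print("Porcentagem Cacau".isidentifier())
cocoa = dataframe["Porcentagem Cacau"]

# Hypothetical extra case: a column named like a DataFrame method.
# Dot access finds the method first, so brackets are needed here too.
clash = pd.DataFrame({"count": [1, 2, 3]})
print(callable(clash.count))
print(list(clash["count"]))
```

Because bracket access works for every column name, it is the safer habit, and dot access is best treated as a shortcut for simple names.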
