BeautifulSoup - Chocolate Project - Step 12

Hi,

I’m a bit stuck on step #12 of https://www.codecademy.com/courses/learn-web-scraping/projects/chocolate-scraping-with-beautiful-soup and I’m hoping someone can help get me unstuck. I can’t tell whether I am missing a step or simply getting the syntax wrong.

I am trying to output a DataFrame with the columns ‘Company’ and ‘Rating’. Instead, my code below prints only the row index and the corresponding average rating, without the company names.

Any help would be much appreciated. Thank you!

import codecademylib3_seaborn
from bs4 import BeautifulSoup
import requests
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

#make a request to get the webpage
webpage = requests.get("https://s3.amazonaws.com/codecademy-content/courses/beautifulsoup/cacao/index.html")

#turn that webpage into BeautifulSoup content
content = BeautifulSoup(webpage.content,"html.parser")

#find all of the names and ratings of the companies
company_names = content.find_all(attrs={"class": "Company"})
chocolate_ratings = content.find_all(attrs={"class": "Rating"})

#create the empty lists 
ratings = []
companies = []

#build the companies list, skipping the first value in the list
for x in range(1,len(company_names)):
  companies.append((company_names[x].get_text()))

#build the ratings list, skipping the first value in the list
for x in range(1,len(chocolate_ratings)):
  ratings.append(float(chocolate_ratings[x].get_text()))

#create a dictionary out of the lists - map the columns to the lists above
chocolate_dict = {"Company": companies, "Rating": ratings}

#load the dictionary into a data frame
companies_and_rating_df = pd.DataFrame.from_dict(chocolate_dict)

#group the list by company name and get the average rating
#this produces a dataframe with 416 values with proper Company and Rating columns 
rating_by_company = companies_and_rating_df.groupby('Company').Rating.mean().reset_index()

#get the highest rated companies 
#this produces a list of indexes and values - no longer company names and values
ten_best = rating_by_company.Rating.nlargest(10)

print(ten_best)

…if you have a single column but meant to have the whole DataFrame, then where did you pick out just that column? Don’t do that there, right?
If you work your way backwards, you can at the very least find out which operation behaved differently from what you expected.
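Working backwards on a tiny example makes the difference visible. The sketch below uses a small made-up DataFrame (the company names and ratings are hypothetical, not taken from the Codecademy page). Selecting the `Rating` column before calling `nlargest` leaves a Series keyed by row position, while calling `nlargest` on the DataFrame itself with the column name keeps the `Company` column.

```python
import pandas as pd

# Hypothetical data standing in for the scraped chocolate table
df = pd.DataFrame({
    "Company": ["A", "A", "B", "C", "C", "D"],
    "Rating": [3.0, 4.0, 3.25, 2.5, 3.5, 4.0],
})

# Average rating per company, flattened back into two columns by reset_index()
rating_by_company = df.groupby("Company").Rating.mean().reset_index()

# Picking the column first: the result is a Series indexed by row position only
column_only = rating_by_company.Rating.nlargest(2)

# Calling nlargest on the whole DataFrame keeps the Company column
whole_frame = rating_by_company.nlargest(2, "Rating")

print(column_only)
print(whole_frame)
```

In the project code, the same idea would mean replacing `rating_by_company.Rating.nlargest(10)` with `rating_by_company.nlargest(10, 'Rating')`.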

Hi, I think your question has probably been solved already. Since I couldn’t see an answer here, I’ve attached my code up to printing the top 10, in the hope that it helps anyone who reads this later.

import codecademylib3_seaborn
from bs4 import BeautifulSoup
import requests
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

#request to get html
webpage = requests.get('https://content.codecademy.com/courses/beautifulsoup/cacao/index.html')

#create soup object
soup = BeautifulSoup(webpage.content, 'html.parser')
#print(soup)

# get all of the tags that contain the ratings
rating_tags = soup.find_all(attrs={'class':'Rating'})

#loop all tags, get rating by using get_text
ratings=[]
for tg in rating_tags[1:]:
    ratings.append(float(tg.get_text()))
#print(ratings)

# get all of the tags that contains company
company_tags = soup.find_all(attrs={'class':'Company'})
companies = []
for tg in company_tags[1:]:
    companies.append(tg.get_text())

#create a dataframe with two columns company and rating
df = pd.DataFrame({'company':np.array(companies),'rating':np.array(ratings)})
#print(df.head())

#find top10 high rating companies
top10 = df.groupby('company').rating.mean().nlargest(10).reset_index()
print(top10)
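The order of the calls in the last step is what keeps the names: `nlargest` runs while the company names are still the index of the grouped Series, and `reset_index()` turns them back into a column afterwards. Here is a minimal offline sketch of the same chain, assuming hypothetical scraped values (no request is made to the Codecademy page) and taking the top 3 instead of the top 10 to keep it short. It also shows that plain lists work in the `DataFrame` constructor, so the `np.array` wrapping is optional.

```python
import pandas as pd

# Hypothetical scraped values; the real lists come from get_text() on the page
companies = ["Alpha", "Beta", "Alpha", "Gamma", "Beta", "Delta"]
ratings = [3.0, 2.75, 4.0, 3.25, 3.25, 2.5]

# Plain lists work directly; wrapping them in np.array is not required
df = pd.DataFrame({"company": companies, "rating": ratings})

# nlargest runs on a Series indexed by company, so the names survive reset_index()
top3 = df.groupby("company").rating.mean().nlargest(3).reset_index()
print(top3)
```

Calling `reset_index()` before `nlargest` and then selecting the `rating` column is what loses the names, as in the original question.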