I am doing the ‘Chocolate Scraping with Beautiful Soup’ project (Web Scraping course, Codecademy).
I'm currently on step 7. Instead of scraping one column at a time, I tried to write more general code that scrapes every column of the table at once and then selects a specific column, like so:
`ratings = df['Rating']` (see the full code below).
First of all, is this a good way of doing it?
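For comparison, here is a minimal sketch of the whole-table approach. It is not the code from this post. It assumes a small, hypothetical inline HTML table in place of the real page (which the script fetches with `requests.get(url)`), and it assumes the first `<tr>` holds the column titles. It uses `select_one` (which returns a single tag rather than a list), `find_all('tr')` to get every row, and `get_text(strip=True)` to clean each cell.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical, simplified stand-in for the project page's table.
# The real script would use requests.get(url).content instead.
html = """
<table class="cacaoTable">
  <tr><td>Company</td><td>Origin</td><td>Rating</td></tr>
  <tr><td>Maker A</td><td>Origin A</td><td>3.75</td></tr>
  <tr><td>Maker B</td><td>Origin B</td><td>2.75</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# select_one returns the first matching Tag (or None), not a list
table = soup.select_one("table.cacaoTable")

# find_all (not find) returns every row; assume the first row holds the titles
rows = table.find_all("tr")
column_names = [td.get_text(strip=True) for td in rows[0].find_all("td")]

# Every remaining row becomes one list of cleaned cell strings
data = [[td.get_text(strip=True) for td in tr.find_all("td")] for tr in rows[1:]]

df = pd.DataFrame(data, columns=column_names)
ratings = df["Rating"]
print(ratings)
```

On the real page the header cells may contain extra text or markup, so printing `df.columns` before selecting `df['Rating']` is a cheap check.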
Unfortunately, I can't test my version yet because I'm getting this error:
```
File "script.py", line 15, in
    table = soup.select('table.cacaoTable')
IndexError: list index out of range
```
I thought starting at index 1 would work, since that skips over the column title.
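If the index 1 was applied to the result of `soup.select('table.cacaoTable')` (the traceback line shows no index, so this is an assumption), the `IndexError` would be expected: `select()` returns a list of matching tables, not rows, so when the page has only one matching table, index 1 does not exist. A minimal sketch with a hypothetical inline table:

```python
from bs4 import BeautifulSoup

# Hypothetical one-table page standing in for the real project page
html = """
<table class="cacaoTable">
  <tr><td class="Rating">Rating</td></tr>
  <tr><td class="Rating">3.75</td></tr>
  <tr><td class="Rating">2.75</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# select() returns a list of every matching <table>; here there is only one
tables = soup.select("table.cacaoTable")
print(len(tables))  # 1

try:
    tables[1]  # asks for a second table, which does not exist
except IndexError as err:
    print("IndexError:", err)

# Take the single table with [0] (or use select_one), then skip the header row
table = tables[0]
data_rows = table.find_all("tr")[1:]
print([tr.get_text(strip=True) for tr in data_rows])  # ['3.75', '2.75']
```

In other words, skipping the header belongs at the row level (for example `find_all('tr')[1:]`), not on the list of tables.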
Also, how would I apply `float()` to the ratings data with this more general approach?
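Because `get_text()` always returns strings, a scraped `Rating` column starts out as text. A minimal sketch on a hypothetical DataFrame (placeholder values, not real data) showing three ways to get numbers: `Series.apply(float)` calls `float()` on each value, `astype(float)` does the same conversion in one call, and `pd.to_numeric(..., errors='coerce')` turns unparseable entries into `NaN` instead of raising.

```python
import pandas as pd

# Hypothetical scraped data: get_text() yields strings, so Rating holds text
df = pd.DataFrame({
    "Company": ["Maker A", "Maker B", "Maker C"],
    "Rating": ["3.75", "2.75", "3"],
})

# Literally apply Python's float() to every value in the column
ratings = df["Rating"].apply(float)

# Same result in one vectorized call
ratings_astype = df["Rating"].astype(float)

# Lenient variant: strings that cannot be parsed become NaN instead of raising
ratings_coerced = pd.to_numeric(df["Rating"], errors="coerce")

# Write the numeric column back so later steps (histograms, means) see floats
df["Rating"] = ratings
print(df["Rating"].mean())
```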
```python
import codecademylib3_seaborn
from bs4 import BeautifulSoup as bs
import requests
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

#leave out columns REF(3),Review Date (4),Company Location (6)

url = 'https://content.codecademy.com/courses/beautifulsoup/cacao/index.html'
r = requests.get(url)
soup = bs(r.content, 'html.parser')

table = soup.select('table.cacaoTable')
columns = table.find('tbody').find_all('td')
column_names = [c.string for c in columns]
table_rows = table.find('tr')
l = []
for tr in table_rows:
    td = tr.find_all('td')
    row = [str(tr.get_text()).strip() for tr in td]  #converting to python string object then use .strip() to clean
    l.append(row)

df = pd.DataFrame(l, columns=column_names)
ratings = df['Rating']
print(ratings)
```
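The comment in the script lists columns to leave out. With the whole table in a DataFrame, one option is to drop them by name with `DataFrame.drop(columns=...)` instead of tracking positions. A minimal sketch; the column names are taken from that comment plus hypothetical placeholders, and the real header text on the page may differ:

```python
import pandas as pd

# Hypothetical DataFrame shaped like the scraped table (all values are strings)
df = pd.DataFrame(
    [["Maker A", "Origin A", "1001", "2016", "70%", "Country A", "3.5"]],
    columns=["Company", "Origin", "REF", "Review Date",
             "Cocoa Percent", "Company Location", "Rating"],
)

# Drop the columns the script's comment says to leave out, by name
df = df.drop(columns=["REF", "Review Date", "Company Location"])
print(df.columns.tolist())  # ['Company', 'Origin', 'Cocoa Percent', 'Rating']
```

Dropping by name keeps working even if the column order on the page changes.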
Thank you for any help!!