Web Scraping Chocolate Webpage

Hello,
I am doing the ‘Chocolate Scraping with Beautiful Soup’ project:
Web Scraping | Codecademy

I'm currently on step 7 and tried to write more general code that scrapes all columns of the table at once and then selects a specific column, like so: ratings = (df['Rating']) (see code below).

First of all, is this a good way of doing it?
Unfortunately I can’t test it quite yet because I’m getting this error:

File "script.py", line 15, in <module>
table = soup.select('table.cacaoTable')[1]
IndexError: list index out of range

I thought starting at index 1 would work since it skips over the column title.

Also, how would I apply the float() function to the ratings data if I am doing it this more general way?

import codecademylib3_seaborn
from bs4 import BeautifulSoup as bs
import requests
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

#leave out columns REF(3),Review Date (4),Company Location (6)

url = 'https://content.codecademy.com/courses/beautifulsoup/cacao/index.html'

r = requests.get(url)
soup = bs(r.content, 'html.parser')

table = soup.select('table.cacaoTable')[1]

columns = table.find('tbody').find_all('td')
column_names = [c.string for c in columns]

table_rows = table.find('tr')
l = []

for tr in table_rows:
  td = tr.find_all('td')
  row = [str(tr.get_text()).strip() for tr in td] #converting to python string object then use .strip() to clean
  l.append(row)

df = pd.DataFrame(l, columns=column_names)
ratings = (df['Rating'])
print(ratings)

Thank you for any help!!
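
A note on the IndexError, as a rough sketch rather than a definitive diagnosis: in a CSS selector, `table.cacaoTable` matches a table with class="cacaoTable", while `table#cacaoTable` matches one with id="cacaoTable". Assuming the page marks the table up with an id (which the id-based lookup later in this thread suggests), `select()` returns an empty list and any index fails. Also, `select()` returns a list of matching tables, so `[1]` would mean "the second table", not "skip the header row". The HTML below is a made-up stand-in, not the real page.

```python
from bs4 import BeautifulSoup

# Tiny stand-in for the page; the markup and values are invented for illustration.
html = """
<table id="cacaoTable">
  <tr><td>Company</td><td>Rating</td></tr>
  <tr><td>Acme Cacao</td><td>3.75</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

by_class = soup.select("table.cacaoTable")  # '.' looks for class="cacaoTable"
by_id = soup.select("table#cacaoTable")     # '#' looks for id="cacaoTable"

print(len(by_class))  # nothing carries that class in this sample
print(len(by_id))

# [0] is the first matching table; the header row is skipped
# afterwards by slicing the list of rows, not the list of tables.
table = by_id[0]
data_rows = table.find_all("tr")[1:]
first_row = [td.get_text().strip() for td in data_rows[0].find_all("td")]
print(first_row)
```

If the real page does use a class instead, the same slicing idea still applies; only the selector changes.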

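This issue is now solved. I switched to looking the table up by its id with soup.find, and used find_all('tr') so that I get every row instead of only the first one. Here is my updated code: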

import codecademylib3_seaborn
from bs4 import BeautifulSoup as bs
import requests
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

#leave out columns REF(3),Review Date (4),Company Location (6)

url = 'https://content.codecademy.com/courses/beautifulsoup/cacao/index.html'

r = requests.get(url)
soup = bs(r.content, 'html.parser')



table = soup.find("table", { "id" : "cacaoTable" })
# print(table)

columns = table.find('tr').find_all('td')
column_names = [c.string for c in columns]

table_rows = table.find_all('tr')
l = []

for tr in table_rows[1:]:  # skip the first row, which holds the column titles
  cells = tr.find_all('td')
  row = [cell.get_text().strip() for cell in cells]  # get_text() already returns a str; .strip() removes surrounding whitespace
  l.append(row)
   
df = pd.DataFrame(l, columns=column_names)
ratings = df['Rating']
print(ratings)

print(df)

companies = df['Company']
print(companies)
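
On the earlier float() question: since every scraped cell is a string, one possible approach is to convert the whole column at once instead of calling float() cell by cell. The sketch below uses `pd.to_numeric`; the rows and the "n/a" value are made up for illustration and are not taken from the real table.

```python
import pandas as pd

# Invented rows standing in for the scraped table; all values are strings.
rows = [["Acme Cacao", "3.75"], ["Bean Co", "2.5"], ["Choco Ltd", "n/a"]]
df = pd.DataFrame(rows, columns=["Company", "Rating"])

# errors="coerce" turns anything that is not a number into NaN
# instead of raising, which a plain float() call would do on "n/a".
df["Rating"] = pd.to_numeric(df["Rating"], errors="coerce")

print(df["Rating"].dtype)
print(df["Rating"].mean())  # NaN values are skipped by default
```

If every cell is known to be a clean number, `df['Rating'].astype(float)` should work as well. With `errors="coerce"`, a header row that ends up among the data rows would simply become NaN rather than break the conversion.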