Help for Markov_Chain(fetch_data)


#1

Hello, I have started on the final project and ran into a roadblock. I have successfully fetched the data I want, but it still has HTML tags:

from bs4 import BeautifulSoup
import re
import requests

website = requests.get("https://www.techradar.com/news/computing-components/graphics-cards/best-graphics-cards-1291458").text
soup = BeautifulSoup(website, 'html.parser')
stats = soup.find_all('div', class_ = 'icon icon-plus_circle _hawk')
for review in stats:
	print review

Result:

<div class="icon icon-plus_circle _hawk">Solid 4K performance </div>
<div class="icon icon-plus_circle _hawk">Easy to overclock </div>
<div class="icon icon-plus_circle _hawk"> High fps 4K gaming </div>
<div class="icon icon-plus_circle _hawk"> Spearheading ray tracing revolution </div>
<div class="icon icon-plus_circle _hawk">Masters 1440p gaming</div>
<div class="icon icon-plus_circle _hawk">Vastly improved overall performance</div>
<div class="icon icon-plus_circle _hawk"> Maxed out 1080p performance</div>
<div class="icon icon-plus_circle _hawk">Impressive benchmark results</div>
<div class="icon icon-plus_circle _hawk">GPU tuning control</div>
<div class="icon icon-plus_circle _hawk">World’s smallest 1080 Ti</div>
<div class="icon icon-plus_circle _hawk">SLI support</div>
<div class="icon icon-plus_circle _hawk">Affordably priced</div>
<div class="icon icon-plus_circle _hawk">Small form factor for tiny cases</div>
<div class="icon icon-plus_circle _hawk">Solid 1080p performer</div>
<div class="icon icon-plus_circle _hawk">Good overclocking potential</div>

However, when I tried using regex with this code:

from bs4 import BeautifulSoup
import re
import requests

website = requests.get("https://www.techradar.com/news/computing-components/graphics-cards/best-graphics-cards-1291458").text
soup = BeautifulSoup(website, 'html.parser')
stats = soup.find_all('div', class_ = 'icon icon-plus_circle _hawk')
for review in stats:
	print review
	txt = re.sub('[\t+]', ' ', review)
	print txt

It gives me:

Traceback (most recent call last):
  File "fetch_data.py", line 12, in <module>
    txt = re.sub('[\t+]', ' ', review)
  File "C:\Python27\lib\re.py", line 155, in sub
    return _compile(pattern, flags).sub(repl, string, count)
TypeError: expected string or buffer

I have tried researching this bug, but I still don’t understand how to fix it.
Thank you for your time.


#2
stats = soup.find_all(lambda tag: tag.name == 'div' and tag.get('class') == ['icon', 'icon-plus_circle', '_hawk'])

Replace your current stats line with this…
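
On the TypeError itself: each item that find_all returns is a BeautifulSoup Tag object, not a string, and re.sub only accepts strings, which is why it complains with "expected string or buffer". Calling get_text() on the tag gives you the plain text with the markup dropped. Also note that '[\t+]' is a character class, so it matches a single tab or a literal plus sign rather than "one or more tabs". Below is a minimal sketch of one way to do this. It assumes the markup looks like the output posted above; the inline html sample (including the tab in the third line) and the cleaned list are made up for illustration so it runs without hitting the site.

```python
import re
from bs4 import BeautifulSoup

# Assumption: a small inline sample standing in for the downloaded page,
# using the same markup as the output shown in the question.
html = """
<div class="icon icon-plus_circle _hawk">Solid 4K performance </div>
<div class="icon icon-plus_circle _hawk"> High fps 4K gaming </div>
<div class="icon icon-plus_circle _hawk">Easy\tto overclock</div>
"""

soup = BeautifulSoup(html, 'html.parser')
stats = soup.find_all('div', class_='icon icon-plus_circle _hawk')

cleaned = []
for review in stats:
    text = review.get_text()          # Tag -> plain string, HTML tags dropped
    text = re.sub(r'\s+', ' ', text)  # collapse tabs, newlines and repeated spaces
    cleaned.append(text.strip())      # trim leading/trailing whitespace

for line in cleaned:
    print(line)
```

print(line) with parentheses and a single argument behaves the same on Python 2.7 and Python 3, so this should work with the interpreter shown in the traceback. If you only need the text and no whitespace cleanup, review.get_text().strip() alone is probably enough.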