Could anyone tell me what the difference is between using .select() and .find_all() when doing web scraping? And what are the pros and cons of each one?
Welcome to the forums.
If you think of the structure of an HTML document, which is like a tree (from top to bottom), you have parent tags, children, siblings, and descendants.
The .find_all() method will find all instances of whatever you're searching for. You can pass filters through that method as well (strings, regular expressions, and lists, for example).
Ex: soup.find_all("p", "title")
will find all <p> tags that have the CSS class "title".
See the documentation: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#find-all
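Here's a small self-contained sketch showing the different kinds of filters .find_all() accepts (the HTML snippet and variable names are just made up for illustration):

```python
import re
from bs4 import BeautifulSoup

html = """
<html><body>
  <p class="title">Intro</p>
  <p class="body">Some text</p>
  <a href="https://example.com">A link</a>
</body></html>
"""
soup = BeautifulSoup(html, "html.parser")

print(soup.find_all("p", "title"))      # <p> tags with the CSS class "title"
print(soup.find_all(re.compile("^b")))  # tags whose names match a regex (<body>, etc.)
print(soup.find_all(["a", "p"]))        # any tag whose name is in the list
```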
You can use the .select() method to locate elements with CSS selectors. You can find elements by class, attribute, ID, etc.
ex:
soup.select('a[href]')
which will find all <a> tags that have an href attribute.
https://www.crummy.com/software/BeautifulSoup/bs4/doc/#css-selectors
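A quick sketch of .select() with a few common CSS selectors (again, the HTML is just an invented example):

```python
from bs4 import BeautifulSoup

html = '<body><p class="title">Intro</p><a href="https://example.com">A link</a></body>'
soup = BeautifulSoup(html, "html.parser")

print(soup.select("p.title"))   # <p> tags with class "title"
print(soup.select("a[href]"))   # <a> tags that have an href attribute
print(soup.select("#main"))     # element with id="main" (none here, so an empty list)
print(soup.select("body > p"))  # <p> tags that are direct children of <body>
```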
It’s basically a way to be more specific about whatever it is you’re looking for in your search of the HTML document. It can be a little confusing, but I think the documentation really helps.
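To see how much the two overlap, here's a sketch (made-up HTML again) where both methods pull the same element, just with different query styles:

```python
from bs4 import BeautifulSoup

html = '<div><p class="title">Intro</p><p class="body">Text</p></div>'
soup = BeautifulSoup(html, "html.parser")

# These two calls return the same <p class="title"> element:
by_find_all = soup.find_all("p", class_="title")  # keyword-argument style
by_select = soup.select("p.title")                # CSS-selector style
print(by_find_all == by_select)  # True: same tags, different query syntax
```

So it often comes down to which query style you find more readable: .find_all() uses Python arguments, while .select() uses the CSS selector syntax you may already know from styling pages.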