FAQ: Web Scraping with Beautiful Soup - Find All

This community-built FAQ covers the “Find All” exercise from the lesson “Web Scraping with Beautiful Soup”.

FAQs on the exercise Find All

There are currently no frequently asked questions associated with this exercise – that’s where you come in! You can contribute to this section by offering your own questions, answers, or clarifications on this exercise. Ask or answer a question by clicking reply below.

If you’ve had an “aha” moment about the concepts, formatting, syntax, or anything else with this exercise, consider sharing those insights! Teaching others and answering their questions is one of the best ways to learn and stay sharp.

The lesson shows that you can pass a function to soup.find_all() with the following example:

def has_banner_class_and_hello_world(tag):
    return tag.attr('class') == "banner" and tag.string == "Hello world"

soup.find_all(has_banner_class_and_hello_world)

How does the function get its “tag” argument?

Is it because “tag” is an arg for .find_all() so it automatically supplies “tag” for the function? If that’s correct, would it work if the “tag” in .find_all() were a kwarg instead of an arg?

If I try to use

def has_banner_class_and_hello_world(tag):
    return tag.attr('class') == "banner" and tag.string == "Hello world"

soup.find_all(has_banner_class_and_hello_world)

in my own code or in the Codecademy example, I receive the following error:

Traceback (most recent call last):
  File "script.py", line 12, in <module>
    soup.find_all(has_banner_class_and_hello_world)
  File "/usr/local/lib/python3.6/dist-packages/bs4/element.py", line 1376, in find_all
    return self._find_all(name, attrs, text, limit, generator, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/bs4/element.py", line 616, in _find_all
    found = strainer.search(i)
  File "/usr/local/lib/python3.6/dist-packages/bs4/element.py", line 1781, in search
    found = self.search_tag(markup)
  File "/usr/local/lib/python3.6/dist-packages/bs4/element.py", line 1737, in search_tag
    or (markup and self._matches(markup, self.name))
  File "/usr/local/lib/python3.6/dist-packages/bs4/element.py", line 1813, in _matches
    return match_against(markup)
  File "script.py", line 10, in has_banner_class_and_hello_world
    return tag.attr('class') == "banner" and tag.string == "Hello world"
TypeError: 'NoneType' object is not callable

There seems to be a slight mistake in the example code. First, I think tag.attr is a typo for tag.attrs. Second, tag.attrs is a dictionary, so we should use square brackets [] instead of parentheses () to look up a value. However, square brackets raise a KeyError for a tag that doesn’t have a class attribute, so we should use the .get() method instead.

In summary, I think the example code should be modified as follows:

def has_banner_class_and_hello_world(tag):
    return tag.attrs.get('class') == "banner" and tag.string == "Hello world"

soup.find_all(has_banner_class_and_hello_world)
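To see where the TypeError in the traceback comes from, here is a small sketch you can run yourself. The HTML snippet and variable names are made up for illustration, and it assumes the bs4 package is installed. As far as I can tell, attribute-style access such as tag.attr is treated by Beautiful Soup as a search for a child tag named attr, which gives None here, and calling None raises the error.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, not the page used in the exercise.
html = '<div class="banner">Hello world</div><p>Other</p>'
soup = BeautifulSoup(html, "html.parser")
div = soup.find("div")

# Attribute-style access on a Tag looks for a child tag with that name.
# There is no <attr> element inside the div, so the result is None.
missing_child = div.attr
print(missing_child)

# Calling None is what produces "'NoneType' object is not callable".
error_message = None
try:
    div.attr("class")
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# .attrs is a plain dict; .get() avoids a KeyError on tags without the key.
p = soup.find("p")
p_class_from_attrs = p.attrs.get("class")
p_class_from_get = p.get("class")  # Tag.get() looks up the same dict
print(p_class_from_attrs, p_class_from_get)
```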

I tried a bit more and found that tag.attrs.get('class') returns a list or None, so the code in my previous post is still inaccurate. We need to modify it as follows:

def has_banner_class_and_hello_world(tag):
    return "banner" in tag.attrs.get('class', []) and tag.string == "Hello world"

soup.find_all(has_banner_class_and_hello_world)
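Here is a quick way to check the list behaviour, as a minimal sketch. The HTML is invented for the test (it is not the exercise page), and it assumes bs4 is installed and uses Python's built-in html.parser.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with three candidate tags.
html = """
<div class="banner large">Hello world</div>
<div class="banner">Goodbye</div>
<div>Hello world</div>
"""
soup = BeautifulSoup(html, "html.parser")

def has_banner_class_and_hello_world(tag):
    # class is a multi-valued attribute, so it comes back as a list;
    # the [] default covers tags that have no class at all.
    classes = tag.attrs.get("class", [])
    return "banner" in classes and tag.string == "Hello world"

first_div = soup.find("div")
print(first_div.attrs.get("class"))  # a list, not the string "banner large"

matches = soup.find_all(has_banner_class_and_hello_world)
print(matches)
```

Only the first div passes both conditions: the second has the wrong text and the third has no class, which is why comparing against the string "banner" with == would never match.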

Hi!
I had the very same question about how they pass a function as an argument. Were you able to find any answers? As you suggested, my guess is that the tag being passed is the soup variable that the other methods are called on.

Why hasn’t Codecademy provided a proper answer after a year? I also wonder how this is possible.

I think we can call find_all() without any arguments at all. In that case we get every tag in the soup object; in this particular exercise that is a list of 51 elements.
Our custom function receives a tag as its argument and returns True if the conditions inside it are met. When the custom function returns True, find_all() includes that tag in its result.
We put the custom function inside find_all() as if it were a variable, that is, we pass the function object as an argument to another function. It works the same way as passing a lambda.
I don’t know what the attr() function is… We have to use tag.get('name of attr') or tag.attrs['name of attr'], because tag.attrs returns a dict.

For example

def find_all_condition(tag):
    return tag.get('href') == 'hal.html' or tag.string == 'Spyro'

print(soup.find_all(find_all_condition))

works.

I’m assuming the find_all method iterates over all of the tags in the soup object and applies the has_banner… function to each one.

When soup.find_all(has_banner…) is called, it iteratively passes each tag to the has_banner… function, which in turn returns either True or False, effectively selecting whether or not that tag is included in the data returned by the soup.find_all(…) call (which can be stored in a variable or not).
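The idea can be sketched without Beautiful Soup at all. The ToyTag class below is a hypothetical stand-in written only for this post, not the library's real implementation; it just shows that the caller never supplies "tag", because find_all itself calls the function once per tag.

```python
class ToyTag:
    """Hypothetical stand-in for a parsed tag (not part of Beautiful Soup)."""

    def __init__(self, name, attrs=None, string=None, children=()):
        self.name = name
        self.attrs = attrs or {}
        self.string = string
        self.children = list(children)

    def descendants(self):
        # Walk the tree depth-first, yielding every nested tag.
        for child in self.children:
            yield child
            yield from child.descendants()

    def find_all(self, predicate):
        # find_all supplies the argument: it calls predicate(tag) for
        # each descendant and keeps the ones that return True.
        return [tag for tag in self.descendants() if predicate(tag)]


def has_banner_class_and_hello_world(tag):
    return "banner" in tag.attrs.get("class", []) and tag.string == "Hello world"


root = ToyTag("html", children=[
    ToyTag("div", {"class": ["banner"]}, "Hello world"),
    ToyTag("p", string="Hello world", children=[
        ToyTag("span", {"class": ["banner"]}, "Hi"),
    ]),
])

found = root.find_all(has_banner_class_and_hello_world)
print([tag.name for tag in found])  # ['div']
```

Note that the function is passed without parentheses. Writing has_banner_class_and_hello_world() would try to call it immediately with no tag, instead of handing the function over to be called later.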

.find_all() searches all descendant elements for the ones you specify and returns a list. You can pass a number of arguments (tag name, text, attrs, etc.) to make your search more specific.
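A few of those arguments side by side, as a rough sketch. The markup is invented (only the hal.html and Spyro values echo the example earlier in this thread), and it assumes bs4 is installed. The last line also suggests an answer to the earlier keyword-argument question: the first parameter of find_all() is called name, so passing the function as name=... should behave the same as passing it positionally.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only.
html = """
<ul>
  <li class="turtle"><a href="hal.html">Hal</a></li>
  <li class="turtle"><a href="spyro.html">Spyro</a></li>
  <li><a href="about.html">About</a></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

links = soup.find_all("a")                            # by tag name
turtles = soup.find_all("li", class_="turtle")        # tag name plus CSS class
hal = soup.find_all("a", attrs={"href": "hal.html"})  # by attribute value
spyro_text = soup.find_all(string="Spyro")            # matches text nodes, not tags
first_two = soup.find_all("a", limit=2)               # stop after two results

def is_link(tag):
    return tag.name == "a"

# The function can be given positionally or through the name keyword.
by_keyword = soup.find_all(name=is_link)

print(len(links), len(turtles), len(first_two))
print(hal, spyro_text)
```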

You can also always check the documentation:

https://www.crummy.com/software/BeautifulSoup/bs4/doc/#find-all