FAQ: Web Scraping with Beautiful Soup - Select for CSS Selectors

This community-built FAQ covers the “Select for CSS Selectors” exercise from the lesson “Web Scraping with Beautiful Soup”.


FAQs on the exercise Select for CSS Selectors

There are currently no frequently asked questions associated with this exercise – that’s where you come in! You can contribute to this section by offering your own questions, answers, or clarifications on this exercise. Ask or answer a question by clicking reply below.

If you’ve had an “aha” moment about the concepts, formatting, syntax, or anything else with this exercise, consider sharing those insights! Teaching others and answering their questions is one of the best ways to learn and stay sharp.

Join the Discussion. Help a fellow learner on their journey.

Ask or answer a question about this exercise by clicking reply below!

Agree with a comment or answer? Click Like to up-vote the contribution!



Looks like this issue was due to a browser-based bug. Refreshing the page allowed me to proceed.

When writing turtle_name = turtle.select(".name")[0],
why are we using the [0]?

Thanks

Hi @teoxd,

Link to Exercise: Learn Web Scraping with Beautiful Soup: Select for CSS Selectors

The expression turtle.select(".name") without the [0] gives us a list of all the tags with the class "name" on the page to which we have linked. It turns out that each page that we access within the for loop has only one tag assigned to the "name" class, but still, that single tag is contained within a list. To retrieve that tag from the list, we index the result of the expression with [0].
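A minimal sketch of the same idea, using a hypothetical inline HTML snippet in place of a real turtle page (the markup is an assumption, not copied from the exercise). It shows that select() returns a list even when there is only one match, and that select_one() is an alternative that returns the first matching tag directly:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for one turtle page: exactly one element with class "name"
html = '<div><p class="name">Aesop</p></div>'
turtle = BeautifulSoup(html, "html.parser")

matches = turtle.select(".name")      # always a list, even with a single match
print(len(matches))                   # 1
turtle_name = matches[0].get_text()   # [0] pulls the single Tag out of the list
print(turtle_name)                    # Aesop

# select_one() returns the first matching Tag itself (or None if nothing matches)
same_tag = turtle.select_one(".name")
print(same_tag.get_text())            # Aesop
```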

Edited on August 1, 2019 to add a link to the exercise


Hello all,

I am writing a response to this step:

“First, before the loop that goes through the turtle_links, create an empty dictionary called turtle_data.”

I am wondering if others are catching the exception that running the code more than once will throw.

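I like this very Pythonic-looking approach:

```python
import errno
import os

def mkdir_p(path):
    """Create a directory (and any parents), ignoring the error if it already exists."""
    try:
        os.makedirs(path)
    except OSError as exc:  # Python > 2.5
        # Ignore only "already exists" for an actual directory; re-raise anything else
        if exc.errno == errno.EEXIST and os.path.isdir(path):
            pass
        else:
            raise
```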
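On Python 3.2 and later, os.makedirs accepts exist_ok=True, which covers the same "already exists" case without the try/except (like mkdir_p, it still raises if a plain file already has that name). A small sketch, using a temporary directory so it doesn't touch your project; the turtle_data folder name is only for illustration:

```python
import os
import tempfile

base = tempfile.mkdtemp()                  # scratch location for the demo
path = os.path.join(base, "turtle_data")   # illustrative folder name

os.makedirs(path, exist_ok=True)           # first call creates the directory
os.makedirs(path, exist_ok=True)           # second call is a no-op instead of raising
print(os.path.isdir(path))                 # True
```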

The weird part is that when I don't catch the exception and just call os.mkdir, the error still pops up… even when I specifically deleted the directory before running.

No worries. I have my work-around…NOPE…just checked…

Using the exception handler, when the directory already exists, I see: (screenshot not preserved)

If I clobber the directory and rerun, I see: (screenshot not preserved)

(Note that there was no output to the console: the directory was not found to already exist, and it was indeed created, as I can see, but I still see that error and cannot proceed.)

Well, all apologies for this mess!

Oh goodness…

I got a wild hair and thought that maybe it just wants to see something called turtle_data created… i.e., a variable name…

So I tried running:

```python
# Define turtle_data:
turtle_data = "turtle_data"
mkdir_p(turtle_data)
```

but that errored with “expected turtle_data to be a dict” (!)

So, I added:

```python
# Define turtle_data:
turtle_data = {}
mkdir_p("turtle_data")
```

…and I am All Green. LOLOL!
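For anyone else hitting this: as the checker's message suggests, this step only asks for a Python dictionary, so no directory (and no mkdir_p call) should be needed. A minimal sketch of what the step appears to expect:

```python
# This step asks for an empty dictionary; nothing needs to be created on disk
turtle_data = {}

print(isinstance(turtle_data, dict))  # True
print(len(turtle_data))               # 0
```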

How does one typically approach an error that does not happen when the code runs outside of the web interface?

I.e., this error: “invalid syntax (”

But the code runs fine from Visual Studio and the command line.

```
chahn@DESKTOP-GE7BAOA MINGW64 /c/ROOT/study/python/scratch
$ python soupy4.py

|AGE: 1.5 Years Old|
|WEIGHT: 4.5 lbs|
|SEX: Female|
|BREED: African Aquatic Sideneck Turtle|
|SOURCE: found in Lake Erie|

chahn@DESKTOP-GE7BAOA MINGW64 /c/ROOT/study/python/scratch
$
```
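The soupy4.py script itself isn't shown. A rough sketch that would print output in the same shape, assuming (not confirmed) that each fact sits in an <li> inside an element with the class more-info; the markup is parsed from an inline string so it runs without network access:

```python
from bs4 import BeautifulSoup

# Assumed markup, modeled on the printed output above (not copied from the real page)
html = """
<div class="more-info">
  <ul>
    <li>AGE: 1.5 Years Old</li>
    <li>WEIGHT: 4.5 lbs</li>
    <li>SEX: Female</li>
    <li>BREED: African Aquatic Sideneck Turtle</li>
    <li>SOURCE: found in Lake Erie</li>
  </ul>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Descendant selector: every <li> anywhere inside .more-info
facts = [li.get_text(strip=True) for li in soup.select(".more-info li")]
for fact in facts:
    print(f"|{fact}|")
```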

“We have taken the links you found in the last exercise and turned them into links we can follow by prepending a prefix to them.”
I didn’t really grasp the notion of prefixes and why we need to add them. Why can’t we get the real links? What is the general rule? Are these prefixes the same for all sites? If not, how do I know exactly what to add?
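The turtle links on the page are relative: the href holds only something like aesop.html, which the browser resolves against the address of the page it appears on. requests.get() needs a full URL, so the prefix supplies the missing part. The general rule is to resolve each href against the URL of the page where you found it, and Python's urllib.parse.urljoin does that for you. A sketch, assuming the hrefs look like aesop.html (the prefix is taken from the turtle URL quoted elsewhere in this thread):

```python
from urllib.parse import urljoin

# Prefix taken from the turtle URL quoted in this thread
prefix = "https://s3.amazonaws.com/codecademy-content/courses/beautifulsoup/"

relative_link = "aesop.html"                # assumed href form, e.g. <a href="aesop.html">
full_url = prefix + relative_link           # plain string concatenation
resolved = urljoin(prefix, relative_link)   # resolve the href against the base URL

print(full_url)
print(resolved)

# urljoin also handles href forms that plain concatenation gets wrong
print(urljoin(prefix, "/aesop.html"))                     # root-relative: keeps only the host
print(urljoin(prefix, "https://example.com/other.html"))  # absolute: returned unchanged
```

So the prefix is not universal: it depends on where the page lives and on the form of the hrefs, which is why resolving against the page URL is the safer habit.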

On question 3, the answer is to check for “.name”, but I can’t find this class in the inspector; I only found “.more-info”.

Where is the .name element located?

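```python
from bs4 import BeautifulSoup
import requests

for link in soup.select(".recipeLink > a"):
    webpage = requests.get(link["href"])  # pass the URL string from href, not the Tag
    new_soup = BeautifulSoup(webpage.content, "html.parser")
```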

Sorry, could anyone help me understand why we are using ‘> a’ in ‘for link in soup.select(".recipeLink > a"):’?


Hello all,

Thank you for reading this. In this intro, how come we never learn what the " > a" portion accomplishes?

Thank you.

I’ve been wondering the exact same thing.

Why do I have to use ‘.name’? Can anyone explain?

Why doesn’t this solution work for the check?

With the for loop you .get() each individual turtle's page (the name in the link changes).
For example, in https://s3.amazonaws.com/codecademy-content/courses/beautifulsoup/aesop.html, aesop is the turtle name.
And each individual turtle page has an element with the class “name”, which is what “.name” selects.
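To see why .name doesn't show up when inspecting the main page, here is a sketch with hypothetical, heavily simplified markup for the two kinds of page (the real pages contain much more). The .name class only exists on each individual turtle page, which is why it is selected inside the loop, after that page has been requested:

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup for the two kinds of page in this exercise
listing_html = '<div class="more-info"><a href="aesop.html">More info</a></div>'
turtle_html = '<div><p class="name">Aesop</p></div>'

listing = BeautifulSoup(listing_html, "html.parser")
turtle = BeautifulSoup(turtle_html, "html.parser")

print(len(listing.select(".name")))          # 0: the listing page has no .name element
print(turtle.select(".name")[0].get_text())  # Aesop: found on the individual turtle page
```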

Maybe I don’t remember learning it or skipped over it, but I’m coming from the “Data Science Career Path” and am very confused by the last question's statement: “Add turtle_name to the dictionary as a key, and for now set the value of that key to an empty list.” Could anyone give me an easy breakdown of what a dictionary is? I know it’s a stupid question, but I’m 70% into this ■■■■■■■ course and I feel like I haven’t heard the term at all.


Same scenario, same question. I looked at the solution for adding the data into the dictionary, and it doesn’t make any sense to me:

turtle_data[turtle_name] = []

How does this work?
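A dictionary stores key-value pairs, and dict[key] = value adds (or overwrites) the entry for that key. So turtle_data[turtle_name] = [] adds the turtle's name as a key whose value is an empty list. A small sketch, using a hypothetical name and a placeholder string in place of scraped data:

```python
# {} creates an empty dictionary
turtle_data = {}

turtle_name = "Aesop"          # hypothetical; in the exercise it comes from the page
turtle_data[turtle_name] = []  # add the key "Aesop" with an empty list as its value
print(turtle_data)             # {'Aesop': []}

# The list stored under that key can be filled in later
turtle_data[turtle_name].append("placeholder fact")
print(turtle_data)             # {'Aesop': ['placeholder fact']}
```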


I had the same question, and did some searching. According to the documentation, > is used to find tags directly beneath other tags (the CSS child combinator). So soup.select(".recipeLink > a") returns a list of the a tags that are direct children of tags with the class recipeLink.
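A small sketch of the difference, using hypothetical markup (not the exercise's page): a space is the descendant combinator and matches <a> tags at any depth inside .recipeLink, while > only matches <a> tags that are direct children:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one <a> is a direct child of .recipeLink, the other is nested in a <span>
html = """
<div class="recipeLink">
  <a href="direct.html">direct child</a>
  <span><a href="nested.html">grandchild</a></span>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

children = [a["href"] for a in soup.select(".recipeLink > a")]   # child combinator
descendants = [a["href"] for a in soup.select(".recipeLink a")]  # descendant combinator
print(children)     # ['direct.html']
print(descendants)  # ['direct.html', 'nested.html']
```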

Sorry to say that I don’t have an answer for you… just feel the need to share in the frustration. I, too, am following the Data Science Career Path and feel that, out of all of the information and work presented so far, the Beautiful Soup unit is the most poorly constructed. Perhaps it is part of another course path that contains additional background information, but it seems to skip full explanations of many different components of the coursework. This has been a very frustrating and confusing portion of the career path.