FAQ: Creating, Loading, and Selecting Data with Pandas - Select Columns

This community-built FAQ covers the “Select Columns” exercise from the lesson “Creating, Loading, and Selecting Data with Pandas”.

Paths and Courses
This exercise can be found in the following Codecademy content:

Data Science

Data Analysis with Pandas

FAQs on the exercise Select Columns

The Select Columns exercise says that one way to select a column is to treat it like selecting a value from a dictionary using a key (option 1). I have finished the Python-related exercises that come before this one, but I have never encountered a lesson that covers dictionaries.

Can someone please show me where that lesson is?

Thank you.

Never mind, I’m sorry. I just found that the lesson is in the Python 3 syllabus, but it does not seem to be available in the Data Science curriculum.
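For anyone else who reaches this exercise without having met dictionaries, here is a minimal sketch of what "selecting a value using a key" means. The dictionary name `clinic_visits` is made up for illustration, and the numbers are borrowed from the exercise data quoted later in this thread.

```python
# A plain Python dictionary: each key (a string) maps to a value (a list here).
# `clinic_visits` is a hypothetical name used only for this illustration.
clinic_visits = {
    "clinic_east": [100, 51, 81],
    "clinic_north": [100, 45, 96],
}

# Square brackets plus a key return the value stored under that key.
north = clinic_visits["clinic_north"]
print(north)  # [100, 45, 96]

# Selecting a DataFrame column with df['clinic_north'] uses the same
# bracket-and-key syntax, which is the analogy the exercise is making.
```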

I had completed the Pandas path, then I reset the step prior to “select columns” in “Creating, Loading, and Selecting Data with Pandas” (within “learn data analysis with pandas”). Now, in “select columns”, I am receiving the error below. My code follows the error. I don’t understand why I am getting it.

Traceback (most recent call last):
  File "script.py", line 1, in <module>
    import codecademylib
  File "/var/codecademy/runner_contexts/python/codecademylib.py", line 1, in <module>
    import StringIO
ModuleNotFoundError: No module named 'StringIO'

df = pd.DataFrame([
  ['January', 100, 100, 23, 100],
  ['February', 51, 45, 145, 45],
  ['March', 81, 96, 65, 96],
  ['April', 80, 80, 54, 180],
  ['May', 51, 54, 54, 154],
  ['June', 112, 109, 79, 129]],
  columns=['month', 'clinic_east',
           'clinic_north', 'clinic_south',
           'clinic_west']
)

clinic_north = df.clinic_north
print(type(clinic_north))
print(type(df))

I’m also having this issue! Any luck?

Replace:

import codecademylib

With:

import codecademylib3
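Some background on why this helps, as far as the traceback shows: `codecademylib.py` starts with `import StringIO`, which is a Python 2 module. In Python 3 that class lives in the standard `io` module instead. I am assuming `codecademylib3` is simply the Python 3 counterpart of the helper, since its source is not shown here. A small sketch of the Python 3 equivalent:

```python
import io

# In Python 3, StringIO is provided by the standard `io` module.
# It wraps a string so it can be read like a file.
csv_text = "month,clinic_north\nJanuary,100\nFebruary,45\n"
buffer = io.StringIO(csv_text)

first_line = buffer.readline().strip()
print(first_line)  # month,clinic_north

# The Python 2 spelling fails on Python 3 with ModuleNotFoundError,
# which is a subclass of ImportError.
try:
    import StringIO  # Python 2 only
except ImportError as err:
    print("Python 2 style import failed:", err)
```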

Hello Python experts!
I was just playing around a bit and noticed that when I assign a variable like df = pd.DataFrame and view the type, it says it is a class pd.core.frame.DataFrame. When I look at the dir of pd I see DataFrame, and when I look at the dir of pd.core.frame I also see DataFrame, and when I view the source they seem to be the same. So why (or how, by what mechanism) does an instance of class DataFrame created as pd.DataFrame actually end up pointing to the pd.core.frame.DataFrame class instead of the pd.DataFrame class? Some sort of namespace thing? Just curious why the class appears to be defined in two places. Thanks for any insight!

Yeah, you’re on the right lines: it’s a namespace thing rather than a redefinition. It’s more akin to multiple names assigned to the same object than to a separate object, closer to the following example:

lst_a = [1, 2]
lst_b = lst_a
print(lst_a is lst_b)  # True: both names refer to the same list
lst_a.append("boo")
# print from B...
print(lst_b)  # [1, 2, 'boo']

If you have a look at the .__module__ attribute of the DataFrame type you’ll probably see pandas.core.frame (it may differ by version). If you went looking for this module in site-packages or wherever pandas is installed, you’d find the actual definition, class DataFrame(..., in the module .../pandas/core/frame.py.

But this name is imported in other places too, so something like the following evaluates to True:

pd.pandas.core.api.DataFrame == pd.DataFrame == pd.pandas.core.frame.DataFrame

You could take that a step further and use is if you like, which directly checks whether the names refer to the same object (rather than just equivalence).

But it’s really not much different to doing something like DF = pd.DataFrame and using DF({'column': values}) (even the pd bit is already just a method of referring to a loaded module abstracted into a module object).

If you’re curious about imports and namespaces the python docs tutorial goes into the basic set-up of modules and packages: https://docs.python.org/3/tutorial/modules.html

But in short, the class is defined once in .../pandas/core/frame.py and is simply imported into other namespaces from there.
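To make the "defined once, imported elsewhere" idea concrete without depending on a particular pandas version, here is a sketch using the standard library's `json` package, which follows the same pattern: `JSONDecoder` is defined in `json/decoder.py` and re-exported from the top-level `json` namespace. The alias `Decoder` is a made-up name for illustration. The same checks can be tried on `pd.DataFrame`.

```python
import json
import json.decoder

# The class reports the module where it was actually defined...
print(json.JSONDecoder.__module__)  # json.decoder

# ...and both names refer to the very same class object.
print(json.JSONDecoder is json.decoder.JSONDecoder)  # True

# Binding yet another name changes nothing about the class itself,
# just like DF = pd.DataFrame in the post above.
Decoder = json.JSONDecoder
print(Decoder().decode("[1, 2]"))  # [1, 2]
```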

Thanks! Perfect answer. I did:

print(pandas.DataFrame.__module__)
pandas.core.frame

Now I’ll read the info you recommended to fully understand modules.
Thanks again. (Wow, learning Python is like peeling an onion…)

Found it. pandas.core.api imports DataFrame from pandas.core.frame, and then pandas imports it from pandas.core.api.

Thanks, that worked. I hadn’t noticed that I was trying to import StringIO.

Thanks, that works!

When would it make sense to use dot notation to select columns instead of bracket notation, given that bracket notation always works?
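Not a full answer, but a sketch of the trade-off as I understand it: dot notation is mainly a typing convenience for simple column names, and it stops working when the name contains a space or collides with an existing DataFrame attribute. The column names `clinic north` and `count` below are invented to show those two cases.

```python
import pandas as pd

# Hypothetical columns chosen to show where dot notation breaks down.
df = pd.DataFrame({
    "month": ["January", "February"],
    "clinic north": [100, 45],  # name contains a space
    "count": [3, 5],            # name collides with the DataFrame.count method
})

# For a simple name, both notations return the same column.
print(df.month.equals(df["month"]))  # True

# A name with a space can only be reached with brackets.
spaced = df["clinic north"]
print(spaced.tolist())  # [100, 45]

# Brackets return the column; dot notation returns the built-in method.
by_bracket = df["count"]
by_dot = df.count
print(type(by_bracket).__name__)  # Series
print(callable(by_dot))           # True
```

So dot notation is reasonable for quick interactive work with simple names, while brackets are the safer default in code you intend to keep.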

type(clinic_north) reports a Series, the selected column, while type(df) reports the whole DataFrame.