Understanding drop_column

I am trying to remove columns from my DataFrame to demonstrate that I understand how to drop columns. I looked at the syntax and followed it. I have not been able to solve the errors above this code and have already posted a question for each of them. While I wait for help, I have tried to do the drop commands.

Link to the Codecademy exercise: https://www.codecademy.com/paths/learn-python-for-data-science/tracks/intro-to-python-for-data-science-lpfds/modules/cleaning-and-transforming-columns/lessons/cleaning-and-transforming-columns/exercises/renaming-and-removing-columns

Link to my repository: GitHub - strikeouts27/jupyter-data_scientist_salary_projects: An analysis about how data scientist have been compensated in 2022

What steps have I taken to solve the problem?

I started by looking up the error message. According to Real Python, a KeyError is raised when a key does not exist in a dictionary.

I looked at my dataset and it clearly shows that those columns exist, so I am confused as to why Python cannot see them.

Link to Real Python: Python KeyError Exceptions and How to Handle Them – Real Python
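The dictionary idea carries over to pandas fairly directly: a DataFrame's column names act like keys, so asking for a name that is not among them raises KeyError. A minimal sketch, using a small made-up DataFrame (not the real salary data), of the dictionary case and of checking column membership before using a name:

```python
import pandas as pd

# A plain dictionary raises KeyError when the *key* is missing.
salaries = {"salary": 100000}
try:
    salaries["salary_currency"]
except KeyError as err:
    print("dict KeyError:", err)

# Hypothetical stand-in data: column names play the role of dictionary keys.
df = pd.DataFrame({"salary": [100000, 120000], "salary_currency": ["USD", "EUR"]})

# Checking membership first shows which labels Python can actually "see".
print("salary_currency" in df.columns)   # True
print("worthless_column" in df.columns)  # False
```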

Traceback:

KeyError                                  Traceback (most recent call last)
File ~/anaconda3/lib/python3.11/site-packages/pandas/core/indexes/base.py:3653, in Index.get_loc(self, key)
   3652 try:
-> 3653     return self._engine.get_loc(casted_key)
   3654 except KeyError as err:

File ~/anaconda3/lib/python3.11/site-packages/pandas/_libs/index.pyx:147, in pandas._libs.index.IndexEngine.get_loc()

File ~/anaconda3/lib/python3.11/site-packages/pandas/_libs/index.pyx:176, in pandas._libs.index.IndexEngine.get_loc()

File pandas/_libs/hashtable_class_helper.pxi:7080, in pandas._libs.hashtable.PyObjectHashTable.get_item()

File pandas/_libs/hashtable_class_helper.pxi:7088, in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: ('salary_currency', 'worthless_column')

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
Cell In[285], line 1
----> 1 drop_columns = df['salary_currency', 'worthless_column']
      2 df = df.drop(labels = drop_columns,
      3             axis=1)
      4 print(df)

File ~/anaconda3/lib/python3.11/site-packages/pandas/core/frame.py:3761, in DataFrame.__getitem__(self, key)
   3759 if self.columns.nlevels > 1:
   3760     return self._getitem_multilevel(key)
-> 3761 indexer = self.columns.get_loc(key)
   3762 if is_integer(indexer):
   3763     indexer = [indexer]

File ~/anaconda3/lib/python3.11/site-packages/pandas/core/indexes/base.py:3655, in Index.get_loc(self, key)
   3653     return self._engine.get_loc(casted_key)
   3654 except KeyError as err:
-> 3655     raise KeyError(key) from err
   3656 except TypeError:
   3657     # If we have a listlike key, _check_indexing_error will raise
   3658     #  InvalidIndexError. Otherwise we fall through and re-raise
   3659     #  the TypeError.
   3660     self._check_indexing_error(key)

KeyError: ('salary_currency', 'worthless_column')
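One more detail in the traceback may be worth checking: the error is raised on line 1 of the cell, before drop() even runs, and the missing key is the tuple ('salary_currency', 'worthless_column'). Writing two names inside single brackets, df['a', 'b'], passes one tuple key instead of two column names. A small sketch with made-up data (not the real dataset) contrasting the tuple key, a double-bracket list selection, and passing a plain list of labels to drop():

```python
import pandas as pd

# Hypothetical stand-in for the salary data.
df = pd.DataFrame({
    "salary": [100000, 120000],
    "salary_currency": ["USD", "EUR"],
    "salary_in_usd": [100000, 130000],
})

# df["a", "b"] looks up ONE key, the tuple ("a", "b"), which is not a column name.
try:
    df["salary", "salary_currency"]
except KeyError as err:
    print("KeyError:", err)

# A list inside the brackets (double brackets) selects several columns.
subset = df[["salary", "salary_currency"]]
print(list(subset.columns))  # ['salary', 'salary_currency']

# drop() expects the column names themselves, so pass a plain list of labels.
drop_columns = ["salary_currency"]
df = df.drop(labels=drop_columns, axis=1)
print(list(df.columns))  # ['salary', 'salary_in_usd']
```

This matches the shape of the Codecademy example, where drop_columns is a list of names rather than a selection from the DataFrame.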


For df.drop(), the inplace parameter defaults to False, so when you call df.drop() without specifying it, it returns a new DataFrame and the original is not changed.
If you use inplace=True, the original df is altered and those columns are removed, but the method returns None.
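A quick way to see the difference is to compare the columns before and after each style of call. A minimal sketch on a toy DataFrame with made-up columns a, b and c:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# Default inplace=False: drop() returns a NEW DataFrame and df is unchanged.
result = df.drop(labels=["b"], axis=1)
print(list(df.columns))      # ['a', 'b', 'c']
print(list(result.columns))  # ['a', 'c']

# inplace=True modifies df itself and returns None.
returned = df.drop(labels=["b"], axis=1, inplace=True)
print(returned)              # None
print(list(df.columns))      # ['a', 'c']
```

Because the in-place call returns None, combining the two styles (df = df.drop(..., inplace=True)) would replace df with None, so it is usually clearer to pick one: reassign the result, or use inplace=True on its own.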

See:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop.html

A KeyError is raised if a label isn't found in the selected axis. I don't see the column 'worthless_column' in your notebook (then again, I might need more coffee).
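The same documentation page describes an errors parameter that controls this behavior: the default, errors="raise", raises KeyError for a missing label, while errors="ignore" skips missing labels and drops only the ones that exist. A sketch with hypothetical data:

```python
import pandas as pd

df = pd.DataFrame({"salary": [100000], "salary_currency": ["USD"]})

# Default errors="raise": a label that is not in the columns raises KeyError.
try:
    df.drop(columns=["worthless_column"])
except KeyError as err:
    print("KeyError:", err)

# errors="ignore" skips labels that are not present and drops the rest.
trimmed = df.drop(columns=["salary_currency", "worthless_column"], errors="ignore")
print(list(trimmed.columns))  # ['salary']
```

Ignoring errors can hide a typo or a rename that never took effect, so checking df.columns first is often the safer way to troubleshoot.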

So, did you alter the original df and then try to use df.drop() again, or did you rename the columns in a cell before that?


In the cell beforehand (cell 72) I had renamed the column. I thought that would have changed the DataFrame.

I looked at the documentation link you gave me, and the syntax looks completely different from what Codecademy is teaching.

Codecademy's example:

drop_columns = ['ParkType']
parks = parks.drop(
  labels=drop_columns, 
  axis=1)
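The two forms are closer than they look: the documentation lists columns= as a shortcut for passing labels together with axis=1. A sketch that runs Codecademy's call next to the keyword form, on a made-up parks DataFrame (the exercise loads its own data):

```python
import pandas as pd

# Hypothetical parks data for illustration only.
parks = pd.DataFrame({
    "ParkName": ["Acadia", "Zion"],
    "ParkType": ["National Park", "National Park"],
    "State": ["ME", "UT"],
})

# Codecademy style: a list of labels plus axis=1 (meaning columns).
drop_columns = ["ParkType"]
via_labels = parks.drop(labels=drop_columns, axis=1)

# Documentation keyword style: columns= names the columns directly.
via_columns = parks.drop(columns=drop_columns)

print(via_labels.equals(via_columns))  # True
print(list(via_columns.columns))       # ['ParkName', 'State']
```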

I managed to use the docs and omg, I am a data scientist pro now! lol.

It seems I need to go ahead and specify all the values. I also learned that the renaming of salary to worthless_column did happen in the last cell and it is affecting things.

Thank you for pointing out the documentation to me. I can utilize this as part of my troubleshooting method! Thank you lisalisaj.


Yep, always check the documentation.

.rename() also defaults to inplace=False: it makes a copy of the df and returns it. So, when you made that change in the previous cell without reassigning the result, you didn't alter the original df.
That is why you got the "column doesn't exist" error later on. You can spot-check this by running df.columns.
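The rename behaves the same way as drop(). A minimal sketch on hypothetical data, renaming salary to worthless_column as in the notebook, with df.columns used as the spot check:

```python
import pandas as pd

df = pd.DataFrame({"salary": [100000, 120000], "salary_currency": ["USD", "EUR"]})

# rename() returns a renamed COPY; without reassignment, df keeps the old name.
df.rename(columns={"salary": "worthless_column"})
print(list(df.columns))  # ['salary', 'salary_currency']

# Reassigning the result makes the rename stick.
df = df.rename(columns={"salary": "worthless_column"})
print(list(df.columns))  # ['worthless_column', 'salary_currency']

# Both labels now exist, so dropping them no longer raises KeyError.
df = df.drop(columns=["salary_currency", "worthless_column"])
print(list(df.columns))  # []
```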

