Pandas series.rename not updating dataframe.column names

I’m working on the Jeopardy task (https://www.codecademy.com/journeys/data-scientist-aly/paths/dsalycj-22-data-science-foundations-ii/tracks/dsalycj-22-pandas-for-data-science/modules/dsf-data-manipulation-challenge-project-6cc2e59c-5bbc-46e7-bf81-3c88ace67247/projects/this-is-jeopardy), which starts with loading a provided CSV that is “unclean”.

The first issue I started to tackle was the whitespace in the column headers… and I’ve come across something that has baffled me.

import pandas as pd

df = pd.read_csv('jeopardy.csv')

for col in df.columns:
    df[col].rename(str.strip(col), inplace = True)

print(df.columns)
for col in df.columns:
    print("|"+ df[col].name +"|")

output

Index(['Show Number', ' Air Date', ' Round', ' Category', ' Value',
       ' Question', ' Answer'],
      dtype='object')
|Show Number|
|Air Date|
|Round|
|Category|
|Value|
|Question|
|Answer|

Renaming each of the series has done nothing to change the dataframe columns, even though the individual series names have changed when you print them (the “|” characters are added to the prints to show clearly that the whitespace is gone)…

Are df.columns and the series names not the same thing?

Whilst trying to find an answer I found a much simpler way of clearing whitespace from the column headers, which appears to update df.columns and each series name at the same time, but it doesn’t help me understand the relationship between df.columns and series.name…

df.columns = df.columns.str.strip()
print(df.columns)
for col in df.columns:
    print("|"+ df[col].name +"|")

output

Index(['Show Number', 'Air Date', 'Round', 'Category', 'Value', 'Question',
       'Answer'],
      dtype='object')
|Show Number|
|Air Date|
|Round|
|Category|
|Value|
|Question|
|Answer|

To remove whitespace from column names you can use lstrip() or rstrip(), depending on where the whitespace is (or strip() for both ends). Something like:

jeopardy_data.columns = jeopardy_data.columns.str.lstrip()
print(jeopardy_data.head())

See:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.lstrip.html
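
As a rough illustration of the difference between the three string methods, here is a small sketch. The empty dataframe and its padded headers are made up for the example (the real CSV has more columns), so treat the names as assumptions.

```python
import pandas as pd

# Hypothetical stand-in for the course CSV: only the headers matter here.
jeopardy_data = pd.DataFrame(columns=["Show Number", " Air Date", " Round "])

# Each call returns a new Index; nothing changes until you assign it back.
left = jeopardy_data.columns.str.lstrip()   # trims leading whitespace only
right = jeopardy_data.columns.str.rstrip()  # trims trailing whitespace only
both = jeopardy_data.columns.str.strip()    # trims both ends

print(list(left))   # ['Show Number', 'Air Date', 'Round ']
print(list(right))  # ['Show Number', ' Air Date', ' Round']
print(list(both))   # ['Show Number', 'Air Date', 'Round']

# Assigning back is what actually updates the dataframe's headers.
jeopardy_data.columns = both
```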

To rename columns, you’d use .rename(). No need to use a for loop.
The general syntax is:
jeopardy_data.rename(columns = {'old name 1': 'new name 1', 'old name 2': 'new name 2'}, inplace = True)

See the docs:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.rename.html
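
For what it’s worth, the columns argument of DataFrame.rename also accepts a function instead of a dict, so the mapping doesn’t have to be typed out per dataframe. A minimal sketch, using a made-up two-row frame rather than the real CSV:

```python
import pandas as pd

# Made-up data; the headers mimic the leading spaces in the course CSV.
df = pd.DataFrame({"Show Number": [1, 2], " Air Date": ["a", "b"], " Round": ["x", "y"]})

# str.strip is applied to every column label; a new dataframe is returned.
cleaned = df.rename(columns=str.strip)

print(list(cleaned.columns))  # ['Show Number', 'Air Date', 'Round']
print(list(df.columns))       # ['Show Number', ' Air Date', ' Round']
```

Since no inplace argument is passed here, df keeps its original headers and cleaned holds the stripped ones.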

Thanks for the reply. As I originally posted, I found the strip that can be used on the dataframe’s columns, which achieves the desired result.

The hard-coded rename feels “wrong” since you have to explicitly hard-code it for each dataframe you want to clean up. The thinking behind the for loop was to clean any given dataframe (before I found the one line that rules them all :smiley: ).

But neither addresses the question raised.

Why did renaming the series not alter df.columns? What is the relationship between series.name and dataframe.columns?

You asked two questions, and I’m sorry I didn’t separate them out in my first reply.

Series vs. DF:

  • A Series is a one-dimensional labeled array (roughly, the values of a single column). It carries an index of row labels and a single name.
  • A DataFrame is a two-dimensional structure, or I guess a “container” of sorts for Series that share one index, consisting of multiple rows and columns. Its column labels are held in df.columns.
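
A quick way to see those two points for yourself, sketched with made-up rows (the column names just mimic the CSV headers):

```python
import pandas as pd

# Made-up rows; only the structure matters here.
df = pd.DataFrame({"Show Number": [1, 2], " Round": ["a", "b"]})

col = df[" Round"]          # selecting one column gives back a Series

print(col.ndim, df.ndim)    # 1 2
print(col.index.equals(df.index))  # True: the Series carries the same row index
print(repr(col.name))       # ' Round', taken from the column label
print(list(df.columns))     # ['Show Number', ' Round']
```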

But that’s not why the renaming of the columns didn’t “stick”.

Making changes to the DF:
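If you want to modify the original dataframe with, for example, df.rename(), you have to pass inplace = True (or assign the result back to df). If that parameter is set to False or left out, a renamed copy is returned and the original df is not changed.
In your loop, though, inplace = True was applied to the Series that df[col] handed back, not to the dataframe. That changes the name of that one Series object and nothing else: the dataframe keeps its labels in df.columns, and a Series just gets its name from that label when you select it. I believe the stripped names showed up in your second loop because older pandas versions cache the Series returned by df[col], so you got the same renamed object back; that part may behave differently depending on your pandas version.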
Always check the docs if you’re unsure about what parameters exist for methods.
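
Here is a small sketch of the copy-versus-original behaviour, on a made-up one-row frame. The last step also shows that renaming a Series taken from the dataframe leaves df.columns alone.

```python
import pandas as pd

# Made-up data with padded headers, standing in for the course CSV.
df = pd.DataFrame({" Value": [200], " Answer": ["x"]})

# Without inplace, rename returns a new dataframe and leaves df alone.
renamed_copy = df.rename(columns={" Value": "Value"})
print(list(renamed_copy.columns))  # ['Value', ' Answer']
print(list(df.columns))            # [' Value', ' Answer']

# With inplace=True, df itself is modified.
df.rename(columns={" Answer": "Answer"}, inplace=True)
print(list(df.columns))            # [' Value', 'Answer']

# Renaming a Series selected from df yields a renamed Series;
# the dataframe's own column labels are untouched.
renamed_series = df[" Value"].rename("Value")
print(renamed_series.name)         # Value
print(list(df.columns))            # [' Value', 'Answer']
```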

I guess you have to decide if you want to make a copy of the df or change the original. It’s up to you.