I’m working on the Jeopardy task (https://www.codecademy.com/journeys/data-scientist-aly/paths/dsalycj-22-data-science-foundations-ii/tracks/dsalycj-22-pandas-for-data-science/modules/dsf-data-manipulation-challenge-project-6cc2e59c-5bbc-46e7-bf81-3c88ace67247/projects/this-is-jeopardy), which starts with loading a provided CSV that is “unclean”.
The first issue I started to tackle was the whitespace in the column headers, and I’ve come across something that has baffled me.
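import pandas as pd

df = pd.read_csv('jeopardy.csv')
# Attempt: strip the whitespace by renaming each column's Series
for col in df.columns:
    df[col].rename(str.strip(col), inplace=True)
print(df.columns)
# The "|" characters make any leftover whitespace visible
for col in df.columns:
    print("|" + df[col].name + "|")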
Output:
Index(['Show Number', ' Air Date', ' Round', ' Category', ' Value',
       ' Question', ' Answer'],
      dtype='object')
|Show Number|
|Air Date|
|Round|
|Category|
|Value|
|Question|
|Answer|
Renaming each Series has done nothing to change the DataFrame’s columns, even though the individual Series names look changed when you print them (the “|” characters are added to the prints to show clearly that the whitespace is gone).
Are df.columns and the Series names not the same thing?
Whilst trying to find an answer, I found a much simpler way of clearing whitespace from the column headers. It appears to update df.columns and each Series name at the same time, but it doesn’t help me understand the relationship between df.columns and Series.name:
df.columns = df.columns.str.strip()
print(df.columns)
for col in df.columns:
    print("|" + df[col].name + "|")
Output:
Index(['Show Number', 'Air Date', 'Round', 'Category', 'Value', 'Question',
       'Answer'],
      dtype='object')
|Show Number|
|Air Date|
|Round|
|Category|
|Value|
|Question|
|Answer|
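In case it helps anyone reproduce this without the CSV, here is a minimal sketch. The small DataFrame and its values are made up (only the header names mirror jeopardy.csv), and the names renamed and cleaned are just for illustration. As far as I can tell, it shows that the labels live on the DataFrame in df.columns, that df[col] hands back a Series named after its label, and that the plain (non-inplace) Series.rename returns a new Series without writing anything back to df.columns. It also tries DataFrame.rename(columns=str.strip) as another way to relabel on the frame itself.

```python
import pandas as pd

# Hypothetical stand-in for jeopardy.csv: two headers carry a leading space.
df = pd.DataFrame({
    "Show Number": [1],
    " Air Date": ["2004-12-31"],
    " Round": ["Jeopardy!"],
})

# Series.rename without inplace returns a new Series with the new name.
renamed = df[" Round"].rename("Round")
print(renamed.name)        # Round

# The DataFrame's own labels are stored in df.columns and are unchanged.
print(list(df.columns))    # ['Show Number', ' Air Date', ' Round']

# Relabelling on the DataFrame: the function is applied to every column label.
cleaned = df.rename(columns=str.strip)
print(list(cleaned.columns))    # ['Show Number', 'Air Date', 'Round']

# A Series pulled from the relabelled frame is named after the new label.
print(cleaned["Round"].name)    # Round
```

Either df.columns = df.columns.str.strip() or df.rename(columns=str.strip) changes the labels on the DataFrame, which is why the Series names follow; going the other way (renaming a Series you pulled out) did not change df.columns in my test above.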