Hello all!
My question is about the exercise on analysing a Stack Overflow developer survey. The project is meant to practise data wrangling and cleaning. Here is the link:
https://www.codecademy.com/paths/data-analyst/tracks/dsf-data-wrangling-cleaning-and-tidying/modules/dsf-handling-missing-data/articles/missing-data-project-stack-overflow-survey-trends
I understood the whole process, but one step didn't seem right to me:
# Sort by ID and Year so that each person's data is carried backwards correctly
df = df.sort_values(['RespondentID','Year'])
df['UndergradMajor'].bfill(axis=0, inplace=True)
In this bit of code, we sort the values by respondent and year, then apply NOCB (next observation carried backward) to replace the NaN values. My question is: the code assumes each value is filled backwards from the same person, but that isn't guaranteed. If a person took the survey only once and left UndergradMajor as NaN, that row would be filled with a value from another person. Am I thinking about this correctly?
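To make my concern concrete, here is a small sketch with made-up data. The three rows and the groupby variant are my own assumptions, not part of the exercise. Respondent 1 answered only once and left the major blank, so I would expect a plain backfill to borrow the value from respondent 2, while a backfill restricted to each RespondentID should leave it as NaN:

```python
import numpy as np
import pandas as pd

# Made-up data (not from the survey): respondent 1 answered once with no major
df = pd.DataFrame({
    "RespondentID": [1, 2, 2],
    "Year": [2018, 2018, 2019],
    "UndergradMajor": [np.nan, np.nan, "Computer science"],
})
df = df.sort_values(["RespondentID", "Year"])

# Plain backfill over the whole column, as in the exercise
plain = df["UndergradMajor"].bfill()

# Backfill restricted to each respondent's own rows (my assumed alternative)
grouped = df.groupby("RespondentID")["UndergradMajor"].bfill()

# Compare the first row (respondent 1) in both results
print(plain.tolist())
print(grouped.tolist())
```

If that is right, the grouped version seems safer to me, at the cost of leaving some NaN values unfilled.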
Thank you!