My question is about the exercise on analysing a Stack Overflow developer survey. The project is meant to improve your data wrangling and cleaning skills. Here you have the link:
I have understood the whole process, but one step didn't look right to me.
    # Sort by ID and Year so that each person's data is carried backwards correctly
    df = df.sort_values(['RespondentID','Year'])
    df['UndergradMajor'].bfill(axis=0, inplace=True)
In this bit of code, we sort the rows by respondent and year, then apply NOCB (next observation carried backward) to replace the NaN values. My question is: this assumes the backward fill always takes its value from the same person, but that isn't guaranteed. If a person took the survey only once and left UndergradMajor as NaN, their row will be filled with the value from the next respondent's row. Am I thinking about this correctly?
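
To make the concern concrete, here is a minimal sketch with made-up data (the three respondents and their majors are invented for illustration, not taken from the survey). It compares the plain backward fill from the exercise with a backward fill restricted to each respondent via `groupby`, which is one possible alternative and not necessarily what the exercise intends:

```python
import numpy as np
import pandas as pd

# Made-up data: respondent 2 answered only once and left the major blank.
df = pd.DataFrame({
    "RespondentID": [1, 1, 2, 3, 3],
    "Year": [2018, 2019, 2018, 2018, 2019],
    "UndergradMajor": [np.nan, "Math", np.nan, np.nan, "Physics"],
})
df = df.sort_values(["RespondentID", "Year"])

# Plain backward fill, as in the exercise: it ignores respondent boundaries.
plain = df["UndergradMajor"].bfill()

# Backward fill applied separately within each respondent's own rows.
grouped = df.groupby("RespondentID")["UndergradMajor"].bfill()

# Show both results side by side for comparison.
print(df.assign(plain=plain, grouped=grouped))
```

With this toy data, the plain fill gives respondent 2 a major that belongs to respondent 3, while the grouped fill leaves respondent 2's value as NaN, which would then need to be handled some other way.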