Stack Overflow survey project: NOCB technique

Hello all!

My question is about the exercise on analysing a survey of Stack Overflow developers. This project is for improving your data wrangling and cleaning skills. Here is the link:

I understood the whole process, but then something didn't seem right to me.

# Sort by ID and Year so that each person's data is carried backwards correctly
df = df.sort_values(['RespondentID','Year'])
df['UndergradMajor'] = df['UndergradMajor'].bfill()

In this bit of code, we sort the values by respondent and year, then apply NOCB (next observation carried backward) to replace the NaN values. But my question is: this assumes the backward fill always takes its value from the same person. That might not hold if a person took the survey only once and left UndergradMajor as NaN. In that case the NaN would be filled with a value from another person. Am I thinking about this correctly?

Thank you!


Before sorting, you look at the missing values for each year and see that 2020 has no missing values. This means there is always a value for each individual person to fill backwards with.
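
If you would rather check this than rely on the 2020 observation, here is a minimal sketch on made-up data. The toy DataFrame and the names `plain`, `grouped` and `unfillable` are assumptions for illustration and are not part of the exercise. It compares a plain bfill with a bfill restricted to each respondent, and lists respondents whose last row is still missing, which is the case the question worries about.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: respondent 2 answered only once, with a missing major.
df = pd.DataFrame({
    "RespondentID": [1, 1, 2, 3, 3],
    "Year": [2019, 2020, 2019, 2019, 2020],
    "UndergradMajor": [np.nan, "Math", np.nan, np.nan, "Physics"],
})

# Same sort as in the exercise: each respondent's rows are adjacent, oldest first.
df = df.sort_values(["RespondentID", "Year"])

# Plain bfill ignores respondent boundaries, so respondent 2's NaN
# is filled from the next row, which belongs to respondent 3.
plain = df["UndergradMajor"].bfill()

# Group-wise bfill only carries values backwards within one respondent,
# so a respondent with no known major stays NaN.
grouped = df.groupby("RespondentID")["UndergradMajor"].bfill()

# Respondents whose last (most recent) row is missing cannot be filled
# from their own data. If this is empty, plain bfill and group-wise
# bfill give the same result.
last_rows = df.groupby("RespondentID").tail(1)
unfillable = last_rows[last_rows["UndergradMajor"].isna()]

print(pd.DataFrame({"plain": plain, "grouped": grouped}))
print(unfillable)
```

On the real survey data, an empty `unfillable` would confirm the point above. If it is not empty, the group-wise version is the safer choice because it never borrows a value from a different person.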