Hi, I want to ask about indexing a pandas DataFrame in Python.
Given this DataFrame:
A B C
0 1 NaN 3
1 4 5 NaN
2 7 8 9
If we do this: diabetes_data.isnull()
it returns:
A B C
0 False True False
1 False False True
2 False False False
However, this is the part that I don't understand: what happens when we use that result to index the original DataFrame.
diabetes_data[diabetes_data.isnull()]
A B C
0 NaN NaN NaN
1 NaN NaN NaN
2 NaN NaN NaN
Can someone please explain the logic or flow of this? I don't understand why, when the isnull() result is used to index the original DataFrame, all the values suddenly become NaN.
Thank you
The syntax for .isnull() is:
df.isnull()
OR,
df.isnull().sum()
OR,
df.isnull().sum().sum()
Or, on a particular column:
df.col_name.isnull()
This detects NaN values (which are actually floating-point values) in a df and returns a boolean object of the same shape: True where a value is missing, False otherwise. .isnull() is the method, and in this case you’re calling it on a DataFrame. As noted above, it also works on a single column, which is a pandas Series.
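As a minimal sketch of the calls listed above, here is the frame from the question rebuilt by hand. The name df and the way the frame is constructed are assumptions for illustration; the original diabetes_data is not shown in the thread.

```python
import numpy as np
import pandas as pd

# Rebuild the 3x3 example frame from the question (assumed construction)
df = pd.DataFrame({"A": [1, 4, 7], "B": [np.nan, 5, 8], "C": [3, np.nan, 9]})

mask = df.isnull()               # boolean DataFrame, same shape as df
per_column = df.isnull().sum()   # Series: number of NaN values in each column
total = df.isnull().sum().sum()  # single number: NaN values in the whole frame
col_mask = df.B.isnull()         # boolean Series for one column

print(mask)
print(per_column)
print(total)
print(col_mask)
```

Each .sum() collapses one level: the first goes from a boolean DataFrame to per-column counts, the second from per-column counts to one total.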
What would be the purpose of writing ‘diabetes_data[diabetes_data.isnull()]’, passing the boolean result back to the data frame itself?
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.isnull.html
I am trying to understand the logic of diabetes_data[diabetes_data.isnull()]. What happens there? Why do all the values suddenly become NaN?
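.isnull() is the method being called on the df, and it creates a boolean DataFrame of the same shape. The brackets [ ] are the indexing operator, so that boolean DataFrame is being used as a mask on the df. With a boolean DataFrame as the key, pandas keeps each value where the mask is True and replaces it with NaN where the mask is False, the same as df.where(mask). Here the mask is True only at the positions that already hold NaN, so those stay NaN, and every other position is False, so it is replaced with NaN. That is why the whole result is NaN. The brackets do not create a Series here; the result is still a DataFrame with the same shape as the original.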
https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html
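To see the masking behaviour concretely, here is a small sketch on the frame from the question. The variable names are hypothetical, and the last line (selecting the rows that contain a NaN) is only a guess at what the original code may have been aiming for, not something stated in the thread.

```python
import numpy as np
import pandas as pd

# Rebuild the 3x3 example frame from the question (assumed construction)
df = pd.DataFrame({"A": [1, 4, 7], "B": [np.nan, 5, 8], "C": [3, np.nan, 9]})
mask = df.isnull()

# Indexing with a boolean DataFrame keeps values where the mask is True
# and puts NaN everywhere else; it behaves like df.where(mask).
masked = df[mask]
same_as_where = df.where(mask)

# The only True cells already contain NaN, so nothing non-null survives.
print(masked)
print(masked.equals(same_as_where))

# A row-level boolean Series, by contrast, selects whole rows:
rows_with_nan = df[mask.any(axis=1)]  # rows that contain at least one NaN
print(rows_with_nan)
```

The difference is the shape of the key: a boolean DataFrame masks cell by cell, while a boolean Series with one entry per row filters rows.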