Error in Introduction to Data Wrangling and Tidying

Hello, I believe I spotted an error in the Introduction to Data Wrangling and Tidying article.

https://www.codecademy.com/paths/data-science/tracks/dscp-data-wrangling-and-tidying/modules/dscp-fundamentals-of-data-wrangling-and-tidying/articles/intro-data-wrangling-and-tidying

The topic being discussed is replacing the missing longitude and latitude values (0.000) with NaN. However, I believe the code block uses the less-than and greater-than symbols incorrectly. As you can see in the data frame, all the latitude values greater than 40 would be replaced with NaN. The same is true for longitude: -73 is less than -70, and 0 is greater than -70. I believe these need to be switched in the introduction.
Below is an image of the data frame

This is the example code that is given:

    # here our .where() function replaces latitude values greater than 40 with NaN values
    restaurants['latitude'] = restaurants['latitude'].where(restaurants['latitude'] > 40, np.nan)

    # here our .where() function replaces longitude values less than -70 with NaN values
    restaurants['longitude'] = restaurants['longitude'].where(restaurants['longitude'] < -70, np.nan)

    # .sum() counts the number of missing values in each column
    restaurants.isna().sum()

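If you view the docs for pandas.Series.where (pandas 1.3.1 documentation), you'll see that it actually keeps all the values for which the condition evaluates to True and replaces all the values for which it evaluates to False. So the code is correct: the valid latitudes (greater than 40) and longitudes (less than -70) are kept, and only the 0.000 placeholders become NaN. It is the comments in the example that describe it backwards. That would not have been my first thought either, but that's how it works :neutral_face:.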
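To see that behaviour for yourself, here is a minimal sketch. The three-row `restaurants` data frame is made up for illustration (it is not the data from the article); only the two `.where()` calls mirror the example code quoted above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: two plausible coordinates and one row of
# 0.0 placeholders standing in for missing values.
restaurants = pd.DataFrame({
    "latitude": [40.73, 0.0, 40.65],
    "longitude": [-73.99, 0.0, -73.95],
})

# .where() KEEPS values where the condition is True and replaces the
# others, so latitudes that are NOT greater than 40 become NaN.
restaurants["latitude"] = restaurants["latitude"].where(
    restaurants["latitude"] > 40, np.nan
)

# Likewise, longitudes that are NOT less than -70 become NaN.
restaurants["longitude"] = restaurants["longitude"].where(
    restaurants["longitude"] < -70, np.nan
)

# Count the missing values in each column.
missing_counts = restaurants.isna().sum()
print(restaurants)
print(missing_counts)
```

Only the placeholder row ends up as NaN in this sketch. If the "replace where True" reading feels more natural, `Series.mask()` is the inverse of `.where()`: it replaces the values for which the condition is True.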
