Python: Using np.where() with conditions

Hello! I am practicing hypothesis testing on a data set where one of the columns I’m trying to clean is gender. It looks like it was an open field where individuals could type in their gender. I am not too familiar with replacing 45+ different unique values, so I am making the task more manageable by keeping “Female” and “Male” and replacing anything else with “Other”. Here is what I have so far:
```
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# read_csv already returns a DataFrame
data = pd.read_csv('AnonymousSalarySurvey.csv')

gender_labels = data["Gender (optional)"].unique()

# Conditions for the column, not F or not M -> Other
conditions = (data["Gender (optional)"] != "Female") | (data["Gender (optional)"] != "Male")
data["Gender (optional)"] = np.where(conditions, "Other", data["Gender (optional)"])
```
Although the code runs, I am not getting the desired output. My whole column becomes “Other” instead of a mixture of “Female”, “Male”, and “Other”. I also tried .replace, but it did not seem to be valid since my data is in a pd.DataFrame. How can I debug this line of code so that any string that is not “Female” or “Male” is replaced with “Other”?
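On the debugging question itself: the condition "not Female OR not Male" is true for every row, because no single value can equal both strings at once, so np.where() picks “Other” everywhere. What was probably intended is "not Female AND not Male". Below is a minimal sketch of one possible fix, using a small made-up DataFrame in place of the survey CSV (the sample values are assumptions for illustration, not taken from the dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the survey column; the real values come from the CSV.
data = pd.DataFrame(
    {"Gender (optional)": ["Female", "Male", "female", "Non-binary", "Male", None]}
)
col = data["Gender (optional)"]

# Original condition: OR of two "not equal" tests is True for every row.
always_true = (col != "Female") | (col != "Male")

# Intended condition: the value is neither "Female" nor "Male".
is_other = ~col.isin(["Female", "Male"])

# Keep "Female"/"Male" as they are and replace everything else with "Other".
data["Gender (optional)"] = np.where(is_other, "Other", col)
```

The same mask could also be written as `(col != "Female") & (col != "Male")`. Matching is exact, so a variant such as "female" ends up as "Other" unless the column is normalized first.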

Honestly, this is something that I wouldn’t personally clean. First, there’s no reason to, and more importantly, it would introduce bias. I mean, labeling responses as “Other” is not appropriate.

Why not leave them as is? That would be a more accurate analysis of the data that you have. Or, maybe give the different categorical values a numerical value instead. Or, eliminate the column altogether(?)
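One possible reading of "give the different categorical values a numerical value" is plain integer coding, which keeps every distinct response instead of collapsing them. A rough sketch with pd.factorize(), where the sample responses are made up for illustration:

```python
import pandas as pd

# Hypothetical sample of free-text responses (not from the actual survey).
responses = pd.Series(["Female", "Male", "Non-binary", "Female", "Male"])

# factorize assigns one integer per distinct value, in order of first appearance.
codes, labels = pd.factorize(responses)
```

The codes are arbitrary identifiers rather than quantities, so they should not be treated as ordered or averaged.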

I know you didn’t design the dataset; someone else did.
Some background on sex/gender questions in surveys:

https://gender.stanford.edu/news-publications/gender-news/more-inclusive-gender-questions-added-general-social-survey

That is some pretty good insight! Thank you, Lisa, for the article and the tips on ethics. This is super important to consider. I had focused too much on making this line work and too little on the ethics side.
