Hello! I am practicing hypothesis testing on a data set, and one of the columns I’m trying to clean is gender. It looks like it was an open field where individuals could type in their gender. I am not too familiar with replacing 45+ different unique values, so I am making the task more manageable by keeping “Female” and “Male” and replacing anything else with “Other”. Here is what I have so far:
```
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# read_csv already returns a DataFrame
data = pd.read_csv("AnonymousSalarySurvey.csv")
gender_labels = data["Gender (optional)"].unique()

# Condition for the column: not F or not M -> Other
conditions = (data["Gender (optional)"] != "Female") | (data["Gender (optional)"] != "Male")
New_Gender = np.where(conditions, "Other", data["Gender (optional)"])
data["Gender (optional)"] = New_Gender
```
Although it runs without errors, I am not getting the desired output: my whole column becomes “Other” instead of a mixture of “Female”, “Male”, and “Other”. I also tried using .replace, but it looks like it is not valid since I have my data in a pd.DataFrame. How can I debug this code so that any string that is not “Female” or “Male” is replaced with “Other”?
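For anyone reading along, here is a minimal sketch of what seems to be going wrong and one possible way around it. The likely cause is the `|`: every value is either not “Female” or not “Male” (nothing can equal both), so the condition is True for every row. Combining the two comparisons with `&`, or negating `.isin([...])`, should give the intended mask. The small DataFrame below is made-up sample data standing in for the real CSV, and the names `col`, `always_true`, and `is_other` are only for illustration.

```python
import numpy as np
import pandas as pd

# Made-up sample data (an assumption, not the real survey file).
col = "Gender (optional)"
data = pd.DataFrame(
    {col: ["Female", "Male", "Non-binary", "female", None, "Male"]}
)

# The original condition: True for every row, because no single value
# can be equal to both "Female" and "Male" at once.
always_true = (data[col] != "Female") | (data[col] != "Male")

# Intended condition: the value is neither "Female" nor "Male".
# Equivalent to (data[col] != "Female") & (data[col] != "Male").
is_other = ~data[col].isin(["Female", "Male"])

# Replace everything flagged by the mask, keep the rest unchanged.
data[col] = np.where(is_other, "Other", data[col])

print(data[col].value_counts())
```

Note that the match is exact, so a value like “female” (lowercase) or a missing entry also ends up as “Other” in this sketch. If that is not wanted, normalizing the text first (for example with `.str.strip().str.title()`) might be worth trying before building the mask.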