OK Cupid project - Need help creating a new dataframe column, based on conditional values of strings

I’m on the OK Cupid project and have been spinning my wheels for days. My question is about the diet column, which is made up of strings such as:

print(profiles.diet.value_counts())

other 24726
mostly anything 16585
anything 6183
strictly anything 5113
mostly vegetarian 3444
mostly other 1007
strictly vegetarian 875
vegetarian 667
strictly other 452
mostly vegan 338
strictly vegan 228
vegan 136
mostly kosher 86
mostly halal 48
strictly halal 18
strictly kosher 18
halal 11
kosher 11

I’d like to clean this column and create a new column (called diet_cleaned). I do NOT want all 18 of these values in my new column. Is it possible to bring only some of these values into the new column (diet_cleaned), but not all? I want to grab only the values where the diet is stricter: “kosher”, “halal”, “strictly kosher”, “strictly halal”, “mostly kosher”, “vegan”, “strictly vegan”, “mostly vegan”, “vegetarian” and “strictly vegetarian”, and put just those into my new column (diet_cleaned).

Anyone have any suggestions?

Thank you.
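
One possible way to do what the question asks, as a rough sketch: build a boolean mask with isin and use Series.where so that only the listed values are carried over and everything else becomes NaN. The tiny profiles DataFrame below is made up purely for illustration, and the new column is named diet_cleaned here on the assumption that it matches the name used later in the thread.

```python
import pandas as pd

# Hypothetical miniature of the profiles DataFrame, for illustration only.
profiles = pd.DataFrame({
    "diet": ["mostly anything", "strictly vegan", "kosher",
             "other", None, "vegetarian"]
})

# The values to carry over, copied from the list in the question.
strict_diets = [
    "kosher", "halal", "strictly kosher", "strictly halal", "mostly kosher",
    "vegan", "strictly vegan", "mostly vegan", "vegetarian",
    "strictly vegetarian",
]

# isin() gives True for rows whose diet is in the list.
keep = profiles["diet"].isin(strict_diets)

# where() keeps the value where the mask is True and puts NaN elsewhere.
profiles["diet_cleaned"] = profiles["diet"].where(keep)

print(profiles["diet_cleaned"].value_counts())
```

value_counts() skips NaN by default, so the rows that were not carried over simply do not show up in the counts or in a countplot.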

I ended up doing this: copy the old column into the new column (diet_cleaned), then change all the stricter diets to halal, kosher, vegan, or vegetarian, and change the “non-strict” diets to “other”. Here is the solution to my question above:

The value names in the new diet_cleaned column are now changed to represent stricter diets vs. other (no special diet).

# The stricter diets

profiles['diet_cleaned'] = profiles['diet']
profiles.replace({'diet_cleaned': {'mostly halal': 'halal'}}, inplace=True)
profiles.replace({'diet_cleaned': {'mostly kosher': 'kosher'}}, inplace=True)
profiles.replace({'diet_cleaned': {'mostly vegan': 'vegan'}}, inplace=True)
profiles.replace({'diet_cleaned': {'mostly vegetarian': 'vegetarian'}}, inplace=True)

profiles.replace({'diet_cleaned': {'strictly halal': 'halal'}}, inplace=True)
profiles.replace({'diet_cleaned': {'strictly kosher': 'kosher'}}, inplace=True)
profiles.replace({'diet_cleaned': {'strictly vegan': 'vegan'}}, inplace=True)
profiles.replace({'diet_cleaned': {'strictly vegetarian': 'vegetarian'}}, inplace=True)

# The non-strict diets

profiles.replace({'diet_cleaned': {'mostly anything': 'other'}}, inplace=True)
profiles.replace({'diet_cleaned': {'mostly other': 'other'}}, inplace=True)

profiles.replace({'diet_cleaned': {'strictly anything': 'other'}}, inplace=True)
profiles.replace({'diet_cleaned': {'strictly other': 'other'}}, inplace=True)

profiles.replace({'diet_cleaned': {'anything': 'other'}}, inplace=True)

# There are now only 5 values, instead of 18, in the diet_cleaned column.
profiles.diet_cleaned.value_counts()
diet_cleaned
other 54066
vegetarian 4986
vegan 702
kosher 115
halal 77
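
For what it's worth, the same five categories can probably be reached in two steps instead of one replace call per value: strip a leading "mostly " or "strictly " with a regex, then fold "anything" into "other". This is only a sketch of an alternative, not the approach used above, and the small profiles DataFrame is invented for illustration.

```python
import pandas as pd

# Hypothetical sample rows, for illustration only.
profiles = pd.DataFrame({
    "diet": ["mostly halal", "strictly vegan", "vegetarian", "mostly anything",
             "strictly other", "anything", "other", "kosher"]
})

# Step 1: drop a leading "mostly " or "strictly " qualifier.
base = profiles["diet"].str.replace(r"^(mostly|strictly) ", "", regex=True)

# Step 2: treat "anything" as "other", leaving the remaining values unchanged.
profiles["diet_cleaned"] = base.replace({"anything": "other"})

print(profiles["diet_cleaned"].value_counts())
```

The regex is anchored with ^ so only a qualifier at the start of the string is removed. Missing values in diet stay missing in diet_cleaned.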

# Then exclude "other" while creating the graph.

import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize=(10, 7))

# != "other" excludes every row whose diet_cleaned value is the string "other"
# sns.countplot returns the Axes object, which the label calls below need
ax = sns.countplot(data=profiles.loc[profiles['diet_cleaned'] != "other"], x='diet_cleaned', color='cyan')

ax.set_xlabel('users with stricter diets', fontsize=16)
ax.set_ylabel('count', fontsize=16)
ax.set_xticklabels(ax.get_xticklabels(), fontsize=9, rotation=40, ha="right")
plt.tight_layout()
plt.show()

After excluding the string “other”, the graph shows only 4 values on the x-axis (vegetarian, vegan, halal, kosher), which is exactly what I want.

This is in-progress work for the OK Cupid Python project. Start at about line 168.

I do not understand how I need to “reshape” the data. Anyone know how to fix this? The error says that my “array shape” is wrong.

Is the dimensionality of the data the same as the training data (x)? (Same number of rows, cols.)

Use print(test.shape) to see the number of rows and cols and verify.
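
As a rough illustration of that shape check: the actual error message and arrays are not shown in the thread, so the arrays below are invented, and the guess that a 1-D array is being passed where a 2-D one is expected is only an assumption. In scikit-learn-style APIs the number of columns generally has to match the training data, and a single sample has to be a 2-D array with one row.

```python
import numpy as np

# Hypothetical training data: 6 rows (samples), 2 columns (features).
X_train = np.arange(12).reshape(6, 2)

# Hypothetical single sample that ended up 1-D.
test = np.array([3, 4])

print(X_train.shape)  # (6, 2)
print(test.shape)     # (2,)

# reshape(1, -1) turns the 1-D array into one row with as many
# columns as it has values, so the column count can match X_train.
test_2d = test.reshape(1, -1)
print(test_2d.shape)  # (1, 2)
```

If the data is instead a single feature with many rows, reshape(-1, 1) is the usual counterpart, giving one column and as many rows as there are values.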