How to create the "Others" label in a pie chart more efficiently?

In step 10 of the Startup Transformation Project (Data Scientist Career Path), I am asked to modify an earlier pie chart so that values representing less than 15% of the total are grouped under an “Others” label. I initially thought of creating a mask with the following code:

mask = expense_overview.isin(expense_overview['Proportion'][expense_overview['Proportion'] < 0.15 ].index)

expense_overview['Ohter'] = mask

This follows what Codecademy showed in the previous lesson. However, I keep getting a long error message on line 58 and I have no idea what is happening (I have already changed my code and searched StackOverflow without success).

Then I tried the where function to see if I could get what I wanted: replace the values in the first column of my data frame (the labels) with “Others” when they match the condition < 0.15. But the reverse occurred: values above 0.15 were replaced and the ones below 0.15 were not. I tried other changes, such as using the opposite condition (> 0.15), but that didn’t work either.

I completed the task by simply changing my labels and values manually, like this:

expense_categories = ['Salaries', 'Advertising', 'Office Rent', 'Other']
proportions = [0.62, 0.15, 0.15, 0.08]

But I am not happy at all with this code since it is far from being automated and intelligent. Any tips on how I can create the “Others” label cleverly?
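
On the reversed result with where: in pandas, Series.where keeps the values where the condition is True and replaces the others, so a condition of < 0.15 replaces exactly the large slices. Series.mask does the opposite. The earlier mask attempt probably fails because DataFrame.isin returns a whole DataFrame of booleans rather than a single column, though that is a guess without the full traceback. Below is a minimal sketch, assuming the columns are called Expense and Proportion; the small categories and their values are made up for illustration.

```python
import pandas as pd

# Illustrative data only: the small categories and their values are invented.
expense_overview = pd.DataFrame({
    "Expense": ["Salaries", "Advertising", "Office Rent",
                "Equipment", "Utilities", "Supplies", "Food"],
    "Proportion": [0.62, 0.15, 0.15, 0.03, 0.02, 0.02, 0.01],
})

# A plain comparison already gives a boolean Series: True for the small slices.
small = expense_overview["Proportion"] < 0.15

# Series.where KEEPS values where the condition is True and replaces the rest,
# so the condition has to describe the rows to keep (here: not small).
labels_where = expense_overview["Expense"].where(~small, "Others")

# Series.mask is the mirror image: it REPLACES values where the condition is True.
labels_mask = expense_overview["Expense"].mask(small, "Others")

# Sum the proportions per label; sort=False keeps the order of first appearance.
grouped = expense_overview.groupby(labels_mask, sort=False)["Proportion"].sum()
print(grouped)
```

Either labels_where or labels_mask can be used for the grouping; they hold the same labels.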

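Hi,
I calculated the sum of the categories that are less than 5%, kept only the rows that are 5% or more, and appended the sum as a new “Others” row:

# Assumes "import pandas as pd". DataFrame.append was removed in pandas 2.0, so pd.concat is used.
others_sum = expenses_overview.Proportion[expenses_overview.Proportion < 0.05].sum()
others_row = pd.DataFrame({'Expense': ['Others'], 'Proportion': [others_sum]})
df_new = pd.concat([expenses_overview[expenses_overview.Proportion >= 0.05], others_row], ignore_index=True)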

Agreed! I think the official solution, which hardcodes “Others”, is not a responsible one. We learn to code precisely to avoid hardcoding (manual work) as much as possible. I hope the course editors can build on previous lessons so that the learning is more progressive rather than split into independent blocks. I used lambda and groupby, but I would be glad to see a more elegant solution from someone else.

# Create a new column "New_Category" that groups categories making up 5% or less into "Others"
expense_overview["New_Category"] = expense_overview.apply(
    lambda row: "Others" if row["Proportion"] <= 0.05 else row["Expense"], axis=1)

# Create a new dataframe grouped by "New_Category"
new_expense = expense_overview.groupby("New_Category")["Proportion"].sum().reset_index()

This should output a new dataframe as below:

  New_Category  Proportion
0  Advertising        0.15
1  Office Rent        0.15
2       Others        0.08
3     Salaries        0.62
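
If a row-wise apply feels heavy, the same grouping can be written with numpy.where and wrapped in a small function so the threshold is a parameter. This is only a sketch: the function name lump_small_categories, its parameters, and the demo data are hypothetical, and it uses a strict "less than" comparison, unlike the <= above.

```python
import numpy as np
import pandas as pd


def lump_small_categories(df, threshold=0.05, label_col="Expense",
                          value_col="Proportion", other_label="Others"):
    """Return a new frame where rows below `threshold` are merged into one row."""
    # Vectorized relabeling: small rows get `other_label`, the rest keep their label.
    labels = np.where(df[value_col] < threshold, other_label, df[label_col])
    return (df.assign(**{label_col: labels})
              .groupby(label_col, sort=False, as_index=False)[value_col]
              .sum())


# Demo data is invented; only the three large categories come from the thread.
demo = pd.DataFrame({
    "Expense": ["Salaries", "Advertising", "Office Rent",
                "Equipment", "Utilities", "Supplies", "Food"],
    "Proportion": [0.62, 0.15, 0.15, 0.03, 0.02, 0.02, 0.01],
})

result = lump_small_categories(demo, threshold=0.15)
print(result)
```

The two columns of result can then be passed to the pie chart as values and labels, for example plt.pie(result["Proportion"], labels=result["Expense"]) with matplotlib.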
