Data Transformation with Python

import codecademylib3

from sklearn import preprocessing

import matplotlib.pyplot as plt

import pandas as pd

import seaborn as sns

import numpy as np

# load in financial data

financial_data = pd.read_csv('financial_data.csv')

expense_overview = pd.read_csv('expenses.csv')

#print(expense_overview.head(7))

expense_categories = expense_overview['Expense']

proportions = expense_overview['Proportion']

#expense_categories= ['Salaries', 'Advertising', 'Office Rent', 'Other']

mask = expense_overview.isin(proportions[proportions < 0.05].index)

expense_overview[mask] = "Other"

print(expense_overview['Proportion'].value_counts())

I am trying to collapse the categories that have a proportion of less than 0.05 into a single category named 'Other', but it's not working.

I get this error:
Traceback (most recent call last):
  File "script.py", line 17, in <module>
    expense_overview[mask] = "Other"
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/frame.py", line 3482, in __setitem__
    self._setitem_frame(key, value)
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/frame.py", line 3528, in _setitem_frame
    self._check_inplace_setting(value)
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/generic.py", line 5305, in _check_inplace_setting
    "Cannot do inplace boolean setting on "
TypeError: Cannot do inplace boolean setting on mixed-types with a non np.nan value

I am also struggling with standardization:

import codecademylib3

from sklearn import preprocessing

from sklearn.preprocessing import StandardScaler

import matplotlib.pyplot as plt

import pandas as pd

import seaborn as sns

import numpy as np

employees = pd.read_csv('employees.csv')

productivity = employees['Productivity']

#standardization

salary = employees['Salary']

#print(salary.describe())

#print(employees.head())

scaler = StandardScaler()

standardized_employees = scaler.fit_transform(employees)

print(standardized_employees.head())

plt.plot(standardized_employees)
plt.show()

This is the code I wrote, but it is not working. Here is the URL of the exercise:
https://www.codecademy.com/paths/data-analyst/tracks/dacp-summary-statistics/modules/dacp-data-transformation/projects/data-transformation-project
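
On the standardization part, two things in the code above are likely to fail: fit_transform returns a NumPy array, which has no .head() method, and StandardScaler expects numeric input, so passing the whole employees table will raise an error if it contains any text columns. Here is a minimal sketch of one way around both, assuming Salary and Productivity are numeric columns. The sample rows and the Name column are made up for illustration, since employees.csv is not shown in the thread.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Made-up stand-in for employees.csv; "Name" is a hypothetical text column
employees = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Salary": [40000.0, 50000.0, 60000.0, 70000.0],
    "Productivity": [0.5, 0.7, 0.6, 0.9],
})

# Scale only the numeric columns of interest
numeric_cols = ["Salary", "Productivity"]
scaler = StandardScaler()
scaled = scaler.fit_transform(employees[numeric_cols])  # plain NumPy array

# Wrap the array in a DataFrame again so .head() and column names work
standardized_employees = pd.DataFrame(
    scaled, columns=numeric_cols, index=employees.index
)
print(standardized_employees.head())
```

After this, each scaled column has mean 0 and standard deviation 1 (StandardScaler uses the population standard deviation), and a single column such as standardized_employees["Salary"] can be passed to a plotting function.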

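The TypeError comes from how you are using the boolean mask: expense_overview[mask] = "Other" tries to write a string across a DataFrame with mixed column types.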
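
A possible fix for that error, sketched under assumptions: build the mask from the Proportion column alone, so it is a boolean Series with one value per row, and use .loc to relabel only the Expense column. The column names come from the post; the sample rows are made up, since expenses.csv is not shown.

```python
import pandas as pd

# Made-up stand-in for expenses.csv (the real file is not shown in the thread)
expense_overview = pd.DataFrame({
    "Expense": ["Salaries", "Advertising", "Office Rent", "Equipment", "Utilities"],
    "Proportion": [0.62, 0.15, 0.15, 0.04, 0.04],
})

# Boolean Series (one True/False per row), not a boolean DataFrame
small = expense_overview["Proportion"] < 0.05

# Relabel only the Expense column on the small rows
expense_overview.loc[small, "Expense"] = "Other"

# Merge the relabelled rows by summing their proportions;
# sort=False keeps the categories in order of first appearance
collapsed = (
    expense_overview.groupby("Expense", sort=False)["Proportion"]
    .sum()
    .reset_index()
)
print(collapsed)
```

Because the assignment targets a single string column, pandas never has to write "Other" into the numeric Proportion column, which is what the original assignment attempted.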

Is there another way you could accomplish this?
Couldn't you just update the values in the proportions list to update the pie chart? The exercise asks you to update the pie chart, not to amend the DataFrame.
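
That suggestion could be sketched roughly like this: build the label and size lists for the chart and leave the DataFrame untouched. The numbers are made up, the chart title and output filename are hypothetical, and the Agg backend is selected only so the sketch runs without a display (in the exercise environment you would likely call plt.show() instead).

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for running headless
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for expenses.csv
expense_overview = pd.DataFrame({
    "Expense": ["Salaries", "Advertising", "Office Rent", "Equipment", "Utilities"],
    "Proportion": [0.62, 0.15, 0.15, 0.04, 0.04],
})

small = expense_overview["Proportion"] < 0.05

# Keep the large categories as they are and append one combined "Other" slice
labels = list(expense_overview.loc[~small, "Expense"]) + ["Other"]
sizes = list(expense_overview.loc[~small, "Proportion"]) + [
    expense_overview.loc[small, "Proportion"].sum()
]

plt.pie(sizes, labels=labels, autopct="%1.1f%%")
plt.axis("equal")  # draw the pie as a circle
plt.title("Expense proportions")  # hypothetical title
plt.savefig("expenses_pie.png")  # hypothetical filename
plt.close()
```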
