How can I standardize two columns from different dataframes?
I want to explore the relationship between Income and Productivity, but these two columns come from different dataframes and are on different scales.
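For reference, standardizing a column rescales each value $x_i$ to a z-score using that column's own mean and standard deviation. After this, every column has mean 0 and standard deviation 1 regardless of its original units. scikit-learn's `StandardScaler` uses the population standard deviation (dividing by $n$), as written here:

```latex
z_i = \frac{x_i - \bar{x}}{s}, \qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
s = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2}
```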
Here is my code:
```python
import codecademylib3
from sklearn import preprocessing
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import numpy as np

# load in financial data
financial_data = pd.read_csv('financial_data.csv')
print(financial_data.head())

# store each column as its own variable
months = financial_data['Month']
revenues = financial_data['Revenue']
expenses = financial_data['Expenses']

# plot revenue over the past six months
plt.plot(months, revenues)
plt.xlabel('Month')
plt.ylabel('Amount ($)')
plt.title('Revenue')
plt.show()

# plot expenses over the past six months
plt.clf()
plt.plot(months, expenses)
plt.xlabel('Month')
plt.ylabel('Amount ($)')
plt.title('Expenses')
plt.show()

# load in expenses data
expenses_overview = pd.read_csv('expenses.csv')
print(expenses_overview.head(7))
expense_categories = ['Salaries', 'Advertising', 'Office Rent', 'Other']
proportions = [0.62, 0.15, 0.15, 0.08]

# pie chart of the expense categories
plt.clf()
plt.pie(proportions, labels=expense_categories)
plt.title('Expense Categories')
plt.axis('equal')
plt.tight_layout()
plt.show()

# load in employee data
employees = pd.read_csv('employees.csv')
print(employees.head())

# sort the dataframe by the Productivity column
sorted_productivity = employees.sort_values(by=['Productivity'])
print(sorted_productivity)

# store the first 100 rows of sorted_productivity
employees_cut = sorted_productivity.head(100)
print(employees_cut)

# summary statistics of commute times
commute_times = employees['Commute Time']
commute_times_log = np.log(commute_times)
print(commute_times.describe())

# histogram of the log-transformed commute times
plt.clf()
plt.hist(commute_times_log)
plt.title('Employee Commute Times')
plt.xlabel('Commute Time')
plt.ylabel('Frequency')
plt.show()

# exploring the relationship between Income and Productivity
# standardizing
productivities = employees['Productivity']
scaler = StandardScaler()
standardized_productivity = scaler.fit_transform(productivities)
standardized_revenue = scaler.fit_transform(revenues)
```
But the standardization step at the end doesn't work.
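A likely cause, assuming the failure is an error raised by `fit_transform`: scikit-learn's `StandardScaler` expects 2D input of shape `(n_samples, n_features)`, while a single column selected as `df['col']` is a 1D pandas Series. The sketch below is not the original project code. It replaces the CSV files with small hypothetical values and shows two ways to pass one column as 2D: double-bracket selection and `reshape(-1, 1)`. It also shows the equivalent plain-pandas z-score. It fits a separate scaler per column so each column is scaled by its own statistics.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in data; the real values come from employees.csv
# and financial_data.csv in the question.
employees = pd.DataFrame({"Productivity": [52.0, 61.0, 47.0, 70.0, 58.0]})
financial_data = pd.DataFrame({"Revenue": [16.0, 14.0, 17.0, 21.0, 24.0, 25.0]})

productivities = employees["Productivity"]  # 1D Series
revenues = financial_data["Revenue"]        # 1D Series

# Passing a 1D Series reproduces the problem: StandardScaler rejects 1D input.
try:
    StandardScaler().fit_transform(productivities)
except ValueError as err:
    print("1D input rejected:", err)

# Fix 1: double brackets select a one-column DataFrame, which is 2D.
standardized_productivity = StandardScaler().fit_transform(employees[["Productivity"]])
standardized_revenue = StandardScaler().fit_transform(financial_data[["Revenue"]])

# Fix 2: reshape a Series' values into a column vector of shape (n, 1).
standardized_revenue_alt = StandardScaler().fit_transform(
    revenues.to_numpy().reshape(-1, 1)
)

# Plain pandas equivalent of the z-score. StandardScaler uses the population
# standard deviation, so ddof=0 is needed (Series.std() defaults to ddof=1).
productivity_z = (productivities - productivities.mean()) / productivities.std(ddof=0)

print(standardized_productivity.ravel())
print(productivity_z.to_numpy())
```

Standardizing makes each column unitless (mean 0, standard deviation 1), so the two can be compared on the same scale. It does not pair up rows, though. To plot or correlate productivity against the other variable, each row still needs a matching value, such as the same employee or the same month. Two unrelated dataframes with different row counts may not provide that.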