Healthcare in Different States – Boxplot Project

Just finished this project in the Data Analyst Career Path:
https://www.codecademy.com/paths/data-analyst/tracks/dacp-summary-statistics/modules/dscp-quartiles-quantiles-and-interquartile-range/projects/healthcare-in-different-states

I found that sorting the datasets before creating the boxplots was really useful for exploring the data. If you pass a function (for example, a lambda) to the key keyword argument of the list sort method, the list is sorted by that function's return value for each element. This was my code for sorting the boxplots by their median value from lowest (left) to highest (right):

import codecademylib3_seaborn
import pandas as pd
from matplotlib import pyplot as plt
import numpy as np

healthcare = pd.read_csv("healthcare.csv")
print(healthcare["DRG Definition"].unique())

procedure = healthcare[healthcare['DRG Definition'] == '238 - MAJOR CARDIOVASC PROCEDURES W/O MCC']

states = procedure['Provider State'].unique()

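# Collect the charges for each state, keeping each state's label paired with its data
state_costs = []
for state in states:
  costs = procedure[procedure['Provider State'] == state][' Average Covered Charges '].values
  state_costs.append((state, costs))

# Sort the (state, costs) pairs by median so the labels stay matched to their boxes;
# sorting only the datasets would leave the state labels in their original order
state_costs.sort(key=lambda pair: np.median(pair[1]))
sorted_states = [state for state, _ in state_costs]
datasets = [costs for _, costs in state_costs]

plt.figure(figsize = (20, 6))
plt.boxplot(datasets, labels=sorted_states)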
plt.show()
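
In case it helps anyone, here is a rough sketch of another way to get the same ordering, letting pandas compute and sort the medians with groupby. The toy DataFrame and the column name charges are made up for illustration (they are not from healthcare.csv), so treat this as one possible approach rather than the project's intended solution:

```python
import pandas as pd

# Toy stand-in for the healthcare data; the values are made up for illustration.
toy = pd.DataFrame({
    "Provider State": ["AL", "AL", "CA", "CA", "NY", "NY"],
    "charges": [30.0, 50.0, 90.0, 110.0, 10.0, 20.0],
})

# Median charge per state, sorted from lowest to highest.
medians = toy.groupby("Provider State")["charges"].median().sort_values()

# The sorted index gives the label order; build the datasets in that same order.
ordered_states = list(medians.index)
ordered_datasets = [
    toy.loc[toy["Provider State"] == state, "charges"].values
    for state in ordered_states
]

print(ordered_states)
```

Because the labels and the datasets both come from the same sorted index, they cannot drift out of step. The two lists could then be passed to plt.boxplot(ordered_datasets, labels=ordered_states).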