U.S. Medical Insurance Project - Feedback Welcomed!

Thanks for coming to share your portfolio Project with other learners!

When posting your project for review, please be sure to include the following:

  • Your review of the Project. Was it easy, difficult, just right?
  • An estimate of how long it took you to complete
  • The link to your code repo

Hi all,

I completed the US Medical Insurance project recently. I thought it was fairly straight forward, albeit with some difficult spots along the way. I’ve attached my GitHub link to the project. Let me know what you think of the other class methods I added.


U.S. Medical Insurance Costs
As this is my first portfolio project, it will likely be barbaric and rough. I am only adding these markdown sections to practice documentation of projects.

In this first part, I simply am trying to view the data in the CSV file; thus, I import the csv library and use a function to load the data within a "with" context manager to bring the data into a manipulable format.

In [1]:
import csv
import statistics as stats

# List for data:
ages = []
sexes = []
bmis = []
children = []
smoker = []
regions = []
charges = []

# Loader function
def load_data(lst, csv_file, column_name):
    with open(csv_file, newline='') as file:
        reader = csv.DictReader(file)
        for row in reader:
            lst.append(row[column_name])
        return lst
            
Now I will populate my lists to then use them for analysis.

In [2]:
load_data(ages, 'insurance.csv','age')
load_data(sexes, 'insurance.csv','sex')
load_data(bmis, 'insurance.csv','bmi')
load_data(children, 'insurance.csv','children')
load_data(smoker, 'insurance.csv','smoker')
load_data(regions, 'insurance.csv','region')
load_data(charges, 'insurance.csv','charges')

print(len(regions))
1338
Class Creation for Analysis of Data

In [62]:
class PatientsInfo:
    # Init method taking in each list
    def __init__(self, patient_ages, patient_sexes, patient_bmis, patient_num_children, patient_smoker_statuses,
                 patient_regions, patient_charges):
        self.patient_ages = patient_ages
        self.patient_sexes =  patient_sexes
        self.patient_bmis = patient_bmis
        self.patient_num_children = patient_num_children
        self.patient_smoker_statuses = patient_smoker_statuses
        self.patient_regions = patient_regions
        self.patient_charges = patient_charges
        
    # Avgerage Age
    def avg_age(self):
        total_age = 0
        
        for age in self.patient_ages:
            total_age += int(age)
        
        return (f'The average age for all {len(self.patient_ages)} patients is {round(total_age/len(self.patient_ages),2)}.')
        

    # Male vs Female representation
    def males_vs_females(self):
        male_count = 0
        female_count = 0
        
        for sex in self.patient_sexes:
            if sex == 'male':
                male_count += 1
            elif sex == 'female':
                female_count += 1
        
        percent_male = round(male_count/(male_count + female_count), 2) * 100
        percent_female = round(female_count/(male_count + female_count), 2) * 100
        
        return (f'''Of the {len(self.patient_sexes)} patients, males account for {percent_male}% while females account for {percent_female}%.''')

    # Unique Regions
    def unique_regions(self):
        unique_regions = []
        
        for region in self.patient_regions:
            if region not in unique_regions:
                unique_regions.append(region)
        return (f'The regions in this data are: {unique_regions}.')
        

    # Average Cost
    def avg_cost(self):
        total_cost = 0
        
        for cost in self.patient_charges:
            total_cost += float(cost)
        
        return (f'Average cost for all patients: ${round(total_cost/len(self.patient_charges),2)}.')

    # Create Dictionary
    def create_dictionary(self):
        self.patient_dict = {}
        self.patient_dict['age'] = self.patient_ages
        self.patient_dict['sex'] = self. patient_sexes
        self.patient_dict['bmi'] = self.patient_bmis
        self.patient_dict['children'] = self.patient_num_children
        self.patient_dict['smoker'] = self.patient_smoker_statuses
        self.patient_dict['region'] = self.patient_regions
        self.patient_dict['charges'] = self.patient_charges
        
        return self.patient_dict

    # Create list of lists
    def list_of_lists(self):
        self.patient_super_list = [self.patient_ages, self.patient_sexes, self.patient_bmis, self.patient_num_children,
                             self.patient_smoker_statuses, self.patient_regions, self.patient_charges]
        return self.patient_super_list
        
    # Average cost by region*
    def avg_cost_by_region(self):
        dictionary = self.create_dictionary()
        southwest = 0
        northwest = 0
        southeast = 0
        northeast = 0
        
        sw_count = 0
        nw_count = 0
        se_count = 0
        ne_count = 0
        
        # Total Cost by Region
        for i in range(0,(len(self.patient_regions)-1)):
            if dictionary['region'][i] == 'southwest':
                southwest += float(dictionary['charges'][i])
            elif dictionary['region'][i] == 'northwest':
                northwest += float(dictionary['charges'][i])
            elif dictionary['region'][i] == 'southeast':
                southeast += float(dictionary['charges'][i])
            else:
                northeast += float(dictionary['charges'][i])
                
        # Total Number of Regions
        for i in dictionary['region']:
            if i == 'southwest':
                sw_count += 1
            elif i == 'northwest':
                nw_count += 1
            elif i == 'southeast':
                se_count += 1
            else:
                ne_count += 1
        
            
        #counter = 0
        #while counter <= len(self.patient_regions):
            #if dictionary['region'][counter] == 'southwest':
             #   southwest += float(dictionary['charges'][counter])
              #  counter += 1
            #elif dictionary['region'][counter] == 'northwest':
             #   northwest += float(dictionary['charges'][counter])
              #  counter += 1
            #elif dictionary['region'][counter] == 'southeast':
             #   southeast += float(dictionary['charges'][counter])
              #  counter += 1
            #else:
             #   northeast += float(dictionary['charges'][counter])
              #  counter += 1
        
        
        return print(f'''The average cost by region is:
        Southwest: ${southwest/sw_count:.2f}
        Northwest: ${northwest/nw_count:.2f}
        Southeast: ${southeast/se_count:.2f}
        Northeast: ${northeast/ne_count:.2f}''')
    
    # Average cost by sex
    def avg_cost_by_sex(self):
        # dictionary to loop through
        dictionary = self.create_dictionary()
        
        # variables to hold integer numbers
        male_total_cost = 0
        male_count = 0
        
        female_total_cost = 0
        female_count = 0
        
        # loop to populate total cost by sex and male/female counter
        for i in range(0,(len(self.patient_sexes)-1)):
            if dictionary['sex'][i] == 'male':
                male_total_cost += float(dictionary['charges'][i])
                male_count += 1
            else:
                female_total_cost += float(dictionary['charges'][i])
                female_count += 1
        
        male_avg_cost = round(float(male_total_cost/male_count),2)
        female_avg_cost = round(float(female_total_cost/female_count),2)
        
        return print(f'''The average cost by sex is:
        Male: ${male_avg_cost}
        Female: ${female_avg_cost}
        
        Males pay on average {round(male_avg_cost/female_avg_cost - 1,2)*100}% more than females.''')
        
        
    # Average cost by BMI quartile
    def avg_cost_bmi_quartile(self):
        # Dictionary to loop through
        dictionary = self.create_dictionary()
        
        # Get bmis as integers
        int_bmis = []
        for bmi in self.patient_bmis:
            int_bmis.append(float(bmi))
        
        # Min and Max BMI
        min_bmi = min(int_bmis)
        max_bmi = max(int_bmis)
        
        # Quartiles
        quartiles = stats.quantiles(int_bmis, n=4)
        q1_total = 0
        q1_count = 0
        
        q2_total = 0
        q2_count = 0
        
        q3_total = 0
        q3_count = 0
        
        q4_total = 0
        q4_count = 0
        
        for i in range(0,(len(dictionary['bmi'])-1)):
            if float(dictionary['bmi'][i]) < quartiles[0]:
                q1_total += float(dictionary['charges'][i])
                q1_count += 1
            elif quartiles[1] > float(dictionary['bmi'][i]) >= quartiles[0]:
                q2_total += float(dictionary['charges'][i])
                q2_count += 1
            elif quartiles[2] > float(dictionary['bmi'][i]) >= quartiles[1]:
                q3_total += float(dictionary['charges'][i])
                q3_count += 1
            else:
                q4_total += float(dictionary['charges'][i])
                q4_count += 1
        
        print(f"""
        Quartiles: {quartiles}
        Min: {min_bmi}
        Max: {max_bmi}
        
        Quartile 1 Average: ${q1_total/q1_count:.2f}
        Quartile 2 Average: ${q2_total/q2_count:.2f}
        Quartile 3 Average: ${q3_total/q3_count:.2f}
        Quartile 4 Average: ${q4_total/q4_count:.2f}""")
        
        


# INITIALIZE INSTANCE
patients_info = PatientsInfo(ages, sexes, bmis, children, smoker, regions, charges)
Average Age Class Function:

In [27]:
patients_info.avg_age()
Out[27]:
'The average age for all 1338 patients is 39.21.'
Male/Female Representation Class Function:

In [28]:
patients_info.males_vs_females()
Out[28]:
'Of the 1338 patients, males account for 51.0% while females account for 49.0%.'
Unique Region Class Function:

In [29]:
patients_info.unique_regions()
Out[29]:
"The regions in this data are: ['southwest', 'southeast', 'northwest', 'northeast']."
Average Cost Class Function:

In [30]:
patients_info.avg_cost()
Out[30]:
'Average cost for all patients: $13270.42.'
In [31]:
patients_dictionary = patients_info.create_dictionary()

#print(patients_dictionary['age'][1])
Patient Dictionary Class Function:

Create List of Lists (for further analysis). This will preserve the order so I can more easily compare across different metrics.

In [32]:
patients_super_list = patients_info.list_of_lists()

#print(patients_super_list[1][1])
Average Cost by Region Class Function

In [33]:
patients_cost_by_region = patients_info.avg_cost_by_region()
The average cost by region is:
        Southwest: $12346.94
        Northwest: $12327.91
        Southeast: $14735.41
        Northeast: $13406.38
Average Cost by Sex

In [34]:
patients_info.avg_cost_by_sex()
The average cost by sex is:
        Male: $13956.75
        Female: $12544.51
        
        Males pay on average 11.0% more than females.
Average Cost By BMI Quartile:

In [63]:
patients_info.avg_cost_bmi_quartile()
        Quartiles: [26.2725, 30.4, 34.7]
        Min: 15.96
        Max: 53.13
        
        Quartile 1 Average: $10308.42
        Quartile 2 Average: $11394.36
        Quartile 3 Average: $14323.55
        Quartile 4 Average: $16987.94
In [ ]: