Please Review my Portfolio Project: US Medical Insurance Cost

I would love to hear any new ideas you have for it.
Portfolio Project GitHub

import csv

# Initialize a dictionary with one empty list per column
data_dict = {
    "ages": [],
    "sexes": [],
    "bmis": [],
    "childrens": [],
    "smokers": [],
    "regions": [],
    "charges": []
}

# Open the CSV file using DictReader
with open("insurance.csv", newline="") as insurance_file:
    insurance_reader = csv.DictReader(insurance_file)
    
    for row in insurance_reader:
        data_dict["ages"].append(int(row["age"]))
        data_dict["sexes"].append(row["sex"])
        data_dict["bmis"].append(float(row["bmi"]))
        data_dict["childrens"].append(int(row["children"]))
        data_dict["smokers"].append(row["smoker"])
        data_dict["regions"].append(row["region"])
        data_dict["charges"].append(float(row["charges"]))

# Function for calculating the length of any list
def len_of_list(lst):
    return len(lst)

# Function for calculating the average value of a list
def avg_value(lst):
    return sum(lst) / len(lst)

# Printing the average values for each numerical column
avg_ages = round(avg_value(data_dict["ages"]))
print("Average Age:", avg_ages)

avg_bmis = round(avg_value(data_dict["bmis"]), 2)
print("Average BMI:", avg_bmis)

avg_children = round(avg_value(data_dict["childrens"]))
print("Average Number of Children:", avg_children)

avg_charges = round(avg_value(data_dict["charges"]), 2)
print("Average Insurance Charge:", avg_charges)

# Function to count the occurrences of values in a list
def count_occurrences(lst, value):
    return lst.count(value)

# Counting smokers and non-smokers
total_smokers = count_occurrences(data_dict["smokers"], "yes")
total_non_smokers = count_occurrences(data_dict["smokers"], "no")
print("Total Smokers:", total_smokers)
print("Total Non-Smokers:", total_non_smokers)

# Counting males and females
total_males = count_occurrences(data_dict["sexes"], "male")
total_females = count_occurrences(data_dict["sexes"], "female")
print("Total Males:", total_males)
print("Total Females:", total_females)

# Function to create a dictionary with the count of each unique value
def create_value_count_dict(lst):
    value_count_dict = {}
    for item in lst:
        if item in value_count_dict:
            value_count_dict[item] += 1
        else:
            value_count_dict[item] = 1
    return value_count_dict

# Creating dictionaries for regions, smokers, and sexes
region_counts = create_value_count_dict(data_dict["regions"])
print("Region Counts:", region_counts)

smoker_counts = create_value_count_dict(data_dict["smokers"])
print("Smoker Counts:", smoker_counts)

sex_counts = create_value_count_dict(data_dict["sexes"])
print("Sex Counts:", sex_counts)

# Function to calculate the difference in average charges between two groups.
# A "group" is every row whose label equals a given value (e.g. "yes" or "no").
def calculate_charge_difference(labels, charges, value1, value2):
    group1_charges = [charge for label, charge in zip(labels, charges) if label == value1]
    group2_charges = [charge for label, charge in zip(labels, charges) if label == value2]
    avg_group1_charge = sum(group1_charges) / len(group1_charges)
    avg_group2_charge = sum(group2_charges) / len(group2_charges)
    return round(avg_group1_charge - avg_group2_charge, 2)

# Calculating the difference in average charges between smokers and non-smokers
charge_difference = calculate_charge_difference(data_dict["smokers"], data_dict["charges"], "yes", "no")
print("Average Charge Difference (Smokers - Non-Smokers):", charge_difference)

# Function to calculate the average age of individuals with a specified number of children
def average_age_with_children(children_count, ages, childrens):
    relevant_ages = [ages[i] for i, count in enumerate(childrens) if count == children_count]
    return round(avg_value(relevant_ages))

# Calculating the average age of individuals with one child
average_age_one_child = average_age_with_children(1, data_dict["ages"], data_dict["childrens"])
print("Average Age of Individuals with One Child:", average_age_one_child)

Can you add the notebook file to a GitHub repo so people can review it and see the output of the code?

I just added the link to my GitHub. It’s the start of my portfolio. I saw one that used pandas and so on, and I’m going to add visualizations at the end of the next module.

Hi,

A nicely performed project.

I liked how you used the enumerate() function.

I think it will add value to put a comment at the top of the file, describing what you are doing or about to do in this whole file.

In the calculate_charge_difference() function, it is not clear what you mean by the word “group”. This term needs to be explained in your file.
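
As a rough sketch of both suggestions, the file could open with a module docstring, and the function could carry a docstring that says what a "group" is. The function name, its signature, the docstring wording, and the sample values here are all assumptions made for illustration, not what your file has to look like:

```python
"""US Medical Insurance Cost: a first look at insurance.csv.

Loads the dataset, prints averages for the numeric columns, counts the
categorical columns, and compares average charges between groups.
"""


def charge_difference_between_groups(labels, charges, value1, value2):
    """Return the mean charge of group 1 minus the mean charge of group 2.

    A "group" is every row whose label equals a given value, for example
    all rows where the smoker column is "yes".
    """
    group1 = [c for label, c in zip(labels, charges) if label == value1]
    group2 = [c for label, c in zip(labels, charges) if label == value2]
    return round(sum(group1) / len(group1) - sum(group2) / len(group2), 2)


# Made-up sample values, only to show the call
sample_labels = ["yes", "no", "yes", "no"]
sample_charges = [300.0, 100.0, 500.0, 200.0]
print(charge_difference_between_groups(sample_labels, sample_charges, "yes", "no"))
```

With a docstring like this, a reader can tell which rows belong to each group without tracing the call site.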

Congrats on finishing the project.

Some thoughts/considerations:

  • It would be nice to see some of the output of the code, such as the mean/median age and charges, how many smokers, how many men vs. women, and how many people in each region: basic descriptive stats. For example, is the mean the most useful stat for the charges column? What is the spread of the data? Are there any outliers (min/max) that would pull the mean? Perhaps the median is a better option.

  • This project isn’t just about writing the code to look at the data to analyze it (which you did well!), it’s also a presentation of the data and it should read like that. Consider adding a brief intro (cite where the data came from, etc.) and a conclusion to wrap up what you found and any potential shortfalls, and/or future questions you might have. Remember, you’re telling a story to an audience (who may or may not be technically-inclined).

  • In re: the section where you have two lists of first and last names: health info like this, even if it’s just a made-up dataset, wouldn’t have any names attached to it b/c of HIPAA.

  • Keep in mind: you’re just sifting through the data to analyze it and can’t positively claim that there are any correlations between the variables b/c you haven’t done any statistical tests.
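
To make the mean vs. median point concrete, here is a minimal sketch using Python's built-in statistics module. The describe helper and the sample numbers are made up for illustration; in the project you would pass in the real charges list instead:

```python
import statistics

# Made-up charges, only for illustration; the last value acts as an outlier
sample_charges = [1200.0, 1500.0, 1800.0, 2100.0, 2400.0, 48000.0]


def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "mean": round(statistics.mean(values), 2),
        "median": round(statistics.median(values), 2),
        "stdev": round(statistics.stdev(values), 2),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


summary = describe(sample_charges)
for name, value in summary.items():
    print(f"{name}: {value}")
```

In this toy sample the single large value pulls the mean far above the median, which is the kind of thing worth checking in the charges column before settling on which stat to report.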

Keep at it! :technologist:

Thanks! I will try to explain it better! I really value the time you took to look at my code and give me some insight!

Thanks a lot!

Thanks for taking the time! I will try to do as you suggested!

Regarding the two new lists, my idea was to test whether I could create a “random” column for first_name and last_name, just to test the code. And yes, you are right: in a real-life scenario, there would not be any personal information about the people!

Again thanks!
