Data Science Path, Python Portfolio Project: US Medical Insurance Costs

For this project, I was trying to create a dictionary whose keys range from 1 to n (the number of rows in the Excel file), with each value being a dict of all the columns for that person (e.g. {1: {'Age': '19', 'Sex': 'female', 'bmi': '27.9', 'Children': '0', 'Smoker': 'yes', 'Region': 'southwest', 'Charge': 16884.924}, 2: {…}}).

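def insured_dict_func():
    # rows and ind_insured_lst are built earlier from the data file (not shown here)
    insured_dict = {}
    for i in range(1, len(rows) + 1):
        # person number i (counting from 1) maps to entry i-1 of the list
        insured_dict[i] = ind_insured_lst[i-1]
    return insured_dict
print(insured_dict_func())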
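A minimal sketch of the same 1-to-n numbering using the standard `csv` module and `enumerate(..., start=1)`, which removes the need for the `i-1` index arithmetic. Assumptions: the data is read from a CSV file (an in-memory `io.StringIO` stands in for the real file here, and the file name in the comment is a guess), only the `Region` and `Charge` columns from the question are included, and `build_insured_dict` is a hypothetical name. The sample values are taken from output posted later in this thread.

```python
import csv
import io

# Stand-in for open('insurance.csv', newline=''); the file name is an assumption.
# Only the Region and Charge columns are included; the values come from this thread.
SAMPLE_CSV = """Region,Charge
southwest,16884.924
southeast,1725.5523
northwest,21984.47061
"""

def build_insured_dict(csv_file):
    """Hypothetical helper: map person number 1..n to that person's row dict."""
    insured_dict = {}
    # enumerate(..., start=1) numbers the rows 1, 2, ..., n
    for i, row in enumerate(csv.DictReader(csv_file), start=1):
        # csv returns every field as a string; convert the charge to a number
        row['Charge'] = float(row['Charge'])
        insured_dict[i] = row
    return insured_dict

insured_dict = build_insured_dict(io.StringIO(SAMPLE_CSV))
print(list(insured_dict))          # [1, 2, 3]
print(insured_dict[1]['Charge'])   # 16884.924
```

In a real notebook you would pass the open file object instead of the `io.StringIO` stand-in.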

Then I wanted to create a function that organizes the charges by region so that I can find the region with the highest average cost, e.g. {'southwest': [16884.924, charge2, charge3…chargen], …, 'last region': [charge1, charge2…chargen]}.
After running my code, I realized all the charges were being added to only one region, so the dictionary has only one key/region.
I have been looking at this all day and can't figure out what is wrong. Any help would be greatly appreciated. Thank you!

def charges_by_region(insured_dict_func):
    charges_by_region_dict = {}
    for ind in insured_dict_func.keys():
        #create an empty lst to store the charges by region
        lst_of_charges_by_region = []
        current_charge = insured_dict_func[ind]['Charge']
        current_region = insured_dict_func[ind]['Region']
        if current_region not in charges_by_region_dict.keys():
            lst_of_charges_by_region.append(current_charge)
            charges_by_region_dict[current_region] = lst_of_charges_by_region
        elif current_region in charges_by_region_dict.keys():
            charges_by_region_dict[current_region].append(current_charge)
    return charges_by_region_dict
print(charges_by_region(insured_dict_func()))
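For comparison, a compact sketch of the grouping described above using `dict.setdefault`, which returns the list already stored under a region (inserting an empty one the first time that region is seen), so no separate "new region" branch is needed. The function name `group_charges_by_region` is hypothetical, and the sample dictionary is made up and only carries the `Region` and `Charge` fields.

```python
def group_charges_by_region(insured_dict):
    """Return {region: [charge, charge, ...]} built from the per-person dicts."""
    charges_by_region_dict = {}
    for person in insured_dict.values():
        # setdefault returns the list already stored under this region,
        # inserting an empty list the first time the region appears
        charges_by_region_dict.setdefault(person['Region'], []).append(person['Charge'])
    return charges_by_region_dict

# Made-up sample in the same shape as the question's dictionary
sample = {
    1: {'Region': 'southwest', 'Charge': 16884.924},
    2: {'Region': 'southeast', 'Charge': 1725.5523},
    3: {'Region': 'southwest', 'Charge': 100.0},
}
print(group_charges_by_region(sample))
# {'southwest': [16884.924, 100.0], 'southeast': [1725.5523]}
```

`collections.defaultdict(list)` would work just as well; `setdefault` keeps the result a plain dict.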

Hi, can you please format your code?

def charges_by_region(insured_dict_func):
    charges_by_region_dict = {}
    for ind in insured_dict_func.keys():
        #create an empty lst to store the charges by region
        lst_of_charges_by_region = []
        current_charge = insured_dict_func[ind]['Charge']
        current_region = insured_dict_func[ind]['Region']
        if current_region not in charges_by_region_dict.keys():
            lst_of_charges_by_region.append(current_charge)
            charges_by_region_dict[current_region] = lst_of_charges_by_region
        elif current_region in charges_by_region_dict.keys():
            lst_of_charges_by_region.append(current_charge)
    return charges_by_region_dict
print(charges_by_region(insured_dict_func()))

Do you have a link to this project? There are several medical insurance projects on the DS path, and I'm not sure which point this is at.
Rather than writing a function, you could just use pandas to split the data by region and then analyze the charges for each region that way (and use matplotlib or seaborn to plot them).
For example:

import pandas as pd
insurance = pd.read_csv('insurance.csv')  # assumed name of the project's data file
southwest = insurance[insurance['region'] == 'southwest']
southwest.head()
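If pandas is an option, you don't have to filter one region at a time: `groupby` computes every region's mean in one step. A sketch, assuming the DataFrame has `region` and `charges` columns (`region` is used above; `charges` is an assumed column name). A small inline DataFrame, built from the charges posted in this thread, stands in for the CSV.

```python
import pandas as pd

# Stand-in for pd.read_csv(...); one charge per region, taken from this thread
insurance = pd.DataFrame({
    'region': ['southwest', 'southeast', 'northwest', 'northeast'],
    'charges': [16884.924, 1725.5523, 21984.47061, 6406.4107],
})

# Mean charge per region, sorted from highest to lowest
avg_by_region = insurance.groupby('region')['charges'].mean().sort_values(ascending=False)
print(avg_by_region)
print(avg_by_region.idxmax())  # region with the highest average charge
```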

Hi Lisa,
Thank you for replying. This is the project link: https://www.codecademy.com/paths/data-science/tracks/dscp-python-portfolio-project/modules/dscp-us-medical-insurance-costs/kanban_projects/us-medical-insurance-costs-portfolio-project
I haven't gotten to the pandas module yet; I only know some basic DataFrame operations. Is there a way to do it with plain Python?
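In plain Python, once the charges are grouped into a dictionary that maps each region to a list of charges, finding the highest average takes only a dict comprehension and `max` with a `key`. A minimal sketch; `highest_average_region` is a hypothetical name and the grouped values are made up.

```python
def highest_average_region(charges_by_region_dict):
    """Return (region, average) for the region with the highest mean charge."""
    # Average of each region's list of charges
    averages = {
        region: sum(charges) / len(charges)
        for region, charges in charges_by_region_dict.items()
    }
    # max over the keys, comparing them by their average
    best = max(averages, key=averages.get)
    return best, averages[best]

# Made-up grouped data in the shape the question is aiming for
grouped = {
    'southwest': [16884.924, 100.0],
    'southeast': [1725.5523, 2000.0],
    'northwest': [21984.47061],
}
print(highest_average_region(grouped))  # ('northwest', 21984.47061)
```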

I modified my function a bit, and now all the regions are added to the dictionary, but with only one charge each. It does not append the remaining charges to the value lists, and I am not sure why.

charges_by_region_dict = {}
def charges_by_region(insured_dict_func):
    for ind in insured_dict_func.keys():
        #create an empty lst to store the charges by region
        lst_of_charges_by_region = []
        current_charge = insured_dict_func[ind]['Charge']
        current_region = insured_dict_func[ind]['Region']
        if current_region not in charges_by_region_dict.keys():
            lst_of_charges_by_region.append(current_charge)
            charges_by_region_dict[current_region] = lst_of_charges_by_region
        elif current_region in charges_by_region_dict.keys():
            lst_of_charges_by_region.append(current_charge)
    return charges_by_region_dict
print(charges_by_region(insured_dict_func()))

Output:
{'southwest': [16884.924], 'southeast': [1725.5523], 'northwest': [21984.47061], 'northeast': [6406.4107]}

Hmm. :thinking:
Your parameter for the function is a function?
Just by eyeballing it… have you looked at (or tweaked) the indentation in that function to see whether the charges are appended to the list? Isn't the output just one row of the file?
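Following up on these hints, a minimal sketch of the posted function with its problems addressed. The result dictionary is created inside the function, so each call starts empty; the parameter is renamed `insured_dict`, because `charges_by_region(insured_dict_func())` passes in the returned dictionary rather than the function itself; and the `elif` branch appends to the list stored in the dictionary instead of to a brand-new local list that is discarded after each loop iteration (which is why every region ended up with a single charge). The sample data is hypothetical.

```python
def charges_by_region(insured_dict):
    # Create the result dict inside the function so each call starts fresh
    charges_by_region_dict = {}
    for ind in insured_dict.keys():
        current_charge = insured_dict[ind]['Charge']
        current_region = insured_dict[ind]['Region']
        if current_region not in charges_by_region_dict:
            # First time this region appears: store a new list holding this charge
            charges_by_region_dict[current_region] = [current_charge]
        else:
            # Append to the list already stored in the dict; appending to a fresh
            # local list instead would never reach the dict
            charges_by_region_dict[current_region].append(current_charge)
    return charges_by_region_dict

# Hypothetical sample: two people in northwest, one in northeast
sample = {
    1: {'Region': 'northwest', 'Charge': 21984.47061},
    2: {'Region': 'northeast', 'Charge': 6406.4107},
    3: {'Region': 'northwest', 'Charge': 500.0},
}
print(charges_by_region(sample))
# {'northwest': [21984.47061, 500.0], 'northeast': [6406.4107]}
```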

Maybe look at your code in the Hurricane project as a guide (did you do that one?), specifically the part where you write a function that builds a dictionary with the hurricanes' names as keys and each hurricane's info as values.