Medical Insurance portfolio project - any feedback welcome / particularly on using dictionaries

Hi, Everyone -

My name is Eric and I’m a new student with Codecademy’s Data Science career path. I want to share my Medical Insurance portfolio project for feedback. Mostly I was able to do what I wanted with the functions, and I feel like I’ve learned a lot already.

One sticking point: I was hoping to figure out a way to use a for loop and iterate through the dictionary, pulling the averages of each set of values and using the key name in a printed output sentence. I was only able to get one of my age categories to print properly. If anyone has any comments on how to improve my code, I will be thankful!

Here is a link to my project on Github: GitHub - eric-mosher/us_insurance_codecademy: Project for Codecademy Data Science course

Thanks,
Eric

Can we be sure the data does not include field names?

ins_dict = csv.DictReader(insurance_dict)

It would make sense to use them as keys, if they exist, as opposed to making our own arbitrary keys.

Hi – the field names in the csv are age, sex, bmi, etc. But I wanted to partition the data in terms of four age groups – 50-65, 40-50, 25-40, under 25 – and then analyze the avg cost for each age group. That’s why I made the new keys.

There is probably a more elegant way to simply use the data in the original dictionary without creating a new dictionary…

Eric

Oh, that makes sense.

In [25]

return print("The average insurance cost for individuals aged " + key + " is " + str(avg))

The return is interfering with the loop.

Thanks - I took out return and indented the print statement so it’s inside the for loop – and now it works! I will keep playing around with functions and classes with this data set for some more practice.

Eric

1 Like

You’re welcome. Please post a link to this exercise so a person might try a hand at it. Thanks.

Here’s a link to my updated project: GitHub - eric-mosher/us_insurance_codecademy: Project for Codecademy Data Science course and to the project in Codecademy: https://www.codecademy.com/paths/data-science/tracks/dscp-python-portfolio-project/modules/dscp-us-medical-insurance-costs/kanban_projects/us-medical-insurance-costs-portfolio-project

1 Like

Segue

Coming at this from all angles, I’ve arrived at this question: Why do we need a full host data structure to work on a filtered segment? Why not just read in the data and filter what we hang on to?

import csv
with open("insurance.csv", 'r') as file:
  tab = csv.DictReader(file)  
  try:
    while True:
      x = tab.__next__()
      if int(x['age']) < 26:
        print (x)        # x is what we hang on to for this session
  except StopIteration:
    print ('Done')

Given we have a file, there is no need to store any more data than needs be. Rather than having a bunch of functions that act upon a data structure, have a bunch a functions that act upon a file. Filter, then work. The file is always there, and it doesn’t take up any memory.

Iterator objects tend not to take up too much memory, either. This is not an intensive operation, and optimally rather efficient.

The key is to query the data, rather than replicate it. Let it sit as a store, and glean from it in small, discrete, and meaningful packages.

under_26