Why does my dictionary collect only 47 out of 1338 entries?

Hello!

After filling my lists with all the data from the insurance.csv file, I made a dictionary to organize the data for each patient, but it shows that the dict only has 47 patients out of 1338. (I checked the length of every single list and they all have 1338 items.)
What am I doing wrong? Or does it just show 47 while actually holding 1338 patients?
Thanks in advance!
This is the code (I also posted a photo where you can see both prints, with lengths 47 and 1338).

ages = load_list_data(ages, "C:/Users/rodri.DESKTOP-RAT8004/Desktop/insurance.csv", "age")
sexes = load_list_data(sexes, "C:/Users/rodri.DESKTOP-RAT8004/Desktop/insurance.csv", "sex")
bmi = load_list_data(bmi, "C:/Users/rodri.DESKTOP-RAT8004/Desktop/insurance.csv", "bmi")
num_of_children = load_list_data(num_of_children, "C:/Users/rodri.DESKTOP-RAT8004/Desktop/insurance.csv", "children")
smoker_status = load_list_data(smoker_status, "C:/Users/rodri.DESKTOP-RAT8004/Desktop/insurance.csv", "smoker")
regions = load_list_data(regions, "C:/Users/rodri.DESKTOP-RAT8004/Desktop/insurance.csv", "region")
insurance_cost = load_list_data(insurance_cost, "C:/Users/rodri.DESKTOP-RAT8004/Desktop/insurance.csv", "charges")

def create_dictionary(ages, sexes, bmi, num_of_children, smoker_status, regions, insurance_cost):
    patients = dict()
    num_patients = len(ages)
    for i in range(num_patients):
        patients[ages[i]] = {"Age" : ages[i],
                             "Sex" : sexes[i],
                             "BMI" : bmi[i],
                             "Number of children" : num_of_children[i],
                             "Smoker" : smoker_status[i],
                             "Region" : regions[i],
                             "Insurance charge" : insurance_cost[i]
        }
    return patients

patients = create_dictionary(ages, sexes, bmi, num_of_children, smoker_status, regions, insurance_cost)

len_patients = len(patients)
len_ages = len(ages)
print(len_patients)
print(len_ages)
# here it shows 47 for len_patients and 1338 for len_ages

What do you want as the keys of your patients dictionary?

You wrote:

patients[ages[i]] = {"Age" : ages[i],
                    "Sex" : sexes[i],
                     ...

Did you mean to write?

patients[i] = {"Age" : ages[i],
               "Sex" : sexes[i],
               ...

In the code you posted, the keys of your dictionary are going to be the ages of the individuals. Since different people can have the same age, your loop will end up overwriting records.

For example, suppose there are three people who are 25 years old. Your loop will create an entry with 25 as the key and the first person’s data as the value associated with that key. Then, the second 25-year-old’s record will overwrite the existing entry. Finally, the third person’s record will overwrite it again. When the loop finishes, only the third person’s data will persist. The data of the first two people will have been overwritten.
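Here is a small sketch of that overwriting behaviour. The sample values and the extra "Row" field are made up for illustration and are not taken from insurance.csv:

```python
# Hypothetical sample data: three patients who are all 25 years old.
ages = [25, 25, 25]
sexes = ["female", "male", "female"]

by_age = {}
for i in range(len(ages)):
    # The key is 25 every time, so each assignment replaces the previous value.
    by_age[ages[i]] = {"Age": ages[i], "Sex": sexes[i], "Row": i}

print(len(by_age))  # 1
print(by_age[25])   # {'Age': 25, 'Sex': 'female', 'Row': 2}
```

Only the last row (index 2) survives, even though the loop ran three times.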

The original lists have 1338 items, but the patients dictionary has only 47 entries. This suggests that there are 47 unique ages in the list.
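One way to check this yourself is to compare the length of the list with the number of distinct values in it. The short list below is a made-up stand-in for your real ages list:

```python
# Made-up sample; with the real data you would use your own `ages` list.
ages = [19, 18, 28, 33, 18, 19]

# A set keeps only one copy of each value, just like dictionary keys do.
unique_ages = set(ages)

print(len(ages))         # 6
print(len(unique_ages))  # 4
```

If the second number matches the length of your dictionary, then duplicate keys are the explanation.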


Oh!
I didn’t know it overwrites the data, that’s super helpful!
I also didn’t know that I can create a dict without keys, that’s why I set the age as the key!
Thanks!

You do need keys for your dictionary.

patients[i] = {"Age" : ages[i],
               "Sex" : sexes[i],
               ...

In this snippet, the keys are going to be the numbers 0 through 1337. If you want to use something else as the key, then make sure that the key is unique for each record. If the key is not unique, then you will end up overwriting records.

patients = {  0: {"Age": 19, "Sex": "female", ...},
              1: {"Age": 18, "Sex": "male", ...},
              ... 
              1337: {"Age": 61, "Sex": "female", ...} 
            }
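As a rough sketch, the loop could also be written with enumerate and zip, which hands you the row number and the values together. This version uses only three of your columns and a few made-up rows to keep it short. It is one possible way to do it, not the only one:

```python
def create_dictionary(ages, sexes, bmi):
    """Build one record per row, keyed by the row index (always unique)."""
    patients = {}
    # zip pairs up the i-th items of each list; enumerate supplies i.
    for i, (age, sex, b) in enumerate(zip(ages, sexes, bmi)):
        patients[i] = {"Age": age, "Sex": sex, "BMI": b}
    return patients


# Hypothetical sample rows; note that two patients share the age 19.
ages = [19, 18, 19]
sexes = ["female", "male", "male"]
bmi = [27.9, 33.8, 24.6]

patients = create_dictionary(ages, sexes, bmi)
print(len(patients))  # 3
print(patients[2])    # {'Age': 19, 'Sex': 'male', 'BMI': 24.6}
```

Because the index is different for every row, both 19-year-olds are kept.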

Okay, solved!
Now I understand it better, thanks!
