Insurance Costs Project: How to iterate through a sequence of dictionaries?

One of the first steps of the U.S. Medical Insurance Costs Project is to import the dataset so it can be analysed. To do so, I opened the CSV file, and my initial goal was to create one list per header in the dataset, holding all the values of that variable. However, when I try to do this by iterating over each header and each row in the CSV file, I do not get the expected result: only the content of the first header (“column”) is iterated; the other ones are not.

For example: when I run the code below, only the values of the first header (“age”) are printed. The values of the other 6 headers are not.

import csv

with open("insurance.csv", newline="") as insurance_data:
    dataset = csv.DictReader(insurance_data, delimiter=",")
    headers = dataset.fieldnames

    for header in headers:
        print(header)
        for row in dataset:
            print(row[header])

The beginning of the output is, as expected:

age
19
18
28
[...]

But the end is:

[...]
21
61
sex
bmi
children
smoker
region
charges

An easy solution to that would be adding lists manually for each header in the dataset. Although this is okay with a dataset with only 7 columns, this can be quite tiring for a dataset with more variables - and that’s why I’m avoiding this strategy.

Does anyone know how I can iterate through this sequence of dictionaries in order to create a list for each variable?

Which Medical Ins. Project one is this? Is it on the DS path? Do you have a link?

You could use Pandas.

Oh, I forgot to post the link. Here it is: U.S. Medical Insurance Cost. And yes, it is on the DS career path.

I want to avoid a solution with Pandas because I’m interested in understanding how to iterate through a sequence of dictionaries, as is the case with DictReader. I do not understand why the values of the first key (“age”) are iterated, but the values of other keys are not.

You should check out the documentation for the csv module and DictReader: csv — CSV File Reading and Writing — Python 3.9.1 documentation

It behaves a lot like a standard reader object, except that each row is returned as a dictionary (instead of a list) as you read through the file. So one row would be {age: 30, sex: m, bmi: 20 …} and so on. The reader can only be read through once: after the inner loop has gone through every row for the first header, you are at the end of the file, so the inner loop for each later header has no rows left to go through. This is why the last lines of your output are just the print(header) statements being run.
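
To see this for yourself, here is a minimal sketch. It uses a small made-up sample read through io.StringIO rather than the real insurance.csv, so the rows and values are assumptions for illustration only. The second pass over the same reader yields nothing, and reading the rows into a list first is one possible way around that.

```python
import csv
import io

# Made-up sample standing in for insurance.csv (an assumption, not the real data).
sample = "age,sex,bmi\n19,female,27.9\n18,male,33.77\n"

reader = csv.DictReader(io.StringIO(sample))
first_pass = [row["age"] for row in reader]
# The reader is now at the end of the data, so this loop finds no rows.
second_pass = [row["sex"] for row in reader]

# Storing the rows in a list lets you loop over them as often as you like.
reader = csv.DictReader(io.StringIO(sample))
rows = list(reader)
columns = {header: [row[header] for row in rows] for header in reader.fieldnames}

print(first_pass)
print(second_pass)
print(columns["sex"])
```

Note that the csv module hands back every value as a string, so numeric columns such as age still need converting with int() or float() before any arithmetic.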

This might also be useful:
https://thispointer.com/python-read-a-csv-file-line-by-line-with-or-without-header/

Thank you for the elucidating answers, @lisalisaj and @tgrtim. After reading the two documents that you posted, it still took me some time to figure out a solution, but I eventually thought of one:

import csv

with open("insurance.csv", newline="") as insurance_data:
    dataset = csv.DictReader(insurance_data, delimiter=",")
    headers = dataset.fieldnames

    # Start with one empty list per header.
    dict_parameters = {header: [] for header in headers}

    # Read the file once, appending each row's value to its header's list.
    for row in dataset:
        for header in headers:
            dict_parameters[header].append(row[header])

I do not think this is the simplest solution (without using pandas), but it works.
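
For anyone who finds this thread later, here is one possibly simpler alternative, offered as a sketch rather than the "right" answer. It swaps DictReader for the plain csv.reader and uses zip(*rows) to turn rows into columns in one step. The helper name read_columns and the sample data are made up for illustration, and it assumes every row has the same number of fields (zip would silently cut columns short otherwise).

```python
import csv
import io


def read_columns(csv_file):
    """Return {header: [values, ...]} from an open CSV file object.

    Hypothetical helper for illustration; assumes no ragged rows.
    """
    reader = csv.reader(csv_file)
    headers = next(reader, [])  # first line holds the column names
    # zip(*reader) transposes the remaining rows into columns.
    return {header: list(column) for header, column in zip(headers, zip(*reader))}


# Made-up sample standing in for insurance.csv (an assumption, not the real data).
sample = "age,sex,bmi\n19,female,27.9\n18,male,33.77\n"
columns = read_columns(io.StringIO(sample))
print(columns["age"])
```

With the real file you would call it as read_columns(insurance_data) inside the same with open(...) block. One catch: if the file has a header line but no data rows, this version returns an empty dictionary rather than a dictionary of empty lists.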